Publications

You can also find my articles on my Google Scholar profile.

Exploring the Landscape of Large Language Models in Medical Question Answering: Observations and Open Questions

Submitted to NEJM AI, 2024

Large Language Models (LLMs) have shown promise in medical question answering by achieving passing scores in standardised exams and have been suggested as tools for supporting healthcare workers. Deploying LLMs into such a high-risk context requires a clear understanding of the limitations of these models. With the rapid development and release of new LLMs, it is especially valuable to identify patterns which exist across models and may, therefore, continue to appear in newer versions. In this paper, we evaluate a wide range of popular LLMs on their knowledge of medical questions in order to better understand their properties as a group. From this comparison, we provide preliminary observations and raise open questions for further research.
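
As a rough illustration of what such a cross-model comparison involves, the sketch below scores a set of models on multiple-choice questions by exact-match accuracy on the option letter. The question format, the `ask` callables, and the stub model are illustrative assumptions, not the evaluation harness used in the paper.

```python
# Minimal sketch of a cross-model comparison on multiple-choice medical
# questions. The question format and model interface are illustrative
# assumptions, not the paper's actual evaluation harness.

QUESTIONS = [
    {
        "stem": "Which vitamin deficiency causes scurvy?",
        "options": {"A": "Vitamin A", "B": "Vitamin B12",
                    "C": "Vitamin C", "D": "Vitamin D"},
        "answer": "C",
    },
    # ... more exam-style questions ...
]

def format_prompt(q):
    """Render a question as a single multiple-choice prompt."""
    opts = "\n".join(f"{k}. {v}" for k, v in sorted(q["options"].items()))
    return f"{q['stem']}\n{opts}\nAnswer with a single letter."

def evaluate(models, questions):
    """Score each model by exact-match accuracy on the option letter.

    `models` maps a model name to a callable: prompt -> raw text reply.
    """
    accuracy = {}
    for name, ask in models.items():
        correct = 0
        for q in questions:
            reply = ask(format_prompt(q)).strip()
            # Take the first option letter that appears in the reply.
            picked = next((c for c in reply if c in q["options"]), None)
            correct += picked == q["answer"]
        accuracy[name] = correct / len(questions)
    return accuracy

if __name__ == "__main__":
    # Stub "model" that always answers A, just to make the sketch runnable;
    # in practice each callable would wrap an actual LLM API.
    print(evaluate({"always-A": lambda prompt: "A"}, QUESTIONS))
```

Aggregating per-question agreement across models, rather than accuracy alone, is one way to surface the kind of shared patterns the abstract refers to.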

Download here

Direct LiDAR-based object detector training from automated 2D detections

NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2022

3D object detection (3DOD) is an important component of many applications; however, existing methods rely heavily on datasets of depth and image data that require expensive 3D annotation, which limits the collection of diverse datasets that truly represent the long tail of potential scenes in the wild. In this work, we propose to utilise a readily available, robust 2D object detector and to transfer information about objects from 2D to 3D, allowing us to train a 3D object detector without any human annotation in 3D. We demonstrate that our method significantly outperforms previous 3DOD methods supervised by only 2D annotations, and that it narrows the accuracy gap between methods that use 3D supervision and those that do not.
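
To give a flavour of how a 2D detection can carve out 3D supervision, the sketch below uses the standard frustum idea: project LiDAR points into the image, keep those that land inside a 2D box, and derive a crude 3D pseudo-label from them. The calibration convention and the axis-aligned box fit are simplifying assumptions; the training pipeline in the paper is considerably more involved.

```python
# Frustum selection: LiDAR points whose projection falls inside a 2D
# detection box become candidate evidence for a 3D pseudo-label. The
# calibration convention and the box fit are simplifying assumptions.

import numpy as np

def points_in_box(points_cam, P, box_2d):
    """Return the points whose image projection falls inside `box_2d`.

    points_cam: (N, 3) LiDAR points already transformed to the camera frame.
    P: (3, 4) camera projection matrix.
    box_2d: (x_min, y_min, x_max, y_max) from the 2D detector.
    """
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])  # (N, 4)
    proj = homo @ P.T                                              # (N, 3)
    in_front = proj[:, 2] > 0                  # keep points ahead of the camera
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)   # perspective divide
    x0, y0, x1, y1 = box_2d
    inside = ((uv[:, 0] >= x0) & (uv[:, 0] <= x1) &
              (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
    return points_cam[in_front & inside]

def coarse_box_from_points(pts):
    """Axis-aligned 3D box (centre, size) enclosing the frustum points: a
    crude pseudo-label; real labels also need orientation and outlier
    handling."""
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    return (lo + hi) / 2, hi - lo
```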

Download here

Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views

ICRA, 2022

We present a system that automatically converts 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the LiDAR point clouds are partial, directly fitting bounding boxes to the point clouds is ill-posed. Instead, we suggest that obtaining good results requires sharing information between all objects in the dataset jointly, over multiple frames. We then make three improvements to the baseline. First, we address ambiguities in predicting the object rotations via direct optimization in this space, while still backpropagating the rotation prediction through the model. Second, we explicitly model outliers and task the network with learning their typical patterns, thus better discounting them. Third, we enforce temporal consistency when video data is available. With these contributions, our method significantly outperforms previous work, even though those methods use significantly more complex pipelines, 3D models, and additional human-annotated external sources of prior information.
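
The flavour of fitting a box to partial points under an outlier-robust objective can be conveyed in a few lines. The toy sketch below optimises a bird's-eye-view box centre and yaw against a Huber-style point-to-boundary loss; the parameterisation, loss, and optimiser settings are illustrative assumptions, and it omits the key ingredient of sharing information across all objects and frames.

```python
# Toy bird's-eye-view box fitting with a robust loss. The box
# parameterisation and Huber-style loss are illustrative assumptions and
# omit the paper's joint, cross-object treatment.

import torch

def box_fit_loss(points, centre, size, yaw, delta=0.1):
    """Robust distance of 2D points to the boundary of an oriented box."""
    c, s = torch.cos(yaw), torch.sin(yaw)
    rot = torch.stack([torch.stack([c, s]), torch.stack([-s, c])])
    local = (points - centre) @ rot.T          # points in the box frame
    d = torch.abs(local) - size / 2            # per-axis excess over half-size
    # Signed distance to the box surface (positive outside, negative inside).
    dist = (torch.linalg.norm(torch.clamp(d, min=0), dim=1)
            + torch.clamp(d.max(dim=1).values, max=0))
    dist = torch.abs(dist)
    # Huber-style robustification limits the influence of outlier points.
    return torch.where(dist < delta, 0.5 * dist ** 2 / delta,
                       dist - 0.5 * delta).mean()

points = torch.randn(64, 2) * 0.5 + 2.0        # fake partial BEV scan
centre = torch.tensor([0.0, 0.0], requires_grad=True)
yaw = torch.tensor(0.0, requires_grad=True)
size = torch.tensor([4.0, 1.8])                # fixed car-like footprint
opt = torch.optim.Adam([centre, yaw], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    box_fit_loss(points, centre, size, yaw).backward()
    opt.step()
```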

Download here

Calibrating Self-supervised Monocular Depth Estimation

NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2021

In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal. Whilst these networks achieve good performance, an often-overlooked detail is that, due to the inherent ambiguity of monocular vision, they predict depth only up to an unknown scaling factor. This scaling factor is then typically obtained from the LiDAR ground truth at test time, which severely limits the practical applications of these methods. In this paper, we show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
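
One common way to exploit known camera geometry, sketched below, is to back-project ground pixels with the predicted (unscaled) depth, fit a plane, and rescale so the camera sits at its true height above it. The ground-pixel mask and intrinsics are assumed inputs, and the calibration procedure in the paper may differ in its details.

```python
# Scale recovery from a known camera height: back-project ground pixels
# using the unscaled depth prediction, fit the ground plane, and rescale.
# The ground mask and intrinsics are assumed to be given.

import numpy as np

def recover_scale(depth, ground_mask, K, true_camera_height):
    """Return the factor converting predicted depth to metric depth."""
    v, u = np.nonzero(ground_mask)             # pixel coords of ground points
    z = depth[v, u]
    # Back-project to camera-frame 3D points: x = (u - cx) * z / fx, etc.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=1)
    # Least-squares plane fit: the singular vector with the smallest
    # singular value of the centred points is the plane normal.
    centred = pts - pts.mean(axis=0)
    normal = np.linalg.svd(centred, full_matrices=False)[2][-1]
    # Distance from the camera (origin) to the plane, in predicted units.
    predicted_height = abs(pts.mean(axis=0) @ normal)
    return true_camera_height / predicted_height
```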

Download here

Real Time Monocular Vehicle Velocity Estimation using Synthetic Data

IEEE Intelligent Vehicles Symposium (IV), 2021

Vision is one of the primary sensing modalities in autonomous driving. In this paper, we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Contrary to prior methods that train end-to-end deep networks to estimate the vehicles’ velocity from the video pixels, we propose a two-step approach in which an off-the-shelf tracker first extracts vehicle bounding boxes and a small neural network then regresses the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this still achieves state-of-the-art estimation performance, with the significant benefit of separating perception from dynamics estimation via a clean, interpretable, and verifiable interface, which allows us to distill the statistics that are crucial for velocity estimation. We show that these statistics can be used to easily generate synthetic training data in the space of bounding boxes, and we use this to further improve the performance of our method.
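
The second stage can be as small as the sketch below suggests: an MLP mapping a short track of bounding boxes to a relative velocity, trained on synthetic box tracks generated from a crude pinhole model. The input encoding, network size, and synthetic generator are illustrative assumptions rather than the exact design in the paper.

```python
# Small regressor from tracked 2D boxes to relative velocity, trained on
# synthetic box tracks. Encoding, architecture, and the synthetic
# generator are illustrative assumptions.

import torch
import torch.nn as nn

T = 8  # number of tracked frames fed to the regressor

class VelocityRegressor(nn.Module):
    """MLP mapping T boxes (x, y, w, h each) to a scalar relative velocity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4 * T, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, boxes):                  # boxes: (B, T, 4)
        return self.net(boxes.flatten(1)).squeeze(-1)

def synthetic_tracks(batch):
    """Fake box tracks: a car's box height shrinks roughly in proportion
    to inverse distance; the target is the rate of distance change."""
    dist0 = torch.rand(batch, 1) * 30 + 5      # initial distance (m)
    vel = torch.randn(batch, 1) * 5            # relative velocity (m/s)
    t = torch.arange(T).float() * 0.1          # frame times at 10 Hz
    dist = dist0 + vel * t                     # (B, T)
    h = 60.0 / dist                            # pinhole-like box height
    w = 1.6 * h
    xy = torch.zeros(batch, T, 2)              # keep the box centred
    return torch.cat([xy, torch.stack([w, h], dim=-1)], dim=-1), vel.squeeze(1)

model = VelocityRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                           # tiny training loop
    boxes, vel = synthetic_tracks(64)
    loss = nn.functional.mse_loss(model(boxes), vel)
    opt.zero_grad()
    loss.backward()
    opt.step()
```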

Download here

Monocular Depth Estimation with Self-supervised Instance Adaptation

arXiv, 2020

Recent advances in self-supervised learning have demonstrated that it is possible to learn accurate monocular depth reconstruction from raw video data, without using any 3D ground truth for supervision. However, in robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot, switching between monocular and multi-view reconstruction. To address this mixed setting, we propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time. Our method builds on a standard prior learned to perform monocular reconstruction, but uses self-supervision at test time to further improve the reconstruction accuracy when multiple images are available. When used to update the correct components of the model, this approach is highly effective. On the standard KITTI benchmark, our self-supervised method consistently outperforms all previous methods, with an average 25% reduction in absolute error across the three common setups (monocular, stereo, and monocular+stereo), and comes very close in accuracy to the fully-supervised state-of-the-art methods.
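
The sketch below conveys the test-time idea: when extra views are available, refine a copy of the network against a photometric reprojection loss for a few steps before predicting. The `warp_to_reference` callable and the choice to update all parameters are placeholders; identifying which components to update is precisely the question studied in the paper.

```python
# Test-time self-supervised refinement of a monocular depth network when
# extra views are available. `warp_to_reference` and the set of updated
# parameters are placeholders for the paper's actual choices.

import copy
import torch

def adapt_and_predict(model, views, warp_to_reference, steps=20, lr=1e-4):
    """Refine a copy of `model` on one test instance, then predict depth.

    views: list of images (1, 3, H, W); views[0] is the reference frame.
    warp_to_reference: callable (source_view, depth) -> that view
        re-rendered in the reference frame (wraps pose and intrinsics).
    """
    model = copy.deepcopy(model)               # never touch the learned prior
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        depth = model(views[0])
        # Photometric self-supervision: each extra view should re-render
        # to the reference frame if the predicted depth is correct.
        loss = sum((warp_to_reference(v, depth) - views[0]).abs().mean()
                   for v in views[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(views[0])
```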

Download here