Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
Publications
Monocular Depth Estimation with Self-supervised Instance Adaptation
ArXiv, 2020
Recent advances in self-supervised learning have demonstrated that it is possible to learn accurate monocular depth reconstruction from raw video data, without using any 3D ground truth for supervision. However, in robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot, switching between monocular and multi-view reconstruction. To address this mixed setting, we propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time. Our method builds on a standard prior learned to perform monocular reconstruction, but uses self-supervision at test time to further improve the reconstruction accuracy when multiple images are available. When used to update the correct components of the model, this approach is highly effective. On the standard KITTI benchmark, our self-supervised method consistently outperforms all previous methods, with an average 25% reduction in absolute error across the three common setups (monocular, stereo and monocular+stereo), and comes very close in accuracy to fully-supervised state-of-the-art methods.
Download here
Real Time Monocular Vehicle Velocity Estimation using Synthetic Data
IEEE Intelligent Vehicles, 2021
Vision is one of the primary sensing modalities in autonomous driving. In this paper we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Contrary to prior methods that train end-to-end deep networks to estimate vehicle velocity from video pixels, we propose a two-step approach in which an off-the-shelf tracker first extracts vehicle bounding boxes and a small neural network then regresses the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this still achieves state-of-the-art estimation performance, with the significant benefit of separating perception from dynamics estimation via a clean, interpretable and verifiable interface, which allows us to distill the statistics that are crucial for velocity estimation. We show that the latter can be used to easily generate synthetic training data in the space of bounding boxes, and we use this to further improve the performance of our method.
Download here
Calibrating Self-supervised Monocular Depth Estimation
NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2021
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal. Whilst these networks achieve good performance, an often-overlooked detail is that, due to the inherent ambiguity of monocular vision, they predict depth only up to an unknown scaling factor. The scaling factor is then typically obtained from the LiDAR ground truth at test time, which severely limits practical applications of these methods. In this paper, we show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
Download here
Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views
ICRA, 2022
We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the LiDAR point clouds are partial, directly fitting bounding boxes to the point clouds is meaningless. Instead, we suggest that obtaining good results requires sharing information between all objects in the dataset jointly, over multiple frames. We then make three improvements to the baseline. First, we address ambiguities in predicting the object rotations via direct optimization in this space, while still backpropagating rotation prediction through the model. Second, we explicitly model outliers and task the network with learning their typical patterns, thus better discounting them. Third, we enforce temporal consistency when video data is available. With these contributions, our method significantly outperforms previous work, despite the fact that those methods use significantly more complex pipelines, 3D models and additional human-annotated external sources of prior information.
Download here
Direct LiDAR-based object detector training from automated 2D detections
NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2022
3D object detection (3DOD) is an important component of many applications; however, existing methods rely heavily on datasets of depth and image data that require expensive 3D annotation, limiting the collection of diverse datasets that truly represent the long tail of potential scenes in the wild. In this work we propose to utilise a readily available, robust 2D object detector and to transfer information about objects from 2D to 3D, allowing us to train a 3D object detector without the need for any human annotation in 3D. We demonstrate that our method significantly outperforms previous 3DOD methods supervised by only 2D annotations, and that it narrows the accuracy gap between methods that use 3D supervision and those that do not.
Download here
Exploring the Landscape of Large Language Models in Medical Question Answering: Observations and Open Questions
Submitted to NEJM AI, 2024
Large Language Models (LLMs) have shown promise in medical question answering by achieving passing scores in standardised exams and have been suggested as tools for supporting healthcare workers. Deploying LLMs into such a high-risk context requires a clear understanding of the limitations of these models. With the rapid development and release of new LLMs, it is especially valuable to identify patterns which exist across models and may, therefore, continue to appear in newer versions. In this paper, we evaluate a wide range of popular LLMs on their knowledge of medical questions in order to better understand their properties as a group. From this comparison, we provide preliminary observations and raise open questions for further research.
Download here
Instruction Tuning for Large Language Models: The Impact of Human-Inspired Learning Strategies
To be submitted to COLM, 2024
Teaching
Artificial Intelligence
Teaching Assistant, University of Oxford, Department of Computer Science, 2019
This course was offered to undergraduates and MSc students in computer science. It covered the following topics:
- Introduction to AI
- Search
- Games
- Constraint Satisfaction Problems
- Machine Learning
- Neural Networks
Data Structures and Algorithms
Teaching Assistant, University of Oxford, Department of Computer Science, 2019
This course was offered to undergraduates in computer science. It covered the following topics:
- Introduction to Data Structures
- Arrays and Linked Lists
- Stacks and Queues
- Trees
- Graphs
- Sorting and Searching
- Algorithm Analysis
- Recursion
- Dynamic Programming
- Greedy Algorithms
- Divide and Conquer
- Backtracking
- Branch and Bound
Functional Programming
Teaching Assistant, University of Oxford, Department of Computer Science, 2019
This course was offered to undergraduates in computer science. It covered the following topics:
- Introduction to Functional Programming
- Haskell
- Scheme
Computer Vision and Machine Learning
Teaching Assistant, University of Oxford, Department of Engineering Science and Department of Computer Science, 2019
This course was offered to DPhil (PhD) students in computer science and engineering science as part of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems. It covered the following topics:
- Introduction to Computer Vision
- Image Formation and Camera Models
- Image Processing
- Feature Detection and Matching
- Image Segmentation
- Object Recognition
- Object Detection
- Object Tracking
- 3D Reconstruction
- Deep Learning for Computer Vision
- Visual SLAM
Advanced Language Modelling Methods
Teaching Assistant, University of Oxford, Oxford Internet Institute, 2024
This course is offered to MSc and DPhil students in computer science and social sciences. It covers the following topics:
- Introduction to Language Modelling
- N-gram Language Models
- Neural Language Models
- Transformer Models
- Attention Mechanism
- Self-attention Mechanism
- Positional Encoding
- Multi-head Attention
- Masked Self-attention
- Encoder-Decoder Architecture
- BERT and GPT
- Training Language Models
- Fine-tuning Language Models
- Language Model Evaluation
- Applications of Language Models