Vision-Based Aerial Object Detection and Classification
There has been a proliferation of Unmanned Aerial Systems (UAS) applied across a multitude of areas, including agriculture, delivery, logistics, disaster relief, and surveillance. In the near future, as depicted in some science fiction movies, one can imagine swarms of UASs operating at high density over well-populated areas, such as urban environments. As such, safe and reliable UAS operation is crucial for public safety. Collisions of UASs over areas occupied by people are of primary concern here. A variety of scenarios may result in such catastrophes, including collisions with manned aircraft, other UASs, birds, other airborne objects, tall buildings, power lines, and natural terrain. Therefore, a UAS must detect these objects accurately and in a timely manner so that the onboard navigation and control system can steer it away from any potential collision. To fly safely, a drone must perceive nearby objects, both dynamic and stationary, and estimate their future states to avoid collisions along its flight path. These state estimates are then fed to the onboard navigation and control system to compute a safe flight trajectory. Our research aims to characterize the limits and failure modes of computer vision, in both current and future systems, so that appropriate FAA guidelines can be established for successful UAS operations over populated areas. Specifically, we review relevant computer vision publications and develop in-house vision algorithms to accurately estimate the current and future performance limitations of UAS computer vision.
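To make this perceive-and-predict pipeline concrete, the following is a minimal sketch, assuming a pretrained YOLOv8 detector from the ultralytics package and a crude constant-velocity motion model; the model choice, file names, and thresholds are placeholders, not our in-house algorithms.

```python
# Minimal perceive-and-predict sketch (illustrative only, not our in-house system).
# Assumes the `ultralytics` package and a pretrained YOLOv8 model; paths are placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained detector, placeholder choice

cap = cv2.VideoCapture("flight_video.mp4")  # placeholder input video
prev_centers = {}  # track id -> last box center, for a crude constant-velocity estimate

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # The built-in tracker assigns persistent IDs, so positions can be differenced over time.
    results = model.track(frame, persist=True, conf=0.25, verbose=False)
    boxes = results[0].boxes
    if boxes.id is None:
        continue
    for box, tid in zip(boxes.xyxy.cpu().numpy(), boxes.id.int().cpu().tolist()):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        if tid in prev_centers:
            vx, vy = cx - prev_centers[tid][0], cy - prev_centers[tid][1]
            # Constant-velocity prediction of the object's pixel position in the next frame;
            # a real system would filter (e.g., Kalman) in 3D world coordinates instead.
            print(f"object {tid}: predicted next center ({cx + vx:.1f}, {cy + vy:.1f})")
        prev_centers[tid] = (cx, cy)
cap.release()
```

In a deployed system, these per-object predictions would feed the navigation and control stack rather than being printed.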


We have also introduced two versions of the Long-Range Drone Detection (LRDD) dataset for long-distance UAV detection, and we are actively expanding it with the goal of creating one of the largest and most diverse drone detection datasets in the world. The dataset can be accessed here.
Scene State Modeling
Automated mobile systems such as robots and drones need to maintain accurate models of the state of their surroundings, including object locations, distances, and motion, in order to avoid collisions and navigate effectively. To this end, we are developing methods to improve Monocular Depth Estimation (the prediction of per-pixel distances from single camera images) through synthetic data creation, as well as methods to predict object motion through dynamic map states, using computer vision techniques to process LiDAR sensor data.
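As a point of reference for monocular depth estimation, below is a minimal sketch using the publicly available MiDaS model via torch.hub; this is an off-the-shelf baseline, not our synthetic-data-trained method, and the image path is a placeholder.

```python
# Monocular depth estimation sketch using the public MiDaS model via torch.hub
# (illustrative baseline, not our method; the input image path is a placeholder).
import cv2
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
batch = midas_transforms.small_transform(img).to(device)  # resize/normalize for MiDaS_small

with torch.no_grad():
    pred = midas(batch)
    # Upsample the prediction back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()

print(depth.shape)  # per-pixel relative (inverse) depth map
```

Note that MiDaS outputs relative inverse depth; recovering metric distances requires additional calibration or scale cues.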


Acoustic Scene Understanding
Underwater Target Detection
Abundant acoustic signals, both above and below the water, are generated by animals, vessels, and natural phenomena. Underwater acoustics has been studied for a variety of purposes, including surveillance, communications, biological monitoring, and environmental research. iMaPLe focuses on exploiting underwater sound to detect and classify various sound sources by exploring and developing new concepts and methodologies in machine learning.


One of the projects at iMaPLe is to understand the nature of acoustic signals and to develop deep learning-based architectures that detect and classify different underwater sounds, which may include dolphins, seagulls, sharks, vessels, ships, and other underwater sound sources.

We explore different acoustic pre-processing techniques, such as Mel-Frequency Cepstral Coefficients (MFCC), Constant-Q Transforms (CQT), Mel spectrograms, and oscillograms. Various deep learning architectures are then employed to classify an acoustic source and its location, including, but not limited to, Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and attention-based Transformer networks.
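The front-ends named above can all be computed with librosa; the sketch below shows each one on a single clip, with a hypothetical file name and placeholder parameter choices rather than our exact pipeline.

```python
# Sketch of the acoustic front-ends named above, using librosa
# (hypothetical file name and placeholder parameters; not our exact pipeline).
import librosa
import numpy as np

y, sr = librosa.load("hydrophone_clip.wav", sr=None)  # placeholder recording

# Mel-Frequency Cepstral Coefficients: shape (n_mfcc, frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Constant-Q Transform magnitude in dB: shape (bins, frames)
cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)

# Log-Mel spectrogram: shape (n_mels, frames)
mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64), ref=np.max
)

# Any of these 2-D time-frequency "images" can be fed to a CNN, or their frame
# sequences to an LSTM/GRU/Transformer, for source classification.
print(mfcc.shape, cqt.shape, mel.shape)
```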
Visual Scene Understanding
Learning Scene Context
Teaching machines scene contextual knowledge would enable them to interact more effectively with their environment. Although deep learning-based computer vision methods have made great progress over the last decade, they still have difficulty learning scene context. With scene contextual knowledge, humans interact with an environment seamlessly even when not all objects are visible. In this project, we are exploring and developing various concepts to teach a machine scene context so that it can anticipate and effectively interact with unseen objects.
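One toy formulation of this idea, sketched below under our own simplifying assumptions (it is not one of the concepts under development here), is to mask out an object's pixels and train a network to predict the hidden object's category from the surrounding context alone.

```python
# Toy illustration of context-based anticipation (a sketch, not our method):
# hide an object's pixels and predict its category from the surrounding scene.
import torch
import torch.nn as nn

class ContextClassifier(nn.Module):
    """Predicts the category of a masked-out object from its visual context."""
    def __init__(self, num_classes: int = 80):  # 80 is a placeholder (COCO-sized label set)
        super().__init__()
        # A 4th input channel carries the binary mask marking the hidden region.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        image = image * (1 - mask)           # hide the object's pixels
        x = torch.cat([image, mask], dim=1)  # tell the network where the hole is
        return self.head(self.backbone(x))

# Usage: image (B,3,H,W), mask (B,1,H,W) with 1s over the hidden object.
logits = ContextClassifier()(torch.rand(2, 3, 128, 128), torch.zeros(2, 1, 128, 128))
print(logits.shape)  # torch.Size([2, 80])
```

A network that succeeds at this task must rely on co-occurrence and layout cues in the scene, which is exactly the kind of contextual knowledge this project aims to instill.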