Our Research

Vision-Based Aerial Object Detection and Classification

There has been a proliferation of Unmanned Aerial Systems (UAS) applied across a multitude of areas, including agriculture, delivery, logistics, disaster relief, and surveillance. One can imagine that in the near future, as depicted in some science fiction movies, swarms of UASs will operate at high density over well-populated areas such as urban environments. As such, safe and reliable UAS operation is crucial for public safety. Collisions of UASs over areas occupied by people are of primary concern. A variety of scenarios may result in such catastrophes, including collisions with manned aircraft, other UASs, birds, other airborne objects, tall buildings, power lines, and natural terrain. A UAS therefore has to detect these objects accurately and in a timely manner so that the onboard navigation and control system can steer it away from any potential collision. In other words, for a drone to fly safely, it must perceive nearby objects, both dynamic and stationary, and estimate their future states; these state estimates are then fed to the onboard navigation and control system to compute a safe flight trajectory.

Our research is to estimate the limits and failure modes of computer vision, in both current and future systems, so that appropriate FAA guidelines can be established for successful UAS operations over populated areas. Specifically, we review relevant computer vision publications and also develop in-house vision algorithms to accurately estimate current and future limitations of UAS computer vision performance.
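As a concrete illustration of the "estimate future states" step, the sketch below propagates a detected object's track under a constant-velocity assumption. It is a minimal NumPy example; the time step, numbers, and the function itself are illustrative placeholders rather than our actual onboard pipeline.

```python
# Minimal sketch of constant-velocity state prediction for a detected object.
# Illustrative only; a real system would fuse detections with an onboard tracker.
import numpy as np

dt = 0.1                                   # prediction interval (s), illustrative
F = np.array([[1, 0, dt, 0],               # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

def predict_track(state, steps):
    """Propagate a constant-velocity track `steps` intervals ahead."""
    out = []
    for _ in range(steps):
        state = F @ state
        out.append(state[:2].copy())       # predicted (x, y) positions
    return np.array(out)

# Example: a bird at (10 m, 5 m) moving at (-2, 0.5) m/s, 1 s lookahead.
print(predict_track(np.array([10.0, 5.0, -2.0, 0.5]), steps=10))
```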

We have also introduced a dataset for long-range drone detection, the Long-Range Drone Detection Dataset (LRDD). You can access it here.


3D Scene Understanding

3D Reconstruction with Minimum Views via Reinforcement Learning

Recovering the 3D shape of an object from a single image or multiple images with deep neural networks has attracted increasing attention. This study explores how to let a machine decide which views it needs for reconstruction. In this way, a machine can learn the features of an object from fewer images, and we are also informed of the angles from which images should be provided for better reconstruction.
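To make the view-selection idea concrete, here is a minimal policy-gradient sketch, assuming PyTorch. The discrete view set, the budget, and the reconstruct_and_score stub are illustrative assumptions standing in for a learned reconstructor; this is not the study's actual setup.

```python
# Sketch: an agent learns which views to request for reconstruction.
import torch
import torch.nn as nn

N_VIEWS = 12          # discrete candidate camera angles around the object
BUDGET = 3            # maximum number of views the agent may request

policy = nn.Sequential(nn.Linear(N_VIEWS, 64), nn.ReLU(), nn.Linear(64, N_VIEWS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reconstruct_and_score(selected):
    """Hypothetical stub: reconstruct from the chosen views, return a quality
    score (e.g., IoU against ground truth during training)."""
    return torch.rand(())                     # placeholder score

for episode in range(1000):
    chosen = torch.zeros(N_VIEWS)             # state: mask of views taken so far
    prev_score = torch.tensor(0.0)
    loss = torch.tensor(0.0)
    for _ in range(BUDGET):
        logits = policy(chosen).masked_fill(chosen.bool(), -1e9)
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()                     # pick the next view to acquire
        chosen = chosen.clone(); chosen[a] = 1.0
        score = reconstruct_and_score(chosen)
        reward = score - prev_score           # marginal reconstruction gain
        loss = loss - dist.log_prob(a) * reward.detach()
        prev_score = score
    opt.zero_grad(); loss.backward(); opt.step()
```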

Coherent 3D Scene Reconstruction with Multiple Objects

We propose to reconstruct a 3D scene containing multiple objects from only a single-view image or a limited number of images. This project aims to provide a holistic description of the physical state of a scene and ultimately to guide robotic manipulators in optimized path planning or active interaction with the objects in the scene.

Ray-induced Geometry Regression for High-Fidelity 3D Character Reconstruction

The proposed framework combines the strengths of volumetric representations with tri-directional implicit function-based regression. For each query point sampled continuously in space, three orthogonal rays are cast to extract geometrically aligned voxel features, from which the dense surface of a character is inferred. This yields a coarse reconstruction that generalizes well to unseen shapes and poses, which is then refined using high-resolution features to capture local details.
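The sketch below conveys the flavor of this approach using tri-plane-style feature sampling feeding an implicit-function MLP, assuming PyTorch. The layer sizes, plane names, and the exact sampling scheme are illustrative stand-ins; they do not reproduce the framework's actual ray casting or refinement stages.

```python
# Sketch: sample features from three orthogonal planes for each 3D query
# point and regress an occupancy logit with an MLP. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriDirectionalImplicitFn(nn.Module):
    def __init__(self, feat_ch=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * feat_ch + 3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),                    # occupancy logit per point
        )

    def sample(self, plane, uv):
        # plane: (B, C, H, W); uv: (B, N, 2) in [-1, 1].
        feat = F.grid_sample(plane, uv.unsqueeze(1), align_corners=True)
        return feat.squeeze(2).permute(0, 2, 1)   # (B, N, C)

    def forward(self, planes, pts):
        # planes: dict of three (B, C, H, W) maps; pts: (B, N, 3) in [-1, 1].
        f_xy = self.sample(planes["xy"], pts[..., [0, 1]])
        f_yz = self.sample(planes["yz"], pts[..., [1, 2]])
        f_xz = self.sample(planes["xz"], pts[..., [0, 2]])
        return self.mlp(torch.cat([f_xy, f_yz, f_xz, pts], dim=-1))
```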


Acoustic Scene Understanding

Underwater Target Detection

There is an abundance of acoustic signals both above and under water, generated by animals, vessels, and natural phenomena. Underwater acoustics has been studied for a variety of purposes, including surveillance, communications, biological monitoring, and environmental research. iMaPLe focuses on exploiting underwater sound to detect and classify various sound sources by exploring and developing new concepts and methodologies in machine learning.

One of the projects at iMaPLe is to understand the nature of these acoustic signals and to develop deep learning-based architectures that detect and classify different underwater sounds, including those from dolphins, seagulls, sharks, vessels, ships, and other underwater sources.

Different acoustic pre-processing techniques are explored, such as Mel-frequency cepstral coefficients (MFCC), the constant-Q transform (CQT), Mel spectrograms, and oscillograms. Various deep learning architectures are then employed to classify an acoustic source and its location, including, but not limited to, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), gated recurrent units (GRUs), and attention-based Transformer networks.
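As a minimal sketch of such a pipeline, the code below computes a log-Mel spectrogram front end and feeds it to a small CNN classifier, assuming librosa and PyTorch. The class names and hyperparameters are illustrative and do not describe our deployed models.

```python
# Sketch: log-Mel spectrogram features feeding a small CNN classifier.
import librosa
import numpy as np
import torch
import torch.nn as nn

def mel_features(path, sr=16000, n_mels=64):
    """Load a clip and return a log-Mel spectrogram as a (1, n_mels, T) tensor."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return torch.from_numpy(log_mel).float().unsqueeze(0)

class AudioCNN(nn.Module):
    """Small CNN over log-Mel features; global pooling handles variable length."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),           # collapse time/frequency axes
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, 1, n_mels, T)
        z = self.features(x).flatten(1)
        return self.classifier(z)              # class logits

# Example: logits = AudioCNN(n_classes=5)(mel_features("clip.wav").unsqueeze(0))
```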


Visual Scene Understanding

Learning Scene Context

Teaching machines scene contextual knowledge would enable them to interact more effectively with their environment. Although deep learning-based computer vision methods have made great progress over the last decade, they still have difficulty learning scene context. With scene contextual knowledge, humans interact with an environment seamlessly even when not all objects are visible. In this project, we are exploring and developing various concepts to teach a machine scene context so that it can anticipate and effectively interact with unseen objects.