Research
Computer vision finds itself at an exciting stage of its development.
Many of its areas are approaching sufficiently high performance
levels to become useful for real-world applications, and numerous
interesting connections are opening up to related fields such as
machine learning, graphics, and mobile robotics. Moreover, rather than
focusing on a single area and advancing it step by step, it is now
possible for the first time to combine many different vision
capabilities and explore the benefits that can be reaped through
their close integration.
Our research aims exactly at this interface. The central theme of our work is the connection of different areas of computer vision and graphics into so-called "cognitive loops", collaborative feedback cycles in which multiple vision modalities mutually support each other in order to solve a larger task than any of them could solve on its own. Object recognition plays a key role in this integration, since it can deliver a semantic interpretation of the image content, which considerably simplifies other tasks such as segmentation, 3D reconstruction, and tracking. In some cases, such connections are even required to make complex applications possible in the first place. In return, those other vision capabilities deliver additional information which in turn constrains and improves the recognition results.
The main application areas of our work are vision services for mobile devices and for robotic or automotive platforms. Imagine being able to use your cell phone's camera as an interface to the real world, pointing it at objects and buildings of interest in your surroundings and directly getting back target-specific information from internet sources without having to type a query. Or imagine your car being able to sense other traffic participants in its surroundings, improving collision safety or even taking over the driving task entirely. In our group, we are developing core technologies for such applications. Concretely, we have already taken the following important steps towards this goal.
-
Interleaved Object Detection and Segmentation.
We have developed a local-feature based recognition approach which considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. The resulting Implicit Shape Model (ISM) algorithm can learn the characteristics of a new object category from a relatively small number of training examples, recognize previously unseen instances of that category in novel test images, and automatically segment them from the background, even under scale changes and partial occlusion. In more recent work, we improved this approach through numerous extensions for multi-cue integration, recognition from multiple viewpoints, and multi-category discrimination.
-
Pedestrian Detection in Crowded Scenes.
Pedestrian detection has become a key technology for many applications and is therefore attracting great commercial interest. In our research, we have developed a state-of-the-art pedestrian detector based on the ISM representation that is particularly suitable for detection in crowded scenes and under strong occlusion.
-
Coupled Detection and Tracking.
We have developed a novel approach for multi-object tracking that connects object detection and spacetime trajectory estimation in a coupled optimization framework motivated by MDL model selection. Building on the output of an object detector, our approach searches at each time instant for the optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far, while satisfying the physical constraint that no two objects may occupy the same space at the same time. The resulting approach can initialize automatically and track a large and varying number of objects through complex scenes with clutter, occlusions, and large-scale background changes. In addition, the model selection framework allows our approach to retrospectively adapt the data association and thus recover from mismatches and temporarily lost tracks.
-
Object Detection and Tracking for Mobile Applications.
Combining the above capabilities, we have developed a mobile vision system for localizing other traffic participants (cars, pedestrians, bicyclists) in a vehicle’s field-of-view and for tracking them over time. Particular emphasis went into making the system robust for operation in highly dynamic inner-city environments, such as a busy pedestrian zone. Mounted onto a child stroller or robotic platform, our system can reliably detect and track a large number of interacting pedestrians for dynamic obstacle avoidance. Mounted on a car, the system could one day be used in driver assistance systems for car safety.
-
Combining Recognition and Reconstruction for 3D City Modeling.
In order to support the large-scale reconstruction of entire city areas for visualization purposes, we developed methods for exact localization of parked and moving cars from recorded video streams of a survey vehicle. Such objects pose a problem for automatic 3D city modeling, since they are difficult to reconstruct due to dynamic motion, specularities on their surfaces, and partial occlusion. In addition, they defy the simplifying geometry assumptions that can otherwise be used to speed up large-scale reconstruction. Through the combination with object recognition and 3D pose estimation, our approach could automatically remove the disturbing objects from the reconstruction and replace them with virtual placeholder models, leading to a considerable reduction in the number of visible reconstruction artifacts and thus a visually more pleasing final model.
-
Large-Scale Mining of Landmark Buildings from Community Photo Collections.
One of our target applications is visual search from mobile phones. A user sends pictures from a mobile phone camera to an automatic recognition server in order to obtain additional information about objects and buildings in the surroundings. For such an application to become practical, it is important to tap the vast amount of publicly available data from internet sources and mine it for usable content. Our approach mines community photo collections (such as Flickr) for geotagged photos of entire cities at a time, extracts landmark buildings and places through visual recognition techniques, and automatically links them to corresponding Wikipedia pages. The resulting annotated clusters then serve as a basis for automatically geolocating additional, novel images.
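As a rough illustration of how geotagged photos might be pre-grouped before any visual recognition step, the sketch below buckets photos into geographic grid cells and keeps only the densely photographed cells as candidates for landmark clusters. The grid-based grouping, the cell size, the photo threshold, and the names `tile_key` and `candidate_tiles` are all assumptions made for this example and are not taken from the system described above. The visual clustering and the Wikipedia linking are not covered.

```python
import math
from collections import defaultdict


def tile_key(lat, lon, tile_deg=0.002):
    """Map a coordinate to an integer grid cell.

    tile_deg is a hypothetical cell size in degrees, chosen for illustration.
    """
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def candidate_tiles(photos, tile_deg=0.002, min_photos=3):
    """Group (photo_id, lat, lon) records by grid cell.

    Only cells holding at least min_photos photos are returned, on the
    assumption that landmarks attract many photos taken close together.
    """
    tiles = defaultdict(list)
    for photo_id, lat, lon in photos:
        tiles[tile_key(lat, lon, tile_deg)].append(photo_id)
    return {key: ids for key, ids in tiles.items() if len(ids) >= min_photos}


# Made-up sample records: three photos taken close together, one far away.
photos = [
    ("a", 50.7753, 6.0839),
    ("b", 50.7754, 6.0838),
    ("c", 50.7755, 6.0837),
    ("d", 50.7800, 6.1000),
]
dense = candidate_tiles(photos)
```

In this toy input, the three nearby photos fall into the same cell and form the only candidate, while the isolated photo is discarded. A fixed grid splits landmarks that straddle a cell border, so a practical variant would likely use overlapping cells or a proper spatial clustering method instead.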
In addition, we have made important contributions in the following areas.
- Fast 3D Scanning and Surface Registration.
- Efficient Feature Mining, Clustering, and Matching for Object Recognition.
- Object Categorization.
- Object Recognition from Range Images.

