3D object identification with time-invariant features
The visual information contained in an image can be studied and analyzed with different approaches. Rather than a global image description, we chose a local approach, since recent research has shown that it yields a more compact representation of the image that remains robust even under major changes in object appearance.
We model 3D objects with a visual vocabulary whose words represent the most meaningful components of the object. The resulting description is complete and compact, and it can describe the object as seen from different points of view. The approach remains robust even when the object is in a highly cluttered scene and partially occluded.
Our modeling and matching method exploits temporal coherence both in training and at test time.
Our method describes objects, however complex, by means of local image structures. From this local information we derive object descriptors that are both characteristic and meaningful.
The content of an image sequence is redundant in both space and time. We therefore build compressed descriptions for recognition by extracting a collection of trains of features and discarding all other information. We call this collection the model, or visual vocabulary, of the sequence.
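The extraction of trains of features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy frame-to-frame linking, the thresholds `max_shift`, `max_desc_dist` and `min_length`, and the choice of summarizing each train by its mean descriptor are all assumptions made for illustration, as are the names `Feature`, `Train`, `build_trains` and `visual_vocabulary`.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Feature:
    # Hypothetical local feature: image position plus a descriptor vector.
    x: float
    y: float
    descriptor: tuple[float, ...]


@dataclass
class Train:
    # A train of features: the same local structure followed across frames.
    features: list[Feature] = field(default_factory=list)

    @property
    def last(self) -> Feature:
        return self.features[-1]

    def mean_descriptor(self) -> tuple[float, ...]:
        # Assumption: summarize the whole train by averaging its descriptors.
        n = len(self.features)
        dims = len(self.features[0].descriptor)
        return tuple(sum(f.descriptor[d] for f in self.features) / n for d in range(dims))


def build_trains(frames, max_shift=5.0, max_desc_dist=0.5):
    """Greedily link features of consecutive frames into trains (illustrative only)."""
    active: list[Train] = []
    finished: list[Train] = []
    for frame in frames:
        next_active: list[Train] = []
        unused = list(frame)
        for train in active:
            # Candidate continuations: close both in the image and in descriptor space.
            candidates = [
                f for f in unused
                if dist((f.x, f.y), (train.last.x, train.last.y)) <= max_shift
                and dist(f.descriptor, train.last.descriptor) <= max_desc_dist
            ]
            if candidates:
                best = min(candidates, key=lambda f: dist(f.descriptor, train.last.descriptor))
                train.features.append(best)
                unused.remove(best)
                next_active.append(train)
            else:
                finished.append(train)  # no continuation found: the train ends here
        # Features not linked to any existing train start new trains.
        next_active.extend(Train([f]) for f in unused)
        active = next_active
    finished.extend(active)
    return finished


def visual_vocabulary(frames, min_length=3, **kwargs):
    """Keep only temporally stable trains and return one descriptor per train."""
    return [
        t.mean_descriptor()
        for t in build_trains(frames, **kwargs)
        if len(t.features) >= min_length
    ]
```

Discarding short trains is one simple way to keep only features that persist over time; in this sketch the surviving mean descriptors play the role of the vocabulary words, and everything else in the sequence is thrown away.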
Since the 3D object of interest is described by an image sequence, the procedure can be sketched in the following steps:
We propose a two-step matching procedure that exploits the richness of our temporal features. The two steps are:
Recognition follows a one-class approach: the higher the number of matches between a test sequence and an object model, the more likely it is that the sequence contains that object.
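A minimal sketch of a one-class decision based on match counts follows. It assumes that a model is a list of descriptor vectors (for instance, the mean descriptors of its feature trains) and that a test descriptor matches when its nearest model word lies within a fixed Euclidean distance. The names `count_matches` and `contains_object` and the parameters `threshold` and `min_matches` are hypothetical; the matching criterion used in this work is the two-step procedure, not this single nearest-neighbour test.

```python
from math import dist


def count_matches(model, test_descriptors, threshold=0.3):
    """Count test descriptors whose nearest model word lies within `threshold` (hypothetical rule)."""
    if not model:
        return 0
    return sum(
        1 for d in test_descriptors
        if min(dist(d, w) for w in model) <= threshold
    )


def contains_object(model, test_descriptors, min_matches=5, threshold=0.3):
    """One-class decision: accept the sequence if this model alone collects enough matches."""
    return count_matches(model, test_descriptors, threshold) >= min_matches


# Usage: two of the three test descriptors fall close to a model word.
model = [(0.0, 0.0), (1.0, 1.0)]
test = [(0.1, 0.0), (0.9, 1.0), (5.0, 5.0)]
print(count_matches(model, test))                    # 2
print(contains_object(model, test, min_matches=2))   # True
```

Because each model is scored on its own in this sketch, a sequence can be accepted by several models at once (for example, when several objects appear together), and adding a new object only requires building its model.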
Here we present some of our experiments and the results obtained with our two-step matching procedure for 3D object recognition.
We test the system under simple changes in imaging conditions: illumination and scale variations, background clutter, and occlusions of the objects.
Circles: Dewey's features; squares: book; crosses: Winnie; X's: Goofy.
We test the system on sequences of objects placed in a real environment: the background is highly cluttered and several objects appear together.
As the number of trajectories decreases, the number of matches decreases as well. The following graph shows the number of matches per frame in a sequence in which the objects appear as in a tracking shot.
We also test how the system behaves as the number of objects grows, comparing recognition performance as the number of objects increases from 5 to 10 and finally to 20. Results are reported in the following video and tables.
Number of recognition experiments: 840. Precision: 59%; recall: 84%; specificity: 5%.
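For reference, precision, recall and specificity follow the standard definitions in terms of true and false positives and negatives (TP, FP, TN, FN). The small sketch below computes them; the function name `recognition_metrics` is hypothetical, and the counts in the usage lines are made up for illustration, not taken from the experiments above.

```python
def recognition_metrics(tp, fp, tn, fn):
    """Standard detection metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),     # fraction of accepted detections that are correct
        "recall": tp / (tp + fn),        # fraction of present objects that are detected
        "specificity": tn / (tn + fp),   # fraction of absent objects correctly rejected
    }


# Illustrative counts only.
m = recognition_metrics(tp=8, fp=2, tn=6, fn=4)
print(m["precision"], m["specificity"])  # 0.8 0.75
```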
People: E. Delponte, N. Noceti, F. Odone, A. Verri