Object tracking, re-identification and registration Flashcards
How does MDNet work?
Trains shared network layers across all training sequences, together with a separate final layer for each training sequence. Through this, the shared layers learn a generic representation of tracked objects. During testing, the sequence-specific final layers are discarded, and a new final layer is initialized and trained online on the test sequence.
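The multi-domain idea can be illustrated with a toy NumPy sketch. It assumes a tiny network (one shared tanh layer plus one logistic "target vs. background" head per training sequence) and random data with made-up label rules. It is not MDNet's actual convolutional architecture. The online phase is also simplified: only the fresh head is trained, while the shared layer stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): 3 training sequences ("domains"),
# 8-dim input patches, a 4-unit shared layer.
N_DOMAINS, D_IN, D_HID = 3, 8, 4
params = {
    "shared": rng.normal(scale=0.1, size=(D_IN, D_HID)),
    # One binary target-vs-background head per training sequence.
    "heads": [rng.normal(scale=0.1, size=D_HID) for _ in range(N_DOMAINS)],
}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, head):
    h = np.tanh(x @ params["shared"])  # shared layers
    return h, sigmoid(h @ head)        # sequence-specific target score


def bce(p, y):
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_step(x, y, k, lr=0.1):
    """One SGD step on a batch from sequence k: updates the shared layer and head k only."""
    head = params["heads"][k]
    h, p = forward(x, head)
    g = (p - y) / len(y)                         # d(BCE)/d(logit)
    grad_head = h.T @ g
    grad_pre = np.outer(g, head) * (1 - h ** 2)  # backprop through tanh
    params["shared"] -= lr * (x.T @ grad_pre)
    head -= lr * grad_head                       # in place: updates params["heads"][k]


# Offline multi-domain training: cycle through the sequences (toy random data).
for it in range(300):
    k = it % N_DOMAINS
    x = rng.normal(size=(16, D_IN))
    y = (x[:, k] > 0).astype(float)  # toy label rule, different per sequence
    train_step(x, y, k)

# Test time: discard the per-sequence heads and train a fresh head online.
# Simplification: only the new head is updated; the shared layer stays frozen.
new_head = rng.normal(scale=0.1, size=D_HID)
x0 = rng.normal(size=(64, D_IN))
y0 = (x0[:, 0] + x0[:, 1] > 0).astype(float)
loss_before = bce(forward(x0, new_head)[1], y0)
for _ in range(200):
    h, p = forward(x0, new_head)
    new_head -= 0.5 * (h.T @ ((p - y0) / len(y0)))
loss_after = bce(forward(x0, new_head)[1], y0)
```

Each batch touches the shared layer and exactly one head. That is what pushes sequence-specific knowledge into the heads and generic knowledge into the shared layers.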
How does ADNet work?
Predicts a sequence of actions (e.g. shift or rescale the box) that moves the bounding box step by step until it tightly encloses the tracked object.
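One step of this process can be sketched as a function that applies a discrete action to a box. The action names and the relative step size `ALPHA = 0.03` are assumptions chosen for illustration. The box format `(cx, cy, w, h)` is also an assumption.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)

ALPHA = 0.03  # assumed step size, relative to the current box size

# Hypothetical discrete action set.
ACTIONS = ["left", "right", "up", "down", "scale_up", "scale_down", "stop"]


def apply_action(box: Box, action: str) -> Box:
    """Return the box after one action; 'stop' leaves it unchanged."""
    cx, cy, w, h = box
    dx, dy = ALPHA * w, ALPHA * h  # step scales with the box, so it works at any size
    if action == "left":
        return (cx - dx, cy, w, h)
    if action == "right":
        return (cx + dx, cy, w, h)
    if action == "up":
        return (cx, cy - dy, w, h)  # image y axis points down
    if action == "down":
        return (cx, cy + dy, w, h)
    if action == "scale_up":
        return (cx, cy, w * (1 + ALPHA), h * (1 + ALPHA))
    if action == "scale_down":
        return (cx, cy, w * (1 - ALPHA), h * (1 - ALPHA))
    if action == "stop":
        return box
    raise ValueError(f"unknown action: {action}")
```

A tracker built on this would call `apply_action` repeatedly with the network's predicted action until it outputs `"stop"`.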
What is the difference between ADNet and MDNet?
ADNet: Predicts actions that nudge the bounding box into the correct position.
MDNet: Samples N candidate bounding boxes around the previous target position, scores each one, and picks the highest-scoring candidate.
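The MDNet-style "sample and score" step can be sketched as follows. Several details are assumptions: Gaussian sampling of candidate centers, `n=256`, `sigma=0.3` relative to the box size, and the fixed box size. `toy_score` is a hypothetical stand-in for the network's target score.

```python
import numpy as np


def sample_candidates(center, size, n=256, sigma=0.3, rng=None):
    """Draw n candidate centers from a Gaussian around the previous center.
    The spread is sigma times the box width/height (assumed values)."""
    rng = np.random.default_rng() if rng is None else rng
    w, h = size
    offsets = rng.normal(scale=sigma, size=(n, 2)) * np.array([w, h])
    return np.asarray(center, dtype=float) + offsets


def track_step(prev_center, size, score_fn, n=256, rng=None):
    """Score every candidate and keep the best one as the new target position."""
    candidates = sample_candidates(prev_center, size, n=n, rng=rng)
    scores = np.array([score_fn(c) for c in candidates])
    return candidates[np.argmax(scores)]


# Toy usage: a stand-in score that peaks at a hidden "true" center (105, 52).
true_center = np.array([105.0, 52.0])
toy_score = lambda c: -np.linalg.norm(c - true_center)
new_center = track_step((100.0, 50.0), (40.0, 20.0), toy_score,
                        rng=np.random.default_rng(0))
```

This highlights the contrast with ADNet: MDNet spends N network evaluations per frame on candidates, while ADNet spends one evaluation per action step.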
What are some of the challenges in object tracking? Mention an overall idea of how to address them.
If there is only one instance of the object's class in the scene, the tracker can rely on strong class-level features. If there are several instances of the same class (e.g. two faces), the tracker can easily drift and start following the other instance. This can be handled by forcing the network to learn more specific features of the particular object being tracked.
What does it mean to do tracking by learning transitions?
It is essentially reinforcement learning. The input (the state) is the cropped image region inside the current bounding box. The output is the action to take to “nudge” the bounding box towards the position where it captures the object you want to track.
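The reinforcement-learning framing can be sketched as an episode loop plus a reward. Several pieces are assumptions for illustration: the terminal reward (+1 if the final box overlaps the ground truth with IoU above 0.7, otherwise -1), the `(x, y, w, h)` box format with a top-left corner, and the hypothetical `policy` and `step` callables.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes with top-left corners."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def terminal_reward(final_box, gt_box, thresh=0.7):
    """Assumed reward: +1 if the final box matches the ground truth well enough, else -1."""
    return 1.0 if iou(final_box, gt_box) > thresh else -1.0


def run_episode(box, gt_box, policy, step, max_steps=20):
    """Query the policy on the current state until it says 'stop' (or max_steps)."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(box)
        trajectory.append((box, action))
        if action == "stop":
            break
        box = step(box, action)
    return trajectory, terminal_reward(box, gt_box)


# Toy usage: a hand-written policy that moves right until it reaches x = 10.
toy_policy = lambda b: "right" if b[0] < 10 else "stop"
toy_step = lambda b, a: (b[0] + 1, b[1], b[2], b[3])
traj, reward = run_episode((0, 0, 10, 10), (10, 0, 10, 10), toy_policy, toy_step)
```

In a real tracker, `policy` would be the network that reads the crop inside `box`. The reward would then be used to update that network.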
Describe the training process of a tracking network that learns transitions.
The training data consists of state-action pairs. You start with the object to track in an image. Then you offset the cropped region, which generates a training sample made of the offset crop (the state) and the inverse offset (the action).
It is also possible to train online: once you mark the target to track, you can generate training data (~300 samples) and fine-tune the network on the specific object you want to track.
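The data generation can be sketched in NumPy as follows. `make_transition_samples` is a hypothetical helper. The continuous `(dx, dy)` action and the maximum shift of 20% of the box size are assumptions. A network with discrete actions would quantize the inverse offset. The default `n=300` matches the rough sample count for online training.

```python
import numpy as np


def make_transition_samples(image, box, n=300, max_shift=0.2, rng=None):
    """Return n (crop, action) pairs: crop = box shifted by (dx, dy), action = (-dx, -dy).
    box is (x, y, w, h) in integer pixels, with (x, y) the top-left corner."""
    rng = np.random.default_rng() if rng is None else rng
    x, y, w, h = box
    img_h, img_w = image.shape[:2]
    sx, sy = int(max_shift * w), int(max_shift * h)
    samples = []
    while len(samples) < n:
        dx = int(rng.integers(-sx, sx + 1))
        dy = int(rng.integers(-sy, sy + 1))
        nx, ny = x + dx, y + dy
        if nx < 0 or ny < 0 or nx + w > img_w or ny + h > img_h:
            continue  # skip shifts that would leave the image
        crop = image[ny:ny + h, nx:nx + w]
        # The label is the move that brings the shifted crop back onto the object.
        samples.append((crop, (-dx, -dy)))
    return samples


# Toy usage: pixel values encode their own position, so each sample can be checked.
image = np.arange(100 * 100).reshape(100, 100)
samples = make_transition_samples(image, (40, 40, 20, 20), rng=np.random.default_rng(0))
```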
Describe the network architecture of MDNet.
A shared network, except for a final layer that is specific to each training sequence. During training, multiple final layers are trained, one per sequence. Then, during online training, a new final layer is initialized and trained on the “real” data from the test sequence.
Describe two ways of implementing attention in tracking-based applications.
Reciprocative Learning: Puts high importance on features inside the tracking box.
VITAL: Removes the most prominent features in the bounding box, forcing the network to learn less “important”, less general, but more target-specific features.
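The "remove the most prominent features" idea can be sketched as masking the strongest spatial positions of a feature map. VITAL itself learns its masks adversarially, so this is a simplified stand-in, not VITAL's method. Here the mask is a fixed top-k heuristic over a `(C, H, W)` feature map. `drop_most_prominent` is a hypothetical name.

```python
import numpy as np


def drop_most_prominent(features, k):
    """Zero out the k spatial positions with the highest summed activation.
    features: array of shape (C, H, W). Returns (masked features, (H, W) mask)."""
    c, h, w = features.shape
    if k <= 0:
        return features.copy(), np.ones((h, w))
    energy = features.sum(axis=0).ravel()   # activation strength per position
    top = np.argpartition(energy, -k)[-k:]  # indices of the k strongest positions
    mask = np.ones(h * w)
    mask[top] = 0.0
    mask = mask.reshape(h, w)
    # The classifier trained on these masked features cannot rely on the
    # dominant cues, so it must use the remaining, more target-specific ones.
    return features * mask[None, :, :], mask


# Toy usage: two "dominant" positions are removed, everything else is kept.
feats = np.zeros((2, 3, 3))
feats[:, 1, 1] = 5.0
feats[:, 0, 2] = 3.0
masked, mask = drop_most_prominent(feats, k=2)
```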
Explain the main idea behind SiamFC.
A Siamese network (the same network, with shared weights) extracts a representation of the tracking target and a representation of a larger search region in the next frame, evaluated at several scales. The target representation is cross-correlated with the search-region representation, giving a map of matching scores whose peak indicates the new target position.
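The comparison step is a cross-correlation, which can be sketched with explicit loops in NumPy. The sketch assumes both feature maps were already produced by the shared network and have shapes `(C, h, w)` for the target and `(C, H, W)` for the search region. A real implementation would use a batched convolution and repeat this for each scale.

```python
import numpy as np


def cross_correlate(z, x):
    """Slide target features z (C, h, w) over search features x (C, H, W).
    Each output cell is the inner product of z with the window at that offset."""
    _, h, w = z.shape
    _, big_h, big_w = x.shape
    out = np.empty((big_h - h + 1, big_w - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out


# Toy usage: paste the target features into an empty search map at (5, 7).
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3, 3))
x = np.zeros((4, 16, 16))
x[:, 5:8, 7:10] = z
response = cross_correlate(z, x)
peak = np.unravel_index(np.argmax(response), response.shape)
```

Here the peak lands exactly where the target was pasted. By the Cauchy-Schwarz inequality, any other window that only partially overlaps the pasted block scores strictly lower.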