11-12 3D Classification and Tracking Flashcards
Explain the Mean Shift Tracking Pipeline
- Initialization: the initial track has to be defined by the user
- Representation: a descriptor could be a hue histogram (convert RGB to HSV, then build a 180-bin hue histogram that counts the pixels with each hue value)
- Matching: histogram backprojection. Convert each new image to HSV; for each pixel, look up the value of its hue bin in the histogram and store that value for the pixel. This results in a likelihood map for the object in the input image.
- Shifting: now do standard mean shift. The individual particles are the pixel confidence values, and the mean shifts to the pixel position with the highest density around it.
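The pipeline above can be sketched in plain Python on a toy image. This is only an illustration under assumptions: the helper names (`hue_bin`, `hue_histogram`, `backproject`, `mean_shift`), the peak normalisation of the histogram, and the fixed rectangular window are all hypothetical choices, not taken from the lecture material.

```python
import colorsys
import math

def hue_bin(rgb, bins=180):
    """Map an (r, g, b) pixel with 0-255 channels to a hue bin index."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)  # hue in [0, 1)
    return min(int(h * bins), bins - 1)

def hue_histogram(pixels, bins=180):
    """Representation: hue histogram of the initial track, scaled so the peak is 1."""
    hist = [0.0] * bins
    for p in pixels:
        hist[hue_bin(p, bins)] += 1.0
    peak = max(hist)
    return [v / peak for v in hist] if peak > 0 else hist

def backproject(image, hist):
    """Matching: replace every pixel by the histogram value of its hue bin."""
    return [[hist[hue_bin(p, len(hist))] for p in row] for row in image]

def mean_shift(likelihood, window, max_iter=20):
    """Shifting: move the window (x, y, w, h) to the weighted centroid until it stops."""
    x, y, w, h = window
    rows, cols = len(likelihood), len(likelihood[0])
    for _ in range(max_iter):
        total = sx = sy = 0.0
        for j in range(y, min(y + h, rows)):
            for i in range(x, min(x + w, cols)):
                weight = likelihood[j][i]
                total += weight
                sx += weight * i
                sy += weight * j
        if total == 0:
            break  # no evidence for the object inside the window
        # Re-centre the window on the centroid (round half up, then clamp to the image).
        nx = math.floor(sx / total - (w - 1) / 2 + 0.5)
        ny = math.floor(sy / total - (h - 1) / 2 + 0.5)
        nx = max(0, min(nx, cols - w))
        ny = max(0, min(ny, rows - h))
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return (x, y, w, h)

# Toy 10x10 frame: blue background with a 3x3 red object at columns/rows 5..7.
BLUE, RED = (0, 0, 255), (255, 0, 0)
frame = [[RED if 5 <= i <= 7 and 5 <= j <= 7 else BLUE for i in range(10)]
         for j in range(10)]
model = hue_histogram([RED] * 9)                 # descriptor of the user-defined track
likelihood = backproject(frame, model)
tracked = mean_shift(likelihood, (3, 3, 3, 3))   # start window overlaps one object pixel
```

Note that the window size never changes in this sketch; only its position is updated.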
What are the limits of mean shift tracking?
Window size is not adapted to changes in scale and orientation
What does SORT stand for?
Simple Online and Realtime Tracking
How does the SORT tracker use an external detector?
The SORT algorithm itself is designed only to track objects; it has no detection component of its own and relies on an external detector instead. The detector finds all objects in each frame. In the article they use Faster R-CNN (Faster Region-based CNN).
How are bounding boxes parameterized and used in the Kalman filter (SORT)?
The bounding box parameters are the center coordinates $(u, v)$, the area (scale) $s$, and the aspect ratio $r$. The aspect ratio is assumed constant; for the other values, the velocity is estimated as well.
The state of each target is modeled as: $x = [u,v,s,r,\dot{u},\dot{v},\dot{s}]^T$
→ detect in one frame, and then predict for the next frame with the Kalman filter
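A minimal sketch of this parameterization, assuming boxes arrive as corner coordinates `[x1, y1, x2, y2]`. The function names are hypothetical, and `predict` only propagates the state mean with a constant-velocity model; a full Kalman filter would also propagate the covariance and correct the state with the matched detection.

```python
import math

def bbox_to_z(box):
    """Convert [x1, y1, x2, y2] to the measurement [u, v, s, r]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2, y1 + h / 2, w * h, w / h]  # center, area, aspect ratio

def z_to_bbox(z):
    """Convert [u, v, s, r] back to [x1, y1, x2, y2]."""
    u, v, s, r = z
    w = math.sqrt(s * r)   # s = w*h and r = w/h, so s*r = w^2
    h = s / w
    return [u - w / 2, v - h / 2, u + w / 2, v + h / 2]

def predict(state):
    """One constant-velocity step: u, v, s advance by their velocities, r stays fixed."""
    u, v, s, r, du, dv, ds = state
    return [u + du, v + dv, s + ds, r, du, dv, ds]

z = bbox_to_z([10, 20, 30, 60])            # [20.0, 40.0, 800, 0.5]
next_state = predict(z + [2.0, -1.0, 0.0])  # velocities appended to form the state
```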
How is the assignment problem formulated between the predictions and the detections? (SORT)
The assignment cost matrix is computed as the intersection-over-union (area of overlap/area of union) distance between each detection and all existing target predictions. The assignment is solved optimally using the Hungarian algorithm. A minimum IOU is set to reject assignments if no good pair is found.
→ it is computationally heavy, but it takes several thousand detections for this to become a problem, so it is rarely an issue in practice
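The assignment step can be illustrated as follows. Note the swap: instead of the Hungarian algorithm, this sketch brute-forces all pairings with `itertools.permutations`, which gives the same optimal result but is only feasible for a handful of boxes. The names `iou`, `associate`, and `iou_min`, and the threshold value 0.3, are assumptions made for illustration.

```python
from itertools import permutations

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(detections, predictions, iou_min=0.3):
    """Return (matches, unmatched_detections, unmatched_predictions) as index lists."""
    n_d, n_p = len(detections), len(predictions)
    scores = [[iou(d, p) for p in predictions] for d in detections]
    k = min(n_d, n_p)
    # Enumerate every one-to-one pairing of the smaller set into the larger one.
    if n_d <= n_p:
        candidates = ([(i, perm[i]) for i in range(k)]
                      for perm in permutations(range(n_p), k))
    else:
        candidates = ([(perm[j], j) for j in range(k)]
                      for perm in permutations(range(n_d), k))
    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(scores[i][j] for i, j in pairs)
        if total > best_total:               # maximise total IoU (= minimise IoU distance)
            best_total, best_pairs = total, pairs
    # Reject pairs whose overlap is below the minimum IoU.
    matches = sorted((i, j) for i, j in best_pairs if scores[i][j] >= iou_min)
    matched_d = {i for i, _ in matches}
    matched_p = {j for _, j in matches}
    unmatched_d = [i for i in range(n_d) if i not in matched_d]
    unmatched_p = [j for j in range(n_p) if j not in matched_p]
    return matches, unmatched_d, unmatched_p

predictions = [[0, 0, 10, 10], [20, 20, 30, 30]]
detections = [[21, 21, 31, 31], [1, 0, 11, 10], [50, 50, 60, 60]]
matches, new_dets, lost_tracks = associate(detections, predictions)
```

In practice the brute-force loop would be replaced by a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment` applied to the cost matrix.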
How are new tracks created and old ones deleted (SORT)?
If a new, unassigned detection has successfully been detected in a number of frames in a row, it becomes a new track
If a track can no longer be matched to a detection, it is deleted. (There can be a buffer for how many frames it may stay lost, but in the article they delete immediately because, among other reasons, it increases efficiency.)
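The creation and deletion rules can be sketched as a small bookkeeping step. Everything named here (`Track`, `step`, `min_hits`, `max_age`, and their default values) is hypothetical; `matches` is assumed to be a list of `(detection_index, track_index)` pairs coming from the assignment step.

```python
from dataclasses import dataclass

@dataclass
class Track:
    box: list
    hits: int = 1     # consecutive frames with an assigned detection
    misses: int = 0   # consecutive frames without one

def step(tracks, detections, matches, min_hits=3, max_age=0):
    """Update the track list for one frame; returns (all tracks, confirmed tracks)."""
    det_for_track = {t: d for d, t in matches}
    survivors = []
    for t_idx, track in enumerate(tracks):
        if t_idx in det_for_track:
            track.box = detections[det_for_track[t_idx]]
            track.hits += 1
            track.misses = 0
            survivors.append(track)
        else:
            track.hits = 0            # the streak of consecutive detections is broken
            track.misses += 1
            if track.misses <= max_age:   # max_age=0 deletes lost tracks immediately
                survivors.append(track)
    # Every unassigned detection starts a tentative track.
    matched_dets = {d for d, _ in matches}
    for d_idx, box in enumerate(detections):
        if d_idx not in matched_dets:
            survivors.append(Track(box=box))
    # A track only counts as a real track after min_hits frames in a row.
    confirmed = [t for t in survivors if t.hits >= min_hits]
    return survivors, confirmed
```

Setting `max_age` above 0 corresponds to the buffer of lost frames mentioned above.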
What is the difference between invariance and equivariance?
Invariance:
- if something is invariant it stays unchanged under a specific transformation
Equivariance:
- a little looser
- means something like “the output follows the disturbance applied to the input”
- so it does not stay unchanged, but follows the transformation in a similar manner
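A small illustration of the two notions, using translation of a 2D point set as the transformation. The choice of functions is an assumption made for this example: the extent (width and height) of the points is invariant to translation, while the centroid is equivariant to it.

```python
def translate(points, t):
    """Shift every (x, y) point by the offset t = (tx, ty)."""
    return [(x + t[0], y + t[1]) for x, y in points]

def extent(points):
    """Width and height of the point set: unchanged by translation (invariant)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs), max(ys) - min(ys))

def centroid(points):
    """Mean position: moves along with the translation (equivariant)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

points = [(0, 0), (4, 0), (4, 2), (0, 2)]
t = (10, -3)
moved = translate(points, t)

invariant_ok = extent(moved) == extent(points)                          # f(T(x)) == f(x)
equivariant_ok = centroid(moved) == translate([centroid(points)], t)[0]  # f(T(x)) == T(f(x))
```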
What is a set?
- collection of distinct objects
- unordered → the structure is lost
How does PointNet achieve invariance?
To achieve permutation invariance: apply the same vector of weights to each 3D point to get a single scalar response per point. If a symmetric function is applied to these responses, the result is invariant to the order of the points.
- Deep Sets use sum-pooling
- collapse responses to a single sum
- PointNet uses max-pooling
- collapse responses to a single value (the max)
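The idea can be checked with a toy example: one shared weight vector produces a scalar response per point, and a symmetric pooling function collapses the responses. This is a sketch with a single hand-picked weight vector (an assumption; a real network learns many such vectors inside a shared MLP), and the function names are hypothetical.

```python
from itertools import permutations

def point_response(point, weights):
    """Shared weights applied to one 3D point: a single scalar response."""
    return sum(w * c for w, c in zip(weights, point))

def pooled_feature(points, weights, pool):
    """Collapse the per-point responses with a symmetric function (max or sum)."""
    return pool(point_response(p, weights) for p in points)

cloud = [(1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
weights = (1, -1, 2)   # the same weights are used for every point

# Evaluate both pooling variants on every ordering of the same four points.
max_values = {pooled_feature(list(order), weights, max) for order in permutations(cloud)}
sum_values = {pooled_feature(list(order), weights, sum) for order in permutations(cloud)}
```

Each set ends up with a single element, because neither the max (PointNet style) nor the sum (Deep Sets style) depends on the order of the points.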
To achieve rotational invariance: PointNet includes additional components called transform networks (T-Nets), applied to the input points and to the features (“feature transform networks”). These networks learn a transformation that aligns the input or the features with a canonical coordinate system. This makes the network’s output approximately invariant to rotations and translations of the input point cloud.
What is the difference between PointNet and Deep Sets?
The key difference between Deep Sets and PointNet is the type of data they focus on: Deep Sets work with general sets, whereas PointNet works with the points of a 3D point cloud. They also pool differently: Deep Sets use sum-pooling, PointNet uses max-pooling.