L12: 3D classification Flashcards
What is 3D Classification?
3D classification refers to the task of classifying or categorizing three-dimensional (3D) objects or scenes into different classes or categories. It involves analyzing the spatial structure and characteristics of 3D data to make predictions about their class labels.
How do Deep Sets and PointNet achieve permutation invariance?
By pooling
Deep Sets: sum-pooling - collapse the responses to a single sum
PointNet: max-pooling - collapse the responses to a single value, the maximum
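A small check of why pooling gives invariance to point order. This is only a sketch with made-up feature values: shuffling the rows changes neither the sum nor the maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))        # 5 points, 4 features each (made-up values)
shuffled = features[rng.permutation(5)]   # same points, different order

sum_pooled = features.sum(axis=0)  # Deep Sets style: one sum per feature
max_pooled = features.max(axis=0)  # PointNet style: one maximum per feature

# Pooling the shuffled points gives the same vectors as before.
sum_is_invariant = np.allclose(sum_pooled, shuffled.sum(axis=0))
max_is_invariant = np.array_equal(max_pooled, shuffled.max(axis=0))
print(sum_is_invariant, max_is_invariant)
```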
What do the two methods take as input?
A point cloud of an object
What does equivariance mean?
Loose definition: the output “follows” the disturbance applied to the input.
What are Point Clouds?
A data structure consisting of an unordered set of 3D points
How can Deep Sets be used for 3D Classification?
Deep Sets can be used to process 3D point clouds by treating each point as an element in the set. The encoder network processes each point’s features, and the pooling function aggregates the point features into a global representation. This global representation can then be fed into a fully connected layer for classification.
How can PointNet be used for 3D Classification?
PointNet can be used by treating the input point cloud as a set of points. The shared MLPs process the features of each point independently, and the max pooling operation aggregates the point features into a global representation. The global feature vector can then be passed through fully connected layers for classification.
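The two pipelines described in the cards above can be sketched with one function that differs only in the pooling step. This is a minimal illustration, not the published architectures: the layer sizes, the random weights, and the names `classify` and `params` are assumptions, and there is no training.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def classify(points, params, pool):
    """points: (N, 3) array. Returns one score per class."""
    w_point, b_point, w_cls, b_cls = params
    per_point = relu(points @ w_point + b_point)  # same weights applied to every point
    global_feature = pool(per_point, axis=0)      # collapse N points into one vector
    return global_feature @ w_cls + b_cls         # fully connected classification layer

rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 16)), np.zeros(16),  # hypothetical per-point weights
          rng.normal(size=(16, 4)), np.zeros(4))   # hypothetical classifier weights
cloud = rng.normal(size=(100, 3))
reordered = cloud[rng.permutation(100)]

deep_sets_scores = classify(cloud, params, pool=np.sum)  # Deep Sets: sum-pooling
pointnet_scores = classify(cloud, params, pool=np.max)   # PointNet: max-pooling
```

Calling `classify` on `reordered` instead of `cloud` gives the same scores (up to floating-point rounding), because the pooling step discards the point order.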
What do equivariance and invariance mean?
Equivariance refers to the property where the output of a neural network transforms in a corresponding way when the input data undergoes a transformation. In other words, if the input is transformed, the output is transformed in the same way.
Invariance, on the other hand, refers to the property where the output of a neural network remains unchanged or invariant to certain transformations applied to the input data. In other words, regardless of the transformation applied to the input, the output remains the same.
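The difference can be checked numerically, here using a reordering of the points as the transformation. A sketch with a hypothetical per-point layer and random weights: the per-point output is reordered along with the input (equivariance), while the pooled output does not change (invariance).

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
weights = rng.normal(size=(3, 8))   # hypothetical per-point layer weights
perm = rng.permutation(6)

def per_point_layer(x):
    return np.maximum(x @ weights, 0.0)

def pooled(x):
    return per_point_layer(x).max(axis=0)

# Equivariance: reordering the input reorders the output rows in the same way.
equivariant = np.allclose(per_point_layer(points[perm]), per_point_layer(points)[perm])
# Invariance: reordering the input leaves the pooled output unchanged.
invariant = np.allclose(pooled(points[perm]), pooled(points))
print(equivariant, invariant)
```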
When do we want equivariance?
Equivariance is desirable when the transformation information is meaningful and should be preserved in the output. It lets the network learn features that change in a predictable way when the input is transformed, instead of discarding that information.
When do we want invariance?
Invariance is useful when the specific transformation applied to the input is considered irrelevant or when the desired output should not depend on that transformation. It allows the network to learn higher-level features that are invariant to certain transformations, leading to more robust and generalizable representations.
What does it mean that PointNet takes a permutation-invariant approach?
It means that the architecture is designed to process unordered point clouds without relying on any specific order or permutation of the points. It treats each point independently and aggregates their features to obtain a global representation of the entire point cloud.
PointNet: How can we ignore translations?
By centering (demeaning) the inputs
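Centering can be sketched as subtracting the mean point from every point. The helper name `center` and the shift vector are assumptions chosen for illustration; a shifted copy of a cloud ends up identical to the original after centering.

```python
import numpy as np

def center(points):
    """Subtract the centroid so the cloud's mean sits at the origin."""
    return points - points.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))
shifted = cloud + np.array([10.0, -2.0, 5.0])  # same object, translated

same_after_centering = np.allclose(center(cloud), center(shifted))
print(same_after_centering)
```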
PointNet: How can we ignore the rotations?
Two possibilities:
Data augmentation: train on many copies of each point cloud, each rotated differently.
- DRAWBACK: many more training iterations are needed
Or do as PointNet does and add a Transform Net (T-Net) to the network.
The T-Net has 9 output neurons. Reshape these 9 numbers into a 3x3 matrix and multiply it with the input point cloud. The hope is that the T-Net learns to “transform” every point cloud into a new space where the rotation doesn’t matter (much).
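Both options can be sketched in a few lines. Everything here is an assumption made for illustration: rotations are taken about the z axis only, and `tiny_t_net` is a hypothetical, untrained stand-in that shows the 9-outputs-to-3x3 reshape and the multiplication, not the actual PointNet T-Net.

```python
import numpy as np

def rotate_about_z(points, angle):
    """Rotate an (N, 3) cloud about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rotation.T

def tiny_t_net(points, w_point, w_out):
    """Hypothetical T-Net stand-in: per-point layer, max-pool, 9 outputs."""
    per_point = np.maximum(points @ w_point, 0.0)
    pooled = per_point.max(axis=0)
    nine = pooled @ w_out          # the 9 output neurons
    return nine.reshape(3, 3)      # reshaped into a 3x3 matrix

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))

# Option 1: augmentation, several rotated copies of the same cloud.
augmented = [rotate_about_z(cloud, a) for a in rng.uniform(0, 2 * np.pi, size=8)]

# Option 2: predict a 3x3 matrix from the cloud and multiply it in.
w_point = rng.normal(size=(3, 32))   # random, untrained weights
w_out = rng.normal(size=(32, 9))
transform = tiny_t_net(cloud, w_point, w_out)
aligned = cloud @ transform
```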
PointNet: What is the purpose of segmentation in it?
Using a segmentation network within PointNet lets it separate a point cloud into its different parts.
So if you have a knife, it would be able to segment out which points make up the blade and which points make up the handle.
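One way to picture per-point labelling: each point's own feature is combined with the global feature, and every point then gets its own part label. The sketch below assumes that concatenation design and uses random, untrained weights; the function name `segment` and the two-part setup (for example blade vs. handle) are illustrative only.

```python
import numpy as np

def segment(points, w_point, w_seg):
    """Return one part label per point of an (N, 3) cloud."""
    per_point = np.maximum(points @ w_point, 0.0)            # (N, F) local features
    global_feature = per_point.max(axis=0)                   # (F,) summary of the whole cloud
    tiled = np.broadcast_to(global_feature, per_point.shape) # copy the summary to every point
    combined = np.concatenate([per_point, tiled], axis=1)    # (N, 2F) local + global
    scores = combined @ w_seg                                # (N, num_parts) per-point scores
    return scores.argmax(axis=1)                             # most likely part per point

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
w_point = rng.normal(size=(3, 16))   # hypothetical weights
w_seg = rng.normal(size=(32, 2))     # 2 parts, e.g. blade and handle
labels = segment(cloud, w_point, w_seg)
```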