M06 - Multimodal Learning, Interaction and Communication Flashcards
What is machine learning?
“The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.”
What is the definition of machine learning with E,T and P?
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Define E, T and P for a robot learning project.
Task: object recognition with color and depth data
Experience: the iCub multisensor dataset
Performance measure: accuracy, precision, recall, F1 score
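The performance measures above can be sketched in a few lines. This is a minimal illustration for binary labels; the label lists are hypothetical example data, not from the iCub dataset.

```python
# Minimal sketch: accuracy, precision, recall and F1 for binary labels.
def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical predictions vs. ground truth
acc, prec, rec, f1 = evaluate([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```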
What is modality?
Sensory data that are associated with different aspects of the observed phenomena
Why do we need multimodal learning/integration?
- To form a robust sensory representation
- To leverage complementary characteristics of modalities
How do you count modalities in a robot?
The number of types of data = The number of modalities
What are the 5 challenges in Multimodal Machine Learning?
- Representation: how to represent multimodal data [pixels, signals, symbols, etc.]
- Translation: how to map data from one modality to another
- Alignment: how to identify direct relations between modalities
- Fusion: how to join information from two or more modalities [data level, decision level, intermediate]
- Co-learning: how to transfer knowledge between modalities
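The fusion levels mentioned above can be sketched briefly. This is an illustrative example, assuming hypothetical feature vectors and classifier outputs for the color and depth modalities.

```python
import numpy as np

# Hypothetical per-modality feature vectors (e.g. color and depth descriptors)
color_feat = np.array([0.2, 0.8, 0.5])
depth_feat = np.array([0.9, 0.1])

# Data-level (early) fusion: concatenate features before learning
early = np.concatenate([color_feat, depth_feat])   # one joint vector, shape (5,)

# Decision-level (late) fusion: each modality yields class probabilities,
# which are then combined (here: averaged)
p_color = np.array([0.7, 0.3])   # hypothetical classifier output, color only
p_depth = np.array([0.4, 0.6])   # hypothetical classifier output, depth only
late = (p_color + p_depth) / 2
```

Intermediate fusion would instead merge learned representations somewhere between these two extremes, e.g. inside a neural network.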
What are the steps in a machine learning pipeline?
- Preprocessing (dimensionality reduction, feature extraction, selection, scaling, sampling, denoising)
- Learning (Initializing, Optimizing, Cross-Validation)
- Evaluation (of the new model)
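The pipeline steps above can be sketched end to end. This is a minimal illustration with a nearest-centroid classifier and a single holdout split standing in for cross-validation; all data are hypothetical stand-ins for real sensor features.

```python
import numpy as np

# Hypothetical two-class data: 40 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.array([0] * 20 + [1] * 20)
X[y == 1] += 3.0                     # make the classes separable

# Preprocessing: zero-mean, unit-variance scaling
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Learning: split the data (cross-validation repeats this over folds)
# and fit a simple nearest-centroid classifier
idx = rng.permutation(len(X))
train, test = idx[:30], idx[30:]
centroids = {c: X[train][y[train] == c].mean(axis=0) for c in (0, 1)}

# Evaluation: accuracy of the new model on held-out data
pred = np.array([min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                 for x in X[test]])
accuracy = (pred == y[test]).mean()
```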
What is the problem between color and depth data?
There is a huge semantic gap between the raw color and depth data matrices and the high-level concepts they represent
How can we extract representations?
- hand-crafted features
- automatic feature learning
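A hand-crafted feature can be sketched concretely. This is an illustrative example, assuming a hypothetical grayscale image patch; a coarse intensity histogram is a classic hand-designed descriptor.

```python
import numpy as np

# Hypothetical 2x2 grayscale patch with values in [0, 255]
patch = np.array([[10, 200],
                  [30, 220]])

# Hand-crafted feature: a 4-bin intensity histogram, normalized to sum to 1
hist, _ = np.histogram(patch, bins=4, range=(0, 256))
feature = hist / hist.sum()
```

Automatic feature learning would instead let a model (e.g. a neural network) discover such descriptors directly from the raw matrices.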
Which feature extraction approach is better?
Automatic feature learning usually finds better solutions than hand-crafted features
What are the desired properties of the representations?
- Similar representations should indicate similar concepts
[if you visualize the representation space, different carrots should be close to each other but far from cars]
- Representations should be robust
[the extracted representations should be robust to noise]
- We should know how to handle missing data in one modality
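The carrot/car intuition above can be checked with distances in a representation space. This is an illustrative sketch with hypothetical 2-D embedding vectors.

```python
import numpy as np

# Hypothetical representation vectors: two carrots and one car
carrot_a = np.array([1.0, 1.1])
carrot_b = np.array([0.9, 1.0])
car      = np.array([5.0, 4.8])

# Same concept -> small distance; different concept -> large distance
d_same = np.linalg.norm(carrot_a - carrot_b)
d_diff = np.linalg.norm(carrot_a - car)
```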
What is unimodal learning?
Learning from a single modality, e.g. predicting a discrete probability distribution over object classes from one type of sensory data
What are the problems with unimodal learning?
External reasons:
- noise in the environment
- miscalibrated sensors
Model-related reasons:
- wrong model selection
- non-regularized weights
- using raw data as input
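The "non-regularized weights" problem is commonly addressed with an L2 penalty. This is a minimal sketch; the weight vector, toy data and the strength parameter lam are hypothetical.

```python
import numpy as np

# Ridge (L2-regularized) loss: mean squared error plus lam * ||w||^2.
# The penalty term discourages large, overfitting-prone weights.
def ridge_loss(w, X, y, lam=0.1):
    residual = X @ w - y
    return residual @ residual / len(y) + lam * (w @ w)

# Toy example: w fits the data exactly, so only the penalty remains
w = np.array([1.0])
X = np.array([[1.0], [2.0]])
y = np.array([1.0, 2.0])
loss = ridge_loss(w, X, y)
```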
What are the goals of multimodal learning/integration?
- To form robust sensory representations
- To leverage complementary characteristics of modalities