M06 - Multimodal Learning, Interaction and Communication Flashcards

1
Q

What is machine learning?

A

“The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.”

2
Q

What is the definition of machine learning in terms of E, T and P?

A

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

3
Q

Define E, T and P for a robot learning project.

A

Task: object recognition with color and depth data
Experience: the iCub multisensor dataset
Performance measure: accuracy, precision, recall, F1 score
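
A minimal sketch of computing these performance measures, assuming scikit-learn and toy labels:

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # Hypothetical ground-truth and predicted object labels, for illustration only
    y_true = [0, 1, 2, 1, 0, 2]
    y_pred = [0, 1, 1, 1, 0, 2]

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred, average="macro"))  # macro-averaged over classes
    print("recall   :", recall_score(y_true, y_pred, average="macro"))
    print("f1 score :", f1_score(y_true, y_pred, average="macro"))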

4
Q

What is modality?

A

Sensory data that are associated with different aspects of the observed phenomena

5
Q

Why do we need multimodal learning/integration?

A
  1. To form a robust sensory representation
  2. To leverage complementary characteristics of modalities
6
Q

How do you count modalities in a robot?

A

The number of types of data = The number of modalities

7
Q

What are the 5 challenges in Multimodal Machine Learning?

A
  1. Representation: how to represent multimodal data [pixels, signals, symbols, etc.]
  2. Translation: how to map data from one modality to another
  3. Alignment: how to identify direct relations between modalities
  4. Fusion: how to join information from two or more modalities [data level, decision level, intermediate]
  5. Co-learning: how to transfer knowledge between modalities
8
Q

What are the steps in a machine learning pipeline?

A
  1. Preprocessing (dimensionality reduction, feature extraction, selection, scaling, sampling, denoising)
  2. Learning (Initializing, Optimizing, Cross-Validation)
  3. Evaluation (New model), as sketched below
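
A minimal sketch of such a pipeline, assuming scikit-learn and toy sensory data:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X = np.random.rand(100, 64)       # toy sensory data (samples x raw features)
    y = np.random.randint(0, 3, 100)  # toy object labels

    pipe = Pipeline([
        ("scale", StandardScaler()),       # preprocessing: feature scaling
        ("reduce", PCA(n_components=16)),  # preprocessing: dimensionality reduction
        ("learn", SVC()),                  # learning: model initialization and optimization
    ])

    scores = cross_val_score(pipe, X, y, cv=5)  # cross-validation
    print("mean CV accuracy:", scores.mean())   # evaluation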
9
Q

What is the problem with color and depth data?

A

There is a huge semantic gap between what the color and depth data represent and the raw data matrices themselves

10
Q

How can we extract representations?

A
  • hand-crafted features
  • automatic feature learning
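
A minimal sketch contrasting the two, with a fixed color histogram as the hand-crafted feature and PCA as a simple stand-in for a learned encoder (toy examples, not the course's specific methods):

    import numpy as np
    from sklearn.decomposition import PCA

    image = np.random.rand(32, 32, 3)  # toy RGB image

    # Hand-crafted feature: a fixed, human-designed color histogram
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0))
    handcrafted = hist / hist.sum()

    # Automatic feature learning: fit an encoder to data (PCA as a simple stand-in)
    images = np.random.rand(200, 32 * 32 * 3)  # toy dataset of flattened images
    encoder = PCA(n_components=16).fit(images)
    learned = encoder.transform(image.reshape(1, -1))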
11
Q

Which feature extraction approach finds better solutions?

A

Automatic feature learning usually finds better solutions than hand-designed ones

12
Q

What are desired specifications of the representations?

A
  1. Similar representations should indicate similar concepts
    [if you visualize the representation space, the different carrots should be close to each other but far from the cars; see the sketch after this list]
  2. Representations should be robust
    [the extracted representations should be robust to noise]
  3. We should know how to handle missing data in one modality
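
A minimal sketch of the first specification, using hypothetical representation vectors and cosine similarity:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical learned representations (rows: carrot_1, carrot_2, car)
    reps = np.array([[0.9, 0.1, 0.0],
                     [0.8, 0.2, 0.1],
                     [0.1, 0.9, 0.8]])

    sim = cosine_similarity(reps)
    print(sim[0, 1])  # carrot_1 vs carrot_2: high similarity, close in the space
    print(sim[0, 2])  # carrot_1 vs car: lower similarity, far apart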
13
Q

What is unimodal learning?

A

Learning from a single modality; such a model typically outputs a discrete probability distribution over objects

14
Q

What are the problems with unimodal learning?

A

External reasons:
- noise in the environment
- miscalibrated sensors

Model-related reasons:
- wrong model selection
- non-regularized weights
- using raw data as input

15
Q

What are the goals of multimodal learning/integration?

A
  1. To form robust sensory representations
  2. To leverage complementary characteristics of modalities
16
Q

What are the main characteristics of deep multimodal learning?

A
  • both modality-wise representations (features) and shared (fused) representations are learned from data
  • requires little or no preprocessing of input data (end-to-end training)
  • deeper, complex networks typically require large amounts of training data (if trained from scratch)
17
Q

What are the main characteristics of conventional multimodal learning?

A
  • features are manually designed and require prior knowledge about the underlying problem and data
  • some techniques, like early fusion, may be sensitive to data preprocessing
  • may not require as much training data
18
Q

What are the levels of multimodal fusion techniques?

A
  • data level
  • decision level
  • intermediate fusion
19
Q

What happens on the data level of multimodal techniques?

A

Fuse the inputs before performing machine learning
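
A minimal sketch, assuming toy color/depth feature arrays and a scikit-learn classifier:

    import numpy as np
    from sklearn.svm import SVC

    color = np.random.rand(100, 48)   # toy color features per sample
    depth = np.random.rand(100, 16)   # toy depth features per sample
    y = np.random.randint(0, 3, 100)  # toy object labels

    # Data-level fusion: concatenate the inputs, then train a single model
    fused = np.concatenate([color, depth], axis=1)
    model = SVC().fit(fused, y)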

20
Q

What happens on the decision level of multimodal techniques?

A

Fuse the decisions of each model, i.e., the outputs of the machine learning algorithms

21
Q

What happens on the intermediate fusion level of multimodal techniques?

A

Fuse the representations at different levels of the model (you need to understand which part of the network encodes which parts of your data), i.e., the intermediate layers of convolutional neural networks
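
A minimal sketch, assuming PyTorch and a hypothetical two-branch network in which the intermediate feature maps of each modality are concatenated:

    import torch
    import torch.nn as nn

    class IntermediateFusionNet(nn.Module):
        # Hypothetical architecture: one small conv branch per modality,
        # fused at the level of intermediate representations.
        def __init__(self, n_classes=3):
            super().__init__()
            self.color_branch = nn.Sequential(
                nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.depth_branch = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(8 + 8, n_classes)

        def forward(self, color, depth):
            # Fuse intermediate representations, not raw inputs or final decisions
            fused = torch.cat([self.color_branch(color), self.depth_branch(depth)], dim=1)
            return self.head(fused)

    net = IntermediateFusionNet()
    logits = net(torch.rand(4, 3, 32, 32), torch.rand(4, 1, 32, 32))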

22
Q

What are the assumptions of data level fusion?

A

Conditional independence among the modalities, e.g., between depth and color

23
Q

What does data level fusion do?

A
  • concatenate raw inputs
  • reduce the dimensions of the input
  • use hand-crafted features
  • the sensors observe the same phenomenon, but each observes a different type of data
24
Q

What is the output of data level fusion?

A

The output is decided by majority vote, by taking the mean, or by a dedicated algorithm

25
Q

What does decision level fusion do?

A
  • employs different models, or the same model, for different modalities
  • collects decisions from separate models trained on different modalities and
    fuses them by averaging, summing, taking the maximum, or meta-learning (see the sketch below)
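
A minimal sketch of decision-level fusion by averaging, assuming toy data and scikit-learn models:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    color = np.random.rand(100, 48)   # toy color features
    depth = np.random.rand(100, 16)   # toy depth features
    y = np.random.randint(0, 3, 100)  # toy object labels

    # A separate model per modality
    color_model = LogisticRegression(max_iter=1000).fit(color, y)
    depth_model = SVC(probability=True).fit(depth, y)

    # Decision-level fusion: average the per-modality class probabilities
    probs = (color_model.predict_proba(color) + depth_model.predict_proba(depth)) / 2
    prediction = probs.argmax(axis=1)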
26
Q

When do you use decision level fusion?

A
  • the modalities are uncorrelated
  • the modalities have different dimensions
  • you want to exploit a different machine learning model for each modality
    (CNNs for images, SVMs for depth, MLPs for touch, etc.)
27
Q

What are fusion methods?

A
  • multimodal deep learning for robust object recognition
  • deep learning-based image segmentation on multimodal medical imaging
  • multimodal representation models for prediction and control from information
28
Q

What is intermediate fusion?

A
  • non-hand-crafted features
  • fuse similar modalities together
  • multimodal architecture
29
Q

When is it best to use a CNN?

A

A CNN is best when you are planning to fuse the feature outputs, i.e., the intermediate representations

30
Q

What are the implications of multimodal learning results?

A
  1. We form a robust sensory representation
  2. We leverage complementary characteristics of modalities