M06 - Multimodal Learning, Interaction and Communication Flashcards
What is machine learning?
“The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.”
What is the definition of machine learning in terms of E, T and P?
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Define E, T and P for a robot learning project.
Task: object recognition with color and depth data
Experience: the iCub multisensor dataset
Performance measure: accuracy, precision, recall, F1 score
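A minimal sketch of how those performance measures could be computed, assuming scikit-learn and made-up label arrays as placeholders:

```python
# Hypothetical predictions vs. ground truth for an object-recognition task (T),
# evaluated with the performance measures (P) listed above.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 1, 0, 2]   # placeholder ground-truth object classes
y_pred = [0, 1, 1, 1, 0, 2]   # placeholder model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
```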
What is a modality?
Sensory data that are associated with different aspects of the observed phenomena
Why do we need multimodal learning/integration?
- To form a robust sensory representation
- To leverage complementary characteristics of modalities
How do you count modalities in a robot?
The number of types of data = The number of modalities
What are the 5 challenges in Multimodal Machine Learning?
- Representation: how to represent multimodal data [pixels, signals, symbols, etc.]
- Translation: how to map data from one modality to another
- Alignment: how to identify direct relations between modalities
- Fusion: how to join information from two or more modalities [data level, decision level, intermediate]
- Co-learning: how to transfer knowledge between modalities
What are the steps in a machine learning pipeline?
- Preprocessing (dimensionality reduction, feature extraction, selection, scaling, sampling, denoising)
- Learning (Initializing, Optimizing, Cross-Validation)
- Evaluation (of the new model; see the pipeline sketch below)
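One way such a pipeline might look in code, sketched with scikit-learn on synthetic placeholder data (the dataset, the PCA size and the SVM model are illustrative assumptions, not part of the course material):

```python
# A minimal sketch of the preprocessing -> learning -> evaluation pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report

X = np.random.rand(200, 64)           # placeholder feature matrix
y = np.random.randint(0, 3, 200)      # placeholder object labels

pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing: feature scaling
    ("reduce", PCA(n_components=16)), # preprocessing: dimensionality reduction
    ("clf", SVC()),                   # learning: the model to optimize
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print("CV accuracy:", cross_val_score(pipeline, X_train, y_train, cv=5).mean())  # cross-validation
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))                    # evaluation
```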
What is the problem between color and depth data?
There is a huge semantic gap between the raw color and depth data matrices and the semantic concepts they represent
How can we extract representations?
- hand-crafted features
- automatic feature learning
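For illustration, a hand-crafted feature could be as simple as a color histogram; the sketch below (NumPy, with a random placeholder image) shows this, while automatic feature learning would instead use the activations of a trained network as the representation:

```python
# Hand-crafted feature example: a per-channel color histogram of a (hypothetical) RGB image.
import numpy as np

def color_histogram(rgb_image, bins=8):
    """Concatenate per-channel intensity histograms into one feature vector."""
    feats = [np.histogram(rgb_image[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    return np.concatenate(feats).astype(float)

image = np.random.randint(0, 256, (120, 160, 3))  # placeholder image
print(color_histogram(image).shape)               # (24,)
```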
Which feature extraction approach usually gives better results?
Automatic feature learning usually finds better solutions than hand-designed ones
What are desired specifications of the representations?
- Similar representations should indicate similar concepts [if you visualize the representation space, different carrots should be close to each other but far from cars]
- Representations should be robust [the extracted representations should be robust to noise]
- We should know how to handle missing data in one modality
What is unimodal learning?
Learning from a single modality, e.g., predicting a discrete probability distribution over objects from one type of sensory data
What are the problems with unimodal learning?
External reasons:
- noise in the environment
- miscalibrated sensors
Model-related reasons:
- wrong model selection
- non-regularized weights
- using raw data as input
What are the goals of multimodal learning/integration?
- To form a robust sensory representation
- To leverage complementary characteristics of modalities
What are the main characteristics of deep multimodal learning?
- both modality-wise representations (features) and shared (fused) representations are learned from data
- requires little or no preprocessing of input data (end-to-end training)
- deeper, complex networks typically require large amounts of training data (if trained from scratch)
What are the main characteristics of conventional multimodal learning?
- features are manually designed and require prior knowledge about the underlying problem and data
- some techniques, like early fusion, may be sensitive to data preprocessing
- may not require as much training data
What are the levels of multimodal fusion techniques?
- data level
- decision level
- intermediate fusion
What happens on the data level of multimodal techniques?
Fuse the inputs before performing machine learning
What happens on the decision level of multimodal techniques?
Fuse the decisions of each model, i.e. the outputs of the machine learning algorithms
What happens on the intermediate fusion level of multimodal techniques?
Fuse the representations at different levels of the model (you need to understand which part of network encodes what parts of your data) i.e. intermediate layers of the convolutional neural networks
What are the assumptions of data level fusion?
Conditional independence among the modalities (e.g., depth and color)
What does data level fusion do?
- concatenate raw inputs
- reduce dimensions of input
- hand-crafted features
- the sensors observe the same phenomenon, but each captures a different type of data
What is the output of data level fusion?
Decide on the output by taking the majority vote or the mean, or by using another algorithm (see the data-level fusion sketch below)
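A minimal sketch of data-level fusion, assuming precomputed color and depth feature vectors (placeholders here) that are concatenated before a single scikit-learn classifier is trained:

```python
# Data-level (early) fusion: concatenate the inputs of both modalities
# and train a single model on the fused representation.
import numpy as np
from sklearn.linear_model import LogisticRegression

color_features = np.random.rand(100, 32)   # placeholder color features
depth_features = np.random.rand(100, 16)   # placeholder depth features
labels = np.random.randint(0, 3, 100)      # placeholder object labels

fused = np.concatenate([color_features, depth_features], axis=1)  # fuse before learning
model = LogisticRegression(max_iter=1000).fit(fused, labels)
print(model.predict(fused[:5]))
```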
What does decision level fusion do?
- employs different or the same model for different modalities
- collect decisions from separate models trained on different modalities
- fuse them by averaging, summing, taking the maximum, or meta-learning
When do you use decision level fusion?
- the modalities are uncorrelated
- the modalities have different dimensions
- exploit a different machine learning model for each modality
(CNNs for image, SVM for depth, MLP for touch, etc.; see the sketch below)
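A minimal sketch of decision-level fusion on the same kind of placeholder data, with one model per modality and averaged class probabilities (any of the fusion rules above could be substituted):

```python
# Decision-level (late) fusion: train a separate model per modality and
# fuse their output probabilities, here by averaging.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

color_features = np.random.rand(100, 32)   # placeholder color features
depth_features = np.random.rand(100, 16)   # placeholder depth features
labels = np.random.randint(0, 3, 100)      # placeholder object labels

color_model = SVC(probability=True).fit(color_features, labels)
depth_model = MLPClassifier(max_iter=500).fit(depth_features, labels)

# Average the per-modality class probabilities, then pick the most likely class.
probs = (color_model.predict_proba(color_features) +
         depth_model.predict_proba(depth_features)) / 2
print(probs.argmax(axis=1)[:5])
```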
What are examples of fusion methods?
- multimodal deep learning for robust object recognition
- deep learning-based image segmentation on multimodal medical imaging
- multimodal representation models for prediction and control from information
What is intermediate fusion?
- non-hand-crafted features
- fuse similar modalities together
- multi-modal architecture
When is it best to use a CNN?
A CNN is best when you plan to fuse the outputs of its feature layers (see the intermediate-fusion sketch below)
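A minimal intermediate-fusion sketch in PyTorch (the architecture and layer sizes are illustrative assumptions): each modality gets its own convolutional branch, and the intermediate representations are concatenated inside the network before classification:

```python
# Intermediate fusion: fuse modality-wise representations at an inner layer of the network.
import torch
import torch.nn as nn

class IntermediateFusionNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # one small convolutional branch per modality
        self.color_branch = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # classifier operating on the fused intermediate representation
        self.classifier = nn.Sequential(
            nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, num_classes))

    def forward(self, color, depth):
        fused = torch.cat([self.color_branch(color), self.depth_branch(depth)], dim=1)
        return self.classifier(fused)

net = IntermediateFusionNet()
logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))  # placeholder batches
print(logits.shape)  # torch.Size([4, 3])
```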
What are the implications of multimodal learning results?
- We form a robust sensory representation
- We leverage complementary characteristics of modalities