Representation Learning Flashcards
What is representation learning? Why is it important?
- learning representations of input data
- generally transforms the input into a more useful form
- makes it easier to perform a task
- the performance of any machine learning model is critically dependent on the representations it uses or learns
What is Principal Component Analysis (PCA)? What does it do?
- method that aims at re-expressing a given dataset using a linear transformation
- center the data by subtracting off the mean of each measurement
- compute the eigenvectors of XX^T, i.e., the eigendecomposition XX^T = EDE^T, and set P = E^T
- select the k ≪ m most important components from P according to the eigenvalues in D (see the sketch below)
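A minimal NumPy sketch of these steps; the toy data, its shape (m = 5 features, n = 100 samples), and k = 2 are illustrative assumptions:

```python
# Minimal PCA sketch following the steps above.
# Shapes follow the card's convention: X is m features x n samples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))          # toy data: m = 5 features, n = 100 samples

X = X - X.mean(axis=1, keepdims=True)  # 1. center each measurement (row)
S = (X @ X.T) / X.shape[1]             # 2. covariance matrix S_X = (1/n) X X^T
D, E = np.linalg.eigh(S)               # eigendecomposition S = E D E^T

order = np.argsort(D)[::-1]            # sort components by decreasing variance
k = 2                                  # 3. keep the k << m most important ones
P = E[:, order[:k]].T                  # P = E^T restricted to the top-k rows

X_new = P @ X                          # transformed data, k x n
```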
What transformations give PCA a problem?
- PCA searches only for a linear transformation, so non-linear structure is a problem
- it must also deal with contamination of the data, given by:
- noise (errors and interference that make the data deviate from the underlying signal)
- redundancy (multiple variables that can be reduced to a single one)
- we aim at finding a transformation that minimizes both
How can we estimate noise and redundancy in data?
- both can be estimated using measures related to the variance of the data
- the signal-to-noise ratio (SNR) measures noise (see below)
- the covariance matrix measures redundancy between features
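A small sketch of the SNR as a variance ratio; the clean signal and the noise component are assumed known here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 500)
signal = np.sin(t)                      # assumed known "true" signal
noise = 0.1 * rng.normal(size=t.shape)  # assumed known noise component
measured = signal + noise

snr = signal.var() / noise.var()        # variance ratio: high SNR = clean data
print(snr)
```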
How does the covariance matrix work?
- the dataset must be centered (mean 0)
- the covariance matrix is computed as
- S_X = (1/n) X X^T
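A quick sanity check of this formula against NumPy's built-in covariance; the random data is an arbitrary example, and np.cov needs bias=True to match the 1/n normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 1000))
X = X - X.mean(axis=1, keepdims=True)        # centering is required (mean 0 per row)

S = (X @ X.T) / X.shape[1]                   # S_X = (1/n) X X^T
print(np.allclose(S, np.cov(X, bias=True)))  # matches NumPy's biased covariance
```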
How does PCA reduce feature covariance?
- we are aiming at diagonalizing the covariance matrix
- find an orthonormal matrix P with X' = PX such that the covariance matrix of X' is diagonal
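A short NumPy check of this claim on arbitrary toy data: choosing P = E^T from the eigendecomposition makes the new covariance matrix diagonal:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 300))
X = X - X.mean(axis=1, keepdims=True)

D, E = np.linalg.eigh((X @ X.T) / X.shape[1])
P = E.T                                    # orthonormal: P P^T = I
X_new = P @ X
S_new = (X_new @ X_new.T) / X_new.shape[1]
print(np.round(S_new, 10))                 # off-diagonal entries vanish
```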
How does PCA perform dimensionality reduction?
- the most important components are the first ones
- they are the directions along which the data presents the most variability
- ignoring the less important dimensions simplifies the data
- the discarded dimensions carry little variance, so little information is lost
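One hedged way to choose how many dimensions to keep is the cumulative explained-variance ratio; the 95% threshold below is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 200))
X = X - X.mean(axis=1, keepdims=True)

D, E = np.linalg.eigh((X @ X.T) / X.shape[1])
var = np.sort(D)[::-1]                      # variances, largest first
ratio = np.cumsum(var) / var.sum()          # cumulative explained variance
k = int(np.searchsorted(ratio, 0.95)) + 1   # smallest k explaining ~95%
print(k, ratio[k - 1])
```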
What is kernel PCA?
- extension of conventional PCA to deal with non-linear correlations using the kernel trick
1- data not used directly but mapped implicitly to some nonlinear feature space
2- center data in feature space
3- apply PCA in the feature space
4- the result is a non-linear transformation in the original data space
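A sketch using scikit-learn's KernelPCA on the classic two-circles dataset; note scikit-learn stores samples as rows, the opposite of the m × n convention above, and gamma=10.0 is an arbitrary choice:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel maps the data implicitly to a feature space where the two
# circles become linearly separable; centering in feature space is handled
# internally by KernelPCA.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)
```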
What can PCA be used for?
- apply lossy compression
- visualize multi-dimensional data in 2D or 3D
- reduce the number of dimensions to discard noisy features
- perform a change of representation in order to make analysis of the data at hand easier
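For instance, a minimal visualization sketch with scikit-learn and matplotlib, projecting the 4-dimensional Iris data onto 2 components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)   # project 4-D data onto 2 components

plt.scatter(X2[:, 0], X2[:, 1], c=y)
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()
```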
What is an autoencoder?
- unsupervised learning technique based on feed forward neural networks
- learns a representation for a set of data
- generally for dimensionality reduction
- can be used for learning generative models of data
How is the autoencoder composed? What do the components do?
- encoder
- creates a new representation (the code) of the input
- decoder
- reconstructs the input starting from the code
- bottleneck layer
- compresses the data and makes the task harder (prevents the network from simply copying its input)
- encoder and decoder can have more hidden layers (simple vs deep autoencoder)
- non-linear activation functions
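A minimal PyTorch sketch of this layout; the layer sizes (784-dimensional input, 32-unit code) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=784, n_code=32):
        super().__init__()
        # encoder: input -> bottleneck (the code)
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, n_code),          # bottleneck layer compresses the data
        )
        # decoder: code -> reconstruction of the input
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```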
What loss is generally used in an autoencoder? What is the learning procedure?
- the standard loss function is a mean squared error loss
- called reconstruction loss
- backpropagation and SGD generally used for training
- no specialized algorithm
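A hedged training-loop sketch in PyTorch: the reconstruction target is the input itself, so no labels are needed (the tiny model and hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # tiny autoencoder for illustration
    nn.Linear(20, 4), nn.ReLU(),            # encoder down to a 4-unit code
    nn.Linear(4, 20),                       # decoder back to 20 features
)
loss_fn = nn.MSELoss()                      # reconstruction loss
opt = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(256, 20)                    # unlabeled data: input == target
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)             # compare reconstruction to input
    loss.backward()                         # plain backpropagation
    opt.step()
```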
What types of autoencoders are there?
- regularized autoencoders
- loss function that encourages sparsity in the representation and robustness to noise
- sparse autoencoders
- sparsity in the activation of hidden units -> L1/L2 regularization
- denoising autoencoders
- trained to remove noise and corruption from the input
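A sketch of the denoising idea: corrupt the input but reconstruct the clean data; the Gaussian noise scale of 0.3 and the tiny model are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 4), nn.ReLU(), nn.Linear(4, 20))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, 20)
for epoch in range(100):
    noisy = X + 0.3 * torch.randn_like(X)   # corrupt the input with Gaussian noise
    loss = loss_fn(model(noisy), X)         # but reconstruct the *clean* data
    opt.zero_grad()
    loss.backward()
    opt.step()
```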
What are CNN? How do they work?
- Convolutional Neural Networks
- specialized kind of NN for processing data that has a known grid-like topology (such as images)
- employs convolution, a specialized linear mathematical operation
- learns different levels of abstraction of the input
- the first hidden layers detect general patterns (e.g., edges)
- deeper hidden layers capture more specific abstractions (textures, patterns)
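A tiny PyTorch illustration of convolution detecting an edge, using a hand-crafted kernel rather than a learned one:

```python
import torch
import torch.nn.functional as F

img = torch.zeros(1, 1, 8, 8)
img[:, :, :, 4:] = 1.0                  # simple image: dark left, bright right
edge = torch.tensor([[[[-1., 1.]]]])    # hand-crafted vertical-edge kernel
out = F.conv2d(img, edge)               # convolution highlights the edge
print(out[0, 0])                        # nonzero only where intensity changes
```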
What is a typical CNN architecture?
- several convolutional layers + pooling (subsampling), followed by a fully connected network with ReLU (see the sketch below)
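A hedged PyTorch sketch of such an architecture for 28×28 grayscale images; the channel counts and layer sizes are illustrative assumptions:

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 64), nn.ReLU(),     # fully connected part with ReLU
    nn.Linear(64, 10),                        # e.g., 10 output classes
)
```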