Lecture 3 Flashcards
Define: Occam’s razor
Prefer the simplest hypothesis that fits the data.
‘All things being equal, the simplest solution tends to be the best one.’
x1 = 1, x2 = 2, x3 = 3
w0 = 0.7, w1 = 0.5, w2 = -0.5, w3 = 1
What will be the output of the perceptron?
w0 + w1·x1 + w2·x2 + w3·x3 = 0.7 + (0.5)(1) + (-0.5)(2) + (1)(3) = 0.7 + 0.5 - 1 + 3 = 3.2 > 0, so the output is 1.
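A quick way to check this arithmetic is to treat w0 as the bias on a constant input of 1 and apply a step activation; the NumPy lines below are only an illustration of that computation.

```python
import numpy as np

# w0 acts as the bias (weight on a constant input of 1), followed by a step activation.
w = np.array([0.7, 0.5, -0.5, 1.0])   # w0, w1, w2, w3
x = np.array([1.0, 1.0, 2.0, 3.0])    # bias input, x1, x2, x3

s = np.dot(w, x)        # 0.7 + 0.5 - 1.0 + 3.0 = 3.2
output = int(s > 0)     # step activation: 1 because the sum is positive
print(s, output)        # prints roughly 3.2 and 1
```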
What are the main drivers of the breakthrough in deep learning?
- data
- computation
- machine learning
What are flaws in deep learning?
Adding carefully crafted noise to a picture can create a new image that people would see as identical, but which a DNN sees as utterly different. In this way, any starting image can be tweaked so a DNN misclassifies it as any target image a researcher chooses.
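As a concrete illustration of such carefully crafted noise, the sketch below follows the fast gradient sign method (FGSM); `model`, `image`, and `label` are hypothetical placeholders, not objects from the lecture, and the pixel range is assumed to be [0, 1].

```python
import torch
import torch.nn.functional as F

def adversarial_example(model, image, label, epsilon=0.01):
    """Perturb `image` (a batch of shape [1, C, H, W]) so `model` is likely to misclassify it."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel by at most epsilon in the direction that increases the loss:
    # visually almost identical for a human, utterly different for the DNN.
    return (image + epsilon * image.grad.sign()).detach().clamp(0.0, 1.0)
```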
What are strengths in deep learning?
- ability to integrate information from huge heterogeneous sources
- ability to predict
- ability to detect / recognise
- ability to discover patterns
What are weaknesses in deep learning?
- DL algorithms lack ‘common sense’
- DL cannot put events into their context
- DL depends critically on the quality of the underlying statistics/data
- DL is opaque
- DL is ill-understood
Define: Amara’s law
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
How to deal with AI?
- ignore the hype, focus on the concrete impact for your domain
- without humans, AI algorithms are of no value
- Many applications will see a performance boost
- Many domains/jobs will transform dramatically (also in science)
- AI algorithms offer tremendously powerful tools but lack proper understanding.
What does Tishby’s information plane say?
At initialisation, the first hidden layer has a lot of information about both the input and the output, while the last hidden layer has little information about either.
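A rough sketch of how such layer information is typically estimated: bin the activations and compute a discrete mutual information. The function below is illustrative only; `hidden` and `labels` are assumed to come from your own model and data, and I(T;X) would be estimated analogously with a discretised input in place of the labels.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def layer_label_information(hidden, labels, n_bins=30):
    """Estimate I(T; Y) for one hidden layer T (hidden has shape n_samples x n_units)."""
    # Discretise every activation into equal-width bins.
    edges = np.linspace(hidden.min(), hidden.max(), n_bins)
    binned = np.digitize(hidden, edges)
    # Treat each distinct binned activation pattern as one discrete state of T.
    _, t_states = np.unique(binned, axis=0, return_inverse=True)
    return mutual_info_score(labels, t_states)   # mutual information in nats
```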
What are the two phases in deep learning mechanics?
1. Mapping the input to the output (fast).
2. Getting rid of noise by compressing the representations (slow).
What is the relationship between deep learning and Gaussian processes?
Dropout, a technique that has long been in use in deep learning, can give us principled uncertainty estimates. These uncertainty estimates essentially approximate those of a Gaussian process.
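A minimal sketch of that idea (Monte Carlo dropout): keep dropout active at test time, run several stochastic forward passes, and read the spread of the predictions as an approximate, Gaussian-process-like uncertainty. `model` is assumed to be any network that contains dropout layers.

```python
import torch

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps dropout sampling at test time (a careful version would switch only the dropout layers)
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # predictive mean and its uncertainty
```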
Define: lottery ticket hypothesis
If you want to win the lottery, you should buy all the tickets.
There are many pathways from the input to the output; among them there is always one winning ticket, so you try all the options.
This is one explanation of why DL is successful and why so many parameters are used.
How does network pruning work (standard approach)?
1. Train the network (starting from randomly initialised weights Wr).
2. Remove the superfluous parts of the network (i.e., prune the small weights).
3. Fine-tune the network.
Optionally, repeat steps 2 and 3 (a code sketch follows below).
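A hedged sketch of this prune-then-fine-tune loop using PyTorch's magnitude-pruning utilities; `train_fn`, the 30% pruning amount, and the choice of layer types are placeholder assumptions, not values from the lecture.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_and_finetune(model, train_fn, rounds=1, amount=0.3):
    train_fn(model)                                             # step 1: train from the random init Wr
    for _ in range(rounds):                                     # optionally repeat steps 2 and 3
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(module, "weight", amount=amount)   # step 2: drop the smallest weights
        train_fn(model)                                         # step 3: fine-tune the pruned network
    return model
```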
What is step 3 in the alternative approach to network pruning?
1. Train the network.
2. Remove the superfluous parts of the network (i.e., prune the small weights).
3. ?
Retrain the network by re-initialising it with Wr (as sketched below).
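A hedged sketch of that alternative step 3 (the lottery-ticket recipe): the surviving weights are reset to the original random initialisation Wr and the sparse subnetwork is retrained from there. `model` and `mask` (a dict of binary tensors marking the surviving weights) are assumed to come from steps 1 and 2.

```python
import copy
import torch

def save_initial_weights(model):
    """Call this before step 1 so the random initialisation Wr can be restored later."""
    return copy.deepcopy(model.state_dict())

def reset_to_winning_ticket(model, initial_state, mask):
    """Step 3: re-initialise with Wr, re-apply the pruning mask, then retrain."""
    model.load_state_dict(initial_state)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in mask:
                param.mul_(mask[name])   # keep only the surviving connections, at their Wr values
    return model
```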
After pruning, subnetworks can do the job. What is the advantage of using subnetworks?
- The subnetworks are typically between 1% and 15% of the original size.
- They require considerably less training time.
- They perform at or around the same level.