Lecture 3 Flashcards
Define: Occam’s razor
Prefer the simplest hypothesis that fits the data.
‘All things being equal, the simplest solution tends to be the best one.’
x1 = 1, x2 = 2, x3 = 3
w0 = 0.7, w1 = 0.5, w2 = -0.5, w3 = 1
What will be the output of the perceptron?
w0 + w1·x1 + w2·x2 + w3·x3 = 0.7 + (0.5)(1) + (-0.5)(2) + (1)(3) = 0.7 + 0.5 - 1 + 3 = 3.2 > 0, so the output is 1.
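A quick way to check this arithmetic is to treat w0 as the bias on a constant input of 1 and apply a step activation; the NumPy lines below are only an illustration of that computation.

```python
import numpy as np

# w0 acts as the bias (weight on a constant input of 1), followed by a step activation.
w = np.array([0.7, 0.5, -0.5, 1.0])   # w0, w1, w2, w3
x = np.array([1.0, 1.0, 2.0, 3.0])    # bias input, x1, x2, x3

s = np.dot(w, x)        # 0.7 + 0.5 - 1.0 + 3.0 = 3.2
output = int(s > 0)     # step activation: 1 because the sum is positive
print(s, output)        # prints roughly 3.2 and 1
```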
What are the main drivers of the breakthrough in deep learning?
- data
- computation
- machine learning
What are flaws in deep learning?
Adding carefully crafted noise to a picture can create a new image that people would see as identical, but which a DNN sees as utterly different. In this way, any starting image can be tweaked so a DNN misclassifies it as any target image a researcher chooses.
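As a concrete illustration of such carefully crafted noise, the sketch below follows the fast gradient sign method (FGSM); `model`, `image`, and `label` are hypothetical placeholders, not objects from the lecture, and the pixel range is assumed to be [0, 1].

```python
import torch
import torch.nn.functional as F

def adversarial_example(model, image, label, epsilon=0.01):
    """Perturb `image` (a batch of shape [1, C, H, W]) so `model` is likely to misclassify it."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel by at most epsilon in the direction that increases the loss:
    # visually almost identical for a human, utterly different for the DNN.
    return (image + epsilon * image.grad.sign()).detach().clamp(0.0, 1.0)
```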
What are strengths in deep learning?
- ability to integrate information from huge heterogeneous sources
- ability to predict
- ability to detect / recognise
- ability to discover patterns
What are weaknesses in deep learning?
- DL algorithms lack ‘common sense’
- DL cannot put events into their context
- DL depends critically on the quality of the underlying statistics/data
- DL is opaque
- DL is ill-understood
Define: Amara’s law
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
How to deal with AI?
- ignore the hype, focus on the concrete impact for your domain
- without humans, AI algorithms are of no value
- Many applications will see a performance boost
- Many domains/jobs will transform dramatically (also in science)
- AI algorithms offer tremendously powerful tools but lack proper understanding.
What does Tishby’s information plane say?
At initialisation, the first hidden layer has a lot of information about both the input and the output, while the last hidden layer has little information about either.
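A rough sketch of how such layer information is typically estimated: bin the activations and compute a discrete mutual information. The function below is illustrative only; `hidden` and `labels` are assumed to come from your own model and data, and I(T;X) would be estimated analogously with a discretised input in place of the labels.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def layer_label_information(hidden, labels, n_bins=30):
    """Estimate I(T; Y) for one hidden layer T (hidden has shape n_samples x n_units)."""
    # Discretise every activation into equal-width bins.
    edges = np.linspace(hidden.min(), hidden.max(), n_bins)
    binned = np.digitize(hidden, edges)
    # Treat each distinct binned activation pattern as one discrete state of T.
    _, t_states = np.unique(binned, axis=0, return_inverse=True)
    return mutual_info_score(labels, t_states)   # mutual information in nats
```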
What are the two phases in deep learning mechanics?
1. Mapping the input to the output (fast).
2. Getting rid of noise by compressing the representations (slow).
What is the relationship between deep learning and Gaussian processes?
Dropout, a technique that has long been in use in deep learning, can give us principled uncertainty estimates. These uncertainty estimates essentially approximate those of a Gaussian process.
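A minimal sketch of that idea (Monte Carlo dropout): keep dropout active at test time, run several stochastic forward passes, and read the spread of the predictions as an approximate, Gaussian-process-like uncertainty. `model` is assumed to be any network that contains dropout layers.

```python
import torch

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps dropout sampling at test time (a careful version would switch only the dropout layers)
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # predictive mean and its uncertainty
```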
Define: lottery ticket hypothesis
If you want to win the lottery, you should buy all the tickets.
There are many pathways from the input to the output; among them there is always one winning ticket, so you try all the options.
This is one explanation of why DL is successful and why so many parameters are used.
How does network pruning work (standard approach)?
1. Train the network (starting from randomly initialised weights Wr).
2. Remove the superfluous parts of the network (i.e., prune the small weights).
3. Fine-tune the network.
Optionally, repeat steps 2 and 3 (a code sketch follows below).
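A hedged sketch of this prune-then-fine-tune loop using PyTorch's magnitude-pruning utilities; `train_fn`, the 30% pruning amount, and the choice of layer types are placeholder assumptions, not values from the lecture.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_and_finetune(model, train_fn, rounds=1, amount=0.3):
    train_fn(model)                                             # step 1: train from the random init Wr
    for _ in range(rounds):                                     # optionally repeat steps 2 and 3
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(module, "weight", amount=amount)   # step 2: drop the smallest weights
        train_fn(model)                                         # step 3: fine-tune the pruned network
    return model
```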
What is step 3 in the alternative approach to network pruning?
1. Train the network.
2. Remove the superfluous parts of the network (i.e., prune the small weights).
3. ?
Retrain the network by re-initialising it with Wr (as sketched below).
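A hedged sketch of that alternative step 3 (the lottery-ticket recipe): the surviving weights are reset to the original random initialisation Wr and the sparse subnetwork is retrained from there. `model` and `mask` (a dict of binary tensors marking the surviving weights) are assumed to come from steps 1 and 2.

```python
import copy
import torch

def save_initial_weights(model):
    """Call this before step 1 so the random initialisation Wr can be restored later."""
    return copy.deepcopy(model.state_dict())

def reset_to_winning_ticket(model, initial_state, mask):
    """Step 3: re-initialise with Wr, re-apply the pruning mask, then retrain."""
    model.load_state_dict(initial_state)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in mask:
                param.mul_(mask[name])   # keep only the surviving connections, at their Wr values
    return model
```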
After pruning, subnetworks can do the job. What is the advantage of using subnetworks?
- The subnetworks are typically between 1% and 15% of the original size.
- They require considerably less training time.
- They perform at or around the same level.