ML Flashcards

1
Q

tf: In TensorFlow you imagine

A
everything you are computing as a graph.
Nodes are the transformations on the data, or functions you are running on the data; these can have multiple inputs and outputs.
The edges (the things connecting the nodes) are the data.
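
A minimal sketch of that idea, assuming the TensorFlow 1.x API used elsewhere in these cards: the constants and the matmul are nodes, the arrays flowing between them are the edges, and nothing is computed until the graph runs in a session.

```python
import tensorflow as tf

# Two constant nodes; the arrays they produce flow along edges to the next node.
a = tf.constant([[1.0, 2.0]])      # shape [1, 2]
b = tf.constant([[3.0], [4.0]])    # shape [2, 1]

# A matmul node: a transformation with two inputs and one output.
c = tf.matmul(a, b)

# Building the graph computes nothing; running it in a session does the work.
with tf.Session() as sess:
    print(sess.run(c))  # [[11.]]
```
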
2
Q

dl: backpropagation is

A

looking at the output of a deep neural network and comparing it to the desired output. Based on the difference between the correct answer and the prediction, you adjust the layer right before the output so it produces something closer to the correct answer. Then, based on the error in that second-to-last layer, you adjust the third-to-last layer, and so on back through the network.
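
As a hedged illustration (not part of the original card), TensorFlow 1.x can show the same idea with tf.gradients: the error at the output gives an adjustment for the last layer's weights first, then for the layer before it.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])                    # input sample
w1 = tf.Variable([[0.1, 0.2], [0.3, 0.4]])       # weights of the earlier layer
w2 = tf.Variable([[0.5], [0.6]])                 # weights of the last layer

hidden = tf.nn.relu(tf.matmul(x, w1))            # second-to-last layer output
prediction = tf.matmul(hidden, w2)               # network output
desired = tf.constant([[1.0]])                   # desired output
error = tf.reduce_sum(tf.square(prediction - desired))

# Backpropagation: the gradient of the error is computed for the last layer
# first (w2), then propagated back to the layer before it (w1), and so on.
grads = tf.gradients(error, [w2, w1])

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(grads))
```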

3
Q

ml: The points plotted near the decision boundary are called support vectors because

A

they are the ones that force the decision boundary to be where it is.

4
Q

ml: A Support Vector Machine is similar to Nearest Neighbors because

A

both classify by comparing a new point to stored training points; the difference is that an SVM only keeps the points that define the decision boundary, while NN keeps the points that do not influence the decision boundary as well as the points that do.

5
Q

tf: A tensor is

A

a typed ndarray

6
Q

tf: A one hot vector is

A

a vector with a zero in every column except one; the column the 1 is in represents the class it belongs to.

7
Q

ml: MNIST is a

A

computer vision dataset with images of handwritten digits and their labels

8
Q

ml: softmax is a

A

multinomial logistic regression

9
Q

ml: Softmax is good for

A

when you need the probabilities of a record belonging to each class
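
A small numeric sketch in plain Python/NumPy (not from the original cards) of what softmax does to a set of raw scores for one record:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw class scores (logits) for one record
probs = softmax(scores)
print(probs)        # approx [0.66 0.24 0.10] -- probability of each class
print(probs.sum())  # 1.0
```
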

10
Q

tf: A bias is used to

A

tell the algorithm that a certain class is more frequent in general

11
Q

tf: To create the table that holds all your samples, type

A

x = tf.placeholder(tf.float32, [None, 784])

12
Q

tf: in x = tf.placeholder(tf.float32, [None, 1000]), None means

A

that that dimension can vary

13
Q

tf: in x = tf.placeholder(tf.float32, [None, 784]), 784 is

A

The number of columns

14
Q

tf: To create the weights variable, type

A

W = tf.Variable(tf.zeros([784, 10]))

15
Q

tf: To create the biases variable, type

A

b = tf.Variable(tf.zeros([10]))

16
Q

tf: To create a softmax model, type

A

y = tf.nn.softmax(tf.matmul(x, W) + b)

17
Q

tf: a good cost function is called

A

cross-entropy

18
Q

tf: Gradient descent is a simple procedure, where

A

TensorFlow shifts each variable a little bit in the direction that reduces the cost
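
A tiny sketch of that shift, using a made-up one-weight cost function purely for illustration:

```python
# Made-up cost just for illustration: cost(w) = (w - 3)^2, minimized at w = 3.
learning_rate = 0.01
w = 0.0
for _ in range(1000):
    grad = 2 * (w - 3)             # derivative of the cost with respect to w
    w = w - learning_rate * grad   # shift w a little in the direction that reduces the cost
print(w)                           # ends up very close to 3.0
```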

19
Q

ml: Using small batches of random data for training is called

A

stochastic training

20
Q

tf: To create your cross entropy cost function, type

A
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
21
Q

tf: To initialize your gradient descent optimizer with a learning rate of 0.01 and a cost function called cross entropy, type

A

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

22
Q

tf: To initialize all the variables and then run a session type

A

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

23
Q

tf: a feed dict is

A

a dict that maps a variable full of samples to the x placeholder and a variable full of labels to the y placeholder
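
A minimal sketch, assuming the TensorFlow 1.x placeholders from the earlier cards; the zero-filled arrays here are stand-ins for a real batch of samples and labels:

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
total = tf.reduce_sum(x) + tf.reduce_sum(y_)   # stand-in op that uses both placeholders

batch_xs = np.zeros((100, 784), dtype=np.float32)  # a batch of samples
batch_ys = np.zeros((100, 10), dtype=np.float32)   # their one hot labels

with tf.Session() as sess:
    # The feed_dict maps each placeholder to the array that fills it for this run.
    print(sess.run(total, feed_dict={x: batch_xs, y_: batch_ys}))
```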

24
Q

tf: To connect to TensorFlow's C++ backend you use

A

a session (tf.Session)

25
Q

ml: A bias

A

adds a number to the input times the weight

26
Q

ml: An activation function is

A

a function that takes in all of the inputs and then outputs a value

27
Q

ml: ReLU stands for

A

Rectified Linear Unit

28
Q

ml: The rectifier function's formula is

A

f(x) = max(0, x)

29
Q

ml: The rectifier is

A

the most popular activation function for DNNs

30
Q

ml: A Convolutional Neural Network is

A

a neural network structured in a way that is better suited to images

31
Q

tf: The basic procedure for creating a tf model is

A

```
import the data
create the tensors
create a session
create your softmax layer
create your loss function
create the train step
evaluate
```

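A hedged end-to-end sketch of those steps, assuming the TensorFlow 1.x MNIST tutorial setup that the other cards use (the input_data helper ships with that tutorial):

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Import the data (MNIST images with one hot labels).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Create the tensors.
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Create your softmax layer.
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Create your loss function.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

# Create the train step.
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Create a session and train on small random batches (stochastic training).
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate.
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
```
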
32
Q

tf: A more sophisticated optimizer than GradientDescentOptimizer is

A

AdamOptimizer

33
Q

tf: keep_prob in the feed_dict

A

controls the dropout rate: it is the probability that each unit is kept, so a keep_prob of 1.0 disables dropout

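A minimal TensorFlow 1.x sketch (not from the original card) showing how keep_prob is fed:

```python
import numpy as np
import tensorflow as tf

h = tf.placeholder(tf.float32, [None, 4])   # some hidden-layer activations
keep_prob = tf.placeholder(tf.float32)      # probability that each unit is kept
h_drop = tf.nn.dropout(h, keep_prob)

with tf.Session() as sess:
    acts = np.ones((1, 4), dtype=np.float32)
    # Training: drop roughly half the units (survivors are scaled up by 1/keep_prob).
    print(sess.run(h_drop, feed_dict={h: acts, keep_prob: 0.5}))
    # Evaluation: keep everything, so dropout is effectively disabled.
    print(sess.run(h_drop, feed_dict={h: acts, keep_prob: 1.0}))
```
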
34
Q

ml: A linear function is just a

A

giant matrix multiply

35
Q

ml: A Logistic Classifier is

A

a linear classifier

36
Q

tf: A softmax function takes all the scores from the linear functions and

A

turns them into class probabilities that together add up to one

37
Q

ml: Scores in the context of logistic classifiers are also called

A

logits

38
Q

ml: When you multiply to increase the size of your logits

A

the softmax pushes the probabilities toward 0 and 1 and the classifier becomes very confident

39
Q

ml: When you divide to decrease the size of your logits

A

the softmax probabilities move closer together and the classifier becomes less confident

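A numeric sketch covering both of the previous cards, reusing a plain softmax in NumPy with made-up scores, just for illustration:

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # approx [0.66 0.24 0.10]
print(softmax(scores * 10))   # approx [1.00 0.00 0.00] -- much more confident
print(softmax(scores / 10))   # approx [0.37 0.33 0.30] -- much less confident
```
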
40
Q

ml: For one hot encoding

A

you make a vector with the same number of items as there are classes, give each class one index of the vector, and set that index's value to 1 while all the other values are 0

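A minimal sketch with made-up class names, purely for illustration:

```python
import numpy as np

classes = ["cat", "dog", "bird"]   # hypothetical classes, one index each

def one_hot(label):
    vec = np.zeros(len(classes))   # same number of items as there are classes
    vec[classes.index(label)] = 1  # the index for this class gets a 1
    return vec

print(one_hot("dog"))   # [0. 1. 0.]
```
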
41
Q

ml: A vector is

A

an array

42
Q

ml: To make one hot encoding more efficient for models with thousands of classes we use

A

embeddings

43
Q

ml: Cross entropy is

A

a measure of the distance between the predicted probability vector and the one hot encoded vector for the correct class

44
Q

ml: Multinomial logistic classification

A

takes the inputs through a linear function to produce logits, turns the logits into probabilities that sum to 1 with a softmax function, and then compares those probabilities to the one hot encoded vector for the correct class using cross entropy

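A worked NumPy sketch of that whole chain with made-up numbers (not from the original card): linear function, softmax, then cross entropy against the one hot label.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

x = np.array([1.0, 2.0])                    # one input sample with 2 features
W = np.array([[0.5, -0.5, 0.1],
              [0.2,  0.3, 0.0]])            # weights: 2 features -> 3 classes
b = np.array([0.1, 0.1, 0.1])               # biases

logits = x.dot(W) + b                           # linear function produces logits
probs = softmax(logits)                         # softmax turns them into probabilities summing to 1
label = np.array([1.0, 0.0, 0.0])               # one hot vector for the correct class
cross_entropy = -np.sum(label * np.log(probs))  # distance between probs and the label

print(probs, cross_entropy)
```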