Data Science Flashcards

1
Q

deep learning

A

subset of machine learning that uses multi-layer neural networks and can learn, even without labels, from unstructured data (the “deep” refers to the depth of a network’s layers)

2
Q

machine learning

A

subset of AI that automates analytical model building. Emphasizes that systems can learn from data, identify patterns, and make decisions

3
Q

artificial intelligence

A

theory and development of computer systems able to perform tasks that normally require human intelligence

4
Q

3 types of machine learning

A

supervised learning, unsupervised learning, reinforcement learning

5
Q

data science

A

field of study that combines information technology, modeling, and business management to extract insights from data. Machine learning is a subset

6
Q

describe process of node in neural net deciding to fire?

A

takes the number passed from each of the nodes connected below it, multiplies each by its weight, sums the products, and fires along its outgoing connections to the nodes above if the sum exceeds a threshold value
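A minimal sketch of this firing rule, assuming a hard threshold. The function name `node_fires` and the example numbers are hypothetical, chosen only for illustration.

```python
def node_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Three incoming values, each with its own weight (illustrative numbers).
inputs = [1.0, 0.5, 0.0]
weights = [0.5, 1.0, 2.0]
print(node_fires(inputs, weights, threshold=0.75))  # weighted sum is 1.0
```

Raising the threshold to 1.0 would stop the node from firing, because the sum has to exceed the threshold, not just reach it.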

7
Q

describe neural net training briefly

A

weights and thresholds are set to random values. Training data is fed in at the bottom layer (input layer) and passes through successive layers (getting multiplied and added) until it arrives, transformed, at the output layer. Weights and thresholds are continually adjusted until training data with the same labels yield similar outputs
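A toy sketch of that loop for a single node, not a full multi-layer network. The card does not name an update rule, so the perceptron rule used here is an assumption (multi-layer networks are typically trained with backpropagation instead). The names `predict` and `train`, the learning rate, and the AND task are all hypothetical.

```python
import random


def predict(inputs, weights, threshold):
    """Fire (1) if the weighted sum of inputs exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0


def train(examples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Adjust weights and threshold until every example is classified correctly."""
    rng = random.Random(seed)
    # Start from random values.
    weights = [rng.uniform(-1, 1) for _ in examples[0][0]]
    threshold = rng.uniform(-1, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(inputs, weights, threshold)
            if error != 0:
                mistakes += 1
                # Nudge each weight toward the correct answer...
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                # ...and move the threshold the opposite way.
                threshold -= learning_rate * error
        if mistakes == 0:
            break
    return weights, threshold


# Labelled training data for logical AND (an illustrative toy task).
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(and_examples)
print([predict(x, weights, threshold) for x, _ in and_examples])
```

AND is linearly separable, so a single node is enough here; a task such as XOR would need more than one layer.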

8
Q

Who first proposed neural nets?

A

McCulloch and Pitts in 1943, while working in Chicago (both later moved to MIT as founding members of what is sometimes called the first cognitive science department)

9
Q

neural net (and concerns)

A

means of doing machine learning in which a computer learns to perform some task by analyzing training examples, which are usually hand-labelled in advance. The main concern is model transparency: it is hard to explain why a trained network produces a given output

10
Q

Rank decision tree, linear regression, random forest in descending interpretability, then in descending accuracy

A

Interpretability: 1. Linear regression 2. Decision tree 3. Random forest

Typical accuracy: 1. Random forest 2. Decision tree 3. Linear regression

11
Q

Examples of open source machine learning libraries

A

scikit-learn (sklearn), TensorFlow, Keras, H2O

12
Q

EvalML

A

Feature Labs product (AutoML) that searches across models from many popular libraries to find the best one to use

13
Q

featuretools

A

Feature Labs product for constructing a single table of features from multiple related tables

14
Q

overfitting

A

capturing noise and patterns that don’t generalize well to unseen data (opposite is underfitting)

15
Q

supervised learning

A

uses labelled data: the training data comes with the “correct answer” you’re looking for

16
Q

unsupervised learning

A

uses unlabelled data to investigate patterns you’ve never considered

17
Q

decision surface

A

boundary that separates one class from another in feature space (for two features, a line or curve drawn on the scatterplot)

18
Q

Naive Bayes

A

common supervised classification algorithm.
Pros: easy, fast, handles tons of features (great for text classification)
Cons: treats features as independent, so it ignores word order (phrases are scored as loose collections of words)
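A small from-scratch sketch of a multinomial Naive Bayes text classifier, written to show the word-order point. The function names, the add-one smoothing choice, and the four training sentences are hypothetical illustrations, not any library's API.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Count words per class; labelled_docs is a list of (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1  # smoothed word count
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


docs = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
]
model = train_naive_bayes(docs)
print(classify("win a cash prize", model))
print(classify("prize cash a win", model))  # same words, different order
```

Both calls get the same label because each word contributes to the score independently of its position.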

19
Q

sensitivity

A

true positive rate of a test (if the condition is really present, how likely is the test to come back positive?)

20
Q

specificity

A

true negative rate of a test (if the condition is really absent, how likely is the test to come back negative?)
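A short sketch of both rates computed from confusion-matrix counts. The function names and the counts are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of actual positives the test catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: share of actual negatives the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 people with the condition, 900 without.
print(sensitivity(true_pos=90, false_neg=10))   # 0.9
print(specificity(true_neg=855, false_pos=45))  # 0.95
```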

21
Q

Bayes theorem

A

finds the probability of an event given that another event has already occurred, from the reverse conditional probability and the prior: P(A|B) = P(B|A) P(A) / P(B)
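A worked sketch of Bayes' theorem for a diagnostic test, assuming hypothetical numbers (1% prevalence, 90% sensitivity, 95% specificity). The function name `posterior` and its parameter names are illustrative.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# A = has the condition, B = test is positive.
# The false positive rate is 1 - specificity = 0.05.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a positive result here means only about a 15% chance of having the condition, because the condition is rare.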

22
Q

Symbol for ‘mean’ of x values

A

x-bar or x̄

23
Q

symbol for standard deviation of x

A

s_x (often typed Sx); this is the sample standard deviation

24
Q

correlation between x and y (symbol)

A

r

25
Q

number of values in a set (symbol)

A

n
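A short sketch tying the four symbols together (x̄, s_x, r, n), assuming the sample (n - 1) form of the standard deviation. The function names and data are hypothetical.

```python
import math


def mean(values):
    """x-bar: sum of the values divided by n."""
    return sum(values) / len(values)


def sample_std(values):
    """s_x: sample standard deviation, dividing by n - 1."""
    x_bar = mean(values)
    return math.sqrt(sum((v - x_bar) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r: Pearson correlation between paired x and y values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    n = len(xs)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly y = 2x, so r should be 1 (up to rounding)
print(mean(xs), sample_std(xs), correlation(xs, ys))
```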

26
Q

symbol for probability

A

P or p

27
Q

conditional probability symbol

A

pipe or | (can also indicate a conditional distribution). Usually read as ‘given that’

28
Q

SVM

A

Support vector machine

29
Q

what does svm rely on?

A

a linear separator (line or hyperplane) between the two classes, chosen to maximize the margin to the nearest points

30
Q

how does svm compensate for non-linear space?

A

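maps the data into extra dimensions where a linear separator exists; the kernel trick gets the same result without computing the new coordinates explicitly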
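A small numeric sketch of the kernel trick, assuming a degree-2 polynomial kernel on 2-D points. The kernel choice, the names `feature_map` and `poly_kernel`, and the sample points are illustrative assumptions.

```python
import math


def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_map(point):
    """Explicit map from 2-D to 3-D: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = point
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(a, b) ** 2


a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(feature_map(a), feature_map(b))  # inner product in 3-D
implicit = poly_kernel(a, b)                    # same value without leaving 2-D
print(explicit, implicit)
```

The two numbers agree up to floating-point rounding, which is why an SVM can work in the higher-dimensional space while only ever evaluating the kernel on the original points.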