L3 Flashcards

1
Q

Machine Learning

A

Branch of AI and CS that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving accuracy

2
Q

Supervised ML

A
  • use of labelled datasets to train algorithms which classify data or predict outcomes
  • classification or regression
3
Q

Unsupervised ML

A
  • trained on unlabelled data, with no target labels to supervise the learning
  • the model finds hidden patterns and insights by itself
  • clustering or association (rules)
4
Q

Reinforcement ML

A
  • simulates an agent that perceives and interprets its environment, takes actions and learns through trial and error
  • aims to maximise cumulative reward in an environment where each action earns a reward or a penalty
5
Q

Workflow ML (7)

A
  1. gather data
  2. prepare data
  3. split into training, validation and test sets
  4. train model
  5. test and validate model
  6. deploy model
  7. iteration
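
Step 3 can be sketched as below. This is a minimal illustration, not a prescribed method: the function name `split_dataset` and the 70/15/15 proportions are assumptions, not part of the card.

```python
import random

def split_dataset(rows, train_frac=0.7, valid_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into train, validation and test lists."""
    rows = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)      # shuffle to avoid ordering bias
    n_train = round(len(rows) * train_frac)
    n_valid = round(len(rows) * valid_frac)
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]        # whatever is left over
    return train, valid, test

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```

The model is fitted on `train`, tuned on `valid`, and only scored once on `test` at the end.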
6
Q

KNN

pro and con
practical things

what is it

A
  • classifies an object based on the closest training examples in feature space (its nearest neighbours)
  • k is the number of training examples closest to the query that are consulted
  • the distances between the query point and all training points are computed, the k nearest points are selected, and their labels are combined: most frequent label (classification) or average (regression)
    Pro: simple, usable for both regression and classification, achieves high accuracy on a wide range of prediction problems
    Con: becomes slow as the dataset grows, because every prediction is compared against all stored points; high computing power needed
  • can be improved with preprocessing: decision trees, PCA
  • supervised method, so it needs labelled training data; most useful when no parametric model of the data is known
    eg) handwriting detection, image/video recognition, stock prediction
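
The distance, select and vote steps above can be sketched as follows. This is a minimal sketch assuming Euclidean distance; `knn_classify` and the toy points are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest examples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among the neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, (1.1, 1.0)))
print(knn_classify(points, labels, (5.1, 5.0)))
```

For regression, the vote would be replaced by the average of the neighbours' target values.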
7
Q

Decision trees

what is it
how does it work
pros and cons

A
  • supervised learning; can be used for classification and regression, but is usually used for binary classification problems
  • tree-structured classifier: internal nodes test features of the dataset, branches are decision rules and leaf nodes are outcomes
  • the algorithm starts at the root node, compares the sample's feature value with the node's rule and follows the matching branch to the next node, until it reaches a leaf
  • pro: simple to understand, suited to decision-related problems, helps think through all possible outcomes of a problem, less data cleaning required, good preprocessing method
  • con: many layers make the tree complex, computational complexity increases with depth, prone to overfitting (mitigated by Random Forest)
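
How a prediction walks from the root to a leaf can be sketched with a hand-written tree. The features (`outlook_sunny`, `humidity`), thresholds and outcomes are invented for illustration; a real tree would be learned from data.

```python
# A hand-written toy tree; features and thresholds are hypothetical.
tree = {
    "feature": "outlook_sunny",
    "threshold": 0.5,
    "left": {"leaf": "play"},              # taken when outlook_sunny <= 0.5
    "right": {                             # taken when outlook_sunny > 0.5
        "feature": "humidity",
        "threshold": 70,
        "left": {"leaf": "play"},          # humidity <= 70
        "right": {"leaf": "stay home"},    # humidity > 70
    },
}

def predict(node, sample):
    """Follow one decision rule per internal node until a leaf is reached."""
    while "leaf" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]

print(predict(tree, {"outlook_sunny": 1, "humidity": 85}))
```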
8
Q

Random Forest

what is it
differences from decision trees

A
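  • ensemble that combines the outputs of several decision trees
  • each tree is trained on a random (bootstrap) sample of the data
  • takes the majority vote (classification) or average (regression) of the outcome of each tree
  • less overfitting than a single decision tree
  • slower due to more computation
  • doesn't rely on one set of rules but on the aggregate of many trees
  • much more successful if the trees are diverse (weakly correlated with each other)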
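
The bootstrap-then-vote idea can be sketched on 1-D data. This is a heavily simplified sketch: each "tree" is a one-split stump, and the random feature selection that real random forests also use is left out. All names (`fit_stump`, `fit_forest`, `forest_predict`) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """One-split tree on 1-D data: pick the threshold with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw indices with replacement.
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def forest_predict(trees, x):
    """Majority vote over the predictions of all trees."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(xs, ys)
print(forest_predict(forest, 0), forest_predict(forest, 10))
```

Because each stump sees a different resample, the stumps disagree slightly, and the vote smooths out their individual errors.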
9
Q

Bootstrap Sampling

A

repeatedly drawing samples from the data with replacement to estimate a population parameter
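
A minimal sketch of the idea, estimating a population mean. The data values, the function name `bootstrap_means` and the choice of 1000 resamples are assumptions for illustration.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample the data with replacement many times and record each resample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))   # sampling WITH replacement
        means.append(statistics.mean(resample))
    return means

data = [2.1, 2.5, 2.8, 3.0, 3.3, 3.9, 4.2, 4.8]
means = bootstrap_means(data)
estimate = statistics.mean(means)    # bootstrap estimate of the population mean
spread = statistics.stdev(means)     # bootstrap standard error of that estimate
print(estimate, spread)
```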

10
Q

Naive Bayes

A
  • uses conditional probability (Bayes' theorem) to calculate the likelihood of a point belonging to a certain class
  • naively assumes that the predictors are independent of one another given the class
  • used for binary or multiclass classification problems
    posterior = (prior x likelihood) / evidence
  • be able to explain in detail
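
To explain the formula in detail, a worked example helps. The sketch below uses a toy spam filter; every probability in it is made up, and `posterior` is a hypothetical helper name.

```python
import math

# Made-up numbers for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
# P(word appears | class), assumed independent of each other given the class.
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Return P(class | words) for every class using Bayes' theorem."""
    # Numerator: prior x product of per-word likelihoods (the "naive" step).
    joint = {
        c: priors[c] * math.prod(likelihoods[c][w] for w in words)
        for c in priors
    }
    evidence = sum(joint.values())   # P(words), the normalising constant
    return {c: joint[c] / evidence for c in joint}

result = posterior(["free"])
print(result)
```

For the single word "free" the numerators are 0.4 x 0.8 = 0.32 (spam) and 0.6 x 0.1 = 0.06 (ham), so the evidence is 0.38 and the posterior for spam is 0.32 / 0.38.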
11
Q

Linear Regression

A
  • model that describes the relationship between predictors and a continuous outcome as a straight line (or hyperplane)
  • simplest linear model
  • a key algorithm, commonly used for statistical analysis
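
For one predictor, the fitted line can be computed directly with ordinary least squares. A minimal sketch, with `fit_line` and the data chosen for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]          # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```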
12
Q

Logistic Regression

A
  • adapts linear regression to classification
  • models the probability of an event by taking the logistic function of a linear combination of one or more independent variables
  • in effect, passes the linear combination through a function that is bounded between 0 and 1
  • variants: binary, multinomial and ordinal logistic regression
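
The "linear combination into a bounded function" step can be sketched as below. The weights and bias are arbitrary placeholders; in practice they would be learned from training data.

```python
import math

def predict_proba(features, weights, bias):
    """Binary logistic regression: sigmoid of a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))   # linear part
    return 1.0 / (1.0 + math.exp(-z))                          # squashed into (0, 1)

# Hypothetical, hand-picked parameters.
p = predict_proba([2.0, 1.0], weights=[0.8, -0.4], bias=-0.5)
print(p, "-> class 1" if p >= 0.5 else "-> class 0")
```

A threshold (0.5 here) turns the probability into a class label.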
13
Q

K-Means Clustering

what it is
steps
elbow approach

A
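  • groups items into k clusters without predefined classes
  • each observation belongs to the cluster with the nearest mean (centroid)
  • tries to keep clusters as compact as possible (minimises the within-cluster sum of squares)
    process
  1. pick k initial centroids
  2. assign each data point to its nearest centroid to form clusters
  3. recompute the centroid of each cluster
  4. iterate steps 2 and 3 until convergence

Elbow approach: how to choose the best value of k. The within-cluster sum of squares drops quickly as k increases until the reduction becomes slow; the bend (elbow) in that curve is the ideal k, because adding more clusters beyond it barely reduces the variation.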
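
The four steps and the elbow quantity can be sketched as follows. This is a minimal sketch assuming Euclidean distance and random initial centroids; `kmeans`, `within_cluster_ss` and the toy points are made up for illustration.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                  # 1. pick initial centroids
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                               # 2. assign to nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [                              # 3. recompute each centroid
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:                 # 4. stop once nothing moves
            break
        centroids = new_centroids
    return centroids, clusters

def within_cluster_ss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster)

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)

# Elbow approach: compute the sum of squares for several k and look for the bend.
wcss = {k: within_cluster_ss(*kmeans(points, k)) for k in range(1, 5)}
print(wcss)
```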

14
Q

PCA

A
  • dimensionality reduction: discards the directions in the data that carry little information
  • finds the directions with the most variance and maps all data onto fewer dimensions
  • projection-based method that projects the data onto a set of orthogonal axes (the principal components)
  • useful for exploratory analysis
  • eigenvalues can be used to determine the number of principal components to keep
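
For 2-D data the whole procedure fits in a few lines, because a 2x2 covariance matrix has closed-form eigenvalues. A minimal sketch under that assumption; `pca_2d` and the sample points are illustrative only, and higher-dimensional data would need a linear-algebra library.

```python
import math

def pca_2d(points):
    """PCA for 2-D data: eigenvalues, first principal axis and 1-D projected scores."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)
    # Direction of the first principal component (largest variance).
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(angle), math.sin(angle))
    # Project every centred point onto that axis: 2 dimensions become 1.
    scores = [x * pc1[0] + y * pc1[1] for x, y in centred]
    return eigenvalues, pc1, scores

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigenvalues, pc1, scores = pca_2d(points)
explained = eigenvalues[0] / sum(eigenvalues)   # share of variance kept by PC1
print(eigenvalues, pc1, explained)
```

A large `explained` share for the first eigenvalue suggests that one principal component is enough for this data.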