topic 1 Flashcards

1
Q

what is AI

A
  • systems that mimic human intelligence, including reasoning, decision-making and problem solving
  • minimal human involvement
2
Q

what is ML

A
  • teaching machines to find patterns in data and use them to make predictions or decisions
  • requires human involvement for data prep, model training and optimisation
3
Q

what is training

A

given a training set of labelled examples, estimate the prediction function f by minimising the prediction error on the training set
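
A minimal sketch of training, assuming a straight-line prediction function f(x) = a*x + b and squared error as the prediction error. The helper name `fit_line` and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate a and b in f(x) = a*x + b by minimising the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input feature.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical labelled training set: inputs x with their known outputs y.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(train_x, train_y)
print(a, b)
```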

4
Q

what is prediction

A

applying f to a previously unseen x to predict the value of y = f(x)
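
A small sketch of prediction, assuming a prediction function f has already been estimated. Here f is hard-coded as f(x) = 2x + 1, which is a hypothetical choice and not something learned in this example.

```python
def f(x):
    # Hypothetical trained prediction function (parameters assumed, not learned here).
    return 2.0 * x + 1.0

x_new = 10.0        # an input that was not in the training set
y_pred = f(x_new)   # the predicted value of y
print(y_pred)
```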

5
Q

what is the ground truth

A

refers to the reality you want to model with your machine learning algorithm
- it is the actual correct output associated with a dataset, used as a reference for training and evaluating models

6
Q

what is data splitting

A

splitting the data into training and testing sets; this helps ensure that data models, and the processes that use them, are accurate
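
One possible way to split data, sketched with the standard library only. The function name `split_data`, the 25% test fraction and the fixed seed are assumptions made for this example.

```python
import random

def split_data(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into training and testing sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))          # stand-in for 20 labelled examples
train, test = split_data(data)
print(len(train), len(test))
```

No example appears in both sets, so the test set stays unseen during training.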

7
Q

what is the loss (error) function

A

quantifies the difference between the predicted outputs of an ML algorithm and the actual target values

8
Q

examples of loss (error) functions in regression

A
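  • coefficient of determination (R^2): strictly a goodness-of-fit score (higher is better) rather than a loss
  • mean square error (MSE)
  • root mean square deviation (RMSD, also called root mean square error, RMSE)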
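
The measures listed above can be computed directly from the targets and the predictions. This is a plain-Python sketch using the standard definitions; the function names and the sample values are made up for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean square error: the average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square deviation: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For MSE and RMSD lower is better, while for R^2 a value closer to 1 is better.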
9
Q

what is overfitting

A

occurs when the machine learning model gives accurate predictions for training data but not for new data
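
A toy illustration of overfitting, assuming a deliberately extreme model that memorises the training set and copies the label of the nearest training input. The data are hypothetical: the underlying relation is y = x and the training labels carry noise.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def memorising_model(train_x, train_y):
    """Return a predictor that copies the label of the nearest training input."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict

# Noisy training labels, clean test labels (all values made up).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.5, 1.6, 3.4, 3.9]
test_x = [0.4, 1.6, 2.4, 3.6]
test_y = [0.4, 1.6, 2.4, 3.6]

predict = memorising_model(train_x, train_y)
train_error = mse(train_y, [predict(x) for x in train_x])
test_error = mse(test_y, [predict(x) for x in test_x])
print(train_error, test_error)
```

The training error is zero because every training point is reproduced exactly, while the error on new data is larger because the model has also memorised the noise.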

10
Q

things that cause overfitting

A
  • data size is too small (doesn’t represent the overall data accurately)
  • training data contains irrelevant information (noisy data)
  • model trains for too long on a single sample of data
  • model complexity is high (learns noise within training data)
11
Q

what is underfitting

A

occurs when the machine learning model has not learned the patterns in the training data well, so it performs poorly even on the training data

12
Q

reasons for underfitting

A
  • training data is not cleaned and contains noise
  • model has high bias
  • training dataset is too small
  • model is too simple
13
Q

what is cross validation

A

a technique for evaluating the performance of a model on unseen data

14
Q

how does cross validation work

A
  1. data is divided into multiple folds or subsets
  2. one fold = validation set, rest = training set
  3. repeat so that each fold is used as the validation set once
  4. average the results
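
The steps above can be sketched as k-fold cross validation in plain Python. The helper names, the "predict the training mean" model and the MSE score are assumptions chosen to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any remainder over the first folds so sizes differ by at most 1.
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop

def cross_validate(ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        model = fit([ys[i] for i in train_idx])
        scores.append(score(model, [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)

def fit_mean(train_y):
    # Hypothetical model: always predict the mean of the training targets.
    return sum(train_y) / len(train_y)

def mse_score(model, val_y):
    return sum((y - model) ** 2 for y in val_y) / len(val_y)

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
average_error = cross_validate(ys, k=3, fit=fit_mean, score=mse_score)
print(average_error)
```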
15
Q

what is data leakage

A

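information from the test data is already present in the training set, so the ML model has effectively seen the test data and its evaluation looks better than it really is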
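
One common route to leakage is computing preprocessing statistics from all the data before splitting. The sketch below contrasts that with using the training data only; the values are hypothetical and mean-centring stands in for any preprocessing step.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical feature values; the test value is deliberately extreme.
train_x = [1.0, 2.0, 3.0, 4.0]
test_x = [10.0]

# Leaky: the centring statistic is computed from training AND test data,
# so information about the test set reaches the training step.
leaky_mean = mean(train_x + test_x)

# Safer: compute the statistic from the training data only,
# then reuse that same value to transform the test data.
train_mean = mean(train_x)
train_centred = [x - train_mean for x in train_x]
test_centred = [x - train_mean for x in test_x]

print(leaky_mean, train_mean)
```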

16
Q

what is feature importance

A

calculates a score for every input feature of a machine learning model to establish how important each feature is in the decision-making process.
higher score = larger effect on model prediction
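
One common way to obtain such scores is permutation importance: reorder one feature column and measure how much the error grows. This is a sketch under assumptions, not the only scoring method. The model and data are hypothetical, and the column is reversed instead of randomly shuffled so the result is reproducible.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def model(row):
    # Hypothetical fitted model: depends on feature 0 only, ignores feature 1.
    return 3.0 * row[0]

rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
targets = [3.0, 6.0, 9.0, 12.0]

def permutation_importance(rows, targets, feature):
    """Increase in error after reordering one feature column."""
    baseline = mse(targets, [model(r) for r in rows])
    # Reverse the chosen column (a random shuffle is the usual choice)
    # and leave the other features untouched.
    column = [r[feature] for r in rows][::-1]
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(targets, [model(r) for r in permuted]) - baseline

importances = [permutation_importance(rows, targets, j) for j in range(2)]
print(importances)
```

The feature the model relies on gets a large score, and the ignored feature gets a score of zero.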

17
Q

sources that lead to garbage in garbage out

A
  • low quality data
  • biased sampling
  • incorrect labels
  • missing values
  • outliers
  • data inconsistencies
18
Q

what is supervised learning and what is it useful for

A

  • learns the relationship between inputs and outputs from labelled examples
  • useful for fast screening and classification

19
Q

what is unsupervised learning and what is it useful for

A
  • does not require knowledge of outputs, only inputs
  • it finds similarities in complex data
  • some methods (e.g. k-means clustering) require the user to know how many classes to expect
  • useful to reduce data dimensionality
20
Q

name types of supervised learning

A
  • classification (predicts a category)
  • regression (predicts a value)
21
Q

name types of unsupervised learning

A
  • clustering (groups data by similarity)
  • association (identify sequences)
  • dimension reduction/generalization (find hidden dependencies)
22
Q

what is reinforcement learning

A
  • train the model, then assess it; train the model with new data, then assess it again
23
Q

what are the pitfalls of machine learning

A
  • nondeterministic = even for the same input, a model can exhibit different behaviours on different runs
  • stochastic = models use probability distributions, so they rely on randomness and uncertainty to make predictions and analyse data
  • can be biased
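
A small sketch of the nondeterminism pitfall, assuming a hypothetical stochastic model that adds random noise to its output. Fixing the random seed is one common way to make runs repeatable.

```python
import random

def noisy_prediction(x, rng):
    # Hypothetical stochastic model: a fixed signal plus random Gaussian noise.
    return 2.0 * x + rng.gauss(0.0, 1.0)

# Unseeded generators: the same input will usually give different outputs.
unseeded_a = noisy_prediction(1.0, random.Random())
unseeded_b = noisy_prediction(1.0, random.Random())

# Generators created with the same seed: the same input gives the same output.
seeded_a = noisy_prediction(1.0, random.Random(42))
seeded_b = noisy_prediction(1.0, random.Random(42))

print(unseeded_a, unseeded_b)
print(seeded_a, seeded_b)
```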