topic 1 Flashcards

Question 1

Q

what is AI

Answer

A

systems that mimic human intelligence, including reasoning, decision-making and problem solving
operate with minimal human involvement

Question 2

Q

what is ML

Answer

A

teaching machines to find patterns in data and use them to make predictions or decisions
requires human involvement for data prep, model training and optimisation

Question 3

Q

what is training

Answer

A

given a training set of labelled examples (x, y), estimate the prediction function f by minimising the prediction error on the training set

Question 4

Q

what is prediction

Answer

A

applying f to a never-before-seen input x to predict the value y = f(x)
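
Training (card 3) and prediction (card 4) can be sketched together. This is a minimal illustration, assuming a single input feature, a straight-line model and mean squared error as the prediction error; the function names and the tiny dataset are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate f(x) = w * x + b by minimising squared error on the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one input feature.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, x):
    """Apply the learned function f to an input x."""
    return w * x + b


# Hypothetical labelled training set, generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(train_x, train_y)   # training
print(w, b)                         # 2.0 1.0
print(predict(w, b, 10.0))          # prediction for an unseen x: 21.0
```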

Question 5

Q

what is the ground truth

Answer

A

refers to the reality you want to model with your machine learning algorithm
- it is the actual correct output associated with a dataset, used as a reference for training and evaluating models

Question 6

Q

what is data splitting

Answer

A

splitting the data into training and testing sets; helps ensure that data models, and the processes that use them, are accurate
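
A minimal sketch of a train/test split, assuming a 75/25 ratio and a fixed shuffle seed; `split_data` and its parameters are hypothetical names chosen for illustration.

```python
import random


def split_data(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then split into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # hypothetical dataset of 20 examples
train, test = split_data(data)
print(len(train), len(test))  # 15 5
```

Every example lands in exactly one of the two sets, so the test set stays unseen during training.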

Question 7

Q

what is the loss (error) function

Answer

A

quantifies the difference between the predicted outputs of an ML algorithm and the actual target values

Question 8

Q

examples of loss (error) functions in regression

Answer

A

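mean squared error (MSE)
root mean squared error (RMSE), also called root mean square deviation
coefficient of determination (R^2), strictly a goodness-of-fit score (higher is better) rather than a loss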
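
The three regression measures above can be computed directly from their standard definitions. This is a sketch on a made-up set of targets and predictions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical target values
y_pred = [2.0, 5.0, 8.0, 9.0]   # hypothetical model outputs

print(mse(y_true, y_pred))        # 0.5
print(rmse(y_true, y_pred))       # about 0.707
print(r_squared(y_true, y_pred))  # 0.9
```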

Question 9

Q

what is overfitting

Answer

A

occurs when the machine learning model gives accurate predictions for training data but not for new data

Question 10

Q

things that cause overfitting

Answer

A

data size is too small (doesn’t represent the overall data accurately)
training data contains irrelevant information (noisy data)
model trains for too long on a single sample of data
model complexity is high (learns the noise within the training data)
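
A rough illustration of overfitting, under stated assumptions: a 1-nearest-neighbour lookup stands in for an overly complex model that memorises noisy training labels, and a least-squares line stands in for a simpler model. The data are made up (true relation y = 2x, with +1/-1 noise on the training labels).

```python
def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]   # 2x plus alternating +1 / -1 noise
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2.0 * x for x in test_x]         # new data from the true relation


def memoriser(x):
    """Return the label of the closest training point (memorises the noise)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Simple model: least-squares straight line fitted to the training set.
n = len(train_x)
mx, my = sum(train_x) / n, sum(train_y) / n
w = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sum(
    (x - mx) ** 2 for x in train_x
)
b = my - w * mx


def line(x):
    return w * x + b


print("memoriser train/test MSE:", mse(memoriser, train_x, train_y), mse(memoriser, test_x, test_y))
print("line      train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memoriser is perfect on the training set but worse than the line on the new data, which is the pattern the card describes.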

Question 11

Q

what is underfitting

Answer

A

occurs when the machine learning model has not learned the patterns in the training data well, so it performs poorly even on the training data

Question 12

Q

reasons for underfitting

Answer

A

training data is not cleaned and contains noise
model has high bias
training dataset is too small
model is too simple

Question 13

Q

what is cross validation

Answer

A

a technique for evaluating the performance of a model on unseen data

Question 14

Q

how does cross validation work

Answer

A

data is divided into multiple folds or subsets
one fold = validation set, rest = training set
repeat until each fold has been the validation set once
average the results
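
The steps above can be sketched as k-fold cross validation. This is a minimal version, assuming contiguous folds without shuffling; the helper names and the "predict the training mean" model are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (training indices, validation indices) for each of k folds."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def cross_validate(ys, k):
    """Average validation MSE of a model that always predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        mean = sum(ys[i] for i in train_idx) / len(train_idx)              # "train"
        fold_mse = sum((ys[i] - mean) ** 2 for i in val_idx) / len(val_idx)  # "validate"
        scores.append(fold_mse)
    return sum(scores) / len(scores)                                       # average


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data
print(cross_validate(targets, k=3))       # 6.25
```

In practice the data is usually shuffled before the folds are formed.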

Question 15

Q

what is data leakage

Answer

A

the training set contains information from the test data, so the ML model has already seen what it is later evaluated on
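
One common way leakage creeps in is preprocessing. The sketch below assumes a standardisation step and made-up numbers: computing the mean and standard deviation on the combined data lets test information into training, while computing them on the training set alone does not.

```python
def mean_and_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5


def standardise(values, m, s):
    return [(v - m) / s for v in values]


train = [1.0, 2.0, 3.0]   # hypothetical training feature values
test = [10.0]             # hypothetical test feature value

# Leaky: statistics are computed on train + test, so the test value shapes the training inputs.
leaky_mean, leaky_std = mean_and_std(train + test)

# Safer: statistics come from the training set only and are then reused on the test set.
train_mean, train_std = mean_and_std(train)
train_scaled = standardise(train, train_mean, train_std)
test_scaled = standardise(test, train_mean, train_std)

print(leaky_mean, train_mean)  # 4.0 2.0
```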

Question 16

Q

what is feature importance

Answer

A

calculates a score for every input feature of a machine learning model to establish how important each feature is in the decision-making process
higher score = larger effect on the model's prediction
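
One way to obtain such scores is permutation importance: scramble one feature column and measure how much the error grows. The sketch below is an assumption-laden illustration, not the only method; it uses a fixed rotation instead of a random shuffle so the output is reproducible, and the model and data are made up.

```python
def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature):
    """Increase in MSE after permuting one feature column."""
    baseline = mse(model, rows, targets)
    column = [r[feature] for r in rows]
    rotated = column[1:] + column[:1]  # fixed permutation, for reproducibility
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, rotated)]
    return mse(model, permuted, targets) - baseline


def model(row):
    """Hypothetical trained model: uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [3.0, 6.0, 9.0, 12.0]

scores = [permutation_importance(model, rows, targets, f) for f in range(2)]
print(scores)  # [27.0, 0.0]
```

Feature 0 gets the higher score because the prediction depends on it; feature 1 scores zero because the model never uses it.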

Question 17

Q

sources that lead to garbage in garbage out

Answer

A

low quality data
biased sampling
incorrect labels
missing values
outliers
data inconsistencies

Question 18

Q

what is supervised learning and what is it useful for

Answer

A

learns the input-output relation from labelled examples
useful for fast screening and classification

Question 19

Q

what is unsupervised learning and what is it useful for

Answer

A

does not require knowledge of outputs, only inputs
finds similarities in complex data
some methods (e.g. k-means clustering) require the user to know how many classes to expect
useful for reducing data dimensionality

Question 20

Q

name types of supervised learning

Answer

A

classification (predicts a category)
regression (predicts a value)

Question 21

Q

name types of unsupervised learning

Answer

A

clustering (divide by similarity)
association (identify sequences)
dimension reduction/generalization (find hidden dependencies)

Question 22

Q

what is reinforcement learning

Answer

A

train the model, then assess it; train the model with new data, then assess it again
the model learns from feedback (rewards or penalties) on its actions

Question 23

Q

what are the pitfalls of machine learning

Answer

A

nondeterministic = even for the same input, a model can behave differently on different runs
stochastic = models use probability distributions, so their predictions and analyses rely on randomness and uncertainty
can be biased
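
A small sketch of the nondeterminism point, assuming a made-up stochastic model (true signal plus Gaussian noise). Different random seeds generally give different outputs for the same input, while reusing a seed makes a run repeatable.

```python
import random


def noisy_prediction(x, rng):
    """Hypothetical stochastic model: 2 * x plus random Gaussian noise."""
    return 2.0 * x + rng.gauss(0.0, 1.0)


run_a = noisy_prediction(3.0, random.Random(1))
run_b = noisy_prediction(3.0, random.Random(2))   # different seed, same input
run_c = noisy_prediction(3.0, random.Random(1))   # same seed as run_a

print(run_a, run_b)    # same input, generally different outputs
print(run_a == run_c)  # True: fixing the seed reproduces the run
```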

(23 cards)