theory lecture 5 Flashcards

1
Q

statistics

A

more theory-based and top-down: it is model-based and focuses on testing hypotheses.

2
Q

machine learning

A

more heuristic and focused on improving the performance of a learning agent. it also covers real-time learning and robotics.

3
Q

data mining and knowledge discovery

A

integrates both theory and heuristics. it focuses on the entire knowledge discovery process, including data cleaning, learning, and the integration and visualisation of results.

4
Q

test data

A

in supervised learning, data kept separate from the training data and used after training to show how well the model has learned, i.e. how it performs on examples it has not seen before.

5
Q

regression

A

a type of machine learning (supervised) where you try to predict a numeric value, such as a score.
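A minimal sketch of regression, assuming a single numeric attribute and an ordinary least-squares straight line. The data (hours studied vs. exam score) and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# hypothetical data: hours studied -> exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]
a, b = fit_line(hours, scores)
predicted = a + b * 6  # predicted score for 6 hours of study
```

With this data the fitted line is score = 46.6 + 5.8 × hours, so 6 hours predicts a score of about 81.4.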

6
Q

association

A

a type of unsupervised learning where you look for attributes or items that tend to occur together and measure how strongly they are associated, e.g. customers who buy bread often also buy butter.
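One common way to measure "how well items associate" is with support and confidence of a rule; these measures are an assumption here, since the card does not name them. The baskets below are hypothetical.

```python
# hypothetical shopping baskets (transactions)
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

sup = support({"bread", "butter"}, baskets)        # 2 of the 4 baskets
conf = confidence({"bread"}, {"butter"}, baskets)  # 2 of the 3 baskets with bread
```

Here bread and butter appear together in half of all baskets, and two thirds of the baskets containing bread also contain butter.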

7
Q

clustering

A

a type of unsupervised learning where you group similar data points together without using labels, e.g. separating pictures of dogs from pictures of cats.
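k-means is one widely used clustering algorithm; the card does not name a specific method, so it is swapped in here as an illustration. This sketch works on one attribute only, and the animal weights and starting centroids are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around k centroids (k = len(centroids))."""
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# hypothetical unlabelled animal weights in kg: small values ~ cats, large ~ dogs
weights = [3.5, 4.0, 4.2, 25.0, 28.0, 30.0]
centroids, clusters = k_means_1d(weights, centroids=[0.0, 10.0])
```

No labels are used: the algorithm only sees the weights and ends up with one group of light animals and one of heavy animals.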

8
Q

ANN

A

a prediction model inspired by the way neurons work in the brain; it is what deep learning is based on. for classification, the data is split into three subsets: ~60% training, ~20% validation and ~20% testing.
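The 60/20/20 split on this card can be sketched as below. The function name and the fixed seed are illustrative choices; integer arithmetic is used for the sizes to avoid floating-point rounding.

```python
import random

def split_60_20_20(records, seed=42):
    """Shuffle records and split them into ~60% train, ~20% validation, ~20% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n = len(shuffled)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left (~20%)
    return train, validation, test

train, validation, test = split_60_20_20(range(100))
```

Shuffling first matters if the records are stored in some order (e.g. sorted by class). Typically the validation subset is used to tune the network while training, and the test subset is only used once at the end.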

9
Q

overtraining

A

when the model is trained so closely on the training sample that it also learns its noise and quirks: the algorithm knows everything about the sample, but it may not recognise anything outside of the sample (also called overfitting).
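A toy illustration of overtraining, under made-up assumptions: a "model" that simply memorises every training example is compared with a simpler threshold rule. The data, including one deliberately noisy training label, is hypothetical.

```python
# hypothetical 1-D data: (attribute value, class); (4, 1) is a noisy training label
train = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1)]
test = [(1.5, 0), (2.5, 0), (3.5, 0), (6.5, 1), (7.5, 1)]

# "overtrained" model: memorises every training example exactly
memory = dict(train)

def memoriser(x):
    return memory.get(x, 0)  # unseen value: falls back to guessing class 0

# simpler model: a single threshold that ignores the noisy point
def threshold_rule(x):
    return 1 if x >= 5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

results = {
    "memoriser": (accuracy(memoriser, train), accuracy(memoriser, test)),
    "threshold": (accuracy(threshold_rule, train), accuracy(threshold_rule, test)),
}
```

The memoriser scores 100% on the training data but only 60% on the unseen test data, while the simpler rule makes one training mistake (the noisy point) and gets every test record right.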

10
Q

target variable

A

the variable we are trying to predict from the other attributes (the columns of a table).

11
Q

dimensionality of a data set

A

the number of features/attributes used to describe each record in the data set.

12
Q

curse of dimensionality

A

when there are so many dimensions (attributes) that the data becomes sparse, which makes it hard to predict a value reliably.
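A small arithmetic sketch of why data becomes sparse as dimensions are added. It assumes, purely for illustration, that every attribute is cut into 10 bins and that there are 1000 records; the function name is hypothetical.

```python
def max_fraction_covered(n_points, bins_per_dimension, dimensions):
    """Upper bound on the share of grid cells that contain at least one point."""
    cells = bins_per_dimension ** dimensions
    # best case: every point falls into a different cell
    return min(n_points, cells) / cells

for d in (1, 3, 4, 6):
    print(d, "dimensions:", f"{max_fraction_covered(1000, 10, d):.4%}")
```

Up to 3 dimensions the 1000 records could in principle fill every cell, but from 4 dimensions onward at most 10% of the cells can contain any data, and at 6 dimensions only 0.1%. Most regions of the space then have no examples to learn from.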

13
Q

CRISP-DM

A

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a model of the knowledge discovery process flow. the process is highly iterative and experimental: you may have to go back to earlier steps, e.g. if your model behaves differently in practice.

14
Q

C&RT

A

a prediction model based on decision trees. C&RT stands for Classification and Regression Trees.

15
Q

Random Forest

A

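a prediction model that combines many different decision trees, each built from a random subset of the data and attributes, and aggregates their predictions (e.g. by majority vote).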

16
Q

Boosted Tree

A

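a prediction model that builds trees one after another, with each new tree focusing on the errors made by the previous ones, and combines them into a single model (boosting).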

17
Q

Fusion

A

a prediction model that combines the predictions of several different algorithms (e.g. a decision tree, an SVM and an ANN) into one prediction.
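Majority voting is one simple way to fuse predictions; the card does not say which combination rule is used, so this is an assumption. The three prediction lists are hypothetical outputs of three different algorithms.

```python
from collections import Counter

def fuse_by_majority(*model_predictions):
    """Combine per-record predictions from several models by majority vote."""
    fused = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the most votes
        # (on a tie, the label seen first among the votes wins)
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused

# hypothetical predictions from three different algorithms for four records
tree_pred = ["yes", "no", "yes", "no"]
svm_pred = ["yes", "yes", "no", "no"]
ann_pred = ["no", "yes", "yes", "no"]
fused = fuse_by_majority(tree_pred, svm_pred, ann_pred)
```

Each individual model disagrees with the others somewhere, but for every record at least two of the three agree, and that shared answer becomes the fused prediction.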

18
Q

1-Away

A

accuracy that also counts a prediction one class away from the true class as correct, e.g. the model predicts 4, but the actual class is 5.
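A minimal sketch comparing exact accuracy with 1-Away accuracy, assuming the classes are ordered integers (e.g. ratings from 1 to 5). The ratings below are hypothetical.

```python
def exact_and_one_away_accuracy(actual, predicted):
    """Return (exact accuracy, 1-away accuracy) for ordered integer classes."""
    n = len(actual)
    exact = sum(a == p for a, p in zip(actual, predicted)) / n
    # a prediction counts as correct if it is at most one class away
    one_away = sum(abs(a - p) <= 1 for a, p in zip(actual, predicted)) / n
    return exact, one_away

# hypothetical ratings on a 1-5 scale
actual = [5, 3, 1, 4, 2]
predicted = [4, 3, 3, 4, 2]
exact, one_away = exact_and_one_away_accuracy(actual, predicted)
```

Predicting 4 for a true 5 is wrong for exact accuracy (60% here) but counts as a hit for 1-Away accuracy (80% here); predicting 3 for a true 1 is wrong under both.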

19
Q

SVM

A

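a prediction model (support vector machine) that uses a line to divide your data into two classes, choosing the line with the widest margin to the nearest points of each class. with more attributes, the line becomes a plane or hyperplane.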

20
Q

linear regression

A

a method, here used for classification, based on the formula w0 + w1x + w2y >= 0. it computes the weights (w0, w1, w2) from the data so as to minimise the squared error, i.e. to 'fit' the data. the resulting line w0 + w1x + w2y = 0 divides the data into two classes.
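A sketch of least-squares linear classification for two attributes x and y, assuming the two classes are encoded as targets -1 and +1 (the card does not say how classes are encoded). The weights are found from the normal equations, solved with Cramer's rule so the example needs only the standard library; the points and helper names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve the 3x3 linear system a @ w = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    w = []
    for col in range(3):
        # replace column `col` of a with b
        m = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        w.append(det(m) / d)
    return w

def fit_linear_classifier(points, labels):
    """Least-squares weights [w0, w1, w2] for targets +1 / -1."""
    rows = [[1.0, x, y] for x, y in points]  # leading 1 pairs with the bias w0
    # normal equations: (X^T X) w = X^T t minimise the squared error
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtt = [sum(r[i] * t for r, t in zip(rows, labels)) for i in range(3)]
    return solve_3x3(xtx, xtt)

def classify(w, x, y):
    # the class is decided by which side of the line w0 + w1*x + w2*y = 0 the point is on
    return 1 if w[0] + w[1] * x + w[2] * y >= 0 else -1

points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w = fit_linear_classifier(points, labels)
predictions = [classify(w, x, y) for x, y in points]
```

For these points the fitted weights are w0 = -120/73 and w1 = w2 = 18/73, so the dividing line is x + y = 20/3 and every training point lands on the correct side.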

21
Q

decision trees

A

a method for classification that splits data by drawing multiple horizontal and vertical lines, i.e. each split tests a single attribute against a threshold.
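A sketch of how a single split (one horizontal or vertical line) can be chosen: try every threshold on one attribute and keep the one whose two sides are purest. Gini impurity is used as the purity measure, which is one common choice (it is the one used by CART), but the card does not specify it. The age and purchase data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 when all labels are the same class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold on one attribute giving the purest two-way split."""
    best = (float("inf"), None)
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # weighted impurity of the two sides after the split
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, threshold)
    return best

# hypothetical attribute: age; class: buys the product (1) or not (0)
ages = [22, 25, 30, 35, 40, 45]
buys = [0, 0, 0, 1, 1, 1]
score, threshold = best_split(ages, buys)
```

Here the split "age <= 30" separates the two classes perfectly (impurity 0). A full tree repeats this search on each side, over all attributes, until the groups are pure enough.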

22
Q

confusion matrix

A

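the primary source for accuracy estimation in classification problems. it shows how often your model confuses one class with another: each cell counts the records with a given true class and a given predicted class, so the diagonal holds the correct predictions. you put your testing data into the matrix to see how many predictions are correct.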
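A minimal sketch of filling a confusion matrix from test results, assuming rows are true classes and columns are predicted classes (some tools use the opposite layout). The cat/dog results are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows = true class, columns = predicted class; diagonal = correct predictions."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# hypothetical test set results
actual = ["cat", "cat", "dog", "dog", "dog", "cat"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]
m = confusion_matrix(actual, predicted, classes=["cat", "dog"])
correct = sum(m[i][i] for i in range(len(m)))  # sum of the diagonal
accuracy = correct / len(actual)
```

The matrix comes out as [[2, 1], [1, 2]]: one cat was mistaken for a dog and one dog for a cat, giving 4 correct out of 6.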

23
Q

precision

A

of all the cases the model predicts as positive, what fraction are actually positive? precision = true positives / (true positives + false positives).

24
Q

recall

A

of all the cases that are actually positive, what fraction does the model predict as positive? recall = true positives / (true positives + false negatives).
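Precision and recall can be computed from the same test results by counting true positives (TP), false positives (FP) and false negatives (FN). A minimal sketch; the spam-filter results are hypothetical, and returning 0.0 when a denominator is zero is an illustrative choice.

```python
def precision_recall(actual, predicted, positive):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# hypothetical spam-filter results
actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "spam", "spam", "ham", "ham"]
precision, recall = precision_recall(actual, predicted, positive="spam")
```

Of the 4 emails flagged as spam only 2 really are spam (precision 0.5), while 2 of the 3 real spam emails were caught (recall 2/3).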

25
Q

decision tree

A

organises your data as a tree of splits. the attributes higher up in the tree (closer to the root) are more important, because they split the data best.