Lecture 3 Flashcards

1
Q

What are the 5 steps of the data mining process?

A

  • Collect data
  • Prepare data
  • Build model
  • Evaluate model
  • Deploy model

2
Q

Name 7 data preprocessing techniques

A

  • Feature subset selection
  • Discretization and binarization
  • Dimensionality reduction
  • Aggregation
  • Attribute creation
  • Attribute transformation
  • Sampling

3
Q

Explain the preprocessing technique of aggregation.

A

Combining two or more attributes (or objects) into a single attribute (or object).

4
Q

What are two purposes of aggregation?

A

  • Data reduction: reduce the number of attributes or objects
  • Change of scale: e.g. cities are aggregated into countries
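A minimal Python sketch of aggregation as a change of scale, assuming hypothetical city-level sales records stored as (city, country, sales) tuples. Summing per country also illustrates data reduction: five objects become three.

```python
from collections import defaultdict

# Hypothetical city-level sales records: (city, country, sales)
records = [
    ("Toronto", "Canada", 120),
    ("Vancouver", "Canada", 80),
    ("Lyon", "France", 50),
    ("Paris", "France", 150),
    ("Osaka", "Japan", 90),
]


def aggregate_by_country(rows):
    """Change of scale: combine city rows into one sales total per country."""
    totals = defaultdict(int)
    for _city, country, sales in rows:
        totals[country] += sales
    return dict(totals)


country_sales = aggregate_by_country(records)
print(country_sales)  # {'Canada': 200, 'France': 200, 'Japan': 90}
```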

5
Q

Describe the 4 sampling techniques used in data preprocessing.

A

  • Simple random sampling - each object is selected with equal probability
  • Sampling without replacement - once an object is selected, it is removed from the population, so it can be selected at most once
  • Sampling with replacement - selected objects are not removed from the population, so the same object can be selected more than once
  • Stratified sampling - split the data into several partitions, then draw random samples from each partition
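The four techniques can be sketched with Python's standard `random` module, assuming a small hypothetical data set of (group, id) tuples. `random.Random.sample` draws without replacement and `random.Random.choices` draws with replacement; both give every object the same selection probability, i.e. simple random sampling. The stratified helper draws the same number of objects from each partition, which is one choice among several (proportional allocation is another).

```python
import random
from collections import defaultdict


def sample_without_replacement(population, k, rng):
    # Each object can be drawn at most once.
    return rng.sample(population, k)


def sample_with_replacement(population, k, rng):
    # Draws are independent, so the same object may appear repeatedly.
    return rng.choices(population, k=k)


def stratified_sample(population, key, per_stratum, rng):
    """Partition the data by key, then draw per_stratum objects from each partition."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Sample without replacement inside each partition.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


rng = random.Random(42)  # fixed seed so runs are repeatable
# Hypothetical data: 90 objects in group "a", 10 in group "b".
data = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]

no_repl = sample_without_replacement(data, 20, rng)
with_repl = sample_with_replacement(data, 20, rng)
strat = stratified_sample(data, key=lambda item: item[0], per_stratum=5, rng=rng)
```

Stratified sampling guarantees that the small group "b" is represented, which a simple random sample of only a few objects could easily miss.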

6
Q

What are the two techniques of feature subset selection?

A

  • Remove redundant features - features that duplicate much or all of the information contained in one or more other attributes, e.g. the purchase price of a product and the amount of sales tax paid
  • Remove irrelevant features - features that contain no information useful for the data mining task at hand, e.g. student IDs are often irrelevant to the task of predicting students’ GPAs
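One simple way to flag redundant numeric features is pairwise correlation. This is a sketch, assuming a hypothetical feature table in which sales tax is exactly 13% of the purchase price and an assumed cutoff of |r| ≥ 0.95. Correlation only detects linear redundancy, and irrelevant features such as student IDs are usually removed with domain knowledge instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.95):
    """Keep a feature only if its |r| with every already-kept feature is below threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical feature table: sales tax is exactly 13% of the purchase price.
features = {
    "purchase_price": [10.0, 25.0, 40.0, 5.0, 60.0],
    "sales_tax_paid": [1.3, 3.25, 5.2, 0.65, 7.8],
    "quantity": [3, 1, 2, 5, 1],
}

kept = drop_redundant(features)
print(list(kept))  # ['purchase_price', 'quantity']
```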

7
Q

What are the 3 techniques of discretization without using class labels?

A

  • Equal interval width
  • Equal frequency
  • K-means
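Equal interval width and equal frequency binning can be sketched in plain Python as below; k-means discretization is omitted for brevity. The function names are illustrative and the example ages are made up. Tied values in the equal-frequency version may land in different bins, which real implementations often handle differently.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical
    # min(..., k - 1) keeps the maximum value in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bins so that each holds (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


ages = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]  # made-up values with two outliers
print(equal_width_bins(ages, 2))      # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(equal_frequency_bins(ages, 2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because of the outliers, equal width puts nine of the ten values in the first bin, whereas equal frequency splits them five and five.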

8
Q

Describe the curse of dimensionality

A

As the number of dimensions increases, data becomes increasingly sparse in the space that it occupies, so measures of density and distance between points become less meaningful.
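One common way to see this sparsity: for uniformly random points, the nearest and farthest neighbours of a query point end up at almost the same distance as the dimension grows. The sketch below measures this relative contrast, (farthest - nearest) / nearest; the number of points and the seed are arbitrary choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

The contrast shrinks sharply as `d` grows, which is why nearest-neighbour style methods struggle in high dimensions.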

9
Q

What is Big O notation?

A

A way to evaluate an algorithm's performance with respect to time or space. It describes how the algorithm's running time and/or memory use grows as the problem size grows.

10
Q

What is the name of this type of Big O notation? O ( 1 )

A

Constant

  • the operation takes the same time regardless of the input size
    e.g. returning the element at a given index of an array
11
Q

What is the name of this type of Big O notation? O ( n )

A

Linear

e.g. finding an item in an unsorted list
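A linear search in Python makes the O(n) cost visible: in the worst case (the target is last or absent), the loop examines all n items. The function name is illustrative.

```python
def linear_search(items, target):
    """Return the index of target in an unsorted list, or -1 if it is absent."""
    for i, item in enumerate(items):  # up to n comparisons
        if item == target:
            return i
    return -1


print(linear_search([7, 3, 9], 9))  # 2
```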

12
Q

What is the name of this type of Big O notation? O ( n log n )

A

Linearithmic

e.g. performing a mergesort or heapsort

also used by decision trees (e.g. sorting a numeric attribute's values to find candidate split points)
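A textbook merge sort shows where O(n log n) comes from: the list is halved about log n times, and each level of recursion does O(n) work merging the halves back together.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # about log n levels of halving
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Merging two sorted halves is linear in their combined length.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```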

13
Q

What is the name of this type of Big O notation?

O(n^2)

A

Quadratic
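A typical quadratic task in data mining is building a full distance matrix: two nested loops over n points compute n × n distances. A sketch in Python using `math.dist` (Python 3.8+); the function name and sample points are illustrative.

```python
import math


def distance_matrix(points):
    """n x n matrix of Euclidean distances, computed with two nested loops."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):          # n iterations
        for j in range(n):      # n iterations each, so n^2 in total
            dist[i][j] = math.dist(points[i], points[j])
    return dist


points = [(0, 0), (3, 4), (6, 8)]
matrix = distance_matrix(points)
print(matrix[0][1], matrix[0][2])  # 5.0 10.0
```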

14
Q

What is the name of this type of Big O notation?

O(n^3)

A

Cubic
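Naive multiplication of two n × n matrices is the classic cubic example: three nested loops, each running n times. The function name is illustrative.

```python
def mat_mul(a, b):
    """Multiply two n x n matrices with three nested loops: n^3 multiplications."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```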

15
Q

What is the name of this type of Big O notation?

O(2^n)

A

Exponential

e.g. exhaustive feature subset selection, which evaluates all 2^n subsets of n features
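Exhaustive feature subset selection is exponential because n features have 2^n subsets. A sketch using `itertools.combinations`; the feature names and the `toy_score` function are hypothetical stand-ins for a real evaluation such as model accuracy on a validation set.

```python
from itertools import combinations


def all_subsets(features):
    """Yield every subset of features, including the empty one: 2^n in total."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score):
    """Exhaustive search: calls score() once per subset."""
    return max(all_subsets(features), key=score)


def toy_score(subset):
    # Hypothetical score: rewards "age" and "income", small penalty per feature.
    return ("age" in subset) + ("income" in subset) - 0.1 * len(subset)


features = ["age", "income", "zip_code", "student_id"]
print(len(list(all_subsets(features))))  # 16, i.e. 2 ** 4
print(best_subset(features, toy_score))  # ('age', 'income')
```

Adding one more feature doubles the number of subsets to score, which is why heuristic searches (forward or backward selection) are used in practice.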

16
Q

What is the name of this type of Big O notation?

O ( n! )

A

Factorial

  • used by permutation-based algorithms, which try every ordering of the input
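A brute-force traveling salesman search is a standard factorial-time example: with the start city fixed, it tries all (n - 1)! orderings of the remaining cities. The 4-city distance matrix below is made up for illustration.

```python
from itertools import permutations


def shortest_tour(dist):
    """Brute-force TSP: fix city 0 as start and try all (n - 1)! orders of the rest."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for order in permutations(range(1, n)):
        tour = (0,) + order + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Made-up symmetric distances between 4 cities.
dist = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
print(shortest_tour(dist))  # (4, (0, 1, 2, 3, 0))
```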