Introduction and Applications Flashcards

1
Q

Why would data-driven prediction be useful?

A

Finding non-obvious patterns, e.g. Walmart mining sales data from past hurricanes to predict unusual demand ahead of Hurricane Frances

2
Q

What is:

Data science?

A

It’s a set of fundamental principles that guide the extraction of knowledge from data

4
Q

What is:

Data mining?

A

It is the automatic extraction of patterns from data (via tools/technologies that incorporate the principles)

5
Q

What is:

Big Data?

A

Big Data refers to data sets too large for traditional data processing systems to handle, in terms of both storage and analysis

6
Q

What is:

Querying?

A
7
Q

What is:

OLAP?

A

Short for “On-Line Analytical Processing”, OLAP is an advanced query and reporting technique that supports multidimensional analysis of data. However, it does not automatically extract patterns.
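
A minimal sketch of what "multidimensional analysis" can look like, assuming hypothetical sales records with two dimensions (region and quarter). The `sales` data and the `roll_up` helper are illustrative assumptions, not part of any OLAP product. Note that the analyst picks the dimensions to aggregate over; nothing is discovered automatically.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, amount)
sales = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
    ("South", "Q2", 120),
    ("North", "Q1", 50),
]

def roll_up(records, dims):
    """Sum amounts grouped by the chosen dimension indices (0=region, 1=quarter)."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dims)
        totals[key] += record[2]
    return dict(totals)

# View the same data along two dimensions, then along one.
by_region_quarter = roll_up(sales, dims=(0, 1))
by_region = roll_up(sales, dims=(0,))
```

Changing `dims` gives a different slice of the same data, which is the kind of user-driven exploration OLAP supports.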

8
Q

What is:

Business Intelligence?

A

Getting the right information to the right person at the right time

9
Q

What is:

Data warehousing?

A

Data warehousing is the collection and coalescence of data from across an enterprise, often from multiple transaction-processing systems, each with its own database.

10
Q

What is:

An ‘instance’?

A

An instance is a vector with one value per input variable (feature), so its size equals the number of features
A data set contains one instance per observation

11
Q

What is:

A feature?

A

Also called an input variable
Across the data set, a feature is a vector with one value per data instance, so its size equals the number of instances
A data set has as many features as it has input variables
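
A small sketch of the instance/feature distinction, assuming a hypothetical data set stored as a list of rows. The feature names and values are made up for illustration: each row is an instance, each column is a feature.

```python
# Hypothetical data set: 3 instances (rows), 2 features (columns).
feature_names = ["age", "income"]
data = [
    [25, 30000],
    [40, 52000],
    [31, 41000],
]

# One instance: a vector with one value per feature.
instance = data[0]

# One feature: a vector with one value per instance (here, the "income" column).
feature = [row[1] for row in data]

n_instances = len(data)
n_features = len(feature_names)
```

Here `len(instance)` equals `n_features` and `len(feature)` equals `n_instances`.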

12
Q

What is:

Machine Learning?

A

It is improving the knowledge of a learning agent by providing it with data. It is broader than data mining, since it also covers areas such as robotics…

13
Q

What is:

Artificial Intelligence?

A

It is the automatic extraction of patterns from large amounts of data, e.g. a computer that can interact based on data (Big Data + Machine Learning = Artificial Intelligence)

14
Q

What is:

The difference between ‘data mining’ and ‘using data mining results’?

A
[1] Data mining: mining historical data to produce a model that predicts a target variable (in supervised learning)
[2] Using data mining results: applying the extracted model to new data for which the class value is unknown
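
The two phases can be sketched as follows, assuming a deliberately simple model: a single threshold on one numeric feature. The functions `mine_threshold` and `use_model` and the `history` data are hypothetical names chosen for illustration; real data mining tools use far richer models.

```python
def mine_threshold(history):
    """Data mining: pick the threshold on one feature that best separates the classes.

    history is a list of (value, label) pairs with label in {0, 1}.
    The resulting model predicts 1 when value >= threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate, _ in history:
        # Count how many historical instances this candidate classifies correctly.
        correct = sum((value >= candidate) == bool(label) for value, label in history)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def use_model(threshold, new_values):
    """Using the results: apply the mined model to instances with unknown class."""
    return [1 if value >= threshold else 0 for value in new_values]

# Phase 1: historical data with known class values.
history = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
threshold = mine_threshold(history)

# Phase 2: new data for which the class value is unknown.
predictions = use_model(threshold, [2.5, 7.5])
```

The model is produced once from labeled history and then reused on unlabeled instances, which is the split the card describes.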
15
Q

What is:

CRISP-DM?

A

Short for ‘Cross Industry Standard Process for Data Mining’, CRISP-DM is a process for data mining consisting of six phases: (1) Business understanding, (2) Data understanding, (3) Data preparation, (4) Modeling, (5) Evaluation, and (6) Deployment
The process is iterative and typically runs through these phases multiple times
