Introduction and Applications Flashcards
Why would data-driven prediction be useful?
Finding non-obvious patterns (Hurricane Frances)
What is:
Data science?
It’s a set of fundamental principles that guide the extraction of knowledge from data
What is:
Data mining?
It is the automatic extraction of patterns from data (via tools/technologies that incorporate the principles)
What is:
Big Data?
Big Data is data so large that traditional data processing systems cannot handle it (in terms of both storage and analysis)
What is:
Querying?
What is:
OLAP?
Short for “On-Line Analytical Processing”, OLAP is an advanced query and reporting technique that supports multidimensional analysis of the input data. However, it does not automatically extract patterns.
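To make the "multidimensional analysis" idea concrete, here is a minimal sketch in plain Python. The sales records, the dimension names, and the `roll_up` helper are all hypothetical and only illustrate the kind of query an analyst might specify; real OLAP tools work very differently under the hood.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, quarter, revenue)
sales = [
    ("North", "Widget", "Q1", 100),
    ("North", "Widget", "Q2", 150),
    ("North", "Gadget", "Q1", 80),
    ("South", "Widget", "Q1", 120),
    ("South", "Gadget", "Q2", 90),
]

# Assumed dimension order, matching the tuple layout above
DIMENSIONS = ("region", "product", "quarter")

def roll_up(records, dims):
    """Sum revenue grouped by the dimensions the analyst chooses."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in idx)
        totals[key] += rec[3]  # revenue is the fourth field
    return dict(totals)

# The analyst decides which dimensions to look at; nothing is discovered automatically.
by_region = roll_up(sales, ["region"])
by_region_quarter = roll_up(sales, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

Note that every grouping here is chosen by the person asking the question, which is the sense in which OLAP does not extract patterns on its own.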
What is:
Business Intelligence?
Getting the right information to the right person at the right time
What is:
Data warehousing?
Data warehousing is the collection and coalescence of data from across an enterprise, often from multiple transaction-processing systems, each with its own database.
What is:
An ‘instance’?
It is a vector of size [number of input variables (features)]
There are [number of observations in the data set] data instances
What is:
A feature?
Also called an input variable
It is a vector of size [number of data instances]
There are [number of input variables] features
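The instance and feature cards describe the rows and columns of the same table. A small sketch, assuming a made-up data set with three hypothetical features (`age`, `income`, `tenure_months`), shows the two views side by side:

```python
# Hypothetical data set: each row is one instance, each column is one feature.
feature_names = ["age", "income", "tenure_months"]
data = [
    [34, 52000, 12],
    [45, 61000, 48],
    [23, 38000, 3],
    [51, 75000, 60],
]

n_instances = len(data)           # number of observations (rows)
n_features = len(feature_names)   # number of input variables (columns)

# An instance is a row: one value per feature.
instance = data[1]

# A feature is a column: one value per instance.
income_col = feature_names.index("income")
income = [row[income_col] for row in data]

print(len(instance) == n_features)  # instance vector length equals the feature count
print(len(income) == n_instances)   # feature vector length equals the instance count
```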
What is:
Machine Learning?
It is the improvement of a learning agent's knowledge by providing it with data. It transcends data mining, since it uses robotics…
What is:
Artificial Intelligence?
It is the automatic extraction of patterns from large amounts of data. E.g. a computer that can interact based on data (Big Data + Machine Learning = Artificial Intelligence)
What is:
The difference between ‘data mining’ and ‘using data mining results’?
[1] Data mining is the mining of historical data to produce a model that tries to predict a target variable (for supervised learning). [2] Using data mining results is the phase where the extracted model is applied to new data for which the class value is unknown.
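The two phases can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: the churn data, the single `usage` feature, and the one-threshold rule standing in for a real model.

```python
# Hypothetical historical data: (monthly_usage_hours, churned), target value known.
historical = [(2, 1), (3, 1), (5, 1), (8, 0), (10, 0), (12, 0)]

def mine_threshold(rows):
    """Mining phase: find the usage threshold that best separates the classes."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in rows}):
        # Candidate rule: predict churn (1) when usage <= threshold.
        correct = sum((1 if x <= threshold else 0) == y for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def predict(threshold, usage):
    """Use phase: apply the mined rule to an instance whose class is unknown."""
    return 1 if usage <= threshold else 0

# [1] Data mining: build the model from historical data.
model = mine_threshold(historical)

# [2] Using the result: score new customers with no known class value.
new_customers = [4, 9]
predictions = [predict(model, u) for u in new_customers]
print(model, predictions)
```

The split into `mine_threshold` and `predict` mirrors the card: the first function only ever sees labeled history, the second only ever sees unlabeled new data.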
What is:
CRISP-DM?
Short for ‘Cross Industry Standard Process for Data Mining’, CRISP-DM is a process for data mining consisting of (1) Business understanding, (2) Data understanding, (3) Data preparation, (4) Modeling, (5) Evaluation, and (6) Deployment.
It is a process that should go through multiple iterations