Terms Flashcards

1
Q

What are Common tasks?

A

Common tasks are the types of problem that data mining algorithms address.

2
Q

What are the types of Common tasks?

A
Classification
Regression
Similarity matching
Clustering
Co-occurrence grouping
Profiling
Link prediction
Data reduction
Causal modelling.
3
Q

What are some techniques for data analytics tasks?

A
Statistics
Database query
Data warehouse
Machine learning
Data mining
4
Q

What is Big Data?

A

The most widely used definition of big data rests on the three Vs: Volume, Velocity, Variety.

Big data is high-volume, high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing that enable enhanced insight,
decision making, and process automation.

5
Q

What does Big Data consist of?

A
Web data
Text data
Time/Location data
Smart grid and sensor data
Social network data
6
Q

How is Big Data different from traditional data?

A

(1) Big Data can be an entirely new source of data
(2) The speed of the data feed has increased to such an extent that it qualifies as a new data source
(3) Increasingly more semi-structured and unstructured data

7
Q

What is the Paradigm shift in terms of analytic focus?

A

From descriptive to predictive and prescriptive analytics when using Big Data

8
Q

What is the Business value of Big Data?

A

(1) To draw insight from data
(2) To make better decisions based on the insight
(3) To automate the decision and bake it into a business process

9
Q

What are some applications of Big Data across industry sectors?

A
Segmentation and prediction
Churn prediction
Recommender systems and targeted marketing
Sentiment analysis
Operational analytics
10
Q

What is a Data warehouse?

A

A data warehouse collects and combines data from across an enterprise, often from multiple processing systems, each with its own database.

11
Q

How do you store Big Data?

A

With the Hadoop framework. Traditional databases and warehouses fall short when dealing with big data.

12
Q

What is Machine learning?

A

Machine learning is the practice of getting computers to learn and act like humans do, and to improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.

13
Q

What is Data mining?

A

Data mining is the extraction of knowledge from large amounts of data. It spun off from machine learning. We often describe data mining as the process of building models.

14
Q

Classification algorithms

A

Classification is the most frequently used data mining method for building models of real-world problems.

15
Q

Cluster analysis

A

Cluster analysis is a data mining method that groups similar items and builds models from those groups.

16
Q

Association rule (Co-occurrence grouping)

A

Association rule mining is a data mining method widely used in the retail industry. It aims to find interesting relationships between items in large datasets and to build models from them.

17
Q

Classification matrix

A

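A classification matrix (also called a confusion matrix) is used to estimate the true accuracy of classification models, through measures such as the true positive rate, the true negative rate, and accuracy.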
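
The rates named in this card can be computed directly from the four cells of the matrix. The following is a minimal sketch for a binary classifier; the function names and the sample labels are made up for illustration and do not come from any particular library.

```python
def classification_matrix(actual, predicted, positive=1):
    """Count the four cells of a binary classification matrix."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


def rates(m):
    """Derive accuracy and the true positive/negative rates.

    Assumes the data contains at least one positive and one negative case,
    otherwise the divisions below would fail.
    """
    total = m["tp"] + m["tn"] + m["fp"] + m["fn"]
    return {
        "accuracy": (m["tp"] + m["tn"]) / total,
        "true_positive_rate": m["tp"] / (m["tp"] + m["fn"]),
        "true_negative_rate": m["tn"] / (m["tn"] + m["fp"]),
    }


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
matrix = classification_matrix(actual, predicted)
print(matrix)
print(rates(matrix))
```

Accuracy alone can mislead when one class is much rarer than the other, which is why the per-class rates are usually reported alongside it.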

18
Q

Classification algorithms

A

A number of algorithms are used for classification modelling, for example KNN.

19
Q

KNN

A

K Nearest Neighbour is a data mining algorithm mainly used for classification tasks.
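
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote among the k closest training points. The function name `knn_predict` and the toy data are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: two well-separated groups labelled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

An odd k is a common choice for two-class problems because it avoids tied votes.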

20
Q

K Means

A

The k-means algorithm (where k stands for the predetermined number of clusters) is one of
the most referenced clustering algorithms.
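
A minimal sketch of the usual k-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. It assumes Euclidean distance and, to stay deterministic, starts from the first k points, whereas real implementations usually start from random or smarter choices. The function name and data are hypothetical.

```python
import math


def k_means(points, k, iterations=10):
    """Group `points` (tuples of numbers) into k clusters."""
    centroids = list(points[:k])  # naive initialisation (assumption for this sketch)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (2.0, 1.0), (5.0, 6.0), (6.0, 5.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Note that k has to be chosen before the algorithm runs, which is the "predetermined number of clusters" mentioned in the card.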

21
Q

Supervised learning (methods)

A

We have X (data) together with known answers Y, and the algorithm learns how to calculate Y from X so that it can predict Y for new data.

22
Q

Unsupervised learning (methods)

A

We have X (data) but no known answers (Y). It is up to the algorithm to find structure in the data, such as groups or patterns, that we did not specify in advance.

23
Q

Support, Confidence, Lift

A

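Association rules provide information in the form of if-then statements. Support, confidence, and lift measure how strong a rule is. Support is the share of all transactions that contain both the "if" and the "then" items. Confidence is the share of transactions containing the "if" items that also contain the "then" items. Lift is the confidence divided by the overall share of transactions containing the "then" items; it is a ratio rather than a percentage, and a value above 1 means the items occur together more often than chance would suggest.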
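
As a worked illustration, the three measures for a rule "if antecedent then consequent" can be computed by counting transactions. This is a minimal sketch; the function name `rule_metrics` and the shopping baskets are invented for the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items.
    Assumes both the antecedent and the consequent occur at least once.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)           # baskets with the "if" items
    count_c = sum(1 for t in transactions if c <= t)           # baskets with the "then" items
    count_both = sum(1 for t in transactions if (a | c) <= t)  # baskets with both

    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift


# Hypothetical retail baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"butter"})
print(support, confidence, lift)
```

In this toy data the rule "if bread then butter" holds in 2 of 5 baskets (support), in 2 of the 3 baskets that contain bread (confidence), and its lift is slightly above 1.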