TOPIC 3 - BIG DATA, DATA ANALYTICS, AND MACHINE LEARNING Flashcards

1
Q

How can data help businesses?

A

Smarter and faster decisions
Accurate predictions
Sorting the signal from the noise
Efficient operations including real-time changes

2
Q

Data is often viewed as what by companies?

A

An asset

3
Q

What is the Cross-Industry Standard Process for Data Mining?

A

An iterative process that often involves going back and forth between stages

4
Q

In practice, what should happen in the Cross-Industry Standard Process for Data Mining?

A

Shortcuts from each stage back to the prior one

5
Q

What does CRISP-DM stand for?

A

Cross-Industry Standard Process for Data Mining

6
Q

What is the starting point and goal of CRISP-DM?

A

Solve a business problem

7
Q

What should the business problem be?

A

Important and solvable

8
Q

How will the solution to the business problem be built?

A

By using data as the raw material

9
Q

What do you need to understand in the data understanding stage of CRISP-DM?

A

The strengths and weaknesses of the data

10
Q

What does CRISP-DM need to weigh up?

A

The benefits and costs of acquiring data

11
Q

What happens in the preparation stage?

A

Clean the data
Decide on which variables you require

12
Q

What happens when you “clean” the data?

A

Convert data to different types
Deal with missing values
Normalize or scale variables
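
The three cleaning steps above can be sketched in plain Python. This is only an illustrative sketch: the records, the mean-imputation choice, and the helper names (`to_float`, `fill_missing`, `min_max_scale`) are assumptions made for the example, not part of the course material.

```python
# Hypothetical raw records: every value arrives as text, and one income is missing.
raw_rows = [
    {"age": "25", "income": "30000"},
    {"age": "35", "income": ""},
    {"age": "45", "income": "50000"},
]

def to_float(text):
    """Convert a text field to a float; an empty field becomes None (missing)."""
    return float(text) if text.strip() else None

def fill_missing(values):
    """Replace missing values with the mean of the observed ones (one simple choice)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

ages = min_max_scale(fill_missing([to_float(r["age"]) for r in raw_rows]))
incomes = min_max_scale(fill_missing([to_float(r["income"]) for r in raw_rows]))
print(ages)
print(incomes)
```

Mean imputation is just one way to deal with missing values; dropping the row or using the median are common alternatives.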

13
Q

How do you decide which variables you require?

A

Can be guided by theory
In machine learning, this is known as “feature engineering”

14
Q

What happens in the modelling and evaluation stages?

A

May use various tools to help model the data
Need to evaluate our model rigorously
Important that your model is comprehensible

15
Q

Why do we evaluate our model?

A

To guard against chance correlations, p-hacking, and overfitting

16
Q

What happens in the deployment stage?

A

Important to understand the benefits and risks of deployment
Continuous monitoring is often required

17
Q

Why is continuous monitoring required?

A

Such monitoring detects worsening or unexpected model performance
Allows timely remediation actions such as adding new variables or retraining your model
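
A minimal sketch of such a monitoring check, assuming accuracy is the performance measure. The baseline figure, the 0.05 tolerance, and the function names are hypothetical choices for illustration only.

```python
def accuracy(predictions, actuals):
    """Share of predictions that match the observed outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def needs_remediation(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy falls more than `tolerance` below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance

# Hypothetical numbers: accuracy measured at deployment versus on recent data.
baseline = 0.90
recent = accuracy([1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
                  [1, 0, 0, 1, 0, 0, 0, 1, 1, 1])

if needs_remediation(baseline, recent):
    print("Performance has degraded: consider retraining or adding new variables.")
```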

18
Q

Where does big data come from?

A

Everywhere

19
Q

What are some specific examples of big data?

A

Internet interactions
Text documents
Images and videos

20
Q

What is the widely used definition of big data?

A

Big data is any set of data that is too large or too complex to be handled using conventional data-processing techniques

21
Q

What is a synonym for big data?

A

Alternative data

22
Q

What are the 4 Vs of big data?

A

Volume
Velocity
Variety
Veracity

23
Q

What is Volume in the 4 Vs of big data?

A

Terabytes to exabytes of existing data to process

24
Q

What is Velocity in the 4 Vs of big data?

A

Streaming data, milliseconds to seconds to respond

25
What is Variety in the 4 Vs of big data?
Structured, unstructured, text, multimedia
26
What is Veracity in the 4 Vs of big data?
Uncertainty due to data inconsistency and incompleteness
27
How is data described when talking about volume?
Data at rest
28
How is data described when talking about velocity?
Data in motion
29
How is data described when talking about variety?
Data in many forms
30
How is data described when talking about veracity?
Data in doubt
31
What are the additional Vs?
Variability; Value proposition
32
What do the Vs of big data mean is not possible?
To store and process all the data on a single computer
33
What can big data help with?
Process efficiency; finding new connections in data; improving predictions. Big data needs to be analysed to have value.
34
Data analytics is the process of what?
Discovering patterns and relationships in data
35
What are the 4 types of analytics?
Descriptive analytics; diagnostic analytics; predictive analytics; prescriptive analytics
36
What is descriptive analytics, with an example?
What has happened? -> Describes something that happened -> e.g. 50% returns in a month
37
What is diagnostic analytics?
Why has something happened? -> Describes the reason for the historical results -> e.g. customers often return items because they were not what they expected
38
What is predictive analytics, with an example?
What will happen if? -> Determines what will happen by analyzing past data -> e.g. next quarter looks like a decline
39
What is prescriptive analytics?
What to do to make it happen? -> Uses information from the other 3 types to suggest a decision
40
What is artificial intelligence?
Computer models or systems that exhibit human-like intelligent behavior
41
What kind of AI systems do we currently have?
Narrow AI systems
42
What is Narrow AI?
AI systems that specialize in specific tasks
43
What is machine learning?
The study of algorithms that improve their performance (P) at some task (T) with experience (E)
44
What is an example of machine learning?
Task (T): playing chess; Performance (P): percentage of games won against an opponent; Experience (E): playing games against itself
45
What are the 2 types of machine learning?
Supervised; unsupervised
46
What is supervised machine learning?
Has training data with desired outputs (labels); needs a stable environment; focuses on prediction
47
What is unsupervised machine learning?
Only has training data without labels; no feedback; focuses on finding groups of similar items based on the data
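
The contrast between the two types can be sketched in plain Python. The data, the nearest-centroid classifier, and the two-group k-means routine are assumptions chosen for illustration; they are not methods named in the course material.

```python
def mean(values):
    return sum(values) / len(values)

# Supervised: each training example comes with a label (the desired output).
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
           (8.0, "high"), (9.0, "high"), (10.0, "high")]

def fit_centroids(examples):
    """Learn one centroid (mean value) per label."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(labeled)
print(predict(centroids, 7.0))

# Unsupervised: the same values without labels; k-means with k=2 looks for groups.
unlabeled = [x for x, _ in labeled]

def two_means(values, iterations=10):
    """Split 1-D values into two groups (assumes at least two distinct values)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [mean(g) for g in groups]
    return groups

print(two_means(unlabeled))
```

The supervised part returns a named label for a new value, while the unsupervised part only returns groups of similar values with no names attached.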
48
What are the 5 approaches of data analysis?
Traditional econometrics; supervised learning; unsupervised learning; traditional programming; machine learning programs
49
Which approaches have labeled data?
Traditional econometrics; supervised learning
50
Which approach has unlabeled data?
Unsupervised learning
51
What method does traditional econometrics use?
Linear regression
52
What methods do supervised learning and unsupervised learning use?
Supervised machine learning; unsupervised machine learning
53
What are the results in traditional econometrics?
Explanatory model and statistical significance
54
What are the results of supervised learning?
Prediction model and prediction performance
55
What are the results of unsupervised learning?
Data structure model and data structure characteristics
56
What is traditional programming?
Write a program with explicit rules to follow
57
What are machine learning programs?
Write a computer program to learn from examples
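
The difference between the two kinds of program can be sketched as follows. The transaction amounts, the hand-written 1000 threshold, and the function names are hypothetical, and the "learning" here is deliberately the simplest possible: trying each candidate threshold and keeping the one with the fewest mistakes.

```python
def rule_based_flag(amount):
    """Traditional programming: the programmer writes the rule explicitly."""
    return amount > 1000  # threshold chosen by hand (hypothetical)

def learn_threshold(examples):
    """Machine learning: pick the threshold with the fewest mistakes on labeled examples."""
    candidates = sorted(amount for amount, _ in examples)

    def errors(threshold):
        return sum((amount > threshold) != flagged for amount, flagged in examples)

    return min(candidates, key=errors)

# Hypothetical labeled examples: (transaction amount, was it flagged?)
examples = [(120, False), (300, False), (450, False),
            (700, True), (950, True), (2000, True)]

threshold = learn_threshold(examples)

def learned_flag(amount):
    """Apply the rule that was learned from the examples."""
    return amount > threshold

print(threshold, rule_based_flag(800), learned_flag(800))
```

With these examples the learned rule flags an amount of 800 while the hand-written rule does not, because the threshold came from the data rather than from the programmer.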
58
Supervised machine learning uses data for what?
To learn a hypothesis to predict
59
What models does supervised machine learning use?
Classification models; regression models
60
When do you use classification models vs regression models?
Classification -> target variable is categorical; Regression -> target variable is continuous
61
What is optimization in supervised machine learning?
How the model is trained on the data
62
What is representation in supervised machine learning?
How the data is specified; what the form of the model is
63
What is evaluation in supervised machine learning?
How we assess whether the model is successful; what the performance measure is
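
Representation, evaluation, and optimization can be seen together in one small regression sketch. The data, the straight-line model, the mean squared error measure, and the gradient descent settings are all assumptions picked for illustration.

```python
# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def predict(w, b, x):
    """Representation: the model is a straight line, y = w * x + b."""
    return w * x + b

def mean_squared_error(w, b, xs, ys):
    """Evaluation: the performance measure is the average squared error."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, learning_rate=0.05, steps=2000):
    """Optimization: gradient descent adjusts w and b to reduce the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(xs, ys)
print(round(w, 3), round(b, 3), mean_squared_error(w, b, xs, ys))
```

Because the target here is continuous this is a regression model; a classification model would keep the same three ingredients but swap in a categorical target and a different performance measure.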
64
AI wins when information is what?
More transparent and voluminous
65
Humans win when institutional knowledge is what?
Crucial
66
What happens to the performance edge of AI over time?
It declines over time when alternative data is found
67
Combining AI and humans produces what?
The most accurate forecasts
68
What are some applications of AI and ML in finance?
Asset management; call centres; credit and insurance
69
When will larger training datasets improve prediction accuracy?
When a human, given X, can confidently predict Y
70
ML techniques are valuable when:
There are lots of features and training examples; the impact of features is highly nonlinear; prediction is more important than inference
71
Some ML approaches require a lot of what?
Computing power
72
What is one solution to ML needing high computing power?
Cloud computing services
73