TOPIC 3 - BIG DATA, DATA ANALYTICS, AND MACHINE LEARNING Flashcards

1
Q

How can data help businesses?

A

Smarter and faster decisions
Accurate predictions
Sorting the signal from the noise
Efficient operations including real-time changes

2
Q

Data is often viewed as what by companies?

A

An asset

3
Q

What is the cross-industry standard process for data mining?

A

An iterative process that often involves going back and forth between stages

4
Q

In practice, what should happen in the cross-industry standard process for data mining?

A

Shortcuts from each stage back to the prior one

5
Q

What does CRISP-DM stand for?

A

Cross-industry standard process for data mining

6
Q

What is the starting point and goal of CRISP-DM?

A

Solve a business problem

7
Q

What should the business problem be?

A

Important and solvable

8
Q

How will the solution to the business problem be built?

A

By using data as the raw material

9
Q

What do you need to understand about the data?

A

The strengths and weaknesses of the data

10
Q

CRISP-DM needs to weigh up what?

A

The benefits and costs of acquiring the data

11
Q

What happens in the data preparation stage?

A

Clean the data
Decide which variables you require

12
Q

What happens when you “clean” the data?

A

Convert data to the appropriate types
Deal with missing values
Normalize or scale variables
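The three cleaning steps can be sketched in plain Python. This is a minimal illustration, not a prescribed procedure: the records and the column names `age` and `income` are hypothetical, and median imputation and min-max scaling are just one common choice for each step.

```python
from statistics import median

# Hypothetical raw records: values arrive as strings and one age is missing.
raw = [
    {"age": "34", "income": "52000"},
    {"age": "", "income": "61000"},
    {"age": "45", "income": "38000"},
    {"age": "29", "income": "70000"},
]
COLUMNS = ("age", "income")

def to_number(value):
    """Convert a text field to float, treating an empty string as missing (None)."""
    return float(value) if value.strip() else None

# 1. Convert types: every field from str to float (or None if missing).
rows = [{col: to_number(r[col]) for col in COLUMNS} for r in raw]

# 2. Deal with missing values: fill gaps with the median of the observed values.
for col in COLUMNS:
    fill = median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = fill

# 3. Normalize: min-max scale each column to the range [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = {col: min_max([r[col] for r in rows]) for col in COLUMNS}
print(scaled)
```

Median imputation is used here because it is less sensitive to outliers than the mean; other strategies (dropping rows, model-based imputation) may suit other data better.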

13
Q

How do you decide which variables you require?

A

Can be guided by theory
In machine learning, this is known as “feature engineering”

14
Q

What happens in the modelling and evaluation stages?

A

May use various tools to help model the data
Need to evaluate our model rigorously
It is important that your model is comprehensible

15
Q

Why do we evaluate our model?

A

To guard against chance correlations, p-hacking and overfitting
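The danger of chance correlations and p-hacking can be shown with pure noise. In this sketch the sample sizes and the random seed are arbitrary assumptions, and the target and all 200 candidate features are independent random numbers, so any correlation found is spurious. Searching every feature for the best in-sample fit still turns up one that looks related; checking it against fresh data is the evaluation step that exposes it.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

rng = random.Random(0)        # fixed seed so the run is repeatable
n_obs, n_features = 30, 200   # arbitrary sizes chosen for illustration

def noise():
    return [rng.gauss(0, 1) for _ in range(n_obs)]

# Target and features are independent noise: no real relationship exists.
y_train, y_fresh = noise(), noise()
features_train = [noise() for _ in range(n_features)]
features_fresh = [noise() for _ in range(n_features)]

# "p-hacking": try every feature and keep whichever fits the training sample best.
best = max(range(n_features), key=lambda j: abs(pearson(features_train[j], y_train)))
in_sample = pearson(features_train[best], y_train)
# Evaluation: the same feature checked against fresh data drawn the same way.
out_of_sample = pearson(features_fresh[best], y_fresh)

print(f"feature {best}: in-sample r = {in_sample:.2f}, fresh-data r = {out_of_sample:.2f}")
```

Because the data contain no real signal, any large in-sample correlation here is purely a product of searching many candidates.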

16
Q

What happens in the deployment stage?

A

Important to understand the benefits and risks of deployment
Continuous monitoring is often required

17
Q

Why is continuous monitoring required?

A

Such monitoring detects worsening or unexpected model performance
It allows timely remediation, such as adding new variables or retraining your model
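Continuous monitoring can be sketched as a rolling accuracy check that raises a flag when performance worsens. The window size, the 0.6 threshold and the prediction stream below are illustrative assumptions; in practice the performance measure and alert rule depend on the model and the business problem.

```python
from collections import deque

class PerformanceMonitor:
    """Rolling accuracy over the last `window` predictions, with a simple alert rule."""

    def __init__(self, window=5, threshold=0.6):
        self.results = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def needs_attention(self):
        """True once the window is full and rolling accuracy drops below the threshold."""
        full = len(self.results) == self.results.maxlen
        return full and self.rolling_accuracy() < self.threshold

# Hypothetical (predicted, actual) pairs: correct at first, then the model starts to miss.
stream = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]
monitor = PerformanceMonitor(window=5, threshold=0.6)
alerts = []
for step, (predicted, actual) in enumerate(stream):
    monitor.record(predicted, actual)
    if monitor.needs_attention():
        alerts.append(step)  # a trigger for remediation, e.g. retraining
print(alerts)
```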

18
Q

Where does big data come from?

A

Everywhere

19
Q

What are some specific examples of big data?

A

Internet interactions
Text documents
Images and videos

20
Q

What is the widely used definition of big data?

A

Big data is any set of data that is too large or too complex to be handled using conventional data-processing techniques

21
Q

What is a synonym for big data?

A

Alternative data

22
Q

What are the 4 Vs of big data?

A

Volume
Velocity
Variety
Veracity

23
Q

What is Volume in the 4 Vs of big data?

A

Terabytes to exabytes of existing data to process

24
Q

What is Velocity in the 4 Vs of big data?

A

Streaming data, milliseconds to seconds to respond

25
Q

What is Variety in the 4 Vs of big data?

A

Structured, unstructured, text, multimedia

26
Q

What is Veracity in the 4 Vs of big data?

A

Uncertainty due to data inconsistency and incompleteness

27
Q

How is data described when talking about volume?

A

Data at rest

28
Q

How is data described when talking about velocity?

A

Data in motion

29
Q

How is data described when talking about variety?

A

Data in many forms

30
Q

How is data described when talking about veracity?

A

Data in doubt

31
Q

What are the additional Vs of big data?

A

Variability
Value proposition

32
Q

What do the Vs of big data make impossible?

A

Storing and processing all the data on a single computer
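Because the data cannot live on one machine, big data work is split across many machines and the partial results are combined. That split-apply-combine pattern, which frameworks such as Hadoop MapReduce and Apache Spark distribute across clusters, can be sketched in a single process. Here the partitions are plain lists standing in for machines, and the word-count task and documents are illustrative assumptions.

```python
from collections import Counter
from functools import reduce

def partition(records, n_parts):
    """Split records into n_parts chunks (each chunk stands in for one machine)."""
    return [records[i::n_parts] for i in range(n_parts)]

def map_count(chunk):
    """Work done independently on one node: count the words in its chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def reduce_counts(a, b):
    """Combine the partial results from two nodes."""
    return a + b

# Hypothetical documents to process.
documents = [
    "big data needs many machines",
    "data in motion",
    "data at rest",
    "many forms of data",
]
partials = [map_count(chunk) for chunk in partition(documents, n_parts=2)]
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common(2))
```

The key design property is that `map_count` needs only its own chunk, so each chunk could in principle be processed on a different machine.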

33
Q

What can big data help with?

A

Process efficiency
Finding new connections in data
Improving predictions
However, big data needs to be analysed to have value

34
Q

Data analytics is the process of what?

A

Discovering patterns and relationships in data

35
Q

What are the 4 types of analytics?

A

Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics

36
Q

What is descriptive analytics? Give an example.

A

What has happened? -> Describes something that happened -> e.g. a 50% return rate in a month

37
Q

What is diagnostic analytics? Give an example.

A

Why has something happened? -> Describes the reasons for historical results -> e.g. customers often return items because they are not what they expected

38
Q

What is predictive analytics? Give an example.

A

What will happen? -> Determines what is likely to happen by analyzing past data -> e.g. next quarter looks set to show a decline

39
Q

What is prescriptive analytics?

A

What should be done to make it happen? -> Uses information from the other three types to suggest a decision
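The four types can be tied together on a toy version of the returns example in the cards above. Everything here is hypothetical: the order log, the quarterly sales figures, the 20% return-rate trigger and the playbook of actions. The forecast is a simple least-squares trend line, chosen only for brevity; real predictive analytics would usually use a richer model.

```python
from collections import Counter

# Hypothetical order log: (month, returned?, return reason or None).
orders = [
    (1, True, "not as expected"), (1, False, None),
    (1, True, "not as expected"), (1, False, None),
    (2, True, "damaged"), (2, False, None), (2, False, None),
]
quarterly_sales = [120.0, 110.0, 100.0, 90.0]  # hypothetical, oldest first

# Descriptive: what has happened? Share of month-1 orders that were returned.
month1 = [o for o in orders if o[0] == 1]
return_rate = sum(returned for _, returned, _ in month1) / len(month1)

# Diagnostic: why did it happen? Most frequent reason among returned orders.
reasons = Counter(reason for _, returned, reason in orders if returned)
top_reason = reasons.most_common(1)[0][0]

# Predictive: what will happen? Extend a least-squares trend line one quarter ahead.
def linear_trend_forecast(series):
    n = len(series)
    mean_x, mean_y = (n - 1) / 2, sum(series) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    return mean_y + slope * (n - mean_x)

forecast = linear_trend_forecast(quarterly_sales)

# Prescriptive: what should we do? Combine the three results into a recommendation.
playbook = {"not as expected": "improve product descriptions", "damaged": "change packaging"}
declining = forecast < quarterly_sales[-1]
recommendation = playbook[top_reason] if declining and return_rate > 0.2 else "no change"

print(return_rate, top_reason, forecast, recommendation)
```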

40
Q

What is artificial intelligence?

A

Computer models or systems that exhibit intelligent, human-like behavior

41
Q

What kind of AI systems do we currently have?

A

Narrow AI systems

42
Q

What is narrow AI?

A

AI systems that specialize in specific tasks

43
Q

What is machine learning?

A

Study of algorithms that:
- improve their performance
- at some task
- with experience

44
Q

What is an example of machine learning, in terms of task (T), performance (P) and experience (E)?

A

T (task): Playing chess
P (performance measure): Percentage of games won against an opponent
E (experience): Playing games against itself
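A chess-playing learner is too large for a short sketch, so the example below substitutes a toy task to show the same T, P, E structure. Assumptions: the task T is classifying integers as at or above 50 (a rule the learner is never told), the performance measure P is accuracy on the integers 0 to 99, and the experience E is a hypothetical stream of labelled examples. With this particular stream, accuracy rises as experience accumulates.

```python
def hidden_rule(x):
    """Ground truth for the toy task (assumed): is x at least 50?"""
    return x >= 50

def learn_threshold(examples):
    """Place the cut-off halfway between the largest negative and smallest positive example."""
    negatives = [x for x, label in examples if not label]
    positives = [x for x, label in examples if label]
    return (max(negatives) + min(positives)) / 2

def accuracy(threshold, test_points):
    """Performance measure P: share of test points classified correctly."""
    correct = sum((x >= threshold) == hidden_rule(x) for x in test_points)
    return correct / len(test_points)

# Experience E: a hypothetical stream of labelled examples seen over time.
stream = [20, 95, 40, 70, 46, 58, 49, 51]
test_points = range(100)

curve = []
for n in range(2, len(stream) + 1, 2):
    examples = [(x, hidden_rule(x)) for x in stream[:n]]
    curve.append((n, accuracy(learn_threshold(examples), test_points)))
print(curve)  # (number of examples, accuracy) pairs
```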

45
Q

What are the 2 types of machine learning?

A

Supervised
Unsupervised

46
Q

What is supervised machine learning?

A

Training data includes the desired outputs (labels)
Needs a stable environment
Focus on prediction

47
Q

What is unsupervised machine learning?

A

Training data has no labels
No feedback
Focus on finding groups of similar items based on the data
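The contrast between the two cards can be sketched on the same kind of one-dimensional data. The supervised part is given labels and learns one average (centroid) per class; the unsupervised part sees only the numbers and finds two groups itself using k-means, a standard clustering method swapped in here purely as an illustration. The data, labels and starting centres are assumptions.

```python
# Supervised: labelled training data, so learn one centroid per known class.
def fit_centroids(labelled):
    groups = {}
    for x, label in labelled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Unsupervised: no labels, so discover groups with 1-D k-means.
def kmeans_1d(points, centres, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

labelled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]  # hypothetical
centroids = fit_centroids(labelled)
print(classify(centroids, 7.0))  # prediction for a new input

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # hypothetical, no labels
centres, clusters = kmeans_1d(points, centres=[1.0, 2.0])
print(centres, clusters)
```

Note that k-means returns groups but no names for them: deciding what each cluster means is left to the analyst, which reflects the "no feedback" point on the card.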

48
Q

What are the 5 approaches of data analysis?

A

Traditional econometrics
Supervised learning
Unsupervised learning
Traditional programming
Machine learning programs

49
Q

Which approaches use labeled data?

A

Traditional econometrics
Supervised learning

50
Q

Which approach uses unlabeled data?

A

Unsupervised learning

51
Q

What method does traditional econometrics use?

A

Linear regression

52
Q

What methods do supervised learning and unsupervised learning use?

A

Supervised machine learning
Unsupervised machine learning

53
Q

What are the results of traditional econometrics?

A

Explanatory model and statistical significance
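A minimal sketch of the econometric output the card describes: fit y = a + b*x by ordinary least squares, then judge the slope's statistical significance with its t-statistic. The data are hypothetical. With n - 2 = 3 degrees of freedom, |t| above roughly 3.18 is significant at the 5% level (two-sided).

```python
import math

def ols_with_t_stat(x, y):
    """Fit y = a + b*x by ordinary least squares; return a, b and the t-statistic of b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_b = math.sqrt(sigma2 / sxx)                      # standard error of the slope
    return a, b, b / se_b

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical explanatory variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical outcome
a, b, t = ols_with_t_stat(x, y)
print(f"a = {a:.2f}, b = {b:.2f}, t = {t:.1f}")
```

The emphasis is on interpreting b and its significance (explanation), whereas the supervised-learning card that follows judges a model by how well it predicts.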

54
Q

What are the results of supervised learning?

A

Prediction model and prediction performance

55
Q

What are the results of unsupervised learning?

A

Data structure model and data structure characteristics

56
Q

What is traditional programming?

A

Write a program with explicit rules to follow

57
Q

What are machine learning programs?

A

Write a computer program that learns from examples
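The difference can be sketched with a unit conversion. In the traditional version, a programmer writes the Celsius-to-Fahrenheit rule directly. In the machine-learning version, the program is only given example pairs and infers the rule by least squares; note that it still assumes the rule is a straight line, which is a modelling choice. The example pairs are assumed for illustration.

```python
def fahrenheit_rule(celsius):
    """Traditional programming: a human writes the rule explicitly."""
    return celsius * 9 / 5 + 32

def learn_linear_rule(examples):
    """Machine learning: infer a straight-line rule from (input, output) examples by least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in examples) / sum(
        (x - mean_x) ** 2 for x, _ in examples
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Hypothetical training examples: (Celsius, Fahrenheit) pairs; the rule itself is never given.
examples = [(-10, 14.0), (0, 32.0), (20, 68.0), (37, 98.6), (100, 212.0)]
learned = learn_linear_rule(examples)
print(fahrenheit_rule(25), round(learned(25), 6))
```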

58
Q

Supervised machine learning uses data for what?

A

To learn a hypothesis (model) that makes predictions

59
Q

What types of models does supervised machine learning use?

A

Classification models
Regression models

60
Q

When do you use classification models vs regression models?

A

Classification -> target variable is categorical
Regression -> target variable is continuous
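One way to see that the distinction lies in the target variable: the same k-nearest-neighbours idea (swapped in here purely as an illustration) becomes a regression model when it averages its neighbours' continuous values and a classification model when it takes a majority vote over category labels. The floor-area data, prices and price bands are hypothetical.

```python
from collections import Counter

def k_nearest(train, x, k):
    """Return the k training pairs whose input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]

def knn_regress(train, x, k=3):
    """Regression: continuous target, so average the neighbours' values."""
    neighbours = k_nearest(train, x, k)
    return sum(y for _, y in neighbours) / k

def knn_classify(train, x, k=3):
    """Classification: categorical target, so take a majority vote."""
    neighbours = k_nearest(train, x, k)
    return Counter(y for _, y in neighbours).most_common(1)[0][0]

# Hypothetical data: floor area (m^2) -> price in $1000s, and -> price band.
prices = [(50, 210.0), (60, 240.0), (70, 270.0), (120, 480.0), (130, 500.0)]
bands = [(50, "low"), (60, "low"), (70, "low"), (120, "high"), (130, "high")]
print(knn_regress(prices, 60), knn_classify(bands, 60))
```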

61
Q

What is optimization in supervised machine learning?

A

How the model is trained on the data

62
Q

What is representation in supervised machine learning?

A

How the data is specified
What form the model takes

63
Q

What is evaluation in supervised machine learning?

A

How we assess whether the model is successful
What the performance measure is
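Representation, optimization and evaluation can each be pointed to in a tiny linear regression trained by gradient descent. This is a sketch under stated assumptions: the straight-line model, mean squared error as the performance measure, the learning rate and epoch count, and the noise-free data generated from y = 2x + 1 are all choices made for illustration.

```python
def predict(params, x):
    """Representation: the model is a straight line y = w * x + b."""
    w, b = params
    return w * x + b

def mse(params, data):
    """Evaluation: mean squared error is the chosen performance measure."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def train(data, lr=0.05, epochs=2000):
    """Optimization: gradient descent on the training MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        errors = [(predict((w, b), x) - y, x) for x, y in data]
        grad_w = sum(2 * e * x for e, x in errors) / n
        grad_b = sum(2 * e for e, _ in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1, split into training and test sets.
train_set = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
test_set = [(x, 2 * x + 1) for x in (0.25, 1.75)]
params = train(train_set)
print(params, mse(params, test_set))
```

Evaluating on `test_set`, which the optimizer never saw, is what makes the performance measure an honest estimate rather than a measure of fit to the training data.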

64
Q

AI wins when information is what?

A

More transparent and voluminous

65
Q

Humans win when institutional knowledge is what?

A

Crucial

66
Q

What happens to the performance edge of AI over time?

A

Declines over time when alternative data is found

67
Q

Combining AI and humans produces what?

A

The most accurate forecasts

68
Q

What are some applications of AI and ML in finance?

A

Asset management
Call centres
Credit and insurance

69
Q

When will larger training datasets improve prediction accuracy?

A

If, given X, a human can confidently predict Y, then yes

70
Q

When are ML techniques valuable?

A

There are lots of features and training examples
The impact of features is highly nonlinear
Prediction is more important than inference

71
Q

Some ML approaches require a lot of what?

A

Computing power

72
Q

What is one solution to ML's need for high computing power?

A

Cloud computing services
