Big Data Analytics Flashcards

1
Q

What is the promise of AI?

A

Unbiased, consistent decision-making that leverages data optimally

2
Q

What are the key benefits of AI for financial managers?

A

-Data processing: using both structured and unstructured data
-Improving efficiency: reducing costs by automating day-to-day assistance in risk management
-Real-time monitoring and prediction
-Business decisions: greater predictive insight, visibility of risk

3
Q

What is the HIPPO principle?

A

AI attempts to make decisions objective by basing them on data rather than simply on the Highest Paid Person’s Opinion

4
Q

What does HIPPO stand for?

A

Highest Paid Person’s Opinion

5
Q

What is the key (“oil”) to AI?

A

Data: its key advantages are expressed in the 5Vs

6
Q

What are the 5Vs?

A

Volume, Velocity, Value, Veracity, Variety

7
Q

Where is the Value Creation in Data?

A

Google, Amazon and Facebook are integrated models that own their communities.

Pure “pipes” are worth 10x less than communities.

Pure players are niche providers.

8
Q

What is the No. 1 data business?

A

Advertising

9
Q

What are the valuation drivers?

A

The GAFA (Google, Apple, Facebook, Amazon), mostly Google and Amazon

10
Q

What is the Gartner data value ladder?

A

It’s a graph describing different business analytics types and their impact on corporate culture.

11
Q

What are the 4 types of analytics in the Gartner value ladder?

A

Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: How can we make it happen?

12
Q

What are the key success factors of AI projects?

A

Human Acceptance: 70%
Tech: 20%
Algorithms: 10%

13
Q

What is the discriminatory risk in AI?

A

AI is a clustering tool, and consumer clustering is inherently discriminatory.

E.g. if it sees that the main demographic buying beer is men, it will only promote beer discounts to men.

14
Q

What is the ethical problem with AI?

A

How can AI make moral trade-offs? How can we agree on a moral code that all AI should follow (law, ethics, “societal value”)?

15
Q

What are the steps of the 5-step data process?

A

Collect data

Clean and format data

Store data

Transform data

Use data

16
Q

Where do you collect data?

A

Databases

Internet

Social Networks

Files (e.g. JSON)

17
Q

How do you gather data from the internet?

A

Web scraping consists of collecting data published directly on websites

18
Q

What do you need for internet scraping?

A

It requires interpreting the content of HTML pages in order to extract the content fields
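
Example: a minimal sketch of extracting content fields from HTML with Python's standard `html.parser` module. The page snippet, the class names (`product`, `name`, `price`) and the `ProductParser` class are made up for illustration; a real scraper would first download the page and adapt the rules to its actual markup.

```python
from html.parser import HTMLParser

# Hypothetical catalogue page; a real scraper would download this HTML first.
PAGE = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">24.90</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">31.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per <li class="product">
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.products.append({})
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ProductParser()
parser.feed(PAGE)
products = parser.products
print(products)
```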

19
Q

When does web scraping work best?

A

It works well with structured content such as product catalogues, CMS systems and similar things.

20
Q

What are the benefits of the JSON format?

A

Very Flexible
Very Easy to Parse
Strong Momentum
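
Example: a short sketch of why JSON is considered flexible and easy to parse, using Python's standard `json` module. The record and its field names are invented for illustration.

```python
import json

# Hypothetical record: nested objects and lists need no predefined schema.
raw = '{"customer": "Ana", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'

record = json.loads(raw)   # one call turns the text into dicts and lists
total_qty = sum(order["qty"] for order in record["orders"])

record["vip"] = True       # a new field can be added freely
round_trip = json.dumps(record, sort_keys=True)   # and written back as text
print(total_qty, round_trip)
```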

21
Q

What is data cleaning?

A

It’s the process of detecting and correcting (or removing) corrupt or inaccurate records
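
Example: a minimal cleaning sketch in plain Python. The sample records, the 0 to 120 age range and the `clean` function are assumptions chosen for illustration; real rules depend on the dataset.

```python
raw_records = [
    {"id": "1", "age": "34", "country": " fr "},   # fixable: untidy country code
    {"id": "2", "age": "-5", "country": "DE"},     # inaccurate: impossible age
    {"id": "3", "age": "n/a", "country": "ES"},    # corrupt: age is not a number
    {"id": "1", "age": "34", "country": "FR"},     # duplicate id
]


def clean(records):
    """Correct what can be corrected and drop what cannot."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        try:
            age = int(rec["age"])
        except ValueError:
            continue                       # remove: unparseable value
        if not 0 <= age <= 120 or rec["id"] in seen_ids:
            continue                       # remove: out of range or duplicate
        seen_ids.add(rec["id"])
        cleaned.append({
            "id": int(rec["id"]),
            "age": age,
            "country": rec["country"].strip().upper(),   # correct: normalize
        })
    return cleaned


cleaned = clean(raw_records)
print(cleaned)
```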

22
Q

What is the Tidy Data principle?

A

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure.

23
Q

How do you apply the Tidy Data principle?

A
  1. Each variable forms one column.
  2. Each observation forms one row.
  3. There should be one table for each “kind” of variable.
  4. If you have multiple tables, they should include a column
    that allows them to be linked.
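
Example: a sketch of rules 1, 2 and 4 in plain Python. The sales figures, store names and regions are invented. The "wide" table hides the variable "year" in its column names; the tidy version gives each variable one column and each observation one row, and the `store` column links it to a second table.

```python
# Untidy: the years are values of a variable, but appear as column names.
wide = [
    {"store": "Paris", "2022": 120, "2023": 150},
    {"store": "Lyon", "2022": 80, "2023": 95},
]

# Tidy: one column per variable (store, year, sales), one row per observation.
tidy = [
    {"store": row["store"], "year": int(year), "sales": sales}
    for row in wide
    for year, sales in row.items()
    if year != "store"
]

# A second table about another "kind" of thing, linked through the store column.
stores = {"Paris": {"region": "North"}, "Lyon": {"region": "South"}}
joined = [{**row, **stores[row["store"]]} for row in tidy]
print(joined[0])
```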
24
Q

Why do we store data?

A

Big data storage keeps large volumes of data organized and sorted so that they can easily be accessed using the right tools.

25
Q

What are the storage options for databases?

A

SQL

NoSQL

26
Q

What is the storage option for flat files?

A

Data lakes

27
Q

What are the pros of SQL?

A

You get relational databases.

Good at structured data and high-performance workloads.

Very wide range of tools.

28
Q

What are the cons of SQL?

A

Difficult to scale

Fixed schema

29
Q

What are the pros of NoSQL?

A

They are good for non-relational data.

Flexible structure.

Easily scalable; runs well in the cloud.
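
Example: a rough sketch contrasting a fixed schema with a flexible structure. The SQL side uses Python's built-in `sqlite3` module; the "document" side is only an analogy built from plain Python dicts, not a real NoSQL database. Table and field names are invented.

```python
import sqlite3

# SQL: the table schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ana')")

try:
    # The column "twitter" is not part of the schema, so this insert fails.
    conn.execute("INSERT INTO customers (id, name, twitter) VALUES (2, 'Ben', '@ben')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Document-style analogy: each record carries its own fields.
documents = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben", "twitter": "@ben"},   # extra field, no schema change
]
fields = sorted({key for doc in documents for key in doc})
print(rejected, fields)
```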

30
Q

What are the cons of NoSQL?

A

Installation and management are difficult.

Not enough tools have been developed for it.

Slower response time.

31
Q

What is a data lake?

A

It’s usually a single store of all enterprise data, including raw copies of source system data and transformed data

32
Q

What is an example of a data lake tool?

A

Hadoop

33
Q

State examples of SQL tools.

A

MySQL

Oracle Database

34
Q

State examples of NoSQL tools.

A

Apache HBase

MongoDB

35
Q

What can you do with Python?

A
  1. Read and write files (Excel, CSV, JSON)
  2. Perform quick and complex computations
  3. Make arrays and pivot tables (faster than Excel)
  4. Use machine learning and AI libraries (scikit-learn, TensorFlow)
  5. Visualize data (Matplotlib)
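
Example: a small sketch of points 1 to 3 using only the standard library, so it runs without extra installs. The CSV content is invented. In practice the pandas library is commonly used for this kind of pivot; that is not shown here.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sales data; io.StringIO stands in for a file on disk.
csv_text = "region,product,sales\nNorth,A,10\nNorth,B,5\nSouth,A,7\nNorth,A,3\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Pivot: regions as rows, products as columns, summed sales as values.
pivot = defaultdict(lambda: defaultdict(int))
for row in rows:
    pivot[row["region"]][row["product"]] += int(row["sales"])

# Write the result back out as JSON text.
pivot_json = json.dumps(pivot, sort_keys=True)
print(pivot_json)
```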
36
Q

What does SQL stand for?

A

Structured Query Language (a programming language used to store and process information in a relational database)
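
Example: a minimal sketch of storing and querying related tables with SQL, run through Python's built-in `sqlite3` module against an in-memory database. The tables, column names and amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Store: two related tables, linked by orders.customer_id -> customers.id.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 15.5), (3, 2, 9.0);
""")

# Process: join the tables and total the order amounts per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```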