Introduction Flashcards

1
Q

What launched the Big Data era?

A

The combination of rapidly growing data and on-demand cloud computing

2
Q

What steps are required to process unstructured data?

A

Data Acquisition, Storage, Retrieval, Cleaning, Processing

3
Q

What are 4 technologies that help handle unstructured data?

A

Hadoop, Storm, Spark, NoSQL

4
Q

What is Hadoop?

A

An open-source framework for the distributed storage and processing of large amounts of unstructured data across clusters of machines
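Hadoop's processing model, MapReduce, can be illustrated with a minimal single-machine sketch in plain Python. It only mimics the map, shuffle and reduce phases on a local list of text splits; it does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    # On a real cluster each split would be mapped on a different node;
    # this sketch simply loops over the splits locally.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

splits = ["big data is big", "data is everywhere"]
print(word_count(splits))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because map calls are independent and reduce only needs values grouped by key, the same logic can be spread across many machines, which is what lets Hadoop scale out.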

5
Q

What are Storm and Spark?

A

Frameworks for real-time processing of large amounts of data
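Stream processors update their results incrementally as each record arrives instead of waiting for a complete dataset. The plain-Python sketch below illustrates that idea with a count-based sliding window; it is not the Storm or Spark API, and the window size and event names are hypothetical.

```python
from collections import Counter, deque

def sliding_window_counts(events, window_size):
    """Yield per-key counts over the most recent `window_size` events, one snapshot per event."""
    window = deque()
    counts = Counter()
    for event in events:
        window.append(event)
        counts[event] += 1
        # Evict the oldest event once the window is over capacity.
        if len(window) > window_size:
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]
        yield dict(counts)

# Hypothetical click stream processed event by event.
for snapshot in sliding_window_counts(["click", "view", "click", "buy"], window_size=3):
    print(snapshot)
```

Each snapshot is available as soon as its event is processed, which is the property that distinguishes real-time processing from batch jobs over stored data.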

6
Q

What is Neo4j?

A

A graph database
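A graph database stores entities as nodes and the links between them as relationships, so queries that follow connections ("how is Alice connected to Dave?") are natural. The plain-Python sketch below shows that kind of traversal with breadth-first search over a hypothetical adjacency list; Neo4j itself is queried with its Cypher language, not with this code.

```python
from collections import deque

# Hypothetical social graph: each node maps to the nodes it KNOWS.
graph = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob", "Dave"},
    "Dave": {"Carol"},
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest list of nodes from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

print(shortest_path(graph, "Alice", "Dave"))  # ['Alice', 'Bob', 'Carol', 'Dave']
```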

7
Q

What is Cassandra?

A

A distributed NoSQL database that stores data as key-value pairs (often classified more precisely as a wide-column store)

8
Q

Where does the value of Big Data come from?

A

Value comes from integrating different types of data sources and analysing them at scale.
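As a small illustration, assume two hypothetical sources: customer records from a CRM and purchase events from a web log. Neither alone answers "how much is spent per city?", but joining them on a shared customer id does. The sketch below is a plain-Python join, not any particular integration tool.

```python
# Hypothetical sources sharing a customer id.
customers = {101: {"name": "Ana", "city": "Lyon"}, 102: {"name": "Ben", "city": "Leeds"}}
purchases = [(101, 30.0), (102, 12.5), (101, 20.0), (103, 5.0)]

def spend_by_city(customers, purchases):
    """Join purchases to customers on customer id, then total the spend per city."""
    totals = {}
    for customer_id, amount in purchases:
        customer = customers.get(customer_id)
        if customer is None:
            continue  # no matching customer record: skipped in this sketch
        city = customer["city"]
        totals[city] = totals.get(city, 0.0) + amount
    return totals

print(spend_by_city(customers, purchases))  # {'Lyon': 50.0, 'Leeds': 12.5}
```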

9
Q

What are 3 advantages of integrating data sources, leading to increased data collaboration?

A

It reduces complexity, increases data availability, and unifies the data systems.

10
Q

What are the 5 Vs (characteristics) of Big Data?

A

The 5 Vs are Volume, Variety, Velocity, Veracity, Valence

11
Q

What could be the 6th V, completing the 5 characteristics of Big Data that extend the original 3 Vs coined by Doug Laney of Gartner?

A

It could be Value.

12
Q

What are the original 3 Vs (characteristics) of Big Data?

A

The first 3 Vs are Volume, Variety, and Velocity.

13
Q

What does Valence refer to?

A

This refers to how big data items can be bonded to each other, forming connections between otherwise disparate datasets. It also refers to the connectedness of big data, expressed in the form of graphs.
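Valence is sometimes quantified as the ratio of the connections that actually exist between data items to the number of connections that could exist, which is the density of an undirected graph. The sketch below assumes that definition; the items and links are hypothetical.

```python
def valence(items, connections):
    """Ratio of actual connections to possible connections among `items`.

    Connections are treated as undirected: (a, b) and (b, a) count once,
    and self-links are ignored.
    """
    n = len(items)
    possible = n * (n - 1) // 2  # number of distinct unordered pairs
    if possible == 0:
        return 0.0
    actual = {frozenset(pair) for pair in connections if pair[0] != pair[1]}
    return len(actual) / possible

# Hypothetical dataset: 4 items, 3 distinct links (one listed twice).
items = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
print(valence(items, links))  # 0.5
```

A value near 1 means almost every item is linked to every other (high valence); as valence grows, graph-oriented storage and analysis become more attractive.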

14
Q

What does Volume refer to?

A

This refers to the vast amounts of data generated every second, minute, hour and day in our digitized world. It is the dimension of Big Data related to its size and its exponential growth.

15
Q

What does Variety refer to?

A

This refers to the ever-increasing number of forms that data can come in, e.g., text, images, voice, geospatial. Variety refers to the additional complexity of storing, combining and processing these different kinds of data.

16
Q

What does Velocity refer to?

A

This refers to the speed at which data is being generated and the pace at which data moves from one point to the next.

17
Q

What does Veracity refer to?

A

This refers to the quality of the data, which can vary greatly. It is sometimes also referred to as validity or volatility, the latter referring to the lifetime of the data.

18
Q

What does Value refer to?

A

Processing big data must bring value: the insights gained should support decision-making.

19
Q

What are the challenges related to Volume?

A

The challenges of Volume include the cost, scalability and performance of storing, accessing and processing the data.

20
Q

What should be considered to assess a situation?

A

Risks, Benefits, Contingencies, Regulations, Resources, Requirements

21
Q

How can you define goals?

A

Define objectives and success criteria

22
Q

What are the 5 Ps of data science?

A

Purpose, People, Process, Platforms, Programmability

23
Q

What are the 5 steps in the data science process?

A

Acquire, prepare, analyse, report, act
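The five steps can be sketched as a chain of small functions, each feeding the next. Everything here is hypothetical (the raw readings, the cleaning rule and the decision threshold); it only shows how the stages hand data to one another.

```python
import statistics

def acquire():
    # Hypothetical raw readings, e.g. pulled from a file, database or API.
    return ["21.5", "22.0", "n/a", "23.5", "", "22.0"]

def prepare(raw):
    """Clean the data: drop missing or non-numeric values and convert the rest to floats."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue
    return cleaned

def analyse(values):
    return {"count": len(values), "mean": statistics.mean(values)}

def report(results):
    return f"{results['count']} valid readings, mean {results['mean']:.2f}"

def act(results, threshold=22.0):
    # Hypothetical decision rule turning the insight into an action.
    return "investigate" if results["mean"] > threshold else "no action"

results = analyse(prepare(acquire()))
print(report(results))  # 4 valid readings, mean 22.25
print(act(results))     # investigate
```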

24
Q

Give 5 graph types for visualizing data

A

Heat maps, histograms, box plots, line graphs, scatter plots
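Two of these chart types summarise a distribution with a few numbers that Python's standard library can compute before plotting: a box plot draws the five-number summary, and a histogram draws a count per bin. The sketch below computes only those numbers (drawing is left to a plotting library); the sample data and bin width are hypothetical, and `method="inclusive"` is one of several quartile conventions.

```python
import statistics
from collections import Counter

def five_number_summary(data):
    """Values a box plot draws: minimum, Q1, median, Q3, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def histogram_counts(data, bin_width):
    """Count values per bin [k * bin_width, (k + 1) * bin_width), i.e. the bar heights of a histogram."""
    counts = Counter(int(value // bin_width) * bin_width for value in data)
    return dict(sorted(counts.items()))

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical measurements
print(five_number_summary(sample))     # (1, 3.0, 5.0, 7.0, 9)
print(histogram_counts(sample, 3))     # {0: 2, 3: 3, 6: 3, 9: 1}
```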

25
Q

What is YARN?

A

The resource manager for Hadoop; the name stands for Yet Another Resource Negotiator.
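In YARN, applications request containers (bundles of resources such as memory and CPU) and the ResourceManager decides on which cluster nodes to grant them. The toy allocator below only illustrates that negotiation: it considers memory alone and uses a simple first-fit rule, which is an assumption for illustration and not how YARN's Capacity Scheduler or Fair Scheduler actually work. Node names and sizes are hypothetical.

```python
def allocate_containers(node_memory_mb, requests_mb):
    """Toy first-fit allocation of container requests to nodes by free memory.

    Returns a list of (request_index, node_name); a request that fits on no node gets None.
    """
    free = dict(node_memory_mb)  # copy so the caller's dict is not modified
    placements = []
    for i, needed in enumerate(requests_mb):
        chosen = None
        for node, available in free.items():
            if available >= needed:
                free[node] = available - needed
                chosen = node
                break
        placements.append((i, chosen))
    return placements

nodes = {"node1": 4096, "node2": 2048}  # hypothetical free memory per node, in MB
print(allocate_containers(nodes, [2048, 2048, 1024, 2048]))
# [(0, 'node1'), (1, 'node1'), (2, 'node2'), (3, None)]
```

An unplaced request (None) would, in a real cluster, wait until running containers finish and release their resources.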

26
Q

What is HDFS?

A

The Hadoop Distributed File System, Hadoop's distributed data storage layer.
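HDFS splits each file into fixed-size blocks and stores several replicas of every block on different nodes. The sketch below does the arithmetic for one file, assuming the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); real clusters may be configured differently.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3      # assumed default dfs.replication

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, raw cluster storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds the remaining data and is not padded,
    # so raw storage is file size times replication, not blocks * block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb

print(hdfs_footprint(1000))  # (8, 3000)
```

A 1000 MB file therefore becomes 8 blocks (7 full ones plus a 104 MB remainder), and replication triples the raw storage it consumes across the cluster.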