4.11 Big Data Flashcards

1
Q

What is big data?

A

Big data is the term for data that does not fit the usual containers.
It encompasses data that is too large or complex to be handled by conventional data-processing software.

2
Q

What are the three defining features of big data?

A

Volume, velocity, variety

3
Q

What does ‘volume’ refer to in big data?

A

There is too much data to fit on a conventional hard drive or a single server.
The data must therefore be stored across multiple servers, each composed of many hard drives.

4
Q

What is meant by ‘velocity’ in the context of big data?

A

Data on the servers are created and modified rapidly.
Servers must respond to frequently changing data in a matter of milliseconds.

5
Q

What does ‘variety’ mean when discussing big data?

A

Data held on the servers comes in many different forms.
Examples include binary files, photos and videos.

6
Q

Why is big data difficult to analyze?

A

Big data is largely unstructured, and this lack of structure makes it difficult to analyze.

7
Q

Why don’t conventional databases scale well for big data?

A

Conventional databases require data to fit into a row-and-column format, which unstructured data does not.

8
Q

What techniques must be used to extract useful information from big data?

A

Machine learning techniques.
These techniques help to discern patterns in the data.

9
Q

Give examples of big data sources.

A

Data from networked sensors, smartphones, video surveillance and mouse clicks.
These are continuously streamed data sources.

10
Q

What is a challenge when processing data stored across multiple servers?

A

Data processing must be split across multiple machines.
This is difficult with conventional programming paradigms as machines must be synchronized.

11
Q

How does functional programming help with big data processing?

A

It makes it easier to write correct, efficient distributed code.
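
As a rough illustration of why this helps, the sketch below counts words in a map-then-reduce style. The chunks, function names and sample text are all made up for the example. In a real system each chunk would live on a different server, and a framework would distribute the work. Here everything runs on one machine.

```python
from collections import Counter
from functools import reduce


def count_words(chunk: str) -> Counter:
    """Pure function: the result depends only on the chunk passed in."""
    return Counter(chunk.lower().split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Combine two partial results without modifying either one."""
    return left + right


# Hypothetical chunks, standing in for data held on three different servers.
chunks = [
    "sensor reading sensor",
    "click click reading",
    "sensor click",
]

# Map step: each chunk is processed independently, so the calls could run
# on separate machines and in any order.
partial_counts = list(map(count_words, chunks))

# Reduce step: fold the partial results into a single total.
total = reduce(merge_counts, partial_counts, Counter())
print(total)
```

Because count_words shares nothing between calls, no synchronization is needed during the map step. Only the final merge brings results together.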

12
Q

What does it mean for functional programs to be stateless?

A

They have no side effects: a function's result depends only on its inputs, and it does not modify any external state.
This characteristic contributes to their reliability in distributed computing.
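
A small contrast may make this concrete. The function names and the running_total variable below are invented for illustration. The first function has a side effect on shared state, while the second is stateless.

```python
running_total = 0  # shared state, used only by the impure version


def add_with_side_effect(value: int) -> int:
    """Impure: modifies a global, so the result depends on call history."""
    global running_total
    running_total += value
    return running_total


def add_pure(total: int, value: int) -> int:
    """Pure: the same inputs always give the same output."""
    return total + value


first = add_with_side_effect(5)
second = add_with_side_effect(5)  # same argument, different result

pure_first = add_pure(0, 5)
pure_second = add_pure(0, 5)      # same arguments, same result

print(first, second, pure_first, pure_second)
```

If two machines each ran add_with_side_effect, they would have to agree on the value of running_total. With add_pure there is nothing to agree on.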

13
Q

What type of data structures do functional programs use?

A

Immutable data structures.
This means that data cannot be changed once created.
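
Python is not a purely functional language, but a frozen dataclass gives a reasonable approximation of an immutable structure. The Reading class and its fields are hypothetical, chosen only for this sketch.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Reading:
    """An immutable record: fields cannot be reassigned after creation."""
    sensor_id: str
    value: float


original = Reading("s1", 20.5)

# Attempting to change a field raises an error instead of altering the data.
try:
    original.value = 21.0
    mutated = True
except FrozenInstanceError:
    mutated = False

# To "change" a value, build a new object; the original is left untouched.
updated = replace(original, value=21.0)

print(mutated, original, updated)
```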

14
Q

What is a fact-based model in data storage?

A

Each individual piece of data is stored as a fact, which is immutable and can’t be overwritten.
Each fact also includes a timestamp to indicate when the information was stored.

15
Q

What happens when multiple facts for the same item are retrieved?

A

Timestamps are compared, and the most recent fact is returned.
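
The idea can be sketched in a few lines. Everything here is an assumption made for illustration: the Fact class, the item keys, the integer timestamps and the helper names. A real fact store would be far more elaborate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """One immutable fact about an item, with the time it was stored."""
    item: str
    value: str
    timestamp: int  # illustrative: any comparable time value would do


def add_fact(facts: Tuple[Fact, ...], fact: Fact) -> Tuple[Fact, ...]:
    """Append only: return a new collection, never overwrite old facts."""
    return facts + (fact,)


def latest(facts: Tuple[Fact, ...], item: str) -> Optional[Fact]:
    """Compare timestamps and return the most recent fact for an item."""
    matching = [f for f in facts if f.item == item]
    return max(matching, key=lambda f: f.timestamp, default=None)


facts: Tuple[Fact, ...] = ()
facts = add_fact(facts, Fact("alice/city", "Leeds", 100))
facts = add_fact(facts, Fact("alice/city", "York", 250))
facts = add_fact(facts, Fact("bob/city", "Bath", 180))

current = latest(facts, "alice/city")
print(current)
```

The older fact for alice/city is still in the collection, so an incorrect update can be traced or undone rather than silently destroying the previous value.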
This reduces the risk of accidentally losing data due to human error.

16
Q

What does a graph schema represent?

A

It uses a graph of nodes and edges to represent the structure of a dataset visually.
Nodes represent entities and contain properties, while edges represent relationships between entities.
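
A graph schema is normally drawn as a diagram, but the same structure can be written down as plain data. The entities, properties and relationship labels below are invented for the example.

```python
# Nodes: each entity and the properties it contains.
nodes = {
    "customer": {"properties": ["name", "email"]},
    "order": {"properties": ["order_date", "total"]},
    "product": {"properties": ["title", "price"]},
}

# Edges: (source entity, relationship label, target entity).
edges = [
    ("customer", "places", "order"),
    ("order", "contains", "product"),
]


def relationships_from(entity: str) -> list:
    """List the relationships that start at the given entity."""
    return [(label, target) for source, label, target in edges if source == entity]


print(relationships_from("customer"))
```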

17
Q

Are timestamps included in graph schemas?

A

Timestamps are rarely included.
It is assumed that each node contains the most recent information available.