4.11 Big Data Flashcards

Question 1

Q

What is big data?

Answer

A

Big data is the term for data that does not fit the usual containers.
It encompasses data that is too large or complex to be handled by conventional data-processing software.

Question 2

Q

What are the three defining features of big data?

Answer

A

Volume, velocity, variety

Question 3

Q

What does ‘volume’ refer to in big data?

Answer

A

There is too much data to fit on a single conventional hard drive or server.
The data must therefore be stored across multiple servers, each composed of many hard drives.

Question 4

Q

What is meant by ‘velocity’ in the context of big data?

Answer

A

Data on the servers are created and modified rapidly.
Servers must respond to frequently changing data in a matter of milliseconds.

Question 5

Q

What does ‘variety’ mean when discussing big data?

Answer

A

Data held on the servers take many different forms.
E.g. binary files, photos and videos.

Question 6

Q

Why is big data difficult to analyze?

Answer

A

The lack of structure makes it difficult to analyze the data.

Question 7

Q

Why don’t conventional databases scale well for big data?

Answer

A

Conventional databases require data to fit into a row-and-column format, which unstructured big data does not.

Question 8

Q

What techniques must be used to extract useful information from big data?

Answer

A

Machine learning techniques.
These techniques help to discern patterns in the data.

Question 9

Q

Give examples of big data sources.

Answer

A

Data from networked sensors, smartphones, video surveillance, mouse clicks.
These are continuously streamed data sources.

Question 10

Q

What is a challenge when processing data stored across multiple servers?

Answer

A

Data processing must be split across multiple machines.
This is difficult with conventional programming paradigms as machines must be synchronized.

Question 11

Q

How does functional programming help with big data processing?

Answer

A

It makes it easier to write correct and efficient distributed code.
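
A minimal Python sketch of the idea, assuming a word-count task. The sample lines, the chunk size and the helper names `count_words`, `merge` and `split_into_chunks` are illustrative and not part of the flashcards. Each chunk is processed by a function with no side effects, so in principle the chunks could be sent to different machines and the partial results combined in any order.

```python
from collections import Counter
from functools import reduce

def count_words(lines):
    """Count the words in one chunk; reads its argument and changes nothing else."""
    return Counter(word for line in lines for word in line.split())

def merge(a, b):
    """Combine two partial counts into a new Counter, leaving both inputs unchanged."""
    return a + b

def split_into_chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical sample data standing in for a much larger dataset.
lines = ["big data big", "data velocity", "variety volume", "big volume"]

chunks = split_into_chunks(lines, 2)
partial_counts = list(map(count_words, chunks))  # each call is independent of the others
total = reduce(merge, partial_counts, Counter())
```

Here everything runs on one machine; the point is only that nothing in `count_words` or `merge` depends on shared state, which is what makes distributing the work straightforward.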

Question 12

Q

What does it mean for functional programs to be stateless?

Answer

A

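They have no side effects: a function's result depends only on its inputs, and calling it changes nothing outside the function.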
This characteristic contributes to their reliability in distributed computing.
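
As an illustration, the contrast can be sketched in Python. The names `running_total`, `add_impure` and `add_pure` are made up for this example. The first function changes a variable outside itself, so its result depends on what has been called before; the second depends only on its arguments.

```python
running_total = 0

def add_impure(x):
    """Has a side effect: it modifies state that lives outside the function."""
    global running_total
    running_total += x
    return running_total

def add_pure(total, x):
    """No side effects: the same arguments always give the same result."""
    return total + x
```

Calling `add_pure(10, 5)` on any machine, at any time, gives the same answer, whereas the result of `add_impure(5)` depends on the history of earlier calls.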

Question 13

Q

What type of data structures do functional programs use?

Answer

A

Immutable data structures.
This means that data cannot be changed once created.
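
A short Python sketch of immutability, assuming a hypothetical `Reading` record for a sensor value. Python is not a purely functional language, so a frozen dataclass is used here to imitate an immutable structure: an "update" builds a new object and the original stays as it was.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Reading:
    sensor_id: str
    value: float

original = Reading("s1", 20.5)
updated = replace(original, value=21.0)  # creates a new Reading; original is untouched

try:
    original.value = 99.0  # attempting to change a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False
```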

Question 14

Q

What is a fact-based model in data storage?

Answer

A

Each individual piece of data is stored as a fact, which is immutable and can’t be overwritten.
Each fact also includes a timestamp to indicate when the information was stored.

Question 15

Q

What happens when multiple facts for the same item are retrieved?

Answer

A

Timestamps are compared, and the most recent fact is returned.
Because facts are never overwritten, the risk of accidentally losing data due to human error is reduced.
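
The fact-based model can be sketched in Python as follows. The `Fact` class, the helpers `add_fact` and `latest`, and the sample item `alice.city` are assumptions made for illustration. New facts are only ever appended, and a lookup compares timestamps to return the most recent one.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Fact:
    item: str
    value: str
    timestamp: datetime

def add_fact(facts, item, value, timestamp):
    """Return a new tuple with the fact appended; existing facts are never changed."""
    return facts + (Fact(item, value, timestamp),)

def latest(facts, item):
    """Return the most recent fact for an item, or None if there is none."""
    matching = [f for f in facts if f.item == item]
    return max(matching, key=lambda f: f.timestamp) if matching else None

facts = ()
facts = add_fact(facts, "alice.city", "Leeds", datetime(2023, 1, 5))
facts = add_fact(facts, "alice.city", "York", datetime(2024, 3, 1))
```

Both facts remain in the store after the second call, so the earlier value can still be recovered if the later one turns out to be a mistake.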

Question 16

Q

What does a graph schema represent?

Answer

A

It uses a graph of nodes and edges to represent the structure of a dataset.
Nodes represent entities and contain properties, while edges represent relationships between entities.
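
A graph schema can be written down as plain data. The Python sketch below assumes a made-up shop dataset with `customer`, `order` and `product` entities; the property names and the `relationships_from` helper are hypothetical.

```python
# Nodes: each entity and the properties it contains.
nodes = {
    "customer": {"properties": ["name", "email"]},
    "order": {"properties": ["order_date", "total"]},
    "product": {"properties": ["title", "price"]},
}

# Edges: (source entity, relationship label, target entity).
edges = [
    ("customer", "places", "order"),
    ("order", "contains", "product"),
]

def relationships_from(entity):
    """List the (label, target) pairs for edges that start at the given entity."""
    return [(label, target) for source, label, target in edges if source == entity]
```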

Question 17

Q

Are timestamps included in graph schemas?

Answer

A

Timestamps are rarely included.
It is assumed that each node contains the most recent information available.

(17 cards)