11.1 Big Data Flashcards

Question 1

Q

What are the 3 defining features of big data?

Answer

A

Volume, velocity, variety

Question 2

Q

What does volume mean in big data?

Answer

A

There is too much data for it all to fit on a conventional hard drive or even a server. Data has to be stored over multiple servers, each of which is composed of many hard drives.

Question 3

Q

What does velocity mean in big data?

Answer

A

Data on the servers is created and modified rapidly. The servers must respond to frequently changing data within a matter of milliseconds.

Question 4

Q

What does variety mean in big data?

Answer

A

The data held on the servers consists of many different types of data, from binary files to multimedia files.

Question 5

Q

What problems come from the way big data is structured?

Answer

A

Big data is unstructured, which makes it difficult to analyse. Conventional databases are not suited to storing big data because they require the data to conform to a row and column structure, and they do not scale well across multiple servers.

Question 6

Q

How is useful information extracted from big data?

Answer

A

Machine learning techniques are used to discern patterns in the data and so extract useful information.

Question 7

Q

What are some examples of big data?

Answer

A

Continuously monitored banking interactions and data from surveillance systems.

Question 8

Q

Why can’t big data use conventional programming paradigms?

Answer

A

When data is stored over multiple servers, as is the case with big data, the processing associated with using the data must also be split across multiple machines. This would be incredibly difficult with conventional programming paradigms, as the machines would all have to be synchronised to ensure that no data is overwritten or otherwise damaged.

Question 9

Q

What is functional programming?

Answer

A

Functional programming is a paradigm that offers a solution to the problem of processing data over multiple machines. Functions are stateless (meaning that they have no side effects) and make use of immutable data structures. The paradigm also supports higher-order functions.
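
As a minimal sketch of these three ideas, the Python snippet below uses an immutable tuple, a pure function, and the built-in higher-order functions `map`, `filter` and `functools.reduce`. The names `readings_celsius` and `to_fahrenheit` and the sample values are illustrative assumptions, not part of the original card.

```python
from functools import reduce

# Immutable data structure: a tuple cannot be modified after creation.
# (Sample values are made up for illustration.)
readings_celsius = (10, 20, 30, 40)

def to_fahrenheit(celsius):
    """Pure function: the result depends only on the argument and nothing else is changed."""
    return celsius * 9 / 5 + 32

# map and filter are higher-order functions: they take another function as an argument.
readings_fahrenheit = tuple(map(to_fahrenheit, readings_celsius))
warm_readings = tuple(filter(lambda c: c > 20, readings_celsius))

# reduce folds the sequence into a single value without mutating it.
total = reduce(lambda acc, c: acc + c, readings_celsius, 0)
```

Each step builds a new tuple or value and leaves `readings_celsius` exactly as it was.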

Question 10

Q

What are the benefits of functional programming?

Answer

A

It is easier to write correct, efficient, distributed code with functional programming than with procedural programming techniques.
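
One way to see why this helps with distributed code: because a pure function depends only on its input, a dataset can be split into partitions, each partition can be processed independently, and the partial results can be combined afterwards. The sketch below only simulates this on a single machine. The `partition` and `count_large` helpers and the transaction amounts are hypothetical, and a real system would run each partition on a separate server.

```python
from functools import reduce

def partition(data, n):
    """Split a tuple into n slices (hypothetical stand-ins for separate servers)."""
    return tuple(data[i::n] for i in range(n))

def count_large(transactions):
    """Pure function: count amounts over 1000 without touching any shared state."""
    return sum(1 for amount in transactions if amount > 1000)

# Made-up transaction amounts for illustration.
transactions = (250, 4000, 1200, 80, 999, 1001, 15000, 30)

# Process each partition independently, then combine the partial counts.
partial_counts = tuple(map(count_large, partition(transactions, 3)))
combined = reduce(lambda a, b: a + b, partial_counts, 0)

# The combined answer matches processing the whole dataset in one place.
assert combined == count_large(transactions)
```

No partition reads or writes anything owned by another, so the partitions need no synchronisation while they are being processed.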

Question 11

Q

Why is the fact-based model used to represent data?

Answer

A

Because big data doesn’t conform to the row and column format typically used to represent data.

Question 12

Q

How is data represented in the fact-based model?

Answer

A

Each individual piece of information is stored as a fact. Facts are immutable and can’t be overwritten.

Question 13

Q

What does immutable mean?

Answer

A

Immutable data never changes once it has been created.

Question 14

Q

What are timestamps in big data?

Answer

A

They are stored with each fact to indicate the date and time at which the piece of information was recorded.

Question 15

Q

Why are timestamps in big data used?

Answer

A

Because facts are never deleted or overwritten, multiple different values could be held for the same attribute. Timestamps allow a computer to discern which value is the most recent.
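
A minimal sketch of how timestamped, immutable facts might be stored and queried is shown below. The `Fact` class, its field names, the `latest_value` helper and the sample data are all assumptions made for illustration. They are not a standard API.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime

# The dataset is append-only: new facts are added, old ones are never edited.
# (Hypothetical sample data.)
facts = []
facts.append(Fact("user42", "city", "Leeds", datetime(2023, 1, 5, 9, 0)))
facts.append(Fact("user42", "city", "York", datetime(2024, 3, 2, 14, 30)))

def latest_value(facts, entity, attribute):
    """Return the value of the most recent matching fact, or None if there is none."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value
```

Here `latest_value(facts, "user42", "city")` picks the fact with the later timestamp, while the earlier fact stays in the dataset as a record of the old value.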

Question 16

Q

What are the benefits of the fact-based model?

Answer

A

Because the facts are immutable, using the model to store big data reduces the risk of accidentally losing data through human error. The model also removes the need for an index: new data is simply appended to the dataset as it’s created.

Question 17

Q

How does representing big data using graph schema work?

Answer

A

This model uses graphs consisting of nodes and edges to graphically represent the structure of a dataset. Nodes in a graph represent entities and can contain the properties of the entity. Edges are used to represent relationships between entities and are labelled with a brief description of the relationship.
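
The same structure can be sketched in code. The example below is only an illustration: the entity names, properties, relationship labels and the `related` helper are hypothetical, and plain dictionaries and tuples stand in for a real graph database.

```python
# Nodes represent entities and hold their properties (made-up sample data).
nodes = {
    "alice": {"type": "Person", "age": 34},
    "bob": {"type": "Person", "age": 29},
    "acme": {"type": "Company", "sector": "Retail"},
}

# Edges are (source, label, target) triples; the label describes the relationship.
edges = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("alice", "friend_of", "bob"),
]

def related(source, label):
    """Return the targets reached from source along edges with the given label."""
    return [target for s, l, target in edges if s == source and l == label]
```

For instance, `related("alice", "works_for")` follows the labelled edge from the `alice` node to the `acme` node.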

Question 18

Q

Why are timestamps not used in graph schema models?

Answer

A

It is assumed that each node contains the most recent information.

Question 19

Q

Describe the 2 ways in which a graph schema with 3 entities could be drawn.

Answer

A

The graph schema is drawn as 3 circles, one for each entity. The properties of each entity are listed inside the circles. Arrows linking the circles represent the relationships between the nodes.

An alternative is to list each entity’s properties in rectangles joined to the entity with a dashed line. The dashed lines do not represent relationships; they only show that the property belongs to the entity.

(19 cards)