11.1 Big Data Flashcards

1
Q

What are the 3 defining features of big data?

A

Volume, velocity, variety

2
Q

What does volume mean in big data?

A

There is too much data for it all to fit on a conventional hard drive or even a single server. The data has to be stored across multiple servers, each of which is composed of many hard drives.

3
Q

What does velocity mean in big data?

A

Data on the servers is created and modified rapidly. The servers must respond to frequently changing data within a matter of milliseconds.

4
Q

What does variety mean in big data?

A

The data held on the servers consists of many different types, from binary files to multimedia files.

5
Q

What problems come from the way big data is structured?

A

Big data is unstructured, which makes it difficult to analyse. Conventional databases are not suited to storing big data because they require the data to conform to a row and column structure. Conventional databases also do not scale well across multiple servers.

6
Q

How is useful information extracted from big data?

A

Machine learning is used to discern patterns in the data, and useful information is extracted from those patterns.

7
Q

What are some examples of big data?

A

Continuously monitored banking interactions and data from surveillance systems.

8
Q

Why can’t big data use conventional programming paradigms?

A

When data is stored over multiple servers, as is the case with big data, the processing associated with using the data must also be split across multiple machines. This would be incredibly difficult with conventional programming paradigms, as the machines would all have to be synchronised to ensure that no data is overwritten or otherwise damaged.

9
Q

What is functional programming?

A

A programming paradigm that offers a solution to the problem of processing data over multiple machines. Functions are stateless (meaning that they have no side effects) and make use of immutable data structures. The paradigm also supports higher-order functions.
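
The three properties named above can be sketched in a few lines of Python. This is an illustration only: the names `readings`, `double` and `apply_to_all` are hypothetical, and Python is not a purely functional language, so the immutability here comes from choosing a tuple rather than from the language itself.

```python
from functools import reduce

# Hypothetical sample data; a tuple is used because it cannot be modified.
readings = (12, 7, 30, 18)

def double(x):
    # Stateless: the result depends only on the argument and
    # nothing outside the function is changed (no side effects).
    return x * 2

def apply_to_all(func, values):
    # Higher-order function: it takes another function as an argument.
    # It builds a new tuple instead of changing the one passed in.
    return tuple(map(func, values))

doubled = apply_to_all(double, readings)
total = reduce(lambda acc, x: acc + x, doubled, 0)

print(readings)  # the original data is unchanged
print(doubled)
print(total)
```

Because `double` never touches shared data, it could be run on any machine, in any order, without changing the result.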

10
Q

What are the benefits of functional programming?

A

It is easier to write correct, efficient, distributed code than with procedural programming techniques.
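
As a rough sketch of why this is so, the example below splits some data into chunks and processes each chunk independently. It rests on loud assumptions: worker threads stand in for separate machines, and the log lines and the `count_words` function are invented for illustration. The point is only that a function with no side effects needs no synchronisation between workers.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    # No side effects: reads only its argument and shares no state,
    # so workers cannot overwrite each other's data.
    return sum(len(line.split()) for line in chunk)

# Hypothetical log lines, split into chunks as if held on separate servers.
chunks = (
    ("card payment accepted", "login from new device"),
    ("transfer declined",),
    ("camera 4 motion detected", "camera 4 idle"),
)

# Each chunk is processed independently (threads stand in for machines).
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

# The partial results are combined in a final step.
total_words = sum(partial_counts)

print(partial_counts)
print(total_words)
```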

11
Q

Why is the fact-based model used to represent data?

A

Because big data doesn't conform to the row and column format typically used to represent data.

12
Q

How is data represented in the fact-based model?

A

Each individual piece of information is stored as a fact. Facts are immutable and can't be overwritten.

13
Q

What does immutable mean?

A

Immutable data never changes once it has been created.

14
Q

What are timestamps in big data?

A

They are stored with each fact to indicate the date and time at which the piece of information was recorded.

15
Q

Why are timestamps in big data used?

A

Because facts are never deleted or overwritten, several different values can be held for the same attribute. Timestamps allow a computer to discern which value is the most recent.
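
A minimal sketch of this idea, assuming a simple in-memory store: the `Fact` class, the `record` and `latest` helpers, and the customer data are all hypothetical and not part of any particular big data system.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # frozen: a fact cannot be changed once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime

facts = []  # facts are only ever appended, never edited or deleted

def record(entity, attribute, value, timestamp):
    # A new value is stored as a new fact alongside the old ones.
    facts.append(Fact(entity, attribute, value, timestamp))

def latest(entity, attribute):
    # The timestamp decides which of several values is the most recent.
    matches = [f for f in facts
               if f.entity == entity and f.attribute == attribute]
    if not matches:
        return None
    return max(matches, key=lambda f: f.timestamp).value

record("customer42", "address", "1 Old Road", datetime(2020, 1, 5, 9, 0))
record("customer42", "address", "8 New Street", datetime(2023, 6, 1, 14, 30))

print(len(facts))                       # both facts are still held
print(latest("customer42", "address"))  # the value with the newest timestamp
```

The old address is never lost; it is simply outranked by a fact with a later timestamp.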

16
Q

What are the benefits of the fact-based model?

A

Because facts are immutable, using the model to store big data reduces the risk of accidentally losing data through human error. The model also does away with an index for the data and instead appends new data to the dataset as it is created.

17
Q

How does representing big data using graph schema work?

A

This model uses graphs consisting of nodes and edges to represent the structure of a dataset graphically. Nodes represent entities and can contain the properties of the entity. Edges represent relationships between entities and are labelled with a brief description of the relationship.
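
One possible way to hold such a graph in code is sketched below. The entities, properties and relationship labels are invented for illustration, and a dictionary plus a list of tuples is just one simple representation among many.

```python
# Nodes: each entity maps to a dictionary of its properties (hypothetical data).
nodes = {
    "Customer": {"name": "Ada", "age": 36},
    "Account": {"number": "0001", "balance": 250},
    "Branch": {"city": "Leeds"},
}

# Edges: (source entity, relationship label, target entity).
edges = [
    ("Customer", "holds", "Account"),
    ("Account", "managed at", "Branch"),
]

def relationships_from(node):
    # Return the labelled relationships that start at the given node.
    return [(label, target) for source, label, target in edges
            if source == node]

print(nodes["Customer"])
print(relationships_from("Customer"))
print(relationships_from("Branch"))
```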

18
Q

Why are timestamps not used in graph schema models?

A

It is assumed that each node contains the most recent information.

19
Q

What are the 2 ways in which a graph schema model with 3 entities can be drawn?

A

The graph schema is drawn as 3 circles, one per entity. The properties of each entity are listed inside its circle. Arrows linking the circles represent the relationships between the nodes.

An alternative is to write each entity's properties in rectangles that are joined to the entity with dashed lines. The dashed lines do not represent relationships; they only show that the property belongs to the entity.