Big Data Flashcards

1
Q

State the three Vs of big data.

A

Volume
Velocity
Variety

2
Q

Define variety.

A

The data held on the servers takes many different forms, ranging from binary files to photos and videos.

2
Q

Define volume.

A

There is too much data to fit onto a single server. Data must be stored across multiple servers, each containing many hard drives.

3
Q

Define velocity.

A

Data on the servers is created and modified rapidly. The servers must respond to frequently changing data within a matter of milliseconds.

4
Q

What is big data?

A

The large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis.

5
Q

What is the most challenging attribute of big data?

A

A lack of structure. Unstructured data is difficult to analyse, and conventional relational databases are poorly suited to storing it because it doesn’t conform to a row-and-column structure.

6
Q

How is useful information extracted from big data?

A

Machine-learning techniques are used to discern patterns in the data.

7
Q

Which programming paradigm is well suited to processing big data?

A

Functional programming
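
As a rough illustration of why this paradigm suits big data, the sketch below counts words with map and reduce. The sample chunks and the function names (count_words, merge_counts) are assumptions made for this example. Each string stands in for a portion of the data held on a separate server.

```python
from functools import reduce

# Hypothetical sample data: each string stands in for a chunk held on a different server.
chunks = ["big data big", "data velocity", "big variety"]


def count_words(chunk):
    """Pure function: builds a new dict from its input and touches no shared state."""
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(left, right):
    """Pure function: returns a new dict instead of modifying either argument."""
    merged = dict(left)
    for word, n in right.items():
        merged[word] = merged.get(word, 0) + n
    return merged


# Each chunk is mapped independently, so the map step could run in parallel.
partial_counts = list(map(count_words, chunks))

# The reduce step combines the per-chunk results into one total.
total_counts = reduce(merge_counts, partial_counts, {})
print(total_counts)
```

Because neither function changes shared state, the chunks can be processed in any order, or on different machines, without affecting the result.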

8
Q

Which property of data structures in functional programs means that their value doesn’t change after instantiation?

A

Immutability.
An immutable data structure can never change once it has been created.
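
A minimal sketch of this property, using a frozen dataclass from Python's standard library. The Reading class and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Reading:
    """Hypothetical record type; frozen=True forbids assignment to its fields."""
    sensor: str
    value: float


original = Reading(sensor="s1", value=20.5)

# Trying to change a field after creation raises an error.
try:
    original.value = 21.0
    mutated = True
except FrozenInstanceError:
    mutated = False

# An "update" produces a new object and leaves the original untouched.
updated = replace(original, value=21.0)
print(mutated, original.value, updated.value)
```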

9
Q

Describe the fact-based model for representing data.

A

Each individual piece of information is stored as a fact.
Facts are immutable and can’t be overwritten.
Instead, an updated value is stored as a new fact, and each fact carries a timestamp. This allows multiple values to be held for the same attribute.
In the event of a query, the timestamps are compared and the most recent is returned.
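
The model described above can be sketched as an append-only list of timestamped facts. The Fact type, the record and current_value helpers, and the sample values are all assumptions made for illustration, not part of any particular database.

```python
from typing import Any, NamedTuple


class Fact(NamedTuple):
    """One immutable piece of information about an entity (hypothetical layout)."""
    entity: str
    attribute: str
    value: Any
    timestamp: int


# The dataset is only ever appended to; existing facts are never edited or removed.
facts = []


def record(entity, attribute, value, timestamp):
    """Store a new fact alongside the existing ones."""
    facts.append(Fact(entity, attribute, value, timestamp))


def current_value(entity, attribute):
    """Compare timestamps and return the value of the most recent matching fact."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


record("user1", "city", "Leeds", timestamp=1)
record("user1", "city", "York", timestamp=2)  # a later fact, not an overwrite
print(current_value("user1", "city"))
```

Both values for the same attribute remain in the dataset, so the earlier one can still be recovered if the later fact was recorded in error.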

10
Q

State two advantages of using the fact-based model for storing data.

A

It reduces the risk of accidentally losing data due to human error.
New data is simply appended to the dataset as it is created, so an index is not required.

11
Q

What is represented by nodes in a graph schema?

A

Entities

12
Q

How are relationships between entities represented in a graph schema?

A

Edges, drawn as arrows between the nodes.
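
One possible way to hold such a graph in code is a set of nodes plus a list of labelled, directed edges. The entity names, relationship labels, and the related helper below are hypothetical and chosen only for illustration.

```python
# Nodes represent entities (hypothetical examples).
nodes = {"alice", "bob", "acme"}

# Each edge is an arrow from a source node to a target node, labelled with the relationship.
edges = [
    ("alice", "friend_of", "bob"),
    ("alice", "works_at", "acme"),
]


def related(node, relationship):
    """Follow the arrows of the given relationship that leave the given node."""
    return [
        target
        for source, label, target in edges
        if source == node and label == relationship
    ]


print(related("alice", "works_at"))
```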

13
Q

What is an assumption made about a graph schema?

A

Each node contains the most recent information available.
