Big Data Flashcards

1
Q

Define

Big Data

A

A broad term for datasets so large or complex that traditional data processing applications are inadequate, and the data must be stored on multiple servers.

2
Q

Define

Volume

The Three V’s

A

The capacity required to store the data exceeds a single server.

3
Q

Define

Velocity

The Three V’s

A

The data is produced and/or processed at very high speed.

4
Q

Define

Variety

The Three V’s

A

The data is very diverse; it can appear in different types (e.g. text, video, images) and forms (e.g. structured, unstructured, semi-structured).

5
Q

Define

Structured Data

A

Data that can be stored in a traditional system such as a relational database or spreadsheet, because it can be defined using fields and records.

6
Q

Define

Unstructured Data

A

Data that cannot be defined in columns or rows (text documents, PDFs, voice messages, emails). This makes the data difficult to analyse.

7
Q

Identify

Issues With Big Data

A
  • Data sets so large they are difficult to store and analyse.
  • Data is constantly changing, so it is difficult to keep track of changes.
  • Massive storage and processing power required.
  • Specialised software required to manage and extract meaningful info from the data.
  • Data is often unstructured, which makes it very difficult to analyse.
8
Q

Describe

Data Mining

A

The use of a variety of statistical analysis tools to uncover previously unknown patterns in data stored in databases, or relationships among variables.

9
Q

Describe

Predictive Analysis

A

The use of data warehouses and complex algorithms to forecast future events, based on historical trends and calculated probabilities.

10
Q

Describe

Data Warehousing

A

The process of bringing together data from various sources into one place so that meaningful data analysis, such as data mining and predictive analytics, can take place.

11
Q

Describe

Fact-Based Model

A

Used to represent, model, and query data sets at the scale of Big Data. It is similar to entity relationship models used in databases.

12
Q

Define

Fact

Fact-Based Model

A

A piece of data that cannot be decomposed any further, and is forever true. The data:
- Must not include redundant information.
- Must be specific to a particular point in time.
- Cannot be changed or deleted.
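
As an illustration only, a fact could be sketched in Python as a frozen (read-only) record that carries its own timestamp. The names `Fact`, `entity`, `attribute`, `value`, `timestamp`, and `latest` are hypothetical and chosen for this sketch; they are not part of any standard.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Fact:
    entity: str      # what the fact is about
    attribute: str   # the single property it records (no redundant extras)
    value: str       # the recorded value
    timestamp: str   # the point in time at which the fact was true (ISO 8601)

# A change of city is stored as a new fact, never as an update to the old one.
facts = (
    Fact("user:1", "city", "Leeds", "2020-01-01T09:00:00"),
    Fact("user:1", "city", "York", "2023-06-15T14:30:00"),
)

def latest(facts, entity, attribute):
    """Return the most recently recorded value for an entity's attribute."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value
```

Both facts stay true for their own timestamps; a query such as `latest(facts, "user:1", "city")` simply picks the newest one.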

13
Q

Describe

Graph Schema

A

A method of defining the structure of a big dataset as a graph, using the fact-based model.

14
Q

Describe

Node

A

Represents a core entity in a data set. Depicted with an oval.

15
Q

Describe

Edge

A

Represents the relationships between entities (nodes). Depicted using solid lines linking nodes together.

16
Q

Define

Property

A

Defines information about a node. Depicted with a rectangular box.
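
A minimal sketch of how nodes, edges, and properties fit together in a graph schema, using plain Python collections. The example entities (`Customer`, `Product`), the edge label, and the helper `neighbours` are all hypothetical.

```python
# Nodes: the core entities (drawn as ovals).
nodes = {"Customer", "Product"}

# Edges: relationships between nodes (drawn as solid lines).
edges = {("Customer", "purchased", "Product")}

# Properties: information about a node (drawn as rectangular boxes).
properties = {
    "Customer": ["name", "date_of_birth"],
    "Product": ["title", "price"],
}

def neighbours(node):
    """Return the nodes directly linked to `node` by an edge."""
    linked = set()
    for source, _label, target in edges:
        if source == node:
            linked.add(target)
        elif target == node:
            linked.add(source)
    return linked
```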

17
Q

Define

Distributed Processing

A

The principle of dividing processing work between two or more computers, linked together in a network.

18
Q

Define

Functional Programming

A

A programming paradigm that is mainly used for calculations and distributed processing, as the code can be proven correct and can be distributed across multiple devices without fear of erroneous results. Some characteristics are:
- Immutable Data Structures
- Statelessness
- Higher-Order Functions

19
Q

Define

Immutable Data Structure

A

A data structure in which one cannot insert, remove, or replace the values contained therein.
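
For example, a Python tuple behaves this way. The sketch below assumes nothing beyond the standard language; the variable names are illustrative.

```python
scores = (70, 85, 92)          # a tuple cannot be modified in place

try:
    scores[0] = 100            # attempting to replace a value fails
    mutated = True
except TypeError:
    mutated = False

# "Changing" the data means building a new structure; the original is untouched.
updated = scores + (64,)
```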

20
Q

Define

Statelessness

A

A given program does not change its state during execution (data structures don’t change and variables are not used).
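
A small contrast, sketched in Python with hypothetical function names: the first function relies on state that changes between calls, while the second depends only on its argument.

```python
calls = []  # shared state living outside the functions

def stateful_next():
    """Result depends on, and changes, state outside the function."""
    calls.append(1)
    return len(calls)

def stateless_next(n):
    """Result depends only on the argument; nothing outside is changed."""
    return n + 1
```

Calling `stateless_next` with the same input always gives the same output, which is what makes it safe to run on any machine in any order.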

21
Q

Define

Higher-Order Function

A

A function that takes one or more functions as parameters, returns a function as its result, or both.
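
Both cases can be sketched in Python. The names `apply_twice` and `make_multiplier` are made up for this example.

```python
def apply_twice(f, x):
    """Takes a function as a parameter and applies it two times."""
    return f(f(x))

def make_multiplier(factor):
    """Returns a new function as its result."""
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
result = apply_twice(double, 5)   # double(double(5))
```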

22
Q

Define

Map Function

A

A higher-order function which applies a function to each element of a list and returns a new list.
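
A short sketch using Python's built-in `map`; the helper `square` is illustrative.

```python
def square(x):
    return x * x

numbers = [1, 2, 3, 4]

# In Python 3, map returns a lazy iterator, so list() is used to build the new list.
squares = list(map(square, numbers))
```

The original list is left unchanged, and each element is processed independently of the others.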

23
Q

Define

Fold/Reduce Function

A

Applies a combining function recursively over a list, reducing it to a single value.
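
A minimal sketch of a recursive left fold in Python, shown next to the standard library's `functools.reduce`. The function name `fold` and its parameter names are assumptions made for this example.

```python
from functools import reduce

def fold(f, acc, items):
    """Recursive left fold: combine the accumulator with each item in turn."""
    if not items:
        return acc                      # base case: empty list, return the result so far
    return fold(f, f(acc, items[0]), items[1:])

numbers = [1, 2, 3, 4]
total = fold(lambda acc, x: acc + x, 0, numbers)
same_total = reduce(lambda acc, x: acc + x, numbers, 0)
```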