Big Data Flashcards
Define
Big Data
A broad term for datasets so large or complex that traditional data processing applications are inadequate, and the data must be stored on multiple servers.
Define
Volume
The Three V’s
The capacity required to store the data exceeds a single server.
Define
Velocity
The Three V’s
The data is produced and/or processed at very high speed.
Define
Variety
The Three V’s
The data is very diverse; it can appear in different types (e.g. text, video, images) and forms (e.g. structured, unstructured, semi-structured).
Define
Structured Data
Data that can be stored in a traditional system such as a relational database or spreadsheet, because it can be defined using fields and records.
Define
Unstructured Data
Data that cannot be defined in columns or rows (e.g. text documents, PDFs, voice messages, emails). This makes the data difficult to analyse.
Identify
Issues With Big Data
- Data sets so large they are difficult to store and analyse.
- Data is constantly changing, so it is difficult to keep track of changes.
- Massive storage and processing power required.
- Specialised software required to manage and extract meaningful information from the data.
- Data is unstructured, which makes it very difficult to analyse.
Describe
Data Mining
The use of a variety of statistical analysis tools to uncover previously unknown patterns in data stored in databases, or relationships among variables.
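As a small illustration of uncovering a relationship among variables, the sketch below computes a Pearson correlation coefficient. The temperature and sales figures are invented for the example, and correlation is only one of many statistical tools a data mining process might use.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical records: daily temperature (degrees C) and ice creams sold.
temperature = [14, 17, 21, 25, 29]
sales = [110, 145, 178, 222, 260]

# A value close to +1 suggests the two variables rise together.
r = pearson(temperature, sales)
print(f"correlation = {r:.3f}")
```

A coefficient near +1 or -1 hints at a pattern worth investigating; it does not by itself show that one variable causes the other.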
Describe
Predictive Analysis
The use of data warehouses and complex algorithms to forecast future events, based on historical trends and calculated probabilities.
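A minimal sketch of forecasting from a historical trend, assuming the simplest possible model (a straight line fitted by least squares). The monthly order counts are made up, and real predictive analysis would normally use far richer algorithms and probability estimates.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: orders received in months 1 to 5.
months = [1, 2, 3, 4, 5]
orders = [100, 120, 140, 160, 180]

slope, intercept = fit_line(months, orders)

# Extend the fitted trend one month into the future.
forecast = slope * 6 + intercept
print(f"forecast for month 6: {forecast:.0f} orders")
```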
Describe
Data Warehousing
The process of bringing together data from various sources into one place so that meaningful data analysis, such as data mining and predictive analysis, can take place.
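The idea of bringing several sources into one place can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. Both sources, the `sales` table, and its columns are hypothetical; a real warehouse would be far larger and usually distributed.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from an online shop.
web_csv = "customer,amount\nalice,30\nbob,45\n"

# Hypothetical source 2: records from an in-store till system.
store_records = [
    {"customer": "alice", "amount": 20},
    {"customer": "carol", "amount": 15},
]

# One central store that holds data from every source.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (source TEXT, customer TEXT, amount REAL)")

for row in csv.DictReader(io.StringIO(web_csv)):
    warehouse.execute(
        "INSERT INTO sales VALUES (?, ?, ?)",
        ("web", row["customer"], float(row["amount"])),
    )
for rec in store_records:
    warehouse.execute(
        "INSERT INTO sales VALUES (?, ?, ?)",
        ("store", rec["customer"], rec["amount"]),
    )

# Analysis can now span both sources in a single query.
totals = dict(
    warehouse.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)
print(totals)
```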
Describe
Fact-Based Model
Used to represent, model, and query data sets at the scale of Big Data. It is similar to entity relationship models used in databases.
Define
Fact
Fact-Based Model
A piece of data that cannot be decomposed any further, and is forever true. The data:
- Must not include redundant information.
- Must be specific to a particular point in time.
- Cannot be changed or deleted.
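The three properties above can be sketched in code as an immutable, timestamped record kept in an append-only list. The `Fact` class, its field names, and the `record` and `current_value` helpers are illustrative assumptions, not part of any particular Big Data system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: a fact cannot be changed once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime  # each fact is tied to a particular point in time

facts = []  # append-only: new facts are added, old ones are never edited or removed

def record(entity, attribute, value, timestamp):
    """Store a new fact alongside all earlier ones."""
    facts.append(Fact(entity, attribute, value, timestamp))

def current_value(entity, attribute):
    """Return the value from the most recent matching fact."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value

# A user moves city: the old fact stays true for its own point in time.
record("user:1", "city", "Leeds", datetime(2020, 1, 1, tzinfo=timezone.utc))
record("user:1", "city", "York", datetime(2023, 6, 1, tzinfo=timezone.utc))

print(current_value("user:1", "city"))
```

Because each fact carries its own timestamp, a change is captured by adding a newer fact rather than overwriting the older one.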
Describe
Graph Schema
A method of defining the structure of a big dataset as a graph, using the fact-based model.
Describe
Node
Represents a core entity in a data set. Depicted with an oval.
Describe
Edge
Represents a relationship between entities (nodes). Depicted using a solid line linking the nodes together.
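A minimal sketch of how nodes and edges might be held in code, assuming a tiny made-up schema with `Person` and `Product` entities. The entity names, relationship labels, and the `related_entities` helper are all hypothetical.

```python
# Nodes: the core entities in the data set.
nodes = {"Person", "Product"}

# Edges: (source node, relationship, target node).
edges = [
    ("Person", "purchased", "Product"),
    ("Person", "friend_of", "Person"),
]

# Every edge must link two nodes that exist in the schema.
for source, _, target in edges:
    assert source in nodes and target in nodes

def related_entities(node):
    """Entities reachable from `node` by following one outgoing edge."""
    return sorted({target for source, _, target in edges if source == node})

print(related_entities("Person"))
```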