4.11 Big Data Flashcards
What is Big Data?
Big Data is a catch-all term for data that won’t fit the usual containers (its biggest difficulty is that it lacks structure)
What are the three main features of Big Data?
1) Volume: The amount of data is too large to fit on a single server
2) Variety: Data comes in many different forms, such as structured, unstructured, text and multimedia
3) Velocity: Data is streamed and must be responded to within milliseconds to seconds
What is Structured Data?
Structured data is data that can be defined with traditional database techniques, using fields and records
What is Unstructured Data?
Unstructured data is data that cannot be defined in columns and rows, e.g. multimedia files, web pages and the contents of emails
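The difference between the two can be sketched in Python. This is only an illustration: the `Customer` record, its fields and the email text are made-up examples, not part of any real system.

```python
from dataclasses import dataclass

# Structured data: every record has the same named fields.
@dataclass
class Customer:  # hypothetical record type, for illustration only
    customer_id: int
    name: str
    city: str

customers = [
    Customer(1, "Ada", "Leeds"),
    Customer(2, "Grace", "York"),
]

# Because the fields are known, a query is a simple field comparison.
in_york = [c.name for c in customers if c.city == "York"]

# Unstructured data: free text with no fields or records to query.
email_body = "Hi, my parcel arrived late and the box was damaged. Thanks, Grace"

# Without further analysis, a crude text search is about all we can do.
mentions_damage = "damaged" in email_body.lower()
```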
Why is machine learning used with Big Data?
Machine learning is used to analyse unstructured data because it is good at looking at qualitative (non-numerical) data in an automated way
How is Big Data modelled?
Big Data is modelled using a graph schema, which can be created using the graph data type
What are the components of a graph schema?
A graph schema is made up of nodes, properties and edges
1) Nodes: Entities such as a customer, product or picker
2) Properties: Relevant data relating to a node
3) Edges: Show the link and describe the relationship between two nodes
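The three components above can be sketched as a small Python structure. This is a minimal illustration, not a real graph database: the class names, the `ORDERED` and `PICKED` relationship labels and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                      # kind of entity
    properties: dict = field(default_factory=dict)  # data about this node

@dataclass
class Edge:
    source: Node
    relationship: str   # describes how source relates to target
    target: Node

# Hypothetical nodes, each carrying its own properties.
customer = Node("Customer", {"name": "Ada"})
product = Node("Product", {"title": "Kettle"})
picker = Node("Picker", {"name": "Sam"})

# Edges link two nodes and name the relationship between them.
edges = [
    Edge(customer, "ORDERED", product),
    Edge(picker, "PICKED", product),
]

def related(node, relationship):
    """Return the nodes that `node` points to via `relationship`."""
    return [e.target for e in edges
            if e.source is node and e.relationship == relationship]

ordered_titles = [n.properties["title"] for n in related(customer, "ORDERED")]
```

Here `ordered_titles` holds the titles of the products linked to the customer by an `ORDERED` edge.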
What is distributed processing?
Distributed processing is where work is split over several computers; more capacity is gained by adding more servers or workstations
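The idea of splitting work and combining the partial results can be sketched on a single machine. In this sketch each worker thread only stands in for a separate server, which is an assumption made to keep the example runnable; the function names and sample lines are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_words(lines):
    """The work each worker does on its own chunk of the data."""
    return sum(len(line.split()) for line in lines)

lines = ["big data is big", "data arrives fast", "data varies", "more data"]

# Each worker thread stands in for one server (an assumption of this sketch).
chunks = split_into_chunks(lines, n_chunks=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, chunks))

# The partial results are combined into the final answer.
total_words = sum(partial_counts)
```

Adding more workers and more chunks follows the same pattern, which is why this style of processing scales by adding machines.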
What is Functional Programming?
Functional programming uses functions (a subroutine that returns a value) to create programs
What are the benefits of Functional programming?
1) Functional programming doesn’t change the value of variables once they are set (state is immutable), which means functions have no side effects that impact how the rest of the code runs
2) A function’s result relies only on its arguments and isn’t dependent on other variables, which avoids concurrency problems
3) Functional program code can be distributed across multiple servers and used by multiple users
4) It is easier to write ‘correct code’
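A short Python sketch of these ideas follows. Python is not a purely functional language, so this only imitates the style; the `add_vat` function, the 20% rate and the prices in pence are assumptions chosen for illustration.

```python
from functools import reduce

def add_vat(pence):
    """Pure function: the result depends only on the argument."""
    return pence * 120 // 100  # hypothetical 20% rate, integer pence

prices = (1000, 2500, 450)  # a tuple, so it cannot be modified

# map, filter and reduce build new values instead of changing old ones.
with_vat = tuple(map(add_vat, prices))
over_ten_pounds = tuple(filter(lambda p: p > 1000, with_vat))
total = reduce(lambda acc, p: acc + p, over_ten_pounds, 0)
```

Because `add_vat` touches nothing outside itself, each call could run on a different server, in any order, and give the same result.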
Why can’t relational databases be used with Big Data?
Relational databases can’t be used because they require the data to fit into a row and column format
What is a fact-based model?
1) A fact-based model is used to represent, model and query data sets at the scale of Big Data
2) Structured around ‘facts’ instead of entities with attributes
3) Data in a fact-based model can’t be deleted or changed (it is immutable)
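The points above can be sketched as an append-only store in Python. This is a simplified illustration under stated assumptions: the `Fact` fields, the use of a timestamp to pick the latest value, and the helper names are all hypothetical rather than a standard API.

```python
from typing import NamedTuple

class Fact(NamedTuple):  # a NamedTuple cannot be changed once created
    timestamp: int       # when the fact was recorded (assumed field)
    entity: str
    attribute: str
    value: str

facts = []  # facts are only ever appended, never edited or removed

def record(timestamp, entity, attribute, value):
    """Store a new fact alongside all earlier ones."""
    facts.append(Fact(timestamp, entity, attribute, value))

def current_value(entity, attribute):
    """Return the most recently recorded value, or None if there is none."""
    matching = [f for f in facts
                if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value if matching else None

# A customer moves city: the old fact stays and a new fact is added.
record(1, "customer:1", "city", "Leeds")
record(2, "customer:1", "city", "York")
```

Calling `current_value("customer:1", "city")` returns the latest city, while the earlier fact remains in `facts` as history.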