Big Data Flashcards
What is big data?
Data (an ever-growing list of immutable, atomic facts) that is so large that traditional relational databases cannot handle it
What is fact-based modeling?
- Where big data is broken down
- Into facts (the fundamental units)
- Which are then stored in a master dataset
What is a master dataset?
- Contains facts
- Which together create a dataset
- Which is an ever-growing list of immutable, atomic facts
What is the first principle of fact-based modeling?
- Raw data is stored as atomic facts
- Each fact is stored on its own, never two together
What is the second principle of fact-based modeling?
- Each fact captures a single piece of information
What is the third principle of fact-based modeling?
- Facts are immutable and eternally true because each one carries a timestamp
- AKA the timestamp means a fact never needs to be changed once it has been entered
What is the final principle of fact-based modeling?
- Each fact is uniquely identifiable, so queries can detect duplicates
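The four principles above can be sketched in Python. This is only an illustration: the `Fact` class, its field names, the `add_fact` helper and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime

@dataclass(frozen=True)          # frozen=True makes each fact immutable
class Fact:
    entity_id: str               # what the fact is about (hypothetical field)
    attribute: str               # exactly one piece of information per fact
    value: str
    timestamp: datetime          # when the fact became true

master_dataset = []              # append-only list of facts

def add_fact(fact):
    """Append a fact unless an identical one is already stored."""
    if fact not in master_dataset:
        master_dataset.append(fact)

t1 = datetime(2024, 1, 1, 9, 0)
add_fact(Fact("user:1", "city", "Leeds", t1))
add_fact(Fact("user:1", "city", "Leeds", t1))   # duplicate, ignored
# A move is recorded as a new fact; the old fact stays in the dataset.
add_fact(Fact("user:1", "city", "York", datetime(2024, 6, 1, 9, 0)))

try:
    master_dataset[0].value = "Hull"            # attempt to change a fact
except FrozenInstanceError:
    print("facts cannot be modified")
```

Because every fact carries its own timestamp, "user:1 lived in Leeds on 1 January 2024" remains true even after the later York fact is added.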
What are the advantages of fact-based modeling?
- Simplicity: no need for an index
- Data is only ever added (appended, so nothing has to be re-sorted)
- Data is immutable and true
- Easy error correction, as the dataset can be rolled back to earlier good facts
- Historical queries are easier to perform, as facts are timestamped
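A minimal sketch of how timestamps help with historical queries and error correction on an append-only list. Facts are plain tuples here; the sample data and the function name `value_as_of` are invented for illustration.

```python
from datetime import date

# Append-only list of (entity, attribute, value, timestamp) facts.
facts = [
    ("user:1", "city", "Leeds", date(2022, 3, 1)),
    ("user:1", "city", "York", date(2023, 7, 15)),
    ("user:1", "city", "Yrok", date(2024, 2, 1)),   # entered in error
]

def value_as_of(facts, entity, attribute, when):
    """Return the most recent value recorded on or before `when`, or None."""
    matching = [f for f in facts
                if f[0] == entity and f[1] == attribute and f[3] <= when]
    if not matching:
        return None
    return max(matching, key=lambda f: f[3])[2]

# Historical query: where did the user live at the end of 2022?
print(value_as_of(facts, "user:1", "city", date(2022, 12, 31)))      # Leeds

# Error correction: drop the bad fact; the earlier good fact is current again.
corrected = [f for f in facts if f[2] != "Yrok"]
print(value_as_of(corrected, "user:1", "city", date(2024, 12, 31)))  # York
```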
Why can't you use relational databases for big data on social media sites?
- The data is too large and too complex
- Data can have many connections
- Meaning querying would be difficult and time-consuming
What are the components of a graph schema?
How is big data processed?
- Large amounts of data
- Is distributed and processed on different computers
- Meaning that traditional programming methods are problematic
What are the benefits of functional programming that allow for easier debugging?
- Statelessness
- Zero dependency on how often a function is called
- Or what order it's called in
- Allowing for easier understanding, prediction and therefore debugging of code
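Statelessness can be illustrated by comparing a function that relies on outside state with one that does not. The function names below are made up for the example.

```python
# Stateful: the result depends on how many times the function has been called.
counter = 0

def next_total(x):
    global counter
    counter += x
    return counter

# Stateless (pure): the result depends only on the arguments.
def add(total, x):
    return total + x

print(next_total(5), next_total(5))   # 5 10  (same input, different output)
print(add(0, 5), add(0, 5))           # 5 5   (same input, same output)
```

The stateless version can be tested in isolation, because nothing that happened earlier in the program can change its result.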
What are the benefits of functional programming that involve parallelisation?
- Higher-order functions
- Which take one or more functions as inputs
- And/or output a function
- Easy parallelisation, allowing more than one processor to work on parts of a large data set at the same time without changing any other part
- Allowing for large volumes of data to be processed
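A small sketch of both ideas, assuming a toy data set of eight numbers. The helper names are invented, and the worker threads only stand in for separate processors or machines; a real system would distribute the chunks across processes or computers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def square(x):
    return x * x                  # pure: touches nothing outside itself

def apply_to_chunk(func, chunk):
    """Higher-order function: takes a function and applies it to every item."""
    return list(map(func, chunk))

data = list(range(1, 9))
chunks = [data[0:4], data[4:8]]   # split the data set into independent parts

# Each chunk is handled by a separate worker; no worker changes another's data.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda c: apply_to_chunk(square, c), chunks))

# Combine the partial results with another higher-order function, reduce.
total = reduce(lambda a, b: a + b, [sum(r) for r in results])
print(results)   # [[1, 4, 9, 16], [25, 36, 49, 64]]
print(total)     # 204
```

Because `square` has no side effects, the chunks can be processed in any order, or at the same time, and the combined result is the same.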
What is Big Data?
- Extremely large data sets
- Analysed computationally to reveal patterns, trends and associations relating to human behaviour and interactions
- Astronomy and human genome projects have collected so much data that relational databases are impractical
What is the workflow of big data?
- Collect the vast amounts of data
- Analyse the data through data mining
- Discover trends and patterns
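The three steps can be sketched on a toy scale. The list of posts below is invented and stands in for a vast collected data set; simple word counting stands in for real data mining.

```python
from collections import Counter

# 1. Collect: a tiny stand-in for a vast collected data set.
posts = ["rain in leeds", "sunny in york", "rain in york", "rain again"]

# 2. Analyse: count how often each word occurs across all posts.
word_counts = Counter(word for post in posts for word in post.split())

# 3. Discover: report the most frequent words as a simple "trend".
print(word_counts.most_common(2))
```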