Big Data Flashcards

1
Q

What is big data?

A

Data (an ever-growing list of immutable, atomic facts) that is so large that traditional relational databases cannot handle it

2
Q

What is fact-based modeling?

A
  • Big data is broken down
  • Into facts (fundamental units)
  • The facts are then stored in a master dataset
3
Q

What is a master dataset?

A
  • Contains facts
  • Which together form a dataset
  • Which is an ever-growing list of immutable, atomic facts
4
Q

What is the first principle of fact-based modeling?

A
  • Raw data is stored as atomic facts
  • Each fact is stored on its own, never two together
5
Q

What is the second principle of fact-based modeling?

A
  • Each fact captures a single piece of information
6
Q

What is the third principle of fact-based modeling?

A
  • Facts are immutable and eternally true because each one carries a timestamp
  • In other words, the timestamp means a fact never needs to change once it is entered
7
Q

What is the final principle of fact-based modeling?

A
  • Each fact is identifiable, so queries can detect duplicates
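The four principles above can be sketched in a few lines of Python. This is only an illustration: the `Fact` class, its field names and the sample values are assumptions made for the example, not part of the original cards.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen=True makes every instance immutable
class Fact:
    entity_id: str   # hypothetical field: which entity the fact is about
    attribute: str   # the single piece of information being recorded
    value: str
    timestamp: int   # when the fact became true (e.g. Unix seconds)


# Each fact holds one piece of information; name and city are never merged.
facts = [
    Fact("user:1", "name", "Alice", 1700000000),
    Fact("user:1", "city", "Leeds", 1700000000),
    Fact("user:1", "city", "Leeds", 1700000000),  # accidental duplicate
]

# Trying to change a stored fact raises an error instead of altering it.
try:
    facts[0].value = "Bob"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Frozen dataclasses compare and hash by value, so identical facts
# collapse to one entry when placed in a set.
unique_facts = set(facts)
```

Because every field, including the timestamp, takes part in the comparison, two facts count as duplicates only when they say the same thing about the same entity at the same time.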
8
Q

What are the advantages of fact-based modeling?

A
  • Simplicity: no index is needed
  • Data is only added (appended, so nothing is re-sorted)
  • Data is immutable and true
  • Easy error correction, as bad facts can be discarded to return to earlier good facts
  • Historical queries are easier to perform, as facts are timestamped
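As a rough sketch of the append-only and historical-query points, the example below keeps facts in a plain list and answers "what was the value at time T?". The names `master_dataset`, `record` and `value_as_of`, and the sample data, are hypothetical and chosen only for this illustration.

```python
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    entity_id: str
    attribute: str
    value: str
    timestamp: int


# The master dataset is only ever appended to; nothing is updated in place.
master_dataset: List[Fact] = []


def record(fact: Fact) -> None:
    """Append a new fact without touching or re-sorting existing ones."""
    master_dataset.append(fact)


def value_as_of(entity_id: str, attribute: str, when: int) -> Optional[str]:
    """Return the most recent value recorded at or before `when`."""
    candidates = [
        f for f in master_dataset
        if f.entity_id == entity_id
        and f.attribute == attribute
        and f.timestamp <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.timestamp).value


record(Fact("user:1", "city", "Leeds", 100))
record(Fact("user:1", "city", "York", 200))

current = value_as_of("user:1", "city", 250)  # latest fact wins
earlier = value_as_of("user:1", "city", 150)  # historical view
```

The earlier fact is still in the dataset after the move to York is recorded, which is what makes the historical query possible.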
9
Q

Why can't you use relational databases for big data on social media sites?

A
  • The data is too large and too complex
  • Data can have many connections
  • So querying would be difficult and time-consuming
10
Q

What are the components of a graph schema?

A
11
Q

How is big data processed?

A
  • Large amounts of data
  • Are distributed and processed across different computers
  • Meaning that traditional programming methods are problematic
12
Q

What are the benefits of functional programming that allow for easier debugging?

A
  • Statelessness
  • The result does not depend on how often a function is called
  • Or in what order it is called
  • Making code easier to understand, predict and therefore debug
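A small contrast may help here. The sketch below compares a function that relies on shared state with a stateless one; the function names are made up for the example.

```python
# Stateful: the result depends on how many times the function has run before.
counter = 0


def add_to_running_total(x: int) -> int:
    global counter
    counter += x
    return counter


# Stateless: the result depends only on the arguments passed in.
def add(total: int, x: int) -> int:
    return total + x


stateful_first = add_to_running_total(5)   # 5
stateful_second = add_to_running_total(5)  # 10: same call, different answer

first = add(10, 5)   # 15
second = add(10, 5)  # 15: same call, same answer, in any order
```

With the stateless version, a failing call can be reproduced from its inputs alone, without replaying every call that came before it.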
13
Q

What are the benefits of functional programming that involve parallelisation?

A
  • Higher-order functions
  • Which take one or more functions as inputs
  • And/or return a function as output
  • Easy parallelisation, allowing more than one processor to work on parts of a large data set at the same time without changing any other part
  • Allowing large volumes of data to be processed
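The idea can be sketched with Python's built-in higher-order functions. In this example the data set, the chunk size and the helper names are assumptions for illustration. A thread pool is used only to keep the sketch simple; `ProcessPoolExecutor` offers the same `map` method if separate processes are wanted.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(n: int) -> int:
    # Pure function: no shared state, so calls can run in any order.
    return n * n


def chunk_total(chunk):
    # map and reduce are higher-order functions: each takes a function
    # as an input and applies it to the data.
    return reduce(lambda a, b: a + b, map(square, chunk), 0)


data = list(range(1, 101))
# Split the data set into independent parts of 25 items each.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# pool.map is also higher-order: it applies chunk_total to every chunk,
# with up to four workers handling different chunks at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_totals = list(pool.map(chunk_total, chunks))

parallel_total = sum(partial_totals)
sequential_total = sum(square(n) for n in data)
```

Because no chunk modifies another chunk's data, the partial totals add up to the same answer as the sequential calculation.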
14
Q

What is Big Data?

A
  • Extremely large data sets
  • Analysed computationally to reveal patterns, trends and associations relating to human behaviour and interactions
  • Fields such as astronomy and human genome research collect so much data that relational databases are impractical
15
Q

What is the workflow of big data?

A
  • Collect the vast amounts of data
  • Analyse the data through data mining
  • Discover trends and patterns
16
Q

What are the key characteristics of big data?

A
  • Volume - Too big to fit on a single server
  • Velocity - Data arrives and must be processed quickly, with responses in milliseconds or seconds
  • Variety - A wide range of formats the data can take