11: Big Data Flashcards
Characteristics of Big Data (3)
- Volume: too much data to fit on a single server
- Velocity: data streams in continuously and must be processed within milliseconds to seconds
- Variety: data in many forms such as structured, unstructured, text, multimedia
Fact
Each fact in a fact-based model captures a single piece of information and is timestamped
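As a minimal sketch of the idea, a fact can be modelled as an immutable, timestamped record. The `Fact` class, its field names, and the `latest` helper are hypothetical and chosen only for illustration. The sketch assumes that a changed value is recorded as a new fact rather than by overwriting an old one, so the current value is the fact with the most recent timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class Fact:
    entity_id: str       # hypothetical field: which entity the fact is about
    attribute: str       # hypothetical field: which piece of information
    value: str
    timestamp: datetime  # when the fact was recorded


# Two facts about the same attribute; the older one is kept, not overwritten.
facts = [
    Fact("user-1", "city", "Leeds", datetime(2023, 1, 5, tzinfo=timezone.utc)),
    Fact("user-1", "city", "York", datetime(2024, 3, 9, tzinfo=timezone.utc)),
]


def latest(all_facts, entity_id, attribute):
    """Return the value of the most recently timestamped matching fact."""
    matching = [
        f for f in all_facts
        if f.entity_id == entity_id and f.attribute == attribute
    ]
    return max(matching, key=lambda f: f.timestamp).value


current_city = latest(facts, "user-1", "city")
```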
Graph Schema (4)
- Graph schemas are graphs that depict the structure of a data set that is stored using a fact-based model
- Nodes are used to represent the core entities in the data set. They are depicted with ovals
- Edges are used to represent the relationships between nodes. They are depicted with directed or undirected solid lines
- Properties are used to represent information about nodes. They are depicted with rectangles
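The three parts of a graph schema can also be written down as plain data. This is only a sketch: the `Person` and `Company` entities, the `works_for` relationship, and the `describe` helper are invented for illustration and do not come from any particular data set.

```python
# Nodes: the core entities (drawn as ovals in the schema).
nodes = {"Person", "Company"}

# Edges: relationships between nodes, stored as (source, label, target).
edges = [("Person", "works_for", "Company")]

# Properties: information about each node (drawn as rectangles).
properties = {
    "Person": ["name", "date_of_birth"],
    "Company": ["name"],
}


def describe(node):
    """Collect the properties and outgoing edges of one node."""
    outgoing = [(label, target) for source, label, target in edges if source == node]
    return {"properties": properties.get(node, []), "edges": outgoing}


person = describe("Person")
```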
When Data Is Too Big to Fit on a Single Server (2)
- The processing must be distributed across more than one machine
- Functional programming is one solution, because it makes it easier to write correct and efficient distributed code
Features of Functional Programming (3)
- Immutable data structures
- Statelessness
- Higher-order functions
Immutable Data Structures
They cannot be changed once created. This eliminates errors caused by one server overwriting data that another server is using
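A small sketch of immutability in Python, using the built-in `tuple` as the immutable structure. The variable names are illustrative only. Instead of changing the data in place, the code builds a new value and leaves the original untouched.

```python
readings = (3, 1, 2)  # a tuple cannot be modified after creation

try:
    readings[0] = 99  # attempting an in-place change raises TypeError
    changed = True
except TypeError:
    changed = False

# To "update" immutable data, build a new value from the old one.
sorted_readings = tuple(sorted(readings))
```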
Statelessness
Data structures are immutable and values are never reassigned, so the program state does not change during execution. This means that a function called with the same inputs will always produce the same output
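The contrast can be sketched with two hypothetical functions. `add_to_total` depends on a shared variable, so the same call gives different answers over time, while `add` depends only on its arguments.

```python
total = 0  # shared state that changes during execution


def add_to_total(x):
    """Stateful: the result depends on earlier calls."""
    global total
    total += x
    return total


def add(running_total, x):
    """Stateless: the result depends only on the arguments."""
    return running_total + x


first = add_to_total(5)   # 5
second = add_to_total(5)  # 10: same input, different output

pure_first = add(0, 5)    # 5
pure_second = add(0, 5)   # 5: same input, same output
```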
Higher-Order Functions
They take other functions as parameters, return a function as their result, or both. Because the functions passed in have no side effects, they can be applied to different parts of a data set in parallel without affecting each other
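A brief sketch of both kinds of higher-order function in Python. `make_multiplier` is a hypothetical helper that returns a function, while the built-in `map` and `filter` and `functools.reduce` each take a function as a parameter. The example runs sequentially; it only illustrates why independent, side-effect-free steps like these are suitable for splitting across machines.

```python
from functools import reduce


def make_multiplier(factor):
    """Return a new function that multiplies its input by factor."""
    def multiply(x):
        return x * factor
    return multiply


double = make_multiplier(2)  # a function returned as a result
values = [1, 2, 3, 4]

# map applies the function to each item independently of the others.
doubled = list(map(double, values))

# filter keeps only the items for which the function returns True.
multiples_of_four = list(filter(lambda x: x % 4 == 0, doubled))

# reduce combines the items into a single value.
total = reduce(lambda acc, x: acc + x, doubled, 0)
```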