Big Data Flashcards
What is Big Data?
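Big data refers to datasets that are too large, too fast-changing, or too varied to be stored and processed with conventional tools such as a single server or a traditional database.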
It is characterized by the three Vs: Volume, Velocity, and Variety.
What are the three Vs of Big Data?
Volume refers to the sheer amount of data, often more than a conventional hard drive or server can hold.
Velocity refers to the speed at which data is created and modified.
Variety refers to the many different types of data, ranging from binary files to multimedia files such as photos and videos.
Why is the structure of Big Data challenging?
Much big data is unstructured or semi-structured, which makes it difficult to analyze.
Conventional relational databases are poorly suited to storing it because they require data to conform to a fixed row-and-column structure.
What techniques are used to extract useful information from Big Data?
Machine learning techniques are used to discern patterns in the data.
What is Functional Programming?
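Functional programming is a programming paradigm in which computation is expressed as the evaluation of functions without side effects. It suits processing data across multiple machines, because such functions can safely run in parallel without coordinating over shared state.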
Functional programs are stateless and make use of immutable data structures.
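A minimal sketch of why this helps with distributed processing, assuming a word count over a dataset that has been split into chunks (the chunks and function names are hypothetical stand-ins for data held on separate machines). Because `count_words` and `merge` are pure, each chunk could be processed anywhere, in any order, and the partial results combined afterwards.

```python
from collections import Counter
from functools import reduce


def count_words(chunk: str) -> Counter:
    """Pure function: the result depends only on the input; nothing shared is modified."""
    return Counter(chunk.split())


def merge(a: Counter, b: Counter) -> Counter:
    """Pure merge: returns a new Counter instead of mutating either argument."""
    return a + b


# Each chunk stands in for a partition stored on a different machine (hypothetical).
chunks = ["big data big", "data fast", "big variety"]

partials = list(map(count_words, chunks))       # could be run in parallel
total = reduce(merge, partials, Counter())      # combine the partial counts

print(total["big"])   # 3
print(total["data"])  # 2
```

Here `map` runs sequentially, but since no step depends on hidden state it could be swapped for a parallel or distributed map without changing the result.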
What is the fact-based model for representing data?
In the fact-based model, each piece of information is stored as a fact.
Facts are immutable and can’t be overwritten.
Each fact is stored with a timestamp indicating the date and time at which the information was recorded.
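A minimal sketch of a fact-based store, assuming each fact is an (entity, attribute, value, timestamp) record; the class, field and function names are hypothetical. Facts are frozen so they cannot be edited, new information is appended, and the "current" value is simply the one with the latest timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One atomic, immutable piece of information (hypothetical field names)."""
    entity: str
    attribute: str
    value: str
    timestamp: datetime


# The dataset is append-only: facts are added, never overwritten.
facts: list[Fact] = []


def record(entity: str, attribute: str, value: str, timestamp: datetime) -> None:
    facts.append(Fact(entity, attribute, value, timestamp))


def current_value(entity: str, attribute: str) -> Optional[str]:
    """Return the value of the most recently timestamped matching fact."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


record("alice", "city", "London", datetime(2020, 1, 1, tzinfo=timezone.utc))
record("alice", "city", "Paris", datetime(2023, 6, 1, tzinfo=timezone.utc))

print(current_value("alice", "city"))  # Paris
print(len(facts))                      # 2: the London fact is still stored
```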
What is a graph schema?
A graph schema represents the structure of a dataset as a graph made up of nodes and edges.
Nodes represent entities and can hold each entity's properties.
Edges are used to represent relationships between entities.
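A minimal sketch of a graph schema in code, assuming a small dataset of people and pages; the entity kinds, property names and relation names (`friend_of`, `viewed`) are hypothetical. Properties live on the nodes, and relationships are stored as edges between node ids.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity; its properties are stored on the node."""
    node_id: str
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship from one entity to another."""
    source: str
    target: str
    relation: str


nodes = {
    "alice": Node("alice", "Person", {"age": 30}),
    "bob": Node("bob", "Person", {"age": 25}),
    "page1": Node("page1", "Page", {"title": "Home"}),
}

edges = [
    Edge("alice", "bob", "friend_of"),
    Edge("alice", "page1", "viewed"),
]


def neighbours(node_id: str, relation: str) -> list[str]:
    """Follow outgoing edges of one relation type from a node."""
    return [e.target for e in edges if e.source == node_id and e.relation == relation]


print(neighbours("alice", "friend_of"))  # ['bob']
print(nodes["page1"].properties)         # {'title': 'Home'}
```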
What does stateless mean in the context of programming?
Stateless means that a system or process does not store any information about past requests or sessions.
Each request is processed independently, without any knowledge of previous requests.
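A small illustration of the difference, using a hypothetical shopping-cart example. The stateful version remembers earlier calls, so its result depends on call history; the stateless version receives everything it needs with each call, so the same input always gives the same output.

```python
class StatefulCart:
    """Stateful: stores items between calls, so results depend on history."""

    def __init__(self) -> None:
        self.items: list[float] = []

    def add(self, price: float) -> float:
        self.items.append(price)
        return sum(self.items)


def cart_total(prices: tuple[float, ...]) -> float:
    """Stateless: each call carries all the data it needs and stores nothing."""
    return sum(prices)


cart = StatefulCart()
cart.add(10.0)
print(cart.add(5.0))            # 15.0, depends on the earlier call
print(cart_total((10.0, 5.0)))  # 15.0, depends only on this call's input
print(cart_total((10.0, 5.0)))  # 15.0 again: same input, same output
```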
What does immutable mean in the context of programming?
Immutable means that once a value or data structure has been created, it cannot be changed.
Any operation that appears to modify it actually creates a new value, leaving the original intact.
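Python's built-in tuple is one example of an immutable data structure. "Adding" an element builds a new tuple, and attempting an in-place change raises an error.

```python
original = (1, 2, 3)       # tuples are immutable
updated = original + (4,)  # "modifying" actually builds a new tuple

print(original)             # (1, 2, 3)
print(updated)              # (1, 2, 3, 4)
print(original is updated)  # False: two distinct objects

try:
    original[0] = 99        # in-place change is rejected
except TypeError as err:
    print("TypeError:", err)
```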
What are the advantages of fact-based modelling?
Simple (no indexing)
Historical queries are easy to run.
New items are simply appended to the dataset.
Errors are easy to correct through rollback.
Data is true forever, because each fact records what was known at a specific timestamp.
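A minimal sketch of several of these advantages, assuming facts are (entity, attribute, value, date) records held in a plain list; all names are hypothetical. A historical query just ignores facts recorded after a given date, new data is appended, and one possible way to correct a mistake is to remove only the erroneous fact, leaving the rest of the history unchanged.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """An immutable fact; field names are hypothetical."""
    entity: str
    attribute: str
    value: str
    recorded: date  # when the information was recorded


facts = [
    Fact("alice", "city", "London", date(2020, 1, 1)),
    Fact("alice", "city", "Paris", date(2023, 6, 1)),
]


def value_as_of(entity: str, attribute: str, as_of: date) -> Optional[str]:
    """Historical query: the latest value recorded on or before `as_of`."""
    known = [
        f for f in facts
        if f.entity == entity and f.attribute == attribute and f.recorded <= as_of
    ]
    if not known:
        return None
    return max(known, key=lambda f: f.recorded).value


print(value_as_of("alice", "city", date(2021, 1, 1)))  # London
print(value_as_of("alice", "city", date(2024, 1, 1)))  # Paris

# New information is appended; existing facts are never overwritten.
mistake = Fact("alice", "city", "Berlin", date(2024, 3, 1))
facts.append(mistake)
print(value_as_of("alice", "city", date(2024, 6, 1)))  # Berlin

# Correcting an error: drop only the bad fact; all other history is untouched.
facts = [f for f in facts if f != mistake]
print(value_as_of("alice", "city", date(2024, 6, 1)))  # Paris
```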
What are the principles of fact-based modelling?
Raw data is stored as atomic facts
Each fact captures a single piece of information
Facts are immutable and, because each one carries a timestamp, eternally true
Each fact is made uniquely identifiable so that query processing can detect duplicates
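A minimal sketch of how identifiable facts make duplicates detectable, assuming page-view facts that carry a random `nonce` to tell genuinely separate events apart (the `PageView` type and its fields are hypothetical). Two records with identical fields are treated as the same fact delivered twice, while a different nonce marks a distinct event even within the same second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """An atomic fact: one page view (hypothetical fields)."""
    user: str
    url: str
    timestamp: int  # seconds since the Unix epoch
    nonce: int      # random id distinguishing separate events


raw = [
    PageView("alice", "/home", 1700000000, 42),
    PageView("alice", "/home", 1700000000, 42),  # same event delivered twice
    PageView("alice", "/home", 1700000000, 77),  # separate view in the same second
]

# Frozen dataclasses are hashable, so dict.fromkeys drops exact duplicates
# while keeping the first-seen order.
unique = list(dict.fromkeys(raw))
print(len(unique))  # 2
```

Without the nonce, the two genuine views in the same second would be indistinguishable from a single duplicated one.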