01 - Explore Core Data Concepts Flashcards
What is Data
A collection of facts, numbers, descriptions, and objects, stored in a structured, semi-structured, or unstructured way
Online Transactional Processing (OLTP)
Data is stored one transaction at a time
Online Analytical Processing (OLAP)
Data is periodically loaded, aggregated and stored in a cube
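The aggregation step can be sketched in plain Python. This is only an illustration of the idea, not how an OLAP engine is built: the sales rows, the region/product dimensions, and the "ALL" subtotal label are hypothetical. The sketch pre-computes a total for every combination of dimension values, including subtotals, which is what storing aggregated data in a cube amounts to.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, amount)
sales = [
    ("East", "Bikes", 100),
    ("East", "Helmets", 20),
    ("West", "Bikes", 150),
    ("West", "Bikes", 50),
]

def build_cube(rows):
    """Pre-aggregate amounts for every combination of dimension values.

    "ALL" marks a subtotal across that dimension (hypothetical label).
    """
    cube = defaultdict(int)
    for region, prod, amount in rows:
        # Each row contributes to 4 cells: (r, p), (r, ALL), (ALL, p), (ALL, ALL)
        for r, p in product((region, "ALL"), (prod, "ALL")):
            cube[(r, p)] += amount
    return dict(cube)

cube = build_cube(sales)
print(cube[("West", "Bikes")])  # 200
print(cube[("ALL", "ALL")])     # 320
```

Analytical queries then read a pre-computed cell instead of re-scanning the raw rows.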
Transactional Workloads
Transactional systems are often required to support ACID semantics:
Atomicity
Consistency
Isolation
Durability
What is Atomicity
Each transaction is treated as a single unit, which either succeeds completely or fails completely
What is Consistency
Transactions can only take the data in the database from one valid state to another
What is Isolation
Concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
What is Durability
Once a transaction has been committed, it will remain committed, even in the event of a system failure such as a power outage or crash
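Atomicity can be seen with Python's built-in sqlite3 module and an in-memory database. This is a minimal sketch: the accounts table, its CHECK constraint, and the transfer function are hypothetical. The credit deliberately runs before the debit, so when the debit violates the constraint, the credit that was already applied must be rolled back as part of the same unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # CHECK constraint failed; the earlier credit was rolled back as well
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```

The failed transfer leaves no partial update behind, which is the "fails completely" half of atomicity.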
What are Analytical Workloads
Used for data analysis and decision making
- Summaries
- Trends
- Business information
What is Data Processing
Convert raw data into meaningful information
What is Batch Processing
Data elements are collected into a group. The whole group is then processed at a future time as a batch.
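A minimal sketch of batch processing, assuming batches are triggered by size (a fixed count of items) rather than by a schedule; the readings and the process_batch step are hypothetical. Nothing is processed until a whole group has been collected.

```python
from typing import Iterable, Iterator, List

def batches(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Collect incoming items into groups of `size` (the last group may be smaller)."""
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def process_batch(batch: List[int]) -> int:
    # Hypothetical processing step: operate on the whole group at once
    return sum(batch)

readings = [5, 3, 8, 1, 4]
results = [process_batch(b) for b in batches(readings, size=2)]
print(results)  # [8, 9, 4]
```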
What is Stream Processing
Each new piece of data is processed when it arrives
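By contrast, a stream processor handles each value the moment it arrives and updates its result incrementally. In this sketch a Python generator stands in for the stream, and the running-total computation is a hypothetical example.

```python
from typing import Iterable, Iterator

def running_totals(stream: Iterable[int]) -> Iterator[int]:
    """Process each value as it arrives, emitting an updated result every time."""
    total = 0
    for value in stream:
        total += value  # state is updated per event, not per group
        yield total

readings = iter([5, 3, 8, 1, 4])  # stands in for an unbounded source
for result in running_totals(readings):
    print(result)  # prints 5, 8, 16, 17, 21 on separate lines
```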
Online Transaction Processing (OLTP)
For example, order systems that perform many small transactional updates
Data Warehousing
Large amount of
fill in
Tables
Data is stored in a table
Table consists of rows and columns
All rows have the same number of columns
Each column is defined by a data type
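A table definition can be sketched with Python's built-in sqlite3 module; the products table and its rows are hypothetical. Note that SQLite applies column types loosely (type affinity) unless a table is declared STRICT, so this shows the shape of a table more than strict type enforcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each column is declared with a data type; every row has the same columns.
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT    NOT NULL,
        price REAL    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "Bike", 299.99), (2, "Helmet", 34.5)],
)

rows = conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()
print(rows)  # [(1, 'Bike', 299.99), (2, 'Helmet', 34.5)]

# Column names come from the table definition, not from each row
columns = [d[0] for d in conn.execute("SELECT * FROM products").description]
print(columns)  # ['id', 'name', 'price']
```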
Entity
Representation of an item which can be physical (such as a customer or a product), or virtual (such as an order).
Entities are connected by relationships
fill in
What is Normalization
Data is normalized to
Reduce storage
Avoid data duplication
Improve data quality
Normalized database schema
Primary keys and foreign keys are used to define relationships
No data duplication exists (other than key values in 3rd Normal Form (3NF))
Data is retrieved by joining tables together in a query
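A normalized two-table schema can be sketched with Python's sqlite3 module; the customers/orders tables and their data are hypothetical. Each customer's name is stored once, orders repeat only the key value, and a join reassembles the full picture. SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    -- Names are stored once; orders repeat only the customer_id key
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

query = """
    SELECT o.order_id, c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(10, 'Ana', 25.0), (11, 'Ana', 40.0), (12, 'Ben', 15.0)]

# The foreign key rejects an order for a customer that does not exist
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```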
Relational Database
Type of DB that uses the relational data model
SQL
Structured Query Language
Index
Optimizes search queries for faster data retrieval
Reduces the number of data pages that need to be read to retrieve the data for a SQL statement
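The effect of an index can be observed with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The customers table, its 10,000 generated rows, and the index name are hypothetical, and the exact plan wording varies between SQLite versions, so the sketch only checks whether the index name appears in the plan. It shows the planner switching from a full scan to an index lookup; it does not measure page reads directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

query = "SELECT name FROM customers WHERE email = ?"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("x",))]

before = plan(query)  # typically a full SCAN of customers
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
after = plan(query)   # typically a SEARCH using idx_customers_email

print(before, after, sep="\n")
```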
View
A view is a virtual table based on the result set of a query
Views are created to simplify queries
Combine relational data from multiple tables into a single view
Restrict access to a table's confidential data while still allowing users to access non-confidential data
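A view can be sketched with Python's sqlite3 module; the employees table and the employee_directory view are hypothetical. The view stores a query that exposes only non-confidential columns. SQLite has no user permissions, so the access-restriction half is an assumption about a server database, where users would be granted SELECT on the view but not on the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL  -- confidential column
    );
    INSERT INTO employees VALUES (1, 'Ana', 'Sales', 70000), (2, 'Ben', 'IT', 80000);

    -- The view is a stored query that leaves out the salary column
    CREATE VIEW employee_directory AS
        SELECT name, dept FROM employees;
""")

rows = conn.execute("SELECT * FROM employee_directory ORDER BY name").fetchall()
print(rows)  # [('Ana', 'Sales'), ('Ben', 'IT')]
```

Because a view is virtual, it always reflects the current contents of the table it is based on.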
Non-relational collections
Can hold multiple entities in the same collection or container, each with different fields
Have a non-tabular schema, unlike relational tables
Often define entities by labeling each field with the name it represents
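A non-relational collection can be approximated with a list of Python dictionaries, each acting as a document. The customer entities here are hypothetical. Both share one container, but each carries its own labeled fields, so a field missing from one entity needs a default when read.

```python
# Hypothetical "collection": entities stored as documents (dicts) in one container
customers = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "phone": "555-0100", "loyalty": {"tier": "gold"}},
]

def field_names(collection):
    """Return the sorted field names used by each entity."""
    return [sorted(doc) for doc in collection]

print(field_names(customers))
# [['email', 'id', 'name'], ['id', 'loyalty', 'name', 'phone']]

# Unlike a table column, a field may be absent, so reads supply a default
emails = [doc.get("email", "<none>") for doc in customers]
print(emails)  # ['ana@example.com', '<none>']
```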
What is semi-structured data
The structure is defined within the data itself by fields. Formats/file types include:
JSON
Avro
ORC
Parquet
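JSON is the easiest of these formats to show with the Python standard library; Avro, ORC, and Parquet need third-party libraries and are not shown. In this sketch the record is hypothetical. The field names travel with the data, so the structure can be read directly from the text.

```python
import json

# Hypothetical JSON record: the field names are embedded in the data itself
raw = '{"id": 1, "name": "Ana", "orders": [{"sku": "BK-1", "qty": 2}]}'

record = json.loads(raw)           # parse text into dicts and lists
print(sorted(record))              # ['id', 'name', 'orders']
print(record["orders"][0]["qty"])  # 2

# Serializing back keeps the labels, so the structure stays self-describing
print(json.dumps(record, sort_keys=True))
# {"id": 1, "name": "Ana", "orders": [{"qty": 2, "sku": "BK-1"}]}
```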