Data Engineering Fundamentals - Data Sources Flashcards
_____________ Java Database Connectivity. Platform independent, language dependent.
a) ODBC
b) JDBC
b) JDBC
___________________ Open Database Connectivity.
Platform dependent, language independent.
a) ODBC
b) JDBC
ODBC
Text-based format that represents data in tabular form, where each line corresponds to a row and values within a row are separated by commas.
a) CSV
b) Extended events file
CSV
For small to medium datasets.
For data interchange between systems with different technologies.
For human-readable and editable data storage.
Importing/Exporting data from databases or spreadsheets.
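Example:
A minimal sketch of writing and reading CSV with Python's standard csv module. The rows and column names are hypothetical and only there for illustration.

```python
import csv
import io

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
]

# Write the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "city"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back: each line becomes one row, split on commas.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that CSV carries no type information, so every value read back is a string (the id 1 returns as "1").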
Lightweight, text-based, human-readable data interchange format that represents structured or semi-structured data as key-value pairs.
a) CSV
b) JSON
JSON
Data interchange between a web server and a web client.
Configurations and settings for software applications.
Use cases that need a flexible schema of nested data structures.
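Example:
A minimal sketch of a nested configuration round-tripped through JSON with Python's standard json module. The setting names and values are hypothetical.

```python
import json

# Hypothetical application settings with a nested structure.
config = {
    "service": "ingest",
    "retries": 3,
    "database": {"host": "localhost", "port": 5432},
    "tags": ["raw", "daily"],
}

# Serialise to human-readable text, then parse it back.
text = json.dumps(config, indent=2)
restored = json.loads(text)

# Nested values are reached by chaining keys.
port = restored["database"]["port"]
```

Unlike CSV, JSON keeps nesting and basic types such as numbers, strings, lists and objects.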
____________________ Columnar storage format optimised for analytics. Allows for efficient compression and encoding schemes.
a) CSV
b) JSON
c) Parquet
Parquet
When to use it:
- Analysing large datasets with analytics engines.
- Use cases where reading specific columns instead of entire records is beneficial.
- Storing data on distributed systems where I/O operations and storage need optimisation.
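Example:
A pure-Python sketch of the columnar idea. This is not the Parquet file format or any Parquet library API. It only illustrates why storing values column by column helps with reading single columns and with compressing repeated values. The records and the run_length_encode helper are hypothetical.

```python
from itertools import groupby

# Hypothetical records, stored row by row as a CSV or JSON file would hold them.
records = [
    {"country": "UK", "year": 2023, "sales": 10},
    {"country": "UK", "year": 2023, "sales": 12},
    {"country": "US", "year": 2023, "sales": 7},
    {"country": "US", "year": 2024, "sales": 9},
]

# Columnar layout: one list per column instead of one dict per row.
columns = {name: [row[name] for row in records] for name in records[0]}

# Reading one column touches only that list, not every full record.
total_sales = sum(columns["sales"])


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values in one column are often repetitive, so they compress well.
encoded_country = run_length_encode(columns["country"])
```

Here the four country values collapse into two (value, count) pairs, which is the kind of saving a columnar layout makes possible.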
What does a star schema consist of:
1) _____ tables
2) D_______sions
3) P______/F______ Keys
Fact tables
Dimensions
Primary/Foreign Keys
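Example:
A small sketch of a star schema using Python's built-in sqlite3 module. The table names, column names and rows are hypothetical. The fact table holds the measures and points at each dimension through foreign keys that reference the dimensions' primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one primary key each.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year INTEGER
);
-- Fact table: measures plus foreign keys into the dimensions.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10.0),
    (2, 1, 2, 15.0),
    (3, 2, 2, 20.0);
""")

# Join the fact table to a dimension to aggregate by a descriptive attribute.
totals = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```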
________________ a visual representation that traces the flow and transformation of data through its lifecycle, from its source to its final destination.
a) Fact tables
b) CSV
c) Data Lineage
c) Data Lineage
Helps in tracking errors back to their source.
Ensures compliance with regulations.
Provides a clear understanding of how data is moved, transformed and consumed within systems.
Example:
Uses the Spline agent for Spark, attached to Glue.
Dumps lineage data into Neptune via Lambda.
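A minimal sketch of tracing lineage back to its sources, modelled as a plain Python dictionary. This is not how Spline or Neptune store or query lineage. The dataset names and the upstream_sources helper are hypothetical.

```python
# Each dataset maps to the upstream datasets it was derived from.
# All names are hypothetical.
lineage = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["sales_raw", "currency_rates"],
    "sales_raw": [],
    "currency_rates": [],
}


def upstream_sources(dataset, graph):
    """Return every dataset that feeds into `dataset`, directly or indirectly."""
    seen = set()
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


# If sales_report looks wrong, these are the places to check.
sources = upstream_sources("sales_report", lineage)
```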
_____________ Evolution.
The ability to adapt and change the schema of a dataset over time without disrupting existing processes or systems.
a) Export
b) Transform
c) Schema
Schema Evolution
Ensures data systems can adapt to changing business requirements.
Allows for the addition, removal or modification of columns/fields in a dataset.
Maintains backward compatibility with older records.
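Example:
A minimal sketch of backward compatibility, assuming a hypothetical second schema version that adds an email field with a default value. The field names and the read_record helper are illustrative and are not part of any AWS API.

```python
# Version 2 of a hypothetical schema adds an "email" field with a default.
SCHEMA_V2_DEFAULTS = {"id": None, "name": None, "email": "unknown"}


def read_record(raw, defaults=SCHEMA_V2_DEFAULTS):
    """Fill fields missing from older records; ignore fields not in the schema."""
    return {field: raw.get(field, default) for field, default in defaults.items()}


old_record = {"id": 1, "name": "Ada"}  # written before "email" existed
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}

# Both versions can be read by the same consumer.
upgraded = [read_record(record) for record in (old_record, new_record)]
```

Because the new field has a default, records written under the old schema stay readable and existing processes are not disrupted.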
What AWS service would you use:
AWS Glue Schema Registry