Data Engineering Fundamentals - Data Sources Flashcards

1
Q

_____________ Java Database Connectivity. Platform independent, language dependent.

a) ODBC
b) JDBC

A

b) JDBC

2
Q

___________________ Open Database Connectivity
Platform dependent, language independent.

a) ODBC
b) JDBC

A

a) ODBC

3
Q

Text-based format that represents data in tabular form, where each line corresponds to a row and the values within a row are separated by commas.

a) CSV
b) Extended events file

A

CSV

For small to medium datasets.
For data interchange between systems with different technologies.
For human-readable and editable data storage.
Importing/Exporting data from databases or spreadsheets.
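
A minimal sketch of the CSV layout described above, using Python's standard `csv` module. The sample rows and field names are made up for illustration. It shows one line per row, comma-separated values, and how a value that itself contains a comma gets quoted.

```python
import csv
import io

# Hypothetical sample rows; the second city contains a comma on purpose.
rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York, NY"},
]

# Write the rows as CSV text: a header line, then one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "city"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back. CSV carries no type information,
# so every value comes back as a string.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that `parsed[0]["id"]` is the string `"1"`, not the integer `1`: CSV is easy to read and edit, but types have to be restored by whoever loads the file.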

4
Q

Lightweight, text-based, human-readable data interchange format that represents structured or semi-structured data as key-value pairs.
a) CSV
b) JSON

A

JSON

Data interchange between a web server and a web client.

Configurations and settings for software applications.

Use cases that need a flexible schema of nested data structures.
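
A small sketch of the key-value and nested structure described above, using Python's standard `json` module. The configuration content is hypothetical. It shows that nesting and basic types survive a round trip through JSON text.

```python
import json

# Hypothetical application settings with nested objects and a list.
config = {
    "service": "ingest",
    "retries": 3,
    "sources": [
        {"name": "orders", "format": "csv"},
        {"name": "events", "format": "json", "options": {"multiline": True}},
    ],
}

# Serialise to human-readable text, then parse it back.
text = json.dumps(config, indent=2)
restored = json.loads(text)

# Nested values are reached by chaining keys and list indexes.
multiline = restored["sources"][1]["options"]["multiline"]
```

The two entries in `sources` do not share the same keys, which is the kind of flexible schema the last use case refers to.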

5
Q

____________________ Columnar storage format optimised for analytics. Allows for efficient compression and encoding schemes.

a) CSV
b) JSON
c) Parquet

A

Parquet

When to use it:
- Analysing large datasets with analytics engines.
- Use cases where reading specific columns instead of entire records is beneficial.
- Storing data on distributed systems where I/O operations and storage need optimisation.
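
A toy illustration of why a columnar layout helps with the points above. This is plain Python, not the Parquet format or any Parquet library, and the records are made up. It shows that reading one column touches only that column's values, and that repeated values in a column compress well (run-length encoding is used here as one example of such a scheme).

```python
from itertools import groupby

# Hypothetical row-oriented records.
records = [
    {"order_id": 1, "country": "UK", "amount": 10.0},
    {"order_id": 2, "country": "UK", "amount": 12.5},
    {"order_id": 3, "country": "UK", "amount": 7.5},
    {"order_id": 4, "country": "US", "amount": 20.0},
]

# Pivot to a column-oriented layout: one list of values per field.
columns = {key: [record[key] for record in records] for key in records[0]}

# An aggregate over one column only needs that column's list.
total_amount = sum(columns["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values of the same field stored together often repeat, so they encode compactly.
encoded_country = run_length_encode(columns["country"])
```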

6
Q

What does a star schema consist of?
1) _____ tables
2) D_______sions
3) P______/F______ Keys

A

Fact tables
Dimensions
Primary/Foreign Keys
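
A minimal sketch of the three parts listed above, built in an in-memory SQLite database with Python's standard `sqlite3` module. The table names, column names and data are hypothetical. One fact table references two dimension tables through foreign keys that point at the dimensions' primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one primary key each.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id       INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL
);
-- Fact table: measurements plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    amount     REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (20240101, '2024-01-01');
INSERT INTO fact_sales VALUES
    (1, 1, 20240101, 10.0),
    (2, 1, 20240101, 15.0),
    (3, 2, 20240101, 7.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```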

7
Q

________________ A visual representation that traces the flow and transformation of data through its lifecycle, from its source to its final destination.

a) Fact tables
b) CSV
c) Data Lineage

A

c) Data Lineage

Helps in tracking errors back to their source.
Ensures compliance with regulations.
Provides a clear understanding of how data is moved, transformed and consumed within systems.

Example:

Uses the Spline agent for Spark, attached to AWS Glue.
Dumps lineage data into Amazon Neptune via AWS Lambda.
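
To make "tracking errors back to their source" concrete, here is a toy sketch in plain Python. It does not use Spline, Glue, Neptune or Lambda, and the dataset names are hypothetical. Lineage is modelled as a graph where each dataset lists the datasets it was derived from, and a traversal finds everything upstream of a given dataset.

```python
# Each dataset maps to its direct upstream inputs (hypothetical names).
lineage = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["sales_raw", "currency_rates"],
    "sales_raw": [],
    "currency_rates": [],
}


def upstream_sources(dataset, graph):
    """Return every dataset that feeds into `dataset`, directly or indirectly."""
    seen = set()
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


# If sales_report looks wrong, these are the datasets worth checking.
sources = upstream_sources("sales_report", lineage)
```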

8
Q

_____________ Evolution.

The ability to adapt and change the schema of a dataset over time without disrupting existing processes or systems.

a) Export
b) Transform
c) Schema

A

Schema Evolution

Ensures data systems can adapt to changing business requirements.

Allows for the addition, removal or modification of columns/fields in a dataset.

Maintains backward compatibility with older records.

What AWS service would you use:

AWS Glue Schema Registry
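
A minimal sketch of backward compatibility with older records, in plain Python. It does not call the Glue Schema Registry, and the field names and default value are assumptions chosen for illustration. A new version of the schema adds an `email` field with a default, so records written before the change can still be read.

```python
# Hypothetical schema v2: v1 had only "id" and "name"; v2 adds "email" with a default.
SCHEMA_V2_DEFAULTS = {"id": None, "name": None, "email": "unknown"}


def read_with_schema(record, defaults):
    """Project a record onto the current schema, filling missing fields with defaults."""
    return {field: record.get(field, default) for field, default in defaults.items()}


old_record = {"id": 1, "name": "Ada"}  # written with schema v1
new_record = {"id": 2, "name": "Grace", "email": "g@example.com"}  # written with schema v2

# Both generations of records come out in the v2 shape.
upgraded = [read_with_schema(r, SCHEMA_V2_DEFAULTS) for r in (old_record, new_record)]
```

Adding a field with a default is the kind of change that keeps existing readers and old data working; removing or renaming a field without such a rule is what tends to break them.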
