Chapter 1 Fundamentals of Data Engineering Flashcards
What questions should I ask if I want to display information to my end user?
Who will consume the data? What data sources should I use? Where should I store the data? When should the data arrive? Why does the data need to be stored in this place? How should the data be processed?
What is the first thing to learn as a data engineer?
The data lifecycle
What is it called when data is stored in different places?
data silos
What is a data silo?
A store of data that is kept separate from, and independent of, other data stores
What are the four logical building blocks in a data warehouse?
SQL interface, schema, compute, and storage
What is a data lake?
A centralized repository that allows you to store all your structured and unstructured data
Is a schema mandatory for a data lake?
No. A data lake can store raw data as-is, and a schema can be applied later, when the data is read.
Where is a schema mandatory: a data lake or a data warehouse?
data warehouse
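Example for the two schema cards above. This is a minimal sketch, not taken from the source: an in-memory SQLite table stands in for a data warehouse and a list of JSON strings stands in for a data lake. The table name `orders`, its columns, and the helper functions are made up for illustration.

```python
import json
import sqlite3

# Warehouse-style storage: the table schema is declared before any row is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (1, 19.99))


def warehouse_rejects(row):
    """Return True if the warehouse table refuses a row that breaks the schema."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Lake-style storage: raw records are kept as-is, whatever their shape.
lake = [
    json.dumps({"order_id": 1, "amount": 19.99}),
    json.dumps({"order_id": 2, "note": "amount missing"}),
]


def read_amounts(raw_records):
    """Apply a schema only at read time, skipping records that lack 'amount'."""
    parsed = (json.loads(record) for record in raw_records)
    return [rec["amount"] for rec in parsed if "amount" in rec]
```

Calling `warehouse_rejects((2, None))` shows the warehouse refusing a row with a missing amount at write time, while the lake accepts the incomplete record and `read_amounts(lake)` decides what to do with it at read time.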
What are two benefits of a data lake?
Scalability and cheap storage
Data in a data warehouse is modeled for a ….?
business purpose
What is one of the key requirements for building a data warehouse?
know the business requirements
In most cases, frontend applications act as what?
The first upstream data source
What is a data mart?
A data mart is an area for storing data that serves specific user groups
Each data mart is usually under control of ….?
an individual department within the organization
What are the three most common uses for the last stage of the data lifecycle?
Reporting and dashboards, ad hoc queries, and machine learning
Where should data be stored only on the basis of business needs?
data warehouse
Where is a schema mandatory?
data warehouse
A data engineer is someone who…?
designs and builds data pipelines
What is a Job Orchestrator?
A tool for defining job dependencies and a schedule, so that data movement jobs run in order from upstream to downstream.
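Example for the job orchestrator card. This is a minimal sketch of the dependency part only, using Python's standard `graphlib` module (Python 3.9+). The job names are hypothetical, and scheduling (deciding when the jobs run) is left out.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of upstream jobs it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "build_sales_report": {"join_orders_customers"},
}


def run_order(deps):
    """Return one valid order in which every job comes after its upstream jobs."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step.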
What is ETL?
Extract, Transform, Load
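Example for the ETL card. This is a minimal sketch, not taken from the source: a small CSV string stands in for an upstream system and an in-memory SQLite database stands in for the target. The data, the `sales` table, and the function names are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up source data standing in for an upstream system.
RAW_CSV = "name,amount\nalice, 10.5\nbob,4\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the names and convert amounts from text to numbers."""
    return [(row["name"].strip().title(), float(row["amount"])) for row in rows]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Each step hands its result to the next, which is why the order is extract, then transform, then load.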