introduction Flashcards
what is data
anything that can be represented in binary
why do we collect data
so that we can retrieve information and understand it
what is data engineering
the process of designing and building systems that allow people to collect, manage, and analyse data
what do data engineers do
make raw data usable for data scientists and other downstream users
which 6 things are data engineers responsible for
pipelines
integration
quality
analysis
security
automation
data pipelines
automated flows that move and process large data sets from source to destination
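example: a minimal sketch of a pipeline as three chained stages. the function names (extract, transform, load, run_pipeline) and the sample records are made up for illustration; a real pipeline would read from and write to actual systems.

```python
# Hypothetical three-stage pipeline: extract -> transform -> load.
def extract():
    # Stand-in for reading from a real source (file, API, database).
    return [
        {"name": " Ada ", "age": "36"},
        {"name": "Grace", "age": "45"},
    ]


def transform(rows):
    # Clean each record: trim whitespace and convert age to an integer.
    return [{"name": r["name"].strip(), "age": int(r["age"])} for r in rows]


def load(rows, store):
    # Stand-in for writing to a real destination.
    store.extend(rows)
    return store


def run_pipeline():
    # Each stage feeds the next; the empty list plays the role of the target store.
    return load(transform(extract()), [])
```

each stage does one job, so a stage can be changed or rerun without touching the others.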
data integration
ensuring that data from different sources is integrated seamlessly
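example: a minimal sketch of integrating two sources that describe the same customers under different field names. the source names (crm_rows, billing_rows), the fields, and the integrate function are assumptions chosen for illustration.

```python
# Two hypothetical sources describing the same customer with different field names.
crm_rows = [{"customer_id": 1, "full_name": "Ada Lovelace"}]
billing_rows = [{"cust": 1, "balance": 20.5}]


def integrate(crm, billing):
    # Index billing records by customer id for quick lookup.
    balances = {row["cust"]: row["balance"] for row in billing}
    # Produce one unified record per customer; balance is None if no match exists.
    return [
        {
            "id": row["customer_id"],
            "name": row["full_name"],
            "balance": balances.get(row["customer_id"]),
        }
        for row in crm
    ]


unified = integrate(crm_rows, billing_rows)
```

the unified records use one agreed set of field names, so downstream users do not need to know how each source labels its data.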
data quality
making sure the data infrastructure is reliable, efficient, and of high quality
data analysis
analysing raw data to show trends and provide predictive models
data security
protecting data against loss and theft
automation
automating tasks within the data pipeline to improve efficiency
what is a database
a structured system for storing, retrieving, and managing data
what is raw data
unprocessed data as originally collected, e.g. data kept in an Excel file
how is data stored in a database
organised in a structured format using data models
data model
defines how data will be related and stored
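example: a minimal sketch of a data model using Python's built-in sqlite3 module. the author and book tables are made up for illustration; the model states what is stored (columns and types) and how records relate (book.author_id references author.id).

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The data model: two tables and the relationship between them.
conn.executescript("""
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(id)
);
""")

# Store one author and one book that points at that author.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO book (id, title, author_id) VALUES (1, 'Notes', 1)")

# Follow the relationship to retrieve each book with its author's name.
rows = conn.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON book.author_id = author.id"
).fetchall()
```

because the relationship is part of the model, a query can join the two tables instead of repeating the author's name in every book record.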