Chapter 5: Data Generation in Source Systems Flashcards
Draw the data engineering lifecycle
A file is a ..?
sequence of bytes stored on a disk
source systems produce..?
data in several ways
Get familiar with your source system and how
it generates data
files may store…?
local parameters, events, logs, images, and audio.
Elaborate
files are the universal medium of …?
data exchange
What are the major file formats you will come across?
Excel, CSV, TXT, JSON, XML
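To make two of these formats concrete, here is a minimal sketch that writes the same records as CSV and as JSON using only Python's standard library, then reads them back. The records and field names (`id`, `name`) are invented for illustration.

```python
import csv
import io
import json

# Hypothetical records, invented for this example.
records = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]

# CSV: rows of delimited text under a header line.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

# JSON: self-describing text that can also hold nested structures.
json_text = json.dumps(records)

# Reading either format back recovers the same records.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
```

Note that CSV carries no type information, so every value comes back as a string; that is why the example stores `id` as a string to begin with.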
What are standard ways of exchanging data between systems?
APIs
typically an application database is an …?
online transaction processing system - OLTP
OLTP are referred to as ….?
transactional databases. Why?
OLTP DBs work well as …?
application backends when thousands or even millions of users might be interacting. Why?
What does ACID stand for?
atomicity, consistency, isolation, and durability
With respect to ACID what does consistency relate to?
Consistency means that any database read will return the last written version of the retrieved item. Why?
What is an atomic transaction?
it is a set of several changes that are committed as a unit. Why?
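A minimal sketch of atomicity, using Python's built-in `sqlite3` module as a stand-in for an application database. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the point is only that both updates commit together or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to dst
        # made earlier in the same transaction has been rolled back.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 credit even though that update ran first, which is what "committed as a unit" means in practice.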
In Fundamentals of Data Engineering, what does "data application" refer to?
applications that hybridize transactional and analytics workloads. Why?
What does CDC stand for?
Change Data Capture
What is CDC ?
It is a method for extracting each change event (insert, update, delete) that occurs in a database. Why?
CDC is often used to … ?
replicate between databases in near real time or create an event stream for downstream processing. Why?
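The replication use case can be sketched as replaying change events, in order, against a copy of the table. The event shape (`op`, `key`, `row`) and the `apply_change` helper below are assumptions made for illustration; real CDC tools each define their own event format.

```python
# Hypothetical change events, in commit order, as a CDC feed might emit them.
change_events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "city": "NYC"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "city": "Paris"}},
    {"op": "delete", "key": 2, "row": None},
]


def apply_change(replica, event):
    """Apply a single change event to a replica keyed by primary key."""
    if event["op"] in ("insert", "update"):
        replica[event["key"]] = event["row"]
    elif event["op"] == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {event['op']}")


# Replaying every event in order brings the replica to the source's state.
replica = {}
for event in change_events:
    apply_change(replica, event)
```

The same list of events could instead be published to a stream for downstream consumers, which is the second use named on the card.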
What does OLAP stand for?
online analytical processing system. Why?
What is the difference between an OLAP and OLTP?
OLAP is for running large-scale analytics, while OLTP is for large-scale reads and writes of individual records. Why?
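The two access patterns can be contrasted with a pair of queries. This is only a sketch of the query shapes: SQLite is used for convenience and is not an OLAP system, and the `orders` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ada", 20), (2, "grace", 35), (3, "ada", 15)],
)

# OLTP-style access: read (or write) one record, located by its key.
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style access: scan many records to compute an aggregate.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
```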
Typically OLAP are …?
inefficient at handling lookups of individual records. Why?
A log captures ….?
information about events that occur in systems. Why?
Logs are a …?
rich data source, potentially valuable for downstream data analysis.
What are three common ways logs are encoded?
- Binary-encoded logs
- Semistructured logs
- Plain-text (unstructured) logs
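As a rough illustration, the same event can be written in each of the three encodings. The event fields, the `ACTIONS` lookup table, and the binary layout (`<IHQ`) are all invented for this sketch; real binary log formats are system-specific.

```python
import json
import struct

# Hypothetical event: who (user id), what (action code), when (Unix time).
user_id, action_code, timestamp = 42, 3, 1700000000
ACTIONS = {3: "login"}  # made-up lookup table for action codes

# Binary-encoded: fixed-width fields packed into bytes.
# "<IHQ" = little-endian uint32, uint16, uint64 with no padding.
binary_log = struct.pack("<IHQ", user_id, action_code, timestamp)

# Semistructured: text with named fields, here JSON.
json_log = json.dumps(
    {"user_id": user_id, "action": ACTIONS[action_code], "ts": timestamp}
)

# Plain text: free-form line meant for humans; parsing it needs custom rules.
text_log = f"{timestamp} user={user_id} action={ACTIONS[action_code]}"

# The binary form is compact but unreadable without knowing the layout.
decoded = struct.unpack("<IHQ", binary_log)
```

The binary record is 14 bytes, noticeably smaller than the JSON and plain-text versions, but it can only be decoded by a reader that knows the field layout.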
Relational DBs often store…?
event logs directly on the database server, which can be processed to create a stream.
All logs track …?
events and metadata
What is log resolution?
It refers to the amount of event data captured in the log.
At a minimum a log should capture…?
who, what, and when.
describe binary-encoded logs:
These logs encode data in a custom compact format for space efficiency and fast I/O. Why?
Tables are typically indexed by a ..?
primary key
what is a primary key?
a unique field for each row of the table
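A short sketch of both primary-key cards, using Python's built-in `sqlite3` module: the key locates a single row, and the database rejects a second row with the same key. The `users` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# A second row with the same primary key violates uniqueness.
try:
    conn.execute("INSERT INTO users VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups by primary key use the table's index rather than a full scan.
row = conn.execute(
    "SELECT name FROM users WHERE user_id = ?", (1,)
).fetchone()
```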
What does RDBMS stand for?
relational database management system
what is the most common db for application backends?
relational database management system