Chapter 5: Data Generation in Source Systems Flashcards

1
Q

Draw the data engineering lifecycle

A
2
Q

A file is a ...?

A

sequence of bytes stored on a disk

3
Q

Source systems produce ...?

A

data in several ways

4
Q

Get familiar with your source system and how ...?

A

it generates data

5
Q

Files may store …?

A

local parameters, events, logs, images, and audio.

Elaborate

6
Q

Files are the universal medium of …?

A

data exchange

7
Q

What are the major file formats you will come across?

A

Excel, CSV, TXT, JSON, and XML
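
As a minimal sketch of files as an exchange medium, the snippet below writes the same records as CSV and as JSON and reads them back, using only the Python standard library. The sample records and field names are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io
import json

# Hypothetical sample records; not taken from any real source system.
records = [
    {"user_id": "1", "event": "login"},
    {"user_id": "2", "event": "purchase"},
]

# Serialize to CSV text (the in-memory buffer stands in for a file).
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["user_id", "event"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

# Serialize the same records to JSON text.
json_text = json.dumps(records)

# Parse both back. CSV returns every field as a string, while JSON
# keeps the types it supports (strings, numbers, booleans, null).
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
```

Because every value here is already a string, both formats round-trip to the same records; with numeric fields, the CSV reader would hand back strings that need explicit conversion.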

8
Q

What is a standard way of exchanging data between systems?

A

APIs

9
Q

Typically, an application database is an …?

A

online transaction processing (OLTP) system

10
Q

OLTP databases are often referred to as …?

A

transactional databases. Why?

11
Q

OLTP databases work well as …?

A

application backends when thousands or even millions of users might be interacting with the application at the same time. Why?

12
Q

What does ACID stand for?

A

atomicity, consistency, isolation, and durability

13
Q

With respect to ACID what does consistency relate to?

A

Consistency means that any database read will return the last written version of the retrieved item. Why?

14
Q

What is an atomic transaction?

A

It is a set of several changes that are committed as a unit. Why?
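
One way to see "committed as a unit" in practice is a minimal sketch with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the `CHECK` constraint is only there to force a failure halfway through the second transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src; both changes commit or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates succeed
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

In the second call the credit to account 2 runs first and succeeds, but the failing debit rolls the whole transaction back, so account 2 does not keep the extra 500.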

15
Q

In Fundamentals of Data Engineering, what does the term "data application" refer to?

A

applications that hybridize transactional and analytics workloads. Why?

16
Q

What does CDC stand for?

A

Change Data Capture

17
Q

What is CDC?

A

It is a method for extracting each change event (insert, update, delete) that occurs in a database. Why?

18
Q

CDC is often used to … ?

A

replicate between databases in near real time or create an event stream for downstream processing. Why?
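
The replication use case can be sketched as replaying a stream of change events against a copy of the data. This is a toy illustration, not how any particular CDC tool works: the event shape (`op`, `key`, `row`) and the `apply_change` helper are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical change events; real CDC tools use their own formats.
change_events: List[Dict[str, Any]] = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2, "row": None},
]


def apply_change(replica: Dict[int, Optional[dict]], event: Dict[str, Any]) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("insert", "update"):
        replica[event["key"]] = event["row"]
    elif op == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Replaying the events in order reproduces the source table's final state.
replica: Dict[int, Optional[dict]] = {}
for event in change_events:
    apply_change(replica, event)
```

The same event list could instead be handed to a downstream consumer one event at a time, which is the event-stream use case.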

19
Q

What does OLAP stand for?

A

online analytical processing. Why?

20
Q

What is the difference between OLAP and OLTP?

A

OLAP systems are built to run large-scale analytics queries, while OLTP systems are built to read and write individual records at a high rate. Why?

21
Q

Typically, OLAP systems are …?

A

inefficient at handling lookups of individual records. Why?
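
One possible answer to the "Why?" is storage layout. The sketch below assumes a column-oriented layout, which many (not all) OLAP systems use; the table contents and the `lookup` helper are hypothetical, and Python lists stand in for on-disk column files.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"order_id": 1, "customer": "a", "amount": 20},
    {"order_id": 2, "customer": "b", "amount": 35},
    {"order_id": 3, "customer": "a", "amount": 10},
]

# Column-oriented layout: all values of a field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytics query reads only the one column it needs.
total = sum(columns["amount"])


def lookup(columns, order_id):
    """Reassemble a single record by reading from every column."""
    position = columns["order_id"].index(order_id)
    return {name: values[position] for name, values in columns.items()}


record = lookup(columns, 2)
```

The aggregate touches one column, while the single-record lookup has to visit every column to rebuild the row, which is the opposite of what a row-oriented OLTP store is tuned for.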

23
Q

A log captures …?

A

information about events that occur in systems. Why?

24
Q

Logs are a …?

A

rich data source, potentially valuable for downstream data analysis.

25
Q

What are three common ways logs are encoded?

A

1. Binary-encoded logs
2. Semistructured logs
3. Plain-text (unstructured) logs

26
Q

Relational databases often store ...?

A

an event log directly on the database server, which can be processed to create a stream.

27
Q

All logs track ...?

A

events and event metadata

28
Q

What is log resolution?

A

It refers to the amount of event data captured in a log.

29
Q

At a minimum, a log should capture ...?

A

who, what, and when.

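
A minimal sketch of a semistructured log record that carries who, what, and when, using JSON from the Python standard library. The field names and the `make_log_record` helper are assumptions for illustration, not a standard log schema.

```python
import json
from datetime import datetime, timezone


def make_log_record(who, what, when=None):
    """Build one JSON log line with the minimum who/what/when fields."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "who": who,                 # account or service that acted
            "what": what,               # the event that occurred
            "when": when.isoformat(),   # timestamp of the event
        }
    )


# A fixed timestamp keeps the example reproducible.
line = make_log_record(
    "user_42", "login", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
)
parsed = json.loads(line)
```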
30
Q

Describe binary-encoded logs.

A

These logs encode data in a custom compact format for space efficiency and fast I/O. Why?

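
The space-efficiency point can be illustrated with a rough sketch that packs one event into a fixed binary layout with Python's `struct` module and compares it with the same event as JSON text. The layout (8-byte timestamp, 4-byte user ID, 2-byte event code) is a made-up format for this example, not one used by any real database log.

```python
import json
import struct

# Hypothetical layout: little-endian, unsigned 8-byte timestamp,
# unsigned 4-byte user ID, unsigned 2-byte event code, no padding.
RECORD = struct.Struct("<QIH")


def encode(timestamp, user_id, event_code):
    """Pack one event into the fixed 14-byte binary layout."""
    return RECORD.pack(timestamp, user_id, event_code)


def decode(payload):
    """Unpack a 14-byte payload into (timestamp, user_id, event_code)."""
    return RECORD.unpack(payload)


binary = encode(1700000000, 42, 7)

# The same event as semistructured JSON text, for comparison.
text = json.dumps(
    {"timestamp": 1700000000, "user_id": 42, "event_code": 7}
).encode("utf-8")

sizes = (len(binary), len(text))
```

The binary record is smaller and has a fixed width, but it is unreadable without knowing the layout, which is the usual trade-off against text logs.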
31
Q

Tables are typically indexed by a ...?

A

primary key

32
Q

What is a primary key?

A

a field whose value is unique for each row of the table

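
A small sketch of primary-key uniqueness using Python's built-in `sqlite3` module. The `users` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO users VALUES (1, 'Lin')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The primary key identifies exactly one row for a lookup.
row = conn.execute(
    "SELECT name FROM users WHERE user_id = ?", (1,)
).fetchone()
```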
33
Q

What does RDBMS stand for?

A

relational database management system

34
Q

What is one of the most common types of database for application backends?

A

relational database management system (RDBMS)