Chapters 1&2 Knowledge Testers Flashcards

1
Q

Edgar Codd?

A

Introduced data independence, which revolutionized data storage
Developed the relational model and relational algebra

2
Q

Data Independence

A

Separation of the physical and logical representations of data
Keeps the logical view simple and clear for human understanding, while the physical storage is hidden

3
Q

Data Shapes

A

trees, cubes, tables, vectors (text), graphs

4
Q

Data Model

A

What data looks like and what you can do with it
How much data? What shape? How data is organized?

5
Q

Table Synonyms

A

Collection, Relation, Relational Table

6
Q

Row Synonyms

A

Business Object, Item, Entity, Document, Record, Tuple

7
Q

Attribute Synonyms

A

Column, Field, Property, Key

8
Q

Primary Key Synonyms

A

Row ID, Name, Key

9
Q

Value Synonyms

A

Scalar, Cell, Characteristic, Fact

10
Q

Relational Tables have

A

set of attributes (schema) and set/bag/list of tuples

11
Q

Atomic Integrity

A

All values are atomic (string, number), NOT ARRAY

12
Q

Relational Integrity

A

All records in the table have identical support: every record has all of the attributes
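
A minimal sketch of what this check could look like, assuming records are represented as Python dicts. The function name `has_relational_integrity` and the sample records are hypothetical, chosen only for illustration.

```python
def has_relational_integrity(records):
    """Return True if every record has exactly the same set of attributes."""
    if not records:
        return True  # an empty collection trivially satisfies the rule
    support = set(records[0].keys())
    return all(set(r.keys()) == support for r in records)


# Hypothetical sample data.
good = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
bad = [{"id": 1, "name": "Ada"}, {"id": 2}]  # second record lacks "name"
```

Here `good` passes the check and `bad` fails it, because its second record is missing an attribute.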

13
Q

Sketch the history of databases (ancient and
modern) to a colleague in a few minutes?

A

DNA - first data storage
Brain - First human controlled data storage
Humans told stories->ISSUE: not reliable, story changes over time
Writing - clay tablets - tables -> ISSUE: how to make copies??
Printing Press-> easily make copies and mass produce/distribute
Computers

14
Q

Difference between data, information and knowledge

A

Data -> numbers
Information -> meaning from data, processed data
Knowledge -> meaning from information, interpreting information

15
Q

How can structured data be characterized?

A

Order and organization

16
Q

Do you know the standard prefixes of the International System of
Units (when the exponent in base 10 is a positive multiple of 3)?

A

Karl Marx gave the proletariat eleven zeppelins, yo
Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, Ronna, Quetta
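
The prefixes map to exponents 3, 6, 9, and so on up to 30. A small sketch of that mapping follows; the names `SI_PREFIXES` and `to_base_units` are hypothetical helpers, not part of any library.

```python
# SI prefixes whose base-10 exponent is a positive multiple of 3.
SI_PREFIXES = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12, "peta": 15,
    "exa": 18, "zetta": 21, "yotta": 24, "ronna": 27, "quetta": 30,
}


def to_base_units(value, prefix):
    """Convert e.g. (2, 'tera') into the plain number of base units."""
    return value * 10 ** SI_PREFIXES[prefix]


two_terabytes_in_bytes = to_base_units(2, "tera")
```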

17
Q

4 technologies commonly referred to as
NoSQL

A

key-value, document, column family, graph

18
Q

3 Vs

A

Volume - Amount of data
Velocity - Capacity, latency, throughput
Variety - Shapes

19
Q

Define capacity, throughput and latency with units

A

Capacity: how much data can be stored (bytes)
Latency: wait time before data starts being read (milliseconds)
Throughput: amount of data read per unit of time (bytes per second, B/s; network speeds are often quoted in bits per second)

20
Q

Can you explain why and how the evolution of capacity, throughput
and latency over the last few decades has influenced the design of
modern database systems?

A

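Capacity has expanded far more than throughput, and throughput more than latency. Reading everything from a single machine would take too long, so systems parallelize across many machines (scale out) to make up for throughput and process data in batches to make up for latency.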
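
A rough back-of-the-envelope sketch of why parallelization helps, assuming read time is simply latency plus size divided by throughput. The function name and all numbers below are hypothetical, chosen only to show the order of magnitude, and the split across disks is assumed to be perfect.

```python
def read_time_seconds(size_bytes, latency_s, throughput_bytes_per_s):
    """Estimated read time: wait for the data to start, then stream it."""
    return latency_s + size_bytes / throughput_bytes_per_s


# Hypothetical disk: 5 ms latency, 200 MB/s throughput.
LATENCY_S = 0.005
THROUGHPUT = 200e6

# Reading 1 TB from a single disk.
one_disk = read_time_seconds(1e12, LATENCY_S, THROUGHPUT)

# Same 1 TB split evenly across 100 disks that are read in parallel:
# each disk only has to deliver 1/100 of the data.
hundred_disks = read_time_seconds(1e12 / 100, LATENCY_S, THROUGHPUT)
```

With these made-up numbers, one disk needs roughly 83 minutes while 100 disks in parallel need roughly 50 seconds.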

21
Q

Scale out vs Scale up

A

scale out - more machines
scale up - more powerful machines

22
Q

Name a few big players in the industry that accumulate and
analyze massive amounts of data?

23
Q

bit vs byte

A

a bit is 0 or 1, a byte is a collection of 8 bits

24
Q

Name a few concrete examples that illustrate the various
orders of magnitude of amounts of data?

A

Small files: kB, movies: GB

25
Q

Why is it important to consider whether a use case is read-intensive, or write-intensive, or in-between?

A

Guess: Read-intensive -> benefits from denormalized data, which reduces query complexity and latency
Write-intensive -> benefits from normalized data, which avoids redundant writes and helps maintain data integrity

26
Q

Why are normal forms important?

A

They help prevent insertion, update and deletion anomalies

27
Q

first normal form in simple terms?

A

Atomic integrity -> all values are atomic (simple) -> no nesting
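
As an illustration, here is a sketch of flattening a nested value into first normal form, assuming rows are Python dicts. The helper `to_first_normal_form` and the sample row are hypothetical.

```python
# Not in first normal form: "phone" holds a nested list.
nested = [{"id": 1, "name": "Ada", "phone": ["555-0100", "555-0101"]}]


def to_first_normal_form(rows, list_attr):
    """Flatten one list-valued attribute into one row per element.

    Note: a row whose list is empty produces no output rows in this sketch.
    """
    flat = []
    for row in rows:
        for item in row[list_attr]:
            new_row = {k: v for k, v in row.items() if k != list_attr}
            new_row[list_attr] = item  # now a single atomic value
            flat.append(new_row)
    return flat


flat = to_first_normal_form(nested, "phone")
```

After flattening, each row carries a single phone number as a plain string.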

28
Q

Describe in simple terms how higher normal forms (like Boyce-Codd) are
related to joins?

A

Normalization is like the opposite of a join: it separates one big table into smaller tables, and joins put them back together
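
A small sketch of this relationship using Python's built-in `sqlite3` module. The table and column names (`cities`, `customers`, and so on) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: city details are stored once, not repeated per customer.
    CREATE TABLE cities (city_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT,
                            city_id INTEGER REFERENCES cities(city_id));
    INSERT INTO cities VALUES (1, 'Zurich', 'Switzerland');
    INSERT INTO customers VALUES (10, 'Ada', 1), (11, 'Alan', 1);
""")

# The join reassembles the wide (denormalized) table from the small ones.
wide = conn.execute("""
    SELECT c.name, ci.city, ci.country
    FROM customers AS c
    JOIN cities AS ci ON c.city_id = ci.city_id
    ORDER BY c.cust_id
""").fetchall()
```

The result `wide` repeats the city and country for every customer, which is exactly the redundancy the normalized tables avoid.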

29
Q

Why is it common, for large amounts of data, to
drop several levels of normal form, and denormalize data instead?

A

GUESSING: preventing expensive joins, simpler queries

30
Q

Declarative language

A

User specifies what they want - not how to compute it
- up to system to figure out how to execute the query
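
To make the contrast concrete, here is a sketch that answers the same question twice: once imperatively in Python, once declaratively in SQL through the standard `sqlite3` module. The `people` data is invented for the example.

```python
import sqlite3

people = [("Ada", 36), ("Alan", 41), ("Grace", 85)]

# Imperative: we spell out HOW to find the answer, step by step.
older_imperative = []
for name, age in people:
    if age > 40:
        older_imperative.append(name)

# Declarative: we state WHAT we want; SQLite decides how to execute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
older_declarative = [
    row[0]
    for row in conn.execute("SELECT name FROM people WHERE age > 40 ORDER BY name")
]
```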

31
Q

Functional language

A

Nesting - expressions can nest in each other
Queries are like lego - building blocks - you can change order and such
But changing order can change outcome

32
Q

Why design query languages that are
declarative and functional?

A

GUESS:
declarative: focus on what rather than how (easy for users, leave how to machine)
functional: modularity, can move around pieces

33
Q

Describe the major relational algebra operators: select,
project, aggregate, sort, Cartesian product, join?

A

Select -> Choose rows
Project -> Choose columns
Aggregate -> Combine - group by cols -> aggregate other columns
Sort -> order by a specified column value
Cartesian Product -> multiply tables (all rows)
Join -> multiply tables based on certain matching values eg A=B
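
The operators can be sketched as plain Python functions over lists of dicts. This is only an illustration, not how a database implements them; it assumes the two tables use distinct attribute names, and all function names and sample data are hypothetical.

```python
from itertools import product


def select(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    """Keep only the listed columns."""
    return [{a: r[a] for a in attrs} for r in rows]


def sort(rows, attr):
    """Order rows by the value of one column."""
    return sorted(rows, key=lambda r: r[attr])


def cartesian_product(left, right):
    """Pair every left row with every right row."""
    return [{**l, **r} for l, r in product(left, right)]


def join(left, right, left_attr, right_attr):
    """Cartesian product followed by a selection on matching values."""
    return select(cartesian_product(left, right),
                  lambda r: r[left_attr] == r[right_attr])


def aggregate(rows, group_attr, agg_attr, fn):
    """Group rows by one column and aggregate another column per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_attr], []).append(r[agg_attr])
    return [{group_attr: k, agg_attr: fn(v)} for k, v in groups.items()]


orders = [
    {"order_id": 1, "cust": 10, "amount": 5},
    {"order_id": 2, "cust": 10, "amount": 7},
    {"order_id": 3, "cust": 11, "amount": 2},
]
customers = [{"cust_id": 10, "name": "Ada"}, {"cust_id": 11, "name": "Alan"}]

# Operators nest like building blocks: aggregate(join(...)).
joined = join(orders, customers, "cust", "cust_id")
totals = aggregate(joined, "name", "amount", sum)
```

Because each function takes tables and returns a table, the calls can be nested freely, which is the functional flavor of relational algebra.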

34
Q

The names of the basic components of the tabular shape at an abstract level (table, row, column, primary key) as well
as the names of the most common corresponding counterparts in the
NoSQL world?

A

Relation/Collection, Record, Attribute/Field, Id

35
Q

ACID

A

Atomicity, Consistency, Isolation, Durability
A - either an update (called a transaction if it consists of several updates) is applied to the database completely, or not at all;
C - before and after each transaction, the data is in a consistent state (e.g., some values sum to another value, another value is positive, etc.);
I - the system “feels like” the user is the only one using it, when in fact thousands of people may be using it concurrently;
D - any data written to the database is durably stored and will not be lost (e.g., in a power outage or a disk crash).
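
Atomicity in particular is easy to demonstrate. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical. A failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer, the credit to `bob` has already been executed when the debit from `alice` violates the constraint, and the rollback undoes it.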

36
Q

Describe the following SQL terms:
SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT, OFFSET

A

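SELECT - projection: chooses the columns (or expressions) to return
FROM - the source table(s)
WHERE - selection: filters rows before grouping
GROUP BY - groups rows that share values so aggregates can be computed per group
HAVING - selection on groups: filters after aggregation
JOIN - combines two tables on a matching condition
ORDER BY - sorts the result
LIMIT - maximum number of rows to return, e.g. 10 rows
OFFSET - number of rows to skip before returning results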
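
One query that uses every clause, run through Python's built-in `sqlite3` module, may help tie the terms together. The `customers` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Alan'), (3, 'Grace');
    INSERT INTO orders VALUES
        (1, 1, 50), (2, 1, 70), (3, 2, 20), (4, 3, 90), (5, 3, 5);
""")

rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total         -- columns to return
    FROM orders AS o                              -- source table
    JOIN customers AS c ON o.cust_id = c.cust_id  -- combine with a second table
    WHERE o.amount >= 10                          -- filter rows before grouping
    GROUP BY c.name                               -- one group per customer
    HAVING SUM(o.amount) > 30                     -- filter groups after aggregation
    ORDER BY total DESC                           -- sort the result
    LIMIT 1 OFFSET 1                              -- skip 1 row, then return 1 row
""").fetchall()
```

WHERE drops the order of 5, HAVING drops Alan (total 20), the sort puts Ada (120) before Grace (90), and OFFSET 1 with LIMIT 1 returns only the second row.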