Chapters 1&2 Knowledge Testers Flashcards
Edgar Codd?
Introduced Data Independence -> revolutionized data storage
did work on relational algebra
Data Independence
Seperation of physical and logical representation of data<br></br>Make physical simple and clear for human understanding
Data Shapes
trees, cubes, tables, vectors (text), graphs
Data Model
What data looks like and what you can do with it
How much data? What shape? How data is organized?
Table Synonyms
Collection, Relation, Relational Table
Row Synonyms
Business Object, Item, Entity, Document, Record, Tuple
Attribute Synonyms
Column, Field, Property, Key
Primary Key Synonyms
Row ID, Name, Key
Value Synonyms
Scalar, Cell, Characteristic, Fact
Relational Tables have
set of attributes (schema) and set/bag/list of tuples
Atomic Integrity
All values are atomic (string, number), NOT ARRAY
Relational Integrity
all its records have identical
support. All elements have all attributes
Sketch the history of databases (ancient and
modern) to a colleague in a few minutes?
DNA - first data storage
Brain - First human controlled data storage
Humans told stories->ISSUE: not reliable, story changes over time
Writing - clay tablets - tables -> ISSUE: how to make copies??
Printing Press-> easily make copies and mass produce/distribute
Computers
Difference between data, information and knowledge
Data -> numbers<br></br>Information -> Meaning from data, processed data<br></br>Knowledge -> meaning from information, interpreting information
How can structured data can be characterized?
Order and organization
Do you know the standard prefixes of the International System of
Units (when the exponent in base 10 is a positive multiple of 3)?
Karl Marx gave the proletariat eleven zeppelins, yo
Kilo, Mega, Giga, Tera, Peta, Exa, Zeta, Yotta, Ronna, Quetta
4 technologies commonly referred to as
NoSQL
key-value, document, column family, graph
3 Vs
Volume - Amount of data
Velocity - Capacity, latency, throughput
Variety - Shapes
Define capacity, throughput and latency with units
Capacity: how much data per volume (bytes)
Latency: Wait time to read data (miliseconds)
Throughput: Data read per time (byte/sec)(Not sure what is standard units)
Can you explain why and how the evolution of capacity, throughput
and latency over the last few decades has influenced the design of
modern database systems?
Capacity expanded a lot more than the other 2. Need to use parallelization and batch processing to improve latency and throughput (scale out)
Scale out vs Scale up
scale out - more machines
scale up - more powerful machines
Name a few big players in the industry that accumulate and
analyze massive amounts of data?
S3, Azure
bit vs byte
a bit is 0 or 1, a byte is a collection of 8 bits
Name a few concrete examples that illustrate the various
orders of magnitude of amounts of data?
Files Kb, Movies Gb
Why is it important to consider whether a use case is read-intensive, or write-intensive, or in-between?
Guess:Read intensive -> benefit from denormalized data to reduce query complexity and latency
Write intensive -> benefit from normalized data to avoid redundancy (redundant writes) and maintain data integrity
Why normal forms are important?
Can prevent deletion, insertion and update errors
first normal form in simple terms?
Atomic integrity -> all values are atomic (simple) -> no nesting
Describe in simple terms how higher normal forms (like Boyce-Codd) are
related to joins?
NF are like opposite of joins - seperate big table into small tables
Why is it common, for large amounts of data, to
drop several levels of normal form, and denormalize data instead?
GUESSING: preventing expensive joins, simpler queries
Declarative language
User specifies what they want - not how to compute it
- up to system to figure out how to execute the query
Functional language
Nesting - expressions can nest in each other
Queries are like lego - building blocks - you can change order and such
But changing order can change outcome
Why design query languages that are
declarative and functional?
GUESS:
declarative: focus on what rather than how (easy for users, leave how to machine)
functional: modularity, can move around pieces
Describe the major relational algebra operators: select,
project, aggregate, sort, Cartesian product, join?
Select -> Choose rows
Project -> Choose columns
Aggregate -> Combine - group by cols -> aggregate other columns
Sort -> order by a specified column value
Cartesian Product -> multiply tables (all rows)
Join -> multiply tables based on certain matching values eg A=B
The names of the basic components of the tabular shape at an abstract level (table, row, column, primary key) as well
as the names of the most common corresponding counterparts in the
NoSQL world?
Relation/Collection, Record, Attribute/Field, Id
ACID
Atomicity, Consistency, Isolated, Dependable
A - either an update (called a transaction if it consists of several updates) is applied to the database completely, or not at all;
C - before and after the transactions, the data is in a consistent state (e.g., some values sum to another value, another value is positive, etc);
I - the system “feels like” the user is the only one using the system, where in fact maybe thousands of people are using it as well concurrently;
D - any data written to the database is durably stored and will not be lost (e.g., if there is an electricity shortage or a disk crash).
Describe the following SQL terms:
SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT, OFFSET
Select - projection - selecting columns
from - table
where - selection of certain rows
group by - aggregate information
having - selection
join - combine 2 tables
order by - sorting
limit - number of rows to display eg. 10 rows
offset - start from nth row