Big Data Lecture 02 Lessons Learnt Flashcards
Explain data independence
Logical model (interface) of the data (queries, and displaying) is independent from the physical storage (can be swapped).
What 4 pieces constitute the architecture of data storage?
<ul><li>Language (how you query),</li><li>model (representation, driver of independence),</li><li>compute (execution of computation),</li><li>storage (physical hardware).</li></ul>
What does the data model describe? (2)
<ul><li>What the data looks like,</li><li>what you can do with it (manipulation primitives).</li></ul>
What is a table?
Collection of rows with different attributes.
What is a row?
One record in the table.
What is an attribute?
One column in a table.
What is a primary key?
Unique key that identifies the record in a table.
What is a value?
One input in a row and a column of a table.
What is relational algebra?
Algebra to express operations on a table.
Relation table expressed formally in relational algebra?<br></br><br></br>What are its two components?
Each attribute has its domain, the relation is a subset of cross product of these domains, tuples of which we now put into a table.<br></br><br></br>Components:<br></br>1. set of attributes (schema),<br></br>2. set/bag/list of tuples.
Explain: set, list, and bag.
<ul><li>Set: unordered collection without duplicates,</li><li>list: ordered collection, can have duplicates,</li><li>bag: unordered collection, with duplicates.</li></ul>
How can tuple be seen as a function?
It assigns to each attribute of a table a value.
What is relational integrity?
All the attributes must have a correct reference, meaning that the keys point to valid records in other tables.
Edit: I dont think this is true. The real answer should be that:
All records must have identical support. Eg there cannot be missing values.
What is atomic integrity?
There are no tables in a table, every value is atomic.
When is table 1st normal form?
Table must follow atomic integrity.
What is domain integrity?
All the values must come from the same type, i.e. all are bools, or strings.
What is NoSQL?
When we break all the given constraints, we get outside, that is what we study in Big Data!
What is selection?
Selecting rows of a table.
What is projection?
Selecting columns of a table.
What is grouping?
Merging values of one table on the same attribute or condition.
What is sorting?
Sorting a table based on some order.
What is Cartesian product?
Taking product (each with each) of two tables.
What is join?
Merging two tables on a common attribute.
What are anomalies?
If some data is duplicated, but not properly linked, it might happen that on update/delete/insert there is anomaly.
What is functional dependency?
When one attribute depends on another in a table, can be seen as fully function of the other.
What is superkey?
Value or set of values, such that the values in the row depend on it.
What is candidate key?
Candidate key is any minimal superkey, one of which we can pick to be the primary key.
Define non-prime attribute.
Attribute that is not in any candidate key.
When is table in 2nd normal form?
Proper subset of a candidate key cannot determine a non-prime attribute.
When is table in a 3rd normal form?
Non-superkey cannot determine a non-prime attribute.
When is table in 3.5th (Boyce-Codd) Normal Form?
Non-superkey cannot determine anything else.
What is data denormalization?
Putting data from more advanced normal forms to the 1st normal form, to allow easy parallel querying.
What is the difference between proto-imperative and functional/declarative language?
In proto-imperative language, we have to define everything explicitly, however, in functional/declarative language we just say what we want to happen and it happens.
What kind of language is SQL?
Declarative, we declare what we want and it happens.<br></br><br></br>Functional, we can nest it like math.
Give all SQL clauses and explain them.
SELECT <i>column_name1, column_name2</i><br></br>FROM <i>table_name</i><br></br>WHERE <i>condition</i><br></br>GROUP BY <i>attribute</i><br></br>HAVING <i>condition</i><br></br>ORDER BY <i>attribute and direction</i><br></br>LIMIT <i>number_to_display</i><br></br>OFFSET <i>number_to_skip</i>
What set operations can we use in SQL?
All queries are sets, so we can UNION, UNION ALL (with duplicates), MINUS or INTERSECT them.
What is the difference of theta join, full outer join, right and left join, and natural join?
<ul><li>Theta join only matches on a selected attribute or condition,</li><li>right and left joins, join one table onto the other on matching records and fill in the rest using NULL,</li><li>full outer join does both right and left join,</li><li>natural join joins on matching attribute names.</li></ul>
Explain ACID and all its characteristics.
Good old day of databases gave us a lot of guarantees on transactions, this is not the case in Big Data anymore:<br></br><ul><li>atomicity: either everything or nothing is executed,</li><li>consistency: everytime you update, all the data will be consistent,</li><li>isolation: more people are using the database, but if feels like you are the only one,</li><li>durability: updates that are carried out are persistent.</li></ul>
How can Big Data become big? What do we have more of?
<div>We can have a lot of</div>
<div><ul><li>rows,</li><li>columns,</li><li>nesting.</li></ul></div>