Big Data Lecture 02 Lessons Learnt Flashcards

1
Q

Explain data independence

A

The logical model of the data (the interface used for querying and displaying it) is independent of the physical storage, so the storage layer can be swapped without changing the queries.
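A minimal sketch of the idea, assuming two hypothetical storage classes (`ListStorage` and `DictStorage` are made up for illustration): the query function only relies on the `put`/`get` interface, so either backend can be swapped in.

```python
class ListStorage:
    """Hypothetical backend: keeps (key, value) pairs in a list."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        self._pairs.append((key, value))

    def get(self, key):
        # Scan from the end so the most recent value for a key wins.
        for k, v in reversed(self._pairs):
            if k == key:
                return v
        raise KeyError(key)


class DictStorage:
    """Hypothetical backend: keeps the same data in a hash map."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def lookup_name(storage, user_id):
    # The "query" depends only on the logical interface, not on the layout.
    return storage.get(user_id)


for storage in (ListStorage(), DictStorage()):
    storage.put(1, "Ada")
    print(lookup_name(storage, 1))
```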

2
Q

What 4 pieces constitute the architecture of data storage?

A

<ul><li>Language (how you query),</li><li>model (representation, driver of independence),</li><li>compute (execution of computation),</li><li>storage (physical hardware).</li></ul>

3
Q

What does the data model describe? (2)

A

<ul><li>What the data looks like,</li><li>what you can do with it (manipulation primitives).</li></ul>

4
Q

What is a table?

A

A collection of rows (records) that all share the same set of attributes (columns).

5
Q

What is a row? 

A

One record in the table.

6
Q

What is an attribute?

A

One column in a table.

7
Q

What is a primary key?

A

An attribute (or minimal set of attributes) whose value uniquely identifies each record in a table.

8
Q

What is a value?

A

The single entry at the intersection of one row and one column of a table.

9
Q

What is relational algebra?

A

An algebra whose operators take tables (relations) as input and produce tables as output.

10
Q

How is a relation (table) defined formally in relational algebra?<br></br><br></br>What are its two components?

A

Each attribute has a domain. A relation is a subset of the cross product of these domains, and its tuples are what we write down as the rows of a table.<br></br><br></br>Components:<br></br>1. set of attributes (schema),<br></br>2. set/bag/list of tuples.
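The definition above can be sketched in Python. The attribute names and domains below are assumptions chosen for illustration only.

```python
from itertools import product

# Hypothetical domains for two attributes (assumed for illustration).
domains = {
    "name": {"Ada", "Bob"},
    "is_student": {True, False},
}

# Component 1: the schema, i.e. the attributes.
schema = tuple(domains)

# The cross product of the domains contains every possible tuple.
all_possible = set(product(*(domains[a] for a in schema)))

# Component 2: the tuples. A relation is any subset of the cross product.
relation = {("Ada", True), ("Bob", False)}

print(relation <= all_possible)  # True: every tuple comes from the cross product
print(len(all_possible))         # 4 possible tuples in total
```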

11
Q

Explain: set, list, and bag.

A

<ul><li>Set: unordered collection without duplicates,</li><li>list: ordered collection, can have duplicates,</li><li>bag: unordered collection, with duplicates.</li></ul>
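As a rough illustration, Python's built-in `list` and `set` and the standard-library `collections.Counter` behave like a list, a set, and a bag respectively (using `Counter` as the bag is a choice made for this sketch).

```python
from collections import Counter

values = ["b", "a", "b"]

as_list = list(values)    # order and duplicates both matter
as_set = set(values)      # no order, no duplicates
as_bag = Counter(values)  # no order, but multiplicities are kept

print(as_list == ["b", "b", "a"])          # False: same items, different order
print(as_set == {"a", "b"})                # True: the duplicate "b" collapsed
print(as_bag == Counter(["a", "b", "b"]))  # True: same counts, order ignored
print(as_bag["b"])                         # 2
```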

12
Q

How can a tuple be seen as a function?

A

It maps each attribute of the table to a value.
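One way to picture this, assuming a made-up row with three attributes: a Python dict is exactly such a mapping from attribute names to values.

```python
# A hypothetical row, stored as a mapping from attribute to value.
row = {"id": 1, "name": "Ada", "is_student": True}


def tuple_as_function(attribute):
    """Evaluate the tuple at one attribute."""
    return row[attribute]


print(tuple_as_function("name"))  # Ada
print(sorted(row))                # the function's domain is the schema
```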

13
Q

What is relational integrity?

A

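Every tuple in the relation has the same set of attributes, so no row is missing a column or carries an extra one. (Keys pointing to valid records in other tables is referential integrity, which is a separate constraint.)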

14
Q

What is atomic integrity?

A

There are no tables in a table, every value is atomic.

15
Q

When is a table in 1st normal form?

A

When it satisfies atomic integrity: every value is atomic, with no nested tables.

16
Q

What is domain integrity?

A

All the values in a column must belong to that attribute's domain (type), e.g. all are booleans, or all are strings.
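A minimal sketch of such a check, assuming rows are Python dicts and the declared domains are Python types (both are assumptions for illustration, as is the helper name `satisfies_domain_integrity`).

```python
# Hypothetical declared domain (type) of each attribute.
declared = {"id": int, "name": str}


def satisfies_domain_integrity(rows, declared):
    """True if every value belongs to the declared type of its column."""
    return all(
        isinstance(row[attr], expected)
        for row in rows
        for attr, expected in declared.items()
    )


good = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
bad = [{"id": 1, "name": "Ada"}, {"id": "two", "name": "Bob"}]

print(satisfies_domain_integrity(good, declared))  # True
print(satisfies_domain_integrity(bad, declared))   # False: "two" is not an int
```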

17
Q

What is NoSQL?

A

When we relax the integrity constraints of the relational model, we leave the relational world. That is NoSQL, and it is what we study in Big Data!

18
Q

What is selection?

A

Keeping only the rows of a table that satisfy a condition.

19
Q

What is projection?

A

Keeping only the chosen columns of a table.
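Selection and projection can be tried out with Python's standard `sqlite3` module. The `person` table and its contents are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Bob", 17), (3, "Cy", 52)],
)

# Selection: keep only some rows (all columns).
selected = conn.execute("SELECT * FROM person WHERE age >= 18 ORDER BY id").fetchall()

# Projection: keep only some columns (all rows).
projected = conn.execute("SELECT name FROM person ORDER BY id").fetchall()

print(selected)   # [(1, 'Ada', 36), (3, 'Cy', 52)]
print(projected)  # [('Ada',), ('Bob',), ('Cy',)]
```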

20
Q

What is grouping?

A

Merging the rows of one table that share the same value of an attribute (or satisfy the same condition) into groups, typically to aggregate each group.
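A small grouping sketch using `sqlite3`, with a made-up `sale` table: rows with the same `city` are merged and their amounts summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("Zurich", 10), ("Bern", 5), ("Zurich", 7)],
)

# One output row per distinct city, with the amounts of that group summed.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sale GROUP BY city ORDER BY city"
).fetchall()

print(totals)  # [('Bern', 5), ('Zurich', 17)]
```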

21
Q

What is sorting?

A

Ordering the rows of a table by one or more attributes.

22
Q

What is Cartesian product?

A

Pairing every row of one table with every row of the other table.

23
Q

What is join?

A

Merging two tables on a common attribute.
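To contrast the Cartesian product with a join, here is a sketch with two hypothetical tables, `author` and `book`, sharing the attribute `author_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (author_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE book (title TEXT, author_id INTEGER)")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Notes", 1), ("Essays", 1), ("Poems", 2)],
)

# Cartesian product: every author paired with every book (2 * 3 rows).
product_rows = conn.execute("SELECT * FROM author CROSS JOIN book").fetchall()

# Join: only the pairs that agree on the common attribute.
joined = conn.execute(
    "SELECT name, title FROM author "
    "JOIN book ON author.author_id = book.author_id "
    "ORDER BY title"
).fetchall()

print(len(product_rows))  # 6
print(joined)             # [('Ada', 'Essays'), ('Ada', 'Notes'), ('Bob', 'Poems')]
```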

24
Q

What are anomalies?

A

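If the same fact is stored in several places without being properly linked, an update, delete, or insert can change one copy but not the others, leaving the data inconsistent. Such an inconsistency is called an anomaly.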
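An update anomaly can be reproduced in a few lines. The `enrollment` table below is an assumption for illustration: the lecturer of a course is duplicated in every enrollment row, and the update touches only one copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, lecturer TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [("Ada", "Big Data", "Smith"), ("Bob", "Big Data", "Smith")],
)

# The lecturer changes, but only one of the duplicated copies is updated.
conn.execute("UPDATE enrollment SET lecturer = 'Jones' WHERE student = 'Ada'")

lecturers = conn.execute(
    "SELECT DISTINCT lecturer FROM enrollment "
    "WHERE course = 'Big Data' ORDER BY lecturer"
).fetchall()

print(lecturers)  # [('Jones',), ('Smith',)] -> the course now has two lecturers
```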

25
Q

What is functional dependency?

A

When the value of one attribute is fully determined by the value of another attribute (or set of attributes) in a table, so that it can be seen as a function of the other.
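A rough way to test whether a functional dependency holds in a given table instance is sketched below. The helper name `holds` and the sample rows are assumptions. Passing the check on sample data does not prove the dependency holds for every possible table.

```python
def holds(rows, lhs, rhs):
    """True if, within `rows`, equal values on `lhs` imply equal values on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first value seen for this key; a different one is a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"zip": "8001", "city": "Zurich", "name": "Ada"},
    {"zip": "8001", "city": "Zurich", "name": "Bob"},
    {"zip": "3000", "city": "Bern", "name": "Cy"},
]

print(holds(rows, ["zip"], ["city"]))   # True: zip determines city here
print(holds(rows, ["city"], ["name"]))  # False: Zurich maps to two names
```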

26
Q

What is superkey?

A

An attribute or set of attributes that determines all other attributes of a row, so its values uniquely identify the row.

27
Q

What is candidate key?

A

Candidate key is any minimal superkey, one of which we can pick to be the primary key.
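The relationship between superkeys and candidate keys can be sketched by brute force on a small table instance. The helper names and the sample data are assumptions, and the check only looks at the rows given, not at the intended schema.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in `attrs`."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)


def candidate_keys(rows, schema):
    """Minimal superkeys, found by trying attribute sets from small to large."""
    found = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            # Skip supersets of an already found key: they are not minimal.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


schema = ("student", "course", "grade")
rows = [
    {"student": "Ada", "course": "BD", "grade": "A"},
    {"student": "Ada", "course": "ML", "grade": "B"},
    {"student": "Bob", "course": "BD", "grade": "A"},
    {"student": "Bob", "course": "ML", "grade": "A"},
]

print(is_superkey(rows, schema))     # True: the full schema is a superkey here
print(candidate_keys(rows, schema))  # [('student', 'course')]
```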

28
Q

Define non-prime attribute.

A

Attribute that is not in any candidate key.

29
Q

When is a table in 2nd normal form?

A

When it is in 1st normal form and no proper subset of a candidate key determines a non-prime attribute.

30
Q

When is a table in 3rd normal form?

A

When it is in 2nd normal form and no set of attributes that is not a superkey determines a non-prime attribute.

31
Q

When is a table in 3.5th (Boyce-Codd) normal form?

A

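When no set of attributes that is not a superkey determines any attribute outside itself, i.e. every non-trivial functional dependency has a superkey on its left-hand side.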

32
Q

What is data denormalization?

A

Moving data from higher normal forms back toward 1st normal form (for example by storing pre-joined tables), which avoids joins at query time and allows easy parallel querying.

33
Q

What is the difference between a proto-imperative and a functional/declarative language?

A

In a proto-imperative language we have to spell out every step explicitly. In a functional/declarative language we only state the result we want, and the system decides how to compute it.

34
Q

What kind of language is SQL?

A

Declarative: we state what we want, and the engine works out how to compute it.<br></br><br></br>Functional: queries can be nested like mathematical expressions.

35
Q

Give all SQL clauses and explain them.

A

SELECT <i>column_name1, column_name2</i><br></br>FROM <i>table_name</i><br></br>WHERE <i>condition</i><br></br>GROUP BY <i>attribute</i><br></br>HAVING <i>condition</i><br></br>ORDER BY <i>attribute and direction</i><br></br>LIMIT <i>number_to_display</i><br></br>OFFSET <i>number_to_skip</i>
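The clauses above can be seen working together in one query. The sketch below runs it with `sqlite3` against a made-up `sale` table. The comments inside the query describe what each clause contributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [
        ("Zurich", 10), ("Zurich", 7), ("Bern", 5), ("Bern", 1),
        ("Basel", 20), ("Basel", 2), ("Chur", 9),
    ],
)

query = """
SELECT city, SUM(amount) AS total  -- columns and aggregates to return
FROM sale                          -- source table
WHERE amount > 1                   -- filter rows before grouping
GROUP BY city                      -- one group per city
HAVING SUM(amount) >= 6            -- filter groups after aggregation
ORDER BY total DESC                -- sort the remaining groups
LIMIT 2                            -- return at most two rows
OFFSET 1                           -- after skipping the first one
"""
rows = conn.execute(query).fetchall()

print(rows)  # [('Zurich', 17), ('Chur', 9)]
```

Basel (22) is skipped by OFFSET, and Bern (5 after the WHERE filter) is removed by HAVING.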

36
Q

What set operations can we use in SQL?

A

Query results are collections of rows, so we can combine them with UNION (removes duplicates), UNION ALL (keeps duplicates), INTERSECT, or MINUS (called EXCEPT in standard SQL).
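A quick sketch of these operators using `sqlite3`, with two hypothetical one-column tables `a` and `b`. SQLite spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])


def run(op):
    """Combine the two queries with the given set operator."""
    sql = f"SELECT x FROM a {op} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]


print(run("UNION"))      # [1, 2, 3]
print(run("UNION ALL"))  # [1, 2, 2, 3]
print(run("INTERSECT"))  # [2]
print(run("EXCEPT"))     # [1]
```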

37
Q

What is the difference between a theta join, a full outer join, right and left joins, and a natural join?

A

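<ul><li>Theta join pairs the rows that satisfy a chosen condition on selected attributes (equality or any other comparison),</li><li>left and right joins keep every row of the left (or right) table, attach the matching records of the other table, and fill in the rest using NULL,</li><li>full outer join does both: unmatched rows from either table are kept and filled with NULL,</li><li>natural join joins on equality of all attributes that have the same name in both tables.</li></ul>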
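Three of these joins are sketched below with `sqlite3` on two hypothetical tables. The full outer join is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (author_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE book (title TEXT, author_id INTEGER)")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
conn.executemany("INSERT INTO book VALUES (?, ?)", [("Notes", 1), ("Orphan", 3)])

# Left join: every author is kept; Bob has no book, so title is NULL (None).
left = conn.execute(
    "SELECT name, title FROM author "
    "LEFT JOIN book ON author.author_id = book.author_id ORDER BY name"
).fetchall()

# Natural join: matches on the shared column name author_id.
natural = conn.execute("SELECT name, title FROM author NATURAL JOIN book").fetchall()

# Theta join: any condition, here a comparison instead of equality.
theta = conn.execute(
    "SELECT name, title FROM author "
    "JOIN book ON author.author_id < book.author_id ORDER BY name, title"
).fetchall()

print(left)     # [('Ada', 'Notes'), ('Bob', None)]
print(natural)  # [('Ada', 'Notes')]
print(theta)    # [('Ada', 'Orphan'), ('Bob', 'Orphan')]
```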

38
Q

Explain ACID and all its characteristics.

A

The good old days of databases gave us strong guarantees on transactions; in Big Data these no longer always hold:<br></br><ul><li>atomicity: either everything or nothing is executed,</li><li>consistency: every update leaves all the data in a consistent state,</li><li>isolation: many people use the database at the same time, but it feels like you are the only one,</li><li>durability: updates that have been carried out are persistent.</li></ul>
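Atomicity in particular is easy to observe. The sketch below assumes a made-up `account` table with a CHECK constraint. The second update of the transfer fails, so the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Ada", 100), ("Bob", 0)])
conn.commit()

try:
    # The connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'Bob'")
        # This would make Ada's balance negative and violates the CHECK constraint.
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'Ada'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'Ada': 100, 'Bob': 0} -> nothing from the transfer was applied
```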

39
Q

How can Big Data become big? What do we have more of?

A

<div>We can have a lot of</div>

<div><ul><li>rows,</li><li>columns,</li><li>nesting.</li></ul></div>