Big Data Lecture 02 Lessons Learnt Flashcards by Mabel Wylie

Explain data independence

The logical model of the data (the interface through which it is queried and displayed) is independent of the physical storage, so the storage can be swapped without changing the queries.
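
A minimal Python sketch of the idea, assuming records are plain dicts. `RowStore`, `ColumnStore` and `names_older_than` are hypothetical names made up for illustration. The query is written once against the logical interface (`scan()`), and the physical layout behind it can be swapped without touching the query.

```python
from typing import Dict, List, Protocol

Row = Dict[str, object]


class Storage(Protocol):
    """Logical interface: all a query needs is a way to scan the records."""

    def scan(self) -> List[Row]: ...


class RowStore:
    """Physical layout 1 (hypothetical): whole records stored one after another."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = [dict(r) for r in rows]

    def scan(self) -> List[Row]:
        return [dict(r) for r in self._rows]


class ColumnStore:
    """Physical layout 2 (hypothetical): one list per attribute.
    Assumes every row has the same attributes."""

    def __init__(self, rows: List[Row]) -> None:
        self._cols: Dict[str, List[object]] = {}
        for r in rows:
            for attr, value in r.items():
                self._cols.setdefault(attr, []).append(value)
        self._n = len(rows)

    def scan(self) -> List[Row]:
        # Reassemble each record from the i-th entry of every column.
        return [{a: col[i] for a, col in self._cols.items()} for i in range(self._n)]


def names_older_than(storage: Storage, age: int) -> List[str]:
    """A query written only against the logical interface."""
    return [r["name"] for r in storage.scan() if r["age"] > age]


data = [{"name": "Ann", "age": 36}, {"name": "Bob", "age": 41}]
print(names_older_than(RowStore(data), 40))     # ['Bob']
print(names_older_than(ColumnStore(data), 40))  # ['Bob']
```

Swapping `RowStore` for `ColumnStore` changes only how the bytes are laid out, not the answer to the query.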

What 4 pieces constitute the architecture of data storage?

<ul><li>Language (how you query),</li><li>model (how the data is represented; the driver of data independence),</li><li>compute (where the queries are executed),</li><li>storage (the physical hardware).</li></ul>

What does the data model describe? (2)

<ul><li>What the data looks like,</li><li>what you can do with it (manipulation primitives).</li></ul>

What is a table?

A collection of rows that all share the same attributes (columns).

What is a row?

One record in the table.

What is an attribute?

One column in a table.

What is a primary key?

An attribute (or set of attributes) whose value uniquely identifies each record in a table.

What is a value?

The single entry at the intersection of one row and one column of a table.

What is relational algebra?

A formal algebra of operations (such as selection, projection and join) that take tables as input and return tables as output.

How is a relation (table) expressed formally in relational algebra? What are its two components?

Each attribute has a domain; the relation is a subset of the Cartesian product of these domains, and its tuples are the rows we put into the table. Components: 1. the set of attributes (schema), 2. the set/bag/list of tuples.
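
A small Python sketch of "a relation is a subset of the Cartesian product of the domains". The two-attribute schema and its tiny domains are hypothetical, chosen so the whole product can be enumerated.

```python
from itertools import product

# Hypothetical schema with two attributes and their (tiny) domains.
domains = {
    "name": {"Ann", "Bob"},
    "is_student": {True, False},
}
schema = tuple(domains)  # ('name', 'is_student'): the set of attributes

# Cartesian product of the domains: every tuple that could possibly exist.
all_tuples = set(product(*(domains[a] for a in schema)))

# A relation is a subset of that product: the tuples actually stored in the table.
relation = {("Ann", False), ("Bob", True)}

print(len(all_tuples))         # 4 = 2 * 2
print(relation <= all_tuples)  # True
```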

Explain: set, list, and bag.

<ul><li>Set: unordered collection without duplicates,</li><li>list: ordered collection, can have duplicates,</li><li>bag: unordered collection, with duplicates.</li></ul>
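
The three kinds of collection can be compared in Python. Here `collections.Counter` is used as a stand-in for a bag (multiset); that mapping is an illustration, not something the lecture prescribes.

```python
from collections import Counter

items = ["b", "a", "b"]

as_list = list(items)    # ordered, keeps duplicates
as_set = set(items)      # unordered, duplicates removed
as_bag = Counter(items)  # unordered, duplicates kept as counts

print(as_list == ["a", "b", "b"])          # False: order matters in a list
print(as_set == {"a", "b"})                # True: order and duplicates ignored
print(as_bag == Counter(["a", "b", "b"]))  # True: order ignored, counts matter
print(as_bag["b"])                         # 2
```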

How can a tuple be seen as a function?

It maps each attribute of the table to a value.

What is relational integrity?

All the attributes must have a correct reference, meaning that the keys point to valid records in other tables.

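Edit: I don't think this is true; that describes referential integrity. The real answer should be:
All records must have identical support, i.e. exactly the same set of attributes; e.g. no record can be missing a value for an attribute that the others have.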
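
A minimal check of relational integrity in Python, assuming records are modelled as dicts (attribute name to value). `has_relational_integrity` is a hypothetical helper, not a library function.

```python
from typing import Dict, List


def has_relational_integrity(records: List[Dict[str, object]]) -> bool:
    """True if every record has exactly the same set of attributes (identical support)."""
    if not records:
        return True
    support = set(records[0])
    return all(set(r) == support for r in records)


ok = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
broken = [{"id": 1, "name": "Ann"}, {"id": 2}]  # second record has no "name"

print(has_relational_integrity(ok))      # True
print(has_relational_integrity(broken))  # False
```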

What is atomic integrity?

Every value is atomic: there are no tables (or other nested structures) inside a table.
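
A sketch of an atomic-integrity check over dict records. Which Python types count as "nested" is an assumption made here for illustration (lists, tuples, sets and dicts); strings and numbers are treated as atomic.

```python
from typing import Dict, List

# Assumption: these container types count as "nested"; everything else is atomic.
NESTED_TYPES = (list, tuple, set, frozenset, dict)


def has_atomic_integrity(records: List[Dict[str, object]]) -> bool:
    """True if no value in any record is itself a collection (no table in a table)."""
    return all(not isinstance(v, NESTED_TYPES) for r in records for v in r.values())


flat = [{"id": 1, "phone": "555-0100"}]
nested = [{"id": 1, "phones": ["555-0100", "555-0199"]}]  # a list inside a cell

print(has_atomic_integrity(flat))    # True
print(has_atomic_integrity(nested))  # False
```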

When is a table in 1st normal form?

When it satisfies atomic integrity.

What is domain integrity?

All values of a given attribute (column) must come from the same domain (type), e.g. all booleans or all strings.
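
A sketch of a domain-integrity check, using the Python type of each value as a stand-in for its domain (an assumption; real domains can be narrower than a type). The helper name is hypothetical.

```python
from typing import Dict, List, Set


def has_domain_integrity(records: List[Dict[str, object]]) -> bool:
    """True if, for every attribute, all of its values have the same Python type."""
    types_per_attr: Dict[str, Set[type]] = {}
    for r in records:
        for attr, value in r.items():
            types_per_attr.setdefault(attr, set()).add(type(value))
    return all(len(ts) == 1 for ts in types_per_attr.values())


ok = [{"active": True}, {"active": False}]
mixed = [{"active": True}, {"active": "yes"}]  # a string among booleans

print(has_domain_integrity(ok))     # True
print(has_domain_integrity(mixed))  # False
```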

What is NoSQL?

When we break all these integrity constraints (relational, domain and atomic integrity), we get outside the relational model; that is what we study in Big Data!

What is selection?

Selecting the rows of a table that satisfy a condition.

What is projection?

Selecting columns of a table.
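
Selection and projection can be sketched over a list of dict rows. The `select` and `project` helpers and the sample data are hypothetical; `project` drops duplicate rows, following set semantics.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

people: List[Row] = [
    {"name": "Ann", "age": 36, "city": "Zurich"},
    {"name": "Bob", "age": 41, "city": "Basel"},
    {"name": "Cyd", "age": 85, "city": "Zurich"},
]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows: List[Row], attributes: List[str]) -> List[Row]:
    """Projection: keep only the given columns, dropping duplicate rows (set semantics)."""
    result: List[Row] = []
    for r in rows:
        t = {a: r[a] for a in attributes}
        if t not in result:
            result.append(t)
    return result


print(select(people, lambda r: r["age"] > 40))  # Bob's and Cyd's rows
print(project(people, ["city"]))                # [{'city': 'Zurich'}, {'city': 'Basel'}]
```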

What is grouping?

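Collecting the rows of a table that share the same value of an attribute (or satisfy the same condition) into groups, usually so that each group can be aggregated (e.g. summed or counted) into a single row.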
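
A sketch of grouping followed by an aggregation (a sum per group). The `group_sum` helper and the sales data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

sales: List[Dict[str, object]] = [
    {"shop": "A", "amount": 10},
    {"shop": "B", "amount": 5},
    {"shop": "A", "amount": 7},
]


def group_sum(rows, key: str, value: str) -> Dict[object, int]:
    """Group rows that share the same `key` value, then sum `value` within each group."""
    totals: Dict[object, int] = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)


print(group_sum(sales, "shop", "amount"))  # {'A': 17, 'B': 5}
```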

What is sorting?

Ordering the rows of a table by the values of one or more attributes, ascending or descending.

What is the Cartesian product?

Combining every row of one table with every row of the other (each with each); a table with m rows times a table with n rows gives m × n rows.
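
A sketch of the Cartesian product of two tables using `itertools.product`. It assumes the two tables have no attribute names in common; the `cartesian` helper and data are hypothetical.

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, object]

colors: List[Row] = [{"color": "red"}, {"color": "blue"}]
sizes: List[Row] = [{"size": "S"}, {"size": "M"}, {"size": "L"}]


def cartesian(left: List[Row], right: List[Row]) -> List[Row]:
    """Pair every row of `left` with every row of `right`, merging their attributes.
    Assumes the two tables have no attribute names in common."""
    return [{**a, **b} for a, b in product(left, right)]


result = cartesian(colors, sizes)
print(len(result))  # 6 = 2 * 3
print(result[0])    # {'color': 'red', 'size': 'S'}
```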

What is a join?

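Combining two tables by matching their rows on a condition, usually equality of a common attribute; this is the same as a Cartesian product followed by a selection.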
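
A nested-loop equi-join sketch, written literally as "Cartesian product, then keep matching pairs". The helper and tables are hypothetical; real databases use faster strategies (e.g. hash or sort-merge joins).

```python
from typing import Dict, List

Row = Dict[str, object]

employees: List[Row] = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cyd", "dept_id": 1},
]
departments: List[Row] = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "R&D"}]


def join(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Nested-loop equi-join: the Cartesian product, filtered to pairs that agree on `attr`."""
    return [{**a, **b} for a in left for b in right if a[attr] == b[attr]]


for row in join(employees, departments, "dept_id"):
    print(row)
# {'emp': 'Ann', 'dept_id': 1, 'dept': 'Sales'}
# {'emp': 'Bob', 'dept_id': 2, 'dept': 'R&D'}
# {'emp': 'Cyd', 'dept_id': 1, 'dept': 'Sales'}
```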

What are anomalies?

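If the same data is stored in several places (duplicated) instead of being referenced, an update, delete or insert can leave the data inconsistent: an update anomaly (only some copies are changed), a deletion anomaly (deleting a row also loses other facts), or an insertion anomaly (a fact cannot be stored without unrelated data).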
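
A tiny illustration of an update anomaly on a hypothetical denormalized table, where each department's city is repeated in every employee row.

```python
# Hypothetical denormalized table: the department's city is repeated in every employee row.
employees = [
    {"emp": "Ann", "dept": "Sales", "dept_city": "Zurich"},
    {"emp": "Bob", "dept": "Sales", "dept_city": "Zurich"},
]

# Update anomaly: the Sales department moves, but only one of the copies is updated.
employees[0]["dept_city"] = "Basel"

sales_cities = {r["dept_city"] for r in employees if r["dept"] == "Sales"}
print(len(sales_cities))  # 2: the table now claims Sales is in two cities at once
```

Storing the city once in a separate departments table, and referencing it by key, removes the duplicate copy that went stale.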

What is a functional dependency?

Attribute (set) Y functionally depends on attribute (set) X, written X → Y, if any two rows that agree on X also agree on Y; Y can then be seen as a function of X.
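
A sketch that checks whether X → Y holds in a given set of rows. Note that data can only refute a functional dependency; whether it holds in general is a property of the schema. The helper and data are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check whether lhs -> rhs holds in these rows: equal lhs values imply equal rhs values."""
    seen: Dict[tuple, tuple] = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same lhs seen before with a different rhs
            return False
    return True


rows = [
    {"zip": "8001", "city": "Zurich", "street": "A"},
    {"zip": "8001", "city": "Zurich", "street": "B"},
    {"zip": "4001", "city": "Basel", "street": "A"},
]
print(fd_holds(rows, ["zip"], ["city"]))     # True
print(fd_holds(rows, ["street"], ["city"]))  # False: street "A" maps to two cities
```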

What is a superkey?

An attribute or set of attributes that all the other attributes of the row functionally depend on, so its values uniquely identify each row.

What is a candidate key?

A minimal superkey (removing any attribute from it would stop it being a superkey); we pick one of the candidate keys to be the primary key.
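
Given a schema and its functional dependencies, candidate keys can be found by computing attribute closures: a set is a superkey when its closure is the whole schema. This brute-force sketch tries sets from smallest to largest and is exponential in the number of attributes; the schema and FDs are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

FD = Tuple[Set[str], Set[str]]  # (left-hand side, right-hand side)


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal superkeys: attribute sets whose closure is the whole schema,
    tried from smallest to largest so that supersets of a found key are skipped."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == schema:
                keys.append(s)
    return keys


schema = {"student", "course", "grade", "teacher"}
fds: List[FD] = [
    ({"student", "course"}, {"grade"}),
    ({"course"}, {"teacher"}),
]
print(candidate_keys(schema, fds) == [{"student", "course"}])  # True
```

In this hypothetical schema, `teacher` is non-prime and depends on `course` alone, a proper subset of the candidate key, so the table is not in 2nd normal form.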

Define non-prime attribute.

Attribute that is not in any candidate key.

When is a table in 2nd normal form?

When it is in 1st normal form and no proper subset of a candidate key determines a non-prime attribute.

When is a table in 3rd normal form?

When it is in 2nd normal form and no set of attributes that is not a superkey determines a non-prime attribute outside that set.

When is a table in 3.5th (Boyce-Codd) normal form?

When no set of attributes that is not a superkey determines any attribute outside that set, prime or non-prime.

What is data denormalization?

Bringing data from higher normal forms back down to the 1st normal form (pre-joining tables and accepting redundancy), so that queries need fewer joins and are easier to run in parallel.

What is the difference between proto-imperative and functional/declarative language?

In a proto-imperative language we have to spell out explicitly, step by step, how to compute the result; in a functional/declarative language we just say what result we want, and the system works out how to produce it.
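
The contrast can be seen side by side in Python: an explicit loop (imperative) versus an SQL query handed to the standard-library `sqlite3` module (declarative). The table and data are hypothetical.

```python
import sqlite3

rows = [("Ann", 36), ("Bob", 41), ("Cyd", 85)]

# Imperative: we spell out *how* to get the result, step by step.
older = []
for name, age in rows:
    if age > 40:
        older.append(name)

# Declarative: we only state *what* we want; SQLite decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
declared = [name for (name,) in conn.execute(
    "SELECT name FROM people WHERE age > 40 ORDER BY name")]

print(older)     # ['Bob', 'Cyd']
print(declared)  # ['Bob', 'Cyd']
```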

What kind of language is SQL?

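Declarative: we declare what we want, and the system works out how to get it.

Functional: queries can be nested like mathematical expressions, because the result of a query is again a table.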

Give all SQL clauses and explain them.

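SELECT column_name1, column_name2   -- which columns to return (projection)
FROM table_name                     -- which table(s) to read
WHERE condition                     -- which rows to keep, before grouping (selection)
GROUP BY attribute                  -- group rows sharing the same attribute value
HAVING condition                    -- which groups to keep, after grouping
ORDER BY attribute ASC|DESC         -- sort the result
LIMIT number_to_display             -- return at most this many rows
OFFSET number_to_skip               -- skip this many rows first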
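
A runnable sketch that uses every clause at once, via Python's standard-library `sqlite3` on an in-memory database. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10), ("B", 5), ("A", 7), ("C", 20), ("B", 1), ("D", 2)],
)

query = """
    SELECT shop, SUM(amount) AS total   -- columns to return
    FROM sales                          -- input table
    WHERE amount > 1                    -- drop rows before grouping
    GROUP BY shop                       -- one group per shop
    HAVING SUM(amount) > 4              -- drop groups after aggregation
    ORDER BY total DESC                 -- largest total first
    LIMIT 2                             -- return at most two rows
    OFFSET 1                            -- after skipping the first one
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 17), ('B', 5)]
```

Tracing it: WHERE removes ("B", 1); grouping gives A=17, B=5, C=20, D=2; HAVING drops D; ordering gives C, A, B; OFFSET 1 skips C and LIMIT 2 keeps A and B.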

What set operations can we use in SQL?

Query results are tables (strictly, bags of rows), so we can combine them with UNION (duplicates removed), UNION ALL (duplicates kept), MINUS (called EXCEPT in standard SQL) or INTERSECT.
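
The set operations can be tried with `sqlite3`; SQLite spells MINUS as `EXCEPT`. The two one-column tables are hypothetical, and results are sorted in Python because SQL does not guarantee an order without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])


def run(sql: str) -> list:
    """Run a one-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))


union = run("SELECT x FROM a UNION SELECT x FROM b")          # [1, 2, 3]
union_all = run("SELECT x FROM a UNION ALL SELECT x FROM b")  # [1, 2, 2, 2, 3]
intersect = run("SELECT x FROM a INTERSECT SELECT x FROM b")  # [2]
minus = run("SELECT x FROM a EXCEPT SELECT x FROM b")         # [1]
print(union, union_all, intersect, minus)
```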

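What is the difference between theta join, full outer join, right and left join, and natural join?

Theta join keeps only the pairs of rows that satisfy a chosen condition θ on their attributes (equality on an attribute being the most common case),
left and right (outer) joins work like a theta join, but every row of the left (respectively right) table is kept even without a match, with the missing columns filled in with NULL,
full outer join does both a left and a right join, keeping unmatched rows from both tables,
natural join joins on all attributes with matching names, keeping one copy of each shared column.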
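
A sketch comparing an (equality) theta join, a left outer join and a natural join with `sqlite3`. The `emp` and `dept` tables are hypothetical. RIGHT and FULL OUTER JOIN are left out because SQLite only supports them from version 3.39.0 on, and older builds are still common.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER);
    CREATE TABLE dept (dept_id INTEGER, dept TEXT);
    INSERT INTO emp VALUES ('Ann', 1), ('Bob', 2), ('Cyd', NULL);
    INSERT INTO dept VALUES (1, 'Sales'), (3, 'Legal');
""")

# Theta join with an equality condition: only matching pairs survive.
inner = conn.execute(
    "SELECT e.name, d.dept FROM emp e JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()  # [('Ann', 'Sales')]

# Left outer join: every employee is kept; missing departments come back as NULL (None).
left = conn.execute(
    "SELECT e.name, d.dept FROM emp e LEFT JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()  # [('Ann', 'Sales'), ('Bob', None), ('Cyd', None)]

# Natural join: joins automatically on the shared column name dept_id.
natural = conn.execute(
    "SELECT name, dept FROM emp NATURAL JOIN dept ORDER BY name"
).fetchall()  # [('Ann', 'Sales')]

print(inner, left, natural)
```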

Explain ACID and all its characteristics.

The good old days of databases gave us many guarantees on transactions; this is no longer the case in Big Data:

atomicity: a transaction is executed either completely or not at all,
consistency: every update takes the data from one consistent state (all constraints satisfied) to another,
isolation: many people use the database at the same time, but it feels as if you were the only one,
durability: once an update has been committed, it persists, even after a crash.
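
Atomicity can be observed with Python's `sqlite3`: using the connection as a context manager commits the transaction on success and rolls it back if an exception escapes. The `account` table and `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Ann", 100), ("Bob", 0)])
conn.commit()


def transfer(amount: int) -> None:
    """Move money from Ann to Bob as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'Bob'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'Ann'", (amount,))
    except sqlite3.IntegrityError:
        pass  # CHECK constraint failed, so the whole transfer was rolled back


transfer(30)   # succeeds
transfer(500)  # Ann would go negative: rolled back, including Bob's credit
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'Ann': 70, 'Bob': 30}
```

Bob's balance was already updated when Ann's update failed; the rollback undoes it too, which is exactly "all or nothing".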

How can Big Data become big? What do we have more of?

We can have a lot of

rows,
columns,
nesting.

(39 cards)