Data model Flashcards

1
Q

Data Model is most important

A

How to model data (as Data structures/objects) is important

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Each “layer” component may model same data differently

each layer hides the complexity of the
layers below it by providing a clean data model

A

For “address”, at high level model as class/object Address, at DB level model as Table “Address” at HTTP layer model as “JSON/XML” etc

Data Model reflects the problem and its solution

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Type of Data Models

A

(SQL) Relational Data Model : good for many to one (upside-down tree)
(noSQL) Document Data Model: good for one to many (tree)
(noSQL) Graph Data Model: good for many to many (DAG)
(noSQL) Full-Text Search Mode: good for full text search (log files)

  1. SQL databases target use cases where there is few one to many relationship (mostly many to one) –> SQL
  2. Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare. (I.e. such as log files) –> SQL, MapReduce Query
  3. Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything. (social network) —> Graph QL
  4. Full-Text search –> Indexing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

NoSQL

A

Not Only SQL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

polyglot persistence (similar to Polygot Programming)

A

Adoption of multiple persistence data models (SQL, noSQL etc) in an application.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q
Why NoSQL (Document/JSON model)
--Better for One-to-Many (tree structure)

1 person has many job categories (1 category with 1 category ID)

A

For a complicated object, the “Normalized” SQL split into multiple tables, using Foreign ID to connect the tables (relationships).

To reconstruct the object, a “denormalization” (joining by foreign Ids) need to happen.

If save a object as JSON text inside a column, then the SQL cannot be used to query the content inside JSON, leaving the job to application.

Thus we need NoSQL, that stores data as JSON data model and can use SQL to query

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q
Document Database (NoSQL)
 vs
SQL tables (SQL)

SQL supports many-to-one, many-to-many (Normalization, JOIN) well, but do not support one-to-many well -> shifting the query of values inside JSON to application

NoSQL supports one-to-many natively (JSON tree) but does not support JOIN (many-to-one) well –> shifting the JOIN operation to user

A

***PRO
JSON object has better locality. All information about 1 person is in the same row/document.

JSON Document implies one-to-many tree structure

SQL tables: information of 1 person is scattered among multiple tables, needs a JOIN operation

***CON
Normalization avoids duplication, but implies many-to-one relationship.

A JSON document with normalization will have both
One-To-Many and Many-To-One relationships

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is Normalization in Database( SQL and NoSQL)

    • avoid duplication
    • single place change
A

Normalization in Database is avoid duplication of common constant values:

similar to C/C++ to do:
#define REGION_ID_192 "Philadelphia"
#define REGION_ID_190 "Seattle"

in C where we can
1) avoid duplication
2) change at 1 place will change all references to
e.e.
#define REGION_ID_190 “Greater Seattle”
will change all ppl referencing 190 to from “Greater Seattle” area.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

To Normalize or not to Normalize?

A

To #define or not #define

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Normalization implies Many-to-One

A

Many ppl reference REGION_ID_192 because many ppl are from Philadelphia

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

A JOIN-free document has no reference to entity IDs (no use of #define values)

A

A JOIN-free document today may not be JOIN-free tomorrow, if new feature added forcing normalization (using of entity IDs)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Since NoSQL weak at JOIN forced by Normalization, the debate of NoSQL vs SQL is also a debate of Normalization vs No Normalization

A

Normalization is An age old debate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Data Locality: Document Model vs Relational Model

A

Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc

Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Data Locality: Document Model vs Relational Model

A

Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc ==> less disk seek, page swap etc

Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory ==> more disk seek, page swap etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Document Model Locality Cons:

1) Read Whole use Part –> a waste
2) In-place update may require relocation of the document (when it exceeds sizes)

A

some SQL also support locality

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

SQL is a declarative query language (vs imperative language like C/Java/Python etc)

SELECT (cols) FROM (table) WHERE (cond) {TRANSFORM}

e.g
select * from * where a=b SORT BY a ASEC

Another declaration language is CSS
# highlight background of blue of any paragraph that is direct child of a <li> element whose class is set to "selected" like </li><li class="selected">
li.selected > p {
  background-color: blue;
}</li>
A

An imperative language tells the computer to perform certain operations in a certain order (exactly as the order of the statements)

In a declarative query language, like SQL or relational algebra, you just specify WHAT not HOW

I.e. specify the pattern of the data you want—what conditions the results must meet, and how you
want the data to be transformed (e.g., sorted, grouped, and aggregated)—but not how to achieve that goal.

The SQL Optimizer will decide HOW, which allows PARALLEL execution (mapreduce for example)

17
Q

SQL is a declarative query language (vs imperative language like C/Java/Python etc)

SELECT (cols) FROM (table) WHERE (cond) {TRANSFORM}

ORDER of execution:
WHERE–> TRANSFORM–>SELECT

e.g
select * from * where a=b SORT BY a ASEC

Another declaration language is CSS
# highlight background of blue of any paragraph that is direct child of a <li> element whose class is set to "selected" like </li><li class="selected">
li.selected > p {
  background-color: blue;
}</li>
A

An imperative language tells the computer to perform certain operations in a certain order (exactly as the order of the statements)

In a declarative query language, like SQL or relational algebra, you just specify WHAT not HOW

I.e. specify the pattern of the data you want—what conditions the results must meet, and how you
want the data to be transformed (e.g., sorted, grouped, and aggregated)—but not how to achieve that goal.

The SQL Optimizer will decide HOW, which allows PARALLEL execution (mapreduce for example)

18
Q

SQL engine can be implemented on top of MapReduce (for parallel execution), Similarly Pandas/Spark etc call all use MapReduce for parallel execution on data processing.

A

MapReduce is a fairly low-level programming model for distributed execution on a cluster of machines. Higher-level query languages like SQL can be implemented as a
pipeline of MapReduce operations

19
Q

ACID-compliant database (regardless SQL or noSQL)

A

(atomicity, consistency, isolation, durability)