Data model Flashcards
Data Model is most important
How to model data (as Data structures/objects) is important
Each “layer” component may model same data differently
each layer hides the complexity of the
layers below it by providing a clean data model
For “address”, at high level model as class/object Address, at DB level model as Table “Address” at HTTP layer model as “JSON/XML” etc
Data Model reflects the problem and its solution
Type of Data Models
(SQL) Relational Data Model : good for many to one (upside-down tree)
(noSQL) Document Data Model: good for one to many (tree)
(noSQL) Graph Data Model: good for many to many (DAG)
(noSQL) Full-Text Search Mode: good for full text search (log files)
- SQL databases target use cases where there is few one to many relationship (mostly many to one) –> SQL
- Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare. (I.e. such as log files) –> SQL, MapReduce Query
- Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything. (social network) —> Graph QL
- Full-Text search –> Indexing
NoSQL
Not Only SQL
polyglot persistence (similar to Polygot Programming)
Adoption of multiple persistence data models (SQL, noSQL etc) in an application.
Why NoSQL (Document/JSON model) --Better for One-to-Many (tree structure)
1 person has many job categories (1 category with 1 category ID)
For a complicated object, the “Normalized” SQL split into multiple tables, using Foreign ID to connect the tables (relationships).
To reconstruct the object, a “denormalization” (joining by foreign Ids) need to happen.
If save a object as JSON text inside a column, then the SQL cannot be used to query the content inside JSON, leaving the job to application.
Thus we need NoSQL, that stores data as JSON data model and can use SQL to query
Document Database (NoSQL) vs SQL tables (SQL)
SQL supports many-to-one, many-to-many (Normalization, JOIN) well, but do not support one-to-many well -> shifting the query of values inside JSON to application
NoSQL supports one-to-many natively (JSON tree) but does not support JOIN (many-to-one) well –> shifting the JOIN operation to user
***PRO
JSON object has better locality. All information about 1 person is in the same row/document.
JSON Document implies one-to-many tree structure
SQL tables: information of 1 person is scattered among multiple tables, needs a JOIN operation
***CON
Normalization avoids duplication, but implies many-to-one relationship.
A JSON document with normalization will have both
One-To-Many and Many-To-One relationships
What is Normalization in Database( SQL and NoSQL)
- avoid duplication
- single place change
Normalization in Database is avoid duplication of common constant values:
similar to C/C++ to do: #define REGION_ID_192 "Philadelphia" #define REGION_ID_190 "Seattle"
in C where we can
1) avoid duplication
2) change at 1 place will change all references to
e.e.
#define REGION_ID_190 “Greater Seattle”
will change all ppl referencing 190 to from “Greater Seattle” area.
To Normalize or not to Normalize?
To #define or not #define
Normalization implies Many-to-One
Many ppl reference REGION_ID_192 because many ppl are from Philadelphia
A JOIN-free document has no reference to entity IDs (no use of #define values)
A JOIN-free document today may not be JOIN-free tomorrow, if new feature added forcing normalization (using of entity IDs)
Since NoSQL weak at JOIN forced by Normalization, the debate of NoSQL vs SQL is also a debate of Normalization vs No Normalization
Normalization is An age old debate
Data Locality: Document Model vs Relational Model
Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc
Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory
Data Locality: Document Model vs Relational Model
Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc ==> less disk seek, page swap etc
Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory ==> more disk seek, page swap etc
Document Model Locality Cons:
1) Read Whole use Part –> a waste
2) In-place update may require relocation of the document (when it exceeds sizes)
some SQL also support locality