Week 3: Big Data Modelling Flashcards
3 Components of a data model
- Structures
- Operations
- Constraints
How is a data model characterized?
- Structure of the data that it admits
- Operations on that structure
- A way to specify constraints
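The three parts of a data model can be sketched in a few lines of Python. This is only an illustration: the `Person` class, its fields, and the `age_on` method are hypothetical names, not part of any particular database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    # Structure: the fields a record is allowed to have.
    first_name: str
    last_name: str
    dob: date

    def __post_init__(self):
        # Constraint: a rule every valid record must satisfy.
        if self.dob > date.today():
            raise ValueError("dob cannot be in the future")

    def age_on(self, day: date) -> int:
        # Operation: a computation defined on the structure.
        had_birthday = (day.month, day.day) >= (self.dob.month, self.dob.day)
        return day.year - self.dob.year - (0 if had_birthday else 1)
```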
Example of structured data
- First name
- Last name
- DoB
Example of semi-structured data
- First Name
- Last Name
- DOB
- Occupation
- Salary
Example of unstructured data
02023#02023#02023#02023#02023#02023#02023
- Hex dump
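The three kinds of data above might look like this in Python. All values are made up for illustration, and the variable names are hypothetical.

```python
# Structured: every record has the same fixed fields, in the same order.
structured = [
    ("Ada", "Lovelace", "1815-12-10"),
    ("Alan", "Turing", "1912-06-23"),
]

# Semi-structured: records describe themselves, so fields can vary per record.
semi_structured = [
    {"first_name": "Ada", "last_name": "Lovelace", "dob": "1815-12-10"},
    {
        "first_name": "Alan",
        "last_name": "Turing",
        "dob": "1912-06-23",
        "occupation": "Mathematician",  # extra fields, illustrative values
        "salary": 1000,
    },
]

# Unstructured: raw bytes with no declared fields at all.
unstructured = bytes.fromhex("deadbeef")
```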
NoSQL stands for
- Not Only SQL
4 types of NoSQL database models
- Key / Value
- Document
- Wide column stores
- Graph
4 Advantages of Key / Value database
- Simplest and most flexible
- Stores a value indexed by a key
- No Schema
- Value is a binary large object (BLOB), so the store doesn't care about its data type
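A minimal in-memory sketch of the key / value idea, assuming `pickle` as a stand-in for whatever serialization a real store would use. The `KeyValueStore` class and its method names are hypothetical.

```python
import pickle


class KeyValueStore:
    """Toy key / value store: no schema, values kept as opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store only sees a blob; it never inspects the value's type.
        self._data[key] = pickle.dumps(value)

    def get(self, key, default=None):
        blob = self._data.get(key)
        return default if blob is None else pickle.loads(blob)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})
store.put("counter", 7)
```

Lookups go through the key only, which is why this model suits session or shopping cart data and not queries across sets of values.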
3 Key / Value suitable use cases
- Session information
- User profiles and preferences
- Shopping cart data
3 Key / Value unsuitable use cases
- Data with relationships
- Operations on sets of data
- Transactions with multiple operations
3 Document-oriented database suitable use cases
- Event logs
- CMS / Blogging
- Analytics
2 Document-oriented database unsuitable use cases
- Complex cross-document transactions
- Queries that rely on fixed schema
Wide column stores 3 use cases
- Event logs
- CMS / Blogging
- Analytics
2 Wide column stores unsuitable use cases
- Complex cross-document transactions
- Queries that rely on fixed schema
Graph database consists of what 2 elements
- Node - an entity
- Edge - a relationship
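Nodes and edges can be sketched with plain Python collections. The names in `edges` and the `friends_of_friends` helper are hypothetical; a real graph database would provide its own query language for this kind of traversal.

```python
from collections import defaultdict

# Each edge is a relationship between two nodes (entities).
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

# Build an undirected adjacency map: node -> set of neighbouring nodes.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)


def friends_of_friends(person):
    """Nodes two hops away from person, excluding direct neighbours."""
    direct = adjacency.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= adjacency[friend]
    return candidates - direct - {person}
```

Following edges like this is the basis of the social network and recommender use cases.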
3 Graph database suitable use cases
- Social networks
- Recommender systems
- Routing and location-based services
Don't use graph databases for what use case
- Operations on sets of data
JSON is built on what 2 structures
- Collection of name/value pairs
- An ordered list of values
What is a JSON object
- An unordered set of name/value pairs
What is a JSON array
- An ordered collection of values
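In Python's standard `json` module, a JSON object is parsed into a `dict` and a JSON array into a `list`. The sample text below is made up for illustration.

```python
import json

text = '{"name": "Ada", "languages": ["English", "French"]}'
doc = json.loads(text)

# The outer JSON object becomes a dict of name/value pairs.
name = doc["name"]

# The JSON array becomes a list, so position is meaningful.
first_language = doc["languages"][0]
```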
Describe MongoDB Model
- Database
- Collections
- Documents
Describe Relational DB Model
- Database
- Tables
- Rows
Describe RDB vs MongoDB in terms of schema
- A relational DB has a well-defined schema for the data it stores
- MongoDB is schemaless
Describe RDB vs MongoDB in terms of data rejection
- A relational DB will reject data that doesn't conform to the schema
- MongoDB can store unstructured data whose content is not known in advance
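The contrast can be sketched with the standard library alone. Here SQLite stands in for a relational DB and a plain list of dicts stands in for a MongoDB collection; both are assumptions made so the example runs without a server, and the table and field names are hypothetical.

```python
import sqlite3

# Relational side: the table schema is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (first_name TEXT NOT NULL, last_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO people VALUES (?, ?)", ("Ada", "Lovelace"))

try:
    # 'occupation' is not in the schema, so the insert is rejected.
    conn.execute(
        "INSERT INTO people (first_name, last_name, occupation) VALUES (?, ?, ?)",
        ("Alan", "Turing", "Mathematician"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Document side: each document carries its own fields, so both are accepted.
collection = []
collection.append({"first_name": "Ada", "last_name": "Lovelace"})
collection.append(
    {"first_name": "Alan", "last_name": "Turing", "occupation": "Mathematician"}
)
```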
MongoDB 3 insert steps
- db
- use myDatabase
- db.myCollection.insert( { "country" : "scotland" } )
MongoDB remove operation
- db.myCollection.remove( { "city" : "London" } )
MongoDB update operation
- db.myCollection.updateOne( { "country" : "USA" } , { $set : { "city" : "New York" } } )
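The insert, remove, and update commands have close equivalents in PyMongo. The sketch below assumes a PyMongo collection object is passed in; `run_crud` is a hypothetical helper name, and `delete_many` is used because the shell's `remove` deletes every matching document by default.

```python
def run_crud(collection):
    """Mirror the three shell commands on a PyMongo-style collection."""
    # Shell: db.myCollection.insert( { "country" : "scotland" } )
    collection.insert_one({"country": "scotland"})

    # Shell: db.myCollection.remove( { "city" : "London" } )
    collection.delete_many({"city": "London"})

    # Shell: db.myCollection.updateOne( { "country" : "USA" }, { $set : ... } )
    collection.update_one({"country": "USA"}, {"$set": {"city": "New York"}})


# Usage, assuming pymongo is installed and a server runs on localhost:
#   from pymongo import MongoClient
#   run_crud(MongoClient("mongodb://localhost:27017")["myDatabase"]["myCollection"])
```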
An aggregation pipeline uses what 2 operations
- $match
- $group
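A two-stage pipeline can be written as a Python list of stage documents, the form PyMongo's `aggregate` accepts. The field names `country`, `city`, and `amount` are hypothetical, and `match_then_group` is only a plain-Python imitation of what the two stages compute, not MongoDB's implementation.

```python
from collections import defaultdict

# Pipeline as it could be passed to collection.aggregate(pipeline).
pipeline = [
    {"$match": {"country": "USA"}},                              # filter documents
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}},  # sum per city
]


def match_then_group(docs):
    """Plain-Python imitation of the two stages above."""
    totals = defaultdict(int)
    for doc in docs:
        if doc.get("country") == "USA":           # $match stage
            totals[doc["city"]] += doc["amount"]  # $group stage with $sum
    return dict(totals)
```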
What is Map
- Map applies the same computation to every item
What is Reduce
- Reduce combines the mapped values
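Python's built-in `map` and `functools.reduce` show the two steps on a small scale. The word list is made up for illustration.

```python
from functools import reduce

words = ["big", "data", "modelling"]

# Map: apply the same computation (len) to every item.
lengths = list(map(len, words))

# Reduce: combine the mapped values into a single result.
total = reduce(lambda a, b: a + b, lengths, 0)
```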
Design a MongoDB data model for a specified scenario and contrast it with the way the data would be modelled in a relational database (also in week 11)
Retrieve MongoDB data from Python
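A minimal sketch of retrieval with PyMongo, assuming the package is installed and a server is reachable at the default local address. The helper names, connection URI, and database and collection names are assumptions chosen to match the shell examples.

```python
def get_collection(uri="mongodb://localhost:27017",
                   db_name="myDatabase", coll_name="myCollection"):
    # Imported here so the rest of the sketch loads without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(uri)[db_name][coll_name]


def find_by_country(collection, country):
    """Return all documents for a country as a list of dicts."""
    # Second argument is a projection: {"_id": 0} hides the generated _id field.
    cursor = collection.find({"country": country}, {"_id": 0})
    return list(cursor)


# Usage, assuming a running server:
#   docs = find_by_country(get_collection(), "scotland")
```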