Lecture 2 Flashcards
Scaling methods
- Scaling up - Vertical scaling
- Scaling out - Horizontal scaling
Sharding?
A database architecture pattern related to horizontal partitioning: the practice of separating one table’s rows into
multiple different tables, known as partitions.
The concept of sharding does not sit well with relational databases.
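A minimal sketch of how rows might be routed to shards, assuming hash-based routing on a string key. The shard count, the in-memory dictionaries standing in for database servers, and the helper names `shard_for`, `put`, and `get` are all hypothetical, chosen only for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of shards

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Each dict stands in for a separate database server holding one partition.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key: str, row: dict) -> None:
    """Store a row on the shard that owns its key."""
    shards[shard_for(key)][key] = row

def get(key: str):
    """Look up a row by asking only the shard that owns its key."""
    return shards[shard_for(key)].get(key)

for user_id in ["alice", "bob", "carol", "dave"]:
    put(user_id, {"id": user_id})
```

Every row lives on exactly one shard, so a lookup by key touches one server. A query that joins rows across keys would have to visit several shards, which is one reason the pattern is awkward for relational workloads.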
NoSQL database
“Not Only SQL”
- Do not use SQL as their primary query language
- Provide access by means of Application Programming Interfaces (APIs)
Types of NoSQL database
Four main types, each with its own data model:
- Key-value
- Document
- Wide column stores
- Graph
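The four data models listed above can be sketched with plain Python structures. This is only an illustration of the shape of the data, not the API of any particular NoSQL product; all keys, values, and the `neighbours` helper are made up for the example.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"user:42": "Ada"}

# Document: a self-describing, nested record (often JSON-like).
document = {"_id": 42, "name": "Ada", "languages": ["Python", "SQL"]}

# Wide column: row key -> column family -> column -> value.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Graph: nodes plus labelled edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 7)]

def neighbours(node_id, label):
    """Return the nodes reachable from node_id over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]
```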
Data warehouse
- A system used for reporting and data analysis
- Core component of business intelligence
ETL (Extract, Transform, Load)
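A minimal sketch of the three ETL stages, assuming a small CSV string as the source and an in-memory SQLite database as a stand-in for the warehouse. The sample data, the `orders` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with untidy values and one incomplete row.
RAW_CSV = "order_id,amount,country\n1, 19.99 ,de\n2,5.00,FR\n3,,de\n"

def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without an amount, fix types, normalise country codes."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A reporting-style query against the loaded data.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
```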
Data mart
A simple form of a data warehouse that is focused on a single subject (or functional area)
Data Lakes
A storage repository that holds a vast amount of raw data in its native
format until it is needed. There is no hierarchy or organization among
the individual pieces of data.
Data Integration
A set of processes used to retrieve and combine
data from disparate sources into meaningful and valuable
information.
Big Data Integration techniques
- Schema Mapping
- Record Linkage
- Data Fusion
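The three techniques above can be sketched on two tiny, made-up sources. This is a simplified illustration under stated assumptions: the field mapping is written by hand, linkage uses an exact email match with a fuzzy name fallback, and fusion keeps the first non-missing value. Real systems use more robust matching and conflict resolution.

```python
from difflib import SequenceMatcher

# Two hypothetical sources describing the same person with different schemas.
source_a = [{"full_name": "Jon Smith", "mail": "jon@example.com", "city": None}]
source_b = [{"name": "John Smith", "email": "jon@example.com", "city": "Leeds"}]

# Schema mapping: rename source A's fields to the shared target schema.
MAPPING_A = {"full_name": "name", "mail": "email", "city": "city"}

def map_schema(record, mapping):
    """Return the record with its fields renamed according to the mapping."""
    return {mapping[field]: value for field, value in record.items()}

def is_same_entity(r1, r2, threshold=0.8):
    """Record linkage: exact email match, else fuzzy match on the name."""
    if r1["email"] and r1["email"] == r2["email"]:
        return True
    similarity = SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()
    return similarity >= threshold

def fuse(r1, r2):
    """Data fusion: merge two linked records, preferring r1's non-missing values."""
    return {field: r1[field] if r1[field] is not None else r2[field] for field in r1}

a = map_schema(source_a[0], MAPPING_A)
b = source_b[0]
fused = fuse(a, b) if is_same_entity(a, b) else None
```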
Steps in Data Science Process
- Exploring Data
- Data Pre-processing
CRISP-DM: Cross Industry Standard Process for Data Mining
A widely adopted methodology for data mining
Six Phases
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment