Overview of Data Repositories Flashcards
Definition of Data Repository
-general term used to refer to data that has been collected, organized, and isolated
-purpose: isolate data, make reporting and analytics more efficient and credible, and serve as a data archive
Databases
collection of data (or information) designed for the input, storage, search and retrieval, and modification of data
Database Management System (DBMS)
-set of programs that creates and maintains the database
-allows you to store, modify, and extract information from the database using a function called querying
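A minimal sketch of those three DBMS operations (store, modify, extract), assuming SQLite through Python's built-in `sqlite3` module as the DBMS. The `products` table and its rows are hypothetical examples, not part of the original notes.

```python
import sqlite3

# In-memory SQLite database; SQLite plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The DBMS creates and maintains the structure (hypothetical "products" table).
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")

# Store: insert rows.
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("keyboard", 49.99), ("mouse", 19.99), ("monitor", 189.00)],
)

# Modify: update an existing row.
cur.execute("UPDATE products SET price = ? WHERE name = ?", (24.99, "mouse"))

# Extract: query for only the rows we want.
cur.execute("SELECT name, price FROM products WHERE price < ? ORDER BY price", (50,))
cheap_items = cur.fetchall()
conn.commit()
print(cheap_items)  # [('mouse', 24.99), ('keyboard', 49.99)]
```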
Factors that determine the type of database (5)
-data type and structure
-querying mechanisms
-latency requirements
-transaction speeds
-intended use of data
Relational Databases (RDBMSes) vs flat files
-data organized in tabular format (rows and columns) following a well-defined structure and schema (similar to flat files)
-RDBMSes are optimized for data operations and queries involving many tables and much larger data volumes (unlike flat files)
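To make the flat file vs RDBMS contrast concrete, here is a small sketch with hypothetical order data. The flat file (a CSV string read with Python's `csv` module) repeats customer details on every row; the relational version splits the same data into two related tables in SQLite and reassembles it with a join, which is the kind of multi-table query an RDBMS is built to handle.

```python
import csv
import io
import sqlite3

# Flat file: one table, so customer name and city repeat on every order row.
FLAT_CSV = """order_id,customer_name,customer_city,amount
1,Ana,Lisbon,30.0
2,Ana,Lisbon,12.5
3,Ben,Oslo,40.0
"""
flat_rows = list(csv.DictReader(io.StringIO(FLAT_CSV)))

# Relational: the same data split across two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
INSERT INTO customers VALUES (1, 'Ana', 'Lisbon'), (2, 'Ben', 'Oslo');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 12.5), (3, 2, 40.0);
""")

# A query spanning both tables rebuilds the flat view on demand.
joined = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# Convert the CSV strings to matching types for comparison.
flat_as_tuples = [
    (int(r["order_id"]), r["customer_name"], r["customer_city"], float(r["amount"]))
    for r in flat_rows
]
print(joined == flat_as_tuples)  # True
```

Storing each customer once means a change such as a new city is a single-row update instead of an edit to every matching line of the flat file.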
Structured Query Language (SQL)
standard querying language for relational databases
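A short illustration of SQL as a declarative query language: the query states *what* result is wanted (tea units per region) and the database works out how to compute it. The `sales` table and its values are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "tea", 10), ("North", "coffee", 5), ("South", "tea", 7), ("South", "tea", 8)],
)

# Filter (WHERE), group (GROUP BY), aggregate (SUM), and sort (ORDER BY)
# in one statement; no explicit loop is written.
query = """
    SELECT region, SUM(units) AS total_units
    FROM sales
    WHERE product = 'tea'
    GROUP BY region
    ORDER BY total_units DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('South', 15), ('North', 10)]
```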
Non-relational Databases (NoSQL or “Not only SQL”)
-built for the scale, speed, and flexibility needed to process big data
-stores data in a schema-less or free-form fashion
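The schema-less idea can be sketched without any real NoSQL product. `DocumentCollection` below is a hypothetical, in-memory stand-in for a document-store collection (it is not the API of MongoDB or any other database): no schema is declared, and documents in the same collection carry different fields.

```python
import json


class DocumentCollection:
    """Hypothetical in-memory collection: each document is a free-form dict."""

    def __init__(self):
        self._docs = []

    def insert(self, doc: dict) -> int:
        # Store a JSON round-tripped copy so later edits to the caller's dict
        # do not change the stored document.
        self._docs.append(json.loads(json.dumps(doc)))
        return len(self._docs) - 1  # list position doubles as a simple id

    def find(self, **criteria) -> list:
        # Match documents that have every requested field with the given value;
        # documents lacking a field just don't match (no schema enforces it).
        return [
            d for d in self._docs
            if all(k in d and d[k] == v for k, v in criteria.items())
        ]


users = DocumentCollection()
users.insert({"name": "Ana", "email": "ana@example.com"})
users.insert({"name": "Ben", "tags": ["admin"], "address": {"city": "Oslo"}})
users.insert({"name": "Chen", "email": "chen@example.com", "age": 31})

print(users.find(name="Ben"))
# [{'name': 'Ben', 'tags': ['admin'], 'address': {'city': 'Oslo'}}]
```

Contrast this with the relational examples: there, every row must fit the columns declared in `CREATE TABLE`.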
3 V’s of Big Data
Volume, Velocity, and Variety (aka scale, speed, and flexibility)
Drivers of Big Data (3)
-cloud computing
-Internet of Things (IoT)
-social media proliferation
Data Repository consists of
-a small or large database infrastructure with one or more databases that collect, manage, and store data sets
Data Repository uses
-used in business operations or mined for reporting and data analysis
Data Warehouse
works as a central repository that merges information coming from disparate sources and consolidates it through the ETL process into one comprehensive database for analytics and business intelligence
ETL process
-the extract, transform, and load process
-Extract data from different data sources
-Transform data into a clean and usable state
-Load data into the enterprise's data repository
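The three ETL steps can be sketched as three small functions feeding a warehouse table. Everything here is a hypothetical illustration: two made-up sources (a CSV export and a JSON feed with different field names, casing, and date formats), a few cleaning rules, and an in-memory SQLite database standing in for the data warehouse.

```python
import csv
import io
import json
import sqlite3
from datetime import datetime

# Two hypothetical, disparate sources.
STORE_A_CSV = """sale_date,item,amount_usd
2024-03-01,Tea,12.50
2024-03-02,coffee,
2024-03-02,Coffee,8.00
"""
STORE_B_JSON = '[{"date": "03/01/2024", "product": "TEA", "total": "4.5"}]'


def extract():
    """Extract: read raw records from each source as-is."""
    a_rows = list(csv.DictReader(io.StringIO(STORE_A_CSV)))
    b_rows = json.loads(STORE_B_JSON)
    return a_rows, b_rows


def transform(a_rows, b_rows):
    """Transform: map both sources onto one schema and clean the values."""
    clean = []
    for r in a_rows:
        if not r["amount_usd"]:  # drop rows with a missing amount
            continue
        clean.append((r["sale_date"], r["item"].strip().lower(),
                      float(r["amount_usd"]), "store_a"))
    for r in b_rows:
        # Convert MM/DD/YYYY to ISO format so dates from both sources agree.
        iso_date = datetime.strptime(r["date"], "%m/%d/%Y").date().isoformat()
        clean.append((iso_date, r["product"].strip().lower(),
                      float(r["total"]), "store_b"))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(sale_date TEXT, product TEXT, amount REAL, source TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)

# Analytics now run against one consolidated table.
report = warehouse.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(report)  # [('coffee', 8.0), ('tea', 17.0)]
```

Keeping each step in its own function mirrors the E, T, and L stages, so a new source only needs its own extract and transform logic before reusing the same load step.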
Data Warehouses and Data Marts were historically
relational, because much of traditional enterprise data resided in RDBMSes
However, Data Warehouses and Data Marts now
also include non-relational data repositories (with the emergence of NoSQL technologies and new data sources)