Data Management and Analytics_M4 Flashcards
What is the data life cycle?
- Capture is the phase in which data is collected. Collecting personally identifiable information (PII) is a normal practice in this phase when a business purpose warrants it, and the customer willingly provides the information.
- Publication is the phase in which data is circulated to users for various purposes. Giving PII to organizations outside the company without consent or for an undisclosed purpose would be unethical and possibly illegal in some jurisdictions.
- Archival is the storage phase of the data life cycle.
- Purge is the phase in which data is permanently deleted, for example after a seven-year retention period.
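The purge phase can be sketched as a retention check. This is a minimal illustration assuming each record carries a `created` date and that the retention period is the seven years from the card's example; `RETENTION_YEARS` and `split_for_purge` are hypothetical names, not a standard API.

```python
from datetime import date

# Assumption: a seven-year retention period, taken from the card's example.
RETENTION_YEARS = 7

def split_for_purge(records, today):
    """Split records into (keep, purge) based on a retention cutoff date."""
    try:
        cutoff = today.replace(year=today.year - RETENTION_YEARS)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - RETENTION_YEARS, day=28)
    keep = [r for r in records if r["created"] >= cutoff]
    purge = [r for r in records if r["created"] < cutoff]
    return keep, purge

records = [
    {"id": 1, "created": date(2015, 3, 1)},
    {"id": 2, "created": date(2021, 6, 15)},
]
keep, purge = split_for_purge(records, today=date(2024, 1, 1))
print([r["id"] for r in keep], [r["id"] for r in purge])  # [2] [1]
```

Records in the purge list would then be permanently deleted; everything else stays in active use or archival.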
What are the 4 roles of Big Data?
1. Defining Data and Information
2. Defining Big Data
3. Dimensions of Big Data (5 components: the 5 V’s)
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
4. Big Data Governance (4 Components)
I. Big Data Confidentiality (4 Components)
1. Copyrights
2. Patents
3. Trademarks
4. Trade Secrets
II. Big Data Privacy
III. Big Data Ethics
IV. Governance Responsibility
What are the dimensions of Big Data?
3 of 4 roles of Big Data (5 V’s)
- Volume: Size and amount of data.
- Velocity: Speed at which data may change.
- Variety: Different types of data.
- Veracity: Trustworthiness of the data.
- Value: Insights Big Data can yield.
What practices will minimize Big Data Governance Risk?
4 of 4 roles of Big Data
- Sound data governance practices can minimize legal, ethical, and customer confidentiality risks while still allowing a company to use data to meet its business objectives. Ensuring that privacy standards are met at every stage of the data life cycle will facilitate this process.
- Disclosing the company's cookie collection policy at the first step (capture) of the data life cycle gives consumers notice of what the company intends to do, along with the option to opt in or opt out. Consumers who opt in have given consent to the terms of use, so the company may use that information to offer targeted promotions as outlined in the disclosure.
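The consent rule in the card above can be sketched as a filter applied before any targeted promotion. This assumes consent is captured as a boolean `opted_in` field on each customer record; the field name, the records, and `promotion_audience` are hypothetical. A missing flag is treated as no consent.

```python
# Hypothetical customer records captured with an explicit opt-in flag.
customers = [
    {"email": "a@example.com", "opted_in": True},
    {"email": "b@example.com", "opted_in": False},
    {"email": "c@example.com", "opted_in": True},
]

def promotion_audience(records):
    """Return only customers who opted in (gave consent) to targeted promotions."""
    # `is True` means a missing or non-boolean flag never counts as consent.
    return [c["email"] for c in records if c.get("opted_in") is True]

print(promotion_audience(customers))  # ['a@example.com', 'c@example.com']
```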
What are the 2 data management concepts?
- Storing data
- Relational Database Concepts
What are the 10 relational database concepts?
2 of 2 data management concepts
Relational databases allow data to be stored in different tables, with the tables linked through relationships using key fields.
- Tables
- Attributes (columns)
- Records (rows)
- Fields
- Data Types
- Database keys
- Relationships
- Data Dictionaries
- Data Views
- Data Queries and Reports
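Several of the concepts above can be seen together in a small relational database. This sketch uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a production system; the `customer` and `sale` tables, their columns, and the `sales_by_region` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tables with attributes (columns) and data types
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    region      TEXT
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),  -- relationship via a key field
    amount      REAL
);
-- Records (rows); each individual value in a row is a field
INSERT INTO customer VALUES (1, 'Acme', 'East'), (2, 'Bolt', 'West');
INSERT INTO sale VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
-- A data view: a saved query presented like a virtual table
CREATE VIEW sales_by_region AS
SELECT c.region, SUM(s.amount) AS total
FROM sale s JOIN customer c ON s.customer_id = c.customer_id
GROUP BY c.region;
""")

# A data query (the basis of a report) run against the view
rows = conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(rows)  # [('East', 350.0), ('West', 75.0)]
```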
What are database keys?
6 of 10 relational database concepts
A Primary Key
- Serves as a unique identifier to allow a user to identify a specific record in a database.
A Foreign Key
- Is a column (or column group) found in a relational database table that links data between two tables.
- It references the primary key of another table.
- A foreign key is an attribute in one table that is also a primary key in another table.
- The foreign key drives the relationship used in a relational database.
A Secondary Key
- Is a non-identifying column (or column set) used to find a row in a table.
- It serves as a candidate key for the primary key but is not actually the primary key chosen.
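The three key types can be sketched in SQLite through Python's `sqlite3` module. The card describes a secondary key in two ways (a non-identifying column used to find rows, and a candidate key not chosen as the primary key), so the sketch shows one of each. Table and column names are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique identifier for each record
    email       TEXT UNIQUE,           -- candidate key not chosen as the primary key
    last_name   TEXT
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- foreign key
);
-- Index on a non-identifying column used to look up rows
CREATE INDEX idx_customer_last_name ON customer(last_name);
INSERT INTO customer VALUES (1, 'ann@example.com', 'Lee');
INSERT INTO invoice VALUES (100, 1);
""")

rejected = False
try:
    # Customer 99 does not exist, so the foreign key rejects this invoice.
    conn.execute("INSERT INTO invoice VALUES (101, 99)")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```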
What is a data dictionary?
8 of 10 relational database concepts
A data dictionary (metadata) provides information about the data in a database.
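Most database systems store this metadata themselves. As one illustration, SQLite exposes column metadata through `PRAGMA table_info`, which returns one row per column as (cid, name, type, notnull, dflt_value, pk). The `customer` table and the dictionary's field names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT)"
)

# Build a simple data dictionary: one entry of metadata per column.
dictionary = [
    {"column": name, "type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(customer)")
]
for entry in dictionary:
    print(entry)
```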
What is Data Analytics, aka Data Mining?
- Allows users to obtain data themselves from databases or data warehouses.
- Allows users to perform diagnostic analytics to drill down into underlying data to answer questions or better understand the data.
- Involves taking raw data, looking for trends, and then transforming that information into insights. Applications used can include basic sums or averages, or can be more advanced, involving statistics or artificial intelligence.
- Analyzes data in a data warehouse to discover hidden patterns and trends in historical business activities.
- Helps managers understand the changes occurring in a business and assists in making strategic business decisions to gain a competitive advantage in the marketplace.
- Used to sift through large amounts of data, sometimes several terabytes of information.
- Data cannot be mined without a computer.
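One simple way to look for hidden patterns in historical activity is to count which products are frequently bought together, the first step of association-rule mining. This is a minimal sketch of that one technique on a handful of hypothetical transactions; real data mining runs similar logic over far larger data sets, and the 40% support threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical historical transactions (each is a set of products bought together).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

# Count how often each pair of products appears in the same transaction.
pair_counts = Counter(
    pair
    for basket in transactions
    for pair in combinations(sorted(basket), 2)
)

# Keep pairs appearing in at least 40% of transactions (their "support").
min_support = 0.4
frequent = {
    pair: count / len(transactions)
    for pair, count in pair_counts.items()
    if count / len(transactions) >= min_support
}
print(frequent)  # {('bread', 'milk'): 0.6, ('bread', 'butter'): 0.4}
```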
What language is used to pull data from a database for analytics?
Data is pulled by ETL using SQL, a query language.
- Most analytics exercises use extract, transform, and load (ETL) to pull and format the data so it is usable.
- ETL uses a variation of Structured Query Language (SQL) to select the tables and data needed for input into an analytical model.
- SQL is a query language that uses clauses such as SELECT, FROM, and WHERE to query a database.
- Programming languages such as C++, C, and JavaScript can be used to pull data, but SQL is the most common.
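A minimal SQL query using the SELECT, FROM, and WHERE clauses can be run from Python through the built-in `sqlite3` module. The `sales` table and its rows are hypothetical; the `?` placeholder passes the filter value safely instead of pasting it into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 120.0), ("West", "Widget", 80.0), ("East", "Gadget", 45.5)],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
query = "SELECT product, amount FROM sales WHERE region = ? ORDER BY amount DESC"
east_sales = conn.execute(query, ("East",)).fetchall()
print(east_sales)  # [('Widget', 120.0), ('Gadget', 45.5)]
```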
What are the 3 elements in performing data analytics?
Extract, Transform, and Load (ETL) process
An extract, transform, and load (ETL) process involves capturing information from its source, transforming it into a usable format, and transferring it into the custody of another system.
1. EXTRACT
2. TRANSFORM
3. LOAD
- Loading is the last phase in the ETL process.
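The three ETL steps can be sketched as three small functions. This assumes the source is a CSV export (inlined here as a string) and the destination is an in-memory SQLite table; `SOURCE_CSV`, the column names, and the validation rule are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source export (stands in for a file or system extract).
SOURCE_CSV = """order_id,region,amount
1,East,100.50
2, west ,80
3,East,
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean, validate, and manipulate raw rows into a usable format."""
    out = []
    for row in rows:
        if not row["amount"].strip():
            continue  # validating: drop rows missing a required amount
        region = row["region"].strip().title()  # cleaning: trim spaces, fix case
        out.append((int(row["order_id"]), region, float(row["amount"])))  # manipulating: convert types
    return out

def load(rows, conn):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'East', 100.5), (2, 'West', 80.0)]
```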
What is the 1st element of data analytics?
ETL - Extract
EXTRACT
1. Data Identification
2. Obtaining the data
- Requesting the data
- Automated Extraction
- Manual Extraction
What is the 2nd element of data analytics?
ETL - Transform
TRANSFORM
- Cleaning the data
- Validating the data
- Manipulating the data
What is the 3rd element of data analytics?
ETL - Load
LOAD
1. Data Storage
- Operational Data Storage (ODS)
- Data Warehouse
- Data Mart
- Data Lake
- Schema: The organization of data that represents the construction of the database management system.
2. Data Storage Requirements
- Entity Integrity
- Referential Integrity
3. Data Storage Attributes
- Relevance
- Elements to be Included and Excluded
- Relationship between Elements include Validity, Completeness, and Accuracy
4. Types of Loading: Initial Full Loading, Incremental Loading, Full Refresh Loading
- Load Verification
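Incremental loading, the two integrity requirements, and load verification can be sketched together in SQLite via Python's `sqlite3`. The `region` and `sale` tables, `incremental_load`, and the row-count reconciliation rule are hypothetical illustrations; a full refresh load would instead delete and reload the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sale (
    sale_id   INTEGER PRIMARY KEY,                            -- entity integrity: unique ID per row
    region_id INTEGER NOT NULL REFERENCES region(region_id),  -- referential integrity
    amount    REAL NOT NULL
);
INSERT INTO region VALUES (1, 'East'), (2, 'West');
""")

def incremental_load(conn, batch):
    """Load only rows not already present, then verify the load by row count."""
    before = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
    existing = {row[0] for row in conn.execute("SELECT sale_id FROM sale")}
    new_rows = [row for row in batch if row[0] not in existing]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", new_rows)
    after = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
    if after - before != len(new_rows):  # load verification
        raise RuntimeError("load verification failed: row counts do not reconcile")
    return len(new_rows)

first = incremental_load(conn, [(10, 1, 99.0), (11, 2, 45.0)])
second = incremental_load(conn, [(11, 2, 45.0), (12, 1, 10.0)])  # sale 11 already loaded
print(first, second)  # 2 1
```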
What are the different data analytic processes?
Descriptive Analytics
- Describe what happens within data. Compiling the average sales by region would be considered a summary statistic used to describe the data and is therefore descriptive analytics.
- Describe events that have already occurred.
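The card's example, average sales by region, can be sketched with the standard library. The sales records are hypothetical; the output only summarizes what already happened and says nothing about why.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales records: (region, sale amount).
sales = [("East", 100.0), ("West", 80.0), ("East", 140.0), ("West", 60.0), ("North", 90.0)]

amounts_by_region = defaultdict(list)
for region, amount in sales:
    amounts_by_region[region].append(amount)

# Summary statistic describing past events: average sale per region.
average_by_region = {region: mean(values) for region, values in amounts_by_region.items()}
print(average_by_region)  # {'East': 120.0, 'West': 70.0, 'North': 90.0}
```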
Predictive Analytics
- Provide expected or predicted outcomes based on historical data.
- Use statistical techniques and forecasting models to predict what could happen.
- Go beyond descriptive analytics, which only provide a simple descriptive output.
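A very simple forecasting model fits a straight line to historical data and extends it one period ahead. This sketch uses an ordinary least squares fit on hypothetical quarterly sales; real predictive models are usually richer, and `fit_line` is a hypothetical helper, not a library function.

```python
# Hypothetical quarterly sales history.
quarters = [1, 2, 3, 4, 5, 6]
sales = [100.0, 108.0, 121.0, 129.0, 140.0, 152.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(quarters, sales)
forecast_q7 = slope * 7 + intercept  # predicted sales for the next quarter
print(round(forecast_q7, 2))  # 161.4
```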
Diagnostic Analytics
- Explain why something happened.
- Go beyond descriptive analytics, which provide a simple descriptive output and do not explain the drivers or underlying causes of the output.
- Focus on determining why something occurred.
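Diagnostic analytics often starts by drilling down from a total into its components to find the driver of a change. In this sketch, with hypothetical quarterly sales by region, total sales fall from Q1 to Q2, and the drill-down identifies which region caused most of the decline.

```python
# Hypothetical sales by (quarter, region); total sales fell from Q1 to Q2.
sales = {
    ("Q1", "East"): 500.0, ("Q1", "West"): 400.0, ("Q1", "North"): 300.0,
    ("Q2", "East"): 510.0, ("Q2", "West"): 250.0, ("Q2", "North"): 295.0,
}

def total(quarter):
    """Sum sales across all regions for one quarter."""
    return sum(v for (q, _), v in sales.items() if q == quarter)

# Drill down: change in each region between the two quarters.
regions = sorted({region for _, region in sales})
change = {r: sales[("Q2", r)] - sales[("Q1", r)] for r in regions}
biggest_driver = min(change, key=change.get)  # region with the largest decline

print(total("Q1"), total("Q2"))  # 1200.0 1055.0
print(change, biggest_driver)
```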
Prescriptive Analytics
- Prescribe or recommend actions to be taken based on advanced analytics to reach a desired goal.
- Use optimization and simulation algorithms to affect future decisions.
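A prescriptive model recommends an action by searching for the option that best meets a goal under constraints. This sketch uses exhaustive search, the simplest form of optimization, to recommend a hypothetical production mix that maximizes profit within a machine-hour limit; real tools typically use linear-programming solvers instead, and all figures here are assumptions.

```python
from itertools import product

# Hypothetical decision: how many units of two products to make.
PROFIT = {"A": 30.0, "B": 20.0}  # profit per unit
HOURS = {"A": 3, "B": 1}         # machine hours per unit
MAX_HOURS = 12                   # capacity constraint
MAX_UNITS = 10                   # demand cap per product

def best_plan():
    """Exhaustive search: return (profit, units_a, units_b) for the best feasible mix."""
    best = None
    for a, b in product(range(MAX_UNITS + 1), repeat=2):
        if a * HOURS["A"] + b * HOURS["B"] > MAX_HOURS:
            continue  # infeasible: exceeds machine capacity
        profit = a * PROFIT["A"] + b * PROFIT["B"]
        if best is None or profit > best[0]:
            best = (profit, a, b)
    return best

print(best_plan())  # (210.0, 1, 9)
```

The recommended action is to make 1 unit of A and 9 units of B, since B earns more profit per machine hour.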