Data Management and Analytics_M4 Flashcards

1
Q

What is the data life cycle?

A
  • Capture: data, including personally identifiable information (PII), is collected as a normal business practice that warrants its collection. The customer willingly provides this information.
  • Publication: data is circulated to users for various purposes. Giving PII to organizations outside of the company without consent, or for any undisclosed purpose, would be unethical and possibly illegal in some jurisdictions.
  • Archival: the storage phase of the data life cycle.
  • Purge: data is permanently deleted, for example after a retention period of seven years.
2
Q

What are the 4 roles of Big Data?

A
  1. Defining Data and Information
  2. Defining Big Data
  3. Dimensions of Big Data (5 components, the 5 V’s)
     • Volume
     • Velocity
     • Variety
     • Veracity
     • Value
  4. Big Data Governance (4 components)
     I. Big Data Confidentiality (4 components)
        • Copyrights
        • Patents
        • Trademarks
        • Trade Secrets
     II. Big Data Privacy
     III. Big Data Ethics
     IV. Governance Responsibility

3
Q

What are the dimensions of Big Data?

3 of 4 roles of Big Data (5 V’s)

A
  • Volume: Size and amount of data.
  • Velocity: Speed at which data may change.
  • Variety: Different types of data.
  • Veracity: Trustworthiness of the data.
  • Value: Insights Big Data can yield.
4
Q

What practices will minimize Big Data Governance Risk?

4 of 4 roles of Big Data

A
  • Sound data governance practices can minimize legal, ethical, and customer confidentiality risks while still allowing a company to use data to meet its business objectives. Ensuring that privacy standards are met at every stage of the data life cycle will facilitate this process.
  • Disclosing the company's cookie collection policy at the first step (capture) in the data life cycle gives consumers notice of what the company intends to do with their data, along with the option to opt in or opt out. Consumers who opt in have consented to the terms of use, so the company may use that information to offer targeted promotions as outlined in the disclosure.
5
Q

What are the 2 data management concepts?

A
  1. Storing data
  2. Relational Database Concepts
6
Q

What are the 10 relational database concepts?

2 of 2 data management concepts

A

Relational databases allow data to be stored in different tables, with the tables linked through relationships using key fields.

  1. Tables
  2. Attributes (columns)
  3. Records (rows)
  4. Fields
  5. Data Types
  6. Database keys
  7. Relationships
  8. Data Dictionaries
  9. Data Views
  10. Data Queries and reports
7
Q

What are Database Keys?

6 of 10 relational database concepts

A

A Primary Key

  • Serves as a unique identifier to allow a user to identify a specific record in a database.

A Foreign Key

  • Is a column (or column group) found in a relational database table that links data between two tables.
  • It references the primary key of another table.
  • A foreign key is an attribute in one table that is also a primary key in another table.
  • The foreign key drives the relationship used in a relational database.

A Secondary Key

  • Is a non-identifying column (or column set) used to find a row in a table.
  • It serves as a candidate key for the primary key but is not actually the primary key chosen.
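The three kinds of key above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the customers and orders tables, their columns, and the try_insert helper are hypothetical names that do not come from the course material.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique identifier for a record
        email       TEXT UNIQUE           -- secondary key: a candidate key not chosen as primary
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers(customer_id),  -- foreign key to customers
        amount      REAL
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'ana@example.com')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second customer with the same primary key is rejected.
duplicate_pk_ok = try_insert("INSERT INTO customers VALUES (1, 'bo@example.com')")
# An order pointing at a customer that does not exist is rejected.
orphan_fk_ok = try_insert("INSERT INTO orders VALUES (101, 999, 10.0)")

# The foreign key drives the relationship used to join the two tables.
joined = conn.execute(
    "SELECT c.email, o.amount FROM orders AS o "
    "JOIN customers AS c ON c.customer_id = o.customer_id"
).fetchall()
```

Both rejected inserts raise sqlite3.IntegrityError, which is how the database protects the uniqueness of the primary key and the validity of the foreign key reference.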
8
Q

What is a data dictionary?

8 of 10 relational database concepts

A

A data dictionary (metadata) provides information about the data in a database.
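As a rough illustration, many database systems can report this metadata themselves. The sketch below uses Python's built-in sqlite3 module and SQLite's PRAGMA table_info; the invoices table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have something to describe.
conn.execute(
    "CREATE TABLE invoices ("
    "invoice_id INTEGER PRIMARY KEY, issued_on TEXT NOT NULL, total REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
data_dictionary = [
    {
        "column": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(invoices)"
    )
]

for entry in data_dictionary:
    print(entry)
```

Each entry describes a column rather than holding business data, which is what makes it metadata.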

9
Q

What is Data Analytics, aka Data Mining?

A
  • Allows users to obtain data themselves from databases or data warehouses.
  • Allows users to perform diagnostic analytics to drill down into underlying data to answer questions or better understand the data.
  • Involves taking raw data, looking for trends, and then transforming that information into insights. Applications used can include basic sums or averages, or can be more advanced, involving statistics or artificial intelligence.
  • Analyzes a data warehouse in an attempt to discover hidden patterns and trends in historical business activities.
  • Helps managers understand the changes occurring in a business and would assist in making strategic business decisions to gain a competitive advantage in the marketplace.
  • Used to sift through large amounts of data, sometimes several terabytes of information.
  • Because of this volume, data mining requires a computer.
10
Q

What language is used to pull data from a database for analytics?

Data is pulled by ETL and queried with SQL, a query language.

A
  • Most analytics exercises use extract, transform, and load (ETL) to pull and format the data so it is usable.
  • ETL uses a variation of Structured Query Language (SQL) to select the tables and data needed for input into an analytical model.
  • SQL is a type of code that uses commands such as SELECT, FROM, and WHERE to query a database.
  • Programming languages such as C++, C, and JavaScript can be used to pull data, but SQL is the most common.
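A minimal sketch of the SELECT, FROM, and WHERE commands, run through Python's built-in sqlite3 module. The sales table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, used only to have something to query.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 200.0)],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
rows = conn.execute(
    "SELECT region, amount FROM sales WHERE amount > ? ORDER BY amount",
    (100.0,),
).fetchall()

print(rows)  # [('East', 120.0), ('East', 200.0)]
```

The threshold is passed as a parameter (the ? placeholder) rather than pasted into the SQL string, which is the usual way to keep queries safe and reusable.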
11
Q

What are the 3 elements in performing data analytics?

Extract, Transform, and Load (ETL) process

A

An extract, transform, and load (ETL) process involves capturing information from its source, converting it into a usable format, and transferring it into the custody of another system.

1. EXTRACT
2. TRANSFORM
3. LOAD

  • Loading is the last phase in the ETL process.
12
Q

What is the 1st element of data analytics?

ETL - Extract

A

EXTRACT

1. Data Identification
2. Obtaining the data

  • Requesting the data
  • Automated Extraction
  • Manual Extraction
13
Q

What is the 2nd element of data analytics?

ETL - Transform

A

TRANSFORM

  • Cleaning the data
  • Validating the data
  • Manipulating the data
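The three activities above might look like this in plain Python. It is a sketch under assumptions: the raw rows, the field names, and the helper functions clean, is_valid, and manipulate are all hypothetical.

```python
# Hypothetical extracted rows: every value arrives as text.
raw_rows = [
    {"id": "1", "region": " east ", "amount": "120.50"},
    {"id": "2", "region": "West", "amount": "n/a"},
    {"id": "3", "region": "EAST", "amount": "200"},
]


def clean(row):
    """Cleaning: trim stray whitespace and normalize capitalization."""
    return {
        "id": row["id"].strip(),
        "region": row["region"].strip().title(),
        "amount": row["amount"].strip(),
    }


def is_valid(row):
    """Validating: the id must be numeric and the amount must parse as a number."""
    try:
        float(row["amount"])
    except ValueError:
        return False
    return row["id"].isdigit()


def manipulate(row):
    """Manipulating: convert text fields into the types the analysis needs."""
    return {
        "id": int(row["id"]),
        "region": row["region"],
        "amount": float(row["amount"]),
    }


cleaned = [clean(r) for r in raw_rows]
rejected = [r for r in cleaned if not is_valid(r)]
transformed = [manipulate(r) for r in cleaned if is_valid(r)]
```

Rows that fail validation are kept in rejected rather than silently dropped, so they can be reviewed or sent back to the data owner.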
14
Q

What is the 3rd element of data analytics?

ETL - Load

A

LOAD

1. Data Storage

  • Operational Data Storage (ODS)
  • Data Warehouse
  • Data Mart
  • Data Lake
  • Schema: the organization of data that represents the construction of the database management system.

2. Data Storage Requirements

  • Entity Integrity
  • Referential Integrity

3. Data Storage Attributes

  • Relevance
  • Elements to be Included and Excluded
  • Relationships between Elements, including Validity, Completeness, and Accuracy

4. Types of Loading: Initial Full Loading, Incremental Loading, Full Refresh Loading.

  • Load Verification
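One possible reading of the loading types and of load verification, sketched with Python's built-in sqlite3 module. The warehouse_sales table and the functions full_load, incremental_load, and verify_load are hypothetical, and the row-count check is only one simple form of verification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical destination table in the data store.
conn.execute(
    "CREATE TABLE warehouse_sales (sale_id INTEGER PRIMARY KEY, amount REAL)"
)


def full_load(rows):
    """Initial full load or full refresh: replace the entire table contents."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM warehouse_sales")
        conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", rows)


def incremental_load(rows):
    """Incremental load: add only rows whose key is not already stored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO warehouse_sales VALUES (?, ?)", rows
        )


def verify_load(expected_count):
    """Load verification: compare the stored row count with the expected one."""
    (count,) = conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()
    return count == expected_count


full_load([(1, 10.0), (2, 20.0)])         # initial full load
incremental_load([(2, 20.0), (3, 30.0)])  # sale 2 already exists, only sale 3 is added
after_incremental = verify_load(3)

full_load([(1, 10.0)])                    # full refresh replaces everything
after_refresh = verify_load(1)
```

The primary key on sale_id is what lets the incremental load skip rows that are already present, which also keeps entity integrity intact.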
15
Q

What are the different data analytic processes?

A

Descriptive Analytics

  • Describe what happens within data. Compiling the average sales by region would be considered a summary statistic used to describe the data and is therefore descriptive analytics.
  • Describe events that have already occurred.

Predictive Analytics

  • Provide expected or predicted outcomes based on historical data.
  • Use statistical techniques and forecasting models to predict what could happen.
  • Go beyond descriptive analytics, which only provides a simple descriptive output.

Diagnostic Analytics

  • Explain why something happened.
  • Unlike descriptive analytics, which provides a simple descriptive output, diagnostic analytics explains the drivers or underlying causes of the value of the output.
  • Focus on determining why something occurred.

Prescriptive Analytics

  • Prescribe or recommend actions to be taken based on advanced analytics to reach a desired goal.
  • Use optimization and simulation algorithms to affect future decisions.
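To make the contrast between the first two categories concrete, here is a small sketch in plain Python. The sales figures are invented, and the straight-line fit is just one simple stand-in for the statistical techniques and forecasting models mentioned above.

```python
from statistics import mean

# Descriptive: summarize what already happened (average sales by region).
sales = [("East", 100.0), ("East", 140.0), ("West", 90.0), ("West", 110.0)]
by_region = {}
for region, amount in sales:
    by_region.setdefault(region, []).append(amount)
average_by_region = {r: mean(amounts) for r, amounts in by_region.items()}

# Predictive: fit a least-squares line to historical monthly totals
# and extrapolate it one month ahead.
monthly_totals = [100.0, 110.0, 120.0, 130.0]
months = list(range(len(monthly_totals)))
x_bar, y_bar = mean(months), mean(monthly_totals)
slope = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_totals)
) / sum((x - x_bar) ** 2 for x in months)
intercept = y_bar - slope * x_bar
next_month_forecast = intercept + slope * len(monthly_totals)

print(average_by_region)    # {'East': 120.0, 'West': 100.0}
print(next_month_forecast)  # 140.0
```

The descriptive result only restates history, while the forecast is an estimate of a future value and is only as good as the assumption that the trend continues.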
16
Q

What are the different types of data analytics?

A

Customer and Marketing Analytics

  • Support digital marketing and allow a company to deliver timely, relevant, and anticipated offers to customers.

Managerial and Operational Analytics

  • Operational analytics use data mining and data collection tools to plan for more effective business operations.

Risk and Compliance Analytics

  • Used to monitor transactions through continuous monitoring, continuous auditing, and fraud detection.

Financial Analytics

  • Monitor financial information through data mining and ratios on a continuous basis.

Audit Analytics

  • Key to an audit.
  • Types of audit analytics:
     1. Assessing the risk
     2. Providing assurance around certain operations
     3. Establishing thresholds and expectations
     4. Improving the quality of the testing by testing full populations

Tax Analytics

  • Used by government entities, organizations, tax accountants, and analysts.
  • Organize tax issues and guidelines.
  • Improve tax planning.
  • Monitor tax performance.

New Products and Services Innovation Analytics

  • Used to determine where innovation is needed and to isolate the product qualities that are most important to customers.
17
Q

What are Data Visualizations good for?

A

Data visualizations are pictorial representations that communicate qualitative or quantitative data.

18
Q

What are the types of data visualizations?

A
  • Waterfall Chart: shows the cumulative effect of a series of data points that make up a total, such as the incremental contributions that make up total net income.
  • Dot Plot: a two-dimensional chart that maps data points as frequencies of another dimension onto a coordinate plane.
  • Flowchart: maps out processes.
  • Directional Chart: highlights milestones or events over multiple time periods.
  • Line Charts: show quantitative trends over time and can reveal hidden trends.
  • Column Charts (Bar Charts): compare values across categories. A histogram is essentially a bar chart that plots the frequency of data points within intervals (bins).
  • Stacked Column Charts: show total comparisons plus each component's share of the whole.
  • Scatter Plots: show simple regression or correlation between two variables.
  • Box Plots: display quantitative data showing upper/lower extremes as well as upper/lower quartiles. For example, a projected stock price is quantitative data with intervals at which it will trade, so a box plot is the best chart to illustrate it.
  • Geographic Maps: display values as colors on a geographical map.
  • Symbol Maps: display data on a geographical map using symbols.
  • Pie Charts: slices represent a proportional breakdown.
  • Pyramid: the bottom layer represents an action or level that must first be achieved.
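The "cumulative effect" behind a waterfall chart can be illustrated by computing the running totals the chart would plot. This sketch uses only the Python standard library and does not draw anything; the income statement figures are invented.

```python
from itertools import accumulate

# Hypothetical incremental contributions to net income.
steps = [
    ("Revenue", 500.0),
    ("Cost of goods sold", -300.0),
    ("Operating expenses", -120.0),
    ("Taxes", -20.0),
]

# Each bar in a waterfall chart starts where the previous running total ended.
running_totals = list(accumulate(amount for _label, amount in steps))
net_income = running_totals[-1]

for (label, amount), total in zip(steps, running_totals):
    print(f"{label:<20}{amount:>10.2f}{total:>10.2f}")
```

The last running total is the figure the final bar lands on, here the net income of 60.0.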