Data Management and Analytics Flashcards
What is Big Data and the 5 V’s of Big Data?
- the corporate accumulation of massive amounts of data that can be used for analysis (the analysis itself is commonly referred to as data analytics)
- the 5 V’s: volume, velocity, variety, veracity and value
Define volume
- the quantity or amount of data points
Define velocity
- the speed of data accumulation or data processing
Define variety
- the different types of data that are involved in the analysis
- data may be structured, semi-structured or unstructured
Define veracity
- represents the reliability, quality or integrity of the data (trustworthiness)
Define value
- the insights Big Data can yield
What is a primary key?
- a unique identifier for a specific row within a table, made up of one or more attributes
- each row must have a unique primary key
What is a foreign key?
- an attribute (or set of attributes) in one table that is the primary key of another table
- the link between a primary key in one table and a foreign key in another table is what creates a relationship between tables
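A minimal sketch of the two key types, using Python's built-in sqlite3 module. The customer and invoice tables and their columns are hypothetical names chosen for illustration. It shows a duplicate primary key and an unmatched foreign key both being rejected, and the relationship being used in a join.

```python
import sqlite3

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE invoice (
               invoice_id  INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
               amount      REAL NOT NULL)"""
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    conn.execute("INSERT INTO invoice VALUES (100, 1, 250.0)")
    return conn

def insert_rejected(conn, sql):
    """Return True if the database refuses the insert on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn = build_demo_db()
# Same primary key value as an existing row: violates uniqueness.
duplicate_pk_rejected = insert_rejected(conn, "INSERT INTO customer VALUES (1, 'Duplicate')")
# customer_id 99 does not exist in customer: violates the foreign key.
orphan_fk_rejected = insert_rejected(conn, "INSERT INTO invoice VALUES (101, 99, 10.0)")
# The primary key / foreign key link is what makes the join possible.
joined = conn.execute(
    "SELECT c.name, i.amount FROM invoice i "
    "JOIN customer c ON c.customer_id = i.customer_id"
).fetchall()
```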
What is the extract, transform and load process (ETL)?
- the process in which data is captured from its source and transferred to an organization’s custody so that it can then be further analyzed
What is data extraction?
- capturing the data from its native source; can take the form of an automated, semi-automated or manual process
- the native source and the means of accessing the data must be determined in the initial ETL phase, because they dictate the tools needed to design the overall extraction process
What is manual extraction?
- a person may have to use specialized data mining software or write customized queries to obtain the data
- tools used must ensure the data is coming from the correct location and is complete and accurate
What is data transformation?
- one of the most time-consuming steps in the ETL process because it entails taking the often-unstructured raw data, cleaning it, manipulating it and validating it to ensure it is accurate and ready for analysis
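A small sketch of what the cleaning step can look like in Python. The field names (id, amount, date), the sample rows and the two accepted date formats are assumptions made for illustration. Rows that cannot be cleaned are set aside rather than silently dropped.

```python
from datetime import datetime

# Hypothetical raw extract: stray whitespace, currency symbols, mixed date formats.
raw_rows = [
    {"id": " 001", "amount": "$1,200.50", "date": "2024-01-15"},
    {"id": "002 ", "amount": "300", "date": "01/16/2024"},
    {"id": "003", "amount": "n/a", "date": "2024-01-17"},
]

def parse_date(text):
    """Try each accepted format; return None if none of them match."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def transform(rows):
    clean, rejected = [], []
    for row in rows:
        amount_text = row["amount"].replace("$", "").replace(",", "").strip()
        try:
            amount = float(amount_text)
        except ValueError:
            amount = None
        date = parse_date(row["date"])
        if amount is None or date is None:
            rejected.append(row)  # keep for follow-up instead of losing it
            continue
        clean.append(
            {"id": row["id"].strip(), "amount": amount, "date": date.isoformat()}
        )
    return clean, rejected

clean_rows, rejected_rows = transform(raw_rows)
```

Keeping the rejected rows in their own list makes it possible to account for every extracted record during validation.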
What is data validation?
- needed after transformation to ensure data is not lost or inappropriately modified in the cleaning process
- may be a visual review for simple data sets
- if the data set is large, basic statistical tests may be required to confirm the data has maintained its integrity
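One common way to run such a check is to compare control totals before and after transformation. The sketch below assumes each record has a numeric amount field (a hypothetical name) and compares the row count, sum, minimum and maximum. It is an illustration of the idea, not a complete validation suite.

```python
import math

def control_totals(rows, amount_field):
    """Summarize a data set with a few basic statistics."""
    amounts = [float(r[amount_field]) for r in rows]
    return {
        "count": len(amounts),
        "sum": math.fsum(amounts),
        "min": min(amounts, default=0.0),
        "max": max(amounts, default=0.0),
    }

def validate(before, after, amount_field="amount"):
    """Return the names of the control totals that changed."""
    b = control_totals(before, amount_field)
    a = control_totals(after, amount_field)
    return [
        name for name in b
        if not math.isclose(b[name], a[name], rel_tol=1e-9, abs_tol=1e-9)
    ]

source = [{"amount": 100.0}, {"amount": 250.5}, {"amount": 75.25}]
cleaned_ok = [dict(r) for r in source]   # nothing lost or altered
cleaned_bad = source[:2]                 # last row was dropped by mistake

no_differences = validate(source, cleaned_ok)
differences = validate(source, cleaned_bad)
```

An empty result means the totals agree; any name in the list points to a statistic that no longer matches the source.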
What is data load?
- loading the data into a software program for analysis or into a data storage location
- main concern is that the data has been extracted and transformed into a format that is compatible with the software program or storage destination
- may be stored in an Operational Data Store (ODS), data warehouse, data mart or data lake
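A minimal load sketch, using an in-memory SQLite database as a stand-in for the storage destination. The sales table, its columns and the sample rows are hypothetical. The table definition acts as the compatibility check: rows that do not fit the declared constraints are rejected and the whole batch is rolled back.

```python
import sqlite3

# Hypothetical rows that have already been extracted and transformed.
clean_rows = [
    ("001", 1200.5, "2024-01-15"),
    ("002", 300.0, "2024-01-16"),
]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "id TEXT PRIMARY KEY, amount REAL NOT NULL, sale_date TEXT NOT NULL)"
    )
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO sales (id, amount, sale_date) VALUES (?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded_count = load(conn, clean_rows)
```

Comparing loaded_count with the number of rows sent is a quick confirmation that nothing was lost during the load.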
What are the 4 key applications in data analytics?
- Descriptive analytics: describing or explaining WHAT HAS occurred (summarizes the activity)
- Diagnostic analytics: diagnosing or explaining WHY it occurred (uncovers correlations, patterns and relationships)
- Predictive analytics: predicting WHAT WILL occur (forecasts and projections)
- Prescriptive analytics: prescribing WHAT COULD or SHOULD occur (recommendations and next steps)
- the 2 D’s (descriptive and diagnostic) are backward-looking
- the 2 P’s (predictive and prescriptive) are forward-looking
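A toy illustration of the four applications on one made-up data set. The ad_spend and sales figures are invented, and the straight-line fit is only one of many possible predictive techniques, so treat this as a sketch of the distinction rather than a recommended method.

```python
from math import sqrt
from statistics import mean

# Invented monthly figures for illustration only.
ad_spend = [10.0, 20.0, 30.0, 40.0]
sales = [25.0, 45.0, 65.0, 85.0]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Descriptive: what has occurred?
average_sales = mean(sales)

# Diagnostic: why did it occur? (how strongly sales move with ad spend)
spend_sales_corr = correlation(ad_spend, sales)

# Predictive: what will occur if ad spend is 50?
slope, intercept = fit_line(ad_spend, sales)
forecast_sales = slope * 50.0 + intercept

# Prescriptive: what should ad spend be to reach a sales target of 125?
recommended_spend = (125.0 - intercept) / slope
```

The first two results only summarize and explain past data, while the last two use the fitted relationship to look ahead.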