Data Analysis 1 Flashcards
What are the 4 types of data analytics?
- Descriptive - What happened?
- Diagnostic - Why did it happen?
- Predictive - What will happen next?
- Prescriptive - What should be done about it?
What are the 6 most basic steps in data analysis?
- Understanding the problem and desired result
- Setting a clear metric - what will be measured, and how?
- Gathering data
- Cleaning data
- Analysing and mining data
- Interpreting and presenting results
What is the difference between analysis and analytics?
Analysis can be done without numbers or data, as in business analysis, psychoanalysis, etc.
Analytics, even when used without the prefix “Data”, almost invariably implies using data for numerical manipulation and inference.
What is the ETL process?
Extract, Transform, Load. Describes taking data from disparate sources, converting it into a consistent format, and centralising it in a data warehouse.
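A minimal sketch of the three ETL stages, using only the Python standard library. The two source layouts, the `sales` table, and the in-memory SQLite database standing in for a warehouse are all assumptions made for illustration, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two "disparate sources" with different layouts.
CRM_CSV = "customer,amount_gbp\nalice,10.50\nbob,4.00\n"
SHOP_CSV = "name;pence\nCarol;1250\n"

def extract(text, delimiter):
    """Extract: read raw rows from a CSV-like source."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(crm_rows, shop_rows):
    """Transform: map both sources onto one schema (name, amount in pounds)."""
    unified = [(r["customer"].title(), float(r["amount_gbp"])) for r in crm_rows]
    unified += [(r["name"].title(), int(r["pence"]) / 100) for r in shop_rows]
    return unified

def load(rows, conn):
    """Load: write the unified rows into one central table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(CRM_CSV, ","), extract(SHOP_CSV, ";")), warehouse)

# Row count and total amount across both sources.
total = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

Keeping each stage as its own function means a new source only needs a new `extract` call and a new mapping in `transform`.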
What is a data warehouse?
Data warehouse - your single source of truth for all data that has been extracted, transformed and loaded from any source
What is a data mart?
Data mart - Subsection of the data warehouse, built for a specific business function, purpose, or community of users (e.g. individual stakeholder data). Offers isolated security and performance.
What is a data lake?
Data lake - A repository that can store structured, semi-structured and unstructured data in their raw format, classified and tagged with metadata
What is a data pipeline?
Encompasses the entire journey of moving data from one system to another, including the ETL process. Typically loads into a data lake.
What are the 5 V’s of big data?
- Velocity - data is being generated fast and constantly
- Volume - scale and storage of data
- Variety - diversity (structured, unstructured, people-generated and machine-generated data etc.)
- Veracity - quality and origin
- Value - ability to turn data into value
What is a data repository?
A Data Repository is a general term that refers to data that has been collected, organized, and isolated so that it can be used for reporting, analytics, and also for archival purposes.
This can include databases, marts, warehouses etc.
What is data wrangling?
Exploration, transformation, validation and publishing of data to prepare it for analysis
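The four wrangling stages can be sketched on a tiny made-up dataset. The records, the age rule, and the CSV output target are hypothetical and chosen only to keep the example short.

```python
import csv
import io

# Hypothetical raw records with untidy names and a missing age.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "bob", "age": ""},
    {"name": "Carol", "age": "29"},
]

# Explore: how many rows are missing an age?
missing_age = sum(1 for r in raw if not r["age"].strip())

# Transform: tidy names, cast ages, drop rows that cannot be used.
clean = [
    {"name": r["name"].strip().title(), "age": int(r["age"])}
    for r in raw
    if r["age"].strip()
]

# Validate: fail loudly if a rule is broken (an assumed sanity range).
assert all(0 < r["age"] < 120 for r in clean)

# Publish: write the result where analysts can read it (here, a CSV string).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(clean)
published = out.getvalue()
```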
What is ‘normalising’ data?
Removing unused data and restructuring what remains to reduce redundancy and inconsistency
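One way to picture the redundancy part is to split a flat table so that each customer's details are stored once. This is a rough sketch on invented data; the field names and the ID scheme are assumptions.

```python
# Hypothetical flat table: Alice's city is repeated on every order.
flat_orders = [
    {"order_id": 1, "customer": "Alice", "city": "Leeds", "item": "pen"},
    {"order_id": 2, "customer": "Alice", "city": "Leeds", "item": "ink"},
    {"order_id": 3, "customer": "Bob", "city": "York", "item": "pad"},
]

customers = {}  # customer name -> {"customer_id", "city"}, stored once each
orders = []     # orders refer to customers by ID only

for row in flat_orders:
    # setdefault keeps the existing entry if the customer was seen before.
    entry = customers.setdefault(
        row["customer"],
        {"customer_id": len(customers) + 1, "city": row["city"]},
    )
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": entry["customer_id"],
            "item": row["item"],
        }
    )
```

After the split, changing Alice's city means updating one record instead of every one of her orders.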
What is ‘denormalising’ data?
Combining data from multiple tables into a single table for faster queries and analysis
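As a small illustration, the join can be precomputed into one wide table so later queries do not need to repeat it. The table and column names are made up, and an in-memory SQLite database stands in for a real store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Alice', 'Leeds'), (2, 'Bob', 'York');
    INSERT INTO orders VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'pad');

    -- Denormalise: store the result of the join as a single table.
    CREATE TABLE orders_wide AS
        SELECT o.order_id, c.name, c.city, o.item
        FROM orders AS o
        JOIN customers AS c USING (customer_id);
""")

# Queries against orders_wide need no join, at the cost of repeated values.
wide_rows = conn.execute("SELECT * FROM orders_wide ORDER BY order_id").fetchall()
```

The trade-off is the reverse of normalising: reads get simpler and faster, while customer details are duplicated on every order row.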
What is ‘enriching’ data?
Adding to your data to get more value out of it, e.g. using the metadata
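For example, a timestamp already attached to each record can be used to derive extra fields. The records and the `weekday_number` and `is_weekend` fields below are hypothetical.

```python
from datetime import datetime

# Hypothetical records that carry a creation timestamp as metadata.
records = [
    {"order_id": 1, "created_at": "2024-03-01T09:15:00"},
    {"order_id": 2, "created_at": "2024-03-03T18:40:00"},
]

for record in records:
    ts = datetime.fromisoformat(record["created_at"])
    # Enrich: add fields derived from the timestamp (Monday is 0, Sunday is 6).
    record["weekday_number"] = ts.weekday()
    record["is_weekend"] = ts.weekday() >= 5
```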
What are descriptive statistics and inferential statistics?
Descriptive statistics focus on describing the visible characteristics of a dataset, without necessarily making any inferences or drawing conclusions about it. E.g. Mean/Median/Mode.
Inferential statistics take data from a sample to make inferences about the larger population from which the sample was drawn.
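The contrast can be sketched with Python's `statistics` module on a made-up sample. The first half only describes the sample. The second half makes a rough inference about the population mean using a normal approximation, which is an assumption chosen for brevity; with a sample this small a t-distribution would usually be more appropriate.

```python
import math
import statistics
from statistics import NormalDist

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Descriptive: summarise the sample itself.
mean = statistics.mean(sample)      # 5
median = statistics.median(sample)  # 4.5
mode = statistics.mode(sample)      # 4
stdev = statistics.stdev(sample)    # sample standard deviation

# Inferential: approximate 95% confidence interval for the population mean.
z = NormalDist().inv_cdf(0.975)     # about 1.96
margin = z * stdev / math.sqrt(len(sample))
ci_low, ci_high = mean - margin, mean + margin
```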