Domain I – Business Acumen – Section C: Data Analytics Flashcards
Data Analytics
is the process of identifying, gathering, and analyzing data with the objective of producing meaningful information and improving the decision‐making process.
Structured data
is understandable and organized in a format that allows for repeatable analysis and queries (e.g., operational, customer, or financial data).
Unstructured data
is not organized into traditional data structures. It exists in a free format that makes analysis much more difficult (e.g., call center communications, documents and texts, social media data (Facebook or Twitter), audio, video, and image data).
Big data
is a term used to describe large, complex data sets that are beyond the capabilities of traditional data processing and analyzing applications.
=> Structured and unstructured data created by individuals, organizations, applications, smart machines, sensing devices, and the Internet of Things (IoT) are growing rapidly.
Volume
The amount of data being created, captured, and processed is vast compared to traditional data sources. Data analytics infrastructure must be able to handle these far greater amounts of data.
Variety
Data is being generated by an increasing number of sources in different formats.
=> To be successful, data analytics must take into account all types and formats of data.
Velocity
Data is being produced at an ever-increasing speed. => To add value, data analytics must focus on critical data elements that are relevant to the objectives.
Veracity
Data must faithfully reflect the truth. For analytical purposes, the data must be cleaned and normalized in order to limit the possibility of inaccuracy and errors. A strong data governance culture contributes to the veracity of data.
Data governance
is the management of the data used by an organization to ensure its quality, availability, usability, integrity, consistency, and security.
The Five Steps of Data Analytics Process
- Define the Question
- Obtain the Data
- Clean and Normalize the Data
- Analyze the Data
- Communicate the Results
Cleaning (or cleansing) the data
refers to the process of detecting incomplete, corrupt, incorrect, inaccurate, or irrelevant elements of the data and then correcting or removing those elements.
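A minimal sketch of what cleaning might look like in practice, assuming a hypothetical pandas DataFrame of vendor payments (the column names and rules below are illustrative, not a prescribed procedure):

```python
import pandas as pd

# Hypothetical vendor payment records with typical data-quality problems.
payments = pd.DataFrame({
    "vendor": ["Acme", "Acme", None, "Globex", "Globex"],
    "amount": [1200.0, 1200.0, 500.0, -75.0, 980.0],
    "invoice_date": ["2023-01-05", "2023-01-05", "2023-02-10", "bad date", "2023-03-01"],
})

cleaned = (
    payments
    .drop_duplicates()                 # remove duplicate records
    .dropna(subset=["vendor"])         # drop rows missing a vendor
    .assign(invoice_date=lambda df: pd.to_datetime(df["invoice_date"], errors="coerce"))
    .dropna(subset=["invoice_date"])   # drop rows whose dates could not be parsed
    .query("amount > 0")               # remove clearly invalid amounts
)
print(cleaned)
```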
Normalizing the data
refers to the process of restructuring or organizing the data in order to reduce data redundancy and improve its usability. Data elements that are unexpected, peculiar, nonconforming, or not easily classified are identified and corrected or removed.
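A minimal sketch of normalization in the restructuring sense, assuming a hypothetical flat table whose vendor details repeat on every payment row; splitting them into a reference table reduces redundancy:

```python
import pandas as pd

# Hypothetical flat table that repeats vendor details on every payment row.
flat = pd.DataFrame({
    "vendor_name": ["Acme", "Acme", "Globex"],
    "vendor_country": ["US", "US", "DE"],
    "payment_amount": [1200.0, 980.0, 500.0],
})

# Split the repeated vendor attributes into their own reference table...
vendors = (
    flat[["vendor_name", "vendor_country"]]
    .drop_duplicates()
    .reset_index(drop=True)
    .rename_axis("vendor_id")
    .reset_index()
)

# ...and keep only a key on each payment row, reducing redundancy.
payments = flat.merge(vendors, on=["vendor_name", "vendor_country"])[["vendor_id", "payment_amount"]]

print(vendors)
print(payments)
```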
Descriptive Analytics
What happened? Descriptive analytics answers the question of what has happened.
=> Examples of descriptive analytics include examining all payments to vendors to identify payments without a valid purchase order.
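A minimal sketch of that purchase-order example, assuming a hypothetical table where a missing or blank po_number marks a payment without a valid purchase order:

```python
import pandas as pd

# Hypothetical payment records; a missing or blank PO number marks an invalid purchase order.
payments = pd.DataFrame({
    "payment_id": [101, 102, 103, 104],
    "vendor": ["Acme", "Globex", "Initech", "Acme"],
    "po_number": ["PO-001", None, "", "PO-007"],
    "amount": [1200.0, 450.0, 300.0, 980.0],
})

# Descriptive question: what happened? Which payments were made without a valid PO?
no_po = payments[payments["po_number"].isna() | (payments["po_number"].str.strip() == "")]
print(no_po[["payment_id", "vendor", "amount"]])
```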
Diagnostic Analytics
Why did something happen? Diagnostic analytics answers the question of why something has happened. It gives a deep insight into a particular problem or event to identify its cause and uncover possible dependencies and patterns.
=> Examples of diagnostic analytics include examining authorization and other controls over the payment process to identify why there are payments to vendors without a valid purchase order.
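Continuing the same hypothetical example, a diagnostic sketch might group the exception records by approver and by the control that was bypassed, to suggest why payments slipped through without a purchase order:

```python
import pandas as pd

# Hypothetical exception records: payments already identified as lacking a valid PO,
# enriched with who approved them and which control step they bypassed.
exceptions = pd.DataFrame({
    "payment_id": [102, 103, 108, 111, 115],
    "approver":   ["lee", "lee", "kim", "lee", "park"],
    "control_bypassed": ["manual entry", "manual entry", "emergency payment",
                         "manual entry", "emergency payment"],
})

# Diagnostic question: why did it happen? Count exceptions by approver and bypassed control
# to surface the pattern behind the missing purchase orders.
cause_summary = (
    exceptions.groupby(["approver", "control_bypassed"])
    .size()
    .sort_values(ascending=False)
)
print(cause_summary)
```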
Predictive Analytics
What might happen? Predictive analytics is based on analyzing current data under certain assumptions to draw valid correlations and predict future outcomes and trends.
=> For example, collecting and analyzing data on buying trends may enable the analyst to predict future changes and risks in demand.
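A minimal sketch of the demand example, assuming hypothetical monthly demand figures and a deliberately simple straight-line trend as the predictive model:

```python
import numpy as np

# Hypothetical monthly demand (units sold) for the past eight months.
months = np.arange(1, 9)
demand = np.array([120, 135, 150, 149, 166, 180, 195, 205])

# Fit a straight-line trend to the historical data (a deliberately simple model).
slope, intercept = np.polyfit(months, demand, deg=1)

# Predictive question: what might happen? Extrapolate the trend to the next three months.
future_months = np.arange(9, 12)
forecast = slope * future_months + intercept
print(dict(zip(future_months.tolist(), forecast.round(1).tolist())))
```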