M1 Flashcards
[ ] are fundamental components in various fields, providing tools for data interpretation and decision-making.
Statistical Analysis and Modeling (SAM)
[ ] primarily focuses on the collection, analysis, interpretation, presentation, and organization of data.
Statistics
[ ] provides foundational tools for understanding data distributions, variability, and relationships through methods such as hypothesis testing and regression analysis.
Statistics
[ ] encompasses a broader scope, integrating statistical methods with advanced computational techniques to derive insights from data.
Analytics
[ ] emphasizes predictive modeling, data mining, and the application of algorithms to inform strategic decisions and optimize processes.
Analytics
A [ ] is an interlinked set of activities that an organization performs to convert inputs to outputs that are valuable to a market.
value chain
Insights that prescribe direct and meaningful actions then [ ].
drive decision making
When is the value of data to organizations and their market realized?
When actions or decisions based on the data are implemented.
What are the five main data sources?
(TransData - ContractSub - Surve - DataPool - Unstruct)
- Transactional Data
- Contractual, Subscription, or Account Data
- Surveys
- Data Poolers
- Unstructured Data
This type of data source consists of structured, detailed information capturing the key characteristics of a transaction.
Transactional Data
This type of data source includes information about the type of product combined with customer characteristics.
Contractual, Subscription, or Account Data
This type of data source consists of questionnaires aimed at extracting sociodemographic and behavioral data from a particular group of people.
Surveys
These are companies that gather data in particular settings or for particular purposes and sell them to interested customers looking to enrich or extend their data sources.
Data Poolers
In the world of big data, this refers to information that does not reside in a traditional row-column database.
Unstructured Data
What are the phases of Data Analytics?
(Bu - Du - Dp - M - E - D)
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
[ ] is knowing what the study is for or identifying a business task.
Business Understanding
[ ] is selecting the relevant data from the many available databases to correctly describe a given business task, that is, identifying the data relevant to the problem description.
Data Understanding
[ ] is also known as data preprocessing.
Data Preparation
[ ] involves filtering and aggregating data and filling in (imputing) missing values.
Data Preparation
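The three preparation steps named on this card can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the `records` data, the rule that negative amounts are invalid, and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records: (customer, amount); None marks a missing value.
records = [
    ("ana", 10.0),
    ("ana", None),
    ("ben", 30.0),
    ("ben", 50.0),
    ("cy", -5.0),  # assumed invalid: negative amount
]

# Impute: replace missing amounts with the mean of the valid observed amounts.
observed = [amt for _, amt in records if amt is not None and amt >= 0]
fill_value = mean(observed)
imputed = [(who, fill_value if amt is None else amt) for who, amt in records]

# Filter: drop rows whose amount is invalid (negative).
filtered = [(who, amt) for who, amt in imputed if amt >= 0]

# Aggregate: total amount per customer.
totals = {}
for who, amt in filtered:
    totals[who] = totals.get(who, 0.0) + amt
```

Mean imputation is only one option; a median or a per-customer average may suit skewed data better.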
[ ] uses mathematical formulations to convert different measurements into a unified numerical scale.
Data Transformation
Transforming numerical to numerical scales [ ].
shrinks or enlarges the data
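One common numerical-to-numerical transformation is min-max scaling, which maps values onto a chosen range. The sketch below is an assumption-based example: the function name `min_max_scale` and the sample `incomes` are hypothetical, and the card itself does not name a specific formula.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


incomes = [20_000, 35_000, 50_000]       # hypothetical data
shrunk = min_max_scale(incomes)          # shrink to the range 0..1
enlarged = min_max_scale(shrunk, 0, 100) # enlarge to the range 0..100
```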
Transforming categorical to numerical values can be [ ].
ordinal (less, moderate, strong) or nominal (red, yellow, blue).
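The difference between the two cases can be shown with a short sketch. Ordinal categories have a meaningful order, so they can map to ranked integers; nominal categories have none, so one indicator column per category (one-hot encoding) is a common choice. The names `ORDER`, `encode_ordinal`, and `encode_one_hot` are hypothetical, and one-hot encoding is an assumed technique that the card does not name.

```python
# Assumed ranking for the ordinal example from the card.
ORDER = {"less": 0, "moderate": 1, "strong": 2}


def encode_ordinal(values, order=ORDER):
    """Map ordered categories to integers that preserve their ranking."""
    return [order[v] for v in values]


def encode_one_hot(values):
    """Map unordered categories to indicator rows, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


ranks = encode_ordinal(["strong", "less", "moderate"])
columns, indicators = encode_one_hot(["red", "blue", "yellow", "red"])
```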
What are the two major categories of modeling?
- Predictive Modeling
- Descriptive Modeling
[ ] predicts the value of an attribute based on the values of other attributes.
Predictive Modeling
[ ] derives patterns that summarize the underlying relationships in the data.
Descriptive Modeling
[ ] summarizes the general characteristics or features of a target class of data.
Characterization
What are the methods of Characterization and Discrimination?
- data summaries based on statistical measures and plots.
- user-controlled data summarization using OLAP, Excel or other spreadsheets, SQL, Python, etc.
What are the outputs of Characterization and Discrimination?
- pie charts, bar charts, curves, crosstabs
- Characteristic Rules
[ ] compares the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.
Discrimination
[ ] is the process of finding a model or function that describes and distinguishes classes or concepts.
Classification
[ ] is a statistical methodology that is most often used for numeric prediction.
Regression Analysis
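As an illustration of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. Simple linear regression is only one form of regression analysis; the function name `fit_line` and the sample data are assumptions for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]  # hypothetical predictor values
ys = [3, 5, 7, 9]  # hypothetical numeric outcomes
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # numeric prediction for x = 5
```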
[ ] is when objects are grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.
Clustering
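The grouping principle can be illustrated with k-means, one widely used clustering method (the card does not name a specific algorithm, so this choice is an assumption). The sketch works on one-dimensional points with caller-supplied starting centers; `k_means_1d` and the sample data are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around centers by repeated assign-and-update steps."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1, 2, 3, 10, 11, 12]  # hypothetical data
final_centers, groups = k_means_1d(points, centers=[1, 10])
```

Points within each resulting group are close to one another (high intraclass similarity), while the two groups are far apart (low interclass similarity).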
[ ] detects objects in data that do not follow the norm; its methods may include statistical tests or distance measures.
Outlier Analysis
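A simple statistical approach is the z-score rule, which flags values lying many standard deviations from the mean. The sketch below is one possible method, not the only one; the function name `z_score_outliers`, the threshold of 2.0, and the sample `readings` are assumptions.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # No spread at all, so nothing can deviate from the norm.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 50]  # hypothetical data
outliers = z_score_outliers(readings)
```

Because an extreme value also inflates the mean and standard deviation, this rule can miss outliers in very small samples; distance-based measures are an alternative.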
What are the two things to consider in the Data Interpretation stage?
- How to recognize business value from the knowledge patterns discovered.
- How to visualize the results to properly interpret patterns.
A pattern is interesting if it is:
- easily understood by humans
- valid on new or test data with some degree of certainty
- potentially useful
- novel
What are the two primary techniques of descriptive analytics?
- Data Aggregation
- Data Presentation
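The two techniques can be sketched together: first aggregate raw records into counts, then present the counts in a readable form. This is a minimal example under assumptions; `aggregate_counts`, `present_as_bars`, and the `regions` data are hypothetical, and a text bar chart stands in for a real plot.

```python
from collections import Counter


def aggregate_counts(items):
    """Data aggregation: count how often each category occurs."""
    return Counter(items)


def present_as_bars(counts):
    """Data presentation: render the counts as a simple text bar chart."""
    lines = []
    for label, count in sorted(counts.items()):
        lines.append(f"{label:<8}{'#' * count} {count}")
    return "\n".join(lines)


regions = ["north", "south", "north"]  # hypothetical sales records
counts = aggregate_counts(regions)
chart = present_as_bars(counts)
```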