Chapter 8 Flashcards
Quantitative analysis
focuses on quantifying the patterns and correlations found in the data
involves analyzing a large number of observations from a dataset
Quantitative analysis results are absolute in nature and can therefore be used for numerical comparisons
Qualitative analysis
-focuses on describing various data qualities using words
-analyzes a smaller sample in greater depth compared to quantitative data analysis
-The results cannot be measured numerically or used for numerical comparisons.
Data mining
–also known as data discovery, is a specialized form of data analysis that targets large datasets
–refers to automated, software-based techniques that sift through massive datasets to identify patterns and trends.
–involves extracting hidden patterns in the data with the intention of identifying previously unknown patterns
–Data mining forms the basis for predictive analytics and business intelligence (BI).
Statistical Analysis
Statistical analysis uses statistical methods based on mathematical formulas as a means for analyzing data.
It is most often quantitative, but can also be qualitative.
A/B Testing
also known as split or bucket testing, compares two versions of an element to determine which version is superior based on a pre-defined metric.
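A minimal sketch of the comparison, assuming the pre-defined metric is conversion rate. The visitor counts and the names `conversion_rate` and `better_variant` are made up for illustration; a real test would also check that the difference is statistically significant before declaring a winner.

```python
def conversion_rate(conversions, visitors):
    """Fraction of visitors who converted (the assumed pre-defined metric)."""
    return conversions / visitors


def better_variant(a, b):
    """Compare two (conversions, visitors) tuples and name the better version."""
    rate_a = conversion_rate(*a)
    rate_b = conversion_rate(*b)
    if rate_a == rate_b:
        return "tie"
    return "A" if rate_a > rate_b else "B"


# Hypothetical counts, for illustration only
variant_a = (120, 2400)  # 120 conversions out of 2400 visitors
variant_b = (150, 2500)  # 150 conversions out of 2500 visitors
winner = better_variant(variant_a, variant_b)
print(winner)
```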
Correlation
an analysis technique used to determine whether two variables are related to each other
- The use of correlation helps to develop an understanding of a dataset and find relationships that can assist in explaining a phenomenon.
- Correlation is therefore commonly used for data mining where the identification of relationships between variables in a dataset leads to the discovery of patterns and anomalies.
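One common way to measure whether two variables are related is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below is only an illustration of that one measure; the data and the name `pearson` are assumptions, not part of the card.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical data: sales rise in step with temperature
temperature = [20, 22, 25, 27, 30]
ice_cream_sales = [200, 220, 250, 270, 300]
r = pearson(temperature, ice_cream_sales)
print(r)  # close to +1: a strong positive relationship
```

A value near +1 means the variables rise together, a value near -1 means one falls as the other rises, and a value near 0 suggests no linear relationship.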
Regression
explores how a dependent variable is related to an independent variable within a dataset.
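As a sketch, simple linear regression with one independent variable can be fitted by ordinary least squares. The choice of least squares, the sample data, and the name `fit_line` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows y = 2x + 1 exactly
xs = [1, 2, 3, 4]   # independent variable
ys = [3, 5, 7, 9]   # dependent variable
slope, intercept = fit_line(xs, ys)

# Use the fitted line to estimate the dependent variable for a new x
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```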
Classification (Supervised Machine Learning)
supervised learning technique by which data is classified into relevant, previously learned categories.
It consists of two steps:
1. The system is fed training data that is already categorized or labeled, so that it can develop an understanding of the different categories.
2. The system is fed unknown but similar data for classification and based on the understanding it developed from the training data, the algorithm will classify the unlabeled data.
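The two steps above can be sketched with a 1-nearest-neighbor classifier, chosen here only because it is short. The functions `train` and `classify` and the sample points are hypothetical.

```python
from math import dist


def train(labeled_points):
    """Step 1: take in labeled data. A 1-nearest-neighbor model just stores it."""
    return list(labeled_points)


def classify(model, point):
    """Step 2: give an unlabeled point the category of its closest training example."""
    _, nearest_label = min(model, key=lambda item: dist(item[0], point))
    return nearest_label


# Hypothetical training data: (features, category)
training_data = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.2), "large"),
    ((4.8, 5.1), "large"),
]
model = train(training_data)

print(classify(model, (1.1, 0.9)))
print(classify(model, (5.1, 4.9)))
```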
Clustering (Unsupervised Machine Learning)
Clustering is an unsupervised learning technique by which data is divided into different groups so that the data in each group has similar properties.
There is no prior learning of categories required. Instead, categories are implicitly generated based on the data groupings.
Clustering is used in data mining
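A minimal sketch of clustering using k-means on one-dimensional data. K-means is one possible algorithm, not the only one; the starting centroids, data, and the name `k_means_1d` are assumptions for illustration. Note that no labels are supplied anywhere.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group. Repeats a fixed number of times."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical unlabeled data with two obvious groups
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
clusters, centers = k_means_1d(data, centroids=[0.0, 10.0])
print(clusters)
```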
Outlier Detection
process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset.
-This machine learning technique is used to identify anomalies, abnormalities and deviations that can be advantageous, such as opportunities, or unfavorable, such as risks.
-Outlier detection is closely related to the concept of classification and clustering, although its algorithms focus on finding abnormal values. It can be based on either supervised or unsupervised learning
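One simple unsupervised approach is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The threshold of 2, the readings, and the name `find_outliers` are assumptions for this sketch.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]


# Hypothetical readings with one abnormal value
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(readings)
print(outliers)
```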
Filtering
Filtering is the automated process of finding relevant items from a pool of items
Filtering is generally applied via the following two approaches:
- collaborative filtering
- content-based filtering
Collaborative filtering
Collaborative filtering is an item filtering technique based on the collaboration, or merging, of a user’s past behavior with the behaviors of others. A target user’s past behavior, including their likes, ratings, purchase history and more, is combined with the behavior of similar users. Based on the similarity of the users’ behavior, items are filtered for the target user.
–solely based on the similarity between users’ behavior.
–requires a large amount of user behavior data in order to accurately filter items
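A minimal sketch of the idea, assuming behavior is recorded as sets of liked items and that user similarity is measured with Jaccard overlap. Both choices, along with the names `jaccard` and `recommend_for` and the sample users, are assumptions for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(target, likes):
    """Find the user whose likes are most similar to the target's,
    then suggest that user's items the target has not seen."""
    others = {user: items for user, items in likes.items() if user != target}
    most_similar = max(others, key=lambda user: jaccard(likes[target], others[user]))
    return sorted(others[most_similar] - likes[target])


# Hypothetical behavior data
likes = {
    "ana": {"book_a", "book_b", "book_c"},
    "ben": {"book_a", "book_b", "book_d"},
    "cho": {"book_x", "book_y"},
}
suggestions = recommend_for("ana", likes)
print(suggestions)
```

Only user behavior is used here; nothing about the items themselves is examined.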
Content-based filtering
an item filtering technique focused on the similarity between users and items
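As a contrast with collaborative filtering, the sketch below scores items by how well their attributes match one user's profile, without looking at other users. The attribute sets, the overlap-count score, and the name `rank_items` are assumptions for illustration.

```python
def rank_items(user_profile, catalog):
    """Order items by how many attributes they share with the user's profile.
    Items with no shared attributes are filtered out."""
    scored = [(len(user_profile & attrs), name) for name, attrs in catalog.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


# Hypothetical user preferences and item attributes
user_profile = {"sci-fi", "space"}
catalog = {
    "film_a": {"sci-fi", "space", "drama"},
    "film_b": {"romance", "drama"},
    "film_c": {"sci-fi", "comedy"},
}
ranked = rank_items(user_profile, catalog)
print(ranked)
```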
Natural language processing
a computer’s ability to comprehend human speech and text as naturally understood by humans