CAP Predictive Analytics Flashcards
Predictive analytics
Statistical techniques used to make predictions about future or otherwise unknown events
Data science
Field that applies statistics, data analysis, machine learning, and data mining in order to understand and analyze phenomena
Big Data
Data sets so large or complex that traditional data processing applications are inadequate
Machine learning
Study of algorithms and statistical models that learn to perform a specific task from patterns in data rather than from explicit instructions
Data mining
The process of discovering patterns in large data sets
Deep learning
A family of machine learning methods based on learning data representations as opposed to task-specific algorithms
Linear regression
Approach to modeling the relationship between a dependent variable and one or more independent variables in a linear fashion
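As a minimal sketch of the single-variable case, the ordinary least squares fit can be computed by hand. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original card.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With more than one independent variable the same idea is usually expressed in matrix form; this sketch covers only the one-variable case.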
Support Vector Machine
A classifier that attempts to find the maximum-margin separating hyperplane between samples in two classes
K-nearest neighbor (K-NN)
A classifier that assigns new samples the majority classification of the k nearest samples in the training space
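The majority-vote rule can be sketched in a few lines. This assumes Euclidean distance and Python 3.8+ (for `math.dist`); `knn_predict` and the toy data are hypothetical names chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; each point is a tuple of numbers."""
    # Keep the k training samples closest to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Assign the label that occurs most often among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training data: two well-separated groups.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label = knn_predict(train, (0.5, 0.5), k=3)
```

An odd k is a common choice for two-class problems because it avoids tied votes.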
Logistic regression
Approach to modeling the relationship between one or more independent variables and a binary dependent variable; uses a sigmoid (logistic) function to map a linear combination of the inputs to a probability
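The role of the sigmoid function can be illustrated with a short sketch. The weights and bias below are made-up values, not fitted parameters, and `predict_proba` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    # Linear combination of the inputs, then the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Illustrative (not fitted) parameters for a two-feature model.
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Here the linear combination is 0.5 * 1.0 - 0.25 * 2.0 = 0, so the predicted probability sits at the decision boundary. This simple form of `sigmoid` can overflow for very large negative inputs; numerically robust implementations handle that case separately.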
Decision tree
A classifier that uses a root node, internal decision nodes, branches, and leaf nodes to indicate classification decisions on unknown samples; learning comes from maximizing information gain or another splitting metric at each node
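Information gain for a candidate split can be sketched as the drop in Shannon entropy from the parent node to its children. The function names and labels below are illustrative assumptions; real tree learners evaluate many candidate splits and may use other metrics such as Gini impurity.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["a", "a", "b", "b"]
# A split that separates the classes perfectly.
gain_perfect = information_gain(parent, ["a", "a"], ["b", "b"])
# A split that leaves both children as mixed as the parent.
gain_useless = information_gain(parent, ["a", "b"], ["a", "b"])
```

The perfect split recovers the full 1 bit of parent entropy, while the uninformative split gains nothing, so a learner maximizing information gain would choose the first.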
k-means clustering
Unsupervised learning method that groups elements into k clusters by assigning each element to the cluster whose centroid (mean) is closest
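One common way to compute such a clustering is Lloyd's algorithm, which alternates an assignment step and an update step. The sketch below assumes Euclidean distance, caller-supplied starting centroids, and a fixed iteration count; these choices and the name `kmeans` are illustrative, not part of the original card.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; `centroids` holds the k starting centers as tuples."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Results depend on the starting centroids, so practical implementations typically run several random initializations and keep the best result.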
Random forest
Ensemble method that trains a set of decision trees, each on a bootstrap sample of the training data using randomly selected features, and then uses majority voting for classification of samples
Naive Bayes Classifier
Probabilistic classifier that applies Bayes’ theorem with strong independence assumptions about features
Neural network
A model that uses layers of interconnected artificial neurons (perceptrons) to learn patterns in data
AdaBoost
Boosting algorithm that trains a series of weak learners on data and focuses later learners on data misclassified by earlier learners
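How later learners are focused on misclassified data can be sketched with the reweighting step of the classic two-class (discrete) AdaBoost formulation. The function name is hypothetical, and the sketch assumes the weak learner's weighted error is strictly between 0 and 1.

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost round.

    weights: sample weights that sum to 1.
    correct: correct[i] is True if the weak learner classified sample i correctly.
    Returns the learner's vote weight (alpha) and the renormalized sample weights.
    """
    # Weighted error of the current weak learner (assumed to satisfy 0 < err < 1).
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correctly classified samples, grow the misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Four equally weighted samples; the weak learner gets the last one wrong.
alpha, new_weights = adaboost_reweight([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified samples together carry half of the total weight, so the next weak learner cannot do well by repeating the same mistakes.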
Kernel factory
An ensemble method that uses kernel machines as classifiers and a genetic algorithm to tune the weights of each machine’s vote in classification
Rotation Forest
An ensemble method that uses decision trees as classifiers and principal components of randomly selected variables from training data set as features
Dimension reduction
Methods that reduce the number of features used as inputs to a classifier or regressor
Gradient boosting
Machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function
Ensemble
Machine learning method that aggregates multiple models into one model
Bagging; bootstrap aggregating
Ensemble method that uses sampling with replacement to build a separate training data set for each of multiple models, then aggregates their predictions; intended to decrease variance
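The two halves of bagging, bootstrap sampling and aggregation, can be sketched as follows. The helper names are hypothetical, and the "models" here are stand-in functions rather than trained learners.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_predict(models, x):
    """Majority vote over the predictions of already-trained models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)  # seeded only to make the sketch repeatable
data = [1, 2, 3, 4, 5]
# One resampled training set per model in the ensemble.
samples = [bootstrap_sample(data, rng) for _ in range(3)]

# Stand-in "models": two vote "a", one votes "b".
models = [lambda x: "a", lambda x: "a", lambda x: "b"]
prediction = bagged_predict(models, x=None)
```

Each bootstrap sample has the same size as the original data but usually repeats some items and omits others, which is what makes the individual models differ. For regression, the vote would be replaced by an average.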
Boosting
Ensemble method that uses weighted sampling based on errors made by previous models to build training data sets for future models; intended to decrease bias
Stacking
Ensemble method that trains multiple different types of models on the full training data set, then uses their predictions as features to train a meta-model; intended to increase predictive performance