CAP Predictive Analytics Flashcards

Question 1

Q

Predictive analytics

Answer

A

Statistical techniques used to make predictions about future or otherwise unknown events

Question 2

Q

Data science

Answer

A

Field that applies statistics, data analysis, machine learning, and data mining in order to understand and analyze phenomena

Question 3

Q

Big Data

Answer

A

Data sets so large or complex that traditional data processing applications are inadequate

Question 4

Q

Machine learning

Answer

A

Study of algorithms and statistical models that learn to perform a specific task from patterns in data rather than from explicit instructions

Question 5

Q

Data mining

Answer

A

The process of discovering patterns in large data sets

Question 6

Q

Deep learning

Answer

A

A family of machine learning methods based on learning data representations as opposed to task-specific algorithms

Question 7

Q

Linear regression

Answer

A

Approach to modeling the relationship between a dependent variable and one or more independent variables in a linear fashion
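As a minimal sketch of the single-variable case, the fit below uses ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions, not part of the card.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```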

Question 8

Q

Support Vector Machine

Answer

A

A classifier that attempts to find the maximum-margin separating hyperplane between samples in two classes

Question 9

Q

K-nearest neighbor (K-NN)

Answer

A

A classifier that assigns each new sample the majority classification of its k nearest training samples in feature space
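The majority vote can be sketched in a few lines. This assumes Euclidean distance and a small in-memory training set; `knn_predict` and the toy data are hypothetical names chosen for illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; each point is a tuple of numbers."""
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Majority vote among the labels of the k closest samples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy groups labeled "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```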

Question 10

Q

Logistic regression

Answer

A

Approach to modeling the relationship between one or more independent variables and a binary dependent variable; uses a sigmoid (logistic) function to map a linear combination of the inputs to a probability
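A minimal sketch of the prediction step, assuming the weights and bias have already been fitted. The names `sigmoid` and `predict_proba` are illustrative choices here, not a specific library's API.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    # Linear combination of the inputs, then squashed by the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

A sample is typically assigned to the positive class when the returned probability exceeds a chosen threshold such as 0.5.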

Question 11

Q

Decision tree

Answer

A

A classifier that uses a root node, branches, and leaf nodes to indicate classification decisions on unknown samples; learning comes from choosing the split at each node that maximizes information gain or another metric
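Information gain can be sketched as the drop in Shannon entropy from a parent node to its children. The helper names below are assumptions made for illustration.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # children are pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children mirror the parent
```

A tree learner using this metric would evaluate each candidate split and keep the one with the highest gain.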

Question 12

Q

k-means clustering

Answer

A

Unsupervised learning method that groups elements into one of k different clusters based on which cluster centroid (mean) each element is closest to
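The alternating assign and update steps can be sketched as follows. This assumes Euclidean distance, caller-supplied starting centroids, and a fixed number of iterations; all function names are illustrative.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def update(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == i]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        centroids = update(points, assign(points, centroids), centroids)
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
```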

Question 13

Q

Random forest

Answer

A

Ensemble method that trains a set of decision trees, each on a bootstrap sample of the data using randomly selected subsets of features, and then uses majority voting for classification of samples

Question 14

Q

Naive Bayes Classifier

Answer

A

Probabilistic classifier that applies Bayes’ theorem with strong independence assumptions about features

Question 15

Q

Neural network

Answer

A

A model that uses layers of interconnected artificial neurons/perceptrons to learn patterns in data

Question 16

Q

AdaBoost

Answer

A

Boosting algorithm that trains a series of weak learners on data and focuses later learners on data misclassified by earlier learners
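How later learners come to focus on misclassified data can be sketched with one round of sample reweighting. This follows the standard discrete AdaBoost update and assumes the weak learner's weighted error is strictly between 0 and 1; `adaboost_round` is a hypothetical helper name.

```python
import math


def adaboost_round(weights, correct):
    """One round of discrete AdaBoost sample reweighting.

    weights: current sample weights, summing to 1
    correct: booleans, True where the weak learner classified the sample correctly
    """
    # Weighted error of the weak learner on the current distribution.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Learner's vote strength; assumes 0 < error < 1.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correctly classified samples, grow the misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four equally weighted samples; the weak learner misses only the last one.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified samples together carry half of the total weight, so the next weak learner cannot do well by ignoring them.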

Question 17

Q

Kernel factory

Answer

A

An ensemble method that uses kernel machines as classifiers and a genetic algorithm to tune the weights of each machine’s vote in classification

Question 18

Q

Rotation Forest

Answer

A

An ensemble method that uses decision trees as classifiers and principal components of randomly selected variables from the training data set as features

Question 19

Q

Dimension reduction

Answer

A

Methods that reduce the number of features used as inputs to a classifier or regressor

Question 20

Q

Gradient boosting

Answer

A

Machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function

Question 21

Q

Ensemble

Answer

A

Machine learning method that aggregates multiple models into one model

Question 22

Q

Bagging; bootstrap aggregating

Answer

A

Ensemble method that uses sampling with replacement to build a separate training data set for each of multiple models, then aggregates their predictions; intended to decrease variance
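A minimal sketch of the two halves of bagging: drawing bootstrap samples and aggregating by majority vote. The `train` argument is a hypothetical caller-supplied function that takes a data set and returns a callable model; it is an assumption for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagging_fit(data, train, n_models, seed=0):
    """Train n_models models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagged_predict(models, x):
    """Majority vote over the predictions of the individual models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

For regression, the vote would typically be replaced by an average of the model outputs.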

Question 23

Q

Boosting

Answer

A

Ensemble method that uses weighted sampling based on errors made by previous models to build training data sets for future models; intended to decrease bias

Question 24

Q

Stacking

Answer

A

Ensemble method that trains multiple different types of models on the full training data set, then uses their predictions as features to train a meta-model; intended to increase predictive performance

(24 cards)