CAP Predictive Analytics Flashcards

1
Q

Predictive analytics

A

Statistical techniques used to make predictions about future or otherwise unknown events

2
Q

Data science

A

Field that applies statistics, data analysis, machine learning, and data mining in order to understand and analyze phenomena

3
Q

Big Data

A

Data sets so large or complex that traditional data processing applications are inadequate

4
Q

Machine learning

A

Study of algorithms and statistical models that learn to perform a specific task from patterns in data rather than from explicit instructions

5
Q

Data mining

A

The process of discovering patterns in large data sets

6
Q

Deep learning

A

A family of machine learning methods based on learning data representations, typically with multi-layer (deep) neural networks, as opposed to task-specific algorithms

7
Q

Linear regression

A

Approach to modeling the relationship between a dependent variable and one or more independent variables as a linear function, usually fit by least squares
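A minimal sketch of the one-variable case, fit by ordinary least squares (the card itself does not name a fitting method, so least squares is an assumption here). The function name `fit_simple_linear_regression` is hypothetical. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one independent variable: y ≈ b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points on the line y = 1 + 2x are recovered exactly
b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```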

8
Q

Support Vector Machine

A

A classifier that finds the maximum-margin hyperplane separating samples of two classes
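A minimal sketch, assuming a linear kernel and the soft-margin formulation: it minimizes `lam/2 * ||w||^2` plus the mean hinge loss by full-batch sub-gradient descent, which approximates the maximum-margin hyperplane rather than solving the exact quadratic program. The function names and the `lam`, `lr`, and `epochs` values are hypothetical choices for illustration; labels must be +1 or -1.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM via sub-gradient descent.
    Objective: lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), with y in {+1, -1}."""
    d, n = len(X[0]), len(X)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, x):
    # The sign of the decision function picks the side of the hyperplane
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict_svm(w, b, [2.5, 2.5]), predict_svm(w, b, [-2.5, -2.5]))
```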

9
Q

k-nearest neighbors (k-NN)

A

A classifier that assigns each new sample the majority class among its k nearest training samples in feature space
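A minimal sketch, assuming Euclidean distance and a plain majority vote (real implementations may weight votes by distance or use other metrics). The name `knn_predict` is hypothetical. Note that k-NN has no training step: the "model" is just the stored training set.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples (Euclidean distance)."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Counter orders equal counts by first appearance, so a tie favours
    # the label held by the closer neighbour
    return Counter(nearest_labels).most_common(1)[0][0]

train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, [0.5, 0.5]), knn_predict(train_X, train_y, [5.5, 5.2]))
```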

10
Q

Logistic regression

A

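Approach to modeling the relationship between one or more independent variables (continuous or categorical) and a binary dependent variable; it models the log-odds of the positive class as a linear function of the inputs and uses the sigmoid (logistic) function to turn that into a probability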
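A minimal sketch, assuming the model is fit by batch gradient descent on the log-loss (other solvers such as Newton's method are common). The function names and the `lr` and `epochs` values are hypothetical. The sigmoid maps the linear score `w.x + b` to a probability, and a sample is assigned class 1 when that probability is at least 0.5.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Probability that the label is 1
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log-loss; y must contain 0/1 labels."""
    d, n = len(X[0]), len(X)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print([1 if predict_proba(w, b, xi) >= 0.5 else 0 for xi in X])
```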

11
Q

Decision tree

A

A classifier that uses a root node, internal decision nodes, branches, and leaf nodes to indicate classification decisions on unknown samples; learning comes from choosing, at each node, the split that maximizes information gain (or optimizes another metric such as Gini impurity)
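The split criterion can be sketched as follows, assuming entropy-based information gain on one numeric feature with candidate thresholds of the form `x <= t`. The helper names are hypothetical; a full tree learner would apply `best_threshold` recursively to each child until the leaves are pure or a depth limit is reached.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the child nodes
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

def best_threshold(xs, ys):
    """Try x <= t for every observed value t; return (t, gain) with the largest gain."""
    best = (None, -1.0)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send samples both ways
        gain = information_gain(ys, [left, right])
        if gain > best[1]:
            best = (t, gain)
    return best

threshold, gain = best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])
print(threshold, gain)
```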

12
Q

k-means clustering

A

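Unsupervised learning method that groups elements into one of k clusters by assigning each element to the cluster with the nearest centroid (mean), then recomputing the centroids and repeating until the assignments stop changing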
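A minimal sketch of the standard assign-then-update loop (Lloyd's algorithm), assuming Euclidean distance and initial centroids drawn at random from the input points. The function name, `iterations`, and `seed` parameters are hypothetical; results can depend on initialization, which is why libraries typically run several restarts.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of coordinate tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as starting centroids
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```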

13
Q

Random forest

A

Ensemble method that trains a set of decision trees, each on a bootstrap sample of the training data and considering a random subset of features at each split, and then uses majority voting to classify samples (averaging for regression)
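A deliberately tiny sketch, assuming each "tree" is a depth-1 stump scored by Gini impurity so the example stays short. Because a stump has only one split, drawing the random feature subset once per tree is the same as drawing it per split here; deeper trees would redraw it at every node. All function names and the `n_trees`, `max_features`, and `seed` values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, feature_ids):
    """Depth-1 tree: best single split over the allowed features by weighted Gini."""
    best_score, best_split = gini(labels), None  # keep no split unless impurity drops
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score, best_split = score, (f, t, majority(left), majority(right))
    if best_split is None:
        fallback = majority(labels)
        return lambda row: fallback
    f, t, left_label, right_label = best_split
    return lambda row: left_label if row[f] <= t else right_label

def train_random_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample (with replacement)
        features = rng.sample(range(d), max_features)  # random subset of features
        trees.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return trees

def forest_predict(trees, row):
    # Majority vote over the individual trees
    return majority([tree(row) for tree in trees])

rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["a"] * 4 + ["b"] * 4
forest = train_random_forest(rows, labels)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (10, 10)))
```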

14
Q

Naive Bayes Classifier

A

Probabilistic classifier that applies Bayes’ theorem with the strong ("naive") assumption that features are independent of each other given the class
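A minimal sketch for categorical features, assuming Laplace (add-one) smoothing so unseen feature values do not zero out a class. The independence assumption is what lets the score be a plain sum of per-feature log-probabilities. The function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # value_counts[(feature_index, class)][value]
    feature_values = defaultdict(set)    # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row, alpha=1.0):
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum over features of log P(x_j | class)
        score = math.log(count / total)
        for j, value in enumerate(row):
            numerator = value_counts[(j, label)][value] + alpha  # Laplace smoothing
            denominator = count + alpha * len(feature_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")), predict_naive_bayes(model, ("rainy", "cool")))
```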

15
Q

Neural network

A

A model that uses layers of interconnected artificial neurons (e.g., perceptrons) to learn patterns in data; used for both classification and regression
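The building block of such a network is a single artificial neuron. The sketch below trains one perceptron (a step-activation neuron) with the classic perceptron learning rule on the logical AND function; it is an illustration of one unit, not of a multi-layer network trained by backpropagation. The function names and the `epochs` and `lr` values are hypothetical.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Single neuron with a step activation; labels are 0 or 1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            error = yi - pred
            # Perceptron rule: move the weights toward the correct output (no change if correct)
            w = [wj + lr * error * xj for wj, xj in zip(w, xi)]
            b += lr * error
    return w, b

def perceptron_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # logical AND is linearly separable, so the rule converges
w, b = train_perceptron(X, y)
print([perceptron_predict(w, b, xi) for xi in X])
```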

16
Q

AdaBoost

A

Boosting algorithm that trains a series of weak learners and, after each round, increases the weights of samples misclassified so far so that later learners focus on them; the final prediction is a weighted vote of all learners
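A minimal sketch of discrete AdaBoost, assuming one numeric feature, labels of +1/-1, and decision stumps as the weak learners. Each learner's vote weight `alpha` grows as its weighted error shrinks, and the sample weights are multiplied by `exp(-alpha * y * h(x))`, which raises the weight of misclassified samples. The function names and the toy data are hypothetical; no single stump can fit this labeling, but three boosted stumps can.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump; labels are +1/-1."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            # Predict `polarity` when x <= t, otherwise -polarity
            preds = [polarity if x <= t else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def stump_predict(t, polarity, x):
    return polarity if x <= t else -polarity

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, polarity = train_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(t, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    # Weighted vote of all stumps
    score = sum(alpha * stump_predict(t, polarity, x) for alpha, t, polarity in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([adaboost_predict(model, x) for x in xs])
```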

17
Q

Kernel factory

A

An ensemble method that uses kernel machines as classifiers and a genetic algorithm to tune the weights of each machine’s vote in classification

18
Q

Rotation Forest

A

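An ensemble method that uses decision trees as classifiers; for each tree, the features are randomly split into subsets, principal component analysis is applied to each subset of the training data, and the resulting principal components are used as that tree's features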

19
Q

Dimension reduction

A

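Methods that reduce the number of features used as inputs to a classifier or regressor, either by selecting a subset of the original features or by transforming them into fewer new features (e.g., principal component analysis)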

20
Q

Gradient boosting

A

Machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion, fitting each new model to the negative gradient of the loss; it generalizes those methods by allowing optimization of an arbitrary differentiable loss function
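A minimal sketch, assuming squared-error loss and regression stumps on one numeric feature. For the loss `(y - F)^2 / 2` the negative gradient with respect to the current prediction `F` is simply the residual `y - F`, so each stage fits a stump to the residuals and adds a shrunken copy (scaled by `learning_rate`) to the ensemble. All names and the `n_stages` and `learning_rate` values are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Depth-1 regression tree: threshold minimizing squared error, predicting each side's mean."""
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)

def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # stage 0: a constant model
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # Negative gradient of the squared loss = residuals of the current ensemble
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_regression_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def gb_predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(gb_predict(model, 1), 2), round(gb_predict(model, 6), 2))
```

A smaller `learning_rate` needs more stages but usually generalizes better; that trade-off is the main tuning knob in practice.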

21
Q

Ensemble

A

Machine learning method that aggregates multiple models into one model

22
Q

Bagging (bootstrap aggregating)

A

Ensemble method that uses sampling with replacement (bootstrap samples) to build a separate training data set for each of multiple models, then aggregates their predictions by voting or averaging; intended to decrease variance
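A minimal sketch of the two bagging steps, bootstrap sampling and aggregation by majority vote. The base learner here is a hypothetical 1-nearest-neighbour rule on a single numeric feature, chosen because it is high-variance, which is the case bagging is meant to help. All names and the `n_models` and `seed` values are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random *with replacement*."""
    return [rng.choice(data) for _ in range(len(data))]

def train_one_nn(sample):
    # Hypothetical base learner: 1-nearest-neighbour on (x, label) pairs
    return lambda x: min(sample, key=lambda pair: abs(pair[0] - x))[1]

def train_bagged_ensemble(data, train_fn, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each model sees its own bootstrap sample of the training data
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagged_predict(models, x):
    # Aggregate by majority vote (average instead for regression)
    return Counter(model(x) for model in models).most_common(1)[0][0]

data = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
models = train_bagged_ensemble(data, train_one_nn)
print(bagged_predict(models, 1.5), bagged_predict(models, 8.5))
```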

23
Q

Boosting

A

Ensemble method that trains models sequentially, using weighted sampling (or sample weights) based on errors made by previous models to build training data sets for later models; intended to decrease bias

24
Q

Stacking

A

Ensemble method that trains multiple different types of models on the full training data set, then uses their predictions (usually out-of-fold predictions, to avoid leakage) as features to train a meta-model; intended to increase predictive performance