Data Mining/ ML Methods Flashcards

Question 1

Q

Discuss the general approach to classification

Answer

A

Classification assigns an item to one of a set of known categories based on its features. One general approach: locate the item that needs classification in feature space, compare it to the labeled items closest to it, and assign it the group most common among those neighbors. This method is called k-nearest neighbors (KNN). Classification is also used for object detection, spam filtering, cancer diagnosis, etc.
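
The nearest-neighbor idea can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the function name `knn_classify`, the toy points, and the "ham"/"spam" labels are all made up for the example.

```python
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Assign `query` the majority label among its k nearest labeled points."""
    # Sort the labeled items by Euclidean distance to the query item.
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    # Let the k closest neighbors vote on the group.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (features, label)
training = [
    ((1.0, 1.0), "ham"),
    ((1.2, 0.8), "ham"),
    ((0.9, 1.1), "ham"),
    ((5.0, 5.0), "spam"),
    ((5.2, 4.8), "spam"),
    ((4.9, 5.1), "spam"),
]

print(knn_classify((1.1, 1.0), training))  # ham
print(knn_classify((5.1, 5.0), training))  # spam
```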

Question 2

Q

Clustering

Answer

A

The groupings are unknown in advance, and the analyst wants to determine whether objects fall into any natural groups. Clustering is unsupervised learning: the data set is unlabeled.

Question 3

Q

Bayes Theorem

Answer

A

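Relates the probability of a hypothesis given the observed data to the probability of observing that data given the hypothesis: P(H | D) = P(D | H) * P(H) / P(D). The term P(D | H) is the likelihood: basically the probability of getting the data that you found if the hypothesis is true.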
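
A small worked example may help. Bayes' theorem states P(H | D) = P(D | H) * P(H) / P(D). The numbers below (a 1% prior, a 90% chance of the data if the hypothesis is true, a 5% chance if it is false) are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return P(H | D) from P(H), P(D | H) and P(D | not H)."""
    # P(D) by the law of total probability.
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Assumed inputs: P(H) = 0.01, P(D | H) = 0.90, P(D | not H) = 0.05
p = posterior(prior=0.01, likelihood=0.90, likelihood_if_false=0.05)
print(round(p, 3))  # 0.154
```

Even with a 90% likelihood, the posterior stays near 15% because the hypothesis was rare to begin with.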

Question 4

Q

Naive Bayes

Answer

A

Estimates the conditional probability of an outcome. Naive Bayes is an algorithm that applies Bayes' theorem, with the "naive" assumption that the features are independent of one another given the class. A Naive Bayes classifier is an ML model used to classify an object based on its different features.
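
As a rough sketch, a Naive Bayes classifier for categorical features multiplies the class prior by one conditional probability per feature, treating the features as independent given the class. The function names, the two yes/no features, and the add-one (Laplace) smoothing are choices made for this example only.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen

def nb_predict(model, row):
    """Return the class with the highest prior * product of likelihoods."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one (Laplace) smoothing.
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical features: (contains the word "free", contains a link)
rows = [("yes", "yes"), ("yes", "yes"), ("no", "no"),
        ("no", "yes"), ("yes", "no"), ("no", "no")]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

model = train_naive_bayes(rows, labels)
print(nb_predict(model, ("yes", "yes")))  # spam
print(nb_predict(model, ("no", "no")))    # ham
```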

Question 5

Q

PCA (principal component analysis)

Answer

A

This is an attempt to find out whether the variables themselves group in any meaningful way. It is a data reduction method used to reduce the dimensionality of large data sets. This is done by transforming a large set of variables into a smaller set that still contains most of the information in the large set.
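
To make the idea concrete, here is a minimal sketch that reduces two variables to one. It finds the first principal component of the covariance matrix by power iteration (a simple stand-in for a full eigendecomposition) and projects each row onto it. The function names and the toy data, which lie exactly on the line y = 2x, are assumptions for the example.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, direction):
    """Replace each row by its single coordinate along `direction`."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # hypothetical data
mean, direction = first_principal_component(data)
scores = project(data, mean, direction)
print(direction)  # roughly (0.447, 0.894), the direction of the line y = 2x
print(scores)     # one number per row instead of two
```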

Question 6

Q

Dimensionality reduction

Answer

A

Reduces the number of variables (features), and with it the amount of data. PCA is one technique for this.

Question 7

Q

Data reduction

Answer

A

Reducing the volume of data in storage or in a database. The goal is to optimize storage capacity.

Question 8

Q

Hierarchical clustering

Answer

A

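Algorithm that groups similar objects into clusters and arranges those clusters in a hierarchy, either by repeatedly merging the closest clusters (agglomerative) or by repeatedly splitting larger ones (divisive).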
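
One common variant, agglomerative clustering, can be sketched as follows: start with every object in its own cluster and repeatedly merge the two closest clusters. The single-linkage distance, the function name `agglomerative`, and the toy points are assumptions chosen for this example.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # record of (cluster, cluster, distance)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical data
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

The `merges` list records the order in which clusters were joined, which is the information a dendrogram displays.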

Question 9

Q

Anomaly detection

Answer

A

Identifies rare items that differ markedly from the rest of the data. Can be used to detect fraud. Can be done in R or Tableau, for example with a local outlier factor (LOF) or Alfa function.
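
For intuition only, here is a compact Python sketch of the local outlier factor (LOF) score; it is not the R or Tableau workflow. A score near 1 means a point is about as dense as its neighbors, and a score well above 1 suggests an outlier. The function name, k = 2, and the toy points are assumptions, and the sketch assumes all points are distinct.

```python
import math

def local_outlier_factor(points, k=2):
    """Return one LOF score per point (assumes distinct points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_density(i):
        # Reachability distance to neighbor j is max(k_distance[j], dist[i][j]).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    density = [local_density(i) for i in range(n)]
    # LOF: average neighbor density divided by the point's own density.
    return [sum(density[j] for j in neighbors[i]) / (k * density[i])
            for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]  # last point is far away
lof_scores = local_outlier_factor(points, k=2)
print(lof_scores)  # the last score is much larger than the others
```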

Question 10

Q

Neural networks

Answer

A

Algorithm loosely modeled on the operation of the human brain: layers of connected nodes ("neurons") that learn to recognize relationships in data sets.

Question 11

Q

Deep learning

Answer

A

Machine learning that uses neural networks with many layers, capable of tasks such as text classification. A recurrent neural network (RNN) is one type of deep learning model and works best on sequential data.

Question 12

Q

Decision Trees

Answer

A

Tree-like model of alternative decisions and their consequences. It is a sequence of binary decisions based on your data that combine to predict an outcome by branching from one decision to the next.
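
A decision tree can be pictured as nested yes/no questions. In the sketch below the tree is written by hand rather than learned from data, and the features ("income", "debt"), thresholds, and labels are invented for illustration.

```python
# Hypothetical tree: each internal node asks one binary question.
tree = {
    "feature": "income", "threshold": 50000,
    "left": {"label": "deny"},              # income <= 50000
    "right": {                              # income > 50000
        "feature": "debt", "threshold": 10000,
        "left": {"label": "approve"},       # debt <= 10000
        "right": {"label": "deny"},         # debt > 10000
    },
}

def tree_predict(node, item):
    """Follow one branch per decision until a leaf label is reached."""
    while "label" not in node:
        go_left = item[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(tree_predict(tree, {"income": 80000, "debt": 5000}))   # approve
print(tree_predict(tree, {"income": 80000, "debt": 20000}))  # deny
print(tree_predict(tree, {"income": 40000, "debt": 0}))      # deny
```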

Question 13

Q

Optimization Analysis

Answer

A

Finding the best value for one or more target variables given certain constraints. It shows what value a variable should take under those conditions.
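
A tiny example: choose how many units of two products to make so that profit is as large as possible without exceeding the available hours and materials. All numbers and the function name `best_plan` are hypothetical, and the brute-force search is only a sketch; real problems would normally use a dedicated solver.

```python
def best_plan(max_units=20):
    """Search whole-number plans (a, b) and keep the most profitable feasible one."""
    best = None
    for a in range(max_units + 1):
        for b in range(max_units + 1):
            within_hours = 2 * a + 4 * b <= 40   # assumed labor constraint
            within_materials = a + b <= 15       # assumed materials constraint
            if within_hours and within_materials:
                profit = 30 * a + 50 * b         # target variable to maximize
                if best is None or profit > best[0]:
                    best = (profit, a, b)
    return best

print(best_plan())  # (550, 10, 5): make 10 of product A and 5 of product B
```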

Question 14

Q

Supervised model versus unsupervised

Answer

A

Supervised: an ML algorithm that learns from a labeled data set. Examples are classification and regression.

Unsupervised: an ML algorithm that tries to find patterns in unlabeled data. Examples are clustering and anomaly detection. Neural networks can be used in either setting.

(14 cards)