LU1- Recap Flashcards by Karmen Fabel

What are the four types of Data Analytics?

Descriptive
Diagnostic
Predictive
Prescriptive

Name 5 statistical techniques for Data Analysis

Linear regression
Classification
Resampling methods
Tree-based methods
Unsupervised learning

What is Linear Regression?

Linear regression is a technique for predicting a target variable by finding the best linear relationship between the dependent and independent variables. The best fit is the line for which the sum of the distances (typically the squared distances) between the fitted line and the actual observations at each data point is as small as possible.
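
As a rough illustration of the "best fit" idea, the sketch below fits a line to one feature by ordinary least squares, assuming "distance" means squared vertical distance. The function name `fit_line` and the experience/salary figures are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); this minimises the squared distances
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of experience vs. salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

slope, intercept = fit_line(years, salary)
predicted = slope * 6 + intercept  # prediction for 6 years of experience
```

The independent variable here is `years` and the dependent (target) variable is `salary`.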

What is classification?

Classification assigns specific categories to a collection of data in order to make more specific predictions and analyses.

Name two types of classification techniques

Logistic Regression
Discriminant Analysis

What is Logistic Regression?

A regression analysis technique used when the dependent variable is binary. It is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables (for example, nominal ones).

Name two resampling techniques

Bootstrapping
Cross-Validation

What is bootstrapping?

It works by sampling with replacement from the original data and uses the data points that were "not selected" as test samples.
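
A minimal sketch of one bootstrap round, using only the standard library. The helper name `bootstrap_split` and the fixed seed are assumptions made for the example.

```python
import random


def bootstrap_split(data, seed=0):
    """Draw len(data) items with replacement; unpicked items form the test sample."""
    rng = random.Random(seed)
    n = len(data)
    # Sample n indices with replacement, so some indices repeat and some never appear
    picked = [rng.randrange(n) for _ in range(n)]
    picked_set = set(picked)
    train = [data[i] for i in picked]
    # The "not selected" points are held out as test samples
    test = [data[i] for i in range(n) if i not in picked_set]
    return train, test


train, test = bootstrap_split(list(range(10)))
```

In practice this is repeated many times with different random draws, and the results are aggregated.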

What is Cross-Validation?

This technique is used to validate model performance. The training data is divided into K parts. In each round, K-1 parts are used for training and the remaining part acts as the test set. The process is repeated K times, so that each part serves as the test set once, and the average of the K scores is taken as the performance estimate.
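
The K-fold procedure can be sketched as follows. The names `k_fold_splits` and `cross_validate` are hypothetical, and `score_fn` stands in for whatever trains a model on the training part and scores it on the test part.

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs; each of the k parts is the test set exactly once."""
    n = len(data)
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = data[start:start + size]
        train = data[:start] + data[start + size:]
        yield train, test
        start += size


def cross_validate(data, k, score_fn):
    """Average the k scores returned by score_fn(train, test)."""
    scores = [score_fn(train, test) for train, test in k_fold_splits(data, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(list(range(10)), 5))
```

Shuffling the data before splitting is common but is left out here to keep the example deterministic.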

When does undersampling take place?

When samples are removed from the majority class so that its size is closer to that of the minority class

When does oversampling take place?

When samples from the minority class are copied so that its size is closer to that of the majority class
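
A small sketch of random oversampling and random undersampling for an imbalanced two-class dataset. The function names and the 90/10 class sizes are assumptions for illustration only.

```python
import random


def random_oversample(minority, target_size, seed=0):
    """Grow the minority class by copying randomly chosen minority samples."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def random_undersample(majority, target_size, seed=0):
    """Shrink the majority class by keeping a random subset, without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)


# Hypothetical imbalanced data: 90 majority samples, 10 minority samples
majority = list(range(90))
minority = list(range(100, 110))

oversampled = random_oversample(minority, len(majority))
undersampled = random_undersample(majority, len(minority))
```

Either approach ends with two classes of equal size: oversampling keeps all the data but repeats minority samples, while undersampling discards majority samples.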

Name 3 unsupervised learning algorithms

Principal Component Analysis
K-Means Clustering
Hierarchical Clustering

What is Principal Component Analysis?

Identifying a set of mutually uncorrelated linear combinations of the features that have maximum variance. It also helps to reveal latent interactions among the variables in an unsupervised framework.

What is Machine Learning?

Machine Learning is the use of mathematical and/or statistical models to gain knowledge from data in order to make predictions.

Name an unsupervised machine-learning technique

Clustering

What are Latent variable models?

Latent variable models are commonly used for data preprocessing, such as reducing the number of features in a dataset (dimensionality reduction) or decomposing the dataset into multiple components.

What is clustering?

A clustering problem is where you want to discover the inherent groupings in the
data, such as grouping customers by purchasing behavior.

What type of learning (supervised/unsupervised) makes use of clustering?

Unsupervised

What type of learning (supervised/unsupervised) makes use of classification?

Supervised

What type of learning (supervised/unsupervised) makes use of regression?

Supervised

What type of machine learning makes use of labelled input and output data during the training phase of the machine learning lifecycle?

Supervised learning

To be able to classify new and unseen datasets and predict outcomes, what does a supervised learning model need to learn?

The relationship between input and output data

Which type of machine learning has input variables (X) and an output variable (Y)?

Supervised learning

Why do we call it supervised learning?

Because part of the approach requires human oversight

What is classification?

Classification is used when the output variable is categorical

Give an example of categorical data

Yes/no, male/female, or true/false

What is regression?

Regression is used when the output variable is a real or continuous value

Give an example of regression variables

Salary based on work experience, or weight based on height

Give an example of an algorithm used for regression problems

Linear regression, support vector regression, or regression tree

Give an example of a classification problem

The machine needs to tell different items apart (e.g. apple, banana, and cherry)

What type of learning does not make use of output variables?

Unsupervised

Unsupervised learning makes use of output variables (true or false)?

False

What do you call unsupervised learning output?

Pseudo output

What is anomaly detection?

It is when machine learning automatically detects unusual data points in a dataset
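
One simple way to flag unusual data points is a z-score rule, sketched below. This is only one of many possible approaches. The threshold of 2 standard deviations and the sample values are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing is unusual
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical readings with one obviously unusual point
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = z_score_outliers(readings)
```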

What is association mining?

It identifies sets of items that frequently occur together in a dataset
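
As a minimal sketch, the code below counts how often each pair of items appears in the same transaction and keeps the pairs that meet a minimum count. The name `frequent_pairs`, the `min_support` parameter, and the shopping baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Count item pairs that co-occur and keep those seen at least min_support times."""
    counts = Counter()
    for basket in transactions:
        # Sort and deduplicate so each pair has one canonical form per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: count for pair, count in counts.items() if count >= min_support}


# Hypothetical shopping baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=3)
```

Here only bread and milk appear together in at least three baskets, so that is the only pair returned.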

What are latent variable models?

Commonly used for data preprocessing such as reducing the number of features in a dataset

Give a real-world example of unsupervised learning

Computer vision (object recognition)
Medical imaging
Anomaly detection
Customer personas (habits)
Recommendation engines

What is the difference between unsupervised and supervised learning?

Unsupervised:
No output data is given
Data is not labeled
Computationally complex
Less accurate and trustworthy
Number of classes is not known

What is accuracy with regard to supervised learning?

The ability of a model to make correct predictions
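
For classification, accuracy is commonly measured as the fraction of predictions that match the true labels. A minimal sketch, with made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the corresponding true label."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


# Hypothetical labels: three of the four predictions are correct
score = accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"])
```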

What is interpretability with regard to supervised learning?

The degree to which the model allows for human understanding

Give an example of an interpretable model

Linear regression
Random forest

Give an example of a non-interpretable model

SVM (Support Vector Machine)
LSTM (Long Short-Term Memory)
Deep learning (DL)

What is K-means clustering?

A method of vector quantization that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean
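
The partitioning can be sketched as a short loop that alternates between assigning points to the nearest mean and recomputing the means. To keep the example deterministic, the starting centroids are passed in by the caller, which is an assumption. Real implementations usually choose them at random. The function name and sample points are also made up.

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means loop; `centroids` holds the k starting centres."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to the cluster with the nearest mean
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups, with k = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = k_means(points, centroids=[(1, 1), (9, 8)])
```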

What type of learning category does K-means clustering fall into?

Unsupervised

Why use K-means clustering?

K-means is used to organise unlabelled data by grouping it by features rather than by predefined categories. The goal is to split the data into k different clusters and report the location of the centre of mass for each cluster.

What does the K represent in K-means clustering?

The K represents the number of groups or categories created.

What is hierarchical clustering?

An algorithm that creates clusters with a predominant ordering from top to bottom

What does hierarchical clustering do?

Hierarchical clustering separates data into groups based on some measure of similarity

What is agglomerative hierarchical clustering?

Agglomerative (“bottom-up”) clustering starts with each observation being its own cluster. Clusters merge into larger groups as we move up the tree.
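
A minimal bottom-up sketch on one-dimensional data. It assumes single linkage (the distance between two clusters is the distance between their closest members) and stops once a requested number of clusters remains. Both choices, and the function name, are assumptions made for the example.

```python
def agglomerative(points, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain."""
    # Start with each observation as its own cluster
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i, moving one level up the tree
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


three_groups = agglomerative([1, 2, 9, 10, 25], num_clusters=3)
two_groups = agglomerative([1, 2, 9, 10, 25], num_clusters=2)
```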

What is divisive clustering?

Divisive (“top-down”) clustering starts with one cluster of all observations. The cluster is split into subgroups as we move down the tree.

(50 cards)