LU1 Recap Flashcards

1
Q

What are the four types of Data Analytics?

A

Descriptive
Diagnostic
Predictive
Prescriptive

2
Q

Name 5 statistical techniques for Data Analysis

A

Linear regression
Classification
Resampling methods
Tree based methods
Unsupervised learning

3
Q

What is Linear Regression?

A

Linear Regression is a technique used to predict a target (dependent) variable by finding the best linear relationship between the dependent and independent variables. "Best fit" means the line for which the sum of the squared distances between the line and the actual observations at each data point is as small as possible.
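Here is a minimal sketch of the "best fit" idea for a single independent variable, in plain Python. The closed-form ordinary least squares (OLS) formulas give the slope and intercept that minimise the sum of squared residuals. The function names and the salary data are hypothetical and only illustrate the idea.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    The closed-form solution minimises the sum of squared vertical
    distances (residuals) between the line and the observed points.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style numerator and variance-style denominator
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that the OLS fit makes as small as possible."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: salary (in thousands) against years of experience
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]
b0, b1 = fit_simple_ols(experience, salary)
print(f"salary ≈ {b0:.2f} + {b1:.2f} * experience")
```

Moving the fitted line in any direction can only increase `sum_squared_residuals`, which is what "best fit" means here.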

4
Q

What is classification?

A

Classification assigns data points to specific categories (classes), which allows more specific predictions and analysis.

5
Q

Name two types of classification techniques

A

Logistic Regression
Discriminant Analysis

6
Q

What is Logistic Regression?

A

A regression analysis technique used when the dependent variable is binary. It is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables, which may be nominal, ordinal, interval or ratio-level.

7
Q

Name two resampling techniques

A

Bootstrapping
Cross-Validation

8
Q

What is bootstrapping?

A

It samples with replacement from the actual data to form a training set, and treats the “not selected” (out-of-bag) data points as test samples.
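Here is a minimal sketch of one bootstrap draw in plain Python, assuming the data fit in a list. Indices are sampled with replacement, and the indices never drawn form the out-of-bag test set. On average about 36.8% of the points (roughly 1/e for large n) end up out-of-bag. The function name and the data are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw one bootstrap sample of size n by sampling indices with replacement.

    Returns (in_bag, out_of_bag). in_bag may contain repeats. out_of_bag holds
    the indices that were never selected, which can serve as a test set.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    selected = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in selected]
    return in_bag, out_of_bag


rng = random.Random(42)  # fixed seed so the example is reproducible
data = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
in_bag, oob = bootstrap_split(len(data), rng)
train = [data[i] for i in in_bag]
test = [data[i] for i in oob]
print("train:", train)
print("test (out-of-bag):", test)
```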

9
Q

What is Cross-Validation?

A

This technique is used to validate model performance. The training data is divided into K parts (folds). In each round, K-1 parts are used for training and the remaining part acts as the test set. The process is repeated K times, so that each part serves as the test set once, and the average of the K scores is taken as the performance estimate.
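Here is a sketch of the K-fold procedure. It assumes the data are already in random order; in practice they are usually shuffled first. The `fit` and `score` arguments stand in for any model. The mean-predicting baseline and the mean absolute error used in the usage lines are hypothetical placeholders.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread any remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_score(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, repeat k times.

    fit(train_x, train_y) returns a model; score(model, test_x, test_y)
    returns a number. Returns the k scores and their average.
    """
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        scores.append(score(model, test_x, test_y))
    return scores, sum(scores) / k


def fit_mean(train_x, train_y):
    """Hypothetical baseline model: always predict the training mean."""
    return sum(train_y) / len(train_y)


def mean_abs_error(model, test_x, test_y):
    """Average absolute difference between prediction and truth."""
    return sum(abs(y - model) for y in test_y) / len(test_y)


xs = list(range(10))
ys = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9]
scores, average = cross_val_score(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
print(scores, average)
```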

10
Q

When does Undersampling take place?

A

When samples from the majority class are removed (discarded), so that the classes become more balanced

11
Q

When does oversampling take place?

A

When samples from the minority class are copied (duplicated), so that the classes become more balanced
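Here is a minimal sketch of random undersampling and random oversampling on a list of class labels. It assumes the returned indices are then used to select rows for training. Both functions are illustrative and are not a specific library's API. Resampling is normally applied only to the training data, never to the test set.

```python
import random
from collections import Counter


def group_by_class(labels):
    """Map each class label to the list of indices carrying that label."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    return by_class


def random_undersample(labels, rng):
    """Keep a random subset of each larger class so every class matches the smallest."""
    by_class = group_by_class(labels)
    target = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, target))  # surplus majority samples are dropped
    return sorted(kept)


def random_oversample(labels, rng):
    """Duplicate random samples of each smaller class until every class matches the largest."""
    by_class = group_by_class(labels)
    target = max(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        extra = [rng.choice(idx) for _ in range(target - len(idx))]
        kept.extend(idx + extra)  # originals plus copies
    return sorted(kept)


rng = random.Random(0)
labels = ["ok"] * 8 + ["fraud"] * 2  # hypothetical imbalanced labels
under = random_undersample(labels, rng)
over = random_oversample(labels, rng)
print(Counter(labels[i] for i in under))  # 2 of each class
print(Counter(labels[i] for i in over))   # 8 of each class
```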

12
Q

Name 3 Unsupervised learning algorithms

A

Principal Component Analysis
K-Means Clustering
Hierarchical Clustering

13
Q

What is Principal Component Analysis?

A

It identifies a set of mutually uncorrelated linear combinations of the features that capture the maximum variance. It also helps uncover latent relationships among the variables in an unsupervised framework.
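The sketch below estimates the first principal component with power iteration on the sample covariance matrix. It assumes small in-memory data and a unique largest eigenvalue. Real PCA implementations usually use an eigendecomposition or SVD and return every component. This version returns only the first direction and the share of total variance it explains. The data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component of a data set.

    rows: list of equal-length numeric rows (at least two rows, not all
    identical). Returns (direction, explained_ratio): a unit vector along the
    direction of maximum variance and the share of total variance it captures.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by cov and normalising converges
    # to the dominant eigenvector (start vector assumed not orthogonal to it)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T cov v = variance along the unit vector v
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, variance / total


# Hypothetical two-feature data lying close to the line y = x
rows = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.8], [5, 5.1]]
direction, ratio = first_principal_component(rows)
print(direction, ratio)
```

Because the two features are almost perfectly correlated, one component captures nearly all the variance. That is the basis for using PCA to reduce the number of features.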

14
Q

What is Machine Learning?

A

Machine Learning is the use of mathematical and/or statistical models to derive customized knowledge from data in order to make predictions.

15
Q

Name an unsupervised machine-learning technique

A

Clustering

16
Q

What are Latent variable models?

A

Latent variable models are commonly used for data preprocessing, such as reducing the number of features in a dataset (dimensionality reduction) or decomposing the dataset into multiple components.

17
Q

What is clustering?

A

A clustering problem is where you want to discover the inherent groupings in the
data, such as grouping customers by purchasing behavior.

18
Q

What type of learning (supervised/unsupervised) makes use of clustering?

A

Unsupervised

18
Q

What type of learning (supervised/unsupervised) makes use of classification?

A

Supervised

19
Q

What type of learning (supervised/unsupervised) makes use of regression?

A

Supervised

20
Q

Which type of machine learning uses labelled input and output data during the training phase of the machine learning lifecycle?

A

Supervised learning

21
Q

To classify new, unseen data and predict outcomes, what does a supervised learning model need to learn?

A

The relationship between the input and output data

22
Q

Which type of machine learning has input variables (X) and an output variable (Y)?

A

Supervised learning

23
Q

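Why do we call it supervised learning?

A

Because part of the approach requires human oversight: the model learns from training data labelled with the correct outputs, which act like a teacher supervising the learning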

24
Q

What is classification?

A

Classification is used when the output variable is categorical.

25
Q

Give an example of categorical data

A

Yes/no, male/female, or true/false

26
Q

What is regression?

A

Regression is used when the output variable is a real or continuous value.

27
Q

Give an example of a regression problem

A

Predicting salary based on work experience, or weight based on height

28
Q

Give an example of an algorithm used for a regression problem

A

Linear regression, support vector regression, or a regression tree

29
Q

Give an example of a classification problem

A

Teaching a model to tell different kinds of fruit apart (e.g. apple, banana, and cherry)

30
Q

What type of learning does not make use of output variables?

A

Unsupervised

31
Q

Unsupervised learning makes use of output variables (true or false)

A

False

32
Q

What do you call the output of unsupervised learning?

A

Pseudo output

33
Q

What is anomaly detection?

A

The use of machine learning to automatically detect unusual data points in a dataset.
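One of the simplest ways to flag unusual points is a z-score rule. It is used here only as an example, since the card does not name a technique. Values more than a chosen number of standard deviations from the mean are reported. The readings are hypothetical. A single extreme value also inflates the standard deviation, so a very small sample like this one needs a lower threshold. For that reason the usage line passes `threshold=2.0`.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return the values whose z-score (distance from the mean, measured in
    population standard deviations) exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily transaction counts with one suspicious spike
readings = [102, 98, 101, 99, 100, 97, 103, 250, 100, 101]
print(zscore_anomalies(readings, threshold=2.0))
```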

34
Q

What is association mining?

A

It identifies sets of items that frequently occur together in a dataset.
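Here is a brute-force sketch of the core idea. It counts how often each pair of items appears in the same transaction and keeps the pairs whose support (the share of transactions containing both items) reaches a minimum. Algorithms such as Apriori extend this idea to larger item sets more efficiently. The baskets and the 0.4 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for pairs with support >= min_support."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) ignores duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
print(frequent_pairs(baskets, min_support=0.4))
```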

35
Q

What are latent variable models?

A

They are commonly used for data preprocessing, such as reducing the number of features in a dataset.

36
Q

Give a real world example of unsupervised learning

A

Computer vision (object recognition)
Medical imaging
Anomaly detection
Customer personas (based on habits)
Recommendation engines

37
Q

What is the difference between unsupervised and supervised learning?

A

Unsupervised learning, compared with supervised learning:
No output data is given
Data is not labelled
Computationally more complex
Less accurate and trustworthy
Number of classes is not known

38
Q

What is accuracy with regard to supervised learning?

A

The ability of a model to make correct predictions.
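Accuracy is the fraction of predictions that match the true labels, as in this minimal sketch. The function name and the labels are illustrative. On imbalanced data, accuracy alone can be misleading, because always predicting the majority class may already score highly.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"]))  # 0.75
```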

39
Q

What is interpretability with regard to supervised learning?

A

The degree to which a model can be understood by humans.

40
Q

Give an example of an interpretable model

A

Linear regression
Decision tree (a single tree; a random forest of many trees is much harder to interpret)

41
Q

Give an example of a non-interpretable model

A

SVM (support vector machine)
LSTM (long short-term memory network)
Deep learning (DL) models

42
Q

What is K-means clustering?

A

A method of vector quantization that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean.
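A standard way to compute this partition is Lloyd's algorithm, sketched below in plain Python. It assumes the points are tuples of numbers and that the initial centroids are k randomly chosen data points. The result depends on that random start, so practical implementations often run several initialisations and keep the best one. The points are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm for K-means clustering.

    points: list of equal-length tuples. Alternately assigns every point to its
    nearest centroid and moves each centroid to the mean of its points, until
    the centroids stop moving or max_iter is reached.
    """
    centroids = rng.sample(points, k)  # k randomly chosen data points as starting centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members
            else centroids[c]  # leave an empty cluster's centroid where it was
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2, rng=random.Random(1))
print(centroids)
```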

43
Q

Which learning category does K-means clustering fall into?

A

Unsupervised

44
Q

Why use K-means clustering?

A

K-means is used to group unlabelled data by their features rather than by predefined categories. The goal is to split the data into k different clusters and report the location of the centre of mass (centroid) of each cluster.

45
Q

What does the K in K-means clustering represent?

A

K represents the number of clusters (groups) created.

46
Q

What is hierarchical clustering?

A

An algorithm that creates clusters with a predominant top-to-bottom ordering, i.e. a hierarchy (tree) of clusters

47
Q

What does hierarchical clustering do?

A

Hierarchical clustering separates data into groups based on some measure of similarity.

48
Q

What is agglomerative hierarchical clustering?

A

“Bottom-up” clustering that starts with each observation as its own cluster. Clusters are merged into larger groups as we move up the tree.
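Here is a minimal sketch of the bottom-up procedure. It assumes single linkage as the merge criterion, where the distance between two clusters is the distance between their closest members. Other linkages, such as complete or average, are equally common. The sketch recomputes all cluster distances at every merge, so it only suits small illustrative inputs. The 1-D points are hypothetical.

```python
import math


def agglomerative(points, num_clusters):
    """Bottom-up clustering with single linkage.

    Starts with every point as its own cluster and repeatedly merges the two
    closest clusters, recording each merge, until num_clusters remain.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # new merged cluster
        del clusters[j]
    return clusters, merges


points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
clusters, merges = agglomerative(points, num_clusters=3)
print(clusters)
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

Calling it with `num_clusters=1` records the full sequence of merges, which is the tree that hierarchical clustering builds.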

49
Q

What is divisive clustering?

A

“Top-down” clustering that starts with one cluster containing all observations. The cluster is split into subgroups as we move down the tree.