LU1 - Recap Flashcards
What are the four types of Data Analytics?
Descriptive
Diagnostic
Predictive
Prescriptive
Name 5 statistical techniques for Data Analysis
Linear regression
Classification
Resampling methods
Tree-based methods
Unsupervised learning
What is Linear Regression?
Linear regression is a technique used to predict a target variable by finding the best-fitting linear relationship between the dependent and independent variables. "Best fit" means that the sum of the distances between the fitted line and the actual observations at each data point is as small as possible (in ordinary least squares, the sum of the squared vertical distances, or residuals, is minimized).
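To make "best fit" concrete, here is a minimal sketch of ordinary least squares for a single independent variable, assuming the usual closed-form solution (slope = covariance of x and y divided by the variance of x). The function names `fit_simple_ols` and `sum_squared_residuals` and the sample data are hypothetical, chosen for illustration only.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n terms cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Total squared vertical distance between the line and the observations."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical observations lying roughly on a straight line.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = fit_simple_ols(xs, ys)
    print(f"y = {b0:.2f} + {b1:.2f}x, SSR = {sum_squared_residuals(xs, ys, b0, b1):.3f}")
```

Any other intercept or slope gives a larger sum of squared residuals on the same data, which is exactly what "as small as possible" means in the card above.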
What is Classification?
Classification assigns data points to specific categories (classes), which allows more precise predictions and analysis.
Name two types of classification techniques
Logistic Regression
Discriminant Analysis
What is Logistic Regression?
A regression analysis technique used when the dependent variable is binary. It is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables (which may be nominal, ordinal, interval or ratio-level).
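As an illustration, the sketch below fits a logistic regression by batch gradient descent on the log-loss, assuming a binary (0/1) target. This is only one way to fit the model (statistical packages usually use maximum likelihood via Newton-type methods); the names `fit_logistic` and `predict_proba`, the learning rate, epoch count and the "hours studied" data are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps any real score to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(b, w, xi):
    # Estimated probability that the binary outcome is 1.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the average log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # (p - y) is the gradient of the log-loss w.r.t. the linear score.
            err = predict_proba(b, w, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


if __name__ == "__main__":
    # Hypothetical data: hours studied (feature) vs. passed the exam (0/1).
    X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    b, w = fit_logistic(X, y)
    print(f"P(pass | 2.5 h) = {predict_proba(b, w, [2.5]):.2f}")
```

The output is a probability; thresholding it (for example at 0.5) turns the regression into a classifier, which is why logistic regression appears under classification techniques.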
Name two resampling techniques
Bootstrapping
Cross-Validation
What is Bootstrapping?
It works by sampling with replacement from the original data and treating the data points that were not selected (the "out-of-bag" points) as test samples.
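A minimal sketch of one bootstrap split, assuming the bootstrap sample has the same size as the original data set. The helper name `bootstrap_split` is hypothetical; it uses only Python's standard `random` module.

```python
import random


def bootstrap_split(data, rng):
    """Sample len(data) points with replacement; points never drawn form the test set."""
    n = len(data)
    drawn = [rng.randrange(n) for _ in range(n)]  # indices, repeats allowed
    drawn_set = set(drawn)
    train = [data[i] for i in drawn]
    # The "not selected" (out-of-bag) points become the test samples.
    test = [data[i] for i in range(n) if i not in drawn_set]
    return train, test


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the split is reproducible
    data = list(range(10))
    train, test = bootstrap_split(data, rng)
    print("train:", sorted(train))
    print("test: ", test)
```

Because sampling is with replacement, some points appear several times in the training sample while others are never drawn; for large data sets roughly 1/e (about 36.8%) of the points end up out-of-bag on average.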
What is Cross-Validation?
This technique is used to validate model performance. It is carried out by dividing the training data into K parts (folds). In each round, K-1 parts are used for training and the remaining part acts as the test set. The process is repeated K times, so that each part serves as the test set once, and the average of the K scores is taken as the performance estimate.
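The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: folds are contiguous (real pipelines usually shuffle the data first), and the "model" is a hypothetical baseline (`fit_mean`) that always predicts the mean of the training targets, scored with mean squared error (`mse`). All function names are illustrative.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Train on K-1 folds, score on the held-out fold, repeat K times, average."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k, scores


# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    # Mean squared error of the constant prediction on the held-out fold.
    return sum((y - model) ** 2 for y in ys) / len(ys)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [1.0, 2.0, 1.5, 3.0, 2.5, 2.0, 3.5, 3.0, 4.0, 3.5]
    avg, per_fold = cross_validate(xs, ys, k=5, fit=fit_mean, score=mse)
    print(f"5-fold MSE estimate: {avg:.3f}")
```

Passing `fit` and `score` as functions keeps the fold logic independent of the model, so any of the techniques on these cards could be plugged in.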
When does Undersampling take place?
When samples of the majority class are removed, reducing that class so the class distribution becomes more balanced
When does oversampling take place?
When samples of the minority class are copied (duplicated) so the class distribution becomes more balanced
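Both resampling strategies can be sketched in a few lines, assuming the simplest random variants: undersampling randomly drops samples until every class matches the smallest class, and oversampling randomly copies samples until every class matches the largest class. The function names and the toy data are hypothetical; more elaborate methods (for example generating synthetic minority samples) are not shown.

```python
import random
from collections import defaultdict


def group_by_label(X, y):
    # Collect the samples belonging to each class label.
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def random_undersample(X, y, rng):
    """Drop random samples until every class is as small as the smallest class."""
    groups = group_by_label(X, y)
    target = min(len(g) for g in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        for xi in rng.sample(samples, target):  # sampling without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, rng):
    """Copy random samples until every class is as large as the largest class."""
    groups = group_by_label(X, y)
    target = max(len(g) for g in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:  # originals plus duplicated copies
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = list(range(10))
    y = [0] * 8 + [1] * 2  # class 0 is the majority, class 1 the minority
    rng = random.Random(0)
    print("undersampled labels:", random_undersample(X, y, rng)[1])
    print("oversampled labels: ", random_oversample(X, y, rng)[1])
```

Undersampling discards information from the majority class, while oversampling keeps all data but repeats minority samples, which can encourage overfitting to them.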
Name 3 Unsupervised learning algorithms
Principal Component Analysis
K-Means Clustering
Hierarchical Clustering
What is Principal Component Analysis?
PCA identifies a set of mutually uncorrelated linear combinations of the features (the principal components) that capture the maximum variance. It also helps uncover latent interactions among the variables in an unsupervised setting.
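A minimal sketch of finding the first principal component, assuming the standard definition (the unit vector along which the centred data has maximum variance, i.e. the top eigenvector of the covariance matrix). Power iteration is used here as a simple stand-in for the eigendecomposition or SVD that libraries normally use; it assumes the data are not constant and that the all-ones starting vector is not orthogonal to the top eigenvector. Function names and data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows` (list of equal-length lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(rows, iterations=200):
    """Power iteration: repeatedly multiply by the covariance matrix and renormalise."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along v (the largest eigenvalue, once converged).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


if __name__ == "__main__":
    # Hypothetical 2-D data scattered closely around the line y = x.
    rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.05], [4.0, 4.0], [5.0, 4.95]]
    v, var = first_principal_component(rows)
    print("direction:", [round(x, 3) for x in v], "variance:", round(var, 3))
```

Projecting each centred row onto `v` gives a one-dimensional summary of the data, which is how PCA serves as dimensionality reduction.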
What is Machine Learning?
Machine Learning is the use of mathematical and/or statistical models to extract tailored knowledge from data in order to make predictions.
Name an unsupervised machine-learning technique
Clustering
What are Latent Variable Models?
Latent variable models relate the observed variables to unobserved (latent) variables. They are commonly used for data preprocessing, such as reducing the number of features in a dataset (dimensionality reduction) or decomposing the dataset into multiple components.
What is Clustering?
A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
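K-Means, listed earlier among the unsupervised algorithms, is a common way to find such groupings. Below is a minimal sketch of the standard Lloyd's algorithm, assuming points are numeric tuples and k is chosen in advance; the function name, the random initialisation from existing points and the toy "customer" data are hypothetical choices for illustration.

```python
import math
import random


def kmeans(points, k, rng, iterations=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                # New centroid = coordinate-wise mean of the cluster's members.
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average spend), two obvious groups.
    customers = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(customers, k=2, rng=random.Random(0))
    print("labels:", labels)
```

No labels are supplied to the algorithm; the groups emerge from distances alone, which is what makes clustering an unsupervised technique.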
What type of learning (supervised/unsupervised) makes use of Clustering?
Unsupervised
What type of learning (supervised/unsupervised) makes use of Classification?
Supervised
What type of learning (supervised/unsupervised) makes use of Regression?
Supervised