From Questions Flashcards
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining, a process model for data mining projects. It consists of six phases:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation (validate your results)
6. Deployment (put the model into production)
Is CRISP-DM an iterative process?
Yes. The phases form a continuous cycle, so you can return to earlier phases as you learn more, and every phase is important.
What are preattentive attributes? Give examples.
They help us create visuals that emphasise the most important information.
They are the first thing you notice when you look at the data.
Examples: color, form, size.
Gestalt principles of visual perception
There are six principles:
1. Proximity (elements close to each other are perceived as a group)
2. Similarity (elements that look alike, for example in color or size, are perceived as a group)
3. Closure (the mind fills in missing information to perceive a complete shape)
4. Continuity (the eye is drawn to continuous lines and patterns)
5. Enclosure (elements enclosed together are perceived as a group, even if they are not close to each other)
6. Connection (physically connected elements are perceived as a group)
What is MLE, and what is its purpose?
Maximum likelihood estimation.
A method that estimates the best parameters for a probabilistic model.
Its purpose is to choose the parameter values that make the observed data most probable under the given model.
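A minimal sketch of the idea, assuming a Bernoulli (coin-flip) model and made-up data. The function names and the data are hypothetical, chosen only for illustration. For this model the MLE has a closed form (the fraction of ones), and a simple grid search over candidate values agrees with it.

```python
import math

def bernoulli_log_likelihood(p, observations):
    """Log-likelihood of 0/1 observations under a Bernoulli(p) model."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in observations)

def bernoulli_mle(observations):
    """Closed-form MLE of p for a Bernoulli model: the fraction of ones."""
    return sum(observations) / len(observations)

# Hypothetical data: 7 ones out of 10 observations.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p_hat = bernoulli_mle(data)

# Grid search: pick the candidate p under which the data is most probable.
grid = [i / 100 for i in range(1, 100)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

print(p_hat, p_grid)
```

Both approaches pick the same value of p, the one under which the observed data is most probable.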
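Which properties make MLE a good estimator?
1) Consistency (the estimates converge to the true parameter values as the sample size grows)
2) Asymptotic normality (for large samples the estimates are approximately normally distributed, which allows hypothesis tests)
3) Efficiency (for large samples it gives the most precise estimates of the parameters)
4) Invariance (the MLE of a transformed parameter is the transform of the MLE, which simplifies estimating transformed parameters)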
Which supervised learning method uses MLE?
Logistic regression uses MLE to estimate its parameters, which lets it model the probability of a binary outcome effectively.
Which learning framework does linear regression belong to?
The supervised learning framework
Which learning framework does logistic regression belong to?
The supervised learning framework
Which data science task does linear regression solve?
It solves regression tasks: predicting a continuous outcome variable.
Linear regression assumptions that it makes on the data:
1) Linearity (the relationship between the dependent and independent variables is linear)
2) Independence (observations are independent of each other)
3) Homoscedasticity (residuals have constant variance across all values of the independent variables)
4) Normality (residuals are normally distributed)
5) No multicollinearity (independent variables are not highly correlated with each other)
6) No autocorrelation (residuals are not correlated with each other)
How is a linear regression model used?
Y is the single dependent variable.
There is also an intercept (constant) term.
X is one or more independent variables.
How is a linear regression model fitted?
Using the least squares method, which chooses the coefficients that minimise the sum of squared residuals.
How are predictions made in a linear regression model?
Predictions are made by applying the fitted linear equation to new data points.
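A minimal sketch of fitting and predicting, assuming NumPy is available. The helper names and the tiny data set are hypothetical; the data was generated exactly from y = 1 + 2x so the fitted coefficients are easy to check.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return [intercept, coef_1, ..., coef_p] minimising the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # add the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_linear(beta, X_new):
    """Apply the fitted linear equation to new data points."""
    X_new = np.asarray(X_new, dtype=float)
    design = np.column_stack([np.ones(len(X_new)), X_new])
    return design @ beta

# Hypothetical training data lying exactly on y = 1 + 2x.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]

beta = fit_linear_regression(X_train, y_train)
y_pred = predict_linear(beta, [[4.0]])
print(beta, y_pred)
```

Here `beta[0]` is the intercept and the remaining entries are the coefficients of the independent variables.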
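Which learning framework does the logistic regression model belong to?
The supervised learning framework
Which data science task does the logistic regression model solve?
It solves classification tasks, typically binary classification: it predicts which of two classes an observation belongs to, given the input data.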
Which assumptions does the logistic regression model make about the data?
- Binary dependent variable (it has two outcomes, 0 and 1)
- Linearity (linear relationship between the independent variables and the log odds of the dependent variable)
- Independence (observations are independent of each other)
- Absence of multicollinearity (independent variables are not highly correlated with each other)
- Sufficient sample size (the sample should be large enough to provide sufficient data points)
How is the logistic regression model used, and how does it work?
The independent variables are used to predict a binary outcome (1 or 0, true or false).
The model uses the logistic function to turn predicted values into probabilities.
How is a logistic regression model fitted?
Fitting involves estimating the coefficients, using maximum likelihood estimation.
How are predictions made in a logistic regression model?
The logistic function turns a linear combination of the independent variables into a probability; a threshold then converts that probability into a class label.
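A minimal sketch of the prediction step. The intercept and coefficients below are hypothetical stand-ins for values that fitting would produce, and the function names are illustrative only.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, features):
    """Probability of class 1: logistic function of the linear combination."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(z)

def predict_label(probability, threshold=0.5):
    """Turn a probability into a class label using a threshold."""
    return 1 if probability >= threshold else 0

# Hypothetical fitted parameters (not estimated from real data).
intercept = -1.0
coefficients = [2.0, -0.5]

# Linear combination: -1 + 2*1.0 - 0.5*2.0 = 0, so the probability is 0.5.
p = predict_probability(intercept, coefficients, [1.0, 2.0])
print(p, predict_label(p))
```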
What is gradient descent, and how does it work?
Gradient descent is an iterative optimisation method that repeatedly adjusts the parameters in the direction opposite to the gradient in order to minimise a given function.
In which cases can gradient descent be used to minimise a function?
- Scalar
- Multivariable
Describe the gradient descent method in the scalar case and in the multivariable case.
- Scalar case: the goal is to find the value of a single variable x that minimises the function f(x); each step moves x against the derivative f'(x).
- Multivariable case: the goal is to find the vector x that minimises the function f(x); each step moves x against the gradient of f.
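The two cases can be sketched with one small routine, assuming NumPy is available. The objective functions, learning rate, and step count are hypothetical choices for illustration; both functions are simple quadratics whose minima are known in advance.

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=200):
    """Repeatedly step against the gradient; works for scalars and vectors."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

# Scalar case: f(x) = (x - 3)^2 has derivative 2(x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)

# Multivariable case: f(v) = (v0 - 1)^2 + 2(v1 + 2)^2 has its minimum at (1, -2).
def grad_f(v):
    return np.array([2 * (v[0] - 1), 4 * (v[1] + 2)])

v_min = gradient_descent(grad_f, x0=[0.0, 0.0])
print(float(x_min), v_min)
```

The only difference between the cases is whether the update is applied to one number or to every component of a vector.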
How can gradient descent be made more effective?
Use adaptive methods such as AdaGrad or Adam, or use mini-batch gradient descent.
How do we evaluate the quality of a logistic regression model, referring to possible performance indicators?
- Choose a suitable threshold to turn predicted probabilities into class labels
- Build a confusion matrix
- From it, compute the accuracy and the error rate
What does a confusion matrix describe?
How many predictions were correct and how many were incorrect, broken down by actual and predicted class.
How does the threshold affect the performance indicators, and how do we choose its value?
Changing the threshold gives a different confusion matrix. The default threshold is 0.5, which often gives good accuracy, but depending on your data you can lower or raise it; doing so changes the accuracy, the error rate, and every component of the confusion matrix.
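A minimal sketch of how the threshold changes the confusion matrix. The labels and predicted probabilities are made up for illustration, and the function names are hypothetical.

```python
def confusion_matrix(y_true, probabilities, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, p in zip(y_true, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

def accuracy(cm):
    """Share of correct predictions; the error rate is 1 - accuracy."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

# Hypothetical true labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.45, 0.8, 0.6, 0.1]

cm_default = confusion_matrix(y_true, probs, threshold=0.5)
cm_low = confusion_matrix(y_true, probs, threshold=0.3)
print(cm_default, accuracy(cm_default))
print(cm_low, accuracy(cm_low))
```

With this data, lowering the threshold removes the false negative but adds a false positive, so the components change even though the accuracy happens to stay the same.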
How do we evaluate the quality of a linear regression model, referring to possible performance indicators?
- Residual sum of squares (RSS)
- Root mean squared error (RMSE)
- Mean squared error (MSE)
- No threshold is needed, because the prediction is a continuous value rather than a class label
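The three indicators are closely related, as this small sketch shows. The observed and predicted values are hypothetical, and the function name is illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RSS, MSE and RMSE from observed and predicted values."""
    residuals = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    rss = sum(r * r for r in residuals)   # residual sum of squares
    mse = rss / len(residuals)            # mean squared error
    rmse = math.sqrt(mse)                 # root mean squared error
    return {"RSS": rss, "MSE": mse, "RMSE": rmse}

# Hypothetical observations and predictions; residuals are 1, 0, -1, -2.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0])
print(metrics)
```

RMSE is often the easiest to interpret because it is in the same units as the outcome variable.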
What is the approximation trade-off that is connected to supervised learning methods?
It is about finding the right balance between making the model simple enough to work well on new data and complex enough to fit the training data.
How does the selection of the hypothesis space H of a supervised learning method affect the trade-off?
If the hypothesis space is too simple, the model might miss important patterns (high bias, underfitting).
If it is too complex, the model might fit the noise in the training data too closely (high variance, overfitting).
The key is to find a middle ground where the model captures the right trends without getting distracted by irrelevant details.
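One way to see the trade-off is to treat polynomials of different degrees as hypothesis spaces of different complexity. This is only a sketch under assumptions: the sine-shaped true relationship, the noise level, the sample sizes, and the degrees 1, 3 and 8 are all hypothetical choices, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def make_data(n):
    """Hypothetical data: a sine curve plus random noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse_for_degree(degree):
    """Fit a polynomial of the given degree; return (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

errors = {degree: mse_for_degree(degree) for degree in (1, 3, 8)}
for degree, (train_mse, test_mse) in errors.items():
    print(degree, round(train_mse, 4), round(test_mse, 4))
```

The training error can only go down as the degree grows, because each larger space contains the smaller ones. The test error is the number to watch: it shows whether the extra complexity captured a real pattern or just noise.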
What are the differences between regression trees and classification trees?
Regression trees predict continuous numeric outcomes.
Classification trees predict categorical outcomes.
Decision tree types
Regression and classification
Recursive binary splitting in regression trees
At each step, choose the split that minimises the variance of the target within the two resulting groups, then repeat the procedure inside each group.
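A minimal sketch of a single splitting step on one feature, using the sum of squared deviations from the group mean as the variance measure. The function names and the data are hypothetical; a full tree would apply the same step again inside each resulting group.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total sum of squares) of the best split 'x < threshold'."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for feature, target in pairs if feature < threshold]
        right = [target for feature, target in pairs if feature >= threshold]
        total = sum_of_squares(left) + sum_of_squares(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data with two clearly separated groups.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]

threshold, total = best_split(x, y)
print(threshold, total)
```

The chosen threshold falls between the two groups, where the targets on each side are most similar to each other.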
Cost-complexity pruning in decision trees
For both tree types, the purpose is to reduce the size of the tree by removing less important splits.
Regression trees: remove splits that don’t significantly reduce the overall variance
Classification trees: remove splits that don’t significantly improve the purity of the groups
Making predictions with decision trees
Regression: predict the average value of the target variable in the leaf
Classification: predict the most common category in the leaf
What is forward propagation?
The process by which the input data passes through the neural network, layer by layer, to generate the output.
What is back propagation?
The process in neural networks of propagating the prediction error backwards through the layers to work out how to adjust the weights and biases so that the error is minimised.
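Both steps can be sketched on the smallest possible network: a single neuron with one weight, one bias, and a sigmoid activation. Everything here is a hypothetical illustration (the squared-error loss, the learning rate, the input and target values); real networks repeat the same chain-rule logic across many layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward propagation: the input flows through the neuron to the output."""
    return sigmoid(w * x + b)

def backward(w, b, x, target):
    """Back propagation: gradients of 0.5 * (output - target)^2 via the chain rule."""
    output = forward(w, b, x)
    d_loss_d_output = output - target
    d_output_d_z = output * (1 - output)   # derivative of the sigmoid
    d_loss_d_z = d_loss_d_output * d_output_d_z
    return d_loss_d_z * x, d_loss_d_z      # gradients for the weight and the bias

def train(x, target, w=0.0, b=0.0, learning_rate=0.5, n_steps=1000):
    """Adjust the weight and bias against their gradients to reduce the error."""
    for _ in range(n_steps):
        grad_w, grad_b = backward(w, b, x, target)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical single training example.
initial_error = abs(forward(0.0, 0.0, 1.0) - 0.9)
w, b = train(x=1.0, target=0.9)
final_error = abs(forward(w, b, 1.0) - 0.9)
print(initial_error, final_error)
```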
Advantages of back propagation?
Efficient
Less time-consuming when it is done by programs
Flexibility
Generalisation
What is computer vision and how does it relate to machine vision?
Computer vision helps computers understand and interpret visual data such as images and videos.
Machine vision applies computer vision technologies specifically to industrial processes.
Which tasks does computer vision solve?
- Image classification
- Object detection
- Segmentation
- Annotation
Which challenge drove computer vision development?
The need to achieve higher accuracy and efficiency in image recognition tasks.
De facto standard methods used to address computer vision tasks?
Convolutional neural networks (CNNs)
What is the purpose of transfer learning?
To leverage a pre-trained model, which has already learned features from a large dataset, and apply it to a new task.
How does transfer learning compare to traditional machine learning?
Traditional machine learning trains a new model for every new task, while transfer learning reuses an already trained model for new tasks.
How can we apply transfer learning and what are its advantages?
Applying transfer learning to a CNN saves training time and needs less data for the new task.
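Describe the hierarchical clustering method based on its two basic paradigms.
• Agglomerative: bottom-up approach; each observation starts as its own cluster, and the closest clusters are merged step by step
• Divisive: top-down approach; we start with one big cluster that contains all the data and split it step by step
Describe the hierarchical clustering method based on the concept of linkage.
Linkage defines how the distance between two clusters is measured:
- Complete linkage (largest distance between any pair of points, one from each cluster)
- Single linkage (smallest distance between any pair of points, one from each cluster)
- Average linkage (average distance over all pairs of points, one from each cluster)
- Centroid linkage (distance between the cluster centroids)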
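The four linkage types differ only in how they summarise the distances between two clusters. A minimal sketch, assuming Euclidean distance and two small made-up clusters; the function name is hypothetical.

```python
import math
from itertools import product

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters of points under the given linkage."""
    pairwise = [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if method == "complete":
        return max(pairwise)                  # farthest pair
    if method == "single":
        return min(pairwise)                  # closest pair
    if method == "average":
        return sum(pairwise) / len(pairwise)  # mean over all pairs
    if method == "centroid":
        centroid_a = [sum(c) / len(cluster_a) for c in zip(*cluster_a)]
        centroid_b = [sum(c) / len(cluster_b) for c in zip(*cluster_b)]
        return math.dist(centroid_a, centroid_b)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical clusters: two vertical pairs of points, 3 units apart.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

for method in ("complete", "single", "average", "centroid"):
    print(method, linkage_distance(cluster_a, cluster_b, method))
```

In agglomerative clustering, the two clusters with the smallest linkage distance are the ones merged at each step.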
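What is a dendrogram?
A tree diagram that shows how the clusters are merged or split at each level of the hierarchy.
How do we select the number of clusters?
Cut the dendrogram with a horizontal line at a chosen height; the number of vertical lines it crosses is the number of clusters.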
Describe the two interpretations of Principal Component Analysis (PCA)
- Data reduction (reduces the number of features while keeping the most important information)
- Pattern recognition (identifies the directions along which the data varies the most)
Which matrix decomposition does PCA rely on, and how?
It relies on eigenvalue decomposition or singular value decomposition (SVD).
- Break down the original data matrix
- Identify the principal components
- These components
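The decomposition step can be sketched with NumPy's SVD, assuming the data is centred first. The function name and the toy data (points lying exactly on the line y = 2x) are hypothetical. The squared singular values divided by n - 1 equal the eigenvalues of the covariance matrix, which is how the two decompositions connect.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the centred data matrix."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                # principal directions (rows)
    explained_variance = (S ** 2) / (len(X) - 1)  # variance along each direction
    scores = X_centered @ components.T            # data in the reduced space
    return components, explained_variance[:n_components], scores

# Hypothetical data: all points lie on the line y = 2x.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
components, variances, scores = pca_svd(X, n_components=1)
print(components, variances)
```

For this data a single component captures all the variation, so two features are reduced to one without losing information. The sign of a component is arbitrary, so it may point along the line in either direction.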