From Questions Flashcards
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining, a process model for data mining projects. It consists of six phases:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation (validate your results)
6. Deployment (put the model into production)
Is CRISP-DM an iterative process?
Yes. The phases form a continuous cycle, and each phase is important; results from a later phase can send you back to an earlier one.
What are preattentive attributes? Give examples.
They help us create visuals that emphasise the most important information.
They are the visual properties the eye notices first in a chart.
Examples: color, form, size
Gestalt principles of visual perception
It has 6 principles:
1. Proximity (elements close to each other are perceived as a group)
2. Similarity (elements that look alike, e.g. in color or size, are perceived as a group)
3. Closure (the mind fills in missing information to perceive a complete shape)
4. Continuity (the eye is drawn to continuous lines and patterns)
5. Enclosure (elements inside a common boundary are perceived as a group, even if they are not close to each other)
6. Connection (physically connected elements are perceived as a group)
What is MLE, and what is its purpose?
Maximum Likelihood Estimation.
A method that estimates the parameters of a probabilistic model.
Its purpose is to find the parameter values that make the observed data most probable under the given model.
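As a minimal sketch of this idea, assume a coin-flip (Bernoulli) model with a single parameter p. The code below scores candidate values of p by their log-likelihood and keeps the one under which the observed data is most probable. The data and the helper names `log_likelihood` and `mle_grid` are illustrative assumptions, not part of any library.

```python
import math

def log_likelihood(p, observations):
    """Log-likelihood of Bernoulli parameter p for a list of 0/1 observations."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in observations)

def mle_grid(observations, steps=999):
    """Return the candidate p in (0, 1) with the highest log-likelihood."""
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates, key=lambda p: log_likelihood(p, observations))

# Hypothetical data: 7 successes in 10 trials.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p_hat = mle_grid(data)
print(p_hat)
```

For this model the grid maximum coincides with the sample proportion of successes (7 out of 10), which is the known closed-form MLE for a Bernoulli parameter.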
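What properties make MLE a good estimator?
1) Consistency (the estimate converges to the true parameter value as the sample size grows)
2) Asymptotic normality (for large samples the estimator is approximately normally distributed, which allows hypothesis tests)
3) Efficiency (for large samples it gives the most precise estimates of the parameters)
4) Invariance (the MLE of a transformed parameter is the transformation of the MLE, which simplifies estimating transformed parameters)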
Which supervised learning method uses MLE?
Logistic regression uses MLE to estimate its parameters, which lets it model the probability of binary outcomes.
Which learning framework does linear regression belong to?
Supervised learning framework
Which learning framework does logistic regression belong to?
Supervised learning framework
Which data science task does linear regression solve?
It solves regression tasks: predicting a continuous outcome variable.
What assumptions does linear regression make about the data?
1) Linearity (the relationship between the dependent and independent variables is linear)
2) Independence (observations are independent of each other)
3) Homoscedasticity (residuals have constant variance across all values of the independent variables)
4) Normality (residuals are normally distributed)
5) No multicollinearity (independent variables are not highly correlated with each other)
6) No autocorrelation (residuals are not correlated with each other)
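One of these assumptions, no multicollinearity, can be checked roughly by looking at pairwise correlations between predictors. The sketch below computes the Pearson correlation in plain Python. The three predictor columns are made-up data, and treating a value near 1 or -1 as "highly correlated" is a rule of thumb assumed here, not a fixed standard.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)

# Hypothetical predictors: x2 is almost exactly 2 * x1, x3 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]

r12 = pearson(x1, x2)  # close to 1: x1 and x2 carry nearly the same information
r13 = pearson(x1, x3)  # much weaker relationship
print(r12, r13)
```

A pair like x1 and x2 would be a candidate for dropping or combining one of the two variables before fitting.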
How is a linear regression model used?
Y is the single dependent variable.
There is also an intercept (constant) term.
X is one or more independent variables.
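Written as a formula, the model described above takes the following form. The symbols are notation assumed here for illustration: β₀ is the intercept, β₁ to βₚ are the coefficients of the p independent variables, and ε is the error term.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```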
How is a linear regression model fitted?
Using the least squares method, which chooses the coefficients that minimize the sum of squared residuals.
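A minimal sketch of least squares for the simplest case, one independent variable, using the standard closed-form solution. The function name `fit_simple_ols` and the data are assumptions made for illustration.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

Because the made-up points lie exactly on a line, the fit recovers intercept 1 and slope 2 with zero residuals; real data would leave nonzero residuals.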
How are predictions made in a linear regression model?
Predictions are made by applying the fitted linear equation to new data points.
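For illustration, the sketch below applies an already fitted equation to new data points. The intercept, coefficients, and data points are hypothetical values, and `predict` is a helper name chosen here.

```python
def predict(intercept, coefficients, features):
    """Apply the fitted linear equation to one new data point."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# Hypothetical fitted model: y = 1.0 + 2.0 * x1 - 0.5 * x2
intercept = 1.0
coefficients = [2.0, -0.5]

# Two new data points, each with values for x1 and x2.
new_points = [[3.0, 4.0], [0.0, 2.0]]
predictions = [predict(intercept, coefficients, p) for p in new_points]
print(predictions)
```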
Which learning framework does the logistic regression model belong to?
Supervised learning framework
Which data science task does the logistic regression model solve?
It solves classification tasks: predicting a categorical (typically binary) outcome from the given data.
What assumptions does the logistic regression model make about the data?
- Binary dependent variable (it has two outcomes, 0 and 1)
- Linearity (the relationship between the independent variables and the log odds of the dependent variable is linear)
- Independence (observations are independent of each other)
- Absence of multicollinearity (independent variables are not highly correlated with each other)
- Sufficient sample size (the sample should be large enough to give reliable estimates)
How is the logistic regression model used, and how does it work?
The independent variables are used to predict a binary outcome (1 or 0, true or false).
The model uses the logistic function to convert predicted values into probabilities.
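In formula form, the logistic function maps a linear combination z of the independent variables to a probability p between 0 and 1. The symbols are notation assumed here for illustration: β₀ is the intercept and β₁ to βₚ are the coefficients.

```latex
z = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p,
\qquad
p = \frac{1}{1 + e^{-z}},
\qquad
\log\frac{p}{1 - p} = z
```

The last identity is why the linearity assumption is stated in terms of log odds.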
How is a logistic regression model fitted?
Fitting involves estimating the coefficients, using maximum likelihood estimation.
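As a rough sketch, the coefficients can be estimated by repeatedly nudging them in the direction that increases the log-likelihood (gradient ascent). This is one possible way to maximize the likelihood, chosen here for simplicity; libraries typically use faster optimizers. The data, learning rate, and iteration count are assumptions.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate (intercept, slope) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Error = observed outcome minus predicted probability.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical data: larger x makes outcome 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1)
```

The classes overlap on purpose: with perfectly separable data the likelihood has no finite maximum and the coefficients would keep growing.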
How are predictions made in a logistic regression model?
The logistic function turns the linear combination of the independent variables into a probability, and the prediction is based on that probability.
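A minimal sketch of the prediction step, assuming the coefficients are already fitted. The coefficient values and data points are hypothetical, and the 0.5 cutoff for turning a probability into a class label is a common default assumed here.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Turn the linear combination of the features into a probability."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(probability, threshold=0.5):
    """Convert a probability into a 0/1 label using an assumed threshold."""
    return 1 if probability >= threshold else 0

# Hypothetical fitted model with two independent variables.
intercept = -3.0
coefficients = [1.0, 0.5]

new_points = [[1.0, 1.0], [4.0, 2.0]]
probabilities = [predict_probability(intercept, coefficients, p) for p in new_points]
labels = [predict_class(p) for p in probabilities]
print(probabilities, labels)
```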