First Part Flashcards
To learn and answer well
What are the properties of a normal distribution
Properties of the Normal Distribution:
Unimodal - one mode
Symmetrical - left and right halves are mirror images
Bell-shaped - maximum height (mode) at the mean
Mean, Mode, and Median are all located in the center
Asymptotic - the tails approach the horizontal axis but never touch it
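The properties above can be checked numerically. This is a minimal sketch using Python's standard `statistics.NormalDist`; the standard normal (mean 0, standard deviation 1) and the probe points are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1 (illustrative choice).
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetrical: the density at mean - d equals the density at mean + d.
left, right = dist.pdf(-1.5), dist.pdf(1.5)

# Unimodal and bell-shaped: the density is highest at the mean.
peak = dist.pdf(dist.mean)

# Mean and median coincide: half of the probability lies below the mean.
half = dist.cdf(dist.mean)

# Asymptotic: far out in the tail the density is tiny but still positive.
tail = dist.pdf(6.0)
```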
What is the goal of A/B testing
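A/B testing is statistical hypothesis testing applied to a randomized experiment with two variants, A and B. The goal is to determine whether one variant performs better than the other on a chosen metric.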
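One common way to run such a test on conversion rates is a two-proportion z-test. The sketch below is one possible approach, not the only one; the function name `two_proportion_z_test` and the conversion counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: A and B share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that the variants do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 120 of 1000 users converted on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
```

A small p-value suggests the observed difference is unlikely under the null hypothesis; the significance threshold is a choice made before running the experiment.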
What are sensitivity, specificity, accuracy and precision
Sensitivity or TPR (True Positive Rate) = TP/(TP+FN)
Specificity or TNR (True Negative Rate) = TN/(TN+FP)
Precision or PPV (Positive Predictive Value) = TP/(TP+FP)
Accuracy or ACC = (TP+TN)/(TP+FP+TN+FN)
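The four formulas can be computed directly from confusion-matrix counts. A minimal sketch; the helper name `classification_metrics` and the counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for a binary classifier evaluated on 100 cases.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```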
What is over-fitting
In over-fitting, a statistical model describes/follows the random error or noise instead of the underlying relationship. Over-fitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been over-fit has poor predictive performance, as it overreacts to minor fluctuations in the training data.
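A toy illustration of this effect, under stated assumptions: the data follow y = 2x plus a fixed, made-up noise pattern, and a lookup table that memorizes every training point (one "parameter" per observation) is compared with a least-squares line. All names here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def memorize(xs, ys):
    """Over-complex model: stores every training point, noise included."""
    table = dict(zip(xs, ys))
    return lambda x: table[x]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(8))
noise = [1, -1, 1, -1, 1, -1, 1, -1]                  # made-up noise pattern
train_ys = [2 * x + e for x, e in zip(xs, noise)]     # underlying trend: y = 2x
test_ys = [2 * x - e for x, e in zip(xs, noise)]      # same trend, different noise

line = fit_line(xs, train_ys)
lookup = memorize(xs, train_ys)

train_errors = {"line": mse(line, xs, train_ys), "lookup": mse(lookup, xs, train_ys)}
test_errors = {"line": mse(line, xs, test_ys), "lookup": mse(lookup, xs, test_ys)}
```

The lookup table has zero training error because it reproduces the noise, but on the second sample its error is larger than that of the simple line.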
What is under-fitting
Under-fitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Under-fitting would occur, for example, when fitting a linear model to non-linear data. Such a model would also have poor predictive performance.
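The linear-model-on-non-linear-data case can be sketched as follows. The data (y = x squared on a few integer points) and the helper names are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Non-linear data: y = x^2 on a symmetric range.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
line_mse = mse(lambda x: intercept + slope * x, xs, ys)   # linear model
curve_mse = mse(lambda x: x ** 2, xs, ys)                 # model with the right shape
```

The best-fitting line here is flat, so it misses the curve entirely even on the data it was fit to; that is the signature of under-fitting.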
What is Univariate analysis
Descriptive statistical analysis techniques can be differentiated by the number of variables involved at a given point in time. Univariate analysis involves only one variable; for example, a pie chart of sales by territory.
What are bivariate and multivariate analysis
Bivariate analysis examines how two variables interact with each other and what the differences between them are. An example is a scatter plot. Multivariate analysis does the same with more than two variables.
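A small sketch of the distinction: a univariate summary looks at one variable alone, while a bivariate measure such as the Pearson correlation relates two. The variable names and values are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: advertising spend and sales for five periods.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

# Univariate: summarize one variable on its own.
univariate_summary = {
    "mean_sales": sum(sales) / len(sales),
    "min_sales": min(sales),
    "max_sales": max(sales),
}

# Bivariate: quantify how the two variables move together.
r = pearson(ad_spend, sales)
```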
What are eigenvalues and eigenvectors
Eigenvectors are used for understanding linear transformations. In data analysis they are generally computed for a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing, or stretching.
An eigenvalue can be thought of as the strength of the transformation in the direction of its eigenvector, that is, the factor by which the stretching or compression occurs.
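The defining relation A v = λ v can be verified by hand for a small matrix. In this sketch the 2x2 matrix is an arbitrary example whose eigenpairs are known in closed form; `mat_vec` is a hypothetical helper.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# Example symmetric matrix, chosen because its eigenpairs are easy to state.
A = [[2, 1],
     [1, 2]]

# Known eigenpairs: stretch by 3 along (1, 1), leave (1, -1) unchanged.
pairs = [(3, [1, 1]), (1, [1, -1])]

# For each pair, A v should equal the eigenvalue times v.
checks = [mat_vec(A, v) == [lam * x for x in v] for lam, v in pairs]
```

A vector that is not an eigenvector, such as (1, 0), maps to (2, 1) and so changes direction rather than just length.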
What is machine learning
Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data.
What is supervised learning
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples.
It essentially means that there is a target variable.
Eg: Support Vector Machines, Regression, Naive Bayes, Decision Trees, K-nearest Neighbor Algorithm and Neural Networks
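As an illustration of learning from labeled examples, here is a minimal k-nearest-neighbor classifier. The training points, labels, and the function name `knn_predict` are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labeled training data: (features, target) pairs. The label is the target variable.
train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.0), "large"),
]

prediction = knn_predict(train, (6.2, 6.1))
```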
What is unsupervised learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
There is no target variable.
Eg: Clustering, Anomaly Detection, Neural Networks and Latent Variable Models
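A simple form of anomaly detection shows the idea of working without labels: values are flagged only by how far they sit from the rest of the data. This is a sketch using a z-score rule; the threshold of 2 standard deviations and the readings are assumptions.

```python
from statistics import mean, pstdev

def flag_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]

# Unlabeled readings: no target variable says which ones are unusual.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = flag_anomalies(readings)
```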
What is logistic regression
Logistic Regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear combination of predictor variables.
What is the logit model
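Logit model: logit(p) = log(p/(1-p)), where p is the probability of the event occurring. The model expresses these log-odds as a linear combination of the predictor variables.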
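The logit and its inverse (the sigmoid) can be sketched in a few lines. The coefficients `intercept` and `weight` below are hypothetical, not fitted from any data.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + exp(-z))

# Hypothetical coefficients for a single predictor x.
intercept, weight = -4.0, 0.08

def predict_probability(x):
    """Probability of the event given x: sigmoid of the linear combination."""
    return sigmoid(intercept + weight * x)

even_odds = logit(0.5)               # log-odds of a 50% probability
round_trip = sigmoid(logit(0.8))     # converting to log-odds and back
at_boundary = predict_probability(50)
```

With these made-up coefficients the linear combination is zero at x = 50, so the predicted probability there is one half.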
What are recommender systems
Recommender Systems are a subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product.
What is collaborative filtering
Collaborative filtering is the filtering process used by most recommender systems to find patterns or information by combining viewpoints, various data sources, and multiple agents.
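One simple variant is user-based collaborative filtering: a user's missing rating is estimated from the ratings of similar users. The sketch below assumes cosine similarity over shared items and a similarity-weighted average; the users, items, and ratings are invented.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; "ana" has not rated "album".
ratings = {
    "ana": {"book": 5, "film": 4, "game": 1},
    "ben": {"book": 4, "film": 5, "game": 1, "album": 5},
    "cy": {"book": 1, "film": 2, "game": 5, "album": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    weighted = total = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        weighted += sim * their[item]
        total += sim
    return weighted / total if total else None

predicted = predict_rating("ana", "album")
```

Because ana's ratings resemble ben's more than cy's, the estimate leans toward ben's rating of the album.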