MFDS Flashcards
What is probability? State its types.
A measure of how likely an event is to occur.
No. of favourable outcomes / No. of possible outcomes
Types :
Marginal (no condition, e.g. P(king) = 4/52 when drawing from 52 cards)
Conditional (probability of B given that A has occurred, P(B|A))
Joint (both A and B occur together, P(A and B))
Complementary (event does not occur, P(not A) = 1 - P(A))
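Worked sketch of the four types, using the king-from-52-cards example above. The second event (hearts) and the helper name prob are assumptions added for illustration.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """No. of favourable outcomes / no. of possible outcomes for one random card."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_king(card):
    return card[0] == "K"

def is_heart(card):
    return card[1] == "hearts"

marginal = prob(is_king)                             # P(king)
joint = prob(lambda c: is_king(c) and is_heart(c))   # P(king and heart)
conditional = joint / prob(is_heart)                 # P(king | heart)
complementary = 1 - marginal                         # P(not king)

print(marginal, joint, conditional, complementary)
```

Fractions keep the results exact, so 4/52 shows up in reduced form as 1/13.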
What is the normal distribution? Applications?
(Gaussian distribution) A continuous, bell-shaped distribution that is symmetric about its mean
Law of large numbers and Central Limit Theorem
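LLN - as an experiment is repeated a large number of times, the average of the results gets closer to the expected value
formula - sample mean = (sum Xi)/n
CLT - concerns the sampling distribution of the mean (means of random samples taken from a population)
For large sample sizes, the sampling distribution of the mean is approximately normal, whatever the shape of the population (given finite variance)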
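Simulation sketch for LLN and CLT. The fair die, the sample size of 30 and the fixed seed are assumptions chosen for illustration, not part of the original card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def roll_mean(n):
    """Mean of n fair die rolls, i.e. (sum Xi)/n. The expected value is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# LLN: the sample mean moves towards 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, roll_mean(n))

# CLT: means of many samples of size 30 cluster around 3.5,
# with spread close to sigma / sqrt(30), about 0.31 for a fair die.
sample_means = [roll_mean(30) for _ in range(5_000)]
clt_centre = statistics.mean(sample_means)
clt_spread = statistics.stdev(sample_means)
print(clt_centre, clt_spread)
```

A histogram of sample_means would look roughly bell-shaped even though a single die roll is uniform.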
What is a random variable?
It is a variable that takes on numerical values determined by the outcome of a random phenomenon
Discrete (countable distinct values)
Continuous (any value within a range, uncountably many)
Applications of probability theory?
Risk analysis
Predicting outcomes
Natural language processing
Machine learning
Recommendation systems
Hypothesis testing
Expected Value and Variance
Mean
E(X) = sum( Xi x P(Xi) ) (discrete)
E(X) = integral from -inf to inf of x f(x) dx (continuous)
Variance
V(X) = sum [Xi - E(X)]^2 x P(Xi) = E(X^2) - [E(X)]^2
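Worked sketch of the discrete formulas. The fair six-sided die is an assumed example; both forms of the variance are computed to show that they agree.

```python
from fractions import Fraction

# Assumed example: fair six-sided die, P(Xi) = 1/6 for every face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum( Xi x P(Xi) )
mean = sum(x * p for x, p in pmf.items())

# V(X) = sum [Xi - E(X)]^2 x P(Xi)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# V(X) = E(X^2) - [E(X)]^2
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mean**2

print(mean, var_definition, var_shortcut)
```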
Mean and Variance of Bernoulli, Binomial
Bernoulli: E(X) = p
Bernoulli: V(X) = pq (q = 1 - p)
Binomial: E(X) = np
Binomial: V(X) = npq
Binomial: SD = (npq)^(1/2)
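Numerical check of these formulas, summing directly over the pmf. The values p = 0.3 and n = 10 are assumptions; Bernoulli is treated as the Binomial case with n = 1.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def mean_and_variance(n, p):
    """E(X) and V(X) computed from the pmf by direct summation."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var

p = 0.3      # assumed success probability
q = 1 - p

bern_mean, bern_var = mean_and_variance(1, p)      # Bernoulli: n = 1
binom_mean, binom_var = mean_and_variance(10, p)   # Binomial: n = 10

print(bern_mean, p, bern_var, p * q)
print(binom_mean, 10 * p, binom_var, 10 * p * q)
print(sqrt(binom_var), sqrt(10 * p * q))           # SD = (npq)^(1/2)
```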
Probability distributions
Bernoulli (Discrete - two outcomes)
Binomial (Discrete - fixed no. of independent Bernoulli trials)
Poisson (Discrete - no. of events in a fixed interval/period of time, fixed mean rate)
Normal (Continuous - symmetric about its mean)
Poisson distribution
P(X = k) = e^(-lam) . lam^k / k! (lam = mean rate, k = 0, 1, 2, ...)
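Sketch of the Poisson formula as a function. The rate lam = 3 and the function name poisson_pmf are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # assumed mean rate, e.g. 3 events per interval

p_two = poisson_pmf(2, lam)                        # probability of exactly 2 events
total = sum(poisson_pmf(k, lam) for k in range(50))  # should be very close to 1

print(p_two, total)
```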
Point estimation and its methods
Statistical technique to estimate an unknown population parameter with the help of sample data.
Method of moments
mean = (sum xi)/n
variance = sum(xi - xbar)^2 / (n - 1)
proportion = x/n
Method of maximum likelihood (normal data)
u = (sum xi)/n
variance = (1/n) sum(xi - u)^2
Method of least squares
minimizes the sum of squared differences between observed and predicted values
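Sketch of the three estimators on small made-up data sets (the numbers are assumptions, chosen so the arithmetic is easy to follow). The least-squares part fits a straight line y = a + b x, which is one common use of the method.

```python
# Made-up sample for the mean and variance estimates.
data = [4.0, 7.0, 5.0, 9.0, 5.0]
n = len(data)

mean = sum(data) / n                                       # (sum xi)/n
var_n_minus_1 = sum((x - mean) ** 2 for x in data) / (n - 1)
var_ml = sum((x - mean) ** 2 for x in data) / n            # maximum likelihood, normal data

# Least squares for y = a + b x on made-up points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

print(mean, var_n_minus_1, var_ml)
print(intercept, slope)
```

The two variance estimates differ only in the divisor, so they get closer as n grows.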
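Interval estimation, confidence level and margin of error
Interval estimation - estimate a population parameter by finding an interval in which it lies with a stated confidence level
Confidence level - proportion of such intervals that would contain the parameter over repeated sampling (e.g. 95%)
u = xbar +- Z x sig/rt(n)
Margin of error - level of uncertainty in the point estimate, Z x sig/rt(n)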
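Sketch of the interval formula for a mean with known population standard deviation. The inputs (sample mean 50, sig = 10, n = 100) and the function name z_interval are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for sample mean +- Z * sigma / sqrt(n)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, margin = z_interval(xbar=50.0, sigma=10.0, n=100)
print(low, high, margin)
```

Raising the confidence level widens the interval; raising n narrows it.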
Hypothesis testing
Statistical method to make inferences/decisions about pop parameters based on sample data
Formulate hypotheses (H0 and H1)
Choose significance level (e.g. 0.05)
Choose a test statistic (t test, z test)
Compute the test statistic
Compute the p-value
Make a decision (reject H0 if p-value < significance level)
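The steps above can be sketched as a two-sided one-sample z test. The scenario (H0: mean = 50, known standard deviation 10, sample of 100 with mean 52.5) and the function name are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mean = mu0 against H1: mean != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))            # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    reject_h0 = p_value < alpha                     # decision
    return z, p_value, reject_h0

z, p_value, reject_h0 = one_sample_z_test(xbar=52.5, mu0=50.0, sigma=10.0, n=100)
print(z, p_value, reject_h0)
```

A t test follows the same steps but uses the sample standard deviation and the t distribution, which the standard library does not provide.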
Parametric and non-parametric tests
Parametric - data assumed to be drawn from a known distribution (e.g. normal) described by its parameters
Non-parametric - no distribution assumed (used when data is skewed or contains outliers)
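ANOVA test
SSC (between groups) = n_a x [Xbar_a - Xbar]^2 + … (one term per group; Xbar_a = mean of group a, n_a = its size, Xbar = grand mean)
SSE (within groups) = [A - Xbar_a]^2 + … (one term per observation A in group a)
MSC = SSC/(c - 1), c = no. of groups (3 in the example)
MSE = SSE/(N - c), N = total no. of observations (9 in the example)
F = MSC/MSE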
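Sketch of a one-way ANOVA on made-up data: 3 groups of 3 observations, so 9 observations in total. The values and the names c (no. of groups) and N (total observations) are assumptions for illustration.

```python
# Made-up data: 3 groups with 3 observations each.
groups = {
    "a": [2.0, 3.0, 4.0],
    "b": [5.0, 6.0, 7.0],
    "c": [8.0, 9.0, 10.0],
}

all_values = [v for values in groups.values() for v in values]
N = len(all_values)     # total number of observations
c = len(groups)         # number of groups
grand_mean = sum(all_values) / N

ssc = 0.0   # between-group sum of squares
sse = 0.0   # within-group sum of squares
for values in groups.values():
    group_mean = sum(values) / len(values)
    ssc += len(values) * (group_mean - grand_mean) ** 2
    sse += sum((v - group_mean) ** 2 for v in values)

msc = ssc / (c - 1)
mse = sse / (N - c)
f_stat = msc / mse

print(ssc, sse, msc, mse, f_stat)
```

A large F value suggests the group means differ by more than the within-group scatter would explain.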
Non-parametric tests
Mann-Whitney U (2 independent groups)
Wilcoxon signed-rank (2 related groups)
Kruskal-Wallis (3 or more independent groups)
Chi-square (categorical variables)
Spearman rank (strength of association between two ranked variables)
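Sketch of Spearman rank correlation using rho = 1 - 6 x sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks of each pair. This shortcut assumes no tied values; the data and helper names are made up for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman rank correlation for paired data without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Made-up paired observations.
xs = [10, 20, 30, 40, 50]
ys = [1, 3, 2, 5, 4]

rho = spearman_rho(xs, ys)
print(rho)
```

A value near +1 or -1 indicates a strong monotonic association, and a value near 0 a weak one.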