Statistic Flashcards

1
Q

What do we mean by the population of study?

A

The population is the entire set of subjects or sources from which the data are to be collected.

2
Q

What is a sample?

A

A sample is a subset of the population being studied.

3
Q

What is a variable?

A

A variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item.

4
Q

What is a statistical parameter?

A

A statistical parameter, or population parameter, is a quantity that indexes a family of probability distributions; for example, the mean or variance of a population.

5
Q

What are the two data types?

A

Numerical: quantitative data expressed with numbers; it is measurable. It can be either discrete (a countable set of possible values) or continuous (any value within an interval).

Categorical: qualitative data classified into categories. It can be nominal (no order) or ordinal (ordered data).

6
Q

What is the mean, median and mode?

A

Mean: the average of a dataset.

Median: the middle of an ordered dataset; less susceptible to outliers.

Mode: the most common value in a dataset; most meaningful for discrete or categorical data.
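
As a quick illustration (the dataset here is made up), Python's standard statistics module computes all three measures directly:

```python
import statistics

data = [1, 2, 2, 3, 14]  # small example dataset; 14 is an outlier

mean = statistics.mean(data)      # average: (1+2+2+3+14)/5 = 4.4
median = statistics.median(data)  # middle of the sorted data: 2
mode = statistics.mode(data)      # most common value: 2

print(mean, median, mode)  # the outlier pulls the mean up, but not the median
```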

7
Q

What is the range?

A

Range: the difference between the highest and lowest value in a dataset.

8
Q

What is the variance, its properties and its formula?

A

Variance (σ²): measures how spread out a set of data is relative to the mean.

Var(X) = E[(X − E[X])²]

Var(aX + b) = a²Var(X)

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
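
These properties can be checked numerically; a minimal sketch using Python's statistics module on an arbitrary example dataset:

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # example data with population variance 4
a, b = 3.0, 10.0

var_x = statistics.pvariance(x)                          # population variance of X
var_ax_b = statistics.pvariance([a * v + b for v in x])  # variance of aX + b

# Shifting by b has no effect; scaling by a multiplies the variance by a².
print(var_x, var_ax_b)  # 4.0 and 36.0
```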

9
Q

What is R squared?

A

R-Squared: a statistical measure of fit that indicates how much of the variation in a dependent variable is explained by the independent variable(s); it is most directly interpretable for simple linear regression, since adding predictors can only increase it.

10
Q

What is the covariance, its properties and its formula?

A

Covariance: measures how two (or more) variables vary together. If it is positive they tend to move in the same direction; if it is negative they tend to move in opposite directions; and if it is zero, there is no linear relationship between them.

Cov(X, Y) = E[XY] − E[X]E[Y] (zero if X and Y are independent; the converse does not hold)

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
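
A hand-rolled check, using sample averages as stand-ins for expectations, that the two formulas agree on a made-up sample:

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x, so the covariance is positive

n = len(x)
ex = sum(x) / n
ey = sum(y) / n
exy = sum(a * b for a, b in zip(x, y)) / n

cov1 = exy - ex * ey                                       # Cov = E[XY] - E[X]E[Y]
cov2 = sum((a - ex) * (b - ey) for a, b in zip(x, y)) / n  # Cov = E[(X-E[X])(Y-E[Y])]

print(cov1, cov2)  # both 2.5: the two formulas are algebraically identical
```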

11
Q

What is the correlation and its formula?

A

Correlation: measures the strength of the linear relationship between two variables and ranges from −1 to 1; it is the normalized version of covariance:

Corr(X, Y) = Cov(X, Y) / (σ_X σ_Y)

Generally, a correlation of about ±0.7 or beyond represents a strong relationship between two variables. On the flip side, correlations between −0.3 and 0.3 indicate little to no linear relationship.
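
A sketch of the computation from scratch on invented data, building the correlation out of the covariance and the two standard deviations:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x)
ex, ey = sum(x) / n, sum(y) / n
cov = sum((a - ex) * (b - ey) for a, b in zip(x, y)) / n
sx = (sum((a - ex) ** 2 for a in x) / n) ** 0.5  # std dev of x
sy = (sum((b - ey) ** 2 for b in y) / n) ** 0.5  # std dev of y

corr = cov / (sx * sy)  # normalizing covariance forces the result into [-1, 1]
print(corr)  # 0.8: y roughly tracks x, so the correlation is strong and positive
```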

12
Q

What is a probability density function (pdf)?

A

Probability Density Function (PDF): a function for continuous data whose value at any point can be interpreted as the relative likelihood that the random variable would take a value close to that point.

13
Q

What is a probability mass function (pmf)?

A

Probability Mass Function (PMF): a function for discrete data which gives the probability of a given value occurring.

14
Q

What is a cumulative distribution function (CDF)?

A

Cumulative Distribution Function (CDF): a function that gives the probability that a random variable is less than or equal to a certain value; for a continuous variable it is the integral of the PDF.

15
Q

What is the moment of a distribution?

A

Moments describe different aspects of the nature and shape of a distribution. The first raw moment is the mean; the second central moment is the variance; the standardized third moment is the skewness; and the standardized fourth moment is the kurtosis.

16
Q

What is the skewness of a distribution?

A

Skewness measures the lopsidedness of a distribution; formally, the normalized third central moment is called the skewness, often written γ. Any symmetric distribution has a third central moment, if defined, of zero. A distribution that is skewed to the left (the tail of the distribution is longer on the left) has negative skewness; a distribution that is skewed to the right (the tail is longer on the right) has positive skewness.

17
Q

What is the kurtosis of a distribution?

A

Kurtosis is the standardized fourth central moment. It measures the heaviness of the tails of a distribution compared to a normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always nonnegative; except for a point distribution, it is always strictly positive.

18
Q

What do we mean by probability?

A

Probability is the likelihood of an event occurring.

19
Q

What do we mean by an independent event?

A

Independent events are events whose outcome does not influence the probability of the outcome of another event; P(A|B) = P(A).

20
Q

What do we mean by mutually exclusive event?

A

Mutually exclusive events are events that cannot occur simultaneously; P(A ∩ B) = 0 and hence P(A|B) = 0.

21
Q

What do we mean by conditional probability? What is its formula?

A

Conditional probability P(A|B) is the likelihood of event A occurring given that event B has occurred:

P(A|B) = P(A ∩ B) / P(B)

22
Q

What is Bayes' theorem and its formula?

A

Bayes' Theorem: a mathematical formula for determining conditional probability:

P(A|B) = P(B|A) P(A) / P(B)

In words: the probability of A given B is equal to the probability of B given A, times the probability of A, divided by the probability of B.
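
A minimal worked example in Python; the disease prevalence and test accuracy numbers are hypothetical, chosen only to illustrate the formula:

```python
# Bayes' theorem on a hypothetical diagnostic test (all numbers invented).
p_disease = 0.01            # P(A): prior probability of having the disease
p_pos_given_disease = 0.95  # P(B|A): test sensitivity
p_pos_given_healthy = 0.05  # false positive rate

# Total probability of a positive test: P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

# Far below 0.95, because the disease is rare: most positives are false positives.
print(round(p_disease_given_pos, 3))
```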

23
Q

When talking about hypothesis testing, what do we mean by the null hypothesis, alternative hypothesis, p-value, alpha and beta?

A

Null Hypothesis: the hypothesis that sample observations result purely from chance.

Alternative Hypothesis: the hypothesis that sample observations are influenced by some non-random cause.

P-value: the probability of obtaining the observed results of a test, assuming that the null hypothesis is correct; a smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.

Alpha: the significance level; the probability of rejecting the null hypothesis when it is true — also known as Type 1 error.

Beta: the probability of failing to reject a null hypothesis that is false; also known as Type 2 error.

24
Q

What are the 4 steps of hypothesis testing?

A

Steps to Hypothesis testing:

  1. State the null and alternative hypothesis
  2. Choose the significance level (the test size) and decide whether it is a one- or two-tailed test
  3. Compute the test statistic and the probability value
  4. Analyze the results and either reject or do not reject the null hypothesis (if the p-value is greater than the alpha, do not reject the null!)
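
The four steps can be sketched as a one-sample two-tailed z-test; the sample values and the assumption of a known population σ are made up for illustration:

```python
from statistics import NormalDist

# 1. H0: the population mean is mu0; Ha: it is not.
mu0, sigma = 100.0, 15.0  # assumed known population parameters
sample = [108, 112, 97, 105, 110, 103, 99, 115, 107, 101]
alpha = 0.05

# 2. Significance level alpha, two-tailed test.
n = len(sample)
sample_mean = sum(sample) / n

# 3. Compute the test statistic and the p-value.
z = (sample_mean - mu0) / (sigma / n ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 4. Reject H0 only if the p-value is at most alpha.
reject = p_value <= alpha
print(round(z, 3), round(p_value, 4), reject)
```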
25
Q

What is the expected value and its basic properties?

A

The expected value of a discrete random variable is the probability-weighted average of all its possible values.

Its basic properties are:

E[X] = Σ xᵢpᵢ

E[X + Y] = E[X] + E[Y]

E[aX + b] = aE[X] + b

E[XY] = E[X]E[Y] if X and Y are independent (the converse does not hold)
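
These properties can be verified exactly for a fair six-sided die:

```python
# Checking the linearity properties of expectation on a fair die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each face equally likely

e_x = sum(v * p for v in values)               # E[X] = sum of x_i * p_i = 3.5
e_ax_b = sum((2 * v + 1) * p for v in values)  # E[2X + 1], computed directly

print(e_x, e_ax_b)  # E[2X + 1] equals 2 * E[X] + 1 = 8
```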

26
Q

How are the variance and mean related?

A

Var(X) = E[X²] − (E[X])² = E[(X − µ)²]

27
Q

What are the pmf, mean, variance and domain of X of these discrete distributions: Geometric, Uniform, Binomial, Bernoulli, Hypergeometric, Negative Binomial and Poisson?

A
28
Q

Explain the logic behind these discrete distributions: Geometric, Uniform, Binomial, Bernoulli, Hypergeometric, Negative Binomial and Poisson.

A

Geometric: The probability distribution of the number X of Bernoulli trials needed to get one success.

Uniform: every one of n values has equal probability 1/n.

Binomial: the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments.

Bernoulli: the distribution of a single experiment that asks a yes–no question; success occurs with probability p and failure with probability 1 − p.

Hypergeometric: describes the probability of x successes in n draws, without replacement, from a finite population of size N that contains exactly r objects with the feature of interest, where each draw is either a success or a failure.

Negative Binomial: is the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs.

Poisson: expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
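
As one concrete instance, the binomial pmf can be written directly from its definition; a small sketch checking that the probabilities sum to 1 and that the mean equals n·p:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                              # probabilities over all outcomes sum to 1
mean = sum(k * q for k, q in enumerate(pmf))  # mean of a binomial is n * p = 3

print(round(total, 6), round(mean, 6))
```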

29
Q

What is the exponential distribution? What are its mean, variance and pdf? What does its pdf look like?

A

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.

pdf: f(x; λ) = λe^(−λx) for x ≥ 0

Mean: 1/λ

Variance: 1/λ²
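
A quick numerical sanity check of the pdf's total mass and mean by midpoint-rule integration (the truncation at x = 20 and the step size are arbitrary choices):

```python
from math import exp

lam = 2.0
dx = 1e-3
# Midpoints over [0, 20]; the mass beyond x = 20 is negligible for lam = 2.
xs = [(i + 0.5) * dx for i in range(int(20 / dx))]

total = sum(lam * exp(-lam * x) * dx for x in xs)     # should be close to 1
mean = sum(x * lam * exp(-lam * x) * dx for x in xs)  # should be close to 1/lam

print(round(total, 4), round(mean, 4))
```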

30
Q

What is the normal distribution? What are its pdf, mean and variance? What does it look like?

A

A normal distribution is a symmetric, bell-shaped distribution, sometimes informally called a bell curve (Gaussian).

Mean: µ

Variance: σ²

pdf: f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))

31
Q

What do we mean by information entropy?

A

The information entropy, often just entropy, is a basic quantity in information theory associated to any random variable, which can be interpreted as the average level of “information”, “surprise”, or “uncertainty” inherent in the variable’s possible outcomes.

32
Q

What does information theory study ?

A

Information theory studies the quantification, storage, and communication of information.

Overview: Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper “A Mathematical Theory of Communication”, in which “information” is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise.

33
Q

What is the main idea of the central limit theorem (CLT) and how can it be applied to the sample mean?

A

In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed.

For example, take the sample mean: Sₙ = (X₁ + X₂ + … + Xₙ)/n

The sample mean is itself a random variable. The usefulness of the theorem is that the distribution of √n(Sₙ − µ) approaches N(0, σ²) regardless of the shape of the distribution of the individual Xᵢ.
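
A small seeded simulation illustrating the theorem with Uniform(0, 1) variables (the sample size and trial count are arbitrary):

```python
import random
random.seed(0)  # seeded so the run is reproducible

n = 100                  # size of each sample
trials = 2000            # number of sample means to draw
mu = 0.5                 # mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5  # std dev of Uniform(0, 1)

# Each entry is the mean of one fresh uniform sample.
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

grand_mean = sum(means) / trials
spread = (sum((m - grand_mean) ** 2 for m in means) / trials) ** 0.5

# The sample means cluster around mu with spread close to sigma / sqrt(n),
# even though the uniform distribution itself is flat, not bell-shaped.
print(round(grand_mean, 3), round(spread, 4))
```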

34
Q

What is a combination and what is its formula?

A

A combination is similar to a permutation, but the order of the selected items does not matter; for example, the arrangements ab and ba count as the same combination. The number of combinations of r items chosen from n is:

C(n, r) = n! / (r!(n − r)!)

35
Q

What is a permutation? What are the formulas for the total number of permutations with and without repetition?

A

A permutation is an ordered combination.

Say we have n items and pick r of them. When repetition is allowed, the total number of permutations is: P(n, r) = nʳ

When repetition is not allowed, the total number of permutations is:

P(n, r) = n! / (n − r)!
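
Both counts, plus the combination count from the previous card, are available in Python's math module:

```python
from math import perm, comb

n, r = 5, 3

with_rep = n ** r         # permutations with repetition: n^r = 125
without_rep = perm(n, r)  # permutations without repetition: n!/(n-r)! = 60
combos = comb(n, r)       # combinations: order ignored, so 60 / 3! = 10

print(with_rep, without_rep, combos)
```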

36
Q

What is an estimator?

A

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data. The estimator itself is a random variable.

37
Q

What is the difference between a point estimator and an interval estimator?

A

Point estimators yield single-valued results, although this includes the possibility of single vector-valued results and results that can be expressed as a single function.

This is in contrast to an interval estimator, where the result would be a range of plausible values (or vectors or functions).

38
Q

What are the four quantified properties of an estimator? What is the link between MSE, Bias and Variance?

A

Note: I use ô to represent the estimator and o for the true parameter; conventionally these are written θ̂ and θ.

1) Error: for a given sample x, the error of the estimator ô is defined as

e(x) = ô(x) − o

2) Mean squared error: the probability-weighted average of the squared errors:

MSE(ô) = E[(ô(X) − o)²]

3) Variance: indicates how far, on average, the collection of estimates is from the expected value of the estimates. Keep in mind the estimator is a random variable.

Var(ô) = E[(ô − E[ô])²]

4) Bias: the distance between the average of the collection of estimates and the single parameter being estimated. It is defined as:

B(ô) = E[ô] − o

5) Relationship among these quantities:

MSE(ô) = Var(ô) + (B(ô))²
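
The decomposition in 5) can be demonstrated by simulation with a deliberately biased estimator; the shrinkage factor 0.9 and the distribution are invented for illustration. The identity holds exactly when all three quantities are computed from the same collection of estimates:

```python
import random
random.seed(1)  # reproducible run

# Estimate the mean of N(mu, 1) with the shrunken estimator 0.9 * sample_mean,
# which is biased on purpose (its expectation is 0.9 * mu, not mu).
mu, n, trials = 5.0, 20, 5000
estimates = []
for _ in range(trials):
    sample = [random.gauss(mu, 1) for _ in range(n)]
    estimates.append(0.9 * sum(sample) / n)

e_hat = sum(estimates) / trials                       # E[ô]
var = sum((e - e_hat) ** 2 for e in estimates) / trials
bias = e_hat - mu                                      # B(ô), close to -0.5 here
mse = sum((e - mu) ** 2 for e in estimates) / trials

print(round(mse, 4), round(var + bias**2, 4))  # the two numbers agree
```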

39
Q

What do we mean by a consistent estimator?

A

A consistent sequence of estimators is a sequence of estimators that converges in probability to the quantity being estimated as the index (usually the sample size) grows without bound. In other words, increasing the sample size increases the probability of the estimator being close to the population parameter. Mathematically: for every ε > 0, P(|ôₙ − θ| > ε) → 0 as n → ∞.

40
Q

What do we mean by an asymptotically normal estimator?

A

An asymptotically normal estimator is a consistent estimator whose distribution around the true parameter θ approaches a normal distribution, with standard deviation shrinking in proportion to 1/√n as the sample size n grows. Mathematically: √n(ôₙ − θ) converges in distribution to N(0, σ²).

41
Q

What do we mean by an efficient estimator?

A

An efficient estimator is an unbiased estimator with the lowest possible variance (it attains the Cramér–Rao lower bound). In other words, it extracts the maximal amount of information from the data.

42
Q

What do we mean by a Robust estimator/statistic?

A

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.

43
Q

What do we mean by parametric statistics?

A

Parametric statistics is a branch of statistics which assumes that sample data come from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters.

44
Q

What do we mean by non-parametric statistics?

A

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution’s parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference.

45
Q

What is the difference between Parametric probability density estimation and nonparametric probability density estimation?

A

Parametric probability density estimation involves selecting a common distribution and estimating the parameters for the density function from a data sample.

Nonparametric probability density estimation involves using a technique to fit a model to the arbitrary distribution of the data, like kernel density estimation.

46
Q

What are the two steps to estimate the pdf of a parametric distribution, and how can we check whether it is a good fit?

A

The shape of a histogram of most random samples will match a well-known probability distribution. The common distributions are common because they occur again and again in different and sometimes unexpected domains. Once a candidate distribution is identified, estimate its parameters from the data; for example, if the histogram looks normal, we estimate the mean and variance.

To verify whether it is a good fit we can:

1) Plot the density function and compare its shape to the histogram.
2) Sample the density function and compare the generated sample to the real sample.
3) Use a statistical test to confirm the data fits the distribution.

47
Q

When should we perform a non-parametric density estimation?

A

In some cases, a data sample may not resemble a common probability distribution or cannot be easily made to fit the distribution.

This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution).

In this case, parametric density estimation is not feasible and alternative methods can be used that do not use a common distribution. Instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution, referred to as a nonparametric method.

48
Q

What is a kernel density estimation?

A

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample.
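
A toy Gaussian KDE written from scratch to show the idea; this is a sketch, not a substitute for library implementations, and the bandwidth and data are invented:

```python
from math import exp, pi, sqrt

def kde(x, data, bandwidth):
    """Average of Gaussian bumps centered on each data point."""
    k = lambda u: exp(-0.5 * u * u) / sqrt(2 * pi)  # standard normal kernel
    return sum(k((x - d) / bandwidth) for d in data) / (len(data) * bandwidth)

data = [1.0, 1.2, 1.1, 4.0, 4.2, 3.9]  # bimodal sample: two clusters
h = 0.4                                 # bandwidth controls the smoothing

# The estimated density is high near the clusters and low in the gap between them,
# something no single common parametric distribution would capture.
near_cluster = kde(1.1, data, h)
in_gap = kde(2.5, data, h)
print(round(near_cluster, 3), round(in_gap, 5))
```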

49
Q

What is bootstrapping?

A

In statistics, bootstrapping is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy to sample estimates.
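
A minimal sketch of a bootstrap percentile interval for the mean; the data and resample count are invented:

```python
import random
random.seed(42)  # seeded for repeatability

data = [12, 15, 9, 14, 11, 16, 10, 13, 12, 15]
B = 2000  # number of bootstrap resamples

boot_means = []
for _ in range(B):
    resample = random.choices(data, k=len(data))  # sampling WITH replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
# Approximate 95% percentile interval for the mean: the middle 95% of resampled means.
lo, hi = boot_means[int(0.025 * B)], boot_means[int(0.975 * B)]
print(lo, hi)
```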