Stats Midterm (L2 - L12) Flashcards

1
Q

What is the difference between sample and population?

A

The population refers to the entire group of individuals, items, or data points of interest in a specific study.

A sample is a subset of the population, representing a smaller group of individuals or data points selected from the population.

2
Q

Describe the principles of statistics.

A

Sample -> Empirical Distribution -> Theoretical Distribution -> Population

Lecture 2, Slide 9

3
Q

What are the four key components of descriptive statistics?

A
  1. Measures of central tendency
  2. Measures of dispersion (variability)
  3. Measures of distribution shape
  4. Frequency distribution (ChatGPT added)

Lecture 2: Slide 13

4
Q

Define hypothesis testing.

A

Hypothesis testing is a method used to evaluate two competing claims about a population based on sample data, with the goal of determining whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

5
Q

With regard to hypothesis testing, what is the significance level (α)?

A

The significance level (α) is the probability of rejecting the null hypothesis when it is actually true (a false positive).

Extra Notes: A common value is 0.05, meaning there is a 5% risk of rejecting the null hypothesis when it is true.

6
Q

With regard to hypothesis testing, what is the p value?

A

The p-value in hypothesis testing is the probability of observing the data (or something more extreme) assuming that the null hypothesis H0 is true. It quantifies the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that the null hypothesis may be incorrect.

7
Q

What is the relationship between p and α when rejecting or failing to reject the null hypothesis?

A

p < α -> reject the null hypothesis
p > α -> do not reject the null hypothesis
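
Extra Notes: A minimal Python sketch of this decision rule, assuming a two-sided one-sample z-test with a known population standard deviation. The function name and all numbers are made up for illustration and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n):
    """Return (z, p) for H0: population mean equals mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


alpha = 0.05  # significance level chosen before looking at the data
z, p = two_sided_z_test(sample_mean=10.5, mu0=10.0, sigma=1.0, n=25)
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

Here z = 2.5, so p is roughly 0.012, which is below α = 0.05 and leads to rejecting the null hypothesis.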

8
Q

What is the difference between a histogram and a cumulative histogram?

A

Histogram: Shows the individual frequency for each bin.

Cumulative Histogram: Shows the running total (cumulative frequency) for each bin and those that came before it.
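
Extra Notes: A small Python sketch of the two kinds of counts, assuming half-open bins [low, high). The data and bin edges are invented for illustration.

```python
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]   # hypothetical observations
edges = [0, 2, 4, 6, 8]                 # bins: [0,2), [2,4), [4,6), [6,8)

# Histogram: count how many observations fall in each bin.
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break

# Cumulative histogram: running total of the bin counts.
cumulative = list(accumulate(counts))

print(counts)      # [1, 5, 3, 1]
print(cumulative)  # [1, 6, 9, 10]
```

The last cumulative value equals the total number of observations.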

9
Q

True or False:
The mean is sensitive to outliers.

A

True.

Lecture 2 - Slide 16
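
Extra Notes: A quick Python check of this, using invented data in which one value is replaced by an outlier. The median is included only for comparison.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # hypothetical: last value replaced by an outlier

mean_shift = mean(with_outlier) - mean(clean)        # mean moves from 3 to 22
median_shift = median(with_outlier) - median(clean)  # median stays at 3

print(mean_shift, median_shift)
```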

10
Q

Describe how a histogram could be used to find the median.

A

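A histogram can be used to estimate the median by accumulating the bin counts from the left until the running total reaches half of the observations; the bin where this happens contains the median. The median is the value that separates the dataset into two equal halves, meaning 50% of the data points lie below it and 50% lie above it. Because the values are grouped into bins, the result is an estimate whose precision is limited by the bin width.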
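
Extra Notes: One way to make this estimate concrete is to accumulate bin counts until half of the observations are covered, then interpolate linearly inside that bin. The sketch below assumes values are spread evenly within each bin; the function name, bin edges, and counts are hypothetical.

```python
def median_from_histogram(edges, counts):
    """Estimate the median from bin edges and bin counts (grouped data)."""
    total = sum(counts)
    half = total / 2.0
    running = 0
    for i, c in enumerate(counts):
        if c > 0 and running + c >= half:
            # Fraction of this bin needed to reach half of the observations.
            frac = (half - running) / c
            return edges[i] + frac * (edges[i + 1] - edges[i])
        running += c
    raise ValueError("histogram is empty")


edges = [0, 2, 4, 6, 8]
counts = [1, 5, 3, 1]
estimate = median_from_histogram(edges, counts)
print(estimate)
```

With these counts the halfway point (5 of 10 observations) falls in the bin [2, 4), and the interpolated estimate is 3.6.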

11
Q

True or False:
The standard deviation is a measure of dispersion.

A

True.

Lecture 2 - slide 24

12
Q

Define degrees of freedom (DOF) with respect to statistics.

A

The DOF is the number of values in the final calculation of a statistic that are free to vary.

Lecture 2 - slide 27

13
Q

What are the two basic measures of general shape?

A
  1. Skewness
  2. Kurtosis

Lecture 2- slides 30 to 33

14
Q

A normal distribution has a kurtosis of ________.

A

3

Lecture 2- Slide 33

15
Q

Define the probability density function (PDF).

A

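The PDF f(x) gives the probability density of the variate X at the value x. For a discrete variate, f(x) is the probability that X takes the value x; for a continuous variate, probabilities are obtained by integrating f(x) over an interval.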

Lecture 2 - slide 37

16
Q

Define the cumulative distribution function (CDF).

A

The CDF F(x) is the running sum of a discrete PDF or the integral of a continuous one, taken up to x.

It tells you the probability that the variate X takes on a value less than or equal to x.

Lecture 2 -slide 37
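
Extra Notes: A small Python sketch of a discrete CDF built as a running sum of the PDF, using a fair six-sided die as an assumed example (not from the lecture). It uses the common convention F(x) = P(X ≤ x).

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6        # discrete PDF: each face equally likely
cdf = list(accumulate(pmf))       # running sum of the PDF

p_at_most_3 = cdf[faces.index(3)]  # P(X <= 3)
print(p_at_most_3)                 # 1/2
print(cdf[-1])                     # 1
```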

17
Q

Define the interquartile range (IQR).

A

The interquartile range (IQR) is a measure of statistical dispersion that represents the range within which the central 50% of the data points lie. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1) of a dataset.
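
Extra Notes: A short Python sketch of the IQR using the standard library. Quartile conventions differ between textbooks and software, so the exact values depend on the method; this uses the default ("exclusive") method of `statistics.quantiles` on invented data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]      # hypothetical observations
q1, q2, q3 = quantiles(data, n=4)    # the three quartile cut points
iqr = q3 - q1                        # spread of the central 50% of the data

print(q1, q3, iqr)
```

For this data the default method gives Q1 = 2.25 and Q3 = 6.75, so the IQR is 4.5.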

18
Q

Define a uniform distribution.

A

A uniform distribution is a type of probability distribution where all outcomes are equally likely within a specified range. In a uniform distribution, each value within the range has the same probability of occurring. The distribution is “flat” because no single outcome is favored over others.

19
Q

Define the binomial distribution.

A

The binomial distribution is a discrete probability distribution that describes the probability of obtaining a given number of successes in a fixed number of independent trials of a binary experiment, where each trial has the same success probability and only two possible outcomes: success or failure. It is widely used in situations where the outcome of interest is a count of successes over multiple attempts.

20
Q

Describe a Poisson distribution.

A

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval (of space or time), if these events occur with a known constant mean rate and independently of the time since the last event.

Lecture 3- slide 24

21
Q

True or False:
The rate must be constant for a Poisson distribution to hold.

A

True.

Lecture 3 - slide 24

22
Q

True or False:
As the number of trials goes to infinity and the success probability p goes to zero (with the mean number of successes np held fixed), the binomial distribution approaches the Poisson distribution.

A

True.

Lecture 3 - slide 25
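
Extra Notes: A numerical illustration of this limit in Python, assuming the mean number of successes n·p is held fixed at λ = 3 while n grows. The helper names and the choice of λ are illustrative only.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0  # fixed mean n * p


def max_gap(n):
    """Largest pointwise difference between the two PMFs for k = 0..10."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))


gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, gap)
```

The printed gap shrinks as n increases and p = λ/n decreases, which is the behaviour the card describes.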

23
Q

What is the z-distribution?

A

The z-distribution is the standard normal distribution: a normal distribution with a mean of zero and a standard deviation of one.

Lecture 3 - slide 31

24
Q

When is the logarithmic-normal distribution appropriate?

A

When the data have a lower limit (e.g., precipitation, frequency of earthquakes) and the distribution is significantly skewed.

Lecture 3 - slide 39

25
Q

Describe Student’s t-distribution.

A

The Student’s t-distribution (or simply the t-distribution) is a probability distribution that is similar to the normal distribution but is used when the sample size is small or the population standard deviation is unknown. It plays a critical role in hypothesis testing and confidence intervals, particularly for small samples.

26
Q

What is the difference between the tails of the normal distribution and the t-distribution?

A

The t-distribution is symmetrical and bell-shaped, similar to the normal (Gaussian) distribution. However, it has heavier tails, meaning it has more probability mass in the tails than the normal distribution. This reflects a higher likelihood of extreme values occurring, particularly in small samples.

27
Q

True or False:
As the degrees of freedom (DOF) increase, the t-distribution approaches the normal distribution.

A

True.

28
Q

The mean of the t-distribution is _____.

A

Zero (for more than one degree of freedom; with exactly one degree of freedom the mean is undefined).

29
Q

What is the difference between the one sample t-test and the two sample t-test?

A

One-sample t-test: Used to determine if the mean of a single sample is significantly different from a known or hypothesized population mean.

Two-sample t-test: Used to compare the means of two independent samples to see if they are significantly different from each other.

30
Q

Define a “confidence interval.”

A

A confidence interval (CI) is a range of values, derived from sample data, that is likely to contain the true population parameter (such as the population mean) with a certain level of confidence.

It provides an estimate of the uncertainty or variability in a statistic, helping to indicate how well the sample data represent the entire population.
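
Extra Notes: A minimal Python sketch of a confidence interval for a mean, assuming the population standard deviation is known so that a z critical value applies. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided CI for the population mean when sigma is known."""
    # Critical value: e.g. about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_crit * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = z_confidence_interval(sample_mean=10.5, sigma=1.0, n=25)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these inputs the interval is roughly 10.5 ± 0.39. A larger sample or a lower confidence level would narrow it.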

31
Q

The chi^2 test is mainly used for ______.

A

Statistical hypothesis testing, for example goodness-of-fit tests and tests of independence for categorical data.

32
Q

What is the F-distribution primarily used for?

A

The F-distribution is a continuous probability distribution that arises frequently in the context of analysis of variance (ANOVA), regression analysis, and hypothesis testing, especially when comparing variances of two different populations. It is primarily used to test whether the variances of two populations are equal.

33
Q

What is the purpose of the z-score?

A

The z-score is used to measure how many standard deviations an individual data point is away from the mean of the distribution. It helps to standardize different data points, allowing comparisons across different normal distributions.
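
Extra Notes: A short Python sketch of z-scores, treating the invented data below as the whole population (so the population standard deviation is used).

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
mu = mean(data)                   # 5
sigma = pstdev(data)              # 2.0 (population standard deviation)

# z-score: distance from the mean in units of standard deviations.
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)
```

The value 9 lies two standard deviations above the mean (z = 2.0), and the value 2 lies 1.5 standard deviations below it (z = -1.5).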

34
Q

When calculating the confidence interval, when should the t-test be used? The z-test?

A

> The t-test should be used when the population standard deviation is unknown and the sample size is small.

> The z-test should be used when the population standard deviation is known or if the sample size is large.

35
Q

Define ANOVA.

A

ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups to determine whether there are statistically significant differences between them. It helps to assess whether the variation in a dataset is due to differences in group means or due to random chance.

36
Q

Describe spectral analysis.

A

An equivalent representation of a data trend using a sum of different periodic functions.

Lecture 7 - slide 45

37
Q

What is a correlation coefficient?

A

The correlation coefficient measures the strength and direction of the linear relationship between two variables.

38
Q

What are bivariate statistics?

A

Bivariate statistics is a branch of statistics that involves the analysis of two variables simultaneously to understand the relationship between them.

39
Q

Define “Deviation.”

A

The difference between a random variable and its expected value.

Lecture 9 - slide 6

40
Q

Define “Variance.”

A

Expected value of the square of the deviation of a random variable. In other words, the square of the standard deviation.

Lecture 9 - Slide 6
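
Extra Notes: A step-by-step Python sketch of this definition, assuming the invented data below form the whole population with equal weights, so the expected value is the plain average.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical population
expected_value = sum(data) / len(data)   # E[X] = 5.0

# Deviation: each value minus the expected value.
deviations = [x - expected_value for x in data]

# Variance: expected value of the squared deviations.
variance = sum(d * d for d in deviations) / len(data)
std_dev = sqrt(variance)

print(variance, std_dev)  # 4.0 2.0
```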

41
Q

Define “residuals.”

A

Residual = observed data minus model prediction.
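
Extra Notes: A small Python sketch of residuals, assuming the model is a straight line fitted by ordinary least squares. The data points are invented; any other model would work the same way.

```python
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]   # hypothetical observations

# Ordinary least-squares fit of y = intercept + slope * x.
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Residuals: data minus model prediction.
predictions = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]
print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding).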

42
Q

Generally describe both bootstrapping and jackknifing.

A

Jackknifing and bootstrapping are both resampling techniques used in statistics to estimate the properties (e.g., standard error, bias, confidence intervals) of a sample statistic, particularly when the underlying distribution of the data is unknown or difficult to calculate analytically. While both methods are based on resampling, they differ in how the resampling is performed and in their applications.

43
Q

Describe how Jackknifing works.

A

Jackknife involves systematically leaving out one or more observations from the sample at a time to create different subsamples.
For a sample size of n, it creates n subsamples, where each subsample consists of n−1 data points, excluding a single observation at a time.

The statistic of interest (e.g., mean, variance) is recalculated for each subsample, and the results are used to estimate properties like the variance or bias of the original statistic.
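
Extra Notes: A minimal Python sketch of the leave-one-out jackknife, assuming the usual jackknife formula for the standard error. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def jackknife_standard_error(data, statistic):
    """Leave-one-out jackknife estimate of the standard error of a statistic."""
    n = len(data)
    # n subsamples, each with one observation removed.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return sqrt((n - 1) / n * sum((t - center) ** 2 for t in leave_one_out))


data = [2, 4, 4, 4, 5, 5, 7, 9]
jk_se = jackknife_standard_error(data, mean)
print(jk_se, stdev(data) / sqrt(len(data)))
```

For the sample mean, the jackknife standard error matches the textbook value s / sqrt(n), which makes this a convenient sanity check.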

44
Q

Describe how Bootstrapping works.

A

Bootstrapping involves drawing random samples with replacement from the original dataset to create multiple new datasets, called bootstrap samples.

For a sample size of n, you create many new samples (often 1,000 or more), each of size n, by randomly selecting data points from the original sample, allowing for repetition (i.e., the same data point can appear multiple times in a bootstrap sample). The statistic of interest is calculated for each bootstrap sample, and these estimates are used to derive the distribution of the statistic, as well as properties like the standard error, confidence intervals, and bias.
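
Extra Notes: A minimal Python sketch of bootstrapping the sample mean, assuming 2,000 resamples, a fixed random seed for repeatability, and a simple percentile interval. The function name and data are hypothetical.

```python
import random
from statistics import stdev


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Means of bootstrap samples drawn with replacement, each of size n."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)]


data = [2, 4, 4, 4, 5, 5, 7, 9]
boot = sorted(bootstrap_means(data))

boot_se = stdev(boot)                     # bootstrap standard error of the mean
ci_low = boot[int(0.025 * len(boot))]     # rough 95% percentile interval
ci_high = boot[int(0.975 * len(boot)) - 1]
print(boot_se, ci_low, ci_high)
```

The spread of the bootstrap means stands in for the sampling distribution of the statistic, so no formula for the standard error is needed.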

45
Q

Define the sampling interval in a time series.

A

The time spacing between two samples.

46
Q

Define the sampling frequency of a time series.

A

1 over the sampling interval.
f = 1/dt

47
Q

True or False:
An integral transform is best thought of as a continuous analog to a dot product.

A

True.

Lecture 10 - slide 9

48
Q

A Fourier transform of a time series is analogously a set of ___________________

A

Complex valued coefficients indexed by frequency.

49
Q

True or False:
Fourier transforms are linear operators.

A

True

Lecture 10 - slide 21

50
Q

Convolution is __________ in Fourier space.

A

Multiplication.

Lecture 10 - slide 25
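
Extra Notes: A numerical check of this statement in Python for the discrete case, assuming circular convolution and a naive discrete Fourier transform written out by hand. The sequences are arbitrary examples.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: complex coefficients indexed by frequency k."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


a = [1.0, 2.0, 0.0, -1.0]
b = [0.5, 0.0, 1.0, 0.0]

lhs = dft(circular_convolve(a, b))                  # transform of the convolution
rhs = [fa * fb for fa, fb in zip(dft(a), dft(b))]   # product of the transforms
max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_error)  # close to zero, limited by floating-point rounding
```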

51
Q

True or False:
Periodic extension with the comb function leads to the Fourier series.

A

True

Lecture 10 - slide 48

52
Q

What does “band limited” mean?

A

A signal is said to be band limited if the amplitude of its spectrum goes to zero for all frequencies beyond some threshold called the cutoff frequency.

Lecture 11 - slide 3

53
Q

The Nyquist frequency is calculated as f_N = ________

A

f_N = 1 / (2\Delta t), i.e., half the sampling frequency.
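
Extra Notes: A short Python sketch of the sampling frequency and the Nyquist frequency, with an aliasing check. The sampling interval of 0.01 s and the 70 Hz test signal are assumptions chosen for illustration.

```python
from math import pi, sin

dt = 0.01                      # sampling interval in seconds (assumed)
fs = 1.0 / dt                  # sampling frequency: 100 Hz
f_nyquist = 1.0 / (2.0 * dt)   # Nyquist frequency: 50 Hz

# A 70 Hz sine lies above the Nyquist frequency. Sampled at 100 Hz, its
# samples coincide with those of a sign-flipped 30 Hz sine (aliasing).
times = [k * dt for k in range(20)]
above_nyquist = [sin(2 * pi * 70.0 * t) for t in times]
alias = [-sin(2 * pi * 30.0 * t) for t in times]
max_diff = max(abs(a - b) for a, b in zip(above_nyquist, alias))

print(fs, f_nyquist, max_diff)
```

Frequencies above f_N cannot be distinguished from lower ones after sampling, which is why the signal needs to be band limited below f_N.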