Module 1: Descriptive Statistics & Estimation Flashcards

1
Q

Define sample.

A

A sample is a subset of a population (ideally a representative one) that is measured in place of the whole population, which makes statistical analysis practical. However, sampling introduces uncertainty, because a sample is never a perfect representation of the entire population.

2
Q

Define statistics.

A

Statistics is the study of methods for measuring aspects of populations from samples and for quantifying the uncertainty of these measurements.

3
Q

Define estimation.

A

Estimation is the process of inferring an unknown quantity of a target population using sample data.

4
Q

Define population parameter.

A

Parameters are quantities that describe some truth about a population, such as averages and proportions. Statistics involves the estimation of these parameters.

5
Q

Define sample statistic.

A

A sample statistic is any value calculated from a sample.

6
Q

Define statistical population.

A

A population is the entire collection of individual units of interest.

7
Q

Define sampling unit.

A

A sampling unit is the basic unit on which the data points of a sample are defined. It may be a single individual, or a group of individuals that is sampled together.

8
Q

Define sampling error.

A

Sampling error refers to the chance difference between an estimate describing a sample and the corresponding parameter of the whole population.

9
Q

What is the goal of sampling?

A

Sampling aims to produce estimates that are as precise and accurate as possible, and to make it possible to quantify how precise and accurate those estimates are.

10
Q

Define precision.

A

Precision refers to how consistent estimates are when compared to each other. Larger samples are less affected by chance, therefore allowing for more precise estimates.

11
Q

Define accuracy.

A

Accuracy refers to how close estimates are to the true population characteristic. Unbiased estimates are more accurate, because they are not systematically shifted away from the true value.

12
Q

Define bias.

A

Bias is a systematic discrepancy between estimates obtained repeatedly and the true population characteristic.

13
Q

Define random sample.

A

Random samples contain individuals of a population that have had an equal and independent chance of being selected. The use of random samples minimizes bias and allows the sampling error to be measured.

14
Q

Define independent observation.

A

Observations are independent when knowing one value in a sample tells you nothing about any other value in that sample.

15
Q

Define sample of convenience.

A

A sample of convenience is a collection of individuals that are easily available to the researcher. Such samples are very likely to be biased, and their individuals are often not selected independently. A sample of convenience could happen to behave like a random sample, but it usually does not.

16
Q

Define volunteer bias and provide examples.

A

Volunteer bias is a systematic difference between a volunteer sample and its population, which occurs when the behaviour of subjects affects their chance of being sampled.

ex. sicker people volunteering for medical studies, more sexually liberal people volunteering for sex studies, animals that are more willing to enter traps being caught more often

17
Q

Define variables.

A

Variables are any characteristics or measurements that differ across a group of individuals. Estimates are technically variables, since they differ across samples.

18
Q

Define descriptive statistics.

A

Descriptive statistics are quantities that capture important features of frequency distributions, which is done by measuring the location and spread of a given distribution.

19
Q

Define cumulative frequency distribution.

A

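A cumulative frequency distribution shows, for each value, the fraction (or number) of observations in the sample that are less than or equal to that value. Drawn as a graph, it displays all the quantiles of the sample at once.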
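
A minimal Python sketch of this definition, assuming the data are held in a plain list: sort the distinct values and, for each one, record the fraction of observations less than or equal to it. The function name and sample values are hypothetical.

```python
from collections import Counter

def cumulative_relative_frequency(data):
    """Return (value, fraction of observations <= value) for each distinct value."""
    n = len(data)
    counts = Counter(data)          # how many times each value occurs
    running = 0
    table = []
    for value in sorted(counts):    # walk the distinct values from smallest to largest
        running += counts[value]    # observations <= this value so far
        table.append((value, running / n))
    return table

sample = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical data
for value, fraction in cumulative_relative_frequency(sample):
    print(value, fraction)
```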

20
Q

Define standard error.

A

The standard error of an estimate is the standard deviation of the estimate’s sampling distribution, which means it reflects the precision of an estimate.

21
Q

Define sampling distribution.

A

A sampling distribution is the probability distribution of all the values for an estimate that could possibly have been obtained.

22
Q

Define probability distribution.

A

A probability distribution describes the distribution of a variable across an entire population. Since the true probability distribution is almost never known in nature, it is usually approximated by a theoretical distribution, such as the normal distribution (bell curve).

23
Q

Define frequency distribution.

A

A frequency distribution describes how many times each value occurs within a sample.
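
As a small illustration, Python's `collections.Counter` tallies how many times each value occurs, which is exactly a frequency distribution. The nest-count data below are hypothetical.

```python
from collections import Counter

# Hypothetical sample: number of eggs counted in 10 nests
sample = [2, 3, 3, 4, 2, 3, 5, 4, 3, 2]

freq = Counter(sample)  # maps each value to the number of times it occurs
for value in sorted(freq):
    print(value, freq[value])
```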

24
Q

Define confidence interval.

A

A confidence interval is a range of values surrounding the sample estimate that is likely to contain the population parameter. The values beyond the lower and upper limits are less plausible values for the parameter.

25
Q

What is the difference between explanatory and response variables?

A

Explanatory (independent) variables predict or affect response (dependent) variables. In experiments, the explanatory variable is manipulated to measure the effect on the response variable. There may be more than one of each variable in more complex experiments.

26
Q

What is the difference between experimental and observational studies?

A

Experimental studies involve the researcher assigning treatments to subjects, i.e. manipulating the explanatory variable (ex. clinical trial), which allows cause-and-effect relationships to be determined. Observational studies can only point to associations, since the researcher does not control which individuals end up in which treatment group.

27
Q

What are the properties of good samples?

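A

Good samples are random, so that every individual has an equal and independent chance of being selected; this minimizes bias (improving accuracy) and allows sampling error to be quantified. Good samples are also large enough to give precise estimates, and their observations are independent of one another.
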
28
Q

How are random samples obtained?

A

Every individual within the population is assigned a number; a computer's random number generator is then used to select n of these numbers, and the corresponding individuals form the sample.
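
A minimal sketch of this procedure in Python, assuming a hypothetical population of 500 numbered individuals. `random.Random.sample` draws n distinct individuals without replacement, with every individual equally likely to be chosen; the seed is fixed only so the example is reproducible.

```python
import random

# Hypothetical population: 500 individuals, each assigned a number
population = [f"individual_{i}" for i in range(1, 501)]
n = 20  # desired sample size

rng = random.Random(42)            # seeded only for reproducibility
sample = rng.sample(population, n) # n distinct individuals, drawn without replacement
print(sample)
```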

29
Q

How are types of data classified?

A

Data may be categorical (qualitative) or numerical (quantitative). If data is categorical, it may be nominal (categories with no inherent order) or ordinal (categories with a natural order). If data is numerical, it may be discrete (indivisible units, such as counts) or continuous (any value within a range).

30
Q

How can you distinguish between frequency, probability, and sampling distributions?

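A

A frequency distribution describes how often each value occurs in a sample (the data actually collected). A probability distribution describes the distribution of a variable across an entire population. A sampling distribution is the probability distribution of an estimate (such as the sample mean): all the values it could take across the possible samples from the population.
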
31
Q

How do you calculate variance?

A

Variance (s²) is calculated by squaring each deviation from the mean, adding these squared deviations together (∑(Y-Ybar)²), then dividing this sum by n-1.

32
Q

How do you calculate residual deviation?

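A

A residual (deviation) is the difference between an observation and the sample mean: Y-Ybar. Observations below the mean have negative residuals and those above have positive residuals, so the residuals of a sample always sum to zero.
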
33
Q

How do you calculate sums of squares?

A

The sum of squares of Y is calculated by squaring each value's deviation from the mean and adding these squared deviations together: ∑(Y-Ybar)².

34
Q

How do you calculate standard deviation?

A

Standard deviation (s) measures how far observations typically are from the mean. It is calculated by taking the square root of the variance (s = √s²).
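
The chain of calculations in these cards (mean, residuals, sum of squares, variance, standard deviation) can be sketched in Python as follows. The function name and data are hypothetical; the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n-1, are used only as a cross-check.

```python
import math
import statistics

def spread_summary(data):
    n = len(data)
    mean = sum(data) / n                      # Ybar = ∑Y / n
    residuals = [y - mean for y in data]      # deviations Y - Ybar
    ss = sum(r ** 2 for r in residuals)       # sum of squares ∑(Y - Ybar)²
    variance = ss / (n - 1)                   # s² = SS / (n - 1)
    sd = math.sqrt(variance)                  # s = √s²
    return mean, ss, variance, sd

data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical data
mean, ss, variance, sd = spread_summary(data)

# Cross-check against the standard library
print(variance, statistics.variance(data))
print(sd, statistics.stdev(data))
```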

35
Q

How do you calculate arithmetic sample mean?

A

The sample mean (Ybar) is the average of the measurements in the sample. It is calculated by dividing the sum of all individual values (∑Y) by the number of individuals (n).

36
Q

How do you calculate coefficient of variation?

A

The coefficient of variation (CV) is the standard deviation expressed as a percentage of the mean. To calculate CV, divide the standard deviation (s) by the mean (Ybar) and multiply by 100%. Because CV has no units, it allows the variability of measurements to be compared even when they are in different units or have very different means.
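
A minimal sketch of the CV calculation, assuming hypothetical mass and length measurements and a positive mean. Because the units cancel, converting the masses from grams to kilograms leaves the CV unchanged.

```python
import statistics

def coefficient_of_variation(data):
    """CV = 100% * s / Ybar (assumes the mean is positive)."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

# Hypothetical measurements on the same five animals, in different units
body_mass_g = [120, 135, 150, 128, 142]
body_length_cm = [10.2, 10.8, 11.5, 10.5, 11.0]

print(coefficient_of_variation(body_mass_g))
print(coefficient_of_variation(body_length_cm))
```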

37
Q

How do you calculate interquartile range?

A

Interquartile range is the span of the middle half of the data, from the first quartile to the third quartile; it is calculated by subtracting the first quartile from the third quartile. To find each quartile, sort the data and compute j = 0.25n (first quartile) or j = 0.75n (third quartile). If j is an integer, the quartile is the average of the j-th and (j+1)-th ordered observations. If j is not an integer, the quartile is the observation at position j rounded up.
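
The quartile rule above can be sketched in Python as follows. This is a minimal sketch that assumes 0 < q < 1; the function names are hypothetical, and other software (for example, NumPy's `percentile`, which interpolates linearly by default) uses different rules, so its results can differ slightly.

```python
import math

def quantile(data, q):
    """Quantile using j = q * n on the sorted data (assumes 0 < q < 1)."""
    y = sorted(data)
    n = len(y)
    j = q * n
    if math.isclose(j, round(j)):
        j = round(j)
        # j is an integer: average the j-th and (j+1)-th ordered values (1-based positions)
        return (y[j - 1] + y[j]) / 2
    # j is not an integer: take the value at position j rounded up (1-based position)
    return y[math.ceil(j) - 1]

def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    return quantile(data, 0.75) - quantile(data, 0.25)

print(iqr(list(range(1, 11))))
```

For example, with the data 1 to 10 (n = 10), j = 2.5 for the first quartile, which rounds up to position 3, so Q1 = 3; j = 7.5 for the third quartile gives Q3 = 8, and the IQR is 5.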

38
Q

How do you calculate median?

A

The median is the middle observation in a sorted set of data. If n is odd, it is the observation at position (n+1)/2; if n is even, it is the average of the observations at positions n/2 and (n/2)+1.
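
A minimal Python sketch of the two cases, converting the 1-based positions in the card to Python's 0-based indices. The function name is hypothetical; the standard library's `statistics.median` applies the same rule.

```python
def median(data):
    y = sorted(data)
    n = len(y)
    if n % 2 == 1:
        return y[(n + 1) // 2 - 1]           # position (n+1)/2, as a 0-based index
    return (y[n // 2 - 1] + y[n // 2]) / 2   # average of positions n/2 and n/2 + 1

print(median([7, 1, 5, 3, 9]))
print(median([4, 1, 3, 2]))
```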

39
Q

How do you calculate percentiles?

A

The percentile of a measurement specifies the percentage of observations less than or equal to it.
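
A minimal sketch of this definition, using hypothetical scores: count the observations less than or equal to the measurement and express that count as a percentage of n.

```python
def percentile_of(value, data):
    """Percentage of observations less than or equal to `value`."""
    return 100 * sum(1 for y in data if y <= value) / len(data)

scores = [55, 62, 70, 70, 78, 81, 85, 90, 94, 99]  # hypothetical data
print(percentile_of(81, scores))  # 6 of 10 observations are <= 81
```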

40
Q

How do you calculate quantiles?

A

Quantiles express the same measurement as percentiles, but as proportions instead of percentages (e.g. the 0.25 quantile is the 25th percentile). To calculate a quantile, sort the data and multiply n by the target quantile (ex. 0.25) to get j. If j is an integer, the quantile is the average of the j-th and (j+1)-th ordered observations. If j is not an integer, the quantile is the observation at position j rounded up.

41
Q

How do you calculate standard error of the mean?

A

Standard error of the mean is equal to the sample standard deviation (s) divided by the square root of n: SE = s/√n.
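
A minimal Python sketch of SE = s/√n with hypothetical data. `statistics.stdev` computes the sample standard deviation with n-1 in the denominator.

```python
import math
import statistics

def standard_error_of_mean(data):
    s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(data))  # SE = s / √n

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data
print(standard_error_of_mean(data))
```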

42
Q

How are residuals, sums of squares, variances, and standard deviations related to one another?

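A

Residuals (Y-Ybar) are squared and summed to give the sum of squares. Dividing the sum of squares by n-1 gives the variance, and taking the square root of the variance gives the standard deviation, which is back in the original units of the data.
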
43
Q

What are the advantages and disadvantages of residuals?

A
44
Q

What are the advantages and disadvantages of sums of squares?

A
45
Q

What are the advantages and disadvantages of variances?

A
46
Q

What are the advantages and disadvantages of standard deviations?

A
47
Q

How are Greek and Roman letters used to distinguish between population parameters and sample statistics?

A

Sample statistics are written in Roman letters, whereas population parameters are denoted by Greek letters.

48
Q

What happens to the precision of the sampling distribution as the sample size increases?

A

Increasing sample size reduces the spread of the sampling distribution, which means precision increases.
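
This can be illustrated with a small simulation, assuming a hypothetical normal population with mean 50 and standard deviation 10. For each sample size, many samples are drawn and the standard deviation of their means is computed; this approximates the standard error, which should be close to 10/√n and shrinks as n grows.

```python
import random
import statistics

def sd_of_sample_means(n, reps=2000, seed=1):
    """Standard deviation of `reps` simulated sample means, each from a sample of size n
    drawn from a hypothetical normal population with mean 50 and SD 10."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(50, 10) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)  # approximates the standard error of the mean

for n in (5, 20, 80):
    # simulated spread of the sampling distribution vs. the theoretical 10 / √n
    print(n, round(sd_of_sample_means(n), 2), round(10 / n ** 0.5, 2))
```

With these settings, each fourfold increase in n should roughly halve the spread of the sampling distribution.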