Statistics Theory L7 = Statistics Basics Flashcards

1
Q

Basic principles of statistics? (4)

A
  • PSDI diagram has statistical population parameters (μ, σ, β, δ) & sample statistics (x̄, s, b, cursive s).
  • Not only do we want the estimate, but we also want a measure of how good that estimate is.
  • We’re forced to look at a subset of the population, because the population is large & sometimes vaguely defined.
  • What is the consequence of this “subsettedness” on our ability to get a clear view of the population of interest?
2
Q

Scales of measurement? (4)

A
  • Ratio.
  • Interval.
  • Ordinal.
  • Nominal.
3
Q

Ratio attributes? (3)

A
  • Ratios are meaningful.
  • Zero means absence.
  • Can be continuous data, or discrete/integer.
4
Q

Eg of how ratios are meaningful?

A

20 elephants = 0.5 x 40 elephants.

5
Q

Interval attributes? (3)

A
  • Differences are meaningful.
  • Zero is arbitrary (compare degrees C & F).
  • Can be continuous data, or discrete/integer.
6
Q

Eg of how differences can be meaningful?

A

30 degrees - 10 degrees = 20 degrees.
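
A small sketch of why differences, but not ratios, are meaningful on an interval scale. It uses the standard Celsius-to-Fahrenheit conversion; the function name `c_to_f` is only illustrative.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

low_c, high_c = 10, 30
low_f, high_f = c_to_f(low_c), c_to_f(high_c)

# Differences survive the change of scale (up to the fixed factor 9/5).
diff_c = high_c - low_c   # 20 degrees C
diff_f = high_f - low_f   # 36 degrees F, i.e. 20 * 9/5

# Ratios do not: the zero point is arbitrary, so "3 times as hot" has no meaning.
ratio_c = high_c / low_c  # 3.0
ratio_f = high_f / low_f  # 1.72

print(diff_c, diff_f, ratio_c, ratio_f)
```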

7
Q

Ordinal attributes? (2)

A
  • Are categorical data.
  • Order of categories is meaningful.
8
Q

Eg of Ordinal?

A

Education level: primary, secondary, undergraduate, postgraduate.

9
Q

Nominal attributes? (2)

A
  • Are also categorical data.
  • There’s no inherent order.
10
Q

Egs of Nominal scale of measurement? (2)

A
  • Colour.
  • Sex.
11
Q

Why does the type of data matter? (3)

A
  • Affects the type of analysis we can do.
  • Affects interpretation of the analysis.
  • As we go down in scale (ratio -> nominal), the data contain less information. Therefore, where possible, we want to stick to higher scales.
12
Q

Population?

A

= the entire collection of entities about which we want to make an inference or draw a conclusion.

13
Q

Sample?

A

= subset of the population, drawn because we can’t measure all entities in the population.

14
Q

Simple random sample?

A

= a sample of n entities drawn from a population so that each entity has the same chance of selection.
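
A minimal sketch of drawing a simple random sample with Python's standard library. The population of 100 entity IDs and the seed are made up for illustration.

```python
import random

population = list(range(1, 101))  # hypothetical population of N = 100 entity IDs
rng = random.Random(42)           # fixed seed so the draw is repeatable

# rng.sample draws without replacement; every entity is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sorted(sample))
```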

15
Q

Parameter?

A

= the true value of something in the population that we want to know about (usually Greek letters).

16
Q

Parametric statistics?

A

= statistical methods/models that focus on estimating parameters.

17
Q

Non-parametric statistics?

A

= statistical methods/models that don’t focus on estimating parameters.

18
Q

Statistic?

A

= any value calculated from sample data (usually Latin letters).

19
Q

Statistics?

A

= tools used to make conclusions/inferences about an unknown population from a known sample.

20
Q

Probability?

A

= tools used to make conclusions/inferences about an unknown sample from a known population.

21
Q

Model?

A

= an approximation/simplification of reality for the purpose of improving understanding.

22
Q

Statistical model?

A

= mathematical expression to summarise the relationship between a response (Y) variable & one or more explanatory (X) variables.

23
Q

Kinds of measurement? (2)

A
  • Measures of central tendency/location.
  • Measures of dispersion.
24
Q

Measures of central tendency/location? (7)

A
  • Mean.
  • Median.
  • Quantile (percentile).
  • Mode.
  • Symmetrical, unimodal distribution.
  • Skew.
  • Kurtosis.
25
Q

Types of Mean? (2)

A
  • Population mean.
  • Sample mean/sample average.
26
Q

Population mean (μ) equation?

A

$\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i$

27
Q

Sample mean (ȳ) equation?

A

$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
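
The population and sample mean formulas can be sketched in a few lines. The numbers are made up, and the "sample" here is just every second value, chosen for readability rather than drawn at random.

```python
population = [2, 4, 4, 5, 7, 9, 10, 12, 15, 22]  # made-up values, N = 10
sample = population[::2]                          # illustrative subset, n = 5 (not a random sample)

mu = sum(population) / len(population)  # population mean: (1/N) * sum of Y_i
y_bar = sum(sample) / len(sample)       # sample mean: (1/n) * sum of Y_i

print(mu, y_bar)
```

The sample mean is an estimate of the population mean, and in general the two will not be equal.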

28
Q

Median?

A

= middle value in an ordered/ranked/sorted sample.

29
Q

Median attribute?

A

The 50th percentile.

30
Q

Quantile (percentile)?

A

= a division of sorted data by the percentage of observations occurring below it.

31
Q

Quantile (percentile) attributes? (2)

A
  • Quartiles: cut points that split the sorted data into four parts (0-25%, 25-50%, 50-75%, 75-100%).
  • Interquartile range (IQR): the middle 50% of the data (25-75%).
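
A sketch of the median, quartiles and IQR using Python's `statistics` module, with made-up data. Note that software packages use slightly different quantile conventions, so quartiles from other tools may differ a little from the default method used here.

```python
import statistics

data = [7, 1, 10, 3, 5, 11, 2, 9, 4, 8, 6]  # made-up observations, n = 11

median = statistics.median(data)              # middle value of the sorted data
q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                                 # width of the middle 50% of the data

print(median, q1, q2, q3, iqr)
```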
32
Q

Mode?

A

= most frequently occurring observation or grouping of observations in a sample.

33
Q

Symmetrical, unimodal distribution?

A

= when the mean=median=mode.

34
Q

Skew?

A

= when there is asymmetry in the distribution.

35
Q

Types of Skew? (2)

A
  • Positive skew.
  • Negative skew.
36
Q

Positive skew attributes? (2)

A
  • Long tail to the right.
  • Mode < Median < Mean.
37
Q

Negative skew attributes? (2)

A
  • Long tail to the left.
  • Mode > Median > Mean.
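
A quick check of the skew orderings on two made-up samples. The second sample is simply a mirror image of the first.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]  # long tail to the right
left_skewed = [20 - y for y in right_skewed]    # mirror image: long tail to the left

def summary(values):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)

pos_mode, pos_median, pos_mean = summary(right_skewed)
neg_mode, neg_median, neg_mean = summary(left_skewed)

print(pos_mode, pos_median, pos_mean)  # mode < median < mean
print(neg_mode, neg_median, neg_mean)  # mode > median > mean
```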
38
Q

Types of Kurtosis? (3)

A
  • Leptokurtic.
  • Mesokurtic.
  • Platykurtic.
39
Q

Leptokurtic?

A

= narrow, sharply peaked middle on the distribution graph/curve, with heavier tails.

40
Q

Mesokurtic?

A

= like the normal distribution; in between leptokurtic & platykurtic.

41
Q

Platykurtic?

A

= flat-ish, broad middle with lighter tails (but not perfectly evenly spread out).

42
Q

Measures of dispersion? (5)

A
  • Variance.
  • Sum of squared deviations/errors.
  • Sample variance.
  • Standard deviation.
  • Coefficient of variation.
43
Q

Types of variance? (2)

A
  • Population variance.
  • Sample variance.
44
Q

Population variance (σ²) equation?

A

$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2$

45
Q

Sample variance (s²) equation?

A

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{y})^2$

46
Q

Why the -1 in the sample variance equation?

A

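Corrects bias in the estimate: deviations are measured from the sample mean rather than the true mean, so dividing by n would tend to underestimate the population variance. Dividing by n - 1 (the degrees of freedom) compensates.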
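
A small simulation sketch of the effect of the -1. The population (faces of a die), the seed, the sample size and the number of repetitions are all arbitrary choices for illustration.

```python
import random
import statistics

population = [1, 2, 3, 4, 5, 6]            # made-up population, e.g. faces of a die
sigma2 = statistics.pvariance(population)  # true population variance (divides by N)

rng = random.Random(0)                     # fixed seed for repeatability
n, reps = 5, 20000
divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(reps):
    sample = rng.choices(population, k=n)       # draw n values with replacement
    y_bar = sum(sample) / n
    ss = sum((y - y_bar) ** 2 for y in sample)  # sum of squared deviations
    divide_by_n.append(ss / n)
    divide_by_n_minus_1.append(ss / (n - 1))

avg_biased = statistics.fmean(divide_by_n)            # tends to undershoot sigma2
avg_unbiased = statistics.fmean(divide_by_n_minus_1)  # tends to land close to sigma2

print(sigma2, avg_biased, avg_unbiased)
```

Averaged over many samples, the divide-by-n version should sit near (n - 1)/n of the true variance, while the n - 1 version should sit near the true variance itself.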

47
Q

Sum of squared deviations/errors equation?

A

$\sum_{i=1}^{n} (Y_i - \bar{y})^2$

48
Q

Why the squaring in the Sum of squared deviations/errors equation?

A

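Squaring makes every deviation positive, so negative & positive deviations don’t cancel each other out (unsquared deviations from the mean always sum to zero).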
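
A tiny worked example with made-up numbers, showing that raw deviations cancel while squared deviations do not.

```python
sample = [4, 8, 6, 2]                     # made-up sample
y_bar = sum(sample) / len(sample)         # 5.0

deviations = [y - y_bar for y in sample]  # [-1.0, 3.0, 1.0, -3.0]
sum_of_deviations = sum(deviations)       # 0.0: positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)  # 20.0

print(sum_of_deviations, sum_of_squares)
```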

49
Q

Sample variance AKA? (3)

A
  • Mean squared error.
  • Mean squared residual.
  • Mean squared deviation.
50
Q

Standard deviation?

A

= the typical (roughly average) deviation or difference between each observation & the mean; the square root of the variance.

51
Q

Population standard deviation?

A

σ = square root of σ².

52
Q

Sample standard deviation?

A

s = square root of s².

53
Q

Coefficient of variation (CV)?

A

= variability expressed relative to the mean; used for comparing variability between samples when their means differ a lot.

54
Q

Coefficient of variation (CV) equation?

A

$CV = \frac{s}{\bar{y}} \times 100$
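
A sketch of the CV on two made-up groups whose means differ enormously. The body masses are invented for illustration, and the helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

# Made-up body masses (kg) for two hypothetical groups with very different means.
mice = [0.018, 0.020, 0.022, 0.025, 0.019]
elephants = [4800, 5200, 5500, 6000, 5000]

def coefficient_of_variation(values):
    """CV = s / y-bar * 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_mice = coefficient_of_variation(mice)
cv_elephants = coefficient_of_variation(elephants)

print(round(cv_mice, 2), round(cv_elephants, 2))
```

Here the elephants have by far the larger standard deviation in kg, yet the mice are more variable relative to their mean.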

55
Q

NB!! of the Measures of dispersion?

A

There’s a difference between standard deviation & standard error.

56
Q

Graphing attributes? (2)

A
  • Help us to assess our data sets & their distributions.
  • Reveal mistakes/problems in data entry, or interesting patterns that aren’t apparent in a numerical analysis.
57
Q

Graphical methods? (3)

A
  • Relative frequency histogram.
  • Stem-and-leaf plot.
  • Box-and-whisker plot.
58
Q

Relative frequency histogram attributes? (3)

A
  • X-axis = data values broken into bins or categories.
  • Y-axis = proportion (relative frequency) of observations in each bin.
  • Area (height) of each bar is proportional to the number of observations.
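
A text-only sketch of a relative frequency histogram. The data and the bin width of 1 are made up for illustration.

```python
from collections import Counter

data = [1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 3.6, 4.2, 4.7, 5.5]  # made-up observations
bin_width = 1.0

# Assign each value to the bin [k, k + 1) by flooring.
counts = Counter(int(value // bin_width) for value in data)
rel_freq = {k: counts[k] / len(data) for k in sorted(counts)}

for k, freq in rel_freq.items():
    print(f"[{k}, {k + 1}): {'#' * counts[k]} {freq:.1f}")
```

The relative frequencies sum to 1, which is what distinguishes this from a plain count histogram.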
59
Q

Stem-and-leaf plot attributes? (4)

A
  • Useful for getting a distribution & comparing two distributions.
  • Identify outliers.
  • Can see every observation.
  • Useful for quick assessment of distribution in the field.
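
A minimal stem-and-leaf sketch for two-digit data, with made-up observations. The tens digit is used as the stem and the units digit as the leaf, which is one common convention.

```python
from collections import defaultdict

data = [12, 15, 21, 23, 23, 27, 30, 34, 38, 41, 68]  # made-up two-digit observations

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, units digit is the leaf
    stems[stem].append(leaf)

# Include empty stems so gaps (and possible outliers) stay visible.
lines = [f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}"
         for stem in range(min(stems), max(stems) + 1)]
print("\n".join(lines))
```

Every observation can be read back from the plot, and the empty stem before 68 makes that value stand out as a possible outlier.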
60
Q

Standard deviation VS Standard error?

A
  • Standard deviation
    = spread of the data in the sample.
  • Standard error
    = spread of the sampling distribution of the mean.
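
A short sketch of the distinction, using the usual estimate of the standard error of the mean, s divided by the square root of n. That formula is not stated in these cards and the sample is made up.

```python
import math
import statistics

sample = [4, 8, 6, 2, 5, 7, 3, 5]  # made-up sample, n = 8
n = len(sample)

s = statistics.stdev(sample)  # spread of the individual observations
se = s / math.sqrt(n)         # estimated spread of the sample mean

print(s, round(se, 4))
```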
61
Q

Box-and-whisker plot attributes? (5)

A
  • Help us assess spread, skewness, outliers & to compare groups.
  • Whole box = IQR (25-75%).
  • Dark line in box = Median.
  • Outlier = extreme observation.
  • Positive skew = box sits at the bottom (low values), with a longer upper whisker.
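
A sketch of the numbers behind a box-and-whisker plot, with made-up data. Flagging outliers beyond 1.5 × IQR from the box is a common plotting convention assumed here; the cards do not specify a rule.

```python
import statistics

data = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 30]  # made-up, positively skewed sample

q1, median, q3 = statistics.quantiles(data, n=4)  # box edges and the line inside the box
iqr = q3 - q1                                     # height of the whole box

# Assumed convention: points more than 1.5 * IQR beyond the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

print(q1, median, q3, iqr, outliers)
```

The median lies closer to the lower edge of the box than to the upper edge, which is the box plot signature of positive skew.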
62
Q

Things to note on Statistics in terms of histograms? (2)

A
  • Large sample = what we want: narrower distribution of sample means, low SE, high precision.
  • Small sample = what we don’t want: broader distribution of sample means, high SE, low precision.
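
A simulation sketch of the same point. The population (normal, mean 50, SD 10), the seed, the sample sizes and the helper name `spread_of_sample_means` are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(1)                                  # fixed seed for repeatability
population = [rng.gauss(50, 10) for _ in range(10000)]  # made-up population

def spread_of_sample_means(n, reps=2000):
    """Standard deviation of the means of `reps` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

se_small = spread_of_sample_means(5)    # broad distribution of sample means
se_large = spread_of_sample_means(100)  # narrow distribution of sample means

print(round(se_small, 2), round(se_large, 2))
```

With these settings the spread for n = 5 should come out at roughly 10/√5 and the spread for n = 100 at roughly 10/√100, so the larger sample gives the narrower histogram of means.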
63
Q

Main lesson under Statistics basics?

A

Statistical inference deals with uncertainty or variability, in the data & in the things we try to estimate with data.

64
Q

What is the purpose of SE?

A

To show how much the sample mean is likely to differ from the true population mean (precision).