Statistics Theory L7 = Statistics Basics Flashcards by Palesa Mamabolo

Basic principles of statistics? (4)

PSDI diagram has statistical population parameters (μ, σ, β, δ) & sample statistics (x̄, s, b, cursive s).
Not only do we want the estimate, but we also want a measure of how good that estimate is.
We’re forced to look at a subset of the population, because the population is large & sometimes vaguely defined.
What is the consequence of this “subsettedness” on our ability to get a clear view of the population of interest?

Scales of measurement? (4)

Ratio.
Interval.
Ordinal.
Nominal.

Ratio attributes? (3)

Ratios are meaningful.
Zero means absence.
Can be continuous data, or discrete/integer.

Eg of how ratios are meaningful?

20 elephants = 0.5 x 40 elephants.

Interval attributes? (3)

Differences are meaningful.
Zero is arbitrary (compare degree C & F).
Can be continuous data, or discrete/integer.

Eg of how differences can be meaningful?

30 degrees - 10 degrees = 20 degrees.
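
A minimal Python sketch of why differences are meaningful on an interval scale while ratios are not. The temperatures are the ones from the card; the conversion helper and variable names are illustrative only.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

low_c, high_c = 10.0, 30.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

# Differences are meaningful: the same gap, just expressed in different units.
diff_c = high_c - low_c   # 20 degrees C
diff_f = high_f - low_f   # 36 degrees F

# Ratios are not: the value depends on where the arbitrary zero sits.
ratio_c = high_c / low_c  # 3.0
ratio_f = high_f / low_f  # 1.72

print(diff_c, diff_f, ratio_c, round(ratio_f, 2))
```

Because the "ratio" changes from 3.0 to 1.72 with the choice of unit, saying 30 degrees is "three times as hot" as 10 degrees has no meaning on an interval scale.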

Ordinal attributes? (2)

Are categorical data.
Order of categories is meaningful.

Eg of Ordinal?

Education level: primary, secondary, undergraduate, postgraduate.

Nominal attributes? (2)

Are also categorical data.
There’s no inherent order.

Egs of Nominal scale of measurement? (2)

Colour.
Sex.

Why does the type of data matter? (3)

Affects the type of analysis we can do.
Affects interpretation of the analysis.
As we go down in scale (ratio -> nominal), the data contain less information. Therefore, where possible, we want to stick to higher scales.

Population?

= the entire collection of entities about which we want to make an inference or draw a conclusion.

Sample?

= subset of the population, drawn because we can’t measure all entities in the population.

Simple random sample?

= a sample of n entities drawn from a population so that each entity has the same chance of selection.
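
A minimal sketch of drawing a simple random sample with Python's standard library. The population of 100 numbered entities, the sample size, and the seed are all hypothetical, chosen only for illustration.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 labelled entities
rng = random.Random(42)           # fixed seed only so the sketch is repeatable

# Sampling without replacement: every entity has the same chance of selection.
sample = rng.sample(population, k=10)

print(sorted(sample))
```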

Parameter?

= the true value of something in the population that we want to know about (usually Greek letters).

Parametric statistics?

= statistical methods/models that focus on estimating parameters.

Non-parametric statistics?

= statistical methods/models that don’t focus on estimating parameters.

Statistic?

= any value calculated from sample data (usually Latin letters).

Statistics?

= tools used to make conclusions/inferences about an unknown population from a known sample.

Probability?

= tools used to make conclusions/inferences about an unknown sample from a known population.

Model?

= an approximation/simplification of reality for the purpose of improving understanding.

Statistical model?

= mathematical expression to summarise the relationship between a response (Y) variable & one or more explanatory (X) variables.
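
As one illustration (the card does not name a specific model, so this is an assumed example), a simple linear model with a single explanatory variable could be written as:

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
```

Here $\beta_0$ and $\beta_1$ are population parameters, estimated from the sample by statistics such as $b_0$ and $b_1$, and $\varepsilon_i$ is the unexplained variation in observation $i$.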

Kinds of measurement? (2)

Measures of central tendency/location.
Measures of dispersion.

Measures of central tendency/location? (7)

Mean.
Median.
Quantile (percentile).
Mode.
Symmetrical, unimodal distribution.
Skew.
Kurtosis.

Types of Mean? (2)

Population mean.
Sample mean/sample average.

Population mean (μ) equation?

$\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i$

Sample mean (ȳ) equation?

$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
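
A small sketch of the two mean equations in Python. The six-value "population" and the three-value sample taken from it are made-up numbers; in practice the whole population is rarely available.

```python
population = [4, 8, 15, 16, 23, 42]  # hypothetical population, N = 6
sample = [8, 16, 42]                 # hypothetical sample from it, n = 3

# Population mean: sum over all N entities divided by N.
mu = sum(population) / len(population)

# Sample mean: same formula, applied to the n sampled entities only.
y_bar = sum(sample) / len(sample)

print(mu, y_bar)
```

The sample mean (22.0) differs from the population mean (18.0); that gap is the uncertainty statistical inference has to quantify.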

Median?

= middle value in an ordered/ranked/sorted sample.

Median attribute?

The 50th percentile.

Quantile (percentile)?

= a division of sorted data by the percentage of observations occurring below it.

Quantile (percentile) attributes? (2)

Quartiles: split the sorted data into four parts (<25%, 25-50%, 50-75%, 75-100%).
Interquartile range (IQR): the middle 50% of the data, between the 25th & 75th percentiles.
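
A minimal sketch of the median, quartiles, and IQR using Python's `statistics` module. The data are hypothetical. Several quartile conventions exist and give slightly different answers; the `"inclusive"` method used here is just one choice.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample, already sorted

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
print(median(data))  # the median is the same value as the second quartile
```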

Mode?

= most frequently occurring observation or grouping of observations in a sample.
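
A short sketch of finding the mode with Python's `statistics` module, using made-up data. `multimode` is useful when more than one value ties for most frequent.

```python
from statistics import mode, multimode

data = [2, 3, 3, 5, 7, 7, 7, 9]  # hypothetical sample
most_common = mode(data)         # single most frequent value

bimodal = [1, 1, 2, 2, 3]        # two values tie for most frequent
all_modes = multimode(bimodal)   # returns every tied value, in first-seen order

print(most_common, all_modes)
```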

Symmetrical, unimodal distribution?

= a distribution in which mean = median = mode.

Skew?

= when there is asymmetry in the distribution.

Types of Skew? (2)

Positive skew.
Negative skew.

Positive skew attributes? (2)

Long tail to the right.
Mode < Median < Mean.

Negative skew attributes? (2)

Long tail to the left.
Mode > Median > Mean.
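
A quick sketch checking the positive-skew ordering on a made-up sample with a long right tail. The ordering is a common rule of thumb and is not guaranteed for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical sample: most values are small, a few large ones stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

sample_mode = mode(right_skewed)      # 2
sample_median = median(right_skewed)  # 3.0
sample_mean = mean(right_skewed)      # 5.1

print(sample_mode, sample_median, sample_mean)
```

The large values in the tail pull the mean upward far more than they move the median or the mode.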

Types of Kurtosis? (3)

Leptokurtic.
Mesokurtic.
Platykurtic.

Leptokurtic?

= narrow, sharply peaked middle on the distribution graph/curve (heavier tails than a normal distribution).

Mesokurtic?

= peakedness like that of a normal distribution (in the middle, between leptokurtic & platykurtic).

Platykurtic?

= flat-ish but not completely flat (more evenly spread out than a normal distribution).

Measures of dispersion? (5)

Variance.
Sum of squared deviations/errors.
Sample variance.
Standard deviation.
Coefficient of variation.

Types of variance? (2)

Population variance.
Sample variance.

Population variance (σ²) equation?

$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2$

Sample variance (s²) equation?

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{y})^2$

Why the -1 in the sample variance equation?

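Reduces bias in the estimate. Deviations are measured from the sample mean rather than the true population mean, so dividing by n would tend to underestimate σ²; dividing by n - 1 corrects for this.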
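
A minimal sketch comparing the two divisors on the same made-up data, using Python's `statistics` module (`pvariance` divides by N, `variance` divides by n - 1).

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
y_bar = mean(data)

# Sum of squared deviations from the mean.
sum_of_squares = sum((y - y_bar) ** 2 for y in data)

pop_var = sum_of_squares / len(data)           # divide by N
sample_var = sum_of_squares / (len(data) - 1)  # divide by n - 1

# The library functions give the same answers as the manual formulas.
print(pop_var, pvariance(data))
print(sample_var, variance(data))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the difference shrinks as the sample grows.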

Sum of squared deviations/errors equation?

$\sum_{i=1}^{n} (Y_i - \bar{y})^2$

Why the squaring in the Sum of squared deviations/errors equation?

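Squaring makes every deviation positive, so negative & positive deviations don't cancel out (the raw deviations from the mean always sum to zero).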

Sample variance AKA? (3)

Mean squared error.
Mean squared residual.
Mean squared deviation.

Standard deviation?

= roughly the typical deviation or difference between each observation & the mean (the square root of the variance).

Population standard deviation?

σ = square root of σ².

Sample standard deviation?

s = square root of s².

Coefficient of variation (CV)?

= the standard deviation expressed as a percentage of the mean; used for comparing variability between samples when their means differ a lot.

Coefficient of variation (CV) equation?

$CV = \frac{s}{\bar{y}} \times 100$
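
A small sketch of the CV applied to two made-up samples whose means differ a lot. The function name and the data are illustrative assumptions.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

small_scale = [10, 12, 14]        # hypothetical sample: mean 12, s = 2
large_scale = [1000, 1020, 1040]  # hypothetical sample: mean 1020, s = 20

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(round(cv_small, 2), round(cv_large, 2))
```

The second sample has the larger standard deviation (20 vs 2), yet relative to its mean it is far less variable, which is what the CV shows.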

NB!! of the Measures of dispersion?

There's a difference between standard deviation & standard error.

Graphing attributes? (2)

Help us to assess our data sets & their distributions.
Reveal mistakes/problems in data entry, or interesting patterns that aren't apparent in a numerical analysis.

Graphical methods? (3)

Relative frequency histogram.
Stem-and-leaf plot.
Box-and-whisker plot.

Relative frequency histogram attributes? (3)

X-axis = data values broken into bins or categories.
Y-axis = number or relative frequency (proportion) of observations in each bin.
Area (height) of each bar is proportional to the number of observations.
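
A minimal text-only sketch of a relative frequency histogram in Python. The data and the bin width of 2 are assumptions made for illustration.

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical observations
bin_width = 2                          # assumed bin width

# Label each observation with the lower edge of its bin, then count per bin.
counts = Counter((x // bin_width) * bin_width for x in data)

# Relative frequency = proportion of all observations that fall in each bin.
relative_frequency = {start: counts[start] / len(data) for start in sorted(counts)}

for start, freq in relative_frequency.items():
    bar = "#" * counts[start]  # bar length proportional to the bin count
    print(f"[{start}, {start + bin_width}): {bar} {freq:.1f}")
```

The relative frequencies sum to 1, and the isolated bar for the value 9 is the kind of feature (possible outlier or data-entry mistake) a graph makes easy to spot.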

Stem-and-leaf plot attributes? (4)

Useful for getting a distribution & comparing two distributions.
Identify outliers.
Can see every observation.
Useful for quick assessment of distribution in the field.

Standard deviation VS Standard error?

Standard deviation = width (spread) of the data in the sample.
Standard error = width (spread) of the sampling distribution of the mean.
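
A short sketch of the distinction, using the usual estimate of the standard error of the mean, SE = s / √n. That formula is not stated on the cards and the data are made up.

```python
import math
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8

s = stdev(data)                  # standard deviation: spread of the observations
se = s / math.sqrt(len(data))    # standard error: spread of the sample mean

print(round(s, 3), round(se, 3))
```

The standard error is always smaller than the standard deviation for n > 1, because a mean of several observations varies less than a single observation does.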

Box-and-whisker plot attributes? (5)

Help us assess spread, skewness, outliers & to compare groups.
Whole box = IQR (25-75%).
Dark line in box = Median.
Outlier = extreme observation.
Positive skew = box is at the bottom (low).
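
A minimal sketch of the numbers behind a box-and-whisker plot. The cards do not say how outliers are defined; the 1.5 × IQR fences used here are a common plotting convention, assumed for illustration, and the data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample with one extreme value

# Box edges (q1, q3) and the line inside the box (median).
q1, median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 x IQR from the box are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, outliers)
```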

Things to note on Statistics in terms of histograms? (2)

Large sample = what we want, more narrow, low SE, high precision.
Small sample = what we don't want, more broad, high SE, low precision.
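
A small simulation sketch of this point: repeatedly draw small and large samples from the same population and compare how much the sample means vary. The population (normal, mean 50, SD 10), the sample sizes, the number of repeats, and the seed are all assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population

def spread_of_sample_means(n: int, repeats: int = 1_000) -> float:
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(5)   # small samples: broad distribution of means
spread_large = spread_of_sample_means(50)  # large samples: narrow distribution of means

print(round(spread_small, 2), round(spread_large, 2))
```

The spread of the sample means is an empirical stand-in for the standard error, and it comes out clearly smaller for the larger sample size.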

Main lesson under Statistics basics?

Statistical inference deals with uncertainty or variability, in the data & in the things we try to estimate with data.

What is the purpose of SE?

To show how much the sample mean is likely to differ from the true population mean (precision).

(64 cards)