Introduction to Statistics Flashcards
Ordinal
Categorical variable that can be ordered
Finishing positions in a race (1st, 2nd, 3rd, 4th), or ranking one's qualifications
Nominal
Categorical variable that cannot be ordered
Male or female, religious group, ethnic group
Population
The complete set of people or observations that we are interested in
Interval
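Metric variable where numbers are used to label and order, and the intervals between the numbers are equal. Zero is an arbitrary point, not the absence of something
Temperature in Celsius or Fahrenheit: a difference of one degree means the same anywhere on the scale.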
Ratio
Metric variable where numbers are used to label and order, and zero means the absence of something
Age, or the number of correct answers in a test
Sample
A subset of the population, ideally representative of it
Sampling Bias
Any effect that makes our sample, and therefore our results, unrepresentative of the population
Proportion Calculation
Frequency divided by total number
Variable
Anything that we want to measure that varies such as age, gender, vehicle type etc.
Metric Variable
Occurs naturally as numbers
Categorical Variable
Variables whose values can be put into groups; any numbers used for the groups are assigned arbitrarily
Frequency
How many observations fall in each group
Valid percent
The percentage calculated after excluding missing values. Always quote the valid percent
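The frequency, proportion and valid percent cards can be tied together with a small sketch. The survey answers below are made up for illustration, and `None` is assumed to mark a missing response.

```python
from collections import Counter

# Hypothetical survey answers; None marks a missing response.
vehicle = ["car", "bus", "car", None, "bike", "car", "bus", None, "car", "bike"]

valid = [v for v in vehicle if v is not None]   # drop the missing values
freq = Counter(valid)                            # frequency: how many in each group

# Proportion = frequency / total number, shown here as a percentage.
# "percent" divides by all responses; "valid_percent" ignores the missing ones.
percent = {g: 100 * n / len(vehicle) for g, n in freq.items()}
valid_percent = {g: 100 * n / len(valid) for g, n in freq.items()}

for g in freq:
    print(f"{g}: n={freq[g]}, percent={percent[g]:.1f}, valid percent={valid_percent[g]:.1f}")
```

With two of the ten answers missing, the percent and the valid percent differ, which is why the valid percent is the one to quote.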
Descriptive Statistics
Numbers that summarise a variable in the best way we can, such as its centre and spread
Which procedure is used for categorical data?
Frequencies
Which procedure is used for metric data?
Explore procedure
The Mean
The average
Add up all the numbers, then divide by how many there are
The Median
The middle number when the data are sorted, or the 50% point
Standard Deviation
How spread out the data are: the larger the standard deviation, the more spread out the data
Minimum
The smallest number
Maximum
The largest number
Mode
The most frequently occurring value
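As a rough illustration of the mean, median, standard deviation, minimum, maximum and mode cards, the sketch below uses Python's standard `statistics` module. The ages are invented, and `statistics.stdev` is the sample standard deviation (it divides by n - 1), which is an assumption about which version is wanted.

```python
import statistics

# Hypothetical ages, in years.
ages = [21, 23, 23, 25, 28, 30, 32]

mean = statistics.mean(ages)      # add up all the numbers, divide by how many
median = statistics.median(ages)  # middle number of the sorted data
mode = statistics.mode(ages)      # most frequently occurring value
sd = statistics.stdev(ages)       # sample standard deviation (divides by n - 1)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("min and max:", min(ages), max(ages))
print("standard deviation:", round(sd, 2))
```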
Histogram
Used for Metric data
Percentiles
The percentage of observations that are less than the stated value
Normal Distribution
Bell curve
Symmetric distribution
Mean in the centre
Area under the bell curve represents probabilities
68-95-99.7% Rule
One standard deviation either side of the mean captures about 68% of the data,
Two standard deviations either side of the mean capture about 95% of the data,
Three standard deviations either side of the mean capture about 99.7% of the data.
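The rule can be checked from the area under the normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are only for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations from the mean.
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {100 * areas[k]:.1f}%")
```

The printed areas come out close to 68%, 95% and 99.7%, so the rule is a rounded version of the exact probabilities.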
What is the z-value?
Number of standard deviations away from the mean.
Z score formula
Take the value of interest, subtract the mean, then divide by the standard deviation: z = (value - mean) / standard deviation
When is a z score unusual?
When it is more than two standard deviations from the mean, that is, when z is below -2 or above 2.
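The z-score cards translate directly into a few lines of code. The test score, mean and standard deviation below are hypothetical values chosen for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical test: mean score 60, standard deviation 8, one student scores 78.
z = z_score(78, mean=60, sd=8)

# Rule of thumb from the card above: more than two standard deviations is unusual.
unusual = abs(z) > 2

print("z =", z, "unusual:", unusual)
```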
Variance
Takes into account all of the data, not just the two end points.
Variance looks at how much each individual score differs from the mean: the differences are squared and then averaged. The standard deviation is the square root of the variance
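A short worked sketch of the variance calculation, with invented scores. The card says the squared differences are averaged; statistics software usually divides by n - 1 for a sample rather than by n, so both versions are shown here as an assumption about which one is meant.

```python
import math
import statistics

scores = [4, 6, 7, 9, 14]            # hypothetical scores
mean = sum(scores) / len(scores)

# How far each score is from the mean, squared.
squared_diffs = [(x - mean) ** 2 for x in scores]

pop_variance = sum(squared_diffs) / len(scores)            # plain average (divide by n)
sample_variance = sum(squared_diffs) / (len(scores) - 1)   # divide by n - 1

# The standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(pop_variance, sample_variance, round(sample_sd, 3))
```

The hand calculation agrees with `statistics.pvariance(scores)` and `statistics.variance(scores)` from the standard library.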
In percentiles, what is the median?
The 50% point
In percentiles, what is the first quartile?
The 25th percentile
In percentiles, what is the third quartile?
The 75th percentile
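The quartiles can be computed with `statistics.quantiles` from the Python standard library. The data are made up, and the `method="inclusive"` setting is an assumption: different packages use slightly different interpolation rules, so the first and third quartiles may not match other software exactly.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]   # hypothetical values

# n=4 splits the data into four parts and returns the three cut points:
# first quartile (25th percentile), median (50th), third quartile (75th).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("first quartile:", q1)
print("median:", q2)
print("third quartile:", q3)
```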
Reporting Categorical Data
Sample size, sample proportion / percentage, 95% confidence interval, anything else of interest
Reporting Metric Data
Shape, centre (mean / median), spread, outliers
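The cards do not say how the 95% confidence interval is obtained. One common approximation, shown here purely as an assumption, is the normal-approximation (Wald) interval: proportion plus or minus 1.96 standard errors. The function name and the counts are hypothetical, and statistics packages may use a different method.

```python
import math

def proportion_ci_95(successes, n):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = successes / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)   # 1.96 standard errors
    return p, p - margin, p + margin

# Hypothetical sample: 60 of 150 respondents answered "yes".
p, low, high = proportion_ci_95(60, 150)
print(f"n = 150, proportion = {p:.2f}, 95% CI from {low:.3f} to {high:.3f}")
```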
What is Inference?
Using information from a sample to draw conclusions about the population.
What is a hypothesis?
A research question turned into a statement. A hypothesis is not a question; it is a statement to be tested
What is a binomial test?
Looks at categorical data, specifically those with two categories, compares a percentage / proportion to a fixed value
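A minimal sketch of an exact two-sided binomial test, written with the standard library only. The function name and the counts are hypothetical, and the two-sided rule used here (add up every outcome that is no more likely than the observed one) is one common convention, not necessarily the one a given statistics package uses.

```python
from math import comb

def binomial_test(successes, n, p0):
    """Two-sided exact binomial p-value for comparing a proportion with p0."""
    def pmf(k):
        # Probability of exactly k successes in n trials if the true proportion is p0.
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))

# Hypothetical sample: 15 of 20 people prefer option A; fixed value is 50%.
p_value = binomial_test(15, 20, 0.5)
print(f"p = {p_value:.3f}")
```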
What is a one-sample t-test?
For metric data, compares a mean to a fixed value.
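The t statistic behind a one-sample t-test can be sketched as the distance between the sample mean and the fixed value, measured in standard errors. The function name and the data are invented for illustration. The p-value needs the t distribution, which is not in the standard library; `scipy.stats.ttest_1samp` is one option if SciPy is available.

```python
import math
import statistics

def one_sample_t(data, fixed_value):
    """Return (t, df) for comparing the sample mean with fixed_value."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    t = (mean - fixed_value) / se
    return t, n - 1                              # degrees of freedom = n - 1

# Hypothetical reaction times in seconds, compared with a fixed value of 5.0.
times = [5.1, 5.6, 4.9, 5.8, 5.4, 5.2, 5.7, 5.3]
t, df = one_sample_t(times, 5.0)
print(f"t({df}) = {t:.3f}")
```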
What is the structure of a report?
Hypothesis - what is being measured in the sample
Sample - sample size, who is in the sample?
Comparison -
Name of test -
Quote test statistics - if significant, include the 95% confidence interval
Conclusion - use appropriate language
What do we include when quoting the mean?
Standard deviation (s= )
When is a p-value significant?
When it is below 0.05 (p < 0.05)
What do we include when reporting a t value?
t-value -
Degrees of freedom (df) -
P value -
t(115) = 2.453, p = .016
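The reporting pattern on this card can be produced by a small helper. This is only a sketch: the function name is hypothetical, and dropping the leading zero of the p-value simply follows the style of the example above.

```python
def report_t(t, df, p):
    """Format a t-test result as t(df) = value, p = value."""
    p_text = f"{p:.3f}".lstrip("0")   # drop the leading zero: 0.016 -> .016
    return f"t({df}) = {t:.3f}, p = {p_text}"

result = report_t(2.453, 115, 0.016)
significant = 0.016 < 0.05            # significance threshold from the earlier card

print(result, "significant" if significant else "not significant")
```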
What is a p-value?
The p-value is the probability that our test statistic takes the observed value or a more extreme value, assuming the hypothesised value is true.
The smaller the p-value, the stronger the evidence against that hypothesised value.