Introduction to Statistics Flashcards
Ordinal
Categorical variable that can be ordered
1st, 2nd, 3rd, 4th place in a race, or ranking one's qualifications
Nominal
Categorical variable that cannot be ordered
Male, female, religious group, ethnic group
Population
All of the information that we are interested in
Interval
Metric variable where numbers are used to label and order, and the intervals between the numbers are equal
Temperature in Celsius or Fahrenheit: the interval between values still means something.
Ratio
Metric variable where numbers are used to label and order, and zero means the absence of something
Age, or the number of answers in a test
Sample
A subset of all the information, ideally representative of the population
Sampling Bias
Any effect that makes our sample results unrepresentative of the population
Proportion Calculation
Frequency divided by total number
Variable
Anything that we want to measure that varies such as age, gender, vehicle type etc.
Metric Variable
Occurs naturally as numbers
Categorical Variable
Variables whose values can be put into groups; any numbers are assigned arbitrarily as labels
Frequency
How many in each group
Valid percent
The percentage calculated without counting the missing values. Always quote the valid percent
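A minimal sketch of frequency, proportion and valid percent, assuming a small made-up set of responses where `None` marks a missing value (the data and variable names are illustrative only):

```python
from collections import Counter

# Hypothetical responses; None marks a missing value.
responses = ["car", "bus", "car", None, "bike", "car", None, "bus"]

valid = [r for r in responses if r is not None]
frequency = Counter(valid)            # how many in each group

total_n = len(responses)              # counts the missing values
valid_n = len(valid)                  # does not count the missing values

for group, count in frequency.items():
    proportion = count / valid_n      # frequency divided by (valid) total
    percent = 100 * count / total_n   # percent of all cases
    valid_percent = 100 * count / valid_n
    print(f"{group}: n={count}, proportion={proportion:.3f}, "
          f"percent={percent:.1f}%, valid percent={valid_percent:.1f}%")
```

With missing values present, the valid percent is always at least as large as the plain percent, because the missing cases are left out of the denominator.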
Descriptive Statistics
Summaries (numbers and graphs) that best describe a variable in our sample
Which procedure for categorical data
Frequencies
Which procedure for metric data
Explore procedure
The Mean
The average
Add up all the numbers, divided by how many there are
The Median
The middle number, or 50% point
Standard Deviation
How spread out the data are. The larger the number, the more spread out the data
Minimum
The smallest number
Maximum
The largest number
Mode
The most common occurrence
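The summary statistics on the cards above can be sketched with Python's standard `statistics` module. The scores are made up for illustration, and `stdev` is the sample standard deviation (it divides by n - 1), which is an assumption about which version is wanted:

```python
import statistics

scores = [4, 7, 7, 8, 10, 12, 15]     # hypothetical metric data

mean = statistics.mean(scores)        # add up all the numbers, divide by how many
median = statistics.median(scores)    # the middle number
sd = statistics.stdev(scores)         # sample standard deviation (n - 1)
mode = statistics.mode(scores)        # the most common value
minimum, maximum = min(scores), max(scores)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SD={sd:.2f}, min={minimum}, max={maximum}")
```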
Histogram
Used for Metric data
Percentiles
The percentage of observations that are less than the stated value
Normal Distribution
Bell curve,
Symmetric distribution,
Mean in the centre,
Area under the bell curve represents probabilities
68-95-99.7% Rule
One std deviation either side of the mean captures 68% of data,
Two std deviations either side of the mean captures 95% of data,
Three std deviations either side of the mean captures 99.7% of data.
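The three percentages in the rule can be checked against the standard normal curve. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # Area under the curve between k SDs below and k SDs above the mean.
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} std deviation(s): {100 * areas[k]:.1f}%")
```

The rule quotes rounded figures; the exact areas are slightly above 68% and 95%.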
What is the z-value?
Number of standard deviations away from the mean.
Z score formula
Take the value of interest, subtract the mean, then divide by the standard deviation.
When is a z score unusual?
When it is more than two std deviations from the mean.
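A short sketch of the z score calculation and the "more than two standard deviations" check. The function names and the example numbers (mean 100, standard deviation 15) are assumptions for illustration:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def is_unusual(z, cutoff=2.0):
    """True when z is more than `cutoff` standard deviations from the mean."""
    return abs(z) > cutoff

z = z_score(135, mean=100, sd=15)
print(f"z = {z:.2f}, unusual: {is_unusual(z)}")
```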
Variance
Takes into account all of the data, not just the two end points.
Looks at how much each individual score differs from the mean, squares those differences, then averages them
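The steps on the variance card can be sketched as follows, using made-up scores. One caveat that is not on the card: statistical software usually divides by n - 1 rather than n when the data are a sample, so both versions are shown:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical scores
n = len(scores)
mean = sum(scores) / n

# How much each score differs from the mean, squared.
squared_differences = [(x - mean) ** 2 for x in scores]

population_variance = sum(squared_differences) / n        # plain average
sample_variance = sum(squared_differences) / (n - 1)      # usual sample version
sample_sd = math.sqrt(sample_variance)                    # standard deviation

print(f"variance (divide by n)     = {population_variance:.3f}")
print(f"variance (divide by n - 1) = {sample_variance:.3f}")
print(f"standard deviation         = {sample_sd:.3f}")
```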
With percentiles, what is the median?
The 50% point
In percentiles, what is the first quartile?
The 25th percentile
In percentiles, what is the third quartile?
The 75th percentile
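A small sketch of the quartiles using `statistics.quantiles` from the Python standard library, with made-up data. Different packages interpolate percentiles slightly differently, so the first and third quartiles may not match other software exactly:

```python
import statistics

values = list(range(1, 12))           # hypothetical data: 1, 2, ..., 11

# n=4 cuts the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"first quartile (25th percentile) = {q1}")
print(f"median (50th percentile)         = {q2}")
print(f"third quartile (75th percentile) = {q3}")
```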
Reporting Categorical Data
Sample Size, sample proportion / percentage, 95% confidence interval, anything else of interest
Reporting Metric Data
Shape, centre (mean / median), Spread, Outliers
What is Inference?
Taking information from a sample, inferring about a population from a sample.
What is a hypothesis?
Turning a research question into a statement. A hypothesis is not a question; it is a statement to be tested
What is a binomial test?
Looks at categorical data, specifically those with two categories, compares a percentage / proportion to a fixed value
What is a one sample t-test?
For metric data, compares a mean to a fixed value.
What is the structure of a report?
Hypothesis - what is being measured in the sample
Sample - sample size, who is in the sample?
Comparison -
Name of test -
Quote test statistics - if significant, include the 95% confidence interval
Conclusion - use appropriate language
What do we include when quoting the mean?
Standard deviation (s= )
When is a p-value significant?
When it is below .05 (p < .05)
What do we include when reporting a t value?
t-value -
Degrees of freedom (df) -
P value -
t(115) = 2.453, p = .016
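A minimal sketch of where the t-value and degrees of freedom come from in a one sample t-test, assuming made-up scores and a fixed comparison value of 50. The p-value is left out because the Python standard library has no t distribution; in practice it would come from statistical software:

```python
import math
import statistics

def one_sample_t(sample, fixed_value):
    """Return (t, df) for comparing the sample mean to a fixed value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)             # sample standard deviation
    standard_error = sd / math.sqrt(n)
    t = (mean - fixed_value) / standard_error
    return t, n - 1                           # degrees of freedom = n - 1

sample = [52, 55, 49, 61, 58, 54, 57, 50]     # hypothetical scores
t, df = one_sample_t(sample, fixed_value=50)
print(f"t({df}) = {t:.3f}")
```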
What is a p-value?
The p-value is the probability that our test statistic takes the observed value or a value more extreme, assuming the null hypothesis is true.
The smaller the p-value, the stronger the evidence against the null hypothesis.
How is the p-value quoted?
Without the zero in front of the decimal point,
Always quote three decimal places, e.g. p = .115,
Use the less-than sign only when the value is below .001 (p < .001)
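The quoting rule can be sketched as a small helper. The function name `format_p` is made up for illustration:

```python
def format_p(p):
    """Quote a p-value to three decimals with no leading zero."""
    if p < 0.001:
        return "p < .001"                     # less-than sign only below .001
    return "p = " + f"{p:.3f}".lstrip("0")    # drop the zero before the decimal

for p in (0.115, 0.016, 0.0004):
    print(format_p(p))
```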
What is sampling variation?
The natural differences in results from one sample to the next.
What are the underlying propositions of sampling theory?
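The sampling distribution is approximately normal
The mean of the sampling distribution of the sample proportion (or sample mean) equals the population proportion (or population mean)
The standard deviation of the sampling distribution depends on the size of the sample: the larger the sample, the smaller the standard deviation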
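These propositions can be illustrated with a small simulation. This is a sketch under assumed values: a population proportion of .40, sample sizes of 25 and 400, and 1000 repeated samples of each size. All names are illustrative:

```python
import random
import statistics

def sample_proportions(population_p, n, reps, rng):
    """Draw `reps` samples of size n and return each sample's proportion."""
    return [
        sum(rng.random() < population_p for _ in range(n)) / n
        for _ in range(reps)
    ]

rng = random.Random(42)       # fixed seed so the run is repeatable
population_p = 0.40           # hypothetical population proportion

small = sample_proportions(population_p, n=25, reps=1000, rng=rng)
large = sample_proportions(population_p, n=400, reps=1000, rng=rng)

print(f"n=25:  mean={statistics.mean(small):.3f}, SD={statistics.stdev(small):.3f}")
print(f"n=400: mean={statistics.mean(large):.3f}, SD={statistics.stdev(large):.3f}")
```

Both sets of sample proportions centre close to the population proportion, and the larger samples vary less from one sample to the next.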
What is in the centre of a sampling distribution?
The proportion in the population
In sampling distribution, what does the p-value represent?
The area in the tails of the distribution beyond the observed result. When the result falls outside the 95% markers, that area is below 5% (p < .05).
What defines experimental design?
When the researcher is able to manipulate the IV.
We can then have causal conclusions.
What defines observational design?
We are just observing what happens,
Not manipulating the IV.
No causal conclusions.
When is observational research conducted?
When the researcher is unable to conduct experimental study,
Or it is unethical.
What can we determine from observational (correlational) design?
We cannot determine something for certain,
We cannot make definitive statements.
What is a nuisance variable?
A variable that correlates with (might affect) the dependent variable,
The IV is NEVER a nuisance variable,
Nuisance variable must vary
What is a subject nuisance variable?
Associated with the participant, age, gender, driving experience etc.
What is a situational nuisance variable?
Associated with the conditions of the experiment.
What is repeated measures design?
When the same participants are used for both conditions.
What is matched pairs design?
Two separate groups where the participants are matched to be as similar to one another as possible.
What is independent groups design?
Participants are randomly allocated to separate groups.
What is the best way to deal with nuisance variables?
To hold them constant.
What is a confounding factor?
A variable that alters the logic of the experiment by being correlated with both the IV and the DV.
What is simple random sampling?
A random sample with an arbitrary starting position. Random numbers are drawn to select the sample.
What is stratified sampling?
Where the population comprises subgroups (strata), and we sample from each subgroup.
What is multi stage sampling?
Where we combine different sampling methods.
What is cluster sampling?
Population has some kind of natural (ideally homogeneous) group (cluster),
E.g. for all Victorians, a cluster would be a local government area. Sample within the cluster.
What is systematic sampling?
From a random starting point, sample every Kth item.
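A minimal sketch of systematic sampling next to simple random sampling, assuming a numbered list of 100 people as the population. The function name and the value k = 10 are illustrative:

```python
import random

def systematic_sample(items, k, rng=random):
    """From a random starting point, take every kth item."""
    start = rng.randrange(k)          # random start within the first k items
    return items[start::k]

population = list(range(1, 101))      # hypothetical population, numbered 1 to 100

systematic = systematic_sample(population, k=10)
simple_random = random.sample(population, 10)   # random numbers select the sample

print("systematic:   ", systematic)
print("simple random:", sorted(simple_random))
```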
What are we looking for in DV and IV?
Cause and effect
What is another name for prediction?
Hypothesis
What is another word for correlational research?
Observational research.
How do nuisance variables affect our results?
They mask or hide the effects of the independent variable,
They destroy the logic of an experiment.
What is an independent samples t-test?
Compares the sample means of two groups in order to make an inference about the population means
What are the assumptions of an independent samples t-test?
DV is metric,
Independence of observations,
Both samples must come from a normal distribution,
Equal variance: both samples should have a similar spread
What does a t-value round to?
The t value is rounded to two decimal places
When quoting how is the 95% interval rounded?
To the same number of decimal places as the sample statistic it relates to.
What is a paired samples t-test?
Used to test the difference between means when we have a repeated measures or matched pairs research design.
What do we want to infer from a sample?
Something about the population
How do we report paired sample t test?
With the mean for each group first, then the sample mean difference (x̄d).
What does the 95% confidence interval allow us to do?
Infer about the population
What indicates significance?
The p value and the means
What are the assumptions of a paired samples t-test?
Metric data
Independence of observations
Normality
What does a p value represent?
The probability of getting a sample result at least as extreme as ours if there were no effect in the population.
What is correlation?
Looking at the relationship between two metric variables
Where is the independent variable on the scatterplot?
On the x axis (horizontal)
Where is the dependent variable on the scatterplot?
On the y axis (vertical)
How do we describe scatterplots?
Direction
Form
Strength
Outliers
What does correlation not mean?
Causation
Pearson’s R correlation coefficient?
The measure of the strength of a linear association between two metric variables
When can’t Pearson’s R apply?
When the form is non linear (curved)
Coefficient of determination (R squared)?
The proportion of the variation in one variable that is explained by the other
Coefficient of determination (R squared) formula?
R squared = r x r
Example: r = .123 gives .123 x .123 = .015
How do we interpret R squared?
As the percentage of variation explained
Example: R squared = .085 means 8.5%
R squared = .123 means 12.3%
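A minimal sketch of Pearson's correlation coefficient and the coefficient of determination, computed by hand on made-up data. The variable names `hours` and `marks` are hypothetical, and the formula used is the standard one (sum of cross-products divided by the square root of the product of the sums of squares):

```python
import math

def pearson_r(xs, ys):
    """Strength of the linear association between two metric variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]               # hypothetical IV
marks = [52, 55, 61, 60, 68]          # hypothetical DV

r = pearson_r(hours, marks)
r_squared = r * r                     # coefficient of determination
print(f"r = {r:.3f}, R squared = {r_squared:.3f} "
      f"({100 * r_squared:.1f}% of variation explained)")
```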
How do we interpret the 95% confidence interval for correlations?
Indicates that in the population the strength of the linear relationship is between …
Spurious Correlation
Where we have a strong correlation between variables that does not make sense as cause and effect, sometimes because a third factor is involved.
The symbol for correlation in the population?
Rho,
Looks like a p
Positive relationship?
Upwards from left to right.
More of IV means more of DV
Negative relationship?
Downwards from left to right
More of IV means less of DV
What is the strength of a relationship?
An indication of how well you can predict the value of the DV when you know the value of the IV
What is this p value telling me? p = .005
That there are 5 chances in 1000 of a result at least this extreme arising through sampling variation alone
What is this p value telling me? p = .050
That there are 5 chances in 100 of a result at least this extreme arising through sampling variation alone
What is the regression equation?
Y = a + b x X
What does Y represent in regression equation?
Dependent variable
What does a represent in regression equation?
A constant known as the vertical intercept
What does b represent in the regression calculation?
Slope or regression coefficient.
What does X represent in regression calculation?
Independent variable.
Regression?
To calculate the line that best describes the linear relationship between the IV and the DV
What is the constant?
The vertical intercept
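A minimal sketch of how the constant a and the slope b in Y = a + b x X can be found by ordinary least squares, which is an assumption about the fitting method since the cards do not name one. The data are made up so that the points lie exactly on a line:

```python
def fit_line(xs, ys):
    """Return (a, b): the vertical intercept and slope of the fitted line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))     # slope
    a = mean_y - b * mean_x                        # vertical intercept
    return a, b

xs = [1, 2, 3, 4]                     # hypothetical IV values
ys = [3, 5, 7, 9]                     # hypothetical DV values

a, b = fit_line(xs, ys)
predicted = a + b * 5                 # predicted Y when X = 5
print(f"Y = {a:.2f} + {b:.2f} x X; predicted Y at X = 5 is {predicted:.2f}")
```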
In a report, what is the conclusion trying to tell us?
What evidence we are looking for to draw a conclusion.
What is a causal conclusion?
That a change in the IV will produce a change in the DV
What is χ²?
Chi-squared (pronounced kai), a test of the relationship between two categorical variables.
What does the chi squared test measure?
The relationship between two categorical variables, not the difference between groups as in some other tests.
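A minimal sketch of the chi-squared statistic for a table of counts. The cards do not give the formula, so this assumes the usual Pearson version (observed minus expected, squared, divided by expected, summed over all cells). The counts are made up, and the p-value would come from statistical software:

```python
def chi_squared(table):
    """Return (statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are yes / no answers.
observed = [[30, 10],
            [20, 40]]

statistic, df = chi_squared(observed)
print(f"chi-squared({df}) = {statistic:.2f}")
```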
What is a parametric test?
There is a specific population parameter that we are trying to estimate using the sample statistic
What is a non parametric test?
A test that does not estimate a specific population parameter from the sample. It simply tests for significance.