Introductory statistics Flashcards
Aims of statistics
• describe and summarize the data
• visualize trends for better understanding
• make conclusions from the sample to the population
(there are always differences …)
- decide if the sample is significantly different in
comparison to the population (one sample tests)
- decide whether two groups should be taken as different
(two sample tests)
- describe the relationship between two variables in the
population (correlation)
basic sampling strategies:
simple random sampling
systematic sampling
stratified sampling
simple random sampling (SRS, like rolling a die …)
systematic sampling (every k-th person, e.g. every 10th, after ordering by height or a similar attribute)
stratified sampling (forming subgroups based on categories, e.g. males, females, smokers, non-smokers, then sampling within each subgroup)
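The three strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 people, the "smoker" attribute and all function names are made up for the example and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Take every k-th unit of an already ordered list, from a random start."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, key, n_per_stratum, rng):
    """Group units by key(unit), then draw a simple random sample per group."""
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical population: 100 people, every second one a smoker.
people = [{"id": i, "smoker": i % 2 == 0} for i in range(100)]

srs = simple_random_sample(people, 10, rng)
every_10th = systematic_sample(people, 10, rng)
by_smoking = stratified_sample(people, lambda p: p["smoker"], 5, rng)
```

Only the stratified sample is guaranteed to contain both smokers and non-smokers in fixed numbers. The other two leave the group sizes to chance or to the ordering of the list.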
correlational vs experimental research
• correlational research
- we observe what happens in nature
- we don’t manipulate a variable
- example: does reading books help learning?
- we just collect answers about reading behaviour
- we compare grades in relation to reading behaviour
• experimental research
- we manipulate a variable
- we divide our sample into two groups
- one group must read statistics books
- the other group is not allowed to read statistics books
- after a month we summarize and compare both groups
with regard to gain in knowledge
- sometimes experimental research is not possible :(
- examples: experiments on small lovely cats, or on smoking
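The notes do not say how the sample is divided into the two groups. A common choice, assumed here, is random assignment, so that the groups differ only by chance before the manipulation. The participant names and the helper `assign_groups` are hypothetical.

```python
import random

def assign_groups(participants, rng):
    """Shuffle a copy of the participants and split it into two halves."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the split is reproducible
participants = [f"student_{i}" for i in range(20)]  # made-up sample

# One group reads statistics books, the other does not.
readers, non_readers = assign_groups(participants, rng)
```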
Inferential vs Descriptive Statistics
Descriptive statistics are used to describe the main features of
a collection of data in quantitative terms. Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to quantitatively summarize a data set, rather than being used to support inferential statements about the population that the data are thought to represent. … to give the audience an overall sense of the data being analyzed.
Statistical inference or statistical induction comprises the use of statistics and random sampling to make inferences concerning
some unknown aspect of a population.
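As a small illustration of the descriptive side, the following sketch summarizes a data set with Python's standard `statistics` module. The six values are invented for the example. Nothing here makes a statement about a population; that step would be inference.

```python
import statistics

# Made-up data set, e.g. number of books read by six students.
data = [1, 2, 2, 3, 4, 6]

summary = {
    "n": len(data),
    "mean": statistics.mean(data),          # arithmetic mean
    "median": statistics.median(data),      # middle value
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    "stdev": statistics.stdev(data),        # sample standard deviation
    "min": min(data),
    "max": max(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```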
parameters vs statistics
• population is characterized by parameters
• sample is characterized by statistics
• we use the sample to estimate the parameters of the population
• parameters and statistics have the same name but mean different things
• center: the sample mean m̄ (m with a bar) is approximately µ
–> the sample mean m̄ is an unbiased estimator of the
parameter µ, the mean of the population
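What "unbiased" means can be checked with a small simulation: individual sample means scatter around µ, but their average over many samples lands very close to µ. The population below (10,000 normally distributed values), the sample size of 25 and the number of repetitions are arbitrary assumptions for the sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Hypothetical population; mu is its parameter (unknown in real studies).
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)

# Draw many samples of size 25 and record the statistic of each one.
sample_means = [
    statistics.mean(rng.sample(population, 25)) for _ in range(2_000)
]

# A single sample mean can miss mu, but the average of all of them is close.
average_of_sample_means = statistics.mean(sample_means)
print(f"mu = {mu:.2f}, average sample mean = {average_of_sample_means:.2f}")
```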
Anova and Ancova
• comparing two means with normal data –> t.test
• comparing more than two means –> anova, aov
• one-way anova (n ~c) !!
- first variable normal numeric (if non-normal: kruskal.test)
- second variable categorical, more than two levels
- if the second variable is numerical (n ~n): correlation !!
• two-way anova (n ~c + c)
- first variable normal numeric
- second variable categorical, more than two levels
- third categorical variable
• ancova (n ~c + n)
- first variable normal numeric
- second variable categorical
- third variable continuous numerical (n)
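For the one-way case (n ~c), the quantity behind the test is the F statistic: the variance between group means divided by the variance within groups. The sketch below computes it by hand with the textbook formula. It is not a replacement for R's aov, it returns no p-value, and the three groups are made-up numbers.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)      # number of levels of the categorical variable
    n = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical numeric outcome measured in three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df1, df2 = one_way_anova_f(groups)
```

For these invented numbers F works out to 13 with 2 and 6 degrees of freedom. A large F means the group means differ more than the spread inside the groups would suggest.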
If ANOVA is significant: which pairs differ?
aov and pairwise.t.test
Post-hoc tests:
• pairwise.t.test (calculates pairwise comparisons between group levels with corrections for multiple testing)
• TukeyHSD
aov and TukeyHSD
–> Tukey Honest Significant Differences
TukeyHSD: creates a set of confidence intervals on the differences between the means of the levels of a factor with the specified family-wise probability of coverage. The intervals are based on the Studentized range statistic, Tukey’s “Honest Significant Difference” method.
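The "corrections for multiple testing" used with pairwise comparisons can be illustrated with the Holm method, which is the default adjustment of pairwise.t.test in R. The sketch below only adjusts p-values that are already given. The three p-values are invented, and the Tukey intervals themselves are not reproduced here because they need the Studentized range distribution.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = (m - rank) * p_values[idx]
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Made-up raw p-values for the pairs A-B, A-C and B-C.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

A pair is then called different only if its adjusted p-value is below the chosen significance level, which keeps the family-wise error rate under control.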