Sampling Statistics Flashcards
What is a population?
- All people (or items, locations, etc.) of interest
- Who you want your results to be relevant for and generalize to
- Can be large (e.g., all 4-year-old children who are English-Spanish bilinguals) or relatively small (e.g., all children in a particular education center)
What is a sample?
- The individuals actually in your study
- Representative of the population (equal chance of people selected; intended vs. accessible population)
- Use sample statistics to make inferences about population parameters
What are parameters?
- Numbers used to describe a population
- EX: mu = population mean
What are statistics?
- Numbers used to describe a sample
- EX: x bar = sample mean
What occurs in a census?
- Population = Sample (every member of the population is measured)
What is sampling bias?
- Failure to identify/examine all members of a population, so that some members are systematically more likely to end up in the sample than others
- Sources: samples of convenience, volunteerism
What are the two main types of sampling?
- Probability
- Non-probability
What are the types of probability sampling?
- Simple random sampling
- Systematic random sampling
- Stratified random sampling
- Cluster sampling
- Multistage sampling
What are the types of non-probability sampling?
- Convenience sampling
- Purposive sampling
What is probability sampling?
- Uses some form of random selection, based on probability
- Requires setting up a procedure that assures that the different members of your population have equal probabilities of being chosen
What is simple random sampling?
- Choose such that each member of the population has an equal chance of being selected (e.g., picking names out of a hat)
- Advantages: equal chances of selection, fair, free from sampling bias
- Disadvantages: need to know entire population, not most statistically efficient method, luck of the draw (may not represent subgroups well)
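A minimal Python sketch of simple random sampling, assuming a complete sampling frame is available. The function name `simple_random_sample` and the list `students` are hypothetical and used only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    return rng.sample(list(population), n)

# Hypothetical sampling frame: ID numbers for 100 students.
students = list(range(1, 101))
sample = simple_random_sample(students, 20, seed=1)
print(sorted(sample))
```

Note that the function needs the whole population as input, which mirrors the "need to know entire population" disadvantage.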
What is systematic random sampling?
- Select one member randomly, then choose additional members at evenly spaced intervals
- EX: want a sample of 20 out of 100 students; select every 5th person in the alphabetical class list until you have N = 20
- Disadvantages: you need a complete listing, need to watch out for periodicity in the list
- Advantages: fairly easy to do
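The 20-out-of-100 example can be sketched in Python as follows. This assumes the list length is an exact multiple of the sample size; `systematic_sample` and `class_list` are hypothetical names.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then take every k-th member."""
    k = len(frame) // n          # sampling interval (5 when taking 20 from 100)
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting position: 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical alphabetical class list of 100 students.
class_list = [f"student_{i:03d}" for i in range(1, 101)]
sample = systematic_sample(class_list, 20, seed=7)
print(sample[:3])
```

If the list itself repeats with a period that matches the interval, every selected member shares the same position in the cycle, which is the periodicity problem noted above.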
What is stratified random sampling?
- Population can be divided into different groups based on criteria (i.e. strata)
- Separate simple random sample from each population stratum
- EX: men vs. women who are ASHA members
- Advantages over simple: assures representation of overall population AND key subgroups, potentially apply results to subgroups
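A short sketch of stratified random sampling in Python, assuming every member carries a label that identifies its stratum. The membership list, the stratum labels, and the function name `stratified_sample` are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, n_per_stratum, seed=None):
    """Group members by stratum, then draw a simple random sample within each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    return {name: rng.sample(group, n_per_stratum)
            for name, group in sorted(strata.items())}

# Hypothetical membership list: (ID, stratum) pairs with 25 men and 75 women.
members = [(i, "women" if i % 4 else "men") for i in range(1, 101)]
sample = stratified_sample(members, stratum_of=lambda m: m[1], n_per_stratum=5, seed=2)
print({name: len(group) for name, group in sample.items()})
```

Drawing the same number from each stratum guarantees that the smaller subgroup is represented; drawing in proportion to stratum size is another common choice.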
What is cluster sampling?
- Select clusters from population on the basis of simple random sampling, then sample all people in the cluster
- EX: if you want to sample all pre-k kids in MD, take a random sample of MD schools with pre-k programs, sample all those kids in the sampled schools
- Economical, but still susceptible to sampling bias (clusters are intrinsically homogeneous)
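A minimal Python sketch of cluster sampling, loosely following the school example. The school and child names are made up, and `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 6 schools with 5 pre-k children each.
schools = {f"school_{s}": [f"school_{s}_child_{c}" for c in range(1, 6)]
           for s in "ABCDEF"}
sampled = cluster_sample(schools, 2, seed=3)
print(list(sampled))
```

Only the clusters are randomized here; no selection happens inside a chosen school.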
What is multistage sampling?
- Combine different methods of probability sampling
- EX: using cluster sampling to select certain schools, and then random sampling within each school
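The school example can be sketched as a two-stage procedure in Python. The data and the name `multistage_sample` are hypothetical, and this is only one of several ways to combine the methods.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: simple random sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical frame: 6 schools with 10 children each.
schools = {f"school_{s}": [f"school_{s}_child_{c}" for c in range(1, 11)]
           for s in "ABCDEF"}
sampled = multistage_sample(schools, n_clusters=3, n_per_cluster=4, seed=5)
print({name: len(children) for name, children in sampled.items()})
```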
What is non-probability sampling?
- Does not involve random selection
- May or may not represent the population well, hard to know how well (even with large N)
- Susceptible to researcher bias
What is convenience sampling?
- Convenient samples are chosen from a population (what we do most frequently)
- EX: college students, local volunteers
- Disadvantage: no evidence that they are representative of the populations we’re interested in generalizing to (and often would suspect that they are not)
What is purposive sampling?
- Specific, predefined groups that we seek
- Frequent in qualitative research
- Smaller N
- Useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern
- EX: expert, extreme/deviant cases, criterion sampling (“all white cars”)
Describe how to determine sample size.
- Determine BEFORE you start an experiment (a priori)
- Practical concerns
- Sample size estimates depend on:
- The size of the effect you’re interested in (effect size)
- Variability across the sample (i.e. participants)
- Reliability of your measure
Describe random assignment to groups.
- Ideal way to divide your sample into groups
- Any TRUE experiment with treatment/control or multiple treatments
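One simple way to carry out random assignment is to shuffle the sample and deal participants into groups in turn. The sketch below assumes equal group sizes are wanted; the participant IDs and the name `randomly_assign` are hypothetical.

```python
import random

def randomly_assign(participants, groups, seed=None):
    """Shuffle participants, then deal them into groups one at a time."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Hypothetical sample of 20 participants split into treatment and control.
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, ["treatment", "control"], seed=4)
print({group: len(people) for group, people in assignment.items()})
```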
Describe counterbalancing of repeated measures conditions: within-subjects.
- Each person completes each condition 1+ time(s)
- Reverse counterbalancing: ABCCBA
- Block counterbalancing: select # of repetitions of each condition, present in a variety of sequences (e.g., ABCBACBCA)
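Reverse counterbalancing can be sketched in a couple of lines of Python: run the conditions in order, then again in reverse. The function name is hypothetical.

```python
def reverse_counterbalance(conditions):
    """Present the conditions in order, then in the opposite order."""
    forward = list(conditions)
    return forward + forward[::-1]

sequence = "".join(reverse_counterbalance("ABC"))
print(sequence)  # ABCCBA
```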
Describe counterbalancing of repeated measures conditions: across-subjects.
- Each participant only gets 1 sequence of conditions, but the sequences differ across people
- Can be complete or partial
What is the goal of counterbalancing?
- To balance out carry-over or order effects
Describe across-subject counterbalancing when complete is not feasible.
- Randomized partial counterbalancing
- Latin-square counterbalancing
What is randomized partial counterbalancing?
- Each participant gets a different random sequence
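A minimal sketch of randomized partial counterbalancing in Python. Each order is drawn independently, so with few conditions two participants can occasionally receive the same order; the function name and the four condition labels are assumptions.

```python
import random

def random_sequences(conditions, n_participants, seed=None):
    """Give each participant an independently shuffled order of the conditions."""
    rng = random.Random(seed)
    conditions = list(conditions)
    return [rng.sample(conditions, len(conditions)) for _ in range(n_participants)]

sequences = random_sequences("ABCD", 5, seed=11)
for number, order in enumerate(sequences, start=1):
    print(number, "".join(order))
```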
Describe complete across-subject counterbalancing.
- Need (number of conditions)! groups, i.e. the factorial of the number of conditions
- EX: 3 conditions, 3! = 3 × 2 × 1 = 6, so you need 6 groups
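The factorial count can be checked by listing every possible order. A short Python sketch using the standard library; the function name is hypothetical.

```python
from itertools import permutations
from math import factorial

def complete_counterbalancing(conditions):
    """List every possible order of the conditions, one per group."""
    return ["".join(order) for order in permutations(conditions)]

sequences = complete_counterbalancing("ABC")
print(sequences)                     # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(sequences), factorial(3))  # 6 6
```

The count grows quickly (4 conditions need 24 groups, 5 need 120), which is why complete counterbalancing is often not feasible.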
What is latin-square counterbalancing?
- A square of sequences such that each condition appears exactly once in each order position across the sequences
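One common way to build such a square is to rotate the condition list by one position per row. The Python sketch below shows this cyclic construction; it is one of many valid Latin squares, and the function name is hypothetical.

```python
def latin_square(conditions):
    """Cyclic Latin square: each row is the previous row shifted by one position."""
    conditions = list(conditions)
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square("ABC")
for row in square:
    print("".join(row))
```

For three conditions this yields the sequences ABC, BCA, and CAB, so only 3 groups are needed instead of 6.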
What is balanced latin-square counterbalancing?
- A square of sequences such that each condition appears exactly once in each order position across the sequences
- AND each condition follows each other condition an equal number of times
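A sketch of one standard construction in Python, under these assumptions: "follows" means "immediately follows", the first row uses the index pattern 0, 1, n-1, 2, n-2, ..., and later rows shift every condition by one. With an odd number of conditions the mirrored sequences are added as well, giving 2n sequences. The function name is hypothetical.

```python
def balanced_latin_square(conditions):
    """Build sequences in which each condition immediately follows every other equally often."""
    conditions = list(conditions)
    n = len(conditions)
    # First-row index pattern: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    rows = [[conditions[(i + shift) % n] for i in first] for shift in range(n)]
    if n % 2 == 1:
        # Odd number of conditions: add each row reversed to restore balance.
        rows += [row[::-1] for row in rows]
    return rows

square = balanced_latin_square("ABCD")
for row in square:
    print("".join(row))
```

For four conditions this gives ABDC, BCAD, CDBA, and DACB, in which each ordered pair of conditions occurs as neighbours exactly once.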
What are the 4 levels of measurement?
1) Nominal
2) Ordinal
3) Interval
4) Ratio
What is nominal level of measurement?
- Categorical
- Ideally exhaustive and mutually-exclusive categories
- EX: type of hearing loss
What is ordinal level of measurement?
- Categorical
- Ordered
- EX: degree of HL
What is interval level of measurement?
- Discrete or Continuous
- Equal distances between scores
- Calculate differences but not proportions
- EX: dB HL, temperature
What is ratio level of measurement?
- Discrete or Continuous
- Interval and has a true zero point
- EX: WR score
How can you report descriptive statistics?
- Frequency, percentage, proportion
- Measures of central tendency
- Measures of variability
What are measures of central tendency?
- Mean: M, interval/ratio data
- Median: Mdn, ordinal/interval/ratio data
- Mode: all types
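The three measures can be computed with Python's standard `statistics` module. The scores and the hearing-loss labels below are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical interval/ratio scores.
scores = [2, 3, 3, 5, 12]
m = mean(scores)      # arithmetic average
mdn = median(scores)  # middle value of the sorted scores
mo = mode(scores)     # most frequent value
print(m, mdn, mo)

# The mode is the only one of the three that also works for nominal data.
hearing_loss_types = ["sensorineural", "conductive", "sensorineural", "mixed"]
most_common_type = mode(hearing_loss_types)
print(most_common_type)
```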
What are measures of variability?
- Min, max
- Range
- Interquartile range
- Standard deviation
- Standard error of the mean
What is interquartile range?
- Score at 75th percentile - score at 25th percentile
- Relevant if your data have extreme highs or lows
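A brief Python sketch comparing the interquartile range with the full range when one extreme score is present. Percentile conventions differ between software packages; this uses the default method of `statistics.quantiles`, and the scores are made up.

```python
from statistics import quantiles

def interquartile_range(scores):
    """Score at the 75th percentile minus score at the 25th percentile."""
    q1, _median, q3 = quantiles(scores, n=4)  # the three quartile cut points
    return q3 - q1

# Hypothetical scores with one extreme high value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
iqr = interquartile_range(scores)
score_range = max(scores) - min(scores)
print(iqr, score_range)
```

The single extreme score inflates the range to 99 but leaves the interquartile range at 6, which is why the IQR is preferred for data with extreme highs or lows.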
What is standard deviation?
- Dispersion of scores around the mean
- Colloquially: On average, how much do observations differ from the mean?
- SD = SQRT(variance)
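The relation SD = SQRT(variance) can be worked through step by step in Python. The card does not say whether the variance divides by n or by n - 1, so the sketch shows both; the scores are invented.

```python
import math
from statistics import mean, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(scores)
sample_variance = sum(squared_deviations) / (len(scores) - 1)

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)
print(population_sd, round(sample_sd, 3))
```

The hand-computed values match `statistics.pstdev` and `statistics.stdev` respectively.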
What is standard error of the mean?
- How far is the sample mean likely to be from the population mean?
- SE = s / SQRT(n)
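A minimal Python sketch of the standard error of the mean, taking s to be the sample standard deviation (n - 1 denominator) and n the number of observations. The scores and the function name are hypothetical.

```python
import math
from statistics import stdev

def standard_error(scores):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(scores) / math.sqrt(len(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(scores)
print(round(se, 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.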
Why is it important to know the shapes of distributions?
- To determine the best way to summarize your data
- To determine the type of statistical test you should perform (some tests assume a particular distribution, e.g., “normally distributed”)
Describe normal distribution.
- AKA “Gaussian curve”
- Largest number of observations at the center
- Symmetric
- Fewer observations toward the extreme values (about 2/3 of observations fall within 1 SD of the mean)
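The "about 2/3 within 1 SD" figure can be checked against the theoretical normal curve using `statistics.NormalDist` from Python's standard library. The 2 SD figure is included only for comparison.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the curve lying within 1 and within 2 SDs of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(round(within_1_sd, 3), round(within_2_sd, 3))  # 0.683 0.954
```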
Describe skewed distributions.
- Not symmetric
- More extreme scores in one direction (i.e. negatively skewed or positively skewed)
- Mean most affected by skew (the more skewed the more difference between mean and median)
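A small Python illustration of how skew pulls the mean away from the median. Both data sets are invented.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]
positively_skewed = [1, 2, 3, 4, 50]  # one extreme high score

for label, scores in [("symmetric", symmetric), ("positively skewed", positively_skewed)]:
    print(label, "mean:", mean(scores), "median:", median(scores))
```

Replacing a single score with an extreme value moves the mean from 3 to 12 while the median stays at 3.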
Describe bimodal distribution.
- 2 peaks
What are standardized scores?
- Account for both average and variability of the score
- Z score = (score - M) / SD
- Resulting M(z-score) = 0 and SD(z-score) = 1
- How many SD above/below the mean is a given score?
- Straightforward way to relate a value to a normal distribution and to other z-scores
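A minimal Python sketch of standardizing scores with z = (score - M) / SD. It assumes SD is the sample standard deviation; the scores and the name `z_scores` are hypothetical.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Express each score as a number of SDs above or below the mean."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    return [(x - m) / sd for x in scores]

scores = [70, 75, 80, 85, 90]
z = z_scores(scores)
print([round(value, 2) for value in z])
print(round(mean(z), 6), round(stdev(z), 6))
```

The standardized scores have a mean of 0 and an SD of 1 (up to floating-point rounding), regardless of the original scale.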
What are outliers?
- Extremely deviant values
- Not necessarily inaccurate
- Can be identified via: reviewing experiment notes, plotting raw data, setting a priori criteria
How do you deal with outliers?
- NEVER EVER remove data without minimally describing:
a. How and why you did so
b. What the impact on the results was
c. How much data you removed and whether removal was equally distributed across conditions
- Problems can arise from interpreting data with real outliers or without “outliers”
What is the rule of thumb regarding data representation in tables?
- Data that require fewer than 2 columns or rows should just be presented in the text
What are different types of figures?
- Pie chart
- Scatterplot
- Column/bar graph
- Line graph