3. Sampling, Validity and Reliability Flashcards
what is sampling?
finding participants.
why is it important to sample properly?
so that we can make generalisable inferences about the population
population
a group of people about whom one would like to draw some meaningful conclusions
sample
a subset of that population that is actually included in your research study
sampling frame
a list of members/elements of a population from which one might obtain a sample
census
a list of all of the people comprising a particular population
what is critical to have when making generalisable inferences about the population on the basis of measurements of your sample?
representative samples
what are representative samples?
samples whose typical characteristics are approximately the same as the typical characteristics of the population
sample statistic
a numeric characteristic of a sample (measured)
population parameter
a numeric characteristic of the population (often not known)
Response rate
what proportion of the people responded?
sampling error
the difference between the sample statistic and the population parameter (depends on the sample size)
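A minimal Python sketch of sampling error, assuming a hypothetical population of 1,000 ages generated only for illustration (in real research the population parameter is usually unknown). The function `sampling_error` is an illustrative name, not a library call. It draws repeated simple random samples and shows the average gap between sample mean and population mean shrinking as the sample size grows.

```python
import random
import statistics


def sampling_error(population, n, seed=None):
    """Return (sample mean, population mean, difference) for one random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)        # simple random sample, no replacement
    statistic = statistics.mean(sample)       # sample statistic (what we measure)
    parameter = statistics.mean(population)   # population parameter (normally unknown)
    return statistic, parameter, statistic - parameter


# Hypothetical population of 1,000 ages, generated purely for illustration.
gen = random.Random(1)
population = [gen.randint(18, 80) for _ in range(1000)]

# Average absolute sampling error over 200 samples at each sample size.
avg_error = {}
for n in (10, 100, 900):
    errors = [abs(sampling_error(population, n, seed=s)[2]) for s in range(200)]
    avg_error[n] = statistics.mean(errors)
    print(n, round(avg_error[n], 2))
```

Sampling the whole population (n = 1,000) gives an error of exactly zero, which is the census case.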
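what is sampling bias?
a systematic difference between the sample and the population caused by the way the sample was selected; something that we would like to avoid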
probability sampling
a way to ensure that your sample is representative of the population (on the characteristics deemed important for the study)
what is the basic principle of probability sampling?
a sample will be representative of the population if all members of the population have an equal chance of being selected in the sample
allows the researcher to calculate the relationship between the sample statistic and the population parameter
what are the types of probability samples?
simple random sample, systematic random sample, stratified random sample, multi-stage cluster sampling, multi-stage / multi-phase sampling
simple random sampling
each member has an equal and independent chance of being selected.
you define the population, list all members, and assign each member a number.
how would you do random sampling?
use a table of random numbers or a lottery method to select
use a computer program to randomly select
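The "computer program" option can be sketched with Python's standard-library `random.sample`, which draws without replacement so every listed member has the same chance of selection. The 500-member frame and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n members without replacement; each member has an equal chance of selection."""
    rng = random.Random(seed)          # seeded only so the draw can be reproduced
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame: 500 listed, numbered members.
frame = [f"member_{i:03d}" for i in range(1, 501)]
chosen = simple_random_sample(frame, 5, seed=42)
print(chosen)
```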
systematic random sample
every kth person on the list is selected (k = the sampling interval).
the researcher divides the size of the population by the size of the desired sample to get the interval k, randomly selects a first person within the first k on the list, and then selects every kth person after that.
what must you ensure when doing a systematic random sample?
that the list of elements is not arranged in a way that means systematic sampling could lead to biased sample (e.g. GPA order)
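A minimal sketch of the systematic procedure above, assuming that when N/n is not a whole number the interval is rounded down (one common convention). The function name and the 1,000-person frame are hypothetical. As the card warns, if the frame is ordered (e.g. by GPA), the fixed interval can line up with that order and bias the sample.

```python
import random


def systematic_sample(sampling_frame, n, seed=None):
    """Random start, then every kth member, where k = N // n (the sampling interval)."""
    N = len(sampling_frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                                     # interval: population size / sample size
    start = random.Random(seed).randrange(k)       # random first person within the first k
    return [sampling_frame[start + i * k] for i in range(n)]


frame = list(range(1, 1001))     # hypothetical list of 1,000 numbered people
picked = systematic_sample(frame, 50, seed=3)
print(picked[:5])
```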
stratified sampling
if you want to make sure the profile of the sample matches the profile of the population on some important characteristic.
Researcher divides the population into subpopulations (strata) and randomly samples from each stratum
why use stratified sampling?
can reduce sampling error by ensuring that ratios in the sample reflect the actual population (e.g., the ratio of males to females)
To ensure that small subpopulations are represented in the sample in proportion to their share of the actual population
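A minimal sketch of proportionate stratified sampling, assuming each stratum's share of the sample is its population share rounded to a whole number (so with awkward ratios the total can differ from n by one or two). The 600/400 population and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n, seed=None):
    """Proportionate stratified random sample.

    Splits the population into strata, then randomly samples from each stratum
    in proportion to its share of the population.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    N = len(population)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / N)     # keeps the population ratio
        sample.extend(rng.sample(members, share))
    return sample


# Hypothetical population: 600 women and 400 men.
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(population, lambda person: person[0], 100, seed=7)
print(sum(p[0] == "F" for p in sample), sum(p[0] == "M" for p in sample))  # prints 60 40
```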
multi-stage cluster sampling
• Begin with a sample of groupings and then sample individuals
• E.g. Rural sample
o Define rural townships as those with populations
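A minimal two-stage sketch of "sample groupings, then sample individuals", assuming the groupings are hypothetical townships stored as a dictionary of resident lists; the cluster counts and function name are illustrative only.

```python
import random


def multistage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select groupings; stage 2: randomly select individuals in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)      # stage 1: sample clusters
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))  # stage 2
        for name in chosen
    }


# Hypothetical townships and residents, for illustration only.
townships = {f"township_{t}": [f"t{t}_resident_{r}" for r in range(40)] for t in range(12)}
result = multistage_cluster_sample(townships, n_clusters=3, n_per_cluster=10, seed=5)
for town, residents in result.items():
    print(town, len(residents))
```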
when might you use multi-stage cluster sampling?
• e.g., the Hunger Games: a population grouped into districts with different characteristics
Multi-stage / multi-phase sampling
- Larger sample obtained first in order to identify members of a sub-sample
- Sub-sample then randomly chosen from for study
- Good (but costly) way to identify subgroups that are not readily identifiable
advantages of probability sampling?
helps overcome sampling bias
- representativeness
disadvantages of probability sampling?
It’s all very well selecting people at random, but the fact that you have selected them doesn’t mean that they will take part in your study…
non-probability sampling
Not every member of the population has an equal chance of being part of the sample
why use non-probability sampling?
o There are no lists for some populations under study, e.g.
• The homeless
• Certain occupations (e.g., farmers)
• Hidden populations (e.g., people involved in “clandestine” activities)
o Convenience/resource restrictions
convenience samples
• A sample of available participants, e.g.,
o students enrolled in a particular course
o People passing a particular location
advantages of convenience samples
easy and cheap
disadvantages of convenience samples
no control over representativeness
snowball sampling
Involves collecting data from members of the population who can be located and then asking those members to provide information/contacts for other members of the population
Used mainly for hard to study populations, e.g.,
o Gay men
o Homeless young people
o Illegal immigrants
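A minimal sketch of the referral process, assuming the chain of contacts can be represented as a hypothetical dictionary mapping each participant to the people they would name. Recruitment is modelled wave by wave (breadth-first) and capped at `max_size`; both choices are illustrative assumptions.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit wave by wave: start from locatable members, then follow their referrals."""
    sampled = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)                     # collect data from this person
        for contact in referrals.get(person, []):  # ask them for further contacts
            if contact not in seen:                # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sampled


# Hypothetical referral network (who would name whom).
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(network, seeds=["A"], max_size=10))  # prints ['A', 'B', 'C', 'D', 'E', 'F']
```

Person F is reached only through E, which illustrates why a snowball sample depends heavily on who the initial contacts know.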
Quota sampling
- Non-probability sampling equivalent of a stratified random sample
- Want to reflect relative proportions of a population
- But you don’t/aren’t able to sample randomly from each stratum as you do in stratified random samples
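A minimal sketch of quota filling, assuming participants arrive in a fixed order (e.g. people passing a location) and are accepted until each category's quota is met. The arrival list, labels and 2:2 quotas are hypothetical; the key point is that nothing in the selection is random.

```python
def quota_sample(arrivals, category_of, quotas):
    """Take whoever turns up, in order, until every category's quota is filled.

    No random selection is involved, which is what makes this non-probability sampling.
    """
    filled = {category: [] for category in quotas}
    for person in arrivals:
        category = category_of(person)
        if category in filled and len(filled[category]) < quotas[category]:
            filled[category].append(person)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break                                  # all quotas met; stop recruiting
    return filled


# Hypothetical passers-by, labelled F/M; quotas mirror a 50:50 population split.
passers_by = ["F1", "M1", "F2", "F3", "M2", "M3", "F4"]
print(quota_sample(passers_by, lambda p: p[0], {"F": 2, "M": 2}))
```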
Purposive / judgement sampling
Clear purpose to the sampling strategy: select key informants, atypical cases, deviant cases or a diversity of cases.
why is Purposive / judgement sampling used?
o Select cases that might be especially informative
o Select cases in a difficult-to-reach population
o Select cases for in-depth investigation
what to consider when selecting a method of sampling?
- As a major aim of quantitative research is the ability to generalise results, the ideal method is probability sampling
- However this is often not workable or feasible given resources, time, the specific target population
- Sampling method used should be fully explained and caveats about the likely generalisability of results made accordingly so that the reader can review your results in an informed way
sample size
as we have already seen the size of your sample can influence how representative it is of the population
it is important that your sample is of an appropriate size
how many participants do I need for the study?
o Largely determined by the analysis you plan to conduct with the data derived
o Generally the more complex the analysis the larger the sample you require
o Increases in sample size bring increases in accuracy/precision, i.e. they reduce sampling error
what does heterogeneity of the population mean?
the greater the variation in the population, the larger the sample should be
when is a larger sample size needed?
o heterogeneous
• composed of widely different kinds of people
o you want to break down the sample into multiple subcategories
• e.g., look at males and females separately
o when you expect a small effect or weak relationship
o when you use less efficient methods of sampling
• e.g., cluster sampling
o for some statistical techniques
what are the five simple rules for determining sample size?
- if population is less than 100, use entire population
- larger sample sizes make it easier to detect an effect or relationship in the population
- compare to other research studies in area by doing a literature review
- use a power table for a rough estimate
- use a sample size calculator (e.g., G*Power)
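Power tables and calculators rest on formulas like the one sketched here: a rough normal-approximation estimate of participants per group for comparing two group means (two-sided test). It is an assumption-laden sketch, not G*Power's method; exact t-test calculations give slightly larger numbers. `d` is the standardised effect size (Cohen's d), and the function name is hypothetical. It also shows the earlier point that small expected effects need much larger samples.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2 (normal approximation)."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for a two-sided test
    z_beta = z.inv_cdf(power)              # value corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):   # conventionally large, medium and small effects
    print(d, n_per_group(d))
```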
What does it mean to operationalise the IVs?
How are you going to manipulate it? How is it manipulated (if you can't manipulate it yourself)?
what does it mean to operationalise the DVs?
how are you going to measure it?
what does reliability question?
does our measurement instrument behave sensibly? Does it always measure the same thing in the same way?
what does validity question?
are we measuring what we think we are measuring?
is our measure credible, is it believable?
How do we assess whether our measures/operationalisations are good?
we test validity and reliability
when can you test reliability and validity?
not until after the questionnaire has been developed and used
why is a pilot test beneficial?
so that you can determine whether your measures are reliable and valid.
what is reliability?
the consistency or repeatability of your measure
can a measure be reliable but not valid?
yes: you could have a consistent measure that does not accurately measure the construct
can a measure be valid but not reliable?
NO! If your measure doesn't consistently and dependably measure the construct, it cannot possibly be measuring what it claims to measure
what are types of reliability?
stability of the measure (test-retest)
Internal consistency of the measure (split-half, Cronbach’s alpha)
Agreement or consistency across raters (inter-rater)
what does test-retest reliability question?
does your test measure the same thing every time you use it?
how do you perform a test-retest test?
address the stability of your measure
you administer the measure at one point in time (time 1)
you then give the same measure to the same participants at a later point in time (time 2)
You correlate the scores from the two administrations
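The final step, correlating time 1 with time 2, can be sketched as a hand-written Pearson correlation (Python 3.10+ also offers `statistics.correlation`). The six participants' scores and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same six participants on the same measure.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 12]
r = pearson_r(time1, time2)
print(round(r, 2))
```

A high positive r suggests the measure is stable over the interval, though memory and practice effects can inflate it.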
what are the two main problems with test-retest
memory effect, practice effect
memory effect
you might remember the questions and look up the ones you didn't know; therefore, your performance might improve from time 1 to time 2
practice effect
performance improves because of practice in taking the test
what are other considerations to make when performing a test-retest test?
intervals
what is something to consider when thinking about intervals for a test-retest?
if too short, there's a greater risk of memory effects
if too long, there's a risk that other variables (e.g. additional learning) may influence the result
What does the split-half reliability question?
is your measure internally consistent?
How does one perform a split-half reliability test?
you administer a single measure at one time to a group of participants
But, for your purposes you split the measure into two halves and you correlate the scores on the two halves of the measure (higher correlation means greater reliability)
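A minimal sketch of the split-half procedure, assuming the measure is split into odd-numbered and even-numbered items (one common way to form the two halves). The 5-participant by 6-item responses are hypothetical, and a small Pearson helper is included so the block runs on its own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half_r(item_scores):
    """Correlate odd-item totals with even-item totals from a single administration."""
    odd = [sum(person[0::2]) for person in item_scores]    # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]   # items 2, 4, 6, ...
    return pearson_r(odd, even)


# Hypothetical responses: one row per participant, six items each scored 1 to 5.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [3, 3, 3, 4, 3, 3],
    [5, 5, 4, 5, 5, 5],
    [1, 2, 1, 1, 2, 1],
]
print(round(split_half_r(responses), 2))
```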
what are the strengths of split-half reliability?
eliminates memory & practice effects
what are the limitations of the split-half reliability?
the two halves may not be completely equivalent
What does Cronbach’s alpha assess?
the internal consistency of your measure, i.e. how well the items or questions in your measure appear to reflect the same underlying construct
what is the range of the coefficient for Cronbach’s alpha?
typically 0 to 1.00 (it can fall below 0 if items are negatively related to one another)
the closer it is to 1 the better the reliability of the measure
what does inter-rater or inter-observer reliability question?
Do different raters measure the same thing?
It checks the match between two or more raters or judges.
e.g. research investigating the relationships between communication and family functioning
how does one calculate inter-rater reliability?
Nominal or ordinal scale - the percentage of times different raters agree
Interval or ratio scale - correlation coefficient
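For nominal or ordinal codes, the percentage-agreement calculation can be sketched as below; for interval or ratio ratings you would correlate the raters' scores instead. The codes and the function name are hypothetical, loosely echoing the family-communication example. Simple percentage agreement does not correct for agreement expected by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of cases two raters coded identically (nominal/ordinal codes)."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical codes two observers gave to the same 8 family interactions.
a = ["supportive", "hostile", "neutral", "supportive", "neutral", "hostile", "supportive", "neutral"]
b = ["supportive", "hostile", "supportive", "supportive", "neutral", "hostile", "supportive", "neutral"]
print(percent_agreement(a, b))  # prints 87.5
```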
what reliability coefficients should I be aiming for with test-retest reliability?
above 0.7
what reliability coefficients should I be aiming for with internal consistency?
above 0.7
what reliability coefficients should I be aiming for with rating consistency?
above 0.9
why is validity an issue?
many (if not most) variables in social research cannot be directly observed
e.g. motivation, satisfaction, helplessness
what are types of validity?
face, content, criterion (concurrent, predictive), construct (convergent, discriminant / divergent)
what does face validity question?
does it (subjectively) look like it measures what we want it to measure?
e.g. on the face of it, which of the following is a more valid measure of worker morale?
- number of grievances filed with the union
- number of books borrowed by workers during off-duty hours
what is the problem of face validity?
it is a weak subjective method for assessing validity but it is a first step.
content validity
the extent to which the measure represents a balanced and adequate sampling of relevant dimensions of the measured construct
what does content validity question?
does it cover all aspects of the construct that it purports to measure?
Criterion-related validity
involves checking the performance of your measure against some external criterion
what are the two types of criterion-related validity?
concurrent and predictive
what does concurrent, criterion-related validity question?
does it relate to a known criterion, for example, an alternative (gold standard) measure of the same construct?
what does predictive, criterion-related validity question?
does the measure predict/relate to some criterion that you would expect it to predict?
Does the measure differentiate between people in the way you would expect?
What should measures of the following constructs predict?
Criterion-related validity: concurrent validity
Establish the validity of your measure by comparing it to a “gold standard” (i.e., existing validated measure of the same construct)
In concurrent validity, if there's already an existing, validated measure, why is it necessary to come up with a new one?
Your test might have some time/cost savings, e.g., shorter, easier to administer
Conversely your test might aim to demonstrate a more nuanced understanding of the construct in question (and be longer/more complex)
Construct validity
Establishes validity by showing that your measure relates to other constructs in a way that you would expect (theoretically)
e.g., you would expect a measure of marital satisfaction to be positively related to respect for partner and negatively related to marital infidelity
what are the two types of construct validity?
convergent
divergent
convergent validity
Measures of constructs that theoretically should be related to each other are, in fact, observed to relate to each other (i.e., there is correspondence or convergence between similar constructs)
divergent validity
Measures of constructs that theoretically should not be related to each other are, in fact, observed not to relate to each other