WEEK 2 - Measuring variables, sampling, validity and reliability Flashcards
Why is generalisability in sampling important?
Generalisable results are results that reflect the true state of affairs in the population of interest - to claim this your sample needs to be as representative of the population as you can make it
What is ‘population’?
The totality to whom/which you wish to generalise your study findings
What is ‘sample’?
the participants in your study
What are the two types of sampling procedures?
Probability sampling - simple random, systematic random, stratified, multi-stage cluster
Non-probability sampling - convenience, snowball, purposive
What is probability sampling?
- A way to ensure that your sample is representative of the population (on the characteristics deemed important to the study)
- Basic principle: A sample will be representative of the population if all members of the population have an equal chance of being selected for the sample
- Allows the researcher to calculate the relationship between the sample and the population
What are the types of probability sample?
- Simple Random sample
- Systematic random sample
- Stratified random sampling
- Multi-stage cluster sampling
What is a simple random sample?
- each member has an equal and independent chance of being selected
- define the population, list all members, assign numbers then randomly select number
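A minimal sketch in Python of drawing a simple random sample, assuming you already have a complete numbered list (sampling frame) of the population; the population and sample sizes are made up for illustration.

```python
import random

# Hypothetical sampling frame: every member of the population, numbered 1..N
population = list(range(1, 10001))   # e.g. a list of 10,000 members

random.seed(42)                      # fixed seed so the draw is reproducible
# Each member has an equal and independent chance of being selected
sample = random.sample(population, k=1000)

print(len(sample), sample[:5])
```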
What is systematic random sampling?
- Every kth person
- Randomly select the first person, then divide the size of the population by the size of the desired sample and use this to determine the interval at which the sample is selected
**size of population/size of desired sample **
Example: to select a sample of 1000 people from a list of 10 000 randomly select the first person and then select every 10th person from the list
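A Python sketch of the same procedure, using the 1,000-from-10,000 example above; the list contents are placeholders.

```python
import random

population = list(range(1, 10001))        # placeholder list of 10,000 members
desired_sample_size = 1000

# Interval = size of population / size of desired sample
k = len(population) // desired_sample_size    # 10,000 / 1,000 -> every 10th person

# Randomly select the first person from within the first interval,
# then take every kth person after that
start = random.randrange(k)
sample = population[start::k]

print(k, len(sample))
```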
What is stratified sampling?
- If you want to make sure the profile of the sample matches the profile of the population on some important characteristics (for example, age or ethnicity)
- Researcher divides population into sub-populations (strata) and then randomly samples from strata
Why do we use stratified sampling?
- Can reduce sampling error by ensuring ratios reflect actual populations (example -ratio of different ethnic groups)
- To ensure that small sub-populations are included in the sample
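A rough Python sketch of proportionate stratified sampling, assuming each member's stratum (e.g. ethnic group) is already known; the group labels and sizes are invented.

```python
import random
from collections import defaultdict

random.seed(1)
# Hypothetical population records: (person_id, stratum such as ethnic group)
population = [(i, random.choice(["group_A", "group_B", "group_C"])) for i in range(10_000)]
total_sample_size = 500

# 1. Divide the population into strata
strata = defaultdict(list)
for person_id, stratum in population:
    strata[stratum].append(person_id)

# 2. Randomly sample within each stratum in proportion to its size,
#    so the sample profile matches the population profile
sample = []
for stratum, members in strata.items():
    n = round(total_sample_size * len(members) / len(population))
    sample.extend(random.sample(members, n))

print({s: len(m) for s, m in strata.items()}, len(sample))
```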
What is multi-stage cluster sampling?
- Begins with a sample of groupings (clusters), then samples individuals within them
Example: a rural sample
- Define the rural sample as towns with populations < X
- Get a listing of all relevant towns
- Take a random sample of towns
- Randomly sample people from within the randomly sampled towns
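A Python sketch of the rural example above, with made-up towns and residents: towns (clusters) are sampled first, then people within the sampled towns.

```python
import random

random.seed(7)
# Hypothetical listing of rural towns (population < X) and their residents
towns = {f"town_{i}": [f"town_{i}_resident_{j}" for j in range(200)] for i in range(50)}

# Stage 1: take a random sample of towns (the clusters)
sampled_towns = random.sample(list(towns), k=5)

# Stage 2: randomly sample people within each sampled town only
sample = []
for town in sampled_towns:
    sample.extend(random.sample(towns[town], k=20))

print(sampled_towns, len(sample))   # 5 towns, 100 people
```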
What is the difference between multi-stage cluster sampling and stratified sampling?
Multi-stage cluster is not the same as stratified sampling as each cluster does not need to be sampled.
What is multi-stage / multi-phase sampling?
- A larger sample is obtained first to identify members of a sub-sample
- The sub-sample is then randomly chosen from that larger sample
Example: a large community survey in Australia asks whether respondents have been diagnosed with disease X; those with disease X are then followed up as the sub-sample
What is non-probability sampling?
- Not every member of the population has equal chance of being part of the sample
Why use non-probability sampling?
There are no lists for some populations under study, for example:
- The homeless
- Certain occupations (e.g. farmers)
- Hidden or specific populations (e.g. farmers with mental health issues)
- Convenience/resource restrictions
Types of non-probability samples
- Convenience sample
- Snowball sample
- Quota sample
- Purposive sample
What is a convenience sample?
A sample of available participants
Example: students enrolled in a particular course or people passing a particular location
What are the advantages and disadvantages of convenience sampling?
Advantages:
Easy
Inexpensive
Disadvantages:
No control over representativeness
Bias
What is snowball sampling?
- Involves collecting data with members of the population that can be located and then asking those members to provide information/contacts for other members of the population
- Used mainly for hard-to-study populations
for example: homeless young people, or people with characteristics that are not commonly listed
What is a quota sample?
- The non-probability sampling equivalent of a stratified random sample
- You want to reflect the relative proportions of the population, but you don’t/can’t sample randomly from each stratum as you do in stratified random sampling
What is purposive/judgment sampling?
- Selecting a sample based on knowledge of the population, its elements and the purpose of the study
- Clear purpose to sampling strategy. Select key informants, atypical cases, deviant cases or a diversity of cases
Example: If a study aimed to find problems experienced by new immigrants it may sample key people involved in agencies that help immigrants such as ethnic welfare groups, community immigration legal aid groups
Why is purposive sampling often used?
- To select cases that may be especially informative
- Select cases in a difficult to reach population
- Select cases for in depth investigation
Which method of sampling do I use?
- best method is normally a probability sampling one (as the aim of research is to generalise findings to the population)
- Sometimes different sampling methods aren’t feasible given resources, time etc.
How do you determine sample size?
- Largely determined by the analysis you plan to conduct with the data derived
- Generally: the more complex the analysis the larger the sample
When are larger sample sizes needed?
- When the sample is heterogeneous (when the sample is composed of widely different people)
- When you want to break down the sample into subcategories (e.g. look at gender separately)
- If you want to obtain a more narrow or precise confidence interval
- when you expect a small effect or weak relationship
- for some statistical techniques
What are the five simple rules for determining sample size?
- If the population is fewer than 100, use the entire population
- Larger sample sizes make it easier to detect an effect or relationship in the population
- Compare to other research studies in the area by doing a literature review
- Use a power table for a rough estimate
- Use a sample size calculator (e.g. G*Power)
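As an alternative to a stand-alone calculator such as G*Power, a rough sketch using Python's statsmodels package (assuming it is installed); the effect size, alpha and power values are illustrative, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants needed per group for an independent-samples t-test,
# assuming a medium effect (d = 0.5), alpha = .05 and 80% power
print(round(analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)))   # ~64 per group

# A smaller expected effect (d = 0.2) needs a much larger sample, as noted above
print(round(analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)))   # several hundred per group
```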
What is a metric? (levels of measurement)
When we want to measure something (e.g. religion, self-esteem, tennis ability), we need to choose a metric with which we can measure it
The metric will determine the statistical analysis we will perform
What are the levels of measurement?
Nominal
Ordinal
Ratio
Interval
(NORI)
What is nominal?
Something which is purely categorical information (quality or kind of something)
Example: religion, gender
What is ordinal?
A rank order
Ordinal variables do indicate an underlying quantity, but they do not obey mathematical laws (you cannot meaningfully subtract or divide them)
What is interval?
A true number in the sense that there are equal intervals implied but no true zero point - example: temperature in degrees Celsius
What is ratio?
A true number. The distinguishing feature of a ratio scale variable is that it has a meaningful zero point, which participants could use to indicate the quantity is completely absent
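A small illustrative sketch of how the four metrics might be represented in Python with pandas; the variables and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    # Nominal: pure categories (quality or kind), no order
    "religion": pd.Categorical(["Buddhist", "Catholic", "None"]),
    # Ordinal: rank order, but intervals between ranks are not equal
    "tennis_level": pd.Categorical(["novice", "club", "elite"],
                                   categories=["novice", "club", "elite"], ordered=True),
    # Interval: equal intervals but no true zero (0 degrees C is not "no temperature")
    "temp_celsius": [18.5, 21.0, 25.5],
    # Ratio: equal intervals and a meaningful zero point (0 = completely absent)
    "sick_days": [0, 3, 7],
})

print(df.dtypes)
```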
What is an issue with reliability and validity?
The issue is that you can’t assess these until after you have developed your questionnaire and used it. Therefore, a pilot test can be beneficial
Many people choose to use established measures instead of developing their own
What are the two broad types of validity
Internal validity
External validity
Why is validity an issue?
- Many (if not most) variables in social research cannot be directly observed
example: motivation, satisfaction, helplessness
Therefore, the challenge is to make a judgment call on whether we are measuring what we think we are measuring
What are the types of validity?
Face validity
content validity
criterion-related validity - (concurrent validity +predictive validity)
construct validity - (convergent + divergent)
What is face validity?
- Asks the question: on the face of it, does my measure seem to relate to the construct?
- Measures that lack face validity have the potential to alienate research participants (they may wonder what you are really trying to measure)
- A weak, subjective method for assessing validity, but a useful first step
What is content validity?
- The extent to which the measure represents a balanced, adequate sampling of the relevant dimensions
- Considers what should go into a measure and what should stay out - considers boundaries
- How much does the measure cover the content of the definition?
Example: which of the following would be a more valid test of mathematical ability:
- a 20-question test containing addition problems
- a 20-question test containing addition, subtraction and multiplication problems
What is criterion-related validity?
- involves checking the performance of your measure against some external criterion
What are the two types of criterion-related validity?
Concurrent - Establish the validity of your measure by comparing it to a “gold standard” (i.e., Existing validated measure of the same construct)
Predictive - does the measure predict/relate to some criterion that you would expect it to predict
What is predictive validity (in criterion-related validity)
- Does the measure predict something that it is theoretically supposed to predict
- Does the measure differentiate between people in a way that you would expect
- What should a measure of the following constructs predict?
- IQ -> perhaps some cognitive-based performance task
- Workplace depression scale -> number of mental health sick days
What is construct validity
Demonstrating that the measure relates to the theoretical construct of interest
What are the two types of construct validity
convergent
divergent
What is convergent validity (in construct validity)
Demonstrates that the measure relates to other similar measures
What is divergent validity (in construct validity)
Demonstrates that the measure does not relate to unrelated constructs
Summary of validity types
Face - in the judgment of others, items appear to relate to the construct
Content - captures the entire meaning (all elements of the definition) of a construct
Criterion - agrees with an external source
Concurrent - agrees with an existing gold-standard measure
Predictive - agrees with future behaviour
Construct - how well multiple indicators relate to each other (consistent with theory)
Convergent - similar measures (or measures of theoretically related constructs) are related
Divergent - different measures are unrelated
What is reliability?
- The consistency or repeatability of your measurement
What are the types of reliability?
- Stability of the measure (test-retest)
- Internal consistency of the measure (split-half, Cronbach’s alpha)
- Agreement or consistency across raters (inter-rater)
What is test-retest reliability?
- Addresses the stability of your measure
- you administer the measure at one point in time and then you give the same measure to the same participant at a later point in time
- You correlate the scores on the two measures
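A minimal Python sketch of the calculation, assuming the same measure has been scored for the same participants at two time points; the scores are fabricated.

```python
import numpy as np

# Fabricated scores for the same 8 participants at time 1 and time 2
time1 = np.array([12, 18, 25, 30, 22, 15, 28, 20])
time2 = np.array([14, 17, 27, 29, 21, 16, 26, 22])

# Test-retest reliability: correlate the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))   # closer to 1 = more stable measure
```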
What is the problem with test-retest?
- memory effect
- practice effect (performance improves because the person has had practice taking the test before)
- Another thing to consider is the time interval between administrations: if it is too short there is a greater risk of memory effects; if it is too long there is a risk of other variables (e.g. additional learning) influencing the results
What is split-half reliability?
- Administer a battery of questions
- Split the measure into two halves
- Correlate the scores on the two halves of the measure
- A higher correlation means greater reliability
- Strengths: eliminates memory and practice effects
- Limitations: are the two halves equivalent?
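A Python sketch on fabricated item scores, splitting into odd- and even-numbered items; the Spearman-Brown step is a standard correction for each half being only half the test length (not mentioned above, but commonly applied).

```python
import numpy as np

rng = np.random.default_rng(0)
# Fabricated responses: 20 participants x 10 items tapping one construct
# (each person's item scores vary around their own "true" level)
true_level = rng.normal(3, 1, size=(20, 1))
items = true_level + rng.normal(0, 0.7, size=(20, 10))

# Split the measure into two halves (odd vs even items) and score each half
half1 = items[:, 0::2].sum(axis=1)
half2 = items[:, 1::2].sum(axis=1)

# Correlate the scores on the two halves
r = np.corrcoef(half1, half2)[0, 1]

# Spearman-Brown correction: estimates the reliability of the full-length test
split_half_reliability = 2 * r / (1 + r)
print(round(r, 2), round(split_half_reliability, 2))
```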
What is inter-item reliability?
- Assesses the internal consistency of the measure, i.e. tells you how well the items or questions in your measure appear to reflect the same underlying construct
- You will get good internal consistency if individuals respond in approximately the same way across your survey
- Cronbach’s alpha can range from 0 (when the items are not correlated with one another) to 1.00 (when all items are perfectly correlated with each other). The closer the alpha is to 1.00, the better the reliability of the measure
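A Python sketch that computes Cronbach's alpha directly from its standard formula on fabricated item data (most statistics packages also provide this ready-made).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = participants, columns = scale items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Fabricated data: 20 participants answering 5 items on the same construct
rng = np.random.default_rng(1)
true_level = rng.normal(3, 1, size=(20, 1))
data = true_level + rng.normal(0, 0.8, size=(20, 5))

print(round(cronbach_alpha(data), 2))   # closer to 1.00 = better internal consistency
```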
What is inter-rater or inter-observer reliability?
- Checking the match between two or more raters or judges in your study
Calculations of inter-rater reliability
- Nominal or ordinal scale: the percentage of times different raters agree
- Interval or ratio scale: a correlation coefficient between raters’ scores
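A Python sketch of both calculations on fabricated ratings from two raters: percentage agreement for nominal/ordinal codes, and a correlation coefficient for interval/ratio scores.

```python
import numpy as np

# Nominal/ordinal codes from two raters: percentage of times they agree
rater1_codes = ["yes", "no", "yes", "yes", "no", "yes"]
rater2_codes = ["yes", "no", "no",  "yes", "no", "yes"]
agreement = sum(a == b for a, b in zip(rater1_codes, rater2_codes)) / len(rater1_codes)
print(f"{agreement:.0%} agreement")      # 5 of the 6 codes match

# Interval/ratio scores from two raters: correlation coefficient
rater1_scores = np.array([7.5, 6.0, 8.5, 5.0, 9.0])
rater2_scores = np.array([7.0, 6.5, 8.0, 5.5, 8.5])
print(round(np.corrcoef(rater1_scores, rater2_scores)[0, 1], 2))
```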
What kind of reliability coefficients should I be aiming for?
- Test-retest coefficients > .70
- Internal consistency >.70 (but ideally much higher)
- Rating consistency >.90
Reliability and measurement error
- Measurement error serves to weaken our statistical tests
- All other things being equal, more error in measurement means lower power
- Choosing a measure which is highly reliable decreases measurement error and increases the power of your design
Can a measure be reliable but not valid?
Yes! You could have a consistent measure that does not actually measure the construct
Can a measure be valid but not reliable?
Yes.
Example of a valid tool that is unreliable: something that is difficult to implement (e.g., skin fold tests, which require technical skill) may be unreliable across multiple administrators.
Summary of reliability types
Test-Retest - Same question given on different occasions and data correlated
Split half - Split questions in half and correlate data from two halves
Inter-item reliability - Overall correlation between items in scale
Inter-rater - Checking for agreements between multiple raters or judges