9.0 Quantitative - data collection and analysis Flashcards
Types of data collection in quantitative research
Physiological measurement
Observation
Interviews
Questionnaires
Records or other documents
Interview schedule (structured) and questionnaire differ only in that
in an interview the interviewer asks the questions, whereas with a questionnaire respondents read the questions themselves.
Three ways to administer a questionnaire
Mail/Web
Collective/captive audience
In public place
Considerations in choosing between an interview or questionnaire
Topic (e.g. a sensitive issue)
Where the respondents live
Who the study population is
Advantages of using a questionnaire for data collection
Less expensive
Anonymity
Disadvantages of using a questionnaire for data collection
Application is limited
Requires literate respondents
Low response rate
Self-selecting bias
No opportunity to clarify issues
Does not allow for spontaneous response
Response to a question may be influenced by other responses.
Opportunity to consult with others.
Cannot be supplemented with other information (e.g. observation)
What is secondary data in research?
Existing data/Second hand information.
Examples:
Earlier research, census data, personal records, government publications
Government registries including cancer registries, hospital morbidity data, the mental health register etc.
Clinical records
Surveys and questionnaires - purpose:
A structured way to collect information.
Surveys are typically conducted through the mail (electronic or surface), phone, or internet.
Purpose:
To collect standardized information from large numbers of individuals.
When face to face meetings are inadvisable.
When privacy is important or independent opinions and responses are needed
What’s the difference between a survey and questionnaire?
Survey – method - “a descriptive research method where respondents are asked a series of questions in a standard manner so that responses can be easily quantified and analysed statistically.”
Questionnaire - tool - “a specific type of written survey made up of a structured series of questions. Questionnaires usually have highly standardised response options so that data can be easily analysed and compared.”
Research design steps
Decide who should be involved in the process.
Define content.
Identify your respondents.
Decide on the survey method.
Develop the questionnaire.
Pilot test the questionnaire and other materials.
Think about analysis.
Communicate about your survey and its results.
Develop a budget, timeline, and management process
Representativeness
- refers to how well the sample drawn for the questionnaire research compares with (i.e. is representative of) the population of interest.
Questionnaire structure
Title
Introductory paragraph (ethics, confidentiality)
Content questions:
Section
Filter questions (Go to question…)
Place easiest questions upfront
Avoid providing answers to later questions
Finish:
How to return questionnaire
Link to press
Acknowledgement
Response rate
The proportion of people who respond: divide the number of returned surveys by the total number of surveys distributed.
If 50 questionnaires are distributed and 25 questionnaires returned the response rate is 50%.
High response rate - promotes confidence in results
Lower response rate - increases the likelihood of biased results.
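The response rate calculation on this card can be sketched as below. The function name is made up for illustration; the figures are the 25 of 50 example from the card.

```python
def response_rate(returned: int, distributed: int) -> float:
    """Return the response rate as a percentage of questionnaires distributed."""
    if distributed <= 0:
        raise ValueError("distributed must be a positive number")
    return 100 * returned / distributed


# Example from the card: 50 questionnaires distributed, 25 returned.
rate = response_rate(returned=25, distributed=50)
print(f"Response rate: {rate:.0f}%")  # Response rate: 50%
```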
How to increase response rate
Generate positive publicity for your survey.
Over sample.
Ensure respondents see the value of participating.
Make (multiple) follow-up contacts.
Provide incentives.
Provide 1st class postage/return postage.
Set return deadlines.
Make the survey easy to complete.
Good questionnaires are NOT EASY
Developing a good questionnaire takes time, time and more time.
Multiple drafts may be involved before the questionnaire is ready.
It’s important to involve others in writing the questionnaire.
Questionnaire design − Considerations
Kind of information: What do you want to know?
Is the information already available?
Wording of questions and responses
Formatting the questionnaire
Pre-testing
Cover letters and introductions
When/where will the questionnaire be distributed? How will returns be managed? How will the data be analyzed?
Who is responsible for each task?
Steps to take if using questionnaires in research
Step 1: What information is needed?
Step 2: Sample
Step 3: Develop questionnaire
Step 4: Plan distribution, return, follow-up
Step 5: Pilot test
Step 6: Revise and revise
Open-ended questions
− allow respondents to provide their own answers
Numeric open end (e.g. please state AUD$_______ )
Text open end (sometimes called “verbatims”). How could you increase the proportion of household income used on healthcare?
Closed-ended questions
− list answers and respondents select either one or multiple responses
Question: How much do you spend on health care per year?
Multiple choice (<100, <300, <500, <700, 1000+)
Question design
Avoid vague questions and answers.
Avoid ambiguous words or phrases.
Avoid questions that may be too specific.
Avoid making assumptions.
Avoid leading questions.
Biased questions
- Influence people to respond in a certain way – “leading questions”
- Make assumptions about the respondent – “loaded questions”
- Use language that has strong positive or negative appeal
Write questions through your respondent’s eyes
Will the question be seen as reasonable?
Will it infringe on the respondent’s privacy?
Will the respondent be able and willing to answer the question?
Be selective and realistic when writing questions.
Order and wording of questions can affect responses and bias the results (Bowling, 2002).
Types of questions
Behaviour questions - ask about what people do – do you currently smoke cigarettes?
Belief questions – ask about whether people believe something to be true or false - Do you think peer pressure or parental example is more influential in determining smoking uptake among adolescents?
Attitude questions – seek to establish what respondents think is desirable- Do you agree with the statement ‘Airports should provide a smoking room for travellers.’
Knowledge questions - seek to determine what people know about particular topics. – what do you know about the effects of smoking on women?
Attribute questions – seek information about more objective characteristics of respondents - age, gender, place of residence etc.
Reliability is …
the extent to which a measurement instrument is dependable, stable, and consistent when repeated under identical conditions.
Validity is …
the extent to which the scale measures what it is supposed to measure (content and construct validity).
Likert scale
Commonly 5 or 7 point scales.
Most commonly used scaling method in health research.
Typically has high reliability.
Demographic data collection questions
Age
Gender
Ethnicity
Marital status
Family size
Occupation
Education
Employment status
Residence
Previous contact with organisation
Prior knowledge of topic
First-time participant vs. repeats
How you learned about the program
Pilot testing a questionnaire
Always!
With people as similar to respondents as possible:
Do they understand the questions? The instructions?
Do questions mean same thing to all?
Do questions elicit the information you want?
How long does it take?
Revise as necessary
Data Processing involves:
- Editing data - includes procedures for detecting and correcting errors in the raw data.
- Coding data
- Analysis of data - computers play a major role here for analysing quantitative data.
Quantitative data analysis software packages: SPSS, SAS, Stata, R, JMP, MATLAB
There are four levels of measurement:
nominal, ordinal, interval and ratio
Nominal (a level of measurement)
Variables are divided into two or more categories and assigned arbitrary numbers. These numbers have no rank order.
Gender: male 1, female 2; Marital status: single 1, married 2, separated 3, divorced 4, widowed 5
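A minimal sketch of coding nominal data, using the marital status codes from this card. The variable names and the sample responses are made up for illustration. The codes are labels only, so counting them is meaningful but ranking or averaging them is not.

```python
# Codes from the card: arbitrary numbers with no rank order.
MARITAL_STATUS_CODES = {
    "single": 1,
    "married": 2,
    "separated": 3,
    "divorced": 4,
    "widowed": 5,
}

# Hypothetical raw responses (made-up data).
raw_responses = ["married", "single", "widowed", "married"]

# Coding: replace each text response with its numeric code.
coded = [MARITAL_STATUS_CODES[r] for r in raw_responses]

# Decoding: recover the labels from the codes.
labels_by_code = {code: label for label, code in MARITAL_STATUS_CODES.items()}
decoded = [labels_by_code[c] for c in coded]

print(coded)    # [2, 1, 5, 2]
print(decoded)  # ['married', 'single', 'widowed', 'married']
```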
Ordinal (a level of measurement)
Reflects a rank order among the categories, but we do not know how much greater than or less than.
Pain: ranked on a scale of 1 to 10 with 10 being the worst pain and 1 being no pain.
Anxiety: ranked on a 5-point scale from low to high.
Interval (a level of measurement)
Are made up of ‘real’ numbers that allow us to order the numbers and to know the distance between those numbers. The intervals between the categories are equal.
Time: hours, minutes and seconds are precise intervals.
Temperature: interval between the numerical values of 76 and 77 degrees is the same as the interval between the values of 44 and 45 degrees.
Ratio (a level of measurement)
Ratio data have the same properties as interval data, except that the measurement scale for the variable possesses a meaningful zero. This means that when zero is reached on the scale, the variable is absent.
Weight, height: zero means no weight or no height.
Types of statistical analysis
- Descriptive
- Inferential
Descriptive statistics
Describe or summarise information obtained from sample (e.g. frequency, percentage, mean, mode, median)
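Frequencies and percentages, two of the descriptive statistics named on this card, can be sketched as follows. The responses are made-up answers to a yes/no behaviour question, not data from any study.

```python
from collections import Counter

# Hypothetical answers to "Do you currently smoke cigarettes?" (made-up data).
responses = ["yes", "no", "no", "no", "yes", "no", "no", "no"]

counts = Counter(responses)  # frequency of each answer
total = len(responses)
percentages = {answer: 100 * n / total for answer, n in counts.items()}

for answer, n in counts.most_common():
    print(f"{answer}: n = {n} ({percentages[answer]:.0f}%)")
# no: n = 6 (75%)
# yes: n = 2 (25%)
```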
Inferential statistics
use sample results to draw conclusions regarding the relevant population.
Statistics that allow a researcher to make inferences about whether relationships observed in a sample are likely to occur in the wider population from which that sample was drawn. Inferential statistics use logic and mathematical processes in order to test hypotheses relating to a specific population based on data gathered from a sample of the population of interest.
Allow you to test a hypothesis or assess whether data are generalisable to the broader population.
Difference between descriptive and inferential statistics
Descriptive statistics summarize the characteristics of a data set.
Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.
Mode
Most frequently occurring score in a frequency distribution. The only measure of central tendency that can be used with nominal data, although it can be used with all levels of measurement.
Median
Middle score: the point where 50% of scores are above it and 50% are below it.
Mean
Most widely used measure of central tendency.
Average of all scores. Used with interval and ratio data.
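The three measures of central tendency can be compared on one small data set. This is a sketch using Python's standard statistics module; the scores are made up for illustration.

```python
import statistics

# Hypothetical scores (made-up data), already in ascending order.
scores = [2, 3, 3, 4, 5, 5, 5, 7]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle of the ordered scores
mean = statistics.mean(scores)      # sum of scores / number of scores

print(mode)    # 5
print(median)  # 4.5 (halfway between the two middle scores, 4 and 5)
print(mean)    # 4.25 (34 / 8)
```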
Measures of variability
Range
Variance
Standard Deviation
z scores
Quartiles and the interquartile range
Percentile
Standard Deviation
Standard Deviation - measure of the average deviation or distance of each score from the group mean. Must always be reported with the mean.
In a normal distribution, about 68% of the sample will fall within 1 SD of the mean.
Small SD = less variability within the sample; the scores are more similar to the mean and to each other.
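Variance
Variance = sum of squares / number of values, where the sum of squares is the sum of the squared deviations of each value from the mean (when estimating from a sample, the divisor is usually n - 1).
Small variance = values are very close to the mean and therefore similar to each other.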
Larger variance = values are very spread out around the mean and from each other.
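A worked sketch of the variance formula above, followed by the standard deviation as its square root. The scores are made up, and the divisor is the number of values as on the card (statistics.variance would divide by n - 1 instead).

```python
import math
import statistics

# Hypothetical scores (made-up data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)                       # 40 / 8 = 5.0
sum_of_squares = sum((x - mean) ** 2 for x in scores)  # 9+1+1+1+0+0+4+16 = 32.0
variance = sum_of_squares / len(scores)                # 32 / 8 = 4.0
sd = math.sqrt(variance)                               # 2.0

print(variance, sd)  # 4.0 2.0

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(sd, statistics.pstdev(scores))
```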
z scores
Used to compare measurements/values in standard units. Takes account of mean and SD of the distribution
Each score is converted to a z score and then the z scores are used to examine the relative distance of the scores from the mean - process called standardising the score.
z score = 1.5: observation is 1.5 SD above the mean
z score = -2: observation is 2 SD below the mean
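Standardising a score can be sketched as below. The function name and the mean of 50 and SD of 10 are assumptions chosen to give the two z scores on this card.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Express a raw score as its distance from the mean in SD units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (value - mean) / sd


# Hypothetical distribution with mean 50 and SD 10.
print(z_score(65, mean=50, sd=10))  # 1.5  (1.5 SD above the mean)
print(z_score(30, mean=50, sd=10))  # -2.0 (2 SD below the mean)
```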
Quartiles
Cut the observations into 4 equal sections. Q1 = 25th percentile, Q3 = 75th percentile.
Distance between Q1 and Q3 = interquartile range and indicates the range of the middle 50% of scores. More stable than range because it is less likely to be changed by a single extreme score.
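The sketch below computes quartiles and the interquartile range with statistics.quantiles, then changes one score to an extreme value to show why the IQR is more stable than the range. The data are made up, and other quartile conventions can give slightly different cut points.

```python
import statistics

def iqr(values):
    """Interquartile range: distance between Q1 and Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def value_range(values):
    """Range: distance between the smallest and largest score."""
    return max(values) - min(values)


# Hypothetical scores (made-up data).
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]
# Same scores, but the largest is replaced by a single extreme value.
with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 130]

print(statistics.quantiles(scores, n=4))               # [4.0, 7.0, 10.0]
print(value_range(scores), iqr(scores))                # 11 6.0
print(value_range(with_outlier), iqr(with_outlier))    # 128 6.0
```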
Percentile
Represents the percentage of cases a given score exceeds.
A score in 90th percentile is only exceeded by 10% of the scores.
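Using the definition on this card (the percentage of cases a given score exceeds), a percentile rank can be sketched as follows. The function name and the scores from 1 to 100 are made up for illustration.

```python
def percentile_rank(score, scores):
    """Percentage of cases in `scores` that `score` exceeds."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical set of 100 scores: 1, 2, ..., 100.
scores = list(range(1, 101))

print(percentile_rank(91, scores))  # 90.0 (91 exceeds the 90 scores from 1 to 90)
```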
Skewness
Refers to asymmetry of a distribution of interval or ratio scores.
Kurtosis
Relates to the peakedness or flatness of a distribution.
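One common way to put numbers on skewness and kurtosis is through moments about the mean. The sketch below uses the simple population formulas; statistical packages often apply small-sample corrections, so their values may differ slightly. All names and data are made up for illustration.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """0 for a symmetric distribution; positive when the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """0 for a normal distribution; negative when flatter, positive when more peaked."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric scores
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical scores with one high value

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(excess_kurtosis(symmetric))  # about -1.3 (flatter than a normal distribution)
```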
Inferential statistics
Hypothesis testing
Probability and level of significance
95% CI
Odds ratio
Errors in statistical inference: type 1 and type 2 errors
Power analysis
Effect size
Tests of significance (parametric or non-parametric)
The term statistically significant means
that the result is unlikely to have occurred due to chance fluctuations in sampling.
Levels of significance - alpha levels
α = 0.05 (researcher willing to accept a 5% risk that the results are in error) - conventional minimum level accepted in most scientific disciplines.
α = 0.01 (1% risk of error)
α = 0.001 (0.1% risk of error)
Confidence interval
identifies a range of values that includes the true population value of a particular characteristic at a specified probability level.
If the CI for a difference or correlation includes 0 = no statistically significant relationship (for ratio measures such as the odds ratio, the no-effect value is 1).
e.g. a 95% CI of a mean indicates that if we took 100 similar samples and calculated a 95% CI for each, about 95 of those intervals would contain the ‘true’ population mean and about 5 would not.
Mean ± SD
6.12 ± 2.54
95% CI 5.53 to 6.70: we can be 95% confident that the target population mean lies between 5.53 and 6.70.
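A minimal sketch of how a 95% CI for a mean can be calculated from the mean, SD and sample size, using the normal approximation (mean ± z × SD / √n). The card does not state the sample size, so n = 72 below is purely an assumption for illustration and the result need not match the interval on the card exactly. With small samples a t value is normally used in place of z.

```python
import math
from statistics import NormalDist

def confidence_interval_for_mean(mean, sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: mean ± z * (sd / sqrt(n))."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Mean and SD from the card; n = 72 is a hypothetical sample size.
lower, upper = confidence_interval_for_mean(mean=6.12, sd=2.54, n=72)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

A larger sample gives a smaller standard error and therefore a narrower interval.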
Odds ratio
Another way of presenting probabilities.
Summary statistic that estimates the odds of an event occurring in one group compared to another. Measure of strength of association.
OR = 1: the event is equally likely in both groups
OR > 1: the event is more likely to happen in the first group
OR < 1: the event is less likely to happen in the first group
Can be obtained in chi-square analysis and in logistic regression where the outcome is dichotomous; also used in meta-analyses.
OR = 1.60 (can be interpreted as a 60% increase in the odds; this approximates a 60% increase in risk only when the outcome is rare)
Statistic used to assess the risk of a particular outcome and is widely used in the healthcare literature—particularly epidemiology because it can be calculated for case-control studies, in studies using logistic regression, and as a way of presenting the results of a meta-analysis.
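An odds ratio can be calculated from a 2 x 2 table of counts, as sketched below. The function name and the counts are made up for illustration and were chosen so that the result is 1.60.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds of the event in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b


# Hypothetical counts (made-up data):
#              event   no event
# exposed        40       50      -> odds = 40 / 50 = 0.8
# unexposed      25       50      -> odds = 25 / 50 = 0.5
result = odds_ratio(40, 50, 25, 50)
print(f"OR = {result:.2f}")  # OR = 1.60
```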
Type 1 and Type 2 errors
Two types of errors in statistical inference.
Type I - false positive (e.g. stats say treatment is effective when it isn’t).
Type II - false negative (e.g. stats say treatment is not effective when it is).
It is impossible to eliminate these errors; however, researchers try to minimise them by conducting an appropriate power analysis.
Power analysis
quantitative method that allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence.
Effect size
equates to the magnitude of the effect of an intervention or treatment.
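The link between power analysis and effect size can be sketched with a common normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 × ((z for α + z for power) / effect size)². The function name, the standardised effect sizes and the defaults of α = 0.05 and 80% power are assumptions for illustration; dedicated software uses more exact methods and may give slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group to detect a standardised difference in two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power (1 - Type II risk)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)                            # round up to whole participants


# Hypothetical standardised effect sizes.
print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(0.2) > sample_size_per_group(0.5))  # True
```

The smaller the effect to be detected, the larger the sample required.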
Degrees of freedom
Represent the number of data points in any given set of data that are free to vary.
Parametric tests
type of inferential statistic that involves the estimation of at least one parameter. Such tests require either interval or ratio data and involve a number of assumptions about the variables under investigation, including the assumption that the variable is normally distributed.