Research Methods Flashcards
All information that was taught to me while attending Vanier College's "Animal Health Technology" Program, located in St-Laurent Montreal.
What is the goal of statistics
Figure out if what we observe is the result of the factor studied or a background of normal variation. Evaluate what the numbers actually mean. Represent them in a way that communicates their meaning to others.
What is a variable
A characteristic that varies between individuals
What is a nominal qualitative variable
EX: Color (categories)
What is an ordinal qualitative variable
EX: Body condition (categories)
What are some biological variations
Genetics, Environment, Gender, Age
What are two types of technical errors
Human errors, instrument errors
What is a population
All representatives in a group
What is a sample
A subgroup of the population
What are descriptive statistics used for
Used to summarize data in diagrams, tables, mean, variance.
What is inferential statistics
To generalize from the sample something that can be applied to the whole population: estimation of a population’s parameter, and hypothesis testing (to investigate a theory about the data).
Which types of diagrams are used to represent qualitative data
Bar charts, pie charts
Which types of diagrams are used to represent quantitative data
Dot diagram, histogram, stem and leaf diagram, box and whisker plot, scatter diagram
What are the averages used as measures of central tendency
Mean, median, mode
What is used to measure dispersion (spread)
Range, variance, standard deviation
What is the price we pay by using sampling instead of questioning the whole population?
The price we pay for sampling is that we cannot make statements of absolute certainty about the population. The doubt is expressed as a probability. The larger the sample, the more representative it is.
What are the six types of studies
Observational vs experimental, cross-sectional vs longitudinal, cohort vs case-control
What are six things to consider when trying to increase precision of the estimates
Replication, blocks, independent vs pairing, confounders and interaction, outliers, missing data
Why is it impossible to prove something with statistics
Because of the sampling error. We are not directly measuring a population
What is a null hypothesis
A hypothesis stating that there is no difference
What is an alternate hypothesis
The opposite of the null hypothesis: there is a difference
What is the P value
A probability: how likely it would be to obtain the observed results if Ho were true. It is the value used to decide whether or not to reject Ho.
What does it mean if your P value is very small
It is unlikely that we could have obtained the observed results if the null hypothesis were true, so we reject Ho.
What does it mean if your P value is very large
There is a high chance that we could have obtained the observed results if the null hypothesis were true, so we do not reject Ho. This does not prove that Ho is true.
What does an alpha (α) value of 0.01 mean
A stricter significance level: when Ho is rejected, we are even more certain that rejecting it was the right decision.
What does an alpha (α) value of 0.05 mean
The generally accepted significance level.
What is a type 1 error
False positive (rejecting a true Ho)
What is a type 2 error
False negative (not rejecting a false Ho)
What is simple randomization
we would use a computer to generate the sequence, or a table of random numbers, or flip a coin
What is the drawback to simple randomization
The main drawback is that, with a small sample size, there could be a severe imbalance in the numbers assigned to each treatment.
What is sampling error
The sampling error is an error in the estimation of the POPULATION parameter, because we are using a SAMPLE of the population and not measuring the whole population.
What are the two types of sampling errors
sampling error in relation to the mean and sampling error in relation to a proportion.
Give an example of qualitative nominal
The nominal scale is composed of categories of things that can be assigned a name and are not in any particular order (coat color, white blood cells, type of diet (wet-dry)).
Give an example of qualitative ordinal
The ordinal scale is composed of categories that CAN be given an order, but there is not a consistently defined interval (body condition, toxic changes in neutrophils (1+, 2+, 3+))
Give an example of quantitative continuous
The continuous scale is composed of values on a continuum (age, hematocrit).
Give an example of quantitative discontinuous
The discontinuous scale is composed of integer values (# of puppies in a litter, results of rolling a die).
The rate of a particular enzyme reaction
Numerical, continuous
number of offspring per litter
Numerical, discontinuous (discrete)
coat color in horses
Categorical, nominal
disagree, neutral, somewhat agree, agree
Categorical, ordinal
What is a measure of location used for
to measure the central tendency of the data set.
What are the measures of location?
Mean, median, mode
What is a measure of dispersion?
to measure how widely scattered the observations are in either direction from that average.
What are the measures of dispersion?
Range, interquartile range, variance, standard deviation
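As a rough illustration of the measures of location and dispersion listed on these cards, here is a minimal Python sketch using the standard `statistics` module. The body weights are made-up values, and the quartiles use the module's default method, which is only one of several conventions.

```python
import statistics

# Hypothetical body weights (kg) for a small sample of dogs (invented data).
weights = [4.0, 5.0, 5.0, 6.0, 7.0, 9.0, 13.0]

# Measures of location
mean = statistics.mean(weights)
median = statistics.median(weights)
mode = statistics.mode(weights)

# Measures of dispersion
data_range = max(weights) - min(weights)
q1, q2, q3 = statistics.quantiles(weights, n=4)  # quartiles, default method
iqr = q3 - q1                                    # interquartile range
variance = statistics.variance(weights)          # sample variance (n - 1 denominator)
std_dev = statistics.stdev(weights)              # square root of the variance

print(mean, median, mode)
print(data_range, iqr, round(variance, 2), round(std_dev, 2))
```

Note how the single large value (13.0) pulls the mean above the median, which is the effect of an outlier described on a later card.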
What is a variance a measure of
Dispersion
What is range a measure of?
Dispersion
What is mode a measure of
Location
What is interquartile range a measure of
dispersion
What type of data is best represented by a pie chart?
Pie chart: circle divided into segments with each segment portraying a different category of the qualitative variable. For categorical data.
What type of data is best represented by a bar chart?
Bar chart: diagram in which every category of the variable is represented. The length of each bar (width is constant) depicts the # or % of individuals belonging to that category. For categorical data.
What type of data is best represented by a dot diagram?
Dot diagram: each observation is a dot within horizontal and vertical axis calibrated in the units of measurement. For quantitative data (small size)
What type of data is best represented by a histogram?
Histogram: Two-dimensional diagram with (usually) the unit of measurement on the horizontal axis and the height proportional to the frequency (vertical axis is frequency). The rectangles are contiguous (no space between each bar) because the numerical variable is continuous (compared to a bar chart). For quantitative variables (frequencies).
What type of data is best represented by a stem and leaf diagram?
Stem and leaf diagram: modified histogram. Row of number that represents the observations instead of a rectangle (vertical). (not used very often - you do not need to study)
What type of data is best represented by a box and whisker plot
Box-and-whisker plot (box plot): The scale of measurement of the variable is usually drawn vertically. The diagram comprises a box with horizontal limits defining the upper and lower quartiles and representing the interquartile range, enclosing the central 50% of the observations, with the median marked by a horizontal line within the box. The whiskers are vertical lines extending from the box as low as the 2.5th percentile and as high as the 97.5th percentile (sometimes the percentiles are replaced by the minimum and maximum values of the set of observations). Very common. For quantitative variables.
What type of data is best represented by a scatter diagram
Scatter diagram: effective way of presenting data when we are interested in examining the relationship between two variables which may be numerical or ordinal. The diagram is a two-dimensional plot in which each axis represents the scale of measurement of one of the two variables. Using this rectangular co-ordinate system, we relate the value for an individual on the horizontal scale (the abscissa) to the corresponding value for that individual on the vertical scale (the ordinate) by marking the relevant point with an appropriate symbol. For quantitative variables.
What is an outlier?
Outlier: An outlier is an observation whose value is highly inconsistent with the main body of the data. An outlier with an excessively large value will tend to increase the mean unduly, whilst a particularly small value will decrease it.
(T/F and Why) A qualitative variable comprises two categories which may be ordinal or numerical.
False: a qualitative (categorical) variable is on a nominal or ordinal scale; numerical (quantitative) is the other type of variable, not a category of qualitative.
(T/F and Why) An ordinal variable comprises categories that cannot be ordered.
False: an ordinal scale is, by definition, composed of categories that CAN be ordered.
(T/F and Why) The age groups ‘young’, ‘middle aged’ and ‘old’ relate to a nominal categorical variable.
False: it is in the categorical (qualitative) variable group, but since we can give an order to the categories, it is part of the ordinal scale.
(T/F) Blood group is classified as a nominal categorical variable.
True
The number of eggs per clutch is a _____ variable
The number of eggs per clutch is a numerical (quantitative), discontinuous (discrete) variable
What are the 4 ways a researcher can deal with an outlier
1. Include them and proceed as originally planned, recognizing that the distribution assumptions of the analysis may not be met. 2. Include them in the analysis but adopt a procedure that is appropriate for the data. 3. Perform a sensitivity analysis by analysing the data both with and without the outliers to determine the effect, if any, of removing them. 4. Exclude them from the analysis (this is a high-risk strategy, and, before you do so, you should thoroughly investigate the reason for their presence). Beware that some computer packages will automatically eliminate outliers from the analysis.
Write a short explanation of what is meant by Evidence Based Veterinary Medicine.
Conscientious, explicit and judicious use of current best scientific evidence to inform clinical judgements and decision making with a view to improving clinical outcome in veterinary care.
What is Random sampling (Simple randomization)
Computer or random number table used. Does not have any restrictions or refinements (see other methods). Similar to flipping a coin for each subject. Problems: there is only a small probability of assigning the same number of subjects to each treatment group. With small sample sizes, a severe imbalance in the numbers assigned to each treatment can result, which is a critical issue. This can lead to imbalance among the treatment groups with respect to prognostic variables that affect the outcome variables.
What is Restricted (blocked) randomization:
The experimenter divides subjects into subgroups called blocks. Then, subjects within each block are randomly assigned to treatment conditions. Compared to a completely randomized design, this design reduces variability within treatment conditions and potential confounding, producing a better estimate of treatment effects. This is done to achieve similar numbers in each group.
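To make the contrast between simple and blocked randomization concrete, here is a minimal Python sketch. It assumes two treatments labelled A and B and a block size of 4 (both hypothetical choices, not from the course notes), and it uses permuted blocks, which is one common way to keep group sizes similar.

```python
import random

def simple_randomization(n_subjects, rng):
    """Assign each subject to 'A' or 'B' independently, like a coin flip."""
    return [rng.choice("AB") for _ in range(n_subjects)]

def blocked_randomization(n_subjects, rng, block_size=4):
    """Assign treatments in shuffled blocks holding equal numbers of A and B."""
    assignments = []
    while len(assignments) < n_subjects:
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        assignments.extend(block)
    return assignments[:n_subjects]

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
simple = simple_randomization(12, rng)
blocked = blocked_randomization(12, rng)

print("simple :", simple.count("A"), "A vs", simple.count("B"), "B")
print("blocked:", blocked.count("A"), "A vs", blocked.count("B"), "B")
```

With 12 subjects and blocks of 4, the blocked scheme always gives 6 and 6, whereas the simple scheme can drift away from an even split.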
What is Stratified sampling:
The population is divided into strata according to the main confounding variable (e.g. if we study arthritis in dogs, the strata are dog size (small, medium or large breed)). Within each stratum, simple randomization is done. Restricted randomization could also be incorporated in each stratum (see restricted (blocked) randomization).
What is Group or cluster randomization:
The experimental unit is the smallest unit in an experiment to which a treatment can be assigned, and whose response is independent of the responses of the other units. Generally, in human medicine, clinical trials take the individual person as the experimental unit although, occasionally, clusters of individuals, such as households, are used. However, we often regard the group as the most appropriate experimental unit in the veterinary and animal sciences. This is because food, drugs and vaccines are often administered to a group of animals in a litter, pen, paddock or barn, or to a complete herd or to all the fish in a tank. In this case, we apply the randomization procedure to the groups (i.e. group or cluster randomization), so that all animals or fish within each group receive the same treatment. The clusters are selected randomly.
What is Systematic sampling:
The researcher first randomly picks the first item or subject from the population. Then, the researcher will select each n’th subject from the list. The procedure involved in systematic random sampling is very easy and can be done manually. The results are representative of the population unless certain characteristics of the population are repeated for every n’th individual, which is highly unlikely. For example, the researcher has a population total of 100 individuals and needs 12 subjects. He first picks his starting number, 5. Then the researcher picks his interval, 8. The members of his sample will be individuals 5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93.
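The worked example on this card (population of 100, starting number 5, interval 8, 12 subjects) can be reproduced with a short Python sketch. The function name and parameters are hypothetical; in practice the starting number would be drawn at random rather than fixed.

```python
def systematic_sample(population_size, start, interval, sample_size):
    """Return the 1-based positions chosen by systematic sampling."""
    positions = [start + k * interval for k in range(sample_size)]
    if positions[-1] > population_size:
        raise ValueError("sample runs past the end of the population")
    return positions

sample = systematic_sample(population_size=100, start=5, interval=8, sample_size=12)
print(sample)  # [5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93]
```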
What is an observational study
In an observational study, we merely observe the animals in the study and record the relevant measurements on those animals. We make no attempt to intervene, for example, by administering treatments or withholding factors that we feel may affect the course of the disease. Clearly, we cannot randomly allocate animals to treatment groups in an observational study. A particular type of observational study is a survey in which we examine an aggregate of animals in order to derive values for various parameters in the population.
What is a population Survey
A survey that includes the entire population, e.g. a census.
What is a sample survey
A survey in which we examine a representative sample of animals so that we may draw conclusions about the whole population of animals.
(t/f in an observational study) Random allocation is usually performed.
No, it is not possible because we do not intervene.
(t/f in an observational study) Will include a sample survey but not a population survey.
No, it includes both the sample survey and the population survey.
(t/f in an observational study) Will include laboratory experiments and clinical trials.
No, it cannot since we do not intervene.
(t/f in an observational study) The observation of a certain number of animals allows us to draw conclusions about the whole population.
Yes, such as in a sample survey
Animals are randomly assigned to the treatment groups in clinical trials: To ensure that there is no allocation bias
Yes
Animals are randomly assigned to the treatment groups in clinical trials: To ensure that all animals have the same chance of receiving any treatment
Yes
Animals are randomly assigned to the treatment groups in clinical trials: So that a control group can be incorporated into the design
No. Randomization is not the reason we need a control group. A control group permits us to make comparisons.
Animals are randomly assigned to the treatment groups in clinical trials: So that the treatment groups are comparable with respect to any variables that are likely to influence response
Yes, this is why we do randomization.
What is the goal of randomization
a) All animals have same chance of receiving txt. b) The assignment of one animal has no influence on the assignment of any other animal. c) We cannot know in advance the txt that each animal is to receive. d) Permits us to use statistical inference (based on concept of random sampling). Randomization prevents allocation bias.
A study was conducted into the influence of spaying of bitches on their subsequent development of urinary incontinence. Young adult bitches presenting for spaying were randomly allocated into immediate ovariohysterectomy or to a deferred operation 6 months later. The bitches were followed during the 6 months period. Was this: A cross-sectional or longitudinal study and why?
Longitudinal study, because changes were investigated over time.
What is a cross sectional study?
Cross-sectional: All measurements done at one point in time. Does not take into account temporal relationship between risk factors and disease state. Is descriptive.
What is a longitudinal study?
Longitudinal: Investigate changes over time. If prospective: values are measured after the start of the study. If retrospective: previous values are examined.
A study was conducted into the influence of spaying of bitches on their subsequent development of urinary incontinence. Young adult bitches presenting for spaying were randomly allocated into immediate ovariohysterectomy or to a deferred operation 6 months later. The bitches were followed during the 6 months period. Was this: An experimental or an observation study , and why?
Experimental, because we intervene, and measure the effects of the intervention.
What is an experimental study
Experimental: We intervene in a study. We observe and measure the effects of our intervention. Animals must be randomly assigned - there must be a control group as well as a treatment group(s). Laboratory experiments and clinical trials.
What is an observational study
Observational (see above also): Observe animals and make measurements of whatever values we are interested in. There is no intervention. Since there is no intervention, there is no control or treatment group.
A study was conducted into the influence of spaying of bitches on their subsequent development of urinary incontinence. Young adult bitches presenting for spaying were randomly allocated into immediate ovariohysterectomy or to a deferred operation 6 months later. The bitches were followed during the 6 months period. Was this: A case control study, a cohort study , or neither. Why?
Neither, since both are observational studies
What is a case control study?
Case-control study: In a case-control study of disease aetiology, we start by defining the groups of diseased and healthy animals; these are the cases and the controls, respectively. Then we assess whether the animals in the two groups have differences in past exposure to various risk factors. Retrospective because we have to go back in time in order to determine an animal’s exposure to the risk factor.
What is a matched design case control study
We choose the controls so that each control animal is matched with a case with respect to variables that may influence the development of disease, (breed, sex and/ or age) (matched design).
What is an unmatched design case control study
unmatched design in which the disease-free or control animals are selected from the population, but without any attempt at matching.
What is the advantage to a case control study
Advantage: quick, easy and less expensive, can be used when the disease outcome is rare.
What is the disadvantage to a case control study
Disadvantage: losses to follow-up and recall bias (there is a differential ability between carers in remembering relevant facts about cases and controls relating to exposure), not suitable when exposures to the risk factor are rare.
What is a cohort study
Cohort study: In a cohort study of disease aetiology, we start by defining groups (cohorts) of disease-free animals according to the exposure of the animals in the groups to the factor(s) of interest. Generally, we follow these groups forward in time to see which animals develop the disease under investigation. (Prospective).
What is the advantage to a cohort study
Advantage: can be used to collect information on exposure to a wide range of factors, even rare ones, and on different outcomes.
What is the disadvantage to a cohort study
Disadvantage: If the disease outcome is rare, its time span can be quite long, it tends to be expensive and may suffer from inconsistencies.
What is a control group
Control group: In an experiment, a control group is a baseline group that receives no treatment, a placebo or a neutral treatment (or the standard procedure). To assess treatment effects, the experimenter compares results in the treatment group to results in the control group.
State the type of study this is: Study of the prevalence of feline leukemia in a cat colony
Cross-sectional
State the type of study this is: Review comparison of drug treatment vs radiation treatment for feline hyperthyroidism at a Referral center over last 10 years
Retrospective
State the type of study this is: Survey of cat owners on attitude to declawing.
Cross-sectional
Concerning a Case-Control study : are the following statements true or false. And explain why: We start with groups of diseased and healthy animals
Yes (that is the definition), as opposed to a cohort study (which starts with disease-free groups).
Concerning a Case-Control study : are the following statements true or false. And explain why: Case control studies are prospective studies
No, they are retrospective. The cohort study is prospective.
Concerning a Case-Control study : are the following statements true or false. And explain why: We can determine relative risk
No, this concept (relative risk) is related to the cohort study.
What is the true risk of something
We usually analyse data from cohort studies by estimating the true risk of the disease in the populations of animals that have been ‘exposed’ and ‘unexposed’ to the factor. The true risk of disease is the proportion of animals in a population of susceptible animals that develop the disease in the time interval under consideration.
What is the relative risk of something
The relative risk (RR), the ratio of the disease risks in the exposed and unexposed groups, provides a measure of the strength of the association between the disease and the exposure to the factor.
Concerning a Case-Control study : are the following statements true or false. And explain why: We can determine the odds ratio
Yes, the concept of odds ratio is related to the case-control study
What is the odds ratio
the ratio of two odds, usually the odds of disease in the group exposed to the factor divided by the odds of disease in the group not exposed to the factor.
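The relative risk and odds ratio defined on these cards can be computed from a 2x2 table of counts. The sketch below is only an illustration: the counts are invented, and the variable names are hypothetical.

```python
# Hypothetical 2x2 table (counts invented for illustration only).
exposed_diseased, exposed_healthy = 30, 70
unexposed_diseased, unexposed_healthy = 10, 90

# Risk = proportion of animals in a group that develop the disease.
risk_exposed = exposed_diseased / (exposed_diseased + exposed_healthy)
risk_unexposed = unexposed_diseased / (unexposed_diseased + unexposed_healthy)
relative_risk = risk_exposed / risk_unexposed

# Odds = diseased animals divided by non-diseased animals in a group.
odds_exposed = exposed_diseased / exposed_healthy
odds_unexposed = unexposed_diseased / unexposed_healthy
odds_ratio = odds_exposed / odds_unexposed

print(f"relative risk: {relative_risk:.2f}")
print(f"odds ratio:    {odds_ratio:.2f}")
```

With these counts the relative risk is about 3 and the odds ratio about 3.86; the two measures move closer together as the disease becomes rarer.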
Which of the following is not a type of Bias to avoid in a Clinical trial ? Allocation bias
Yes, avoided with randomization
Which of the following is not a type of Bias to avoid in a Clinical trial ? Assessment bias
Yes, avoided by blinding and having a control group
Which of the following is not a type of Bias to avoid in a Clinical trial ? Selection bias
Yes, avoided with randomization
A new surgical implant is to be tested on dogs prior to human trials. a) Why must the control dogs have the surgery done (except the implant is not put in)?
To negate the effect of the surgical procedure (and separate the effect of the surgery from the effect of the implant). (to remove one possible confounding variable).
b) If the carer and the assessor of the animals do not know which group the dogs belong to,
we say this is a Double blinded study.
What is a double blinded study
Double-blinded: neither the carer(s) of the animals nor the assessor of response to treatment (test or control) is aware of which treatment each animal is receiving.
What is a single blinded study
Single-blinded: only one of these two parties, the carer or the assessor, is blind. If the response to treatment is objective, then it may be sufficient to have only the carer blind; if it is possible to distinguish the test and control regimens, perhaps because of experimental procedures, then it may not be feasible to make the carer blind.
What is meant by a positive control ?
A positive control group is a group of experimental units that receive a treatment that is known to cause an effect on the response.
What is meant by a negative control?
A negative control group is a group that receives a neutral or standard treatment, and the standard treatment may simply be nothing.
What are dependent variables
Dependent variable: A dependent variable is what you measure in the experiment and what is affected during the experiment. The dependent variable responds to the independent variable. It is called dependent because it “depends” on the independent variable. In a scientific experiment, you cannot have a dependent variable without an independent variable. Example: You are interested in how stress affects heart rate in humans. Your independent variable would be the stress and the dependent variable would be the heart rate. You can directly manipulate stress levels in your human subjects and measure how those stress levels change heart rate.
What are confounding variables
Confounding variables: Sometimes, we find that two or more variables are related to each other as well as to the response of interest, so that it is impossible to separate the effects of these variables on the response. The variables are then called confounders and the process is called confounding.
What is anthology
An anthology is a collection of literary works chosen by the compiler. It may be a collection of poems, short stories, plays, songs, or excerpts
What is a clinical trial
Clinical trial: any planned experiment that involves human or animal subjects, and is designed to assess the effectiveness of one or more treatments or preventive measures such as vaccines.
What is clinical pathology
Clinical pathology supports the diagnosis of disease using laboratory testing of blood and other bodily fluids, tissues, and microscopic evaluation of individual cells.
What is epidemiology
the study of disease patterns and their determinants in the population
(T or F)The null hypothesis for a test to compare two means states that:(a) The sample means are equal. T or F…
False: the null hypothesis concerns the population means.
(T or F)The null hypothesis for a test to compare two means states that:(b) There is no difference between the population means. T or F…
True
(T or F)The null hypothesis for a test to compare two means states that: (c) There is no significant difference between the population means. T or F..
False: it is only the result of the test which is or is not significant, not the null hypothesis
How is a P-value obtained and how do we use/interpret this P-value?
A P value is obtained by running/calculating a statistical test. The P value indicates a probability, we use it to decide if we reject Ho or not. A very small P-value indicates that it is unlikely that we could have obtained the observed results if the null hypothesis were true, so we reject Ho.
What is descriptive statistics
Descriptive statistics are used to summarize data by using visual aids (graphs, diagrams and tables) and numerical measures to describe central location (mean, median, mode) and spread (e.g. variance, standard deviation)
What are inferential statistics
Inferential statistics are used to generalize from the sample something that can be applied to the whole population (estimation of a population’s parameter and hypothesis testing (to investigate a theory about the data))
Which is false? (a) Sampling error makes it “impossible” to prove something with statistics (b) If the difference between the sample mean and the population (true) mean is due to pure chance, we do not reject Ho (c) We can never accept Ho (d) Not rejecting Ho is equivalent to saying we accept Ho
(d) Not rejecting Ho is equivalent to saying we accept Ho (FALSE)
What is the sensitivity of a test? How is it determined? Give an example of a disease where we need a high sensitivity.
The sensitivity of a test is how well the test identifies animals that have the disease. Sensitivity is determined with the following formula: true positives / (true positives + false negatives). We need a high sensitivity when we need to perform an exclusion test (such as the urinary cortisol ratio for Cushing's) or when there is a great disadvantage to not diagnosing a disease early (basal cortisol level for Addison's).
When talking about sensitivity and specificity, what is a gold standard? Give one example of a gold standard test for a specific disease.
A gold standard is a diagnostic test with proven 100% sensitivity & specificity. Eg.: Kidney disease (renal clearance of a substance), Parvovirus (histology & IFA), Heartworm dz (necropsy).
What is the negative predictive value (NPV)? Why is it an interesting information to have for your clients?
Negative predictive value is the proportion of negative results in statistics and diagnostic tests that are true negative results. NPV = True negatives / (True negatives + False negatives). So the NPV indicates how likely it is that this patient really does not have the disease given that the test result is negative (how reassured your client should be).
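The sensitivity and NPV formulas on these cards can be checked with a small Python sketch. The counts are invented, the function name is hypothetical, and specificity and positive predictive value are included using their standard definitions, which these cards do not spell out.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common test-performance measures from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased animals correctly detected
        "specificity": tn / (tn + fp),  # healthy animals correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true positives
        "npv": tn / (tn + fn),          # negative results that are true negatives
    }

# Hypothetical counts: 100 diseased and 900 healthy animals tested.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```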
What should you know about prevalence
What you should know is that with a low prevalence, you have an increased risk of a false positive result. A positive heartworm test in Kuujjuaq (in a dog that has not traveled) is quite likely to be a false positive.
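To see why low prevalence makes false positives more likely, the sketch below computes the share of positive results that are true positives (the positive predictive value) at several prevalences. The sensitivity of 95% and specificity of 98% are assumed values for illustration, not figures for any real heartworm test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive test results that come from truly diseased animals."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test, three different prevalences.
for prevalence in (0.30, 0.05, 0.001):
    ppv = positive_predictive_value(0.95, 0.98, prevalence)
    print(f"prevalence {prevalence:.3f} -> PPV {ppv:.2f}")
```

As the prevalence falls, the same test produces a growing share of false positives, even though its sensitivity and specificity have not changed.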
Which two graphs can you use to evaluate if your data is normally distributed?
Histogram and box plot
What does it mean when we say a statistical test is robust?
It means it is able to tolerate a certain deviation from the assumptions (for example, if the data is not quite normal, you can still use the test if it is robust).
Give an example where we would need to use a one sample t-test.
Any example involving one set of data that you want to compare to a known population mean.
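As an illustration only, the test statistic for a one sample t-test can be computed by hand: the difference between the sample mean and the known population mean, divided by the standard error of the mean. The temperatures and the reference mean of 38.5 below are hypothetical, and the sketch stops at the t statistic; the P value would come from a t distribution with n - 1 degrees of freedom, usually via statistical software.

```python
import statistics
from math import sqrt

def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / (sample SD / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error

# Hypothetical rectal temperatures (deg C) compared with an assumed mean of 38.5.
temperatures = [38.4, 38.6, 38.9, 38.5, 38.6]
t_statistic = one_sample_t(temperatures, 38.5)
print(f"t = {t_statistic:.2f} with {len(temperatures) - 1} degrees of freedom")
```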
What are the assumptions for the one sample t-test?
Normal distribution
Give an example where we would need to use an ANOVA
Any example where there are 3 or more sets of data (groups) to compare.
What are the assumptions for the ANOVA?
Normal distribution, homoscedasticity
What do we mean when we say two sets of observations are dependent?
The result from one animal or individual is related to the result from another, either within or between groups.
If we have more than two samples to compare, why don’t we use multiple two- sample t-tests?
Because if we do multiple t-tests, we get multiple P-values, so we increase our chance of finding a significant result by chance. (For any one of those tests, we might fall in the 5% of cases where we are wrong to reject Ho.)
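The size of this problem can be sketched numerically. Assuming, as a simplification, that the pairwise tests are independent and each uses alpha = 0.05, the chance of at least one false positive across all comparisons is 1 - (1 - alpha)^k, where k is the number of tests. The function name is hypothetical.

```python
from math import comb

def familywise_error(alpha, n_tests):
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
for groups in (3, 4, 5):
    comparisons = comb(groups, 2)  # number of pairwise t-tests needed
    rate = familywise_error(alpha, comparisons)
    print(f"{groups} groups -> {comparisons} t-tests -> error rate {rate:.3f}")
```

Even with only three groups, the overall chance of a false positive is already about 14% rather than 5%.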
What is a type I error?
It’s rejecting a true Ho (finding a significant difference where there is none).
What is the sampling error, why is it an important concept to understand when using the ANOVA?
The sampling error is the chance difference between a sample estimate and the true value of the population caused by random sampling. The ANOVA can estimate that sampling error and then uses this to tell us if there is a significant difference between the means of the samples we are testing.
Why do we need a second test (such as the Bonferroni or Tukey) after performing the ANOVA on our data?
The ANOVA tells us there is a significant difference somewhere between some of the groups; the second test pinpoints where.
What do we use the Pearson’s correlation coefficient for?
We use it to tell us if there is a (linear) relationship between two variables and how strong that relationship is.
What does a correlation coefficient of 1 tell us, of -1, of 0?
If it is zero, there is no linear correlation (there may still be some nonlinear relationship); if it is ± 1, it is perfect correlation (perfect positive or negative slope).
What does the slope of the linear regression tell us?
It is the slope of the line, gives the average change in y for a unit change in x
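A minimal Python sketch of Pearson's correlation coefficient and the least-squares regression slope, computed from sums of squares. The data are made up so that y is exactly 2x + 1, which should give a perfect correlation of 1 and a slope of 2; the function name is hypothetical.

```python
from math import sqrt

def pearson_r_and_slope(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)           # strength of the linear relationship
    slope = sxy / sxx                   # average change in y per unit change in x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # invented data lying exactly on y = 2x + 1
r, slope, intercept = pearson_r_and_slope(x, y)
print(r, slope, intercept)
```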
Why is it important to observe the scatter diagram when doing correlation and regression?
To identify outliers, atypical distribution or a non-linear relationship.
Why do we need non-parametric tests?
To enable us to run statistical tests on non-normally distributed data.
What is the disadvantage of using a non-parametric test?
Because we transform the data into ranks, we lose some precision.
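The loss of precision can be seen in a small sketch of the rank transformation. The helper below is a hypothetical illustration (ties share the average rank); it is not taken from any particular statistics package.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    ordered = sorted(values)
    result = []
    for v in values:
        first = ordered.index(v) + 1          # lowest rank held by this value
        last = first + ordered.count(v) - 1   # highest rank held by this value
        result.append((first + last) / 2)
    return result

with_outlier = [2.1, 2.3, 2.2, 50.0]
without_outlier = [2.1, 2.3, 2.2, 2.4]
print(ranks(with_outlier))     # [1.0, 3.0, 2.0, 4.0]
print(ranks(without_outlier))  # [1.0, 3.0, 2.0, 4.0]
```

Both data sets produce the same ranks, so the test no longer "sees" how far the value 50.0 is from the rest. That makes the test less sensitive to outliers, at the cost of discarding the actual magnitudes.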
What is PICO?
It is the acronym that helps to ask a good answerable question in evidence based medicine. Patient or Population, Intervention, Comparison/control, Outcome.
What is a CAT (critically appraised topic)?
It is the systematic evaluation of clinical research papers in order to answer a clinical question.