Quantitative Data Analysis Flashcards
What is data analysis?
- Methodology by which individual data points are rendered into meaningful and intelligible information
- Product of data analysis in research is knowledge
- Techniques will differ based on design (descriptive or inferential stats)
- Analysis determines trends and patterns of relationships based on the data
What is inferential statistics?
- Allows researchers to estimate how reliably they can make predictions and generalize findings on the basis of the data (inference = conclusion based on evidence and reasoning)
- Uses a stat calculated from a sample (a smaller group) to draw a conclusion about a population; math processes and logic are applied to the sample to test hypotheses about the population
- Allows researchers to test hypotheses about a population using data obtained from probability and non-probability samples
- Parametric or non-parametric
What are descriptive stats?
- Description and/or summarization of sample data, reduced into manageable proportions
- Allow researchers to arrange data visually to display meaning and to help in understanding the sample characteristics and variables under study
- In some studies, descriptive stats may be the only results sought from stat analysis
What is the purpose of descriptive stats?
- Reduce data to manageable proportions by summarizing them
- Measure of central tendency; scatter plot; range; mean; average; percentages
What is a level of significance (alpha level)?
- Probability of making a type 1 error (commonly set at 0.05)
- The smaller the alpha, the lower the risk of a type 1 error; a p-value greater than alpha means the result isn’t statistically significant
- Researcher willing to accept that if study was done 100 times, decision to reject the null hypothesis would be wrong 5 times out of those 100 trials
- Can set probability at 0.01 if wanting a smaller risk of rejecting a true null hypothesis
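The "wrong 5 times out of 100" idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal with a known SD of 1, the null hypothesis (mean = 0) is true in every simulated study, and a two-tailed z-test is used. The sample size, number of trials and seed are arbitrary choices, not values from the cards.

```python
import random
from statistics import NormalDist

alpha = 0.05
n = 30          # assumed sample size per simulated study
trials = 5000   # assumed number of simulated studies
rng = random.Random(42)  # fixed seed so the run is repeatable

# Two-tailed critical value for the chosen alpha (about 1.96 for 0.05).
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

false_rejections = 0
for _ in range(trials):
    # The null hypothesis is true here: the population mean really is 0.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = sample_mean / (1 / n ** 0.5)  # known SD of 1
    if abs(z) > critical_z:
        false_rejections += 1  # rejected a true null: a type 1 error

type_1_error_rate = false_rejections / trials
print(f"Share of true nulls rejected: {type_1_error_rate:.3f}")
```

The printed share should land close to alpha, which is the sense in which alpha is the accepted risk of a type 1 error.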
What are the mean, median and mode?
- Mean is the average score; it is the measure most frequently reported in stats
- Median (score in the middle of an ordered list) is the best indication of a typical score when there are outliers, because it is less affected by them
- Mode is the most frequent score
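A minimal sketch of the three measures, using Python's standard statistics module. The scores are made up for illustration, with one deliberate outlier to show why the median can be the more typical value.

```python
from statistics import mean, median, mode

# Hypothetical scores; the last value is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

average = mean(scores)       # 9: pulled upward by the outlier
middle = median(scores)      # 4: middle value of the ordered list
most_common = mode(scores)   # 3: the most frequent score

print(average, middle, most_common)
```

Only one score in the list is above the mean of 9, while the median of 4 sits in the middle of the scores.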
How do we choose a statistical test?
Appropriate stat procedure is a function of:
1) The research design
2) The level of data provided by the data collection instrument
3) Sampling procedure
What designs might we see in descriptive stats?
- Exploratory descriptive designs (case studies)
- Correlational designs
What designs might we see in inferential stats?
- Correlational designs
- Comparative designs
- Experimental and quasi-experimental designs
What is nominal measurement?
- The assignment of numbers to simply classify characteristics into categories
- Sometimes called “dummy variables” (Used to quantify variables)
- E.g. Gender, Marital status, Religious affiliation. If you are a female you get a 1, male a 0, etc.
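A small sketch of coding a nominal variable. The responses and the rule of numbering categories in order of first appearance are assumptions for illustration; the codes are labels only and carry no order or size.

```python
# Hypothetical responses to a marital-status item.
responses = ["single", "married", "married", "widowed", "single"]

# Give each category an arbitrary numeric label, in order of first appearance.
codes = {}
for answer in responses:
    if answer not in codes:
        codes[answer] = len(codes)

coded = [codes[answer] for answer in responses]

print(codes)  # {'single': 0, 'married': 1, 'widowed': 2}
print(coded)  # [0, 1, 1, 2, 0]
```

Swapping the numbers around would change nothing about the data, which is why a mean of these codes would be meaningless.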
What is ordinal measurement?
- Permits the sorting of objects on the basis of their standing on an attribute relative to each other
- A higher score is better (or worse), but how much better (or worse) is not known (Class ranking, Likert scale responses)
- E.g. 1 if you strongly agree, 5 if you strongly disagree.
What is interval/ratio measurement?
- Determines both the rank ordering of objects on an attribute and the distance between those objects
- E.g. scores on an intelligence test; temp/BP, height/length
What is the range?
Describes the variability or differences between the highest and lowest scores
What is standard deviation?
Another indication of variability, calculated from the average distance of scores from the mean
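A short sketch of both measures of variability on made-up scores. The statistics module offers two standard deviations: pstdev treats the data as the whole population (divides by n) and stdev treats it as a sample (divides by n - 1).

```python
from statistics import pstdev, stdev

# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest minus lowest = 7
population_sd = pstdev(scores)           # divides by n, gives 2.0
sample_sd = stdev(scores)                # divides by n - 1, slightly larger

print(score_range, population_sd, round(sample_sd, 3))
```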
What is a p-value?
- Explains statistical significance
- Needs to be p<0.05 (or p<0.01) to be statistically significant; a smaller p-value indicates that results are more likely d/t the experimental intervention than chance
- A cutoff of 0.05 means the researcher accepts a 5% chance of finding results this extreme by chance alone when the null hypothesis is true
Describe the relationship between stat significance and clinically meaningful results:
- Results can be stat significant but not clinically significant or important, and vice versa
- Findings should be stat significant and clinically meaningful before considering them in practice
What is psychometrics?
- Theory of measurement involved in development of measuring tools
- Psychometric assessment refers to eval of validity and reliability of an instrument before making it available for use
- Often indicated by Cronbach’s alpha (co-efficient alpha)
- Reliability of 0.80 is considered lowest acceptable coefficient for a well-developed tool
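A sketch of how Cronbach's alpha can be calculated, using the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The three items and five respondents are invented for illustration.

```python
from statistics import variance

# Hypothetical responses: one list per item, one position per respondent.
items = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 4, 5, 5],
]

k = len(items)
# Each respondent's total score across all items.
totals = [sum(answers) for answers in zip(*items)]

sum_item_variances = sum(variance(item) for item in items)
cronbach_alpha = (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

# Compare against the 0.80 threshold named in the card above.
meets_threshold = cronbach_alpha >= 0.80
print(round(cronbach_alpha, 3), meets_threshold)
```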
What is one-tailed test of significance?
- Used with a directional hypothesis (extreme stat values occur on a single tail of the normal curve)
- One-tailed more powerful but requires more knowledge to predict direction
- Two-tailed test of significance occurs with a non-directional hypothesis and assumes extreme scores occur in either tail of the normal curve
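A sketch of why the one-tailed test is called more powerful. It assumes the test stat is a z-score on the standard normal curve; the value 1.8 is a hypothetical result chosen to fall between the two cutoffs.

```python
from statistics import NormalDist

z = 1.8       # hypothetical test stat, in the predicted direction
alpha = 0.05

standard_normal = NormalDist()  # mean 0, SD 1

# One-tailed: only the upper tail counts as "extreme".
one_tailed_p = 1 - standard_normal.cdf(z)
# Two-tailed: extreme scores in either tail count.
two_tailed_p = 2 * (1 - standard_normal.cdf(abs(z)))

print(round(one_tailed_p, 4), one_tailed_p < alpha)
print(round(two_tailed_p, 4), two_tailed_p < alpha)
```

With the same data, the one-tailed p-value is half the two-tailed one, so this result is significant at 0.05 only under the directional hypothesis.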
What is parametric?
- More powerful** and more flexible than non-parametric
- Assumes normal distribution of data (e.g. random sample)
- Used with interval and ratio variables
- Goes with our probability sampling
- E.g. t-test, ANOVA
What is non-parametric?
- Less powerful **
- Not based on the estimation of population parameters
- Does not assume normal distribution within sample (e.g. not random sample)
- Used with nominal or ordinal variables
- Non-p goes with our non-probability sampling
- E.g. Chi-square test
What are confidence intervals?
- An estimated range of values that provides a measure of certainty about the sample findings
- Most commonly reported in research is a 95% degree of certainty, meaning that if the study were repeated many times, about 95% of the CIs calculated this way would contain the true population value
- Gives an idea of how people were answering, the range of scores, etc.
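A rough sketch of a 95% CI around a sample mean. It assumes the normal (z) approximation; for a sample this small a t-based interval would usually be wider and more appropriate. The scores are invented.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical test scores from a sample of 10 people.
scores = [72, 75, 78, 80, 81, 84, 85, 88, 90, 92]

n = len(scores)
sample_mean = mean(scores)
standard_error = stdev(scores) / n ** 0.5

# z value that leaves 2.5% in each tail (about 1.96).
z = NormalDist().inv_cdf(0.975)

ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean = {sample_mean}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```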
How do we critique data analysis?
- Are data analysis procedures clearly described?
- Were appropriate statistical tests used given the level of measurement that is used to describe each of the major variables?
- Are results presented in an understandable way? Are they presented as just a table, or are there actual paragraphs discussing them?
- Are the results significant?
How do researchers present their findings?
- Results section presents the raw data and analysis
- Discussion section interprets the results and findings (what do the results tell us, limitations, implications for practice)
How do we interpret results?
- What are the major findings?
- Are the findings accurate and discussed in relation to problem/purpose, hypothesis, framework and lit review?
- Are various explanations for the findings examined?
- Do the conclusions fit with the findings?
- What do the results mean?
- Are the results important?
- Are the results generalizable?
How do we critique implications and recommendations?
- Do the researchers discuss the study’s implications for clinical practice, education, administration and research?
- If yes, are the stated implications appropriate, given the study’s limitations?
- Are there important implications that the researchers neglected to include?
- How do the findings contribute to current knowledge?
Interpretation should take into account:
- The study’s aims
- Theoretical framework
- Existing related literature
- Limits of research method
Statistical procedures are used to:
Organize and give meaning to data
What makes up the results section?
The data generated from the testing of the hypothesis or research questions
What are different descriptive stats techniques?
- Measures of central tendency (average of the sample via mode, median or mean)
- Measures of variability (range and standard deviation)
- Correlation techniques, such as scatter plots (representation of strength and direction of relationship between variables)
What are levels of measurement?
- Categorization of the precision with which an event can be measured - from low to high are nominal, ordinal, interval and ratio
- Levels determine type of stats to be used in analyzing data
Define: Nominal measurement
- Organizing variables or events into categories that are mutually exclusive; a variable either has or does not have the characteristics of a particular category
- Numbers assigned to categories are simply labels and do not indicate anything about the characteristic (e.g. males are 1, females are 2)
- Frequency, range and mode
Define: Ordinal measurement
- Relative rankings of variables or events; numbers assigned to each category can be compared (e.g. members of a higher-ranking category have more of an attribute than members of a lower-ranking one); examples include class ranking, ability to carry out ADLs, ranking of disease progression such as CKD
- Range, percentile, rank order coefficients of correlation, mode, median
Define: Interval measurement
- Variables ranked on a scale with equal intervals between numbers, but zero is arbitrary and does not represent an absence (e.g. measuring temps on a C scale)
- Allows for more manipulation of data
- Mean, standard deviation, mode, median, range, percentile
Define: Ratio measurement
- Events or variables ranked on scales with equal intervals and absolute zeros; the number represents the actual amount of the property the object possesses
- Common in the physical sciences (e.g. height, weight, pulse, BP)
- All mathematical processes can be performed with ratio-scale data
- Mode, median, mean, range, percentile, standard deviation
Define: ANOVA
- Statistic that tests whether group means differ from each other; instead of testing each pair of means separately, ANOVA considers the variation among all groups
- Tests for differences between means; can be used for 2+ groups
- Interval or ratio data
- Stat significance indicated by p-value
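A sketch of the one-way ANOVA F stat worked out by hand, to show how variation between groups is compared with variation within groups. The three groups are invented. Getting the p-value from F needs the F distribution, which is not in Python's standard library, so it is left out here.

```python
from statistics import mean

# Hypothetical scores for three groups.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of scores

# Variation of group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Variation of scores around their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_statistic = ms_between / ms_within

print(f_statistic)  # 12.0
```

A large F means the group means are spread out more than the scatter inside each group would suggest.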
Define: Chi Square
- A non-parametric stat used to determine whether the frequency found in each category is different from the frequency that would be expected by chance
- Determines if 2 variables are independent or related
- Nominal or ordinal data
- Significance indicated by p-value
- Not very powerful, requires a large sample
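A sketch of the chi-square stat for a 2x2 table, comparing observed frequencies with those expected if the two variables were independent. The counts are made up. The p-value lookup needs the chi-square distribution, which is not in the standard library, so only the stat and its degrees of freedom are computed.

```python
# Hypothetical observed counts: rows are two groups, columns are two outcomes.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Frequency expected by chance if rows and columns are independent.
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), degrees_of_freedom)
```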
Define: Frequency distribution
A descriptive stat method that summarizes the occurrences of events under study
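A minimal sketch of a frequency distribution using collections.Counter. The responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses to one survey item.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

frequencies = Counter(responses)  # count of each category
total = len(responses)
percentages = {
    category: 100 * count / total
    for category, count in frequencies.items()
}

for category, count in frequencies.most_common():
    print(f"{category}: {count} ({percentages[category]:.1f}%)")
```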
Define: Correlation
Degree of association between two variables
Define: Factor analysis
Strategy for assessing construct validity in which a stat procedure is used to determine the underlying dimensions or components of a variable, and to assess the degree to which the individual items on a scale truly cluster around one or more dimensions
Define: Level of significance (alpha level)
Risk of making a type 1 error, set by researcher before study begins
Define: Levels of measurement
Categorization of the precision with which an event can be measured (nominal, ordinal, interval and ratio)
Define: Mean
Measure of central tendency; average of all scores
Define: Median
Measure of central tendency; the middle score (50% of scores above and 50% of scores below)
Define: Mode
Measure of central tendency; the most frequent score or result
Define: Measures of central tendency
Descriptive stat technique that describes the average member of a sample (e.g. mean, median and mode)
Define: Measures of variability
Descriptive stat technique that describes the level of dispersion (distribution) in sample data
Define: Non-parametric tests of significance
Inferential stats that make no assumptions about the population distribution
Define: Parametric stats
- Inferential stats that involve estimation of at least one parameter, require measurement at the interval level or higher, and involve assumptions about the variables being studied; these usually include the assumption that the variable is normally distributed
- Interval or ratio level of data
- More powerful **
Define: Non-parametric stats
- Stats that are usually used when variables are measured at the nominal or ordinal level because they do not estimate population parameters and involve less restrictive assumptions about the underlying distribution
- Nominal or ordinal level of data
- Less powerful **
What is a Pearson correlation coefficient?
- Statistic that is calculated to reflect the degree of relationship between two interval level variables
- Can use interval or ratio data
- R² (R squared) is the proportion of variance explained by the relationship, often reported as a percentage
Define: Standard deviation
- Measure of variability; measure of average deviation of scores from the mean
- Used in calculation of many inferential stats
Define: Range
The distance between the highest and lowest scores
What are the two purposes for which statistical inference is generally used?
1) Estimate probability that stats found in sample accurately reflect population parameter
2) Test hypotheses about a population
What are some stats used in non-parametric testing?
- Mann-Whitney U-test
- Kruskal-Wallis test
- Chi-square test
- Spearman’s rho
- Kendall’s tau
- Phi coefficient
What are some stats used in parametric testing?
- T-test
- Pearson’s correlation
- ANOVA
- Regression
Describe interpretation values of Pearson’s correlation:
- R between -1 and +1 (degree of relationship between variables)
- R = .1 to .3 is weak
- R = .3 to .5 moderate
- R > .5 strong
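A sketch that computes Pearson's correlation by hand and labels its strength with the cutoffs above. Some choices here are assumptions, not from the cards: the data are invented, the cutoffs are applied to the absolute value so negative correlations are labelled too, exactly .3 is treated as moderate, and anything under .1 is called "negligible".

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)

def strength(r):
    """Label the size of r, ignoring its sign (assumed boundary handling)."""
    size = abs(r)
    if size > 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"  # label assumed for illustration

# Hypothetical data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r = pearson_r(hours, quiz)
r_squared = r ** 2  # share of variance explained by the relationship

print(round(r, 3), strength(r), round(r_squared, 2))
```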
What is a regression analysis?
- Predicts value of one variable (DV) when we know the value of one or more other variables
- Interval or ratio level
- Stat significance indicated by p-value
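A sketch of the simplest case, a least-squares regression line with one predictor, used to predict the DV from a known value. The data and the helper name predict are hypothetical.

```python
# Hypothetical data: hours studied (predictor) and quiz score (DV).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted DV for a given predictor value (hypothetical helper)."""
    return intercept + slope * x

print(round(slope, 2), round(intercept, 2), round(predict(6), 2))
```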
Define: P-value
Conditional probability of obtaining a value of the test stat at least as extreme as the one calculated from the study data, given that the null hypothesis is true
What is a t-test?
Tests for significant differences between 2 samples, using interval or ratio level data
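A sketch of the t stat for two independent samples. It assumes the pooled-variance (equal variances) form of the test; the two groups are invented. Turning t into a p-value needs the t distribution, which is not in the standard library, so only t and its degrees of freedom are computed.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical interval-level scores for two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]

n_a, n_b = len(group_a), len(group_b)

# Pooled variance: assumes both groups share the same population variance.
pooled_variance = (
    (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
) / (n_a + n_b - 2)

standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_statistic = (mean(group_a) - mean(group_b)) / standard_error
degrees_of_freedom = n_a + n_b - 2

print(t_statistic, degrees_of_freedom)  # 2.0 8
```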