Data Analysis Flashcards
What is quantitative data?
data presented with numbers which allows for quick comparison between individuals
What is qualitative data?
data presented with words
- provides depth/detail of situation
What are strengths of using quantitative data?
- means/ranges can be calculated
- easy to enter numbers into tables/display data in graphs or charts
- precise details used
- easy to check for reliability
- easy to test for hypotheses
- easy to analyse
What are limitations of using qualitative data?
- can be difficult/time consuming to analyse as involves looking for trends and/or categorisation
- subjective
- hard to test hypotheses
What are strengths of using qualitative data?
- allows for detailed descriptions; rich/informative data
- useful for attitudes, opinions, beliefs
What are limitations of using quantitative data?
- reduces complex behaviour to a number
- important information may be lost
What is primary data?
data collected/observed directly from first-hand experience by researcher for the purpose of their particular investigation
What are the strengths of using primary data?
- control researcher has; data collected designed to fit aims and/or hypotheses of the study
- has not been altered in any way by other researchers, which reduces the likelihood of investigator bias or subjectivity
What are limitations of using primary data?
- lengthy and time consuming, possibly expensive
What is secondary data?
data collected by someone other than the researcher (usually for a purpose that differs from that of the researcher)
What are the strengths of using secondary data?
- no need to design a study, go through ethical committees, recruit participants etc; more convenient and less expensive to obtain
- may already have been subjected to inferential statistical testing, so it is known whether or not the results are significant
What are the limitations of using secondary data?
- for some studies, the data will not fit the specific aims and/or hypothesis of the current researcher, may not match their needs
- there may be substantial variation in the quality and accuracy of secondary data; information may appear valuable initially but turn out to be incomplete
What is meta-analysis?
method where, rather than conducting new research, the researcher combines and re-analyses data from a large number of existing studies; because the data was collected by other researchers, it counts as secondary data
What are the strengths of using meta-analysis?
- technique is useful when a number of small studies have found contradictory or weak results as by combining the data from these studies it may be possible to identify common trends that are not noticeable in a single study
- reviewing the results from a number of studies, rather than just one, can increase the validity of the conclusions drawn as they are based on a larger sample of participants
What are the limitations of using meta-analysis?
- individual studies may have different designs so may not be truly comparable, may lead to a misleading conclusion
- it’s difficult to come up with the right criteria for accepting/rejecting studies to be part of the meta-analysis
- problem of publication bias (file-drawer problem): studies that give positive results may be over-represented in the meta-analysis, so any conclusions will not take into account the studies that failed to get published
What is nominal data?
also called categorical data; the lowest level of measurement - records the frequency of occurrence in each category
What is ordinal data?
measurements placed in rank order or in terms of relative position (in relation to others in the group)
What is interval data?
when the data is measured on a scale made up of equal units
What is ratio data?
same as interval (data measured on a scale made up of equal units) BUT ratio has a true zero point (no negative values) e.g. weight, height; temperature in Celsius is interval rather than ratio, because 0 °C is not a true zero
What is the mode?
when data is arranged in numerical order and the value which occurs most frequently is identified
When/why is using the mode useful?
- for nominal data
- not affected by outliers
- can make more sense than average (e.g. for age just saying 2 rather than 2.4)
When/why is using the mode not useful?
- there can be more than one mode in a set of data (e.g. two modes = bimodal), making it harder to use the mode as a single summary value
- does not take into account all the other values, loses a lot of information
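A minimal sketch of finding the mode, assuming Python 3.8+ (for `statistics.multimode`). The scores and colours are made up for illustration; they show that a data set can have two modes and that the mode also works for nominal data.

```python
from statistics import multimode

# Made-up scores with two equally frequent values (bimodal).
scores = [2, 3, 3, 5, 7, 7, 9]
modes = multimode(scores)  # every value tied for most frequent
print(modes)  # [3, 7]

# Made-up nominal data: categories rather than numbers.
colours = ["red", "blue", "red", "green"]
print(multimode(colours))  # ['red']
```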
What is the median?
when data is arranged in numerical order and the middle value/mid-point is selected, if it lies between two numbers, work out the mean of these two values
When/why is using the median useful?
- most appropriate for ordinal data or skewed distributions
- not affected by outliers
When/why is using the median not useful?
- some information is lost as the raw scores are not used in the calculation
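A short sketch of the median rule described above, using made-up scores. The function name `median` is just for illustration (Python's `statistics.median` does the same job).

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the middle two

print(median([7, 1, 5]))      # 5
print(median([8, 3, 5, 10]))  # 6.5
print(median([7, 1, 700]))    # 7 (the outlier 700 does not drag the median up)
```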
What is the mean?
when values are added up and then divided by the total number of values
When/why is the mean useful?
- most appropriate with interval/ratio data, symmetrical distributions with no extreme values
- includes information from all the items of data so is the most sensitive measure of central tendency (least information is lost)
When/why is the mean not useful?
- if the data is skewed or contains outliers, as extreme values pull the mean towards them
- mean may not be one of the original values (e.g. family does not have 3.2 children) so may be misleading
- if the distribution is bimodal, again may be misleading
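A small sketch, with made-up scores, of how a single outlier distorts the mean while the median barely moves. The helper name `mean` is only for illustration.

```python
from statistics import median

def mean(values):
    """Add up all the values, then divide by how many there are."""
    return sum(values) / len(values)

scores = [4, 5, 5, 6, 5]       # made-up data with no extreme values
with_outlier = scores + [35]   # same data plus one extreme score

print(mean(scores))            # 5.0
print(mean(with_outlier))      # 10.0 (pulled up by the outlier)
print(median(with_outlier))    # 5.0 (unaffected)
```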
What are measures of dispersion?
how spread out the data is around the mid-point e.g. range, interquartile range, standard deviation
What is the range?
calculated by subtracting the lowest from the highest value in the data set (often researchers add 1)
What are the strengths of using the range?
- easy/quick to calculate
What are the limitations of using the range?
- includes end values, may be distorted by outliers
- only having information from end scores contains no information about whether the values are spread evenly or clustered
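A quick sketch of the range using made-up data. The function name `value_range` and its `add_one` option (for the "add 1" convention mentioned above) are assumptions for illustration.

```python
def value_range(values, add_one=False):
    """Highest value minus lowest value, optionally adding 1."""
    spread = max(values) - min(values)
    return spread + 1 if add_one else spread

clustered = [1, 5, 5, 5, 5, 9]      # most scores bunched in the middle
even_spread = [1, 3, 4, 6, 7, 9]    # scores spread evenly
with_outlier = [1, 5, 5, 5, 5, 90]  # one extreme score

print(value_range(clustered))               # 8
print(value_range(even_spread))             # 8 (same range, different pattern)
print(value_range(clustered, add_one=True)) # 9
print(value_range(with_outlier))            # 89 (distorted by the outlier)
```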
What is standard deviation?
measures how spread out a set of values is around the mean - the larger the standard deviation, the larger the spread of scores within the data set
How do you calculate the SD? (very unlikely this will come up)
- calculate the mean
- subtract mean from each value in data set to find the difference between each value and the mean
- square each of these differences (removes the minus signs)
- find the sum of all of these squared differences
- divide by the number of values N (for a whole population) or N - 1 (for a sample); this gives the variance
- find the square root of the variance
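The steps above can be sketched as follows. The data is made up, and the `sample` flag is an assumption for illustration: it switches the divisor between N (whole population) and N - 1 (sample).

```python
import math

def standard_deviation(values, sample=False):
    mean = sum(values) / len(values)                     # calculate the mean
    squared_diffs = [(x - mean) ** 2 for x in values]    # subtract the mean, then square
    total = sum(squared_diffs)                           # sum of squared differences
    divisor = len(values) - 1 if sample else len(values)
    variance = total / divisor                           # this is the variance
    return math.sqrt(variance)                           # square root of the variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared differences sum to 32
print(standard_deviation(scores))               # 2.0
print(standard_deviation(scores, sample=True))  # slightly larger, as it divides by 7 not 8
```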
What are the strengths of using SD?
- takes every value in the data set into account, so is a more precise measure of dispersion than the range
What are the limitations of using SD?
- may be distorted by extreme values (outliers)
- more difficult/time consuming to calculate than the range
What is a summary table?
includes descriptive statistics; it is common to include a paragraph or two afterwards explaining what the results show
What is a contingency table?
all possible contingencies included, often for nominal data and shows the frequency of occurrences in each category (e.g. as well as showing those speeding, show also not speeding - so that wrong conclusions are not drawn)
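A rough sketch of building a contingency table from raw observations. The driver groups and counts are made up for illustration; the point is that both "speeding" and "not speeding" are counted for every group.

```python
from collections import Counter

# Made-up observations: (driver group, behaviour) for each car seen.
observations = (
    [("male", "speeding")] * 3
    + [("male", "not speeding")] * 5
    + [("female", "speeding")] * 2
    + [("female", "not speeding")] * 6
)

table = Counter(observations)  # frequency of each combination

groups = ["male", "female"]
behaviours = ["speeding", "not speeding"]

# Print one row per group and one column per behaviour.
print(f"{'':8}" + "".join(f"{b:>14}" for b in behaviours))
for g in groups:
    print(f"{g:8}" + "".join(f"{table[(g, b)]:>14}" for b in behaviours))
```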
What is a line graph and when do we use it?
shows continuous data: how one variable changes with respect to another (e.g. time)
What are pie charts and when do we use them?
used to show the relative proportions of different categories; show the frequency of each category as a percentage
What are scattergrams/scattergraphs and when do we use them?
used to represent data from correlational research, each pair of values plotted, one against the other, to determine if a consistent trend is apparent
What are bar graphs and when do we use them?
shows data in the form of categories which the researcher wishes to compare (e.g. males with females); categories go along the x-axis, the y-axis shows frequency, and the height of each bar represents the frequency; used for discrete variables
What is a histogram and when do we use it?
used for continuous variables rather than discrete ones; the continuous variable is plotted on the x-axis (shown by no spaces between the bars) and the y-axis must show the frequency with which each value on the x-axis occurs
What is a frequency polygon and when do we use it?
very similar to histogram and one variable on the x-axis must be continuous, drawn by drawing line from midpoint of each bar in a histogram to the midpoint on the next
- advantage: 2+ frequency distributions displayed on the same graph, allow for comparisons to be made
What is a distribution?
the pattern that can be seen on a graph, e.g. normal, positively skewed or negatively skewed
What is a normal distribution?
an arrangement of data that is symmetrical and forms a bell shaped pattern with a single peak, where the mean, median and mode all fall in the centre at the highest point
What is a skewed distribution?
an arrangement of data that is not symmetrical; data is clustered towards one end of the distribution
What order are mean, median, mode in a negatively skewed distribution?
Mean, median, mode - possibly when a task is too easy and so participants might be expected to get a high score (ceiling effect);(left foot)
What order are mean, median, mode in a positively skewed distribution?
Mode, median, mean - may occur if task is too difficult (floor effect);(right foot)
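A small check of the two orderings using made-up scores out of 20. The names `hard_task` and `easy_task` are hypothetical: the first set is bunched at the low end (floor effect), the second is its mirror image bunched at the high end (ceiling effect).

```python
from statistics import mean, median, mode

hard_task = [1, 1, 1, 2, 2, 3, 4, 8, 14]   # positively skewed: long tail of high scores
easy_task = [20 - s for s in hard_task]    # negatively skewed: long tail of low scores

# Positive skew: mode < median < mean
print(mode(hard_task), median(hard_task), mean(hard_task))   # 1 2 4

# Negative skew: mean < median < mode
print(mean(easy_task), median(easy_task), mode(easy_task))   # 16 18 19
```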
What are inferential statistics?
the ways of analysing data using statistical tests that allow the researcher to make conclusions about whether a hypothesis was supported by the results
What is the minimum level chosen for research and what does it mean?
P < 0.05 - the probability that the observed result is down to chance is less than 5%
When might a level lower than P < 0.05 be chosen?
P < 0.025 or P < 0.01 - more stringent levels used if study cannot easily be checked by replication or there is an aspect of risk involved
What is a type 1 error?
when we reject the null hypothesis but we shouldn’t and the result was actually down to chance
- increased chance when we set the level of significance too leniently (P value too high, e.g. P < 0.10)
What is a type 2 error?
when we retain the null hypothesis, but there was actually a real effect taking place and we should have rejected it
- increased chance when we set the level of significance too stringently (P value too low, e.g. P < 0.01)
How do we calculate the sign test?
- collect data in a table
- make sure the level of measurement is NOMINAL - look at the difference between the second and first rating and record whether it is positive or negative (ignore any participant with no difference)
- count the number of times the less frequent sign occurs (this is S - the observed/calculated value)
- to see if the difference between the two conditions is significant, choose the correct statistical table - if the observed value (S) is less than/equal to the critical value for a given level of significance, the null hypothesis can be rejected
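The sign test steps above can be sketched as follows. The ratings are made up, the function names are hypothetical, and the critical value of 1 is only an illustrative figure for N = 10 at the 0.05 level; always look up the critical value in the proper table for your own N and hypothesis.

```python
def sign_test_s(first, second):
    """Return (S, N): S = count of the less frequent sign, N = participants who changed."""
    differences = [b - a for a, b in zip(first, second)]  # second rating minus first
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    n = plus + minus          # zero differences are ignored
    s = min(plus, minus)      # the less frequent sign
    return s, n

def is_significant(s, critical_value):
    """Significant if the observed S is less than or equal to the critical value."""
    return s <= critical_value

# Made-up ratings for 11 participants before and after a treatment.
first = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4, 5]
second = [7, 8, 6, 6, 7, 8, 5, 5, 8, 6, 7]

s, n = sign_test_s(first, second)
print(s, n)                    # 1 10 (nine pluses, one minus, one tie dropped)
print(is_significant(s, 1))    # True, assuming a critical value of 1 for N = 10
```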