Module 2: Visualizing Data and Outliers Flashcards
Bar graphs are a popular way to summarize which type of data?
Categorical data.
If bars represent the mean score in each category, lines (called error bars) may be shown on top of the bars to represent ____
Standard deviation
Histograms are used to depict what kind of data?
Scale data.
If data are skewed, which measure of central tendency best describes the data?
Median. It takes extreme values into account as observations but is not greatly affected by them, since it sits in the middle of the ordered data.
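A minimal Python sketch of this card, using made-up income values (in thousands) that are not from the course material. One extreme value pulls the mean upward, while the median stays near the bulk of the data.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [28, 30, 31, 33, 35, 36, 400]

skewed_mean = mean(incomes)      # pulled upward by the single extreme value
skewed_median = median(incomes)  # middle value of the ordered data

print(f"mean = {skewed_mean:.1f}, median = {skewed_median}")
```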
Why would a frequency polygon be used over a histogram?
It can be useful when comparing multiple groups, as multiple lines on one graph are easier to interpret than multiple sets of bars.
A one-way scatter plot
Uses a single axis to display the relative position of each data point in a group. This type of figure can be used with categorical or scale data and can be presented horizontally or vertically.
Box plots
Have only one axis and show a summary of the data instead of each data point.
The central box depicts the interquartile range. Lines (whiskers) projecting from either side of the box extend to the adjacent values (the most extreme observations in the data set that are no more than 1.5 times the height of the box beyond either quartile). Anything beyond the adjacent values is considered an extreme value and is plotted as an individual dot.
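The box plot rules on this card can be sketched in Python as follows. The function name `box_plot_summary`, the sample values, and the choice of the "inclusive" quartile method are assumptions for illustration; textbooks and software differ slightly in how they compute quartiles.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return quartiles, adjacent values (whisker ends), and extreme values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                       # height of the box
    lower_limit = q1 - 1.5 * iqr        # farthest a whisker may reach below Q1
    upper_limit = q3 + 1.5 * iqr        # farthest a whisker may reach above Q3
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    extremes = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "adjacent_low": min(inside),    # end of the lower whisker
        "adjacent_high": max(inside),   # end of the upper whisker
        "extremes": extremes,           # plotted as individual dots
    }

# Hypothetical data with one extreme value.
summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 11, 30])
print(summary)
```

Note that the whiskers stop at actual observations (the adjacent values), not at the 1.5 x IQR limits themselves.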
When can a box plot be used?
When there are too many overlapping data points, which would be difficult to interpret in a scatter plot or one-way scatter plot.
What type of data would you use for a two-way scatter plot?
Scale variables. It depicts the relationship between two scale variables.
Line graphs
Similar to two-way scatter plots in that they represent the relationship between two scale variables. However, in a line graph each point on the x axis has a single corresponding y value, which is not a requirement for scatter plots.
What’s an outlier?
Something unusual, different, or outside the norm.
How would you identify potential outliers?
By visualizing the data. Extreme positive or negative values are easy to spot in box plots, scatter plots, and histograms.
What constitutes an outlier?
- values that are more than two standard deviations above or below the mean
- values that are more than 1.5 times the IQR above Q3 or below Q1 (values outside the whiskers in a box plot)
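A rough Python sketch of both rules, with hypothetical data and helper names (`sd_outliers`, `iqr_outliers`). It also illustrates that the two rules can disagree: a very large value inflates the standard deviation, so the two-standard-deviation rule may miss a milder outlier that the IQR rule flags.

```python
from statistics import mean, stdev, quantiles

def sd_outliers(values, k=2):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

def iqr_outliers(values):
    """Values more than 1.5 * IQR above Q3 or below Q1."""
    # Quartile method is an assumption; software packages differ slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

# Hypothetical data with one mild and one severe outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100]

by_sd = sd_outliers(data)
by_iqr = iqr_outliers(data)
print("2 SD rule:", by_sd)
print("1.5 x IQR rule:", by_iqr)
```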
What is a research population?
The group of objects, events, people, procedures, or observations that a researcher is interested in studying.
Dependent variable
What is being measured; the outcome of a study.
Approaches to sampling
Random sampling and non-random sampling
Random sampling
Random selection is used to choose the people, objects, events, or observations to be included in each sample. Each item of interest has an equal chance of being included in the sample.
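As a small illustration, simple random sampling can be sketched in Python with the standard `random` module. The population of 100 ID numbers, the sample size, and the seed are all hypothetical; the seed is there only so the example is repeatable.

```python
import random

# Hypothetical sampling frame: ID numbers for 100 people in the population.
population = list(range(1, 101))

rng = random.Random(42)                 # seeded only for repeatability
sample = rng.sample(population, k=10)   # each ID has an equal chance; no repeats

print(sorted(sample))
```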
Non-random sampling
The items included in the study are selected for a reason (e.g., proximity, feasibility). Also called non-probability sampling.
Which graph type is best for showing changes over time?
Bar chart
Line graph
Pie chart
Histogram
Line graph. Line graphs are good for showing trends or patterns across time points, like monthly case numbers or yearly vaccination rates.
Which chart is best for showing the frequency distribution of a continuous variable?
Histogram. Histograms show how often values fall into specific ranges, which is ideal for continuous variables like height, blood pressure, or income.
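To make "how often values fall into specific ranges" concrete, here is a minimal Python sketch that counts values per bin and prints a text histogram. The helper `histogram_counts`, the heights, and the 10 cm bin width are assumptions chosen for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)   # which bin this value belongs to
        if 0 <= idx < n_bins:             # ignore values outside the bin range
            counts[idx] += 1
    return counts

# Hypothetical heights in centimetres.
heights_cm = [152, 158, 161, 163, 165, 167, 168, 171, 174, 183]
counts = histogram_counts(heights_cm, start=150, width=10, n_bins=4)

for i, c in enumerate(counts):
    low = 150 + 10 * i
    print(f"[{low}, {low + 10}): {'#' * c}")
```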
True or false: a bar chart can be used to display both categorical and numerical data.
True
True or false: line graphs should only be used for categorical variables.
False. They are used for continuous or ordinal data, typically across a time axis, not for categorical labels.
What is discrete data, and what kind of visualization should I use for it?
Discrete data are countable, separate values that cannot be broken into smaller pieces (no decimals or fractions), e.g., number of people in a household.
Use bar charts or pie charts; don’t use histograms.
What is continuous data, and what visualization would I use for it?
These are measurable values that can be broken down into fractions or decimals, e.g., height, weight, blood pressure, temperature.
Use histograms, line graphs, or scatter plots.
Would the number of prescription medications someone takes be discrete or continuous?
Discrete
Would systolic blood pressure be discrete or continuous?
Continuous
Would the number of ER visits per year be discrete or continuous?
Discrete
How to choose the correct graph or statistical test: it depends on
Type of variable - categorical, ordinal, continuous
Number of groups
What you’re trying to compare
Best graph type for categorical (nominal) data
Bar chart or pie chart, e.g., gender, blood type
Best graph for ordinal data
Bar chart, or boxplot if numeric, e.g., pain scale, satisfaction level
Best chart for one continuous variable
Histogram, line graph, scatter plot, or boxplot, e.g., weight, income, time
Best graph for 2 continuous variables
Scatter plot, e.g., BMI vs. blood pressure
What statistical test would you run if you are comparing 2 groups on a continuous outcome?
Independent t test, e.g., compare average BMI for males vs. females
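A hedged sketch of the arithmetic behind an independent t test, assuming the pooled-variance (equal variances) version. The function name `pooled_t` and the BMI values are hypothetical. This computes only the t statistic and degrees of freedom; turning that into a p-value needs the t distribution, which a statistics package (for example `scipy.stats.ttest_ind`) would normally handle.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent groups,
    assuming equal variances (pooled-variance t test)."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical BMI values for two groups.
bmi_males = [24, 26, 28, 30, 32]
bmi_females = [20, 22, 24, 26, 28]

t_stat, df = pooled_t(bmi_males, bmi_females)
print(f"t = {t_stat:.2f}, df = {df}")
```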
What statistical analysis would you use if you are comparing 3 or more groups on a continuous outcome?
ANOVA, e.g., comparing cholesterol levels across age groups
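For intuition, the F statistic in a one-way ANOVA compares variation between group means to variation within groups. The sketch below is a minimal illustration with a hypothetical helper `one_way_anova_f` and made-up cholesterol values; it stops at the F statistic and does not compute a p-value (a package function such as `scipy.stats.f_oneway` would normally be used for that).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical cholesterol levels for three age groups.
young = [180, 190, 200]
middle = [190, 200, 210]
older = [230, 240, 250]

f_stat, df_b, df_w = one_way_anova_f([young, middle, older])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```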
What statistical test would you use to look at the relationship between 2 continuous variables?
Correlation or regression, e.g., time spent exercising and stress level
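As an illustration of correlation, the Pearson coefficient r can be computed by hand in a few lines of Python. The function name `pearson_r` and the exercise and stress values are hypothetical; r close to -1 indicates a strong negative linear relationship.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weekly hours of exercise and a stress score.
exercise_hours = [1, 2, 3, 4, 5]
stress_score = [9, 7, 6, 4, 4]

r = pearson_r(exercise_hours, stress_score)
print(f"r = {r:.3f}")
```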
What statistical analysis would you use to compare proportions with categorical variables?
Chi-square test, e.g., compare smoking rates by gender
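A minimal sketch of the chi-square statistic for a table of counts, comparing observed counts with the counts expected if the two variables were unrelated. The function name `chi_square_statistic` and the smoking counts are made up for illustration, and no continuity correction or p-value is computed here.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are genders, columns are [smokers, non-smokers].
observed = [
    [30, 70],
    [20, 80],
]

chi2, chi2_df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f}, df = {chi2_df}")
```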
What statistical test would you use to predict one variable from another?
Regression, e.g., predict weight from calorie intake
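A simple least-squares line is one way to picture "predicting one variable from another". The sketch below assumes simple linear regression with a single predictor; the helper `fit_line` and the calorie and weight values are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: daily calorie intake and body weight in kg.
calories = [1800, 2000, 2200, 2400, 2600]
weight_kg = [60, 64, 70, 74, 82]

slope, intercept = fit_line(calories, weight_kg)
predicted = intercept + slope * 2300   # predicted weight at 2300 calories

print(f"weight = {intercept:.1f} + {slope:.3f} * calories")
print(f"predicted weight at 2300 calories: {predicted:.1f} kg")
```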
Four types of bias in research
Sampling bias
Survivorship bias
Response bias
Recall bias
This bias occurs when members or items of the relevant population do not all have an equal chance of ending up in the sample; also called selection bias
Sampling bias, e.g., only picking people at one location
This bias occurs when participants give answers they believe the researcher wants to hear or that they think are socially acceptable
Response bias, e.g., participants may lie on a survey that asks about sexual behaviour or alcohol consumption
This bias occurs when individuals or objects leave the study and the researcher continues to measure the remaining participants without considering those that left
Survivorship bias, e.g., a study exploring whether a 12-week exercise program reduces the risk of falls, where the participants who fell dropped out, skewing the results
This bias occurs when participants do not remember past events accurately or omit details
Recall bias, e.g., asking mothers about their comfort levels during childbirth years after they gave birth
True or false: histograms are for distributions, not relationships.
True