Module 2: Visualizing Data and Outliers Flashcards

1
Q

Bar graphs are a popular way to summarize which type of data?

A

categorical data.

2
Q

If bars represent the mean score in each category, lines (called error bars) may be shown on top of the bars to represent ____.

A

standard deviation
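As a rough sketch of what such a bar chart plots, the code below uses made-up scores for two hypothetical categories ("group A", "group B"): the mean sets the bar height, and one sample standard deviation below and above the mean sets the ends of the error bar. Using the sample standard deviation (`statistics.stdev`, divisor n - 1) is an assumption here.

```python
from statistics import mean, stdev

# Hypothetical scores for two categories (illustrative data only)
scores = {
    "group A": [4, 6, 8],
    "group B": [2, 3, 4],
}


def bar_with_error(values):
    """Return (bar height, lower end, upper end) for one category.

    The bar height is the mean; the error bar spans one sample
    standard deviation (n - 1 divisor, an assumption) on each side.
    """
    m = mean(values)
    sd = stdev(values)
    return m, m - sd, m + sd


summary = {cat: bar_with_error(vals) for cat, vals in scores.items()}
for cat, (height, low, high) in summary.items():
    print(f"{cat}: bar height {height:.1f}, error bar {low:.1f} to {high:.1f}")
```

Some figures use the standard error instead of the standard deviation for the error bars, so it is worth checking the figure caption.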

3
Q

Histograms are used to depict what kind of data?

A

scale data.
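A histogram is built by splitting a scale variable into equal-width intervals (bins) and counting how many values fall into each. The sketch below does that counting for hypothetical height data; the function name `histogram_counts`, the bin start and the bin width are all illustrative choices, not a fixed rule.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many scale values fall into each equal-width bin.

    Bin i covers [start + i*width, start + (i+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical heights in cm (illustrative data only)
heights = [150, 152, 158, 161, 163, 165, 171]
for low, high, count in histogram_counts(heights, start=150, width=10, n_bins=3):
    # Each row of '#' marks is one histogram bar, drawn sideways
    print(f"{low}-{high} cm: {'#' * count}")
```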

4
Q

If data is skewed, which measure of central tendency would best describe the data?

A

Median, because it takes extreme values into account but is not greatly impacted by them, since it is the middle value.
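A quick way to see this is to compare the mean and median of a small, hypothetical, positively skewed data set, with and without its single extreme value.

```python
from statistics import mean, median

# Hypothetical, positively skewed data: 50 is an extreme value
values = [1, 2, 2, 3, 3, 4, 50]
without_extreme = values[:-1]

# The mean is pulled toward the extreme value; the median barely moves
print(f"with 50:    mean = {mean(values):.2f}, median = {median(values)}")
print(f"without 50: mean = {mean(without_extreme):.2f}, median = {median(without_extreme)}")
```

Dropping the one extreme value moves the mean from about 9.29 to 2.5, while the median only moves from 3 to 2.5.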

5
Q

why would a frequency polygon be used over a histogram?

A

It can be useful when comparing multiple groups, because several lines on one graph are easier to interpret than several sets of bars.
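A frequency polygon uses the same bin counts as a histogram but plots each count as a point at the bin's midpoint and joins the points with lines, so one line per group can share a single graph. The sketch below computes those points for two hypothetical groups; closing each line with a zero-count point one bin beyond each end is a common drawing convention assumed here, and `frequency_polygon` is an illustrative name.

```python
def frequency_polygon(values, start, width, n_bins):
    """Return (midpoint, count) points for one frequency polygon line.

    Counts come from equal-width bins; each count is placed at its
    bin's midpoint, and the line is closed with a zero-count point
    one bin beyond each end (an assumed convention).
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    points = [(start + (i + 0.5) * width, c) for i, c in enumerate(counts)]
    first_mid = start - 0.5 * width
    last_mid = start + (n_bins + 0.5) * width
    return [(first_mid, 0)] + points + [(last_mid, 0)]


# Two hypothetical groups sharing the same bins, one line per group
groups = {
    "group 1": [1, 2, 3, 4, 4, 5],
    "group 2": [3, 4, 5, 6, 7, 8],
}
for name, vals in groups.items():
    print(name, frequency_polygon(vals, start=0, width=3, n_bins=3))
```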

6
Q

A one-way scatter plot

A

Uses a single axis to display the relative position of each data point in a group. This type of figure can be used with categorical or scale data and can be presented horizontally or vertically.

7
Q

box plots

A

Only has one axis; it shows a summary of the data instead of each data point.

The box in the center depicts the interquartile range (IQR). Lines, or whiskers, projecting from the box on either side extend to the adjacent values (the most extreme observations in the data set that are no more than 1.5 times the height of the box beyond either quartile). Anything beyond the adjacent values is considered an extreme value and is plotted as an individual dot.
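The numbers a box plot draws can be computed directly. The sketch below uses a hypothetical data set and Python's `statistics.quantiles` with `method="inclusive"`; that quartile method is an assumption, and other software may report slightly different quartiles. "1.5 times the height of the box" is 1.5 × IQR.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, IQR, adjacent values and extreme values for a box plot.

    Quartiles use the 'inclusive' method (an assumption; other
    methods can give slightly different values).
    """
    xs = sorted(data)
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    lower_fence = q1 - 1.5 * iqr       # limits for the whiskers
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences
        "adjacent_low": min(inside), "adjacent_high": max(inside),
        # Anything beyond the adjacent values is drawn as a dot
        "extreme": [x for x in xs if x < lower_fence or x > upper_fence],
    }


# Hypothetical data (illustrative only)
print(box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 25]))
```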

8
Q

When can a box plot be used?

A

When there are too many overlapping data points, which would be difficult to interpret as a scatter plot or one-way scatter plot.

9
Q

What type of data would you use for a two-way scatter plot?

A

Scale variables: a two-way scatter plot depicts the relationship between two scale variables.

10
Q

line graphs

A

Similar to two-way scatter plots in that they represent the relationship between two scale variables; however, in a line graph each value on the x-axis has only one corresponding y value, which is not a requirement for scatter plots.

11
Q

What’s an outlier?

A

A data point that is unusual, different, or outside the norm compared with the rest of the data.

12
Q

How would you identify potential outliers?

A

By visualizing the data. Extremely high or low values are easy to spot in box plots, scatter plots, and histograms.

13
Q

What constitutes an outlier?

A
  • values that are more than two standard deviations above or below the mean
  • values that are more than 1.5 times the IQR above Q3 or below Q1 (values outside the whiskers in a box plot)
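Both rules can be checked in a few lines. This is a sketch on hypothetical data; it assumes the sample standard deviation (n - 1 divisor) for the first rule and the "inclusive" quartile method for the second, and the function names are illustrative.

```python
from statistics import mean, stdev, quantiles


def outliers_sd_rule(data, k=2.0):
    """Values more than k sample standard deviations from the mean."""
    m, sd = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > k * sd]


def outliers_iqr_rule(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3.

    Quartiles use the 'inclusive' method (an assumption).
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical data (illustrative only)
data = [2, 4, 4, 5, 6, 7, 8, 9, 25]
print("2 SD rule:  ", outliers_sd_rule(data))
print("1.5 IQR rule:", outliers_iqr_rule(data))
```

Here both rules flag 25. On other data they can disagree, because the mean and standard deviation are themselves pulled by extreme values while the quartiles are not.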
14
Q

what is a research population?

A

The group of objects, events, people, procedures, or observations that a researcher is interested in studying.

15
Q

dependent variable

A

what is being measured or the outcome of a study

16
Q

approaches to sampling

A

random sampling and non random sampling

17
Q

random sampling

A

Random selection is used to choose the people, objects, events, or observations included in each sample; each item of interest has an equal chance of being included in the sample.
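A simple random sample can be drawn with Python's `random.sample`, which picks items without replacement so every member of the sampling frame has the same chance of selection. The patient IDs below are hypothetical, and the seed is only there so the same sample can be reproduced.

```python
import random

# Hypothetical sampling frame: one ID for every member of the population
population = [f"patient-{i:03d}" for i in range(1, 201)]


def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; each has an equal chance."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    return rng.sample(frame, n)


sample = simple_random_sample(population, 10, seed=42)
print(sample)
```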

18
Q

Non-random sampling

A

The items included in the study are selected for a reason (e.g., proximity, feasibility); also called non-probability sampling.

19
Q

which graph type is best for showing changes over time?

bar chart
line graph
pie chart
histogram

A

Line graph; it is good for showing trends or patterns across time points, like monthly case numbers or yearly vaccination rates.

20
Q

Which chart is best for showing the frequency distribution of a continuous variable?

A

Histogram; it shows how often values fall into specific ranges, ideal for continuous variables like height, blood pressure, or income.

21
Q

True or false: a bar chart can be used to display both categorical and numerical data.

23
Q

True or false: line graphs should only be used for categorical variables.

A

False; they are used for continuous or ordinal data across a time axis, not categorical labels.

24
Q

What is discrete data, and what kind of visualization should I use for it?

A

Discrete data are countable, separate values that cannot be broken into smaller pieces (no decimals or fractions), e.g., number of people in a household.

Use bar charts or pie charts; don’t use histograms.
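A bar chart of discrete data is just a frequency table with one bar per distinct value. The sketch below builds that table for hypothetical household sizes and prints it as a sideways text bar chart; `frequency_table` is an illustrative name.

```python
from collections import Counter

# Hypothetical number of people per household (illustrative data only)
household_sizes = [1, 2, 2, 3, 4, 2, 3, 5, 1, 2]


def frequency_table(values):
    """Count each distinct discrete value, ordered by value."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


table = frequency_table(household_sizes)
for size, count in table.items():
    # One separate bar per countable value, no binning needed
    print(f"{size} people: {'#' * count}")
```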

25
Q

What is continuous data, and what visualization would I use for it?

A

These are measurable values that can be broken down into fractions or decimals, e.g., height, weight, blood pressure, temperature.

Use histograms, line graphs or scatter plots

26
Q

Would the number of prescription meds someone takes be discrete or continuous?

A

Discrete

27
Q

Would systolic blood pressure be discrete or continuous?

A

Continuous

28
Q

Would the number of ER visits per year be discrete or continuous?

A

Discrete

29
Q

How do you choose the correct graph or statistical test? What does it depend on?

A

Type of variable - categorical, ordinal, continuous

Number of groups

What you’re trying to compare

30
Q

Best graph type for categorical nominal data

A

Bar chart or pie chart, e.g., gender, blood type

31
Q

Best graph for ordinal data

A

Bar chart, or box plot if numeric, e.g., pain scale, satisfaction level

32
Q

Best chart for one continuous variable

A

Histogram, line graph, scatter plot, or box plot, e.g., weight, income, time

33
Q

Best graph for 2 continuous variables

A

Scatter plot, e.g., BMI vs. blood pressure

34
Q

What statistical test would you run if you are comparing 2 groups on a continuous outcome?

A

Independent-samples t test, e.g., compare average BMI for males vs. females
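As a sketch of what the test computes, the code below calculates the classic pooled-variance (Student's) t statistic and its degrees of freedom for two hypothetical groups of BMI values. It stops at the statistic: turning t into a p-value needs the t distribution, which the standard library does not provide (SciPy's `scipy.stats.ttest_ind` does both steps).

```python
import math
from statistics import mean, variance


def independent_t(group1, group2):
    """Pooled-variance independent-samples t statistic and df.

    Assumes equal population variances (Student's t test).
    """
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2


# Hypothetical BMI values (illustrative data only)
group_1 = [22, 24, 26, 28]
group_2 = [20, 21, 22, 25]
t, df = independent_t(group_1, group_2)
print(f"t = {t:.3f}, df = {df}")
```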

35
Q

What statistical analysis would you use if you are comparing 3 or more groups on a continuous outcome?

A

ANOVA, e.g., comparing cholesterol levels across age groups
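A one-way ANOVA compares the variability between group means with the variability within groups. The sketch below computes the F statistic and its degrees of freedom for three hypothetical groups; getting a p-value needs the F distribution (for example via SciPy's `scipy.stats.f_oneway`), which is left out here.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical cholesterol-like values for three groups (illustrative only)
f, df_b, df_w = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```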

36
Q

What statistical test would you use to look at the relationship between 2 continuous variables?

A

Correlation or regression, e.g., time spent exercising and stress level
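Pearson's correlation coefficient r summarizes the strength and direction of a linear relationship, from -1 to +1. The sketch below computes it from its definition for hypothetical exercise and stress data; Python 3.10+ also has `statistics.correlation`, which gives the same Pearson r.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours of exercise per week vs. stress score
exercise = [1, 2, 3, 4, 5]
stress = [8, 6, 7, 4, 3]
r = pearson_r(exercise, stress)
print(f"r = {r:.3f}")  # negative: more exercise goes with lower stress here
```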

37
Q

What statistical analysis would you use to compare proportions between categorical variables?

A

Chi-square test, e.g., compare smoking rates by gender
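The chi-square test of independence compares observed counts in a contingency table with the counts expected if the two variables were unrelated (row total × column total / grand total). The sketch below uses a hypothetical 2 × 2 table of smoking status by gender and applies no continuity correction; note that SciPy's `scipy.stats.chi2_contingency` applies the Yates correction to 2 × 2 tables by default, so its statistic would differ.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = male, female; columns = smoker, non-smoker
table = [[30, 70],
         [20, 80]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```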

38
Q

What statistical test would you use to predict one variable from another?

A

Regression, e.g., predict weight from calorie intake
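Simple linear regression fits the least-squares line y = intercept + slope × x, which can then be used for prediction. The sketch below fits hypothetical calorie-intake and weight data; Python 3.10+ also provides `statistics.linear_regression`, which returns the same slope and intercept.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


# Hypothetical data: daily calorie intake vs. weight in kg
calories = [1800, 2000, 2200, 2400, 2600]
weight = [60, 64, 66, 70, 74]
slope, intercept = fit_line(calories, weight)
predicted = intercept + slope * 2100
print(f"weight = {intercept:.1f} + {slope:.3f} * calories")
print(f"predicted weight at 2100 kcal: {predicted:.1f} kg")
```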

39
Q

Four types of bias in research

A

Sampling bias
Survivorship bias
Response bias
Recall bias

40
Q

This bias, also called selection bias, occurs when each member or item of the relevant population does not have an equal chance of ending up in the sample.

A

Sampling bias, e.g., only sampling people at one location

41
Q

This bias occurs when participants give the answers they believe the researcher wants to hear, or answers they think are socially acceptable.

A

Response bias, e.g., on a survey asking about sexual behaviour or alcohol consumption, participants may lie.

42
Q

This bias occurs when individuals or objects leave the study and the researcher continues to measure the remaining participants without considering those that left.

A

Survivorship bias, e.g., a study exploring whether a 12-week exercise program reduces the risk of falls, where the participants who fell dropped out, skewing the results

43
Q

This bias occurs when participants do not remember past events accurately or omit details

A

Recall bias, e.g., asking mothers about their comfort levels during childbirth years after they gave birth

44
Q

True or false: histograms are for distributions, not relationships.

A

True