module 4 Flashcards

1
Q

contingency tables

A
  • show the frequency (count) of sampling units
  • tabulate those frequencies within the different levels of categorical variables
2
Q

types of contingency tables

A

one-way and two-way tables

3
Q

calculate marginal distributions as frequencies

A
  • Row: sum frequencies across all columns for each row
  • Column: sum frequencies across all rows for each column
4
Q

calculate marginal distributions as proportions

A
  • Table total: sum all frequencies in the table
  • Row: sum frequencies across all columns for each row and divide by table total
  • Column: sum frequencies across all rows for each column and divide by table total
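A minimal sketch of both marginal calculations (frequencies and proportions). The table, its habitat and species labels, and the helper name `marginals` are made up for illustration and are not from the course material.

```python
# Hypothetical two-way contingency table (made-up counts):
# rows = habitat, columns = species.
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "meadow": {"sparrow": 5, "robin": 15},
}

def marginals(table):
    """Return row sums, column sums, and the table total."""
    # Row marginal: sum across all columns for each row.
    row_sums = {row: sum(cells.values()) for row, cells in table.items()}
    # Column marginal: sum across all rows for each column.
    col_sums = {}
    for cells in table.values():
        for col, count in cells.items():
            col_sums[col] = col_sums.get(col, 0) + count
    # Table total: sum of every frequency in the table.
    total = sum(row_sums.values())
    return row_sums, col_sums, total

row_sums, col_sums, total = marginals(table)

# Marginal distributions as proportions: divide each sum by the table total.
row_props = {row: s / total for row, s in row_sums.items()}
col_props = {col: s / total for col, s in col_sums.items()}

print(row_sums, col_sums, total)
print(row_props, col_props)
```

Each set of proportions should add up to 1, which is a quick check that the sums were taken over the whole table.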
5
Q

marginal distribution

A
  • row and column sums of a two-way contingency table.
  • can be shown as frequencies or proportions
6
Q

what do marginal distributions show

A
  • how many sampling units are in each level of one categorical variable, without regard to the other categorical variables
  • they describe overall patterns in the sample
7
Q

conditional distributions

A
  • two-way tables that show the proportion of sampling units for one variable within each level of the second variable
  • shows the relationship between two variables
  • shown as a separate table
8
Q

how do you create a conditional distribution table

A
  • calculated from the contingency table and its marginal distribution
  • select one of the categorical variables to be primary and the other to be secondary (aka conditional)
  • take each frequency from the contingency table and divide it by the marginal frequency for its level of the primary variable
  • basically: take the cell value and divide it by the sum of its row/column
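A minimal sketch of the calculation, assuming the row variable is the primary variable so each cell is divided by its row sum. The table and the helper name `conditional_by_row` are made up for illustration.

```python
# Hypothetical two-way contingency table (made-up counts):
# rows = habitat (treated as primary), columns = species (secondary).
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "meadow": {"sparrow": 5, "robin": 15},
}

def conditional_by_row(table):
    """Divide each cell by its row sum (the row marginal frequency)."""
    result = {}
    for row, cells in table.items():
        row_sum = sum(cells.values())
        result[row] = {col: count / row_sum for col, count in cells.items()}
    return result

cond = conditional_by_row(table)
for row, props in cond.items():
    print(row, props)
```

If the column variable were primary instead, the same idea applies with column sums as the divisor. Either way, the proportions within each level of the primary variable add up to 1.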
9
Q

how do you choose primary and secondary variables in calculating a conditional distribution

A
  • depends on the question being asked
  • ex. are there more _____ (primary) than _____ (primary) in the _____ (secondary) category? or how many people like _____ (secondary) when doing _____ (primary)?
10
Q

what do the primary and secondary variables in calculating conditional distributions determine

A
  • whether you divide by the row or the column marginal distribution
11
Q

what do conditional distributions show

A
  • relative frequency of the levels of the secondary variable within each level of the primary variable
  • shows how the secondary variable changes across the primary variable
12
Q

t or f: bar graphs are only used to visualize single variable categorical data

A

false, they are used for both single-variable and two-variable categorical data

13
Q

t or f: bar graphs are good at visualizing numerical data

A
  • false, only acceptable in one case, because a bar shows only the average numerical value
  • acceptable: statistical datasets have categorical info on many sampling units; here the data is not statistical in nature
  • not acceptable: if the data is from a statistical population with one numerical and one categorical variable
14
Q

t or f: bar graphs can be horizontal or vertical

A

true, choice depends on focus of research question with more relevant info on the horizontal axis

15
Q

how do you display data w two categorical measurement variables in a bar graph

A
  • designate one variable as the grouping variable (the base of the figure; levels of the other variable are shown within it); choose whichever variable shows the info more clearly
  • decide whether to create the figure as a grouped or stacked bar graph
16
Q

what are the two types of two variable bar graphs

A
  • grouped: levels of the second variable are drawn as separate bars beside each other, in one group for each level of the grouping variable on the x axis
  • stacked: levels of the second variable are stacked on top of each other, giving just one bar per level of the grouping variable on the x axis
17
Q

when should you not use stacked bar graphs

A

when two different variables you want to compare have the same value

18
Q

each bar or group of bars in a bar graph should be separated by a ______

A

gap

19
Q

steps in making a histogram

A
  • divide the numerical variable into bins of equal size
  • count how many sampling units fit within each bin (frequency)
  • create a plot with each bin having a bar with a height equal to that bin's frequency
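The three steps can be sketched as follows. The sample values, the bin settings, and the helper name `histogram_counts` are made up for illustration. The text bars stand in for a real plot.

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Count values in equal-width bins [start + k*w, start + (k+1)*w)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin holding v
        if 0 <= k < n_bins:                # values outside the bins are ignored
            counts[k] += 1
    return counts

# Hypothetical numerical measurements (made-up values).
values = [1.2, 2.5, 2.9, 3.1, 4.8, 5.0, 5.5, 7.9]

# Step 1: equal-size bins of width 2 starting at 0: [0,2), [2,4), [4,6), [6,8).
# Step 2: count the sampling units that fall in each bin.
counts = histogram_counts(values, start=0, bin_width=2, n_bins=4)

# Step 3: draw one bar per bin with height equal to the bin's frequency.
for k, count in enumerate(counts):
    print(f"[{2 * k}, {2 * k + 2}) " + "#" * count)
```

Note that the bars touch in a real histogram, because neighbouring bins share an edge on a continuous numerical axis.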
19
Q

t or f: histograms show the separation of variables through a gap between the bars

A

false

19
Q

histograms

A
  • split numerical data into bins and display the number of sampling units of each bin
  • for numerical data
20
Q

what are the lines drawn from the edge of the box to the last data point within the extreme threshold in box plots called

A

whiskers

21
Q

box plot

A
  • based on quartiles
  • shows 5 descriptive stats (minimum, 1st quartile, median, 3rd quartile, and maximum)
  • shows interquartile range
  • shows how numerical variables change across multiple categorical groups
  • equally spaced categorical groups across the x axis with a box for each group drawn
22
Q

t or f: box plots can sometimes show extreme values

A

true

23
Q

what are the four parts of a box plot

A
  • a box (drawn between the 1st and 3rd quartiles, showing the interquartile range)
  • a solid line (drawn at the median)
  • whiskers (drawn from the edge of the box to the last data point within the extreme threshold)
  • extreme values (symbols drawn over data points outside the extreme threshold)
24
Q

extreme threshold

A
  • temporary reference line used to draw the whiskers and extreme values
  • the thresholds are drawn at 1.5 × the interquartile range above the top of the box and below the bottom of the box; they are removed in the final graph
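A minimal sketch of how the thresholds determine the whiskers and extreme values. The sample is made up, and the quartile method is an assumption: textbooks and software use slightly different quartile conventions, so results for real data may differ a little from a given course's answer.

```python
import statistics

# Hypothetical numerical sample (made-up values) for one categorical group.
data = sorted([2, 4, 5, 6, 7, 8, 9, 10, 30])

# Assumption: the "inclusive" quartile method, one common choice among several.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = height of the box

# Extreme thresholds: 1.5 x IQR below the bottom and above the top of the box.
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Whiskers end at the last data points inside the thresholds
# (points exactly on a threshold are treated as inside in this sketch).
inside = [x for x in data if lower_threshold <= x <= upper_threshold]
whisker_low, whisker_high = inside[0], inside[-1]

# Extreme values are the data points outside the thresholds.
extremes = [x for x in data if x < lower_threshold or x > upper_threshold]

print(q1, median, q3, iqr)
print(lower_threshold, upper_threshold)
print(whisker_low, whisker_high, extremes)
```

In this made-up sample the upper whisker stops at 10 rather than at the threshold itself, and 30 is drawn as an extreme value.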
25
Q

in an observational study, the categorical group is a ________, and in an experimental study the categorical group is _______

A

measured categorical variable, the treatment factor

26
Q

secondary variable for box plots

A
  • shown within each level of the grouping variable
  • levels often shown in a legend
27
Q

what to do when there are two categorical groups for a box plot

A
  • designate one categorical variable as the grouping variable and one as the secondary variable
  • draw a boxplot using numerical data within each level of the two categorical variables
  • grouping variables have large gaps between levels
28
Q

grouping variable for box plots

A
  • shown on the x axis of grouped box plots
  • levels often shown on the x axis
29
Q

when to use box plot vs histogram

A
  • boxplot: if you have many categorical groups or aren't interested in the shape of the data (shows median, quartiles, and interquartile range; easy to compare across categorical groups)
  • histogram: if you have a small number of categorical groups and want to see the shape of the data (gives info about how the data is distributed and shows the shape of the distribution; difficult to compare numerical variables across categorical groups)
30
Q

scatterplot

A
  • used to show pattern between two numerical variables collected from different sampling units
31
Q

line plot

A
  • used when data is collected repeatedly from the same sampling unit
32
Q

each point on a scatter plot is a ____ _____

A

sampling unit

33
Q

names of the axes in scatterplots for experimental studies

A
  • when one variable is treatment and the other is response, x-axis=independent variable (treatment) and y-axis=dependent variable (response)
  • when both variables are measured quantities, both axes are called covariates (evaluating pattern)
34
Q

names of the axes in scatterplots for observational studies

A
  • both numerical variables are measured quantities called covariates
35
Q

scatterplot: for both observational and experimental studies, when the goal of a test is to evaluate whether one variable can predict the other, the x-axis is typically called the _____ and the y-axis the ______

A

predictor variable, response variable

36
Q

scatterplot: for both experimental and observational studies when the goal of a test is to evaluate the association between numerical variables, both axes are called ______

A

covariates

37
Q

in scatterplots, if extra variables are categorical they are differentiated using _____ but if they are numerical you use _____ to show the difference. both of these are shown in a ______

A

different symbols, different size or colour, legend

38
Q

discrete vs continuous numerical values

A
  • discrete = exact values you can count (e.g. whole numbers)
  • continuous = any value within a range, measured rather than counted