Lesson 3 Flashcards

Question

variance: describe roughly what it is

Answer 1

roughly the average squared deviation from the mean

Answer 2

1) find the difference between each observation and the mean 2) square each deviation and add them up 3) divide the sum by n - 1 (the sample size minus one); the answer is in squared units
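A minimal Python sketch of the three steps above. The function name `sample_variance` and the data values are illustrative only.

```python
def sample_variance(data):
    """Sample variance s^2, computed step by step."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    # 1) deviation of each observation from the mean
    deviations = [x - mean for x in data]
    # 2) square each deviation and add them up
    sum_sq = sum(d ** 2 for d in deviations)
    # 3) divide by n - 1; the result is in squared units of the data
    return sum_sq / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
```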

Answer 3

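dividing by n - 1 instead of n makes the sample variance a slightly more reliable estimate of the population variance (on average it neither overestimates nor underestimates it)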
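This answer most likely refers to dividing by n - 1 rather than n. The sketch below checks the claim exactly. It lists every possible sample of size 2, drawn with replacement, from a tiny made-up population {1, 2, 3}, then averages the sample variance under each divisor. Fractions keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
# population variance sigma^2 divides by N
sigma2 = sum((x - mu) ** 2 for x in population) / N

def var_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples
avg_n_minus_1 = sum(var_with_divisor(s, n - 1) for s in samples) / len(samples)
avg_n = sum(var_with_divisor(s, n) for s in samples) / len(samples)

print(sigma2, avg_n_minus_1, avg_n)  # 2/3 2/3 1/3
```

With n - 1, the average sample variance equals sigma^2 exactly. Dividing by n gives an average that is too small.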

Answer 4

s^2 = sample variance; sigma^2 = population variance

Answer 5

-get rid of negatives so that negative and positive deviations don't cancel each other out when added together -enlarge larger deviations more than smaller ones so that they are weighted more heavily
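A small illustration of both reasons for squaring, using made-up data. Raw deviations from the mean always sum to zero. Squared deviations do not, and large deviations dominate the total.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
mean = sum(data) / len(data)     # 5.0
deviations = [x - mean for x in data]

print(sum(deviations))                 # 0.0: positives and negatives cancel
print(sum(d ** 2 for d in deviations)) # 32.0: squaring removes the signs
# a deviation of 4 contributes 16, while a deviation of 1 contributes only 1
print([d ** 2 for d in deviations])
```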

Answer 6

the difference between an observation and the mean (can be positive or negative)

Answer 7

lowercase sigma

Answer 8

roughly the average distance of the observations from the mean; has the same units as the data. Formula: standard deviation = square root of the variance
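A quick check of the formula using Python's standard `statistics` module on illustrative data. `statistics.variance` and `statistics.stdev` are the sample versions, which divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
s2 = statistics.variance(data)   # sample variance, squared units
s = statistics.stdev(data)       # sample standard deviation, same units as data
print(s2, s)
print(math.isclose(s, math.sqrt(s2)))  # True: SD is the square root of the variance
```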

Answer 9

Distributions where fewer observations are clustered around the center are more variable.

Answer 10

-range of the middle 50% of the data: the distance between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile), IQR = Q3 - Q1 -best visualized with a box plot -the IQR is more robust than the range or SD because it ignores the most extreme values, which could be outliers
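A sketch of IQR = Q3 - Q1 using `statistics.quantiles` (Python 3.8+). Be aware that textbooks and software use several slightly different quartile conventions. This function's default ("exclusive") method may not match the values a given course or calculator reports.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative values
# quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 2.25 6.75 4.5
```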

Answer 11

robust statistics: measures on which extreme observations have little effect, e.g.

data                  mean     median
1, 2, 3, 4, 5, 6      3.5      3.5
1, 2, 3, 4, 5, 1000   169.2    3.5

Here the median is more robust.
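The two rows of the table above can be reproduced with the standard `statistics` module.

```python
import statistics

small = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 2, 3, 4, 5, 1000]
for data in (small, with_outlier):
    print(statistics.mean(data), statistics.median(data))
# first line: mean 3.5, median 3.5
# second line: mean about 169.2, median still 3.5
```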

Answer 12

         robust   non-robust
center   median   mean
spread   IQR      SD, range

Median and IQR are more robust for skewed data or data with extreme observations. Mean and SD are best for roughly symmetric data.
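A sketch comparing the robust and non-robust measures of spread when one value becomes extreme. The data and the helper name `spread_summary` are illustrative. Quartiles use `statistics.quantiles` with its default method.

```python
import statistics

def spread_summary(data):
    """Return one robust and two non-robust measures of spread."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "IQR": q3 - q1,                  # robust
        "SD": statistics.stdev(data),    # non-robust
        "range": max(data) - min(data),  # non-robust
    }

clean = [1, 2, 3, 4, 5, 6, 7, 8]
skewed = [1, 2, 3, 4, 5, 6, 7, 1000]  # one extreme observation
print(spread_summary(clean))
print(spread_summary(skewed))
```

The IQR is identical for both data sets. The range jumps from 7 to 999, and the SD grows many times larger.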

Answer 13

-rescaling of the data using a function -when data are very strongly skewed, we sometimes transform them so they are easier to model

Answer 14

-often applied when much of the data cluster near zero (relative to the larger values in the data set) and all values are positive -can make the data easier to analyze because outliers become less extreme and the data are more symmetric, less skewed -can make the relationship between variables more linear and easier to model with simple methods
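A minimal sketch of a log transformation (base 10 here; natural log works the same way). The values are hypothetical and strongly right-skewed. The helper name `log_transform` is illustrative. It enforces the "all values positive" requirement.

```python
import math

def log_transform(values):
    """Base-10 log of each value; only defined for positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires all values to be positive")
    return [math.log10(v) for v in values]

raw = [1_000, 10_000, 100_000, 1_000_000]  # hypothetical skewed values
logged = log_transform(raw)
print(logged)  # evenly spaced, about 3, 4, 5, 6
```

On the raw scale, the largest value sits far from the rest. After the transformation, the gaps between the values are equal.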

Answer 15

plot the square root (sqrt of each observation) or the inverse (1 / each observation)

Answer 16

-see data structure differently -reduce skew to assist in modeling -straighten a nonlinear relationship in a scatterplot

Answer 17

mu (μ)

Answer 18

-summarizes data for two categorical variables -shows the number of times each combination of outcomes occurred, along with row totals and column totals
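A sketch of building a contingency table with `collections.Counter`. The six observations below are hypothetical and only show the mechanics.

```python
from collections import Counter

# hypothetical observations: (homeownership, application type) for each case
cases = [
    ("rent", "individual"), ("rent", "joint"), ("own", "individual"),
    ("mortgage", "individual"), ("mortgage", "joint"), ("rent", "individual"),
]

counts = Counter(cases)  # how often each combination of outcomes occurred
rows = sorted({r for r, _ in cases})
cols = sorted({c for _, c in cases})

print("", *cols, "total", sep="\t")
for r in rows:
    row = [counts[(r, c)] for c in cols]
    print(r, *row, sum(row), sep="\t")  # row total at the end
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print("total", *col_totals, sum(col_totals), sep="\t")
```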

Answer 19

a way to display a single categorical variable -x-axis shows the categories

Answer 20

bar plot -shows discrete or categorical variables -x-axis shows categories -bars can be rearranged; histogram -depicts the frequency distribution of a numerical variable -x-axis shows a number line -bars cannot be rearranged

Answer 21

-counts divided by their row totals, e.g., 3496 renters / 8505 total = 0.411 -can be displayed in a contingency table
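A sketch of computing row proportions. It first checks the card's arithmetic, then applies a small helper, `row_proportions` (an illustrative name), to a hypothetical table with made-up counts.

```python
def row_proportions(table):
    """Divide each count in a row by that row's total."""
    result = {}
    for row_label, counts in table.items():
        total = sum(counts.values())
        result[row_label] = {col: n / total for col, n in counts.items()}
    return result

# the card's example: 3496 renters out of a row total of 8505
print(round(3496 / 8505, 3))  # 0.411

# hypothetical table (rows: application type, columns: homeownership)
table = {
    "individual": {"rent": 3, "mortgage": 5, "own": 2},
    "joint": {"rent": 1, "mortgage": 2, "own": 1},
}
props = row_proportions(table)
print(props["individual"])  # {'rent': 0.3, 'mortgage': 0.5, 'own': 0.2}
```

Each row of proportions sums to 1. That is what lets rows of very different sizes be compared directly.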

Answer 22

a standardized version of the stacked bar plot: each bar shows proportions instead of counts, so all bars have the same height

Answer 23

-useful if the primary variable in the stacked bar plot is relatively unbalanced -downside is that we lose all sense of how many cases each of the bars represents

Answer 24

- agnostic in their display about which variable, if any, represents the explanatory variable and which the response -easy to see the number of cases in each group combination -downside: can require more horizontal space -downside: it can be difficult to discern an association between the two variables if the groups are of very different sizes

Answer 25

- when one variable is the explanatory variable and the other is the response

Answer 26

plot suitable for contingency tables that resembles a standardized stacked bar plot, with the benefit that we still see the relative group sizes of the primary variable -the width of each column along the x-axis represents the relative size of that group -each column is then split vertically by the other variable

Answer 27

it can be difficult to see details in a pie chart which are more obvious in a bar plot, especially for comparing groups

Answer 28

good for comparing across groups

Answer 29

compare numerical data across groups -outlines of histograms of each group put on the same plot

Answer 30

a model stating that the variables are independent, so any observed difference is due to chance

Answer 31

a model stating that the variables are not independent, so the observed difference is not due to chance alone

Answer 32

testing whether a different random assignment would change the result, e.g., use 20 notecards to represent 20 subjects and shuffle them into new groups
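A minimal sketch of the card-shuffling idea. It assumes a numerical outcome and uses the difference in group means as the statistic; the notecard version usually tracks a proportion instead. The group values and the name `randomization_test` are hypothetical. The function shuffles the pooled "cards" many times and reports how often chance alone produces a difference at least as large as the observed one.

```python
import random

def randomization_test(group_a, group_b, reps=10_000, seed=1):
    """Share of random re-assignments with |mean difference| >= observed."""
    rng = random.Random(seed)
    observed = abs(sum(group_b) / len(group_b) - sum(group_a) / len(group_a))
    pooled = list(group_a) + list(group_b)  # the "notecards"
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                 # re-randomize subjects into groups
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(b) / len(b) - sum(a) / len(a))
        if diff >= observed - 1e-12:        # small tolerance for float error
            extreme += 1
    return extreme / reps

p = randomization_test([1, 2, 3], [4, 5, 6])
print(p)  # close to 0.1: only 2 of the 20 possible splits are this extreme
```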

Answer 33

one field of statistics, evaluating whether differences are due to chance


(58 cards)