Midterm 1 Flashcards

Question

Histogram

Answer 1

- horizontal line (axis) covering the range of the data
- divide the range into classes of equal width
- count # of ind. in each class
- construct a bar over each class, with HEIGHT being the percentage of the total (relative frequency)

Advantage over stem plots? data set can be any SIZE
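The construction steps above can be sketched in a few lines of Python. This is only an illustration: the exam scores, the class boundaries, and the helper name `relative_frequencies` are made up for the example, and the code assumes every value lies inside the covered range.

```python
def relative_frequencies(data, low, width, n_classes):
    """Count observations per class and convert the counts to percentages."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - low) // width)   # index of the class that contains x
        if k == n_classes:            # the maximum value goes in the last class
            k -= 1
        counts[k] += 1
    return [100 * c / len(data) for c in counts]

# Hypothetical data: 10 exam scores, classes 50-60, 60-70, ..., 90-100
scores = [52, 58, 61, 67, 70, 74, 75, 88, 93, 100]
percents = relative_frequencies(scores, low=50, width=10, n_classes=5)

# Text version of the histogram: one '#' per 10% of the data
for i, p in enumerate(percents):
    print(f"{50 + 10 * i}-{60 + 10 * i}: {'#' * int(p // 10)} {p:.0f}%")
```

Because the heights are percentages, the bars always add up to 100% no matter how many observations there are.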

Answer 2

Stem plot
- stem (all but last digit) listed vertically, leaf (last digit) to the right

Dot plot
- x axis of values, with dots stacked by frequency
- usually used for discrete quantitative data

Answer 3

Shape
- symmetric and bell shaped
- right/left skewed (left skewed has a long tail to the left and the hump on the right)
- bimodal (two peaks)
- flat or uniform (roughly the same height across)

Center
- median (half of the data on each side; not influenced by outliers)
- mean
- mode

Spread
- how varied is the data?
- RANGE: look at min. and max.

Answer 4

Summarize QUANTITATIVE variables

Center
- mode = value at the "peak", i.e. the value with the HIGHEST freq.
- median = the middle value, denoted M; 1/2 of the area to the right and 1/2 to the left
- mean = the center of gravity

Consumer alert: on the news either the median or the mean can be called the "average"
- if the distribution is symmetrical they are approx. equal
- median is "resistant" to outliers and long tails
- mean has desirable properties for inference
- use the median if the data are skewed or outliers are present; use the mean if roughly symmetrical

Notation for the mean: x̄ = (1/n) * (sum of x_i)
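A small sketch of why the median is called "resistant" while the mean is not. The income values are invented for illustration; `statistics.median` and `statistics.mode` come from the Python standard library.

```python
import statistics

# Hypothetical incomes in $1000s; 250 is a deliberate outlier
incomes = [30, 32, 35, 35, 38, 40, 250]

mean = sum(incomes) / len(incomes)    # x-bar = (1/n) * (sum of x_i)
median = statistics.median(incomes)   # middle value of the sorted data
mode = statistics.mode(incomes)       # value with the highest frequency

# Same data with the outlier removed
trimmed = incomes[:-1]
mean_no_outlier = sum(trimmed) / len(trimmed)
median_no_outlier = statistics.median(trimmed)

print(f"with outlier:    mean={mean:.1f}  median={median}  mode={mode}")
print(f"without outlier: mean={mean_no_outlier:.1f}  median={median_no_outlier}")
```

In this made-up data set the single large value pulls the mean far above the bulk of the data, while the median does not move.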

Answer 5

Range = max. - min.
- HIGHLY affected by outliers

IQR = Q3 - Q1
- range occupied by the middle 50% of the data
- small IQR: data highly clustered; IQR large compared to the range: less clustered
- resistant to outliers

Quartiles
- 1st quartile (Q1): approx. 25% of observations below and 75% above
- 2nd quartile = the median
- 3rd quartile (Q3): approx. 75% of observations below and 25% above

Answer 6

1. if the distribution is long tailed and the value is legitimate: keep outlier
2. if the value was produced under diff. conditions than the rest of the data set: remove outlier
3. if the value is a mistake or typo: correct if possible, otherwise remove

Flagging outliers (1.5 x IQR rule)
- compute 1.5 x IQR
- a value is flagged as an outlier if it is below Q1 - (1.5 x IQR) or above Q3 + (1.5 x IQR)
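The 1.5 x IQR check can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method; textbooks and calculators use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations on other data sets.

```python
import statistics

# Hypothetical data with one suspicious value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

q1, _median, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged; deciding what to do with them
# (keep, correct, or remove) is still a judgment call
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"flagged: {outliers}")
```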

Answer 7

```
5 # summary
--min.
--Q1
--median
--Q3
--max.
(if the distance from median to Q1 is SMALLER than from median to Q3, the distribution is likely right skewed)
```

Box plot is made from the 5-number summary
- whiskers go to the most extreme non-flagged values (not outliers)
- flagged values = outliers
- advantage of a box plot: can easily be used to compare several distributions next to each other
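A minimal sketch of computing the 5-number summary and applying the skewness check. The waiting times and the helper name `five_number_summary` are invented for the example, and the quartiles again rely on the default convention of `statistics.quantiles`, which is an assumption rather than the course's stated method.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

# Hypothetical waiting times in minutes
waits = [2, 3, 3, 4, 5, 6, 8, 9, 12, 15, 21]
summary = five_number_summary(waits)
low, q1, med, q3, high = summary

print(f"min={low}  Q1={q1}  median={med}  Q3={q3}  max={high}")

# Median sits closer to Q1 than to Q3: suggests a right-skewed distribution
looks_right_skewed = (med - q1) < (q3 - med)
print("looks right skewed:", looks_right_skewed)
```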

Answer 8

- measures both overall spread and clustering
- quantifies spread by measuring how far observations are from the mean
- NOT resistant to outliers
- should be paired with the mean

s = sqrt( (sum of (x_i - x̄)^2) / (n - 1) )
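The formula for s can be followed step by step in code. The data values are made up; the hand computation is compared against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)

mean = sum(data) / n                               # x-bar
squared_devs = [(x - mean) ** 2 for x in data]     # (x_i - x-bar)^2 for each value
s = math.sqrt(sum(squared_devs) / (n - 1))         # divide by n - 1, then take the root

print(f"mean = {mean}, s = {s:.3f}")
print(f"statistics.stdev gives {statistics.stdev(data):.3f}")
```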

Answer 9

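For a roughly normal (bell-shaped) distribution:
- within 1 s of the mean = 68%
- within 2 s = 95%
- within 3 s = 99.7%

so the percentages in each band, from left to right: 0.15, 2.35, 13.5, 34, 34, 13.5, 2.35, 0.15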
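As a check on these figures, the sketch below computes the areas under a standard normal curve with `statistics.NormalDist` and then rebuilds the eight band percentages from the rounded 68 / 95 / 99.7 values. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean (the rule rounds these)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} s: {100 * area:.2f}%")

# Band percentages implied by the rounded 68 / 95 / 99.7 figures
rule = {1: 68.0, 2: 95.0, 3: 99.7}
half = [
    (100 - rule[3]) / 2,       # beyond 3 s on one side
    (rule[3] - rule[2]) / 2,   # between 2 s and 3 s
    (rule[2] - rule[1]) / 2,   # between 1 s and 2 s
    rule[1] / 2,               # between the mean and 1 s
]
bands = [round(b, 2) for b in half + half[::-1]]   # mirror for the right side
print(bands)
```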

Answer 10

- use prob. to take sample data and make inferences
- game with car and goats (3 doors): 2x more likely to win if you switch (the first pick is right only 1/3 of the time, so switching wins 2/3 of the time)
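The car-and-goats result is easy to doubt, so here is a small simulation sketch. It assumes the usual rules (three doors, the host always opens a goat door that the player did not pick); the function name `play`, the seed, and the number of trials are arbitrary choices for the example.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that is not the player's pick
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 100_000
stay_rate = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True, rng) for _ in range(trials)) / trials

print(f"win rate when staying:   {stay_rate:.3f}")
print(f"win rate when switching: {switch_rate:.3f}")
```

The two rates should land near 1/3 and 2/3, which is where "2x more likely" comes from.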

Answer 11

ind. outcome is unpredictable, but the outcomes from a large # of reps follow a regular pattern (e.g. rolling a die)

Answer 12

sample space
- set of all possible outcomes (for # of dots on a die: {1, 2, 3, 4, 5, 6})

event
- collection of possible outcomes (e.g. we can write the event "rolling an odd number" as {1, 3, 5})

Answer 13

proportion of times that an outcome occurs in many, many repetitions of a random phenomenon

```
P(A) = 0    will not happen
P(A) = .5   1/2 chance it will, 1/2 chance it won't
P(A) = 1    will happen
```

Answer 14

Empirical
- approximate by playing the game many times and OBSERVING the frequency of occurrence
- find by DOING

Law of large numbers
- as the # of trials (repetitions) of the experiment/game increases, the relative frequency gets closer to the theoretical prob. of the event
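The law of large numbers can be watched in action with a simulated die. This sketch estimates the probability of rolling an odd number empirically and compares it with the theoretical value of 3/6; the seed, the roll counts, and the helper name `relative_frequency` are arbitrary choices for the example.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
theoretical = 3 / 6      # three odd faces out of six on a fair die

def relative_frequency(n_rolls):
    """Roll a fair die n_rolls times; return the proportion of odd results."""
    odd = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 1)
    return odd / n_rolls

for n in (10, 100, 1_000, 10_000, 100_000):
    freq = relative_frequency(n)
    print(f"{n:>7} rolls: relative freq = {freq:.4f}  (theoretical = {theoretical})")
```

Small runs can land far from 0.5; the large runs should settle close to it.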

Answer 15

set of possible outcomes in the sample space and the prob. associated with each outcome (often written as a percent)
- probs. must sum to 1 (100%)
- can be represented by a table, formula, or graph

Answer 16

random variable
- characteristic measured on each indiv. (cost, height, gender)

cont. random var.
- variable that can take on any value in an interval, so that all possible values cannot be listed (time, height, temp.)

discrete random var.
- var. whose possible values are a list of distinct values (gender, opinion, # arrests, shoe size)

Answer 17

1. discrete categorical
   - random var. = major
   - distribution table (list the majors with the prob. of grad students in each underneath, like in math 118)
   - or a bar graph with the categorical majors on the x axis and prob. on the y axis
   - DO compare percentages for outcomes... DON'T calc. measures of center/spread (no mean, med., st. dev., IQR)
2. discrete quantitative variable (distributions)
   - random variable = household size
   - same kind of table, but with numbers instead of categories and the percentage prob. on the second line
   - or a histogram: x axis is number of persons in household and y axis is percentage prob.
   - compare % and CAN calc. measures of center or spread

Answer 18

- can take on any value within the range of the variable, with no gaps
- focus on the prob. that the value is in a specific interval (ex. prob. height is btw 67.5 in. and 68.5 in.)
- histogram with intervals on the x axis and % prob. on the y axis
- CAN compare % and calc. measures of center and spread

Answer 19

- model the prob. distribution with a smooth curve
- smooth curve is a model on or above the horizontal x axis
- area under the curve is 1
- where the curve is HIGH, data values are more dense
- gives more accurate estimates of prob. than using a histogram of sample data
- prob. that x occurs in any interval is equal to the AREA under the curve for that interval
- LOOK AT NOTES AGAIN

median of density curve: the value that divides the area under the density curve in half
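To make "probability = area under the curve" concrete, the sketch below approximates areas under a normal density curve with thin rectangles (the midpoint rule). The mean of 68 in. and st. dev. of 3 in. are hypothetical values chosen for the example, and `normal_density` and `area` are illustrative helper names.

```python
import math

def normal_density(x, mu, sigma):
    """Height of the normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu, sigma, steps=10_000):
    """Midpoint-rule approximation of the area under the curve from a to b."""
    width = (b - a) / steps
    heights = (normal_density(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

MU, SIGMA = 68.0, 3.0   # hypothetical mean and st. dev. of heights, in inches

total = area(MU - 10 * SIGMA, MU + 10 * SIGMA, MU, SIGMA)   # whole curve, about 1
p_interval = area(67.5, 68.5, MU, SIGMA)                    # P(67.5 < height < 68.5)
left_half = area(MU - 10 * SIGMA, MU, MU, SIGMA)            # about 0.5

print(f"total area        = {total:.4f}")
print(f"P(67.5 to 68.5)   = {p_interval:.4f}")
print(f"area left of mean = {left_half:.4f}")
```

For this symmetric curve, half of the area lies to the left of the mean, so the mean is also the median of the density curve.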

Answer 20

center
- name: mean
- for a density curve: μ (mu, the Greek letter m)
- histogram (sample) notation: x̄ (x-bar)

spread
- name: st. dev.
- for a density curve: σ (sigma, looks like an o with a line)
- histogram (sample) notation: s

Answer 21

allows us to compare diff. normal distributions
- math conversion of a normally distributed variable to a STANDARD normal variable

z = (x - mean) / st. dev.

- z score gives # of st. dev. above or below the mean of the normal distribution
- ex. z = (4122.5 - 3485) / 425 = 1.5 (means 1.5 st. dev. above the mean)
- a higher z score means a better score on the test
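The worked example, plus a comparison across two different scales, can be sketched as below. The first call uses the numbers from the card; the two test scores in the comparison are hypothetical and only illustrate why standardizing helps.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Numbers from the example on the card
z = z_score(4122.5, mean=3485, sd=425)
print(f"z = {z}")

# Hypothetical comparison: two tests graded on different scales
z_test_a = z_score(650, mean=500, sd=100)   # made-up test A
z_test_b = z_score(28, mean=21, sd=5)       # made-up test B
better = "A" if z_test_a > z_test_b else "B"
print(f"test A: z = {z_test_a:.2f}, test B: z = {z_test_b:.2f}, stronger result: test {better}")
```

The raw scores 650 and 28 cannot be compared directly, but their z scores can.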

Answer 22

1. producing data
2. exploratory data analysis
3. probability
4. inference

(46 cards)