STA8170 Flashcards

Question

Pie charts are used to display...? | Plus one disadvantage

Answer 1

categorical data | visual comparisons between categories are more difficult than in eg a bar chart

Answer 2

how cases are distributed along each variable, dependent on the other variable

Answer 3

the totals displayed (as counts or %) in the bottom row and last column of contingency tables

Answer 4

show the distribution of one variable for just those cases that satisfy a condition on another variable

Answer 5

the distribution of one variable is the same for all categories of another ie there is no association between them

Answer 6

Bar chart for quantitative data | Counts (y) grouped into bins (x) that make up the bars | No gaps between bars - a gap indicates no values for that bin

Answer 7

Use percentage on y-axis instead of counts

Answer 8

Similar to histogram, but shows the individual values | Useful for doing by hand or in Word, for <100 values | Stem values on the vertical axis, leaves across the horizontal

Answer 9

Like a stem and leaf, but with dots | Can be vertical (like stem plot) or horizontal

Answer 10

Data are counts or percentages of individual cases in categories | Categories do not overlap

Answer 11

Data are values of a quantitative variable whose units are known

Answer 12

shape - symmetry, skew, gaps, outliers | centre - median | spread - range, interquartile range | roughly sketch the distribution

Answer 13

the peaks in distributions | unimodal, bimodal, multimodal

Answer 14

a distribution with a longer tail on one side | skew is named for the side with the longer tail

Answer 15

the middle value that divides a histogram into two equal areas | appropriate description of centre for skewed distributions or those with outliers | always pair with the IQR | if n is odd, median is the middle value | if n is even, median is the average of the two middle values
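
The odd/even rule can be sketched in Python. This is a minimal illustration, not part of the original card; the function name `median` and the example values are assumptions chosen for the demonstration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: there is a single middle value
        return ordered[mid]
    # n is even: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = median([7, 1, 5])      # sorted: 1, 5, 7
even_example = median([7, 1, 5, 3])  # sorted: 1, 3, 5, 7
```

Sorting first matters: the median is the middle of the ordered values, not of the values as listed.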

Answer 16

difference between min and max values in a distribution

Answer 17

the values that divide the ordered values/cases in a distribution into four equal-sized groups

Answer 18

= upper quartile - lower quartile | the spread of the middle 50% of the data, between the 25th and 75th percentiles

Answer 19

the value that leaves that percentage of data below it | eg, 25th percentile has 25% of data below it

Answer 20

minimum | q1 | median | q3 | maximum

Answer 21

display of the five number summary | vertical axis from min to max of data | box around q1 and q3 | horizontal line inside box at the median | 'fences' at 1.5 IQRs beyond lower and upper quartiles (not displayed, just for working) | whiskers from box to most extreme data values found within the fences | add dots for any values found outside the fences
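
The fence and whisker working can be sketched in Python. This is an illustration under assumptions: the data values are made up, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions and may differ slightly from a textbook or SPSS.

```python
import statistics

# Hypothetical data with one unusually large value
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles (convention assumed: "inclusive" interpolation)
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 IQRs beyond the quartiles; they are for working only, not drawn
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme data values found inside the fences
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is plotted as an individual dot
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

With these values the upper whisker stops at 9 and the value 30 is drawn as a separate dot.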

Answer 22

average of all values in a distribution | appropriate description of centre for roughly symmetrical/normal data sets | always pair with SD | notation - a bar above the symbol, eg ū = the mean of u, pronounced u-bar

Answer 23

describes the spread of a distribution | root of the average of the squared deviations of each value from the mean (a plain average of the deviations would be zero, because they cancel each other out)
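
A minimal Python sketch of this calculation follows. The function name and the `sample` flag are illustrative assumptions. Many textbooks divide by n - 1 rather than n when the data are a sample, so the sketch offers both divisors rather than claiming one is the course's convention.

```python
import math

def standard_deviation(values, sample=True):
    """Square root of the (near-)average squared deviation from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Assumption: divide by n - 1 for a sample, by n for a whole population
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(sum(squared_deviations) / divisor)

values = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value = sum(values) / len(values)

# The raw deviations always sum to zero, which is why they are squared first
deviation_sum = sum(v - mean_value for v in values)

population_sd = standard_deviation(values, sample=False)
sample_sd = standard_deviation(values, sample=True)
```

Dividing by n - 1 always gives a slightly larger result than dividing by n.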

Answer 24

the average of the squared deviations of each value from the mean

Answer 25

outliers only have a small effect on it | eg median and IQR

Answer 26

a display of values (y) against time (x) | discern patterns by applying the lowess method - makes a smooth trace line of best fit

Answer 27

method for smoothing timeplots to identify trends | find the average value for a given time window, then move the window along by one timepoint and take a new average
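
The sliding-window idea can be sketched in a few lines of Python. The function name, window length and data are assumptions for illustration only.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values, sliding along one timepoint at a time."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

smoothed = moving_average([2, 4, 6, 8, 10], window=3)
```

The smoothed series is shorter than the original by `window - 1` points, because the window needs a full set of values before it can produce an average.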

Answer 28

method for smoothing timeplots to identify trends | more sophisticated than moving average method | gives more weight to recent values, and less as they recede into the past
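
One common method that fits this description is simple exponential smoothing. The card does not name the method, so treating it as exponential smoothing is an assumption, as are the function name and the smoothing constant `alpha` in this sketch.

```python
def exponential_smoothing(values, alpha):
    """Each smoothed value mixes the newest observation with the previous smoothed value.

    alpha near 1 follows recent values closely; alpha near 0 smooths heavily.
    """
    smoothed = [values[0]]  # assumption: start from the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

result = exponential_smoothing([10, 20, 30], alpha=0.5)
```

Because each smoothed value already contains the earlier ones, the weight on an old observation shrinks by a factor of `1 - alpha` at every step.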

Answer 29

applying a simple function to make a skewed distribution more symmetrical | enables better use of centre and spread distribution descriptors | can facilitate the comparison of groups with very different distributions of scores

Answer 30

variables that skew to the right are often helped by square roots, logs or reciprocals | variables that skew to the left are often helped by squaring the data

Answer 31

shape | centre | spread

Answer 32

shape - symmetric, skewed, diffs between groups | medians - which group has higher centre, any pattern to medians | IQRs - groups with more spread, patterns to change in IQRs | outliers - identify, consider, check for errors

Answer 33

context - what is extreme in one context may be normal in another

Answer 34

mean>median>mode

Answer 35

skew to the right, | ie longer tail to the right

Answer 36

skew to the left, | ie longer tail to the left

Answer 37

Subtract the mean from the value, | Divide the difference by the standard deviation
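
The two steps translate directly into code. This is a minimal sketch; the function name and the example numbers (mean 100, SD 15) are assumptions for illustration.

```python
def z_score(value, mean, sd):
    """Step 1: subtract the mean. Step 2: divide the difference by the standard deviation."""
    difference = value - mean
    return difference / sd

z = z_score(130, mean=100, sd=15)      # a value two SDs above the mean
z_below = z_score(85, mean=100, sd=15) # a value one SD below the mean
```

A negative result simply means the value lies below the mean.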

Answer 38

the distance of a value from the mean in standard deviations

Answer 39

model parameters

Answer 40

statistics

Answer 41

a normal distribution with mean = 0 and SD = 1 | ie after you've standardised/calculated z-scores

Answer 42

shape of distribution is unimodal and symmetric | check with histogram or Normal probability plot

Answer 43

68% | 95% | 99.7%
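
These three percentages match the proportions of a Normal model lying within 1, 2 and 3 standard deviations of the mean. Reading the card that way is an assumption, since the question text is not shown. The sketch below checks the figures with Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the model lying within k standard deviations of the mean
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
```

The exact proportions are not round numbers; 68%, 95% and 99.7% are the usual rounded values.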

Answer 44

is adding a constant to each value, | does not change SD or IQR

Answer 45

is multiplying each value by a constant | also multiplies mean, median, quartiles, SD and IQR by the constant
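
Both effects, adding a constant and multiplying by a constant, can be checked numerically. The data and constants below are made up for illustration, and the sketch uses the standard library's sample standard deviation.

```python
import statistics

data = [1, 2, 3, 4, 5]
shifted = [x + 10 for x in data]   # adding a constant to each value
rescaled = [x * 3 for x in data]   # multiplying each value by a constant

# (mean, SD) for each version of the data
summary = {
    name: (statistics.mean(vals), statistics.stdev(vals))
    for name, vals in [("original", data), ("shifted", shifted), ("rescaled", rescaled)]
}
```

Adding 10 moves the mean from 3 to 13 but leaves the SD alone; multiplying by 3 triples both the mean and the SD.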

Answer 46

a numerically valued attribute of a model

Answer 47

a value calculated to summarise data

Answer 48

the percentile that corresponds to a z-score | gives the percentage of values found at that z-score and below

Answer 49

plots actual vs expected scores | if the plot is straight, the distribution is normal | Called P-P plots in SPSS

Answer 50

sigma | standard deviation

Answer 51

mu (μ), pronounced 'mew' | mean

Answer 52

Normal model | Parameters are mean and SD

Answer 53

y = μ + z * σ
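
This formula reverses the z-score calculation: given a z-score, it recovers the original value. A minimal sketch follows, with an assumed function name and example parameters (μ = 100, σ = 15).

```python
def value_from_z(z, mu, sigma):
    """Convert a z-score back to the original units: y = mu + z * sigma."""
    return mu + z * sigma

y = value_from_z(2.0, mu=100, sigma=15)

# Round trip: standardising y again returns the z-score we started with
z_again = (y - 100) / 15
```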

Answer 54

dot point graph of two variables on x and y axes | describe with direction/trend (positive/negative), form/shape of dots (straight, curved, no pattern?), strength of relationship (how close together dots are) and unusual features/outliers

Answer 55

put the variable of interest (DV), that you want to predict and that responds to levels of the other var, on the y-axis | put the explanatory or predictor var (IV) on the x-axis

Answer 56

quantitative variables condition - can't use categorical data | straight enough condition - check the scatter plot for linear relationship | no outliers condition - outliers can distort strength or direction of a correlation

Answer 57

Calculating non-parametric association (correlation) when distribution is not straight enough or has outliers

Answer 58

Calculating trend (monotonic relationship - correlation) when relationship is not linear, eg when data not truly quantitative

Answer 59

A hidden variable that influences both variables in our relationship/correlation

Answer 60

unimodal distribution is skewed to the left | scatterplot bends downwards

Answer 61

data is a count of something

Answer 62

measurements cannot be negative, or grow by percentage increases | nb, if there are zeros in the data try adding a small constant first

Answer 63

you want to preserve the direction of the relationship

Answer 64

your data is the ratio of two quantities, eg miles per hour | nb, if there are zeros in the data try adding a small constant first

Answer 65

the order of the effects that transformations have on data | if a transformation makes the data worse, move in the other direction on the ladder | Power 2 - squaring the data | Power 1 - no change; going further down or up from here increases the effect | Power 1/2 - square root | Power 0 - we place log in this spot | Power -1/2 - negative reciprocal root (-1 over root of y) | Power -1 - negative reciprocal (-1 over y)
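
The rungs of the ladder can be written as a small lookup table in Python. This is a sketch under assumptions: the names `LADDER` and `re_express` are hypothetical, base-10 logs are an arbitrary choice for the "power 0" rung, and the code does not handle zeros or negative values.

```python
import math

# Each rung maps a "power" to its re-expression of y
LADDER = {
    2: lambda y: y ** 2,                # squaring the data
    1: lambda y: y,                     # no change
    0.5: lambda y: math.sqrt(y),        # square root
    0: lambda y: math.log10(y),         # log sits in the "power 0" spot (base 10 assumed)
    -0.5: lambda y: -1 / math.sqrt(y),  # negative reciprocal root
    -1: lambda y: -1 / y,               # negative reciprocal
}

def re_express(values, power):
    """Apply the transformation found at the given rung of the ladder."""
    return [LADDER[power](v) for v in values]

roots = re_express([1, 4, 9], 0.5)
logs = re_express([1, 100], 0)
neg_reciprocals = re_express([1, 2, 4], -1)
```

The minus sign on the reciprocal rungs keeps the values in their original order, so larger raw values still give larger transformed values.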

Answer 66

the value predicted by a regression equation/line of best fit

Answer 67

the difference between predicted (y-hat) and observed/actual (y) value | residual = observed value - predicted value

Answer 68

the line of best fit in regression/scatterplot | the line for which the sum of the squared residuals is smallest

Answer 69

because some of them will be negative

Answer 70

coefficients

Answer 71

always measured/interpreted as units of y per unit of x | how rapidly y-hat responds to changes in x | b1 (1 is subscript)

Answer 72

where the line hits the y-axis | the starting point/baseline for our predictions | b0 (0 is subscript)

Answer 73

y-hat = b0 + b1x | predicted y = intercept plus slope times x

Answer 74

b1 = r * (SDy / SDx) | slope = correlation times (standard deviation of y over the standard deviation of x)

Answer 75

b0 = mean of y - b1 * mean of x | intercept = the mean of y minus (the slope times the mean of x)
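
The slope and intercept formulas can be worked through on a small data set. The numbers below are made up for illustration, and the correlation is computed as the average product of z-scores (dividing by n - 1), which is an assumed convention that matches the sample standard deviation.

```python
import statistics

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Correlation: sum of products of z-scores, divided by n - 1
r = sum(
    ((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
    for xi, yi in zip(x, y)
) / (n - 1)

b1 = r * (sd_y / sd_x)     # slope = correlation times (SD of y over SD of x)
b0 = mean_y - b1 * mean_x  # intercept = mean of y minus (slope times mean of x)

def predict(x_value):
    """Predicted y (y-hat) from the fitted line."""
    return b0 + b1 * x_value
```

A useful check is that the line passes through the point (mean of x, mean of y), which follows directly from the intercept formula.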

Answer 76

the linear model fit by least squares

Answer 77

same as for correlation: | quantitative data | straight enough relationship | no outliers

Answer 78

You can never predict that y will be further from its mean (in standard deviations) than x was from its mean | because the equation for predicting z-scores is z-hat of y = r times the z of x, and r can only be between -1 and 1
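
A short numeric check makes the point concrete. The function name and the example values of r are assumptions for illustration.

```python
def predicted_z_y(r, z_x):
    """Predicted z-score of y from the z-score of x: z-hat of y = r * z of x."""
    return r * z_x

z_x = 2.0  # x is two standard deviations above its mean

# Try a range of possible correlations between -1 and 1
predictions = {r: predicted_z_y(r, z_x) for r in (-1.0, -0.5, 0.0, 0.5, 1.0)}

# The predicted y is never further from its mean (in SDs) than x was
never_further = all(abs(z_hat) <= abs(z_x) for z_hat in predictions.values())
```

Only a perfect correlation (r = 1 or r = -1) predicts a y exactly as far from its mean as x was.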

Answer 79

r (the correlation coefficient)

Answer 80

find the square root of (the sum of the squared errors over (n - 2))
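
The calculation can be sketched as follows. The function name is hypothetical, and the observed and predicted values are made-up numbers (the predictions come from an assumed line y-hat = 2.2 + 0.6x evaluated at x = 1 to 5).

```python
import math

def residual_sd(observed, predicted):
    """Square root of (sum of squared residuals over (n - 2))."""
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    sum_squared_error = sum(e ** 2 for e in residuals)
    return math.sqrt(sum_squared_error / (len(residuals) - 2))

observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # assumed fitted values

s_e = residual_sd(observed, predicted)
```

Here the squared residuals sum to 2.4, so the result is the square root of 2.4 / 3.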

Answer 81

the variation/portion accounted for by the linear model

Answer 82

the variation/portion not accounted for by the linear model (the residuals/error)

Answer 83

quantitative variables condition | For both data and residuals, check: | straight enough condition (linear relationship on scatterplot) | does the plot thicken? condition (even scatter around the line of best fit, or across scatterplot for residuals) | outlier condition (investigate any - they strongly affect r)

Answer 84

variables are quantitative | their relationship is linear | error is approximately normally distributed | variance of the error is constant

Answer 85

the fact that the further a given point is from the mean of x, the more strongly it pulls on the regression line

Answer 86

removing it from the analysis makes a meaningful difference to the model

Answer 87

Two-way table

Answer 88

Scatterplot


(115 cards)