additional Flashcards

Question

frequency distribution

Answer 1

a tabular summary of data showing number (frequency) of observations in each of several nonoverlapping categories or classes

Answer 2

gives a tabular summary of data showing the relative frequency for each class; total = 1

Answer 3

summarizes the percent frequency of the data for each class; total = 100
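
Sketch (illustrative only): one way to build the frequency, relative frequency, and percent frequency distributions described on these cards. The function name `frequency_tables` and the soft drink sample are made up for this example.

```python
from collections import Counter

def frequency_tables(observations):
    """Return frequency, relative frequency, and percent frequency per class."""
    counts = Counter(observations)   # frequency: observations in each class
    n = len(observations)            # total number of observations
    relative = {k: c / n for k, c in counts.items()}      # totals 1
    percent = {k: 100 * r for k, r in relative.items()}   # totals 100
    return dict(counts), relative, percent

# Hypothetical sample of soft drink purchases (not real data).
sample = ["Coke", "Diet Coke", "Coke", "Pepsi", "Coke",
          "Pepsi", "Diet Coke", "Coke", "Sprite", "Coke"]
freq, rel, pct = frequency_tables(sample)
print(freq["Coke"], rel["Coke"], pct["Coke"])  # 5 0.5 50.0
```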

Answer 4

1. tabular or graphical displays

Answer 5

Frequency distribution table - the number of observations (frequencies) in each of several nonoverlapping categories - how many times each element appears

Answer 6

1. pie charts | 2. bar charts

Answer 7

use relative frequency or % frequency | - not generally the best display, since people can usually judge differences in length (bars) better than differences between slices

Answer 8

shows categorical data in frequency, relative frequency or % frequency

Answer 9

when the bars are arranged in descending order of height from left to right with the most frequently occurring cause appearing first

Answer 10

is the same as the number of categories | ie. coke, diet coke, ....

Answer 11

grouped into an aggregate class called "other" - classes with frequencies of 5% or less

Answer 12

the number of observations

Answer 13

1. tabular | 2. graphical

Answer 14

Frequency distribution | - but we need to be more careful in defining the non overlapping classes

Answer 15

1. determine the # of nonoverlapping classes (5-20) 2. determine the width of the classes - use the same width for each class - approximate width = (largest data value - smallest data value) / # of classes - class widths can be rounded 3. determine the class limits (lower limit and upper limit) so that each data value belongs to exactly one class

Answer 16

1. Dot Plot 2. Histogram 3. Stem and Leaf

Answer 17

(largest data value - smallest data value) / # of classes
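
Sketch (illustrative only): the class-width formula applied to integer data, followed by one possible way of laying out the class limits. The helper names are hypothetical, and starting the first class at the smallest value and rounding the width up are assumptions, not rules from the cards.

```python
def approximate_class_width(data, num_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (max(data) - min(data)) / num_classes

def build_classes(data, num_classes):
    """Return (lower limit, upper limit) pairs for integer data.

    Assumption: the width is rounded up to a whole number large enough
    that the last class still contains the largest value.
    """
    width = (max(data) - min(data)) // num_classes + 1
    lower = min(data)
    classes = []
    for _ in range(num_classes):
        classes.append((lower, lower + width - 1))  # limits do not overlap
        lower += width
    return classes

# Hypothetical data with smallest value 12 and largest value 33.
data = [12, 15, 20, 22, 27, 33]
print(approximate_class_width(data, 5))  # 4.2
print(build_classes(data, 5))
```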

Answer 18

is the value halfway between the lower and upper class limits

Answer 19

- one of the simplest graphical summaries - horizontal axis shows the range for the data - useful for comparing the distribution of the data for two or more variables

Answer 20

- for quantitative data (for categorical data use a bar chart) - similar to a bar chart but with no spaces between the bars - common for quantitative data

Answer 21

the shape or the skewness of the distribution

Answer 22

1. skewed left 2. skewed right 3. symmetrical

Answer 23

SAT scores, heights and weights of people

Answer 24

Data from applications in business and economics often tend to be skewed right. Examples: housing prices, salaries, purchase amounts, etc.

Answer 25

graphical display used to show the rank order and shape of a distribution of data

Answer 26

1. easier to construct by hand 2. within a class interval, the stem and leaf provides more information than the histogram because the stem and leaf shows the actual data

Answer 27

1. easier to construct by hand 2. within a class interval, the stem and leaf provides more information than the histogram because the stem and leaf shows the actual data

Answer 28

does not have an absolute number of rows or stems

Answer 29

1. 0-4 | 2. 5-9

Answer 30

yes; note that a single digit is used to define each leaf and that only the first 3 digits of each data value have been used to construct the display (example: the number 1565 - add info); however, it is not possible to reconstruct the exact values from the display

Answer 31

an open-end class requires only a lower class limit or an upper class limit - ex. suppose two of the audit times had taken 58 and 65 days; rather than continuing with classes of width 5 (35-39, 40-44, etc.), we could simplify the distribution by showing an open-end class of "35 or more"

Answer 32

at the upper end of the distribution; sometimes they are seen at the lower end, and occasionally at both ends

Answer 33

the total number of observations

Answer 34

1. Cross tabulations | 2. graphically

Answer 35

- both variables can be either categorical or quantitative | - can have one categorical and one quantitative, or other combinations

Answer 36

Restaurant | Quality Rating | Meal Price ($)
1 | good | 18
2 | very good | 22
3 | excellent | 28
4 | bad | 38
etc.

Answer 37

- need to decide the # of classes to use when making a freq. dist. for quantitative variables - margins provide info about each of the variables individually - primary value - provides insight about the relationship b/w 2 variables

Answer 38

1. scatter Diagrams and | 2. Trendlines

Answer 39

- graphical display of the relationship b/w 2 quantitative variables

Answer 40

a line that produces an approximation of the relationship

Answer 41

1. positive relationship 2. negative relationship 3. no apparent relationship

Answer 42

- used to display the rel. frequency of each class - similar to a pie chart - based on %, but can also be used to show frequencies

Answer 43

1. stacked | 2. side by side

Answer 44

data in 2 or more crosstabulations are often combined to produce a summary crosstabulation showing how 2 variables are related - conclusions from 2 or more separate crosstabs can be reversed when the data are aggregated into a single crosstab - this reversal of conclusions based on aggregated and unaggregated data is Simpson's paradox - investigate whether the aggregated or the unaggregated form provides better insight into the crosstabs

Answer 45

- categorical data | - freq. and rel. freq distributions

Answer 46

- categorical data | - rel freq and % freq

Answer 47

- Quantitative data | - used to show the distribution over the entire range of the data

Answer 48

- quantitative data | - used to show freq. dist. data over a set of class intervals

Answer 49

- quantitative data | - to show rank order and shape of the distribution

Answer 50

1. scatter diagram (the relationship b/w 2 quantitative variables) 2. trendlines (used to approximate the relationship of data in a scatter diagram)

Answer 51

1. side by side chart (used to compare 2 variables) | 2. stacked bar chart (used to compare the rel freq or % freq of 2 variables)

Answer 52

1. Radar charts 2. Bubble charts - recommend not using them b/c can be over complicated - use bar charts and scatter diagrams

Answer 53

1. mean 2. weighted mean 3. GEOMETRIC mean 4. median 5. mode 6. percentiles 7. quartiles

Answer 54

average or arithmetic average

Answer 55

add up the data values and divide by how many there are

Answer 56

arithmetic mean where some data values contribute more than others

Answer 57

sum(wi × xi) / sum(wi)
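
Sketch (illustrative only): the weighted mean formula as a small function. The name `weighted_mean` and the grade-point example are hypothetical.

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical GPA example: grade points weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))  # 3.0
```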

Answer 58

finding the nth root of the product of n values

Answer 59

used to analyze growth rates in financial data | - in these cases, the arithmetic mean will provide misleading results
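
Sketch (illustrative only): the geometric mean as the nth root of the product of n values, applied to growth factors. The function names and the two-period returns are made up; the example shows why the arithmetic mean can mislead for growth rates.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def mean_growth_rate(period_returns):
    """Mean rate of change per period, from returns such as 0.10 for +10%."""
    growth_factors = [1 + r for r in period_returns]
    return geometric_mean(growth_factors) - 1

# Hypothetical returns: +100% in period 1, then -50% in period 2.
returns = [1.0, -0.5]
print(sum(returns) / len(returns))   # arithmetic mean: 0.25 (misleading)
print(mean_growth_rate(returns))     # geometric mean rate: 0.0 (no net growth)
```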

Answer 60

data value that occurs the most often (greatest frequency)

Answer 61

called multimodal | - don't report the mode b/c listing 3 or more modes would not be helpful in describing the location of the data

Answer 62

how data is spread over the interval from the smallest value to the largest

Answer 63

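1. arrange the data in ascending order 2. compute an index i = (p/100)n, where p = percentile of interest and n = # of observations 3. if i is not an integer, round up: the next integer greater than i is the position of the pth percentile; if i is an integer, add the values in positions i and i + 1, then divide by 2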
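
Sketch (illustrative only): the index method for percentiles. The function name is hypothetical, `p` is assumed to be a whole number strictly between 0 and 100, and rounding up when the index is not an integer follows the common textbook convention.

```python
def percentile(data, p):
    """pth percentile using the index i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                   # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # step 2: i = whole + remainder/100
    whole = int(whole)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up to the next position (1-based whole + 1)
    return ordered[whole]

data = list(range(1, 13))      # hypothetical data: 1, 2, ..., 12
print(percentile(data, 50))    # i = 6   -> (6 + 7) / 2 = 6.5
print(percentile(data, 85))    # i = 10.2 -> 11th value = 11
```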

Answer 64

- values that divide the data set into quarters - each containing 25% of the observations - always start with Q2 or the median

Answer 65

- it is not affected by outliers - the middle of a sorted list of data values 1. arrange in ascending order 2. odd # of observations - the median is the middle value 3. even # of observations - the median is the average of the two middle values

Answer 66

annual income and property value data | b/c very low or very high values can inflate the mean

Answer 67

1. range 2. interquartile range 3. variance 4. standard deviation 5. coefficient of variation

Answer 68

largest value - smallest value

Answer 69

since it is only based on 2 observations it is highly influenced by extreme values

Answer 70

Q3 - Q1 - overcomes the dependency on extreme values - the range of the middle 50% of the data

Answer 71

- utilizes all of the data - based on difference b/w each observation (xi) and the mean - called deviation about the mean

Answer 72

sum((xi - mean)^2) / (n - 1) (sample)

Answer 73

0 | sum(xi - mean) = 0

Answer 74

- positive square root of the variance | - easier to interpret than the variance b/c sd is measured in the same units as the data
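
Sketch (illustrative only): sample variance and sample standard deviation computed from deviations about the mean. The function names and the five-value sample are chosen for illustration.

```python
import math

def sample_variance(data):
    """sum((x_i - mean)^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std_dev(data):
    """Positive square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(data))

sample = [46, 54, 42, 46, 32]    # illustrative sample, mean = 44
print(sample_variance(sample))   # 64.0
print(sample_std_dev(sample))    # 8.0
```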

Answer 75

used as a measure of the risk associated with investing in stock and stock funds

Answer 76

- used when interested in a descriptive statistic that indicates how large the SD is relative to the mean - usually expressed as %

Answer 77

(sd/mean) x 100
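
Sketch (illustrative only): the coefficient of variation using the standard library `statistics` module (`stdev` is the sample standard deviation). The function name and sample are hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """(sample sd / sample mean) * 100, expressed as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

sample = [46, 54, 42, 46, 32]   # illustrative sample: sd = 8, mean = 44
cv = coefficient_of_variation(sample)
print(round(cv, 1))             # 18.2, i.e. the sd is about 18.2% of the mean
```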

Answer 78

it tells us the sample sd is x% of the value of the sample mean - useful for comparing the variability of variables that have different sd and different means

Answer 79

mean absolute error

Answer 80

sum the absolute values of the deviations of the observations about the mean and divide it by the # of observations
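
Sketch (illustrative only): the calculation described on this card, written as a function. The name `mean_absolute_error` mirrors the card, and the sample is made up.

```python
def mean_absolute_error(data):
    """Sum of |x_i - mean| divided by the number of observations."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

sample = [46, 54, 42, 46, 32]        # illustrative sample, mean = 44
print(mean_absolute_error(sample))   # (2 + 10 + 2 + 2 + 12) / 5 = 5.6
```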

Answer 81

wi = pounds, dollars, volume or GPA by number

Answer 82

Median because it is NOT influenced by extremely small and large data values

Answer 83

as an additive process

Answer 84

any time you want to determine the mean rate of change over several successive periods or changes in populations of species, crop yields, pollution levels and birth and death rates (applied to changes that occur over any number of successive periods of any length)

Answer 85

deviation about the mean

Answer 86

it measures the standard deviation relative to the mean

Answer 87

comparing the variability of variables that have different standard deviations and different means

Answer 88

1. Skewness 2. Chebyshev's theorem 3. z-Scores 4. Empirical Rule 5. Detecting outliers

Answer 89

can be 1. skewed left 2. skewed right 3. symmetrical, skewness is zero

Answer 90

skewness is negative | - mean is usually less than the median

Answer 91

Skewness is positive | - the mean is usually more than the median

Answer 92

- the skewness is zero | - the mean and median are equal

Answer 93

to find relative locations of values within a data set - also called standard value

Answer 94

how far a particular value is from the mean

Answer 95

z-transformation
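
Sketch (illustrative only): z-scores computed as (value - mean) / standard deviation, using the sample standard deviation from the `statistics` module. The function name and sample are hypothetical.

```python
import statistics

def z_scores(data):
    """Number of standard deviations each value lies from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)      # sample standard deviation
    return [(x - mean) / sd for x in data]

sample = [46, 54, 42, 46, 32]        # illustrative sample: mean = 44, sd = 8
print(z_scores(sample))              # [0.25, 1.25, -0.25, 0.25, -1.5]
```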

Answer 96

make statements about the proportion of data values that must be within a specified # of sd of the mean

Answer 97

1 - (1/z^2), for z > 1

Answer 98

1. at least 75% of the data must be within 2 sd of the mean 2. at least 89% is within 3 sd of the mean 3. at least 94% is within 4 sd of the mean
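
Sketch (illustrative only): Chebyshev's bound 1 - 1/z^2 evaluated for z = 2, 3, and 4, which reproduces the percentages on this card. The function name is hypothetical.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    # Prints 75.0%, 88.9%, and 93.8% (the card rounds these to 75%, 89%, 94%).
    print(z, f"{chebyshev_lower_bound(z):.1%}")
```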

Answer 99

based on a normal prob distribution - used for symmetrical or bell shaped distribution - to determine % of data values that must be within a specified # of sd from the mean

Answer 100

1. approx 68% of the data values will be within 1 sd of the mean 2. approx 95% of the data values will be within 2 sd of the mean 3. almost all of the values will be within 3 sd of the mean

Answer 101

1. based on the 1st and 3rd quartiles (lower limit = Q1 - 1.5(IQR), upper limit = Q3 + 1.5(IQR)) - if a value is outside these limits, it is considered an outlier 2. z-scores - see empirical rule; treat any data value with a z-score less than -3 or greater than +3 as an outlier (double check)
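
Sketch (illustrative only): both outlier checks from this card. The quartiles are passed in as arguments because quartile conventions differ; all function names and data are hypothetical.

```python
import statistics

def iqr_limits(q1, q3):
    """Lower limit Q1 - 1.5(IQR) and upper limit Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers_by_iqr(data, q1, q3):
    """Values that fall outside the IQR-based limits."""
    lower, upper = iqr_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]

def outliers_by_z_score(data, threshold=3):
    """Values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / sd) > threshold]

print(iqr_limits(10, 20))                             # (-5.0, 35.0)
print(outliers_by_iqr([1, 12, 15, 18, 40], 10, 20))   # [40]
print(outliers_by_z_score([10] * 20 + [100]))         # [100]
```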

Answer 102

1. smallest value 2. first Quartile 3. Median - Q2 4. 3rd Quartile 5. Largest value
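
Sketch (illustrative only): a five-number summary built with the standard library. `statistics.quantiles` uses its default "exclusive" method here, which is an assumption; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """(smallest value, Q1, median, Q3, largest value)."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # three cut points: Q1, Q2, Q3
    return min(data), q1, statistics.median(data), q3, max(data)

# Hypothetical data: the values 1 through 7 in arbitrary order.
print(five_number_summary([7, 1, 4, 2, 6, 3, 5]))
```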

Answer 103

1. covariance | 2. Correlation Coefficient

Answer 104

a descriptive measure of the linear association between 2 variables

Answer 105

of how much TWO random variables vary together - similar to variance but where variance tells you for a single variable, covariance tells you for TWO variables together

Answer 106

of how much TWO random variables vary together - similar to variance but where variance tells you for a single variable, covariance tells you for TWO variables together - if a relationship exists b/w the two variables

Answer 107

1. negative 2. near zero - or no association 3. positive

Answer 108

- a positive number, can be any positive number (doesn't tell us much, just that the variables are positively related) - as the value of x increases, the value of y increases - the points slant upward toward the right-hand corner of the page - it doesn't tell us if the dots are close to the trendline or far away from it, which is why we use the correlation coefficient

Answer 109

- can be any negative number (doesn't tell us much, just that the variables are negatively related) - as the value of x increases, the value of y decreases - for the correlation coefficient, the closer to -1 the stronger the relationship

Answer 110

no association - no linear association b/w x and y - the number after calculation will be 0, i.e. covariance = 0 - no trend

Answer 111

covariance / (sd of x × sd of y)
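
Sketch (illustrative only): sample covariance and the correlation coefficient, computed as the covariance divided by the product of the two standard deviations. Function names and data are hypothetical; the last lines show that rescaling y changes the covariance but not the correlation.

```python
import math

def sample_covariance(x, y):
    """sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by (sd of x * sd of y); always between -1 and +1."""
    sd_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself = variance
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

x = [1, 2, 3]
y = [2, 4, 6]                      # perfect positive linear relationship
y_scaled = [10 * v for v in y]     # same relationship, different units
print(sample_covariance(x, y), correlation(x, y))
print(sample_covariance(x, y_scaled), correlation(x, y_scaled))
```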

Answer 112

1. covariance can take on any number while a correlation is limited to -1 to +1 2. more useful for determining how strong the relationship is b/w the TWO variables 3. it does not have units, covariance has units 4. it isn't affected by changes in the centre (ie mean) or scale of the variables

Answer 113

a statistical measure that shows whether two variables are related by measuring how the variables change in relation to each other - tells you if there is a relationship between two things and the direction of the relationship (+ or -)

Answer 114

a measure of how two variables change in relation to each other, but it goes one step further than covariance in that correlation tells HOW STRONG the relationship is

Answer 115

as one increases, the other also increases

Answer 116

as one increases, the other actually Decreases

Answer 117

because covariance values are sensitive to the scale of the data. It does not tell us how close to the line the data are

Answer 118

describes relationships and is not sensitive to the scale of the data

Answer 119

- it can tell us how strong the relationship is, so if we know the value of, say, x, we can estimate the value of y fairly easily (within a range), i.e. make predictions and inferences (aka educated guesses) - the stronger the relationship, the smaller the range

Answer 120

when a straight line with a positive slope can go through the centre of every data point

Answer 121

large and when the slope is small

Answer 122

be careful because we should not have much confidence in predictions made with this line (need more data)

Answer 123

there is a very small chance that we will be able to draw a straight line through all 3 points. You can always draw a straight line between 2 points. 3 points gives us more confidence in the trend

Answer 124

straight line with a negative slope goes through the centre of EVERY data point - strong Negative Relationship - if we know the value of x we can estimate within a narrow range the value of y

Answer 125

no relationship

Answer 126

negative relationship but since it is close to zero, it is not a strong relationship

Answer 127

Pearson Product Moment correlation Coefficient

(161 cards)