CRP 109 Stats Lecture 1 Flashcards

Question

Class Width

Answer 1

The difference between two consecutive lower class limits (or boundaries) in a frequency distribution. A common starting point: class width ≈ (max value − min value) / number of classes, rounded up.

Answer 2

relative frequency = (frequency of class) / (sum of all frequencies). Multiply by 100 to get the percentage frequency.

Answer 3

The frequency for each class is the sum of the frequencies for that class and all previous classes. Class limits are replaced by “less than” expressions that describe the new ranges of values.
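
The class width, relative frequency, and cumulative frequency cards can be tied together in one small sketch. It assumes whole-number data and rounds the width up to the next whole number so that the maximum value still lands in the last class; the function name `frequency_table` and the sample data are made up for illustration.

```python
import math

def frequency_table(values, num_classes):
    """Return (class_width, rows) for whole-number data.

    Assumption: the width is (max - min) / num_classes rounded up to the
    next whole number, so the largest value always falls in the last class.
    """
    lowest, highest = min(values), max(values)
    width = math.floor((highest - lowest) / num_classes) + 1
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = lowest + i * width      # lower class limit
        upper = lower + width - 1       # upper class limit
        freq = sum(1 for v in values if lower <= v <= upper)
        cumulative += freq              # this class plus all previous classes
        rows.append({
            "lower": lower,
            "upper": upper,
            "freq": freq,
            "rel_pct": 100 * freq / len(values),  # percentage frequency
            "cum_freq": cumulative,
        })
    return width, rows

width, rows = frequency_table([1, 2, 2, 3, 5, 6, 7, 8, 9, 10], 3)
for row in rows:
    print(row)
```

Consecutive lower class limits in the result differ by exactly `width`, which matches the definition of class width, and the last cumulative frequency equals the number of data values.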

Answer 4

A graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to frequency values.

Answer 5

A graph that has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies instead of actual frequencies (i.e. proportion or percent)

Answer 6

Exists between two variables when the values of one variable are somehow associated with the values of the other variable. Correlation does not imply causation.

Answer 7

Exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line

Answer 8

Measures the strength of the linear correlation between the paired quantitative x and y values in a sample. It is sometimes referred to as the Pearson product moment correlation coefficient

Answer 9

A plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis

Answer 10

- The value of r is always between −1 and 1, inclusive.
- If r is close to −1, there appears to be a strong negative correlation.
- If r is close to 1, there appears to be a strong positive correlation.
- If r is close to 0, there appears to be a weak or no linear correlation.
- A value of exactly −1 or 1 implies that all of the data fall exactly on a line (perfect correlation).
- If all values of either variable are converted to a different scale, the value of r does not change.
- If all x values and y values are interchanged, the value of r does not change.
- r is not designed to measure the strength of a relationship that is not linear.
- r is sensitive to outliers.
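
Several of these properties can be checked numerically. The sketch below uses the standard formula for r (sum of products of deviations divided by the square root of the product of the sums of squared deviations), which is not written on the cards; the function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired sample data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                            # interchange x and y
r_rescaled = pearson_r([2.54 * x + 10 for x in xs], ys)  # convert x to another scale
r_perfect = pearson_r(xs, [2 * x + 1 for x in xs])       # points exactly on a line

print(r, r_swapped, r_rescaled, r_perfect)
```

Up to floating-point rounding, the first three values agree and the last one is 1.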

Answer 11

Given a collection of paired sample data, the regression line or line of best fit or least-squares line is the straight line that “best” fits the scatterplot of the data
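
As a rough illustration, the least-squares line can be computed directly from the data. The slope and intercept formulas below are the standard least-squares ones and do not appear on the card; `least_squares_line` and the sample data are hypothetical names and values.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

“Best” here means the line that minimizes the sum of the squared vertical distances between the data points and the line.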

Answer 12

Methods and tools that summarize or describe relevant characteristics of data

Answer 13

Methods and tools that make inferences, or generalizations, about populations

Answer 14

The measure of centre found by adding all of the data values and dividing the total by the number of data values. Not resistant to outliers.

Answer 15

The measure of centre that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. Resistant to outliers (only changes slightly).

Answer 16

- The value(s) that occur with the greatest frequency.
- The mode can be found with qualitative data.
- A data set can have no mode, one mode (unimodal), or multiple modes.
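
A quick way to see the difference in resistance is to compute the three measures of centre with and without an extreme value. This sketch relies on Python's standard `statistics` module; the helper name `centre_summary` and the data are made up for illustration.

```python
import statistics

def centre_summary(values):
    """Mean, median, and mode(s) of a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # list of most frequent values
    }

clean = centre_summary([1, 2, 2, 3])
with_outlier = centre_summary([1, 2, 2, 3, 100])  # same data plus one outlier

print(clean)
print(with_outlier)

# The mode also works for qualitative data.
print(statistics.multimode(["red", "blue", "red"]))
```

Adding the outlier pulls the mean far from the bulk of the data, while the median and the mode stay where they were.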

Answer 17

sample standard deviation

Answer 18

sample variance

Answer 19

population standard deviation

Answer 20

population variance

Answer 21

- The difference between the maximum data value and the minimum data value.
- Very sensitive to outliers (not resistant).
- Does not truly reflect the variation among all of the data values.

Answer 22

A measure of how much data values deviate away from the mean.
- The value is never negative. It is zero only when all of the data values are exactly the same.
- Larger values indicate greater amounts of variation.
- Not resistant to outliers.
- The units are the same as the units of the original data values.

Answer 23

- A measure of variation equal to the square of the standard deviation.
- The units are the squares of the units of the original data values.
- Not resistant to outliers.
- The value is never negative. It is zero only when all of the data values are the same number.
- s² is an unbiased estimator of σ².
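
The sample and population versions differ only in the divisor. The sketch below assumes the usual definitions (divide the sum of squared deviations by n − 1 for a sample and by n for a population), which the cards name but do not spell out; the function names and data are illustrative.

```python
import math

def variance(values, sample=True):
    """Sample variance (divide by n - 1) or population variance (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor

def standard_deviation(values, sample=True):
    """Square root of the corresponding variance, in the data's own units."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), standard_deviation(data, sample=False))
print(variance(data), standard_deviation(data))
print(variance([3, 3, 3]))  # all values identical, so no variation
```

The sample values come out slightly larger than the population values because of the smaller divisor.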

Answer 24

For any data set:
- at least 75% of data lies within 2 standard deviations of the mean.
- at least 89% of data lies within 3 standard deviations of the mean.
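
The two percentages are instances of a general bound, usually called Chebyshev's theorem: at least 1 − 1/k² of the data lies within k standard deviations of the mean, for any k > 1. The formula is not on the card, so treat the sketch below as a supplement; `chebyshev_lower_bound` is a made-up name.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(100 * chebyshev_lower_bound(k)), "%")
```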

Answer 25

The empirical rule states that for bell-shaped data sets:
- approximately 68% of data lies within 1 standard deviation of the mean.
- approximately 95% of data lies within 2 standard deviations of the mean.
- approximately 99.7% of data lies within 3 standard deviations of the mean.

Answer 26

Measures of location, denoted P1, P2, . . . , P99, which divide a set of data into 100 groups with about 1% of the values in each group.
- The 50th percentile, P50, has about 50% of the data values below it and about 50% of the data values above it, corresponding to the median.

Answer 27

percentile of value x = (number of values less than x) / (total number of values) *100
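
A direct translation of this formula, as a minimal sketch; the function name `percentile_of_value` and the data are illustrative, and the result is left unrounded.

```python
def percentile_of_value(values, x):
    """Percentile of x: share of values strictly less than x, times 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_of_value(scores, 70))
```

Six of the ten scores are less than 70, so 70 sits at the 60th percentile.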

Answer 28

percentile being used

Answer 29

locator that gives the position of a value in a sorted list

Answer 30

kth percentile

Answer 31

1. Arrange the values from lowest to highest.
2. Compute L = (k/100)n.
3. If L is a whole number, the kth percentile is midway between the Lth value and the next value in the sorted set of data, i.e. Pk = (Lth value + next value) / 2.
4. If L is not a whole number, round L up. Pk is the Lth value, counting from the lowest in the data set.
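
The four steps above can be sketched as a short function. It assumes k is a whole number from 1 to 99 and uses integer arithmetic to test whether L is whole; `kth_percentile` and the sample data are hypothetical.

```python
def kth_percentile(values, k):
    """kth percentile (k a whole number from 1 to 99) via the locator L = (k/100)n."""
    data = sorted(values)                 # step 1: lowest to highest
    n = len(data)
    if (k * n) % 100 == 0:                # step 3: L is a whole number
        locator = k * n // 100
        # Midway between the Lth value and the next one (positions are 1-based).
        return (data[locator - 1] + data[locator]) / 2
    locator = k * n // 100 + 1            # step 4: round L up
    return data[locator - 1]

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(kth_percentile(data, 25))  # L = 2, whole
print(kth_percentile(data, 50))  # L = 4, whole
print(kth_percentile(data, 90))  # L = 7.2, rounded up to 8
```

Other textbooks and software packages use slightly different percentile rules, so results from a library function may not match this procedure exactly.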

Answer 32

Measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group. Q1 = P25, Q2 = P50, Q3 = P75.

Answer 33

(IQR) = Q3 − Q1. Another measure of spread that is less sensitive to outliers.

Answer 34

For a set of data, consists of these five values:
1. Minimum
2. First quartile, Q1
3. Second quartile, Q2 (same as the median)
4. Third quartile, Q3
5. Maximum

Answer 35

Can be used to identify skewness.
1. Find the 5-number summary.
2. Construct a line segment extending from the minimum data value to the maximum data value.
3. Construct a box (rectangle) extending from Q1 to Q3, and draw a line in the box at the median.

Answer 36

1. Find the quartiles.
2. Find the IQR.
3. Evaluate 1.5×IQR.
4. In a modified boxplot, a data value is an outlier if it is above Q3 by an amount greater than 1.5×IQR, or below Q1 by an amount greater than 1.5×IQR.
- A special symbol (such as an asterisk or point) is used to identify outliers as defined previously.
- The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.
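
The outlier rule can be sketched in code once the quartiles are known. Here Q1 and Q3 are passed in rather than computed, so the example does not depend on any particular quartile method; `modified_boxplot_stats` and the sample data are made up for illustration.

```python
def modified_boxplot_stats(values, q1, q3):
    """Apply the 1.5 x IQR rule, given quartiles that were already computed."""
    iqr = q3 - q1
    step = 1.5 * iqr
    lower_fence = q1 - step
    upper_fence = q3 + step
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "iqr": iqr,
        "outliers": outliers,              # plotted with a special symbol
        "line_start": min(inside),         # smallest value that is not an outlier
        "line_end": max(inside),           # largest value that is not an outlier
    }

# Q1 = 4 and Q3 = 6 for this data under the locator rule L = (k/100)n.
stats = modified_boxplot_stats([2, 4, 4, 4, 5, 5, 7, 30], q1=4, q3=6)
print(stats)
```

With IQR = 2 the cutoffs are 1 and 9, so only 30 is flagged and the line runs from 2 to 7.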

(60 cards)