Unit 1 Flashcards

1
Q

What is Statistics?

A

A branch of science which deals with collecting, organizing, analyzing, interpreting, summarizing and presenting data. (Understanding the world when we have limited information)

2
Q

What is a Unit/Individual?

A

An object on which we take a measurement or observation

3
Q

What is a population?

A

The collection of all individuals or units under consideration

4
Q

What is a sample?

A

A subset of the population from which we obtain data

5
Q

How do statisticians commonly get lists of adults living in specific areas?

A

Voter registration

6
Q

What is the point of statistics?

A

It is usually impractical to observe every individual in the population, so we rely on sample data to draw conclusions about the population

7
Q

What is a variable?

A

Any characteristic or property of an individual

8
Q

What is Quantitative Data?

A

Data that records numerical characteristics of an individual for which arithmetic operations make sense.

9
Q

What is Categorical Data?

A

Puts individuals into groups based on common characteristics for which numerical operations do not make sense

10
Q

What is Categorical and Ordinal Data?

A

Categorical data for which there’s a logical, commonly accepted ordering with a sense of increasingness

11
Q

What is Categorical and Nominal Data?

A

Categorical data for which there is no logical ordering of increasingness

12
Q

What is a Distribution?

A

The distribution of a variable tells us what values a variable takes on and how often it takes these values

13
Q

What is a frequency distribution?

A

A count of how many of our data values fall into each of a set of predetermined classes or intervals

14
Q

What is Relative Frequency?

A

Also known as a proportion: the number of data values in a class divided by the total number of data values in the sample
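
As a rough illustration of a frequency distribution and its relative frequencies, here is a minimal Python sketch. The sample `colours` is invented for the example and is not part of the course material.

```python
from collections import Counter

# Hypothetical categorical sample (invented for illustration)
colours = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

# Frequency distribution: number of observations in each class
frequency = Counter(colours)

# Relative frequency: class count divided by the total number of observations
n = len(colours)
relative_frequency = {value: count / n for value, count in frequency.items()}

print(frequency)
print(relative_frequency)
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.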

15
Q

What are continuous random variables?

A

Quantitative variables that can take on any value within a given range.

16
Q

What are discrete random variables?

A

Quantitative variables that can take on a countable number of values.

17
Q

What is a bar chart?

A

Used to represent the frequency distribution of categorical data (nominal or ordinal).
- x axis displays the variable values
- y axis displays the frequencies
- bars are separated so as to not suggest continuity

18
Q

What is a histogram?

A

Used to represent the frequency distribution of quantitative data.
- x axis has equal length intervals showing the class boundaries
- y axis shows the frequencies or relative frequencies
- the area of each bar is proportional to the relative frequency of its class
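
The counts behind a histogram can be sketched in a few lines of Python. The data, the starting point `lower`, and the class `width` are all made up for illustration, and the sketch assumes each class includes its lower boundary but not its upper one.

```python
# Hypothetical quantitative sample (invented for illustration)
data = [1, 3, 4, 7, 8, 8, 12, 15, 18, 19]

lower = 0       # assumed lower boundary of the first class
width = 5       # assumed equal class width
n_classes = 4   # classes: [0, 5), [5, 10), [10, 15), [15, 20)

# Count how many observations fall into each class
counts = [0] * n_classes
for x in data:
    index = (x - lower) // width
    counts[index] += 1

# Relative frequencies give the bar heights when the y axis shows proportions
relative = [count / len(data) for count in counts]

print(counts)
print(relative)
```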

19
Q

What is the location of the data?

A

Tells us where the centre of our data is located. (mean, median and mode)

20
Q

What is mean?

A

What people commonly call the average. It is the sum of the observations divided by the number of observations (x bar)

21
Q

What are properties of the mean?

A
  • the mean is highly affected by outliers
  • if we multiply each observation by a fixed quantity c then the mean is also multiplied by c
  • if we add a fixed quantity c to each observation, the mean also changes by c (it increases if c is positive and decreases if c is negative)
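
These properties of the mean can be checked numerically. The following is a minimal Python sketch with made-up observations and an arbitrary constant `c`.

```python
from statistics import mean

data = [2, 4, 6, 8]  # hypothetical observations
c = 3                # arbitrary fixed quantity

original = mean(data)
scaled = mean([c * x for x in data])   # every observation multiplied by c
shifted = mean([x + c for x in data])  # c added to every observation
with_outlier = mean(data + [100])      # one extreme value appended

print(original, scaled, shifted, with_outlier)
```

Here the single extreme value 100 pulls the mean from 5 up to 24, which shows how sensitive the mean is to outliers.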
22
Q

What is the median?

A

The middle value in an ordered data set. Half of the values are larger than the median and half of the values are smaller than the median

23
Q

What is the mode?

A

The most frequently observed value

24
Q

What is a symmetric distribution?

A

If the values to the left and right of centre are mirror images of each other

25
Q

What is a right-skewed distribution?

A

if the right tail extends out much longer than the left side

26
Q

What is a left-skewed distribution?

A

if the left tail extends out much longer than the right side

27
Q

What is the behaviour of the mean, median and mode in different distributions?

A
  • perfectly symmetric: mean = median = mode
  • right-skewed: the mean will be to the right of the median
  • left-skewed: the mean will be to the left of the median
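
A small Python sketch of this behaviour, using three invented data sets (one symmetric, one with a long right tail, one with a long left tail):

```python
from statistics import mean, median

# Hypothetical data sets (invented for illustration)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]    # long right tail
left_skewed = [-40, 6, 7, 8, 9]    # long left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

The mean is dragged toward the long tail, while the median stays near the bulk of the data.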
28
Q

What is the variability of data?

A

how “spread out” our observations are (range, quartiles, interquartile range, variance/standard deviation)

29
Q

What is the range?

A

the distance between the smallest and largest data value (heavily influenced by outliers)

30
Q

What are quartiles?

A

A measurement of spread that divides our data set into 4 equal parts (25% of observations <= Q1, 50% of observations <= Q2 (the median), 75% of observations <= Q3)

31
Q

What is the five number summary?

A

min, Q1, Q2, Q3, max
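
A five-number summary can be computed with Python's standard library, as sketched below. Textbooks and software use slightly different quartile conventions; this sketch assumes the `"inclusive"` method of `statistics.quantiles` (Python 3.8+), which may not match the rule used in the course. The data are made up.

```python
from statistics import quantiles

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical observations

# Assumption: "inclusive" quartile convention; other conventions can differ slightly
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1  # spread of the middle 50% of observations

print(five_number)
print(iqr)
```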

32
Q

What is the IQR?

A

Inter-Quartile Range: Q3-Q1: measures the spread of only the middle 50% of all observations (not affected by outliers)

33
Q

What is variance?

A

Loosely speaking, the “average” squared distance of the data from its mean.

34
Q

What is standard deviation?

A

The positive square root of the variance: roughly the average absolute distance from the mean
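
The "average" is loose because the sample variance is conventionally computed with a divisor of n - 1 rather than n. The cards do not state the divisor, so the following Python sketch assumes the n - 1 convention and compares the hand computation with the standard library. The data are invented.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n

# Sum of squared distances of the observations from the mean
squared_distances = sum((x - x_bar) ** 2 for x in data)

# Assumption: sample variance with divisor n - 1
variance = squared_distances / (n - 1)
std_dev = math.sqrt(variance)  # positive square root of the variance

print(variance, statistics.variance(data))
print(std_dev, statistics.stdev(data))
```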

35
Q

What are the properties of the variance?

A
  • reported in units squared
  • If we multiply each observation by a fixed quantity c, the variance is multiplied by c squared
  • if we add a fixed quantity to each observation, then the variance does not change
  • sensitive to outliers
36
Q

What are the properties of standard deviation?

A
  • reported in units
  • if we multiply each observation by a fixed quantity c, the standard deviation is multiplied by |c| (the absolute value of c)
  • if we add a fixed quantity to each observation, then the standard deviation does not change
  • sensitive to outliers
37
Q

If our data is reasonably symmetric with no outliers, which method should we use to report a numerical summary of our data?

A

the sample mean and sample standard deviation

38
Q

If our data is skewed or has outliers, which method should we use to report a numerical summary of our data?

A

the five-number summary

39
Q

What is a box plot?

A

A technique to visually describe the behaviour of our observations in the middle and ends of our data set using the five-number summary

40
Q

What are the features to the box plot?

A
  • a centre line at the median
  • two lines around the median at Q1 and Q3 which form a box
  • a line which extends from the left side of the box to the minimum (whisker)
  • a line which extends from the right side of our box to the maximum
41
Q

What do the tails of a box plot tell us about the shape of the distribution?

A
  • if both tails are relatively equal, we have a symmetric distribution
  • if the right tail is longer than the left, we have a right-skewed distribution
  • if the left tail is longer than the right, we have a left-skewed distribution
42
Q

What is the modified (outlier) box plot?

A

a box plot where the whiskers extend to the most extreme values within the lower fence and the upper fence (outliers appear as points on the axis)

43
Q

What are the lower fence and upper fence?

A

The outlier cutoff points.
LF = Q1 - 1.5(IQR)
UF = Q3 + 1.5(IQR)

44
Q

What is an outlier?

A

Any point greater than the upper fence or less than the lower fence
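
The fence rule can be sketched in Python as follows. The data are invented, and the quartiles assume the `"inclusive"` method of `statistics.quantiles` (Python 3.8+); a course that uses a different quartile rule may get slightly different fences.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical observations

# Assumption: "inclusive" quartile convention
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers lie beyond the fences
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# In a modified box plot the whiskers stop at the most extreme values inside the fences
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(lower_fence, upper_fence, outliers, whisker_low, whisker_high)
```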

45
Q

How does the sample size affect the sample distribution in comparison to the population distribution?

A

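With larger sample sizes, the sample distribution tends to resemble the population distribution more closely. With smaller sample sizes, it varies more from sample to sample and can look quite different from the population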
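
This effect can be explored with a small simulation. The Python sketch below builds an invented population with six classes, draws repeated samples of two sizes, and measures on average how far each sample's relative frequencies are from the population's. The population, the class weights, the sample sizes, and the distance measure are all assumptions chosen for illustration.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values in six classes with unequal weights
population = rng.choices([1, 2, 3, 4, 5, 6], weights=[1, 2, 3, 4, 3, 2], k=10_000)

def relative_frequencies(values):
    counts = Counter(values)
    return {cls: counts[cls] / len(values) for cls in range(1, 7)}

pop_rf = relative_frequencies(population)

def distance_from_population(sample):
    # Half the summed absolute difference in relative frequencies (0 means identical)
    sample_rf = relative_frequencies(sample)
    return sum(abs(sample_rf[c] - pop_rf[c]) for c in pop_rf) / 2

def average_distance(n, repetitions=200):
    # Average distance over many samples of size n drawn without replacement
    total = sum(distance_from_population(rng.sample(population, n))
                for _ in range(repetitions))
    return total / repetitions

small = average_distance(10)
large = average_distance(1000)
print(small, large)
```

With these settings the average distance for samples of size 1000 comes out well below that for samples of size 10.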