Data Unit 1 Test Flashcards

1
Q

Statistics is the process of:

A
  1. Collecting data
  2. Organizing data
  3. Interpreting data
2
Q

Data:

A

Facts or pieces of information. A single fact is called a datum.

3
Q

Raw Data:

A

Unprocessed information (i.e., not yet compiled into a frequency table, chart or graph).

4
Q

Aggregate Data:

A

Data that is organized or grouped, such as by finding the sum over a given period of time (for example, monthly or quarterly). Data can be organized into any grouping, such as geographic area. The data is not individual records.

5
Q

Micro Data:

A

Non-aggregated data about the population sampled. For surveys of individuals, microdata contains records for each individual interviewed; for surveys of organizations, the microdata contains records for each organization.

6
Q

Experimental Data:

A

Data gathered through experimentation.

7
Q

Observational Data:

A

Data gathered by observation of the “subject.” For example, the subject is recorded then the behaviours are noted over a period of time.

8
Q

Primary Data:

A

Data gathered directly by the researcher in the act of conducting research or an experiment. Data can be gathered by surveys or through experimentation.

9
Q

Secondary Data:

A

Data gathered by someone other than the researcher.

10
Q

Numerical Variable:

A
  • A quantitative variable that describes a numerically measured value
  • These variables can be either continuous or discrete.
11
Q

Continuous Variable:

A
  • A numeric variable which can assume an infinite number of real values
  • i.e. the unit of measure can be broken down into smaller units or decimals.
  • Example: age, distance, temperature, and school marks
  • A histogram displays continuous data
12
Q

Discrete Variable:

A
  • A numeric variable that takes only a countable number of distinct values
  • i.e., it can only take separate values, typically integers (no decimals)
  • Example: number of people or animals; x can equal only 0, 1, 2, 3, etc.
  • A bar graph displays discrete data
13
Q

Categorical Data:

A
  • Consists of data that can be grouped by specific categories (also known as qualitative variables).
  • Categorical variables may have categories that are naturally ordered (ordinal variables) or have no natural order (nominal variables).
14
Q

Nominal Variable:

A
  • Type of categorical variable that describes a name, label, or category with no natural order.
  • Example: subjects in school, hair colour
  • Names can be put in alphabetical order, but that order carries no rank: A is not “better” than B
15
Q

Ordinal Variable:

A
  • Type of categorical variable that has a natural ordering of its possible values, but the distances between the values are undefined.
  • Example: rating something as Excellent, Good, Fair or Poor. The answer is only a category, but there is a natural ordering in those categories.
16
Q

Frequency Table:

A

A table that shows the distribution of values of the variable.

17
Q

Key Features of a Frequency Table:

A

3 columns: Range, Tally, Frequency

18
Q

Range column

A

Make sure all the intervals have the same width.
Square brackets: includes the value.
Round brackets: up to but not including the value.

19
Q

Cumulative Frequency Table:

A

The running total of the frequencies from the top down to the corresponding row.
- (Add up the frequencies as you go down the column.)

20
Q

Relative Frequency Table (%)

A

Shows the frequency of a range (data group) as a percentage of the whole data set (used for pie graphs).
- Take each frequency, divide it by the total frequency, then multiply by 100.
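
As a rough sketch of how the frequency, cumulative frequency and relative frequency columns relate, the snippet below builds all three from a list of counts. The function name `frequency_table` and the counts are hypothetical, chosen only for illustration.

```python
from itertools import accumulate

def frequency_table(frequencies):
    """Return one (frequency, cumulative, relative %) row per interval."""
    total = sum(frequencies)
    # Running total from the top row down to the current row
    cumulative = list(accumulate(frequencies))
    # Each frequency as a percentage of the total frequency
    relative = [f * 100 / total for f in frequencies]
    return list(zip(frequencies, cumulative, relative))

# Hypothetical counts for four intervals
rows = frequency_table([2, 5, 8, 5])
for row in rows:
    print(row)
```

The last cumulative value always equals the total frequency, which is a quick way to check the table.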

21
Q

Bar Graph:

A

Bars don’t touch
Each bar is a different colour
Categorical/discrete

22
Q

Frequency =

A

Histogram

23
Q

Relative frequency =

A

Pie chart

24
Q

Cumulative Relative Frequency =

A

Ogive

25
Q

Histogram:

A
  • Coloured in with one colour
  • This type of graph is best suited to show a continuous range of values; hence, the bars touch
  • The area of the bars is proportional to the frequencies of the variable.
  • To determine the interval ranges, take the range of the entire set and divide it by the number of bars that you want. (Range = Highest-Lowest)
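
The interval-width rule in the last bullet can be sketched as follows. The helper name `interval_width` and the data are hypothetical; in practice the result is often rounded up to a convenient number.

```python
def interval_width(data, bars):
    """Width of each histogram interval: (highest - lowest) / number of bars."""
    data_range = max(data) - min(data)
    return data_range / bars

# Hypothetical data set split into 5 bars
width = interval_width([3, 8, 15, 23], 5)
print(width)
```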
26
Q

Frequency Polygon (or Line Graph)

A
  • Can illustrate the same information as a histogram or bar graph.
  • Points are plotted with the midpoints of the intervals versus the frequency. Then a line is drawn connecting the points.
  • This type of graph is best suited to illustrate (changing) trends.
27
Q

Cumulative Frequency Polygon (or Ogive)

A
  • Illustrates the running total of the frequency from the lowest value up.
  • Plot the x-values on the upper end of the range.
28
Q

Circle (Pie) Graph

A
  • Best suited to illustrate categorical data relative to the whole or to each other (using relative frequencies).
  • Need a protractor to draw the sections accurately
    – To get the angle of each segment, multiply the relative frequency (as a decimal, not a percentage) by 360
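
A minimal sketch of the angle calculation, assuming the relative frequency is taken as a fraction of the total rather than as a percentage. The function name `pie_angles` and the counts are hypothetical.

```python
def pie_angles(frequencies):
    """Angle in degrees for each segment: (frequency / total) x 360."""
    total = sum(frequencies)
    return [f * 360 / total for f in frequencies]

# Hypothetical counts for three categories
angles = pie_angles([5, 10, 5])
print(angles)
```

The angles should always add up to 360.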
29
Q

Population:

A

All individuals that belong to a group being studied. e.g. For a survey to find out what sport was the favourite of students of SDSS, all students are asked.

30
Q

Sample:

A

A group of items or people selected from a population (to represent the whole population). e.g. For a survey to find out what sport was the favourite of students of SDSS, 20 random students from each grade were asked.

31
Q

Simple Random Sample:

A

Every member of the population has an equal chance of being selected for the survey. Non-biased.
- The risk is you could pick too many of the same category.
- Roland writes the apartment number of each occupied apartment on a piece of paper. He puts them in a hat and randomly selects 15 of them.
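
Drawing numbers from a hat can be imitated with Python's `random.sample`, which picks without replacement so that every member has the same chance. This is only a sketch: the numbering 1 to 54 is an assumption (54 is the population size used in the systematic sample card), and the helper name and `seed` parameter are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, k)

# Assumed: occupied apartments are numbered 1..54
apartments = list(range(1, 55))
chosen = simple_random_sample(apartments, 15, seed=1)
print(sorted(chosen))
```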

32
Q

Systematic Sample:

A
  • Sort the population sequentially.
  • The intervals are determined by the following formula: Interval = population size/sample size
  • A starting point is selected at random.
  • Go through the population sequentially, selecting members at regular intervals.
  • Roland will sort the apartment numbers from lowest to highest. Interval = 54/15 = 3.6, rounded to 4. He picks a random number between 1 and 4, starts at the apartment in that position, and then selects every 4th apartment after it.
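
The procedure above can be sketched as below, assuming apartments numbered 1 to 54 and an interval rounded to the nearest whole number. The function name and the `start` parameter are hypothetical; `start` is exposed only so the example is repeatable.

```python
import random

def systematic_sample(population, sample_size, start=None):
    """Sort the population, then take every interval-th member from a random start."""
    ordered = sorted(population)
    # Interval = population size / sample size, rounded to a whole step
    interval = round(len(ordered) / sample_size)
    if start is None:
        start = random.randint(1, interval)  # random position 1..interval
    return ordered[start - 1::interval]

# Assumed: apartments numbered 1..54, starting at position 2
picked = systematic_sample(range(1, 55), 15, start=2)
print(picked)
```

Because the interval is rounded, the number of members selected can differ slightly from the target sample size.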
33
Q

Stratified Sample:

A
  • A stratum (plural: strata) is a group of people that share a common characteristic such as gender, age or education level (a natural grouping).
  • A stratified sample contains the same proportion of members from each stratum as the population does.
  • The proportion of each stratum taken is the proportion of the sample size to the population size.
  • Roland uses the floors on which his tenants live as strata.
  • If the sample is 15, proportion = 15/54 = 0.278. So, Roland would multiply 0.278 by how many people live on each floor.
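
A small sketch of the proportional allocation, assuming made-up floor sizes that add up to 54 (only the 18 on the 5th floor comes from the cards). The function name `stratified_counts` is hypothetical, and rounding can make the total differ from the target for other inputs.

```python
def stratified_counts(stratum_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size * sample_size / population)
        for name, size in stratum_sizes.items()
    }

# Hypothetical floor sizes (total 54)
floors = {"floor 1": 6, "floor 2": 6, "floor 3": 12, "floor 4": 12, "floor 5": 18}
counts = stratified_counts(floors, 15)
print(counts)
```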
34
Q

Cluster Sample:

A
  • Use a random sample of one representative group (all from one stratum)
  • Clusters are not necessarily representative of the population.
  • Roland would select 15 of the 18 apartments on the 5th floor (since the 5th floor is the only group that’s large enough for a cluster).
35
Q

Multi-Stage Sample:

A
  • Several levels of random sampling.
  • Roland randomly selects 3 floors and then randomly selects 5 tenants from each floor.
36
Q

Convenience Sample:

A
  • Easily accessible members are selected.
  • May not be random, so results are not always reliable.
  • Roland talks to the next 15 tenants who come through the door.
37
Q

Voluntary Response Sample:

A
  • Invite members of the entire population to participate in the survey.
  • Respondents are not necessarily representative of the population.
  • Roland sends out a letter to all the tenants seeking a response.
38
Q

Bias:

A

An inclination or prejudice for or against one person or group, especially in a way considered to be unfair.

In terms of sample selection, bias would be present where there are factors that favour certain outcomes or responses. Bias can be intentional or unintentional.

39
Q

Sampling Bias:

A

When the chosen sample does not reflect the population. The bias is in direct reference to the sample itself.

Example: Ask “What is Canada’s favourite sport?” outside the ACC after a Leafs game.

To avoid this, the sample must include a variety of respondents.

40
Q

Non-Response Bias (a form of sampling bias):

A

When surveys are not returned, thereby influencing the result. Particular groups are then under-represented.

  • It may also be that insufficient time was given to respond and/or the respondents were not given enough information to respond.
  • It is a significant bias if the number of non-respondents is greater than the difference between two other responses.
    Example: Conduct a survey about snowmobiling at a school in Florida.
  • To avoid this, researchers must choose a sampling process that is random and provide enough time and information so that respondents are able to make an informed decision.
41
Q

Measurement Bias:

A

When there is a bias with the survey question and/or the method of data collection.

  • The person collecting the data has biased the results of the survey.
    Example: A police officer records how fast cars are travelling down a stretch of highway.
  • Use of leading questions -> Surveys give suggested answers to a question
    Example: What is your favourite candy: a) Smarties b) Aero c) Kit Kat d) Coffee Crisp
  • Use of loaded questions -> Surveys which use words or information intended to influence a respondent’s decision.
    Example: Are you in favour of the government controlling a woman’s freedom to choose whether or not to have an abortion?
42
Q

Response Bias:

A

When respondents give false or misleading answers because they want to influence the results or are afraid/embarrassed to answer truthfully.
- Always in response to a question posed.
Example: Have you ever accidentally taken something home from work that you didn’t pay for?

43
Q

“Σ” means

A

“sum” (capital sigma)

44
Q

Mean:

A

most commonly referred to as “the average”

45
Q

Mean Usage:

A

With numeric data having no outliers and/or a large sample size.

46
Q

Mean of a population:

A

µ “mu” - add up all the values and divide by the number of values (N)

47
Q

Mean of a sample:

A

x̄ “x bar” - add up all the values and divide by the number of values (n)

48
Q

Median:

A

the middle value

49
Q

Median Usage:

A

with numeric data where outliers exist or the sample size is small.

50
Q

How to find Median:

A
  • After arranging the data in ascending order:
    – If there is an odd number of data values, use the middle value
    – If there is an even number of data values, find the middle 2 values and take their midpoint
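
The two cases can be written out as a short function. This is an illustrative sketch with made-up data; Python's standard library also provides `statistics.median`, which gives the same result.

```python
def median(values):
    """Middle value of the sorted data, or the midpoint of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: midpoint

print(median([5, 1, 3]))
print(median([4, 1, 3, 2]))
```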
51
Q

Mode:

A

the value that occurs the most frequently

52
Q

Mode Usage:

A

With categorical data

53
Q

Wi is the

A

weighting factor

54
Q

Class sizes could be a ?, they would appear at the ? of the formula being multiplied by the ?, and at the ?.

A

weighting factor
top
marks
bottom
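
As a sketch of the weighted mean this card describes, the weights (for example class sizes) multiply the marks on top of the formula and are summed on the bottom. The function name and the numbers are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    top = sum(w * x for x, w in zip(values, weights))
    bottom = sum(weights)
    return top / bottom

# Hypothetical class averages of 70 and 80 for classes of 10 and 30 students
overall = weighted_mean([70, 80], [10, 30])
print(overall)
```

The larger class pulls the overall mean towards its own average.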

55
Q

Mi is the ? of an interval

A

midpoint

56
Q

Fi is the ? for that interval

A

frequency

57
Q

Steps for frequency mean:
1. Find the ? of the intervals
2. ? the ? of each interval by the ? in that interval
3. Get the total ?
4. Get the total of the frequency x ? products
5. ? the total of the products by the total frequency

A
  1. midpoint
  2. Multiply, midpoint, frequency
  3. Frequency
  4. Midpoint
  5. Divide
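
The steps can be sketched in a few lines, assuming each interval is given as a (low, high) pair. The function name `grouped_mean` and the data are hypothetical.

```python
def grouped_mean(intervals, frequencies):
    """Estimate the mean of grouped data from interval midpoints and frequencies."""
    # Step 1: midpoint of each interval
    midpoints = [(low + high) / 2 for low, high in intervals]
    # Steps 2 and 4: multiply midpoint by frequency, then total the products
    total_products = sum(m * f for m, f in zip(midpoints, frequencies))
    # Steps 3 and 5: divide by the total frequency
    return total_products / sum(frequencies)

# Hypothetical intervals 0-2 and 2-4 with frequencies 1 and 3
estimate = grouped_mean([(0, 2), (2, 4)], [1, 3])
print(estimate)
```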
58
Q

To determine the frequency median, you have to go to the interval where the ? occurs. If we have 18 data values, the median is the ? of the 9th and 10th values. So ? up the frequencies until they reach 9 or 10. Since that point occurs in the 2-3 hours interval, the median is the ? of that interval: ? hours a day.

A

middle value
mean
add
midpoint
2.5

59
Q

Range:

A

The difference between the highest value and the lowest value.

60
Q

Interquartile Range:

A

Found by putting the data in ascending order and then breaking it into four groups. The IQR is the distance between the 1st and 3rd quartiles.

61
Q

Steps for IQR:

A
  1. Put data in ascending order
  2. Find the median and call it Q2
  3. Find the median of the lower half (the values below Q2) and call it Q1
  4. Find the median of the upper half (the values above Q2) and call it Q3
  5. The interquartile range is Q3 - Q1
  6. The IQR spans the middle 50% of the data, clustered around the median.
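
The steps can be sketched as follows. One assumption is made that the cards do not state: when there is an odd number of values, the median itself is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```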
62
Q

Box & Whisker Plot:

A

Illustrates how clustered the data is around the median.

63
Q

Steps for Box & Whisker Plot:

A
  1. Draw a horizontal line and write down the lowest and highest values, labelling equal intervals (like a ruler) between them to represent the range.
  2. Draw another horizontal line and mark off the start and end of the data with small vertical lines
  3. Represent Q1 Q2 & Q3 with big vertical lines at the appropriate locations in the range
  4. Draw a box from Q1 to Q3
  5. Note: If a data value is more than 1.5 times the box length (i.e., the IQR) away from either end of the box, it is considered an outlier.
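
The outlier note can be checked with a short sketch, assuming Q1 and Q3 have already been found. The function name `find_outliers` and the data are hypothetical.

```python
def find_outliers(data, q1, q3):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with Q1 = 10 and Q3 = 14, so the fences are 4 and 20
flagged = find_outliers([1, 10, 11, 12, 13, 14, 30], 10, 14)
print(flagged)
```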
64
Q

Deviation:

A

The difference between an individual value in a set of data and the mean for the data.

65
Q

Deviation Formulas:

A

Population: x - µ
Sample: x - x̄

66
Q

Standard Deviation:

A

The square root of the mean of the squares of the deviations

67
Q

Standard Deviation Formulas:

A

Population: σ (lower case sigma) = √( Σ(x - µ)^2 / N )
Sample: s = √( Σ(x - x̄)^2 / (n - 1) )

68
Q

In a population formula, ? is used, the ? symbol is used, and it is divided by the ?

A

lower case sigma
mu
number of terms

69
Q

In a sample formula, ? is used, the ? symbol is used, and it is divided by the ?.

A

s
x bar
number of terms minus one.

70
Q

The sample formula denominator is

A

n - 1 in order to compensate for the fact that a sample taken from a population tends to underestimate the deviations in the population.

71
Q

Variance:

A

The standard deviation squared: σ^2 or s^2

72
Q

To find the Standard Deviation:

A
  1. Add all the data up
  2. Count the number of data points
  3. Divide the sum by the number of data points to get the mean
  4. Subtract the mean from every data point (x - mean)
  5. Square each result of the subtraction
  6. Add all of the squared numbers
  7. Divide the total by the number of data points or, for a sample, by the number of data points minus 1
  8. Take the square root of that number
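
The eight steps map directly onto a short function. This is an illustrative sketch with made-up data; the function name and the `sample` flag are hypothetical, and Python's `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math

def standard_deviation(data, sample=False):
    """Follow the steps: mean, deviations, squares, total, divide, square root."""
    n = len(data)                                # step 2
    mean = sum(data) / n                         # steps 1 and 3
    squared = [(x - mean) ** 2 for x in data]    # steps 4 and 5
    divisor = n - 1 if sample else n             # step 7: n - 1 for a sample
    return math.sqrt(sum(squared) / divisor)     # steps 6, 7 and 8

marks = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5
print(standard_deviation(marks))
print(standard_deviation(marks, sample=True))
```

The sample value is slightly larger than the population value because the divisor is smaller.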
73
Q

Z-Scores:

A
  • The z-score is the number of standard deviations from the mean
  • Variable values below (left of) the mean have negative z-scores, values above (right of) the mean have positive z-scores, and values equal to the mean have a z-score of zero.
74
Q

Z-Score Formulas:

A

Population: z = (x - µ)/σ
Sample: z = (x - x̄)/s
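
A minimal sketch of the formula, which has the same shape for a population (µ, σ) and a sample (x̄, s). The function name and the numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical marks with a mean of 70 and a standard deviation of 10
print(z_score(85, 70, 10))  # above the mean: positive
print(z_score(55, 70, 10))  # below the mean: negative
print(z_score(70, 70, 10))  # equal to the mean: zero
```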

75
Q

Percentiles:

A
  • Divides the data into 100 intervals with the same number of values in each interval.
  • As with quartiles, the actual values of the data are unimportant when dividing the data up; what matters is the count of the individual data values.
  • First, order the data.
76
Q

To find the percentile with a mark, the formula is:

A

Percentile = ((# of marks below the given value + 0.5 x # of marks equal to the given value) / total # of marks) x 100
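
The formula can be sketched as below. The function name `percentile_of` and the list of marks are hypothetical.

```python
def percentile_of(mark, marks):
    """Percentile of a mark: (below + 0.5 x equal) / total, times 100."""
    below = sum(1 for m in marks if m < mark)
    equal = sum(1 for m in marks if m == mark)
    return (below + 0.5 * equal) * 100 / len(marks)

# Hypothetical class marks
marks = [50, 60, 65, 70, 70, 75, 80, 90, 95, 100]
print(percentile_of(70, marks))
```

Here 3 marks are below 70 and 2 are equal to it, so the result is (3 + 1) / 10 x 100.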

77
Q

To find the mark from percentile:

A
  • Count each dividing line between the ordered data (from the 10th to the 90th); the mark will be the mean of the two marks on either side of the line
  • OR use N (position of the mark) = K (number of marks) x percentile (e.g., 0.9 for the 90th)