Data Management All Units Flashcards
100 baby weights studied; 1 baby was 7lbs!
variable; baby weights
data; 7lbs
I am 24 years old, Canadian, and size petite.
quantitative (numerical); 24 years old
qualitative (non-numerical); Canadian
categorical; petite
I have 1 dog, he is 16kg
discrete; 1 dog
continuous; 16 kg
I conducted research and wrote a paper. Judy read it and published it on her blog. I published it at the university.
primary data; I conducted the research
secondary data; Judy read it
secondary source; published on her blog
primary source; published it at the university
I want to know how many citizens have allergies. Now I want to know how many have had them since a factory was built.
1 variable; allergies
> 1 variable; allergies since a factory was built
A swim race to determine the best is performed three times during the day. There are three timers.
inherent variability; three times during the day
measurement variability; three timers
A survey asked university students how they felt about tuition increase, for a paper regarding the general public.
sample; university students
population; general public
non-representative sample; university students to represent general public
Picking sixty fish from five spots at the lake, not putting them back in to determine weights.
replication; picking 60
randomization; 5 spots
control; not putting them back
Names of family members are placed in a box, mixed, and picked at random
simple random sample
Names of family on a list, total population divided by the sample size = 'k' value, every kth member is picked
systematic random sample
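A minimal sketch of systematic sampling, assuming the usual convention that k is the population size divided by the sample size and that the starting point is chosen at random within the first k names. The function name and the list of 20 names are hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th member, where k = population size // sample size."""
    k = len(population) // sample_size      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)                # random start within the first interval
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical list of 20 family members, sample of 5, so k = 4.
family = [f"member_{i}" for i in range(1, 21)]
sample = systematic_sample(family, 5, seed=1)
print(sample)
```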
Names of family members are divided into groups based on similarities, then placed in boxes, mixed, replaced if chosen.
stratified random sample
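A rough sketch of stratified sampling: the population is split into groups (strata) and a random sample is taken from each group in proportion to its size. The group names and members are made up, and this version draws without replacement inside each group, which is an assumption rather than something the card specifies.

```python
import random

def stratified_sample(groups, fraction, seed=None):
    """Draw the same fraction of members at random from every group."""
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))  # at least one per group
        sample.extend(rng.sample(members, n))       # without replacement
    return sample

# Hypothetical strata: 4 adults and 2 children, sampling half of each.
groups = {"adults": ["Ann", "Bob", "Cam", "Dee"], "children": ["Eli", "Fay"]}
sample = stratified_sample(groups, 0.5, seed=0)
print(sample)
```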
Suburbs within a city, placed in a box and mixed, replaced if picked, all picked are surveyed.
clusters; suburbs within a city
cluster random sample
Suburbs within a city are placed in a box and mixed, replaced if picked; members of the picked suburbs are placed in a new box and mixed, replaced if picked; the final picks are surveyed.
multi-stage random sample
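The difference between cluster and multi-stage sampling can be sketched as follows: a cluster sample surveys everyone in the randomly chosen clusters, while a multi-stage sample draws a second random sample inside each chosen cluster. The suburb names and residents are hypothetical, and both functions draw without replacement as a simplifying assumption.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and survey every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for c in chosen for person in clusters[c]]

def multi_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then pick members at random within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in rng.sample(clusters[c], n_per_cluster)]

# Hypothetical suburbs with three residents each.
suburbs = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2", "s3"],
    "East": ["e1", "e2", "e3"],
}
rng = random.Random(0)
everyone = cluster_sample(suburbs, 2, rng)        # 2 suburbs, all residents
some = multi_stage_sample(suburbs, 2, 1, rng)     # 2 suburbs, 1 resident each
```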
stand at a convenient spot and ask the first 40 people
convenience sample (not random)
survey posted on the door of a convenience store
voluntary response sample (not random)
in own words
open question
choose from alternatives
closed question
circle 1
information question
rate according to scale
rating question
rank alternatives
ranking question
choose any number of alternatives
checklist question
sample does not represent population
sampling bias
not all questions are answered
non-response bias
disproportionally polled
household bias
misleading question
response bias
Neighbours are asked the number of plants they own, the responses are sorted into a list, the list is divided into 10 groups, the groups are graphed (number in group = y = frequency, group numbers = x = interval)
frequency table; sorted into a list
intervals; 10 groups
histogram; graphed
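As a small illustration of going from raw responses to a frequency table with intervals, here is a sketch that groups made-up plant counts into equal-width intervals and prints a text version of the histogram. The data and the interval width of 2 are assumptions for the example.

```python
from collections import Counter

def frequency_table(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return {(start, start + width): counts[start] for start in sorted(counts)}

# Hypothetical responses: number of plants owned by 10 neighbours.
plants = [0, 1, 1, 2, 3, 3, 4, 5, 7, 9]
table = frequency_table(plants, 2)

# Text histogram: x = interval, bar length = frequency.
for (lo, hi), freq in table.items():
    print(f"{lo}-{hi - 1}: {'#' * freq}")
```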
qualitative graph
bar chart
quantitative graph
histogram
midpoints of the tops of the bars in a histogram or bar chart, connected by line segments only
frequency polygon
frequency of an interval / total number of data points
relative frequency (x 100 = relative percent frequency)
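A short sketch of the calculation: each interval's frequency is divided by the total number of data points. The interval labels and counts are hypothetical.

```python
def relative_frequencies(freqs):
    """Divide each interval's frequency by the total number of data points."""
    total = sum(freqs.values())
    return {interval: f / total for interval, f in freqs.items()}

# Hypothetical frequency table with 10 data points in total.
freqs = {"0-1": 3, "2-3": 3, "4-5": 2, "6-7": 1, "8-9": 1}
rel = relative_frequencies(freqs)
percent = {interval: 100 * r for interval, r in rel.items()}
```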
90 degrees / 360 degrees = 0.25 = 25% of the circle graph
pie graph
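Going the other way, a category's share of the total can be turned into a slice angle by multiplying by 360 degrees. A minimal sketch with made-up categories:

```python
def pie_angles(counts):
    """Convert category counts into slice angles (degrees) for a pie graph."""
    total = sum(counts.values())
    return {category: 360 * c / total for category, c in counts.items()}

# Hypothetical survey: 5 of 20 answers are "cats", i.e. 25% of the circle.
angles = pie_angles({"cats": 5, "dogs": 10, "fish": 5})
print(angles["cats"])  # 90.0
```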
1, 2, 2, 3, 4, 5, 8
- mode = 2
1, 2, 2, 3, 4, 5, 8
- median = 3
(1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25 / 7 ≈ 3.57* = x̄
- mean ≈ 3.57
mean, median, mode
central tendency
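The three measures can be checked with Python's standard statistics module, using the same data set as the cards above.

```python
import statistics

data = [1, 2, 2, 3, 4, 5, 8]

mode = statistics.mode(data)      # most frequent value: 2
median = statistics.median(data)  # middle value of the sorted data: 3
mean = statistics.mean(data)      # sum / count = 25 / 7, about 3.57

print(mode, median, round(mean, 2))
```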
1, 2, 2, 3, 4, 5, 8 -> 8-1 = 7*
range
1, 2, 2, 3, 4, 5, 8
|1--|2--|3--------|5--------|8   (minimum, Q1, median, Q3, maximum)
2* = Q1: 25% of the data lie below it, 75% above
3* = Q2 (median): 50% below, 50% above
5* = Q3: 75% of the data lie below it, 25% above
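One common way to find the quartiles, sketched here, is to take the median of the whole list, then the median of the lower half and the median of the upper half (leaving the middle value out when the count is odd). Other conventions exist and can give slightly different answers; this one is assumed because it matches the values on the card.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of data points is odd.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

q1, q2, q3 = quartiles([1, 2, 2, 3, 4, 5, 8])
iqr = q3 - q1  # interquartile range
```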
1, 2, 2, 3, 4, 5, 8
20th percentile position = 7 x 0.2 = 1.4, rounded up to 2; the number in 2nd place = 20th percentile
2 = 20th percentile
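The percentile rule on this card can be sketched as: multiply the number of data points by the percentile, round the position up, and read off the value in that place. This "round up" rule is an assumption taken from the card; software packages often interpolate instead.

```python
import math

def percentile(values, p):
    """Value at the p-th percentile using the round-up position rule."""
    s = sorted(values)
    position = math.ceil(len(s) * p / 100)  # 1-based position in the sorted list
    return s[max(position, 1) - 1]

data = [1, 2, 2, 3, 4, 5, 8]
p20 = percentile(data, 20)  # position = ceil(1.4) = 2, so the 2nd value
```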
σ^2
variance
(1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25 / 7 ≈ 3.57
σ^2 = [(1-3.57)^2 + (2-3.57)^2 + (2-3.57)^2 + (3-3.57)^2 + (4-3.57)^2 + (5-3.57)^2 + (8-3.57)^2 ] / 7 ≈ 4.82
*4.82 = variance
(1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25 / 7 ≈ 3.57
σ^2 = [(1-3.57)^2 + (2-3.57)^2 + (2-3.57)^2 + (3-3.57)^2 + (4-3.57)^2 + (5-3.57)^2 + (8-3.57)^2 ] / 7 ≈ 4.816
σ = square root (4.816) ≈ 2.19*
*2.19 = standard deviation
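To check the arithmetic, here is a sketch that computes the population variance and standard deviation (dividing by n = 7, as on the cards) both by hand and with the standard statistics module. With the unrounded mean 25/7 the variance comes out near 4.82 and the standard deviation near 2.19.

```python
import math
import statistics

data = [1, 2, 2, 3, 4, 5, 8]

# By hand: average of the squared deviations from the mean.
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

# Same results from the standard library (population versions).
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))

print(round(variance, 2), round(std_dev, 2))
```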
4-7   | 14 |  2 |  4 | 20 |
---------------------------
8-12  | 10 |  4 |  6 | 20 |
---------------------------
13-18 |  5 | 10 |  5 | 20 |
---------------------------
Total | 29 | 16 | 15 | 60 |
contingency table; typically used for categorical data
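The row and column totals of the table above can be verified with a short sketch. The card does not name the three columns, so they are left as unnamed positions here.

```python
# Rows of the contingency table (the three column categories are not named on the card).
rows = {
    "4-7": [14, 2, 4],
    "8-12": [10, 4, 6],
    "13-18": [5, 10, 5],
}

row_totals = {label: sum(counts) for label, counts in rows.items()}
col_totals = [sum(col) for col in zip(*rows.values())]
grand_total = sum(row_totals.values())
```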
A straight line drawn as close as possible to all the data points on a graph
line of best fit (regression line)
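A minimal sketch of one standard way to compute such a line, the least-squares method, which picks the slope and intercept that minimize the squared vertical distances to the points. The function name and sample points are hypothetical.

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = best_fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```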
y= dependent / response variable x= independent / explanatory variable
scatter plot
dots tend to increase left -> right and upwards looking like an arrow
correlation coefficient r = 1 positive slope
dots tend to increase left -> right and upwards looking like a stretched oval
correlation coefficient r = 0.8 (r > 0 [max = 1], increase in one increases the other)
dots tend to decrease left -> right and downwards looking like an arrow
correlation coefficient r = -1 negative slope
dots tend to decrease left -> right and downwards looking like a sphere
correlation coefficient r = -0.4 (r < 0 [min = -1], increase in one decreases the other)
dots look like a w
correlation coefficient r = 0 no linear correlation, change in one, does not change the other
dots look like a circle
correlation coefficient r = 0 no linear correlation, change in one, does not change the other
dots look like a horizontal line
correlation coefficient r = 0 no linear correlation (strictly, r is undefined if y never changes at all), change in one, does not change the other
dots look like a circle outline
correlation coefficient r = 0 no linear correlation, change in one, does not change the other
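The patterns above can be checked numerically with a sketch of the Pearson correlation coefficient. The three small data sets are made up to mimic a perfect upward line, a perfect downward line, and a "w" shape.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined (division by zero) if x or y is constant

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])       # perfect positive line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])     # perfect negative line
r_w = pearson_r([1, 2, 3, 4, 5], [2, 1, 2, 1, 2])  # "w" shape, no linear trend
```

Note that r only measures linear association: the "w" data clearly follow a pattern, yet r comes out at 0.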