Stats and Mechs Flashcards

1
Q

Population

A

Whole set of items that are of interest. Information can be obtained from a population

2
Q

Raw data

A

Unprocessed information

3
Q

Data vs information

A

Data: collection of raw unorganised facts
Information: collection of processed, organised facts placed into context

4
Q

Census
- Observes/measures entire population
- Pros: should give a completely accurate result
- Cons: time-consuming, expensive, cannot be used when the testing process destroys the item, hard to process a large quantity of data

A

“Testing process will destroy…, so a census would destroy all the…”

5
Q

Sample

A

A selection of observations taken from a subset of the population, used to find information about the population as a whole
Pros: less time-consuming and less expensive than a census, fewer people have to respond/less data to process than a census
Cons: data may not be as accurate, sample may not be large enough to give information about small sub-groups of the population

6
Q

How the size of the sample can affect the validity of any conclusions drawn
- size depends on required accuracy + available resources
- the larger the sample, the more accurate it is and the more accurate its predictions, but greater resources are needed
- if the population is very varied, a larger sample is needed than if the population were uniform
- due to natural variation in the population, different samples can lead to different conclusions

A

“they could take a larger sample, for example… this would give a better estimate of the overall proportion of…”
“full coverage”

7
Q

Sampling units

A

Individual units of a population, e.g. a university student/house. Often individually named or numbered to form a list (sampling frame: a list of the units a sample can be drawn from), e.g. list of university students/list of houses in the locality/phone book/a map/electoral roll

8
Q

Random sampling
- every member has an equal chance of selection
- sample should be representative of the population
- removes bias

A

simple random sampling, systematic sampling, stratified sampling

9
Q

Simple random sampling
- every sample of same size has an equal chance of being selected
- no bias, easy and cheap to implement for small populations, each sampling unit has a known and equal chance of selection
- sampling frame needed; not suitable for large populations (time-consuming, expensive, disruptive)

A
  • obtain a sampling frame
  • each member in the frame is allocated a unique number from 1 to the population size
  • n of these numbers are chosen at random, where n is the sample size
  • generate them with a random number generator/calculator/computer/random number table, or by lottery sampling (members are written on tickets and placed in a hat, and the required number of tickets are drawn out)
  • go back to the population and select the members corresponding to the generated numbers
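The steps above can be sketched in Python. This is a minimal sketch, assuming the frame is numbered 1 to the population size; the function name `simple_random_sample` and the seed are hypothetical, and `random.sample` plays the role of the random number generator (it draws without replacement, so no member is chosen twice).

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Return n distinct member numbers chosen from 1..population_size.

    Every member has an equal chance of selection and every sample of
    size n is equally likely (random.sample draws without replacement).
    """
    rng = random.Random(seed)                 # seeded so the draw can be repeated
    frame = range(1, population_size + 1)     # frame numbered 1 to pop size
    return sorted(rng.sample(frame, n))       # members to go back and select

# choose 5 members from a hypothetical numbered frame of 40
print(simple_random_sample(40, 5, seed=1))
```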
10
Q

random number table

A
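  • assign each member a unique identifier with the same number of digits, e.g. 3-digit:
    000, 001, …
    - work along the rows of the random number table, reading off 3-digit numbers
    - ignore any number that is out of range or has already been chosen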
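Reading a random number table can be mimicked in code. A minimal sketch, assuming members are labelled 000 up to one less than the population size and that the table row is given as a digit string with spaces removed; the row `"123870045123006"` and the function name are hypothetical.

```python
def read_random_number_table(digits, population_size, n, width=3):
    """Read `width`-digit labels along a row of a random number table.

    Labels out of range or already chosen are skipped, until n are found.
    """
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = digits[i:i + width]
        if int(label) < population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# hypothetical row; members labelled 000 to 199
row = "123870045123006"
print(read_random_number_table(row, population_size=200, n=3))
```

Here 870 is skipped as out of range and the second 123 is skipped as a repeat, so the labels chosen are 123, 045 and 006.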
11
Q

Systematic sampling
- required elements chosen at regular intervals from an ordered list.
- simple and quick to use, suitable for large populations
- sampling frame needed; can introduce bias if the frame is not random (e.g. ordered M, F, M, F), as patterns might occur in the sample data when taking every kth person

A

- allocate each member a number from 1 to the population size
- calculate the interval k = population size ÷ sample size
- use a random number generator to select the first person from 1 to k
- “Select every kth person thereafter.”
e.g. with an interval of 5, if the first person chosen at random is 2, the remaining would be 7, 12, 17, etc.
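A minimal sketch of systematic sampling, assuming the population size is a multiple of the sample size so that k = population size ÷ sample size is a whole number. The function name and the population size of 20 are hypothetical; passing `start=2` reproduces the card's example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return member numbers for a systematic sample.

    k = population_size // sample_size; the first member is picked at
    random from 1..k unless `start` is given, then every kth after it.
    """
    k = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start in 1..k
    return [start + i * k for i in range(sample_size)]

# card example: interval 5, first person chosen at 2
print(systematic_sample(population_size=20, sample_size=4, start=2))
```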

12
Q

Stratified sampling
- population divided into mutually exclusive strata (e.g. F and M) and a random sample taken from each
- sample accurately reflects the population structure; guarantees proportional representation of groups within the population
- population must be clearly classified into distinct strata; selection within each stratum has the same cons as simple random sampling

A

sample size for a stratum = (stratum size / population size) × required overall sample size

e.g. working-out layout:
cricket: 121/370 × 30 = 9.8… ≈ 10
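The stratum formula can be checked in a few lines. A minimal sketch, assuming results are rounded half up; the cricket figure and total of 370 come from the card, while the `"other"` stratum (the remaining 249) is hypothetical. In general the rounded numbers may not add up exactly to the required sample size, in which case one stratum has to be adjusted by hand.

```python
import math

def stratified_allocation(strata, sample_size):
    """Number to sample from each stratum:
    (stratum size / population size) x required sample size, rounded half up."""
    population_size = sum(strata.values())
    return {
        name: math.floor(size / population_size * sample_size + 0.5)
        for name, size in strata.items()
    }

# cricket from the card; "other" is a hypothetical stratum making up the 370
print(stratified_allocation({"cricket": 121, "other": 249}, 30))
```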

13
Q

Quota sampling
- an interviewer selects a sample that reflects the characteristics of the whole population. The population is divided into groups according to given characteristics; the size of each group determines the proportion of the sample that should have that characteristic. The interviewer meets people, assesses their group and allocates them to the appropriate quota, continuing until all quotas are filled.
- allows a small sample to still be representative of the population, no sampling frame required, quick and easy, allows easy comparison between different groups in the population
- non-random, so can introduce bias; population must be divided into groups (can be costly or inaccurate, and increasing the scope means more groups, so more time and expense); non-responses are not recorded

A

Maddison has a list of 210 pupils, and wants to find out which musical instrument they prefer listening to amongst the flute, the clarinet, the guitar and the saxophone. To take a sample of size 30, Maddison surveys the first 15 girls and the first 15 boys to arrive at the school.
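Maddison's method (the first 15 girls and the first 15 boys to arrive) is a quota-filling loop. A minimal sketch, assuming arrivals are known as (name, group) pairs in arrival order; the arrival list and pupil names are hypothetical. Anyone whose quota is already full is passed over and not recorded, which is part of why the result is non-random.

```python
def quota_sample(arrivals, quotas):
    """Fill quotas in order of arrival.

    arrivals: list of (name, group) pairs in the order people are met.
    quotas: dict mapping group -> number needed.
    """
    remaining = dict(quotas)
    sample = []
    for name, group in arrivals:
        if remaining.get(group, 0) > 0:   # quota for this group not yet full
            sample.append(name)
            remaining[group] -= 1
        if not any(remaining.values()):   # every quota filled: stop
            break
    return sample

# hypothetical arrival order: every third pupil is a boy
arrivals = [(f"pupil{i}", "boy" if i % 3 == 0 else "girl") for i in range(1, 61)]
sample = quota_sample(arrivals, {"girl": 15, "boy": 15})
print(len(sample))
```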

Non-responses elaboration: people who refuse to participate or cannot be reached are not included in the sample, which can affect its representativeness and potentially introduce bias.

14
Q

Opportunity/convenience sampling
- taking the sample from people who are available at the time the study is carried out and who fit the criteria
- easy to carry out, cheap
- unlikely to provide a representative sample, highly dependent on the individual researcher (time, place)

A

“sample is likely to be biased towards … who …”
“improve by interviewing people at different locations and times, and by increasing the sample size”

15
Q

types of data

A

quantitative: associated with numerical observations
qualitative: associated with non-numerical observations
continuous variable: can take any value in a given range, e.g. height or time
discrete variable: can take only specific values in a given range, e.g. number of people (can't be 2.65)

16
Q

when data are presented in a grouped frequency table

A
  • groups = classes
  • specific data values are not shown
  • class boundaries: max and min values belonging in each class
  • midpoint = average of the class boundaries
  • class width = difference between upper and lower class boundaries
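The midpoint and class width rules above are one line each. A minimal sketch; the function name and the boundaries 29.5 and 31.5 are hypothetical.

```python
def class_summary(lower, upper):
    """Midpoint and class width from a class's lower and upper boundaries."""
    midpoint = (lower + upper) / 2   # average of the class boundaries
    width = upper - lower            # upper boundary minus lower boundary
    return midpoint, width

# hypothetical class with boundaries 29.5 and 31.5
print(class_summary(29.5, 31.5))
```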
17
Q

measure of location

A

a value that describes a position in a dataset.
if it describes the centre of the data, it is a measure of central tendency

- mean, mode, median

18
Q

mode/modal class
- value/class that occurs most often
- for both qualitative and quantitative data.
- can have a single mode or 2 modes (bimodal).
- not very informative if each value occurs only once

A

?explain why a shirt manufacturer might use the mode when planning production numbers
“it provides information on the most common size or item that is in demand among customers”

?write down the modal class
“34-36”
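Python's standard library can list every mode, which shows the single-mode, bimodal and "each value occurs once" cases from this card. The shirt sizes and numbers are hypothetical; `statistics.multimode` (Python 3.8+) returns the most common values in the order they are first met.

```python
from statistics import multimode

# hypothetical shirt sizes sold
sizes = ["M", "L", "M", "S", "L", "XL"]
print(multimode(sizes))          # two modes: bimodal
print(multimode([1, 2, 2, 3]))   # single mode
print(multimode([4, 7, 9]))      # every value occurs once: not very informative
```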

19
Q

median (Q2)
- the middle value when the data values are put in order; splits the data set into 2 equal (50%) halves
- for quantitative data. usually used when there are extreme values, as they do not affect it

A
  • for discrete values: the median is the (n+1)/2 th value
  • for grouped values: the median is the n/2 th value (estimated by interpolation)
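Both median positions can be sketched in Python. A minimal sketch, assuming grouped data are given as (lower boundary, upper boundary, frequency) in order and that the grouped median is estimated by linear interpolation within its class; the data values and table are hypothetical. For raw data the result is compared with `statistics.median`.

```python
import statistics

def median_discrete(values):
    """Median of raw data: the (n + 1)/2 th value once sorted."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    if position.is_integer():
        return ordered[int(position) - 1]
    # position ends in .5: average the two neighbouring values
    i = int(position)
    return (ordered[i - 1] + ordered[i]) / 2

def median_grouped(classes):
    """Estimate the median of grouped data (the n/2 th value) by interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency), in order.
    """
    target = sum(f for _, _, f in classes) / 2
    cumulative = 0
    for lower, upper, f in classes:
        if f and cumulative + f >= target:
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("classes are empty")

data = [3, 7, 1, 9, 4, 6]
print(median_discrete(data), statistics.median(data))
print(median_grouped([(0, 10, 4), (10, 20, 6)]))
```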
20
Q

mean
x̄ = (∑ x) / n
- for quantitative data. uses all the data values, so gives a true measure of the data, but is affected by extreme values

A

(be specific about the outliers in your answer)
“the mean is affected by the extreme value 26”

estimate the mean from a grouped freq table:
take the midpoint x of each class interval and then work out normally, x̄ ≈ (∑ fx) / (∑ f)
e.g.
((30.5 × 2) + (32.5 × 25)) / 27 = 873.5 / 27 ≈ 32.35
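The grouped-mean estimate is ∑fx ÷ ∑f with x as the class midpoint. A minimal sketch; the function name is hypothetical, and the midpoints and frequencies are those of the worked example.

```python
def estimate_mean(midpoints, frequencies):
    """Estimated mean of grouped data: sum(f * x) / sum(f), x = class midpoint."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)

# worked example: midpoint 30.5 with f = 2, midpoint 32.5 with f = 25
print(round(estimate_mean([30.5, 32.5], [2, 25]), 2))
```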

21
Q

Quartiles
Lower quartile - Q1, 1/4 of the way through the dataset (25%)
Upper quartile - Q3, 3/4 of the way through the dataset (75%). 3/4 of n

A

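for discrete data, work out n/4 (or 3n/4 for Q3): if it is a decimal, round UP; if it is a whole number, add 0.5 (halfway between that value and the next)
e.g. 16/4 = 4 so Q1 = 4.5th value
e.g. 45/4 = 11.25 so Q1 = 12th value
same goes for the upper quartile too (using 3n/4)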
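The rounding rule can be written with integer arithmetic so that "decimal" versus "whole number" is exact. A minimal sketch of the rule as stated on this card; the function name and data are hypothetical, and other software (e.g. spreadsheet or NumPy quartile functions) may use different conventions and give different answers.

```python
def quartile_discrete(values, q):
    """Quartile q (1, 2 or 3) of discrete data using the n/4 rounding rule.

    Position = q * n / 4. Decimal: round up. Whole number: take the
    midpoint of that value and the next one (the "0.5" case).
    """
    ordered = sorted(values)
    whole, remainder = divmod(q * len(ordered), 4)
    if remainder:
        return ordered[whole]   # rounded-up position whole + 1 is index `whole`
    return (ordered[whole - 1] + ordered[whole]) / 2

print(quartile_discrete(range(1, 17), 1))   # n = 16: Q1 is the 4.5th value
print(quartile_discrete(range(1, 46), 1))   # n = 45: Q1 is the 12th value
```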

22
Q

Percentiles
- split the dataset into 100 parts.
- the 10th percentile lies 1/10 of the way through the data (10%). Written with P, e.g. P₁₀
- 85% of the dataset are less than the 85th percentile and 15% are greater

A

to calculate:
the 85th percentile is 85% of the way through the data, so its position is
85/100 × n
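The position p/100 × n is the first step. A minimal sketch of finding the value, assuming the data are grouped as (lower boundary, upper boundary, frequency) and the percentile is estimated by linear interpolation within the class containing that position; the table and function name are hypothetical.

```python
def percentile_grouped(classes, p):
    """Estimate the p-th percentile of grouped data.

    Position = p/100 * n, then linear interpolation inside the class
    that contains that position.
    """
    n = sum(f for _, _, f in classes)
    position = p * n / 100
    cumulative = 0
    for lower, upper, f in classes:
        if f and cumulative + f >= position:
            return lower + (position - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("position lies beyond the last class")

# hypothetical table: 20 values in 0-10 and 20 values in 10-20
table = [(0, 10, 20), (10, 20, 20)]
print(percentile_grouped(table, 85))   # position 85/100 x 40 = 34
```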

23
Q

Measure of spread/dispersion/variation: a measure of how spread out the data are

A

range - difference between the largest and smallest values in the dataset.
interquartile range (IQR) - difference between the upper quartile and lower quartile. not affected by extreme values but only considers the spread of the middle 50% of the data
interpercentile range - difference between the values of two given percentiles, e.g. the 10th to 90th interpercentile range is often used since it's not affected by extreme values but still considers 80% of the data in its calculation
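The claim that the range reacts to an extreme value while the IQR does not can be checked directly. A minimal sketch, assuming the n/4 rounding rule from the quartiles card for discrete data; both data sets are hypothetical, and 90 is an invented extreme value.

```python
def quartile(ordered, q):
    """Quartile q of sorted discrete data using the n/4 rounding rule."""
    whole, remainder = divmod(q * len(ordered), 4)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2

def spread(values):
    """Return (range, interquartile range)."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    iqr = quartile(ordered, 3) - quartile(ordered, 1)
    return data_range, iqr

data = [2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [2, 3, 4, 5, 6, 7, 8, 90]   # hypothetical extreme value
print(spread(data))
print(spread(with_outlier))
```

The range jumps from 7 to 88, while the IQR stays at 4.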

24
Q

measure of spread of a data set: variance, which makes use of the fact that each data point deviates from the mean by the amount x − x̄

A

variance - σ² = ∑(x − x̄)² / n = (∑x² / n) − x̄²
standard deviation - σ, the square root of the variance
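Both forms of the variance, and the standard deviation, can be computed side by side. A minimal sketch, assuming the divide-by-n form used on this card (which matches `statistics.pvariance`, the population variance); the data set is hypothetical.

```python
import math
import statistics

def variance(values):
    """Variance = sum((x - mean)^2) / n, the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = math.sqrt(var)   # standard deviation = square root of the variance
print(var, sd)

# equivalent form: sum(x^2)/n - mean^2, and the stdlib population variance
print(sum(x * x for x in data) / len(data) - (sum(data) / len(data)) ** 2)
print(statistics.pvariance(data))
```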

25
Q

experiment

A

repeatable process that gives rise to a number of outcomes

26
Q

event

A

collection of one or more outcomes

27
Q

sample space

A

set of all possible outcomes

28
Q

Venn diagrams

A

can be used to represent events graphically. frequencies or probabilities can be placed in the regions of the Venn diagram

29
Q

intersection, union, complement

A

intersection - A ∩ B
A and B
union - A ∪ B
A or B
complement - A′
not A, with P(A′) = 1 − P(A)
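Python sets mirror a sample space and its events, so intersection, union and complement map to `&`, `|` and set difference from the sample space. A minimal sketch, assuming a hypothetical experiment (one roll of a fair six-sided dice, so outcomes are equally likely) and hypothetical events A and B.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all possible outcomes of one roll
A = {2, 4, 6}                       # event: even number
B = {4, 5, 6}                       # event: number greater than 3

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

print(A & B)                # intersection A ∩ B: A and B
print(A | B)                # union A ∪ B: A or B
print(sample_space - A)     # complement A′: not A
print(P(sample_space - A) == 1 - P(A))
```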

30
Q

large data set facts
- daily mean temperature in °C - average of the hourly temperatures during a 24-hour period
- daily total rainfall - amounts less than 0.05 mm are recorded as tr (trace)
- daily total sunshine - recorded to the nearest tenth of an hour
- daily mean wind direction and windspeed in knots - 1 knot is 1 nautical mile per hour.
direction given as bearings, windspeed also given according to the Beaufort scale
- daily maximum gust (in knots) - highest instantaneous windspeed recorded, and its direction
- daily maximum relative humidity - given as a percentage of air saturation with water vapour. relative humidities above 95% give rise to misty and foggy conditions
- daily mean cloud cover - measured in oktas (in eighths of the sky covered by cloud, highest value it can be is 9)
- daily mean visibility - measured in decametres (Dm). the greatest horizontal distance at which an object can be seen in daylight
- daily mean pressure - measured in hectopascals (hPa)
- for overseas locations, the only data recorded are daily mean temperature, daily total rainfall, daily mean pressure and daily mean windspeed

A
  • if you need to do a numerical calculation involving a trace amount, you can treat it as 0
  • 1 kn (knot) ≈ 1.15 mph
  • missing data is represented as n/a or not available
  • give answers to the same degree of accuracy as the data values, with units
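The handling rules above can be sketched as a small parser. A minimal sketch, assuming entries arrive as text; the rainfall values are hypothetical, and leaving missing (n/a) values out of the mean is an assumption made here, not a rule stated on the card.

```python
KNOTS_TO_MPH = 1.15   # approximate conversion from the card

def parse_reading(raw):
    """Convert a large-data-set entry to a number.

    'tr' (trace) is treated as 0; 'n/a' or 'not available' becomes None.
    """
    text = raw.strip().lower()
    if text == "tr":
        return 0.0
    if text in ("n/a", "not available"):
        return None
    return float(text)

# hypothetical daily total rainfall entries (mm)
rainfall = ["0.4", "tr", "n/a", "2.1"]
values = [v for v in (parse_reading(r) for r in rainfall) if v is not None]
print(round(sum(values) / len(values), 1))   # same accuracy as the data (1 d.p.), in mm
print(round(12 * KNOTS_TO_MPH, 1))           # 12 knots in mph
```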
31
Q

Beaufort scale

A

Beaufort scale - descriptive term - average speed at 10 m above ground
0 - calm - less than 1 knot
1-3 - light - 1 to 10 knots
4 - moderate - 11 to 16 knots
5 - fresh - 17 to 21 knots
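The table above can be turned into a lookup. A minimal sketch covering only the forces listed on this card (0 to 5), assuming whole-knot mean windspeeds; the function name is hypothetical, and speeds above 21 knots return `None` because they are outside this part of the scale.

```python
def beaufort(knots):
    """Beaufort number and descriptive term for a mean windspeed in knots."""
    if knots < 1:
        return "0", "calm"
    if knots <= 10:
        return "1-3", "light"
    if knots <= 16:
        return "4", "moderate"
    if knots <= 21:
        return "5", "fresh"
    return None   # above force 5: not covered by this card

print(beaufort(0), beaufort(8), beaufort(14), beaufort(19))
```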