Stats and Mechs Flashcards

Question 1

Q

Population

Answer

A

Whole set of items that are of interest. Information can be obtained from a population

Question 2

Q

Raw data

Answer

A

Unprocessed information

Question 3

Q

Data vs information

Answer

A

Data: collection of raw unorganised facts
Information: collection of processed, organised facts placed into context

Question 4

Q

Census
- Observes/measures entire population
- Pros: should give completely accurate result
- Cons: time consuming, expensive, cannot be used when testing process destroys item, hard to process large quantity of data

Answer

A

“Testing process will destroy…, so a census would destroy all the…”

Question 5

Q

Sample

Answer

A

Selection of observations taken from a subset of the population, used to find information about the population as a whole
Pros: less time consuming and less expensive than a census; fewer people have to respond/less data to process than a census
Cons: data may not be as accurate; sample may not be large enough to give information about small sub-groups of the population

Question 6

Q

How the size of the sample can affect the validity of any conclusions drawn
- size depends on required accuracy + available resources
- the larger the sample, the more accurate it is + the more accurate the predictions, but greater resources are needed
- if the population is very varied, a larger sample is needed than if the population were uniform
- natural variation in the population means different samples can lead to different conclusions

Answer

A

“they could take a larger sample, for example… this would give a better estimate of the overall proportion of…”
“full coverage”

Question 7

Q

Sampling units

Answer

A

Individual units of a population, e.g. a university student/a house. Often individually named or numbered to form a list (sampling frame: the list of units a sample can be drawn from), e.g. list of university students/total number of houses in the locality/phone book/a map/electoral roll

Question 8

Q

Random sampling
- every member of the population has an equal chance of selection
- sample is representative of the population
- removes bias

Answer

A

simple random sampling, systematic sampling, stratified sampling

Question 9

Q

Simple random sampling
- every sample of same size has an equal chance of being selected
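- pros: no bias; easy and cheap to implement for small populations and samples; each sampling unit has a known, equal chance of selection
- cons: sampling frame needed; not suitable for large populations or samples (time consuming, expensive, disruptive)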

Answer

A

- a sampling frame is needed
- each member in the frame is allocated a unique number from 1 to the population size
- n of these numbers are chosen at random, where n is the sample size
- numbers are generated with a random number generator/calculator/computer/random number table, or by lottery sampling (members are written on tickets and placed in a hat; the required number of tickets is drawn out)
- go back to the population and select the members corresponding to the generated numbers
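
A minimal Python sketch of these steps, assuming the sampling frame is held as a list. The function name `simple_random_sample`, the made-up frame and the `seed` parameter are illustrative only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n members from the sampling frame without replacement."""
    rng = random.Random(seed)
    # Members are numbered 1..N; choose n distinct numbers at random.
    numbers = rng.sample(range(1, len(frame) + 1), n)
    # Go back to the frame and pick the members with those numbers.
    return [frame[k - 1] for k in numbers]


# Hypothetical frame of 50 members, purely for illustration.
frame = [f"member {i}" for i in range(1, 51)]
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Passing a seed only makes the run repeatable; leave it out for a fresh sample each time.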

Question 10

Q

random number table

Answer

A

assign each member a unique identifier with the same number of digits, e.g. 3-digit
so 000, 001…
- work along the rows of the random number table, reading off 3-digit numbers
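
A rough sketch of reading numbers off one row of a table. The row of digits below is made up, not taken from a real table, and skipping out-of-range numbers and repeats is an assumption about the usual practice rather than something stated on the card.

```python
def read_table_row(digits, width, pop_size, n):
    """Read fixed-width numbers along a row of random digits.

    Assumption: numbers outside 000..pop_size-1 and repeats are skipped.
    """
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        value = int(digits[i:i + width])
        if value < pop_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == n:
            break
    return chosen


row = "086152930441086777215"  # made-up row of digits
picked = read_table_row(row, width=3, pop_size=500, n=4)
print(picked)  # [86, 152, 441, 215]
```

Here 930 and 777 are skipped because they are not valid identifiers for a population of 500, and the second 086 is skipped as a repeat.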

Question 11

Q

Systematic sampling
- required elements chosen at regular intervals from an ordered list.
- pros: simple, quick to use, suitable for large populations and samples
- cons: sampling frame needed; can introduce bias if the frame is not random, e.g. MFMF; patterns in sample data might occur when taking every _ person

Answer

A

- allocate each member a number from 1 to the population size
- use a random number generator to select the first person, from 1 to the interval calculated
- “Select every (interval calculated)th person thereafter.”
e.g. for an interval of 5, if the first person chosen at random is number 2, the remaining would be 7, 12, 17 etc.
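
The same procedure as a short sketch. It assumes the interval is population size divided by sample size (the card only says "interval calculated"), and the population of 100 and sample of 20 are made-up figures chosen to give an interval of 5.

```python
import random


def systematic_sample(pop_size, sample_size, start=None, seed=None):
    """Return the numbered positions chosen by systematic sampling."""
    interval = pop_size // sample_size  # assumed rule for the interval
    if start is None:
        # First person chosen at random from 1 to the interval.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


positions = systematic_sample(pop_size=100, sample_size=20, start=2)
print(positions[:4])  # [2, 7, 12, 17]
```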

Question 12

Q

Stratified sampling
- population divided into mutually exclusive strata, e.g. F and M; a random sample is taken from each.
- pros: sample accurately reflects the population structure; proportional representation of groups within the population is guaranteed
- cons: population must be clearly classified into distinct strata; selection within each stratum has the same cons as simple random sampling

Answer

A

number sampled from a stratum = (stratum size/pop size) x required overall sample size

e.g. working out layout
cricket: 121/370 x 30 = 9.8 ≈ 10
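
The cricket calculation can be checked with a few lines of Python. The function name `stratum_sample_size` is illustrative; only the figures 121, 370 and 30 come from the example.

```python
def stratum_sample_size(stratum_size, pop_size, sample_size):
    """Number to sample from one stratum, before and after rounding."""
    exact = stratum_size / pop_size * sample_size
    return exact, round(exact)


exact, rounded = stratum_sample_size(121, 370, 30)
print(f"{exact:.1f} -> {rounded}")  # 9.8 -> 10
```

When every stratum is rounded separately, it is worth checking that the rounded values still add up to the required overall sample size.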

Question 13

Q

Quota sampling
- interviewer selects a sample that reflects the characteristics of the whole population. Population is divided into groups according to given characteristics; the size of each group determines the proportion of the sample that should have those characteristics. Interviewer meets people, assesses their group and allocates them to the appropriate quota; continues until the quotas are filled.
- pros: allows a small sample to still be representative of the population, no sampling frame required, quick and easy, allows easy comparison between different groups in the population
- cons: non-random so can introduce bias; population must be divided into groups (costly, can be inaccurate); increasing the scope -> more groups -> more time + expense; non-responses are not recorded

Answer

A

Maddison has a list of 210 pupils, and wants to find out which musical instrument they prefer listening to amongst the flute, the clarinet, the guitar and the saxophone. To take a sample of size 30, Maddison surveys the first 15 girls and the first 15 boys to arrive at the school.

non-responses elaboration: people who refuse to participate or cannot be reached are not included in the sample and are not recorded, which can affect how representative the sample is and potentially introduce bias.

Question 14

Q

Opportunity/convenience sampling
- taking the sample from people who are available at the time the study is carried out and who fit the criteria
- pros: easy to carry out, cheap
- cons: unlikely to provide a representative sample; highly dependent on the individual researcher (time, place)

Answer

A

“sample is likely to be biased towards … who …”
“improve by interviewing people at different locations + times, + increasing the sample size”

Question 15

Q

types of data

Answer

A

quantitative: associated with numerical observations
qualitative: associated with non-numerical observations
continuous variable: can take any value in a given range, e.g. height or time
discrete variable: can take only specific values in a given range, e.g. number of people can't be 2.65

Question 16

Q

when data presented in a grouped frequency table

Answer

A

groups = classes
specific data values are not shown
class boundaries: max and min values belonging in each class
midpoint = average of the class boundaries
class width = difference between upper and lower class boundaries
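
A tiny sketch of the midpoint and class width definitions. The class boundaries 29.5 and 31.5 are hypothetical values chosen for illustration.

```python
def midpoint(lower, upper):
    """Average of the lower and upper class boundaries."""
    return (lower + upper) / 2


def class_width(lower, upper):
    """Difference between the upper and lower class boundaries."""
    return upper - lower


# Hypothetical class with boundaries 29.5 and 31.5.
print(midpoint(29.5, 31.5))     # 30.5
print(class_width(29.5, 31.5))  # 2.0
```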

Question 17

Q

measure of location

Answer

A

value that describes a position in a data set.
if it describes the centre of the data, it is a measure of central tendency

-mean, mode, median

Question 18

Q

mode/modal class
- value/class that occurs most often
- for both qualitative and quantitative data.
- data can have a single mode or 2 modes (bimodal).
- not very informative if each value occurs only once

Answer

A

?explain why a shirt manufacturer might use the mode when planning production numbers
“it provides information on the most common size or item that is in demand among customers”

?write down the modal class
“34-36”

Question 19

Q

median (Q2)
- the middle value when the data values are put in order; splits the data set into 2 equal halves (50% each)
- for quantitative data. usually used when there are extreme values, as they do not affect it

Answer

A

for discrete data: median is the (n+1)/2 th value
for grouped data: median is the n/2 th value
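
A minimal sketch of the (n+1)/2 rule for discrete data, reading a position ending in .5 as the mean of the two values either side. The data lists are made up, and the grouped case is not covered here.

```python
def median_discrete(values):
    """Median of a list of discrete data using the (n+1)/2 th position."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered list
    if position.is_integer():
        return ordered[int(position) - 1]
    # A position such as 2.5 lies halfway between the 2nd and 3rd values.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_discrete([3, 1, 2]))     # 2
print(median_discrete([4, 1, 3, 2]))  # 2.5
```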

Question 20

Q

mean
x̄ = (∑ x) / n
- for quantitative data. uses all the pieces of data therefore gives a true measure of the data, but is affected by extreme values

Answer

A

(be specific in answer for the outliers)
“the mean is affected by the extreme value 26”

estimate the mean from a grouped frequency table:
take the midpoint of each class interval as x, then calculate ∑fx / ∑f
e.g.
((30.5 x 2) + (32.5 x 25)) / 27
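
The grouped-mean estimate worked through in Python, using the midpoints 30.5 and 32.5 and the frequencies 2 and 25 from the example. The function name `estimate_mean` is illustrative.

```python
def estimate_mean(midpoints, frequencies):
    """Estimate the mean from class midpoints and class frequencies."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


est = estimate_mean([30.5, 32.5], [2, 25])
print(round(est, 2))  # 32.35
```

The whole sum of midpoint x frequency is divided by the total frequency, so the brackets matter: (61 + 812.5) / 27.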

Question 21

Q

Quartiles
Lower quartile - Q1, 1/4 of the way through the dataset (25%)
Upper quartile - Q3, 3/4 of the way through the dataset (75%). 3/4 of n

Answer

A

for discrete data: work out n/4; if it is a decimal, round UP to the next whole position; if it is a whole number, add 0.5 (halfway between that value and the next)
e.g. 16/4 = 4 so Q1 = 4.5th value
45/4 = 11.25 so Q1 = 12th value
same goes for the upper quartile, using 3n/4
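
A sketch of the position rule for discrete data: round up a decimal, add 0.5 to a whole number. The helper name `quartile_position` and the n = 10 case are illustrative; n = 16 is the case from the card.

```python
import math


def quartile_position(n, quarter):
    """1-based position of a quartile for discrete data.

    quarter is 1 for Q1 and 3 for Q3.
    """
    p = quarter * n / 4
    if p == int(p):
        return p + 0.5   # whole number: halfway to the next value
    return math.ceil(p)  # decimal: round up


print(quartile_position(16, 1))  # 4.5
print(quartile_position(16, 3))  # 12.5
print(quartile_position(10, 1))  # 3
```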

Question 22

Q

Percentiles
- split the dataset into 100 parts.
- the 10th percentile lies 1/10 of the way through the data (10%). Written with P, e.g. P₁₀
- 85% of the data values are less than the 85th percentile and 15% are greater

Answer

A

to calculate:
the 85th percentile is 85% of the way through the data, so
position = 85/100 x n

Question 23

Q

Measures of spread/dispersion/variation = measures of how spread out the data is

Answer

A

range - difference between largest and smallest value in dataset.
interquartile range (IQR) - difference between the upper quartile and lower quartile. not affected by extreme values but only considers the spread of the middle 50% of the data
interpercentile range - difference between the values for two given percentiles. e.g. the 10th to 90th interpercentile range is often used since it is not affected by extreme values but still considers 80% of the data in its calculation
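
A small sketch comparing the range and the IQR on a made-up data set with one extreme value (30). It assumes the discrete-data position rule for quartiles (round up a decimal, add 0.5 to a whole number); the helper names are illustrative.

```python
import math


def value_at(ordered, position):
    """Value at a 1-based position; a .5 position averages its neighbours."""
    if position == int(position):
        return ordered[int(position) - 1]
    lo = ordered[math.floor(position) - 1]
    hi = ordered[math.ceil(position) - 1]
    return (lo + hi) / 2


def quartile(ordered, quarter):
    """Q1 (quarter=1) or Q3 (quarter=3) of ordered discrete data."""
    p = quarter * len(ordered) / 4
    position = p + 0.5 if p == int(p) else math.ceil(p)
    return value_at(ordered, position)


data = sorted([2, 4, 5, 7, 8, 10, 11, 30])  # hypothetical data
data_range = data[-1] - data[0]
iqr = quartile(data, 3) - quartile(data, 1)
print(data_range, iqr)  # 28 6.0
```

The extreme value inflates the range to 28, while the IQR of 6 only reflects the middle 50% of the data.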

Question 24

Q

measure of spread of a data set: variance. makes use of the fact that each data point deviates from the mean by the amount x - x̄

Answer

A

variance - msmsm: mean of the squares minus the square of the mean, σ² = ∑x²/n - (∑x/n)²
standard deviation - square root of the variance
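
A short sketch of the variance and standard deviation, reading "msmsm" as "mean of the squares minus the square of the mean" (an assumption about the mnemonic). The data set is made up, and the result is checked against `statistics.pvariance` from the standard library.

```python
import math
import statistics


def variance_msmsm(values):
    """Variance as mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
var = variance_msmsm(data)
sd = math.sqrt(var)
print(var, sd)                     # 4.0 2.0
print(statistics.pvariance(data))  # population variance, also 4
```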

Question 25

Q

experiment

Answer

A

repeatable process that gives rise to a number of outcomes

Question 26

Q

event

Answer

A

collection of one or more outcomes

Question 27

Q

sample space

Answer

A

set of all possible outcomes

Question 28

Q

venn diagrams

Answer

A

can be used to represent events graphically. frequencies or probabilities can be placed in the regions of the venn diagram

Question 29

Q

intersection, union, complement

Answer

A

intersection - A ∩ B
A and B
union - A ∪ B
A or B
complement - A’
not A, P(A’) = 1 - P(A)
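
These three operations map directly onto Python sets. The sketch below assumes one roll of a fair die as the sample space; the events A and B are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (hypothetical)
A = {2, 4, 6}                      # "even number"
B = {4, 5, 6}                      # "more than 3"


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


intersection = A & B             # A and B
union = A | B                    # A or B
complement_A = sample_space - A  # not A

print(intersection, union, complement_A)
print(prob(complement_A) == 1 - prob(A))  # True
```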

Question 30

Q

large data set facts
- daily mean temp in Celsius - avg of hourly temps during a 24-hr period
- daily total rainfall - amounts less than 0.05mm are recorded as tr/trace
- daily total sunshine - recorded to the nearest tenth of an hour
- daily mean wind direction and windspeed in knots - a knot is one nautical mile per hr.
direction given as bearings, windspeed also categorised on the Beaufort scale
- daily max gust (in knots) - highest instantaneous windspeed recorded + its direction
- daily maximum relative humidity - given as a percentage of air saturation with water vapour. relative humidities above 95% give rise to misty and foggy conditions
- daily mean cloud cover - measured in oktas (eighths of the sky covered by cloud; highest value it can be is 8)
- daily mean visibility - measured in decametres (Dm). greatest horizontal distance at which an object can be seen in daylight
- daily mean pressure - measured in hectopascals (hPa)
- for overseas locations, the only data recorded are daily mean temp, daily total rainfall, daily mean pressure, daily mean windspeed

Answer

A

if a numerical calculation involves a trace amount, treat it as 0
1 kn (knot) = 1.15 mph
missing data is represented as n/a or not available
give answers to the same degree of accuracy as the data values, and include units.
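
A sketch of how these rules might be applied when cleaning values by hand. The readings list is made up, the function names are illustrative, and dropping n/a entries from a total is one possible way to handle missing data rather than a rule from the card.

```python
KNOT_IN_MPH = 1.15  # conversion factor given on the card


def rainfall_value(entry):
    """Convert a rainfall entry: trace -> 0.0, n/a -> None, else a float."""
    text = str(entry).strip().lower()
    if text in ("tr", "trace"):
        return 0.0
    if text in ("n/a", "not available"):
        return None
    return float(text)


def knots_to_mph(knots):
    """Convert a windspeed in knots to miles per hour."""
    return knots * KNOT_IN_MPH


readings = ["0.2", "tr", "n/a", "1.4"]  # made-up daily rainfall entries (mm)
usable = [v for v in map(rainfall_value, readings) if v is not None]
total = sum(usable)
print(round(total, 1), "mm")             # 1.6 mm
print(round(knots_to_mph(10), 1), "mph")  # 11.5 mph
```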

Question 31

Q

beaufort scale

Answer

A

Beaufort scale - descriptive term - avg windspeed at 10m above ground
0 - calm - less than 1 knot
1-3 - light - 1 to 10 knots
4 - moderate - 11 to 16 knots
5 - fresh - 17 to 21 knots
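
The table can be turned into a simple lookup. This sketch only covers the rows listed above and returns None for anything stronger; treating speeds between the whole-knot bands (e.g. 10.5 knots) as belonging to the higher band is an assumption.

```python
def beaufort_term(knots):
    """Descriptive term for an average windspeed in knots (rows 0 to 5 only)."""
    if knots < 1:
        return "calm"
    if knots <= 10:
        return "light"
    if knots <= 16:
        return "moderate"
    if knots <= 21:
        return "fresh"
    return None  # beyond the rows listed on the card


print(beaufort_term(0.5))  # calm
print(beaufort_term(12))   # moderate
print(beaufort_term(21))   # fresh
```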