stats Flashcards
what is a population?
the whole set of items that are of interest
what is a sample?
some subset of items chosen from the population
what is a sampling unit?
each individual thing in the population that can be sampled
what is a sampling frame?
a numbered list of the entire population - individually named or numbered
what is a census?
data collected from an entire population
compare census and sample:
census:
✓gives a 100% accurate result
✖time consuming
✖expensive
✖cannot be used when testing involves destruction
✖large volume of data to process
sample:
✖data may not be large enough to represent small sub-groups
how could you improve a sample?
use a larger sample size
mention a number (e.g. 10% of the population size)
what is a sampling error?
the difference between the actual value and the value you got from a sample
eg: comparing your sample results to the census results
What is bias?
the systematic error in the collection of the sample
what can result in a biased sample?
- sample not representative of the population
- leading questions, e.g. "are you a law-abiding citizen?"
- the wrong person asking questions
- small sample size
what is random sampling?
every item has an equal chance of being selected for sample
what is non-random sampling?
sample selection is based on factors other than just random chance
what is simple random sampling?
SRS
every possible sample of size n has an equal chance of being selected
how would you carry out simple random sampling?
- allocate a number between 1 and N to each individual, so that every item in the sampling frame has an identifying number
- use a random number generator to select ‘15’ different numbers between 1 and ‘120’; discard any repeated numbers or numbers above ‘120’; read as many digits at a time as N has (3 digits for ‘120’)
- **individuals corresponding to these numbers become the sample**
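The steps above can be sketched in Python. This is only an illustration: the frame size of 120 and sample size of 15 are the numbers on the card, while the function name `simple_random_sample` and the `seed` argument are made up for the example.

```python
import random

def simple_random_sample(frame_size, sample_size, seed=None):
    """Pick sample_size different numbers between 1 and frame_size."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    # random.sample never returns the same number twice, which does the job
    # of "discard any repeated numbers" in the by-hand method.
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))

chosen_numbers = simple_random_sample(120, 15, seed=1)
print(chosen_numbers)  # the individuals with these numbers form the sample
```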
what are the advantages and disadvantages of simple random sampling?
✓bias free
✓cheap and easy to implement
✓each individual has a known equal chance of being selected
✖not suitable when population size is large
✖sampling frame needed
what is systematic sampling?
required elements are chosen at regular intervals in ordered list
(first person is also chosen at random)
the sampling frame must be in random order; there should be no patterns
how would you carry out systematic sampling?
- determine k (population size ÷ sample size)
- randomly select a number between 1 and k
- start with the individual with this number
- then select every kth person after that
- these people will be in the sample
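A rough Python sketch of these steps, assuming k is the population size divided by the sample size (rounded down). The frame of 120 numbered individuals and the name `systematic_sample` are invented for the example.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Take every kth item from an ordered frame, starting at a random point."""
    rng = random.Random(seed)
    k = len(frame) // sample_size   # sampling interval
    start = rng.randint(1, k)       # random starting position between 1 and k
    # positions start, start + k, start + 2k, ... (counting from 1)
    return [frame[start - 1 + i * k] for i in range(sample_size)]

frame = list(range(1, 121))         # hypothetical frame: individuals 1 to 120
systematic = systematic_sample(frame, 15, seed=1)
print(systematic)
```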
what are the advantages and disadvantages of systematic sampling?
✓simple
✓suitable for large samples
✖sampling frame needed
✖can introduce bias if the sampling frame is not random
what is stratified sampling?
the population is divided into groups (strata) and a simple random sample is carried out in each group
same proportion is taken from each group
how would you carry out stratified sampling?
- perform calculation to know how many you want from each group
- number the members of each group, e.g. from ‘1-15’
- use a random number generator to select ‘2’ different numbers from 1-15
- those with the corresponding numbers become the sample
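The calculation and selection can be sketched as below. The two year groups, their sizes (15 and 30) and the total sample size of 6 are made-up numbers, chosen so that the first group gives 2 out of 15 as on the card; `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Take the same proportion from each group using a simple random sample."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number from this group = (group size / population size) x sample size
        n = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n)
    return sample

strata = {
    "year 12": [f"Y12-{i}" for i in range(1, 16)],  # 15 students
    "year 13": [f"Y13-{i}" for i in range(1, 31)],  # 30 students
}
stratified = stratified_sample(strata, 6, seed=1)
print(stratified)
```

When the calculation does not give whole numbers, the rounded group sizes may not add up exactly to the intended total, so check the total by hand.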
what are the advantages and disadvantages of stratified sampling?
✓reflects population structure
✓guarantees proportional representation of groups within population
✖population must be clearly classified into distinct strata
✖sampling frame needed
what is convenience sampling?
samples are taken from people who are available at time of study, who meet criteria
what are the advantages and disadvantages of convenience (opportunity) sampling?
✓easy to carry out
✓inexpensive
✖unlikely to provide a representative sample, reflect the
✖highly dependent on individual researcher
what is cluster sampling ?
each cluster is defined (should be representative of population)
collect samples from each cluster
eg: taking samples from each grammar school
what are the advantages and disadvantages of cluster sampling?
✓no sampling frame
✓inexpensive
✖unlikely to provide a representative sample because clusters tend to have similar characteristics
what is quota sampling?
population is divided into groups according to a characteristic.
A quota of items is set to try to reflect each group's proportion in the whole population
interviewer selects the actual sampling units
non random stratified sampling
what are the advantages and disadvantages of quota sampling?
✓sample is representative of population
✓no sampling frame required
✓easy/inexpensive
✓allows easy comparison between different groups in population
✖non random so can introduce bias
✖population must be divided into groups, which can be costly or inaccurate
✖non-responses are not recorded
What are the different types of data?
Which type of data is the following
How do you conduct linear interpolation?
How would you compare or describe a data set?
Measures of central tendency Describe the centre of data.
Measures of spread suggest how consistent the data is
What would you consider an outlier (using IQR)?
Anything more than 1.5 × IQR below the LQ or above the UQ
What would you do when there are gaps between each interval?
What is the notation for mean?
How do you calculate the mean on your calculator?
Enter these values into the calculator to find the mode, mean and median
Because there are two columns, make sure you tell it that the frequency column is list 2
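To check a calculator answer, the same two-list idea can be sketched in Python. The values and frequencies here are made up for illustration; list 1 holds the values and list 2 holds the frequencies.

```python
import statistics

values = [1, 2, 3, 4]         # list 1 (hypothetical data)
frequencies = [3, 5, 8, 4]    # list 2: how many times each value occurs

n = sum(frequencies)
# mean = sum of (value x frequency) / total frequency
mean = sum(x * f for x, f in zip(values, frequencies)) / n

# write every value out as many times as its frequency to get the median
expanded = [x for x, f in zip(values, frequencies) for _ in range(f)]
median = statistics.median(expanded)

# the mode is the value with the highest frequency
mode = max(zip(values, frequencies), key=lambda pair: pair[1])[0]

print(mean, median, mode)
```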
what is self selected sampling?
people are asked whether they want to take part
what are the pros and cons of self selected sampling?
✓cheap/easy
✖biased results
Find the standard deviation
Set the 1Var Frequency to List 2
Then calculate Var1
sx (the one our exam board uses) = 5.02
Calculate the mean
What is the position of the median?
(n+1) /2
Why would median be used over the mean?
If there are extreme values, you would not use the mean because it is affected by them; the median is not.
Which position would be used for the 57th percentile?
Find an estimate for the median
How would you determine the “skew” of the data?
The skew is named after the side with the fewest values (the tail)
If mean > median: positively skewed
If mean < median: negatively skewed
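The mean and median rule can be tried out with a short sketch. Both data sets are invented, and `skew_direction` is just a name for the example.

```python
import statistics

def skew_direction(data):
    """Compare mean and median to suggest the direction of skew."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "no skew indicated"

# tail of large values on the right: mean 5, median 3
right_tail = skew_direction([1, 2, 2, 3, 3, 4, 20])
# tail of small values on the left: mean 16, median 18
left_tail = skew_direction([1, 17, 18, 18, 19, 19, 20])
print(right_tail, left_tail)
```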
What are the two different formulas for standard deviation?
sx=
What is considered an outlier using standard deviation?
Any value more than 2 standard deviations above or below the mean
Greater than mean + 2(sx)
Or less than mean - 2(sx)
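A small sketch of this check, using made-up data. It assumes the standard deviation that divides by n (`statistics.pstdev`); if your calculator value divides by n - 1, swap in `statistics.stdev`.

```python
import statistics

def sd_outliers(data, multiplier=2):
    """Return values more than `multiplier` standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)    # assumption: the divide-by-n version
    lower = mean - multiplier * sd
    upper = mean + multiplier * sd
    return [x for x in data if x < lower or x > upper]

flagged_by_sd = sd_outliers([10, 12, 11, 13, 12, 11, 12, 40])
print(flagged_by_sd)  # [40]
```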
What would be considered an outlier using the IQR?
Any value more than 1.5 × IQR above the UQ or below the LQ
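As a rough sketch with invented numbers: the quartiles are passed in rather than calculated, because different methods for finding quartiles can give slightly different values. `iqr_outliers` is a made-up name.

```python
def iqr_outliers(data, lq, uq):
    """Return values more than 1.5 x IQR below the LQ or above the UQ."""
    iqr = uq - lq
    lower_limit = lq - 1.5 * iqr
    upper_limit = uq + 1.5 * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]

# hypothetical data with LQ = 10 and UQ = 14, so the limits are 4 and 20
flagged_by_iqr = iqr_outliers([2, 10, 11, 12, 13, 14, 30], lq=10, uq=14)
print(flagged_by_iqr)  # [2, 30]
```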
What does variance measure ?
The average of the squared differences between each point and the mean
What does standard deviation measure?
Looks at how spread out a group of numbers is from the mean
What are some common mistakes when calculating standard deviation?
Using (Σx)² instead of Σx²
Using a rounded version of the mean which gives rounding errors
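The sketch below works through both ways of calculating standard deviation on made-up data and shows why mixing up (Σx)² and Σx² matters. It assumes the versions that divide by n; check which divisor your exam board expects.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(data)
mean = sum(data) / n               # 5.0

# Way 1: square root of the mean of the squared deviations from the mean
sd_from_deviations = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Way 2: square root of (mean of the squares minus square of the mean)
sum_x = sum(data)                          # Σx = 40
sum_x_squared = sum(x ** 2 for x in data)  # Σx² = 232
sd_from_sums = math.sqrt(sum_x_squared / n - (sum_x / n) ** 2)

print(sd_from_deviations, sd_from_sums)    # 2.0 2.0

# The common mistake: (Σx)² is a very different number from Σx²
print(sum_x ** 2, sum_x_squared)           # 1600 232
```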