Statistics Flashcards
Define population
The whole set of items that are of interest
Define census
Observes or measures every member of a population
Define sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
Advantages and disadvantages of a census and of a sample
Advantages of census
• It should give a completely accurate result
Disadvantages of census
• Time consuming and expensive
• Cannot be used when the testing process destroys the item
• Hard to process a large quantity of data
Advantages of sample
• Less time consuming and expensive than a census
• Fewer people have to respond
• Less data to process than in a census
Disadvantages of sample
• The data may not be as accurate
• The sample may not be large enough to give information about small subgroups of the population
Define sampling units
Individual units of a population
Define sampling frame
A list of the sampling units of a population, in which each unit is individually named or numbered
3 methods of random sampling
• simple random sampling
• systematic sampling
• stratified sampling
Define simple random sampling and give its advantages and disadvantages
The researcher randomly selects a subset of participants from a population
Advantages
• Free of bias
• Easy and cheap to implement for small populations and small samples
• Each sampling unit has a known and equal chance of selection
Disadvantages
• Not suitable when the population size or the sample size is large as it is potentially time consuming, disruptive and expensive.
• A sampling frame is needed
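A minimal Python sketch of simple random sampling, assuming the sampling frame is a Python list of numbered units. `random.sample` draws without replacement, so every sample of size n is equally likely. The function name `simple_random_sample` and the frame of units 1 to 100 are illustrative only.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Choose n distinct sampling units; every sample of size n is equally likely."""
    rng = random.Random(seed)  # optional seed so the draw can be repeated
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: units numbered 1 to 100
frame = list(range(1, 101))
chosen = simple_random_sample(frame, 10, seed=1)
print(sorted(chosen))
```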
Advantages and disadvantages of systematic sampling
Advantages
• Simple and quick to use
• Suitable for large samples and large populations
Disadvantages
• A sampling frame is needed
• It can introduce bias if the sampling frame is not random
Advantages and disadvantages of stratified sampling
Advantages
• Sample accurately reflects the population structure
• Guarantees proportional representation of groups within a population
Disadvantages
• Population must be clearly classified into distinct strata
• Selection within each stratum suffers from the same disadvantages as simple random sampling
Define a simple random sample of size n
Every sample of size n has an equal chance of being selected
Define systematic sampling
The required elements are chosen at regular intervals from an ordered list
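A sketch of systematic sampling, assuming the usual approach of taking every k-th item, with k = population size ÷ sample size (rounded down here) and a random start within the first k items. `systematic_sample` and the 200-item list are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Choose n elements at regular intervals from an ordered list."""
    k = len(ordered_list) // n  # sampling interval (assumes the list has at least n items)
    rng = random.Random(seed)
    start = rng.randrange(k)    # random start within the first interval: index 0 to k - 1
    return [ordered_list[start + i * k] for i in range(n)]

# Hypothetical ordered list of 200 numbered items, sample of 20, so every 10th item
frame = list(range(1, 201))
chosen = systematic_sample(frame, 20, seed=3)
print(chosen)
```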
Define stratified sampling
The population is divided into mutually exclusive strata (e.g. males and females) and a random sample is taken from each
Formula to calculate the number of people we should sample from each stratum
The number sampled from a stratum = (number in stratum / number in population) × overall sample size; if this is not a whole number, round to the nearest whole number
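The stratum formula can be checked with a short Python sketch. The stratum names and sizes are hypothetical, and rounding exact halves upward is an assumed convention.

```python
import math

def stratum_sizes(strata, sample_size):
    """Return how many units to sample from each stratum.

    strata maps each stratum's name to its size; each result is
    (number in stratum / number in population) x overall sample size.
    """
    population = sum(strata.values())
    sizes = {}
    for name, size in strata.items():
        exact = size * sample_size / population
        # Round to the nearest whole number with halves going up; Python's
        # built-in round() would send exact halves to the nearest even integer.
        sizes[name] = math.floor(exact + 0.5)
    return sizes

# Hypothetical firm: 35 part-time and 65 full-time workers, overall sample of 12
print(stratum_sizes({"part-time": 35, "full-time": 65}, 12))  # {'part-time': 4, 'full-time': 8}
```

Because each stratum is rounded separately, the rounded sizes will not always add up exactly to the overall sample size.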
2 types of non-random sampling
• quota sampling
• opportunity sampling
Define quota sampling
An interviewer or researcher selects a sample that reflects the characteristics of the whole population
Define opportunity sampling
Taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
Advantages and disadvantages of quota sampling
Advantages
• Allows a small sample to still be representative of the population
• No sampling frame required
• Quick, easy and inexpensive
• Allows for easy comparison between different groups within a population
Disadvantages
• Non-random sampling can introduce bias
• Population must be divided into groups, which can be costly or inaccurate
• Increasing scope of study increases number of groups, which adds time and expense
• Non-responses are not recorded as such
Advantages and disadvantages of opportunity sampling
Advantages
• Easy to carry out
• Inexpensive
Disadvantages
• Unlikely to provide a representative sample
• Highly dependent on individual researcher
Define quantitative variables/data
Variables or data associated with numerical observations
Define qualitative variables/data
Variables or data associated with non-numerical observations
Define continuous variable
A variable that can take any value in a given range
Define discrete variable
A variable that can take only specific values in a given range
Define mode or modal class
The value or class that occurs most often
Define median
The middle value when the data values are put in order
Formula of mean
$\bar{x} = \frac{\sum x}{n}$
Formula for mean in frequency table
$\bar{x} = \frac{\sum fx}{\sum f}$
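Both mean formulas translate directly into Python; this is a sketch with made-up data. For grouped data, the values x would be the class midpoints.

```python
def mean(xs):
    """x-bar = (sum of x) / n"""
    return sum(xs) / len(xs)

def mean_from_frequency_table(values, frequencies):
    """x-bar = (sum of f*x) / (sum of f)"""
    return sum(f * x for x, f in zip(values, frequencies)) / sum(frequencies)

print(mean([2, 4, 4, 5]))                               # 15 / 4 = 3.75
print(mean_from_frequency_table([1, 2, 3], [4, 5, 1]))  # (4 + 10 + 3) / 10 = 1.7
```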
How to find the median of listed data and of grouped data
Listed data:
Find n/2
- If n/2 is not a whole number, round up and take that data value
- If n/2 is a whole number, take the value halfway between this data value and the one after
Grouped data:
Find n/2, then use linear interpolation
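A sketch of the listed-data rule in Python, working from position n/2: if n/2 is not a whole number, round up and take that value; otherwise take the value halfway between that value and the next one. The function name `median_listed` is hypothetical.

```python
import math

def median_listed(data):
    """Median of listed data using the n/2 position rule."""
    xs = sorted(data)
    half = len(xs) / 2
    if not half.is_integer():
        # n/2 is not a whole number: round up and take that value
        return xs[math.ceil(half) - 1]  # -1 turns a 1-based position into a list index
    k = int(half)
    # n/2 is a whole number: halfway between the k-th value and the one after
    return (xs[k - 1] + xs[k]) / 2

print(median_listed([7, 3, 9, 1, 5]))  # 5
print(median_listed([8, 2, 6, 4]))     # 5.0
```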
Linear interpolation
Lower class boundary + ((amount into class / frequency of class) × class width)
Amount into class = required position − cumulative frequency up to the start of the class
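A minimal linear interpolation sketch, assuming the grouped data is given as (lower boundary, upper boundary, frequency) tuples in ascending order; the table itself is hypothetical. The required position would be n/2 for the median, n/4 for $Q_1$, or n × 0.57 for $P_{57}$.

```python
def interpolate(classes, position):
    """Estimate the value at a given position in grouped data by linear interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency) in ascending order.
    Result = lower class boundary + (amount into class / frequency of class) x class width.
    """
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            into_class = position - cumulative
            return lower + (into_class / freq) * (upper - lower)
        cumulative += freq
    raise ValueError("position is larger than the total frequency")

# Hypothetical grouped data with n = 40
table = [(0, 10, 10), (10, 20, 20), (20, 30, 10)]
n = sum(f for _, _, f in table)
print(interpolate(table, n / 2))  # median: 10 + (10 / 20) x 10 = 15.0
print(interpolate(table, n / 4))  # Q1: 0 + (10 / 10) x 10 = 10.0
```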
$P_{57}$ of 43 numbers
43 × 0.57 = 24.51, so interpolate using the 24.51st number
$Q_1$ of 100 numbers
100 ÷ 4 = 25, so interpolate using the 25th number
$P_{10}$ of 41 numbers
41 × 10% = 4.1, so interpolate using the 4.1st number
Variance formula
$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$
Standard deviation
$\sigma = \sqrt{\text{variance}}$
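The variance and standard deviation formulas (dividing by n) as a Python sketch with made-up data; for this divide-by-n definition, the standard library's `statistics.pvariance` and `statistics.pstdev` should give the same results.

```python
import math

def variance(xs):
    """sigma^2 = (sum of x^2) / n - (mean)^2"""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2

def standard_deviation(xs):
    """sigma = square root of the variance"""
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))            # 232 / 8 - 5^2 = 4.0
print(standard_deviation(data))  # 2.0
```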
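Coding and standard deviation
Only multiplying or dividing affects the standard deviation; adding or subtracting a constant does not. If $y = \frac{x - a}{b}$ with $b > 0$, then $\sigma_y = \frac{\sigma_x}{b}$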
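A quick check of the coding rule with the standard library's `statistics.pstdev` (population standard deviation, dividing by n) on made-up data: subtracting 100 leaves the standard deviation unchanged, while dividing by 2 halves it.

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [v - 100 for v in x]  # subtract a constant: y = x - 100
scaled = [v / 2 for v in x]     # divide by a constant: y = x / 2

print(statistics.pstdev(x))        # 2.0
print(statistics.pstdev(shifted))  # 2.0 (unchanged)
print(statistics.pstdev(scaled))   # 1.0 (halved)
```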
Common definition of an outlier
A value either greater than $Q_3 + k(Q_3 - Q_1)$
or less than $Q_1 - k(Q_3 - Q_1)$, where k is a given constant (often 1.5)
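A sketch of the outlier rule. k = 1.5 is an assumed default (questions usually state k), and the quartiles are passed in directly because conventions for calculating them vary. Function names and data are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers(data, q1, q3, k=1.5):
    """Values lying below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

# Hypothetical data with Q1 = 20 and Q3 = 30, so the fences are 5 and 45
print(outlier_fences(20, 30))                     # (5.0, 45.0)
print(outliers([3, 22, 25, 28, 50], 20, 30))      # [3, 50]
```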
Cleaning the data
The process of removing anomalies from a data set
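Formula to calculate the height of each bar (frequency density) on a histogram
Frequency density = frequency ÷ class width, so that the area of each bar is proportional to its frequency (area of bar = k × frequency)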
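A sketch computing bar heights as frequency ÷ class width, which corresponds to taking k = 1 so that each bar's area equals its frequency. The grouped table is hypothetical.

```python
def frequency_densities(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency).

    Bar height = frequency / class width.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]

table = [(0, 10, 5), (10, 15, 8), (15, 30, 12)]  # hypothetical grouped data
print(frequency_densities(table))  # [0.5, 1.6, 0.8]
```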
Frequency polygon from histogram
Join the midpoints of the tops of the bars with straight lines
When comparing data sets, comment on:
• A measure of location
• A measure of spread
What is bivariate data?
Data which has pairs of values for two variables
What does correlation describe?
The nature of the linear relationship between two variables
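Causal relationship
A change in one variable causes a change in the other; correlation alone does not show that one variable causes the other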
Regression line $y = a + bx$
The coefficient b tells you the change in y for each unit change in x
How does correlation affect the sign of b?
• If the data is positively correlated, b will be positive
• If the data is negatively correlated, b will be negative
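The sign of b can be seen from the least squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, which are standard results not stated on these cards. A Python sketch with made-up, perfectly correlated data:

```python
def regression_line(xs, ys):
    """Least squares line y = a + bx, with b = Sxy / Sxx and a = y-bar - b * x-bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

print(regression_line([1, 2, 3, 4], [3, 5, 7, 9]))  # positive correlation: (1.0, 2.0)
print(regression_line([1, 2, 3, 4], [8, 6, 4, 2]))  # negative correlation: (10.0, -2.0)
```

Since $S_{xx}$ is always positive, b takes the same sign as $S_{xy}$, which is positive for positively correlated data and negative for negatively correlated data.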
When should you use the regression line?
To make predictions for values of the dependent variable that are within the range of the given data
Venn diagram
Mutually exclusive events
$P(A \cup B) = P(A) + P(B)$
Independent events
$P(A \cap B) = P(A) \times P(B)$
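A sketch checking both rules with exact fractions for one roll of a fair six-sided die; the events A, B, C and D are hypothetical examples.

```python
from fractions import Fraction

# One roll of a fair six-sided die (hypothetical example)
outcomes = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

# Mutually exclusive: A and B share no outcomes, so P(A or B) = P(A) + P(B)
A = {1, 2}
B = {5, 6}
print(A & B)                  # set()
print(p(A | B), p(A) + p(B))  # 2/3 2/3

# Independent: P(C and D) = P(C) x P(D)
C = {2, 4, 6}  # even number
D = {1, 2}     # 2 or less
print(p(C & D), p(C) * p(D))  # 1/6 1/6
```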
Tree diagram