Stats- DATA Flashcards
What are range and IQR a measure of?
A measure of the spread around the median
What are variance and SD a measure of?
Measure of spread around the mean
When using histograms, what is the formula for frequency?
Frequency= frequency density x class width (width of the boxes)
When using histograms, what information is needed to calculate mean and SD?
Frequency, midpoint
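As a rough sketch of how the two histogram cards fit together, the Python below recovers each class frequency from frequency density x class width, then estimates the mean and SD from the class midpoints. The function name grouped_mean_sd and the example classes are made up for illustration, and the SD here divides by n (the population form); check which form your course expects.

```python
import math

def grouped_mean_sd(classes):
    """Estimate mean and SD from histogram classes.

    classes: list of (lower, upper, frequency_density) tuples.
    """
    # Frequency = frequency density x class width
    freqs = [(upper - lower) * density for lower, upper, density in classes]
    # Each class is represented by its midpoint
    mids = [(lower + upper) / 2 for lower, upper, _ in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    # Variance = (sum of f x^2) / n - mean^2
    variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
    return mean, math.sqrt(variance)

# Hypothetical histogram: frequencies work out as 5, 15 and 10
classes = [(0, 10, 0.5), (10, 20, 1.5), (20, 40, 0.5)]
mean, sd = grouped_mean_sd(classes)
print(round(mean, 2), round(sd, 2))
```

For these made-up classes the estimates come out at roughly 18.33 for the mean and 8.98 for the SD.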
What are the equations for each quartile?
Q1 = the (n+1)/4 th value in the ordered data
Q2 = the (n+1)/2 th value in the ordered data
Q3 = the 3(n+1)/4 th value in the ordered data
What is the equation for the IQR?
Q3 - Q1
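A minimal sketch of the quartile positions in Python, assuming the data is already sorted and that a non-whole position is handled by linear interpolation between the two neighbouring values (exam boards differ on this rounding rule, so treat that part as an assumption). The function name quartile is made up for illustration. The IQR is then Q3 minus Q1.

```python
def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2 or 3) using the position k(n+1)/4."""
    n = len(sorted_data)
    pos = k * (n + 1) / 4      # 1-based position in the ordered data
    lower = int(pos)           # whole part of the position
    frac = pos - lower         # fractional part, used to interpolate
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)

data = [2, 4, 5, 7, 8, 10, 13]   # hypothetical ordered data, n = 7
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

With n = 7 the positions are the 2nd, 4th and 6th values, so Q1 = 4, Q2 = 7, Q3 = 10 and the IQR is 6.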
How could removing outliers be a useful or not a useful thing?
Useful if the outliers are data errors
Not useful if the outliers are genuine data, because removing them misrepresents the data
What are the equations for if data is an outlier?
Outlier if x:
> Q3 + 1.5(Q3-Q1)
< Q1 - 1.5(Q3-Q1)
(1.5 x IQR is the usual rule; use the multiplier given in the question if it is different)
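A short Python sketch of the outlier check, assuming the common convention of 1.5 x IQR beyond the quartiles (the multiplier k is a parameter because some questions specify a different one). The function names and data are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """List the values lying outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

data = [2, 4, 5, 7, 8, 10, 30]   # hypothetical data with Q1 = 4, Q3 = 10
print(outlier_fences(4, 10))      # (-5.0, 19.0)
print(find_outliers(data, 4, 10)) # [30]
```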
When comparing two sets of data, what can you talk about with median and means?
Average
When comparing two sets of data, what can you talk about with SD and IQR?
The spread of data, consistency and variability
How can you tell if data is symmetrical, positively skewed or negatively skewed?
Symmetrical- mode= median= mean
Q3-Q2 = Q2-Q1
Positively skewed- mode<median<mean
Q3-Q2 > Q2-Q1
Negatively skewed- mode>median>mean
Q3-Q2 < Q2-Q1
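The quartile comparison can be written as a small Python check. This is only a sketch: the function name is made up, and real data is rarely exactly symmetrical, so in practice "roughly equal" gaps are usually described as symmetrical.

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the gaps either side of the median."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if upper_gap < lower_gap:
        return "negative"
    return "symmetrical"

print(skew_from_quartiles(4, 6, 12))   # positive: upper gap 6 > lower gap 2
print(skew_from_quartiles(4, 10, 12))  # negative: upper gap 2 < lower gap 6
print(skew_from_quartiles(4, 8, 12))   # symmetrical: both gaps are 4
```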
What is bivariate data?
Data that comes in pairs
(X,Y)
What is the PMCC and how can it be interpreted?
Product moment correlation coefficient
r= 0 means there is no correlation, and the data does not have linear patterns
r= -1 means perfect negative correlation
r= +1 means perfect positive correlation
The closer to zero, the weaker the correlation
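For reference, a minimal Python sketch of the PMCC using the summary statistics Sxx, Syy and Sxy, with r = Sxy / sqrt(Sxx x Syy). The function name pmcc and the data are hypothetical, and the sketch assumes neither variable is constant (otherwise the division fails).

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
print(pmcc(xs, [2, 4, 6, 8]))  # points on a rising straight line, r = 1
print(pmcc(xs, [8, 6, 4, 2]))  # points on a falling straight line, r = -1
```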
What is a regression line? What is its equation?
The least squares line of best fit
y = a + bx
Where a is the y intercept and b is the gradient
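A sketch of how a and b can be found in Python, assuming the usual least squares regression of y on x: b = Sxy / Sxx and a = (mean of y) - b x (mean of x). The function name and data are made up for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # y intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Here the gradient of 2 would be read as: for every increase of 1 unit in x, y increases by 2 units.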
How do you interpret the gradient of a regression line?
For every increase of 1 (unit of x), (y) increases/ decreases by (the gradient) (units of y)
On regression lines, what are the units for the gradient?
The units of y per the units of x
What are the three types of random sampling?
Simple random
Stratified
Systematic
What is a simple random sample?
Each member of the population is allocated a number, and a random number generator is used to randomly select individuals (ignoring repeats of numbers)
Each member of the population has equal chances of being selected
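A minimal Python sketch of a simple random sample, using the standard library's random.sample, which selects without replacement (so repeats are avoided automatically). The function name, the seed parameter and the numbered population are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)   # a seed only makes the example repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: members numbered 1 to 200
population = list(range(1, 201))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```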
What are advantages and disadvantages of simple random sampling?
Advantages-
Easy and cheap
Removes bias
Disadvantages-
May be time consuming if the population is large
A sample frame is needed
What is stratified sampling?
Where a population is divided into distinct groups (strata), and the number sampled from each stratum is in proportion to that stratum's share of the population. Simple random sampling is then used to select individuals from each stratum
No individual can be in more than one stratum (one group is a stratum, several are strata)
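A sketch of stratified sampling in Python: the number taken from each stratum is (stratum size / population size) x sample size, followed by a simple random sample within the stratum. The function name and the year-group strata are hypothetical. Rounding each stratum separately can leave the total one or two away from the intended sample size, which this sketch does not correct.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata: dict mapping stratum name -> list of members."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Number from this stratum, in proportion to its share of the population
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 students in Year 12 and 40 in Year 13
strata = {"Year 12": list(range(60)), "Year 13": list(range(60, 100))}
sample = stratified_sample(strata, 10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

For a sample of 10 this takes 6 from Year 12 and 4 from Year 13.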
What are advantages and disadvantages of stratified sampling?
Advantages-
The sample reflects the proportions and structure of the population
Disadvantages-
The groups within the population must be very clear
What is systematic sampling?
Where a sample of size n is chosen from an ordered list of the population of size N.
Use k = N/n
Start at a random point among the first k people, then choose every kth person for the sample
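A minimal Python sketch of systematic sampling, taking the interval k as population size divided by sample size (rounded down), with a random starting point among the first k members. The function name and the numbered frame are hypothetical, and the sketch assumes the sample size is no larger than the population.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th member of the sampling frame from a random start."""
    rng = random.Random(seed)
    k = len(frame) // n          # interval: population size / sample size
    start = rng.randrange(k)     # random start within the first k members
    return frame[start::k][:n]

# Hypothetical sampling frame of 100 numbered members
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

With N = 100 and n = 10 the interval is 10, so consecutive sample members are always 10 places apart in the list.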
What are advantages and disadvantages of systematic sampling?
Advantages-
Simple and quick
Suitable for large populations
Disadvantages-
A sample frame is needed
It can be biased if the sample frame is not in a random order
What is a census?
A measure/ observation of the whole population
State three types of non-random sampling
Opportunity
Quota
Cluster
What is cluster sampling?
Where the population is divided into equally sized groups called clusters
One or two clusters are chosen at random to be the sample
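A short Python sketch of cluster sampling as described: whole clusters are picked at random and everyone in the chosen clusters is sampled. The function name and the class-based clusters are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters: dict mapping cluster name -> list of members."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # pick whole clusters
    # Every member of each chosen cluster goes into the sample
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: four classes of three students each
clusters = {
    "Class A": ["A1", "A2", "A3"],
    "Class B": ["B1", "B2", "B3"],
    "Class C": ["C1", "C2", "C3"],
    "Class D": ["D1", "D2", "D3"],
}
print(cluster_sample(clusters, 2, seed=1))
```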
What is opportunity sampling?
Where a researcher samples from people they have easiest access to until their desired sample size is reached
What are advantages and disadvantages of opportunity sampling?
Advantages- easy and cheap
Disadvantages- not likely to be representative of population. Dependent on the individual researcher
What is quota sampling?
Where researchers are given a quota of types of people to interview. The quotas are in proportion to the relevant subgroups in the whole population
Opportunity sampling is then used to sample each quota
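A rough Python sketch of filling quotas by opportunity: people are taken in the order the researcher happens to meet them, and anyone whose group quota is already full is skipped. The function name, group labels and quotas are hypothetical.

```python
def quota_sample(people_in_order_met, quotas):
    """people_in_order_met: (person, group) pairs in the order encountered.
    quotas: dict mapping group -> number of people required."""
    remaining = dict(quotas)   # copy so the caller's quotas are not changed
    sample = []
    for person, group in people_in_order_met:
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break              # every quota has been filled
    return sample

# Hypothetical encounter order and quotas
met = [("A", "under 30"), ("B", "under 30"), ("C", "30 plus"),
       ("D", "under 30"), ("E", "30 plus")]
print(quota_sample(met, {"under 30": 2, "30 plus": 1}))  # ['A', 'B', 'C']
```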
What are advantages and disadvantages of quota sampling?
Advantages-
Sample frames are not needed
Quick and easy
Small samples can still be representative of the whole population
Disadvantages-
Can be biased
When dividing the population, the quota can be inaccurate