Stats Flashcards
Population
A population is a group that we want to find information about. It might be a group of people or it could be simply a group of numbers.
Census
A census is when information about every member of the population is collected.
Disadvantage of collecting data as a census
If the population is large, it can be difficult to collect and process so much information.
Sample
A sample survey is when information is collected from a small subset of the population, chosen to represent the whole.
Sampling unit
A sampling unit is an individual person or object that can be sampled.
Sampling frame
A sampling frame is the collection of all of the sampling units. Ideally, this should cover the whole population.
Formula for the interquartile range
Upper quartile (Q3) - lower quartile (Q1)
How is an outlier defined?
A value that is greater than Q3 + k(Q3 - Q1)
or
less than Q1 - k(Q3 - Q1)
where k is a constant, normally 1.5 (other values are occasionally used)
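A minimal sketch of the outlier rule above, assuming the quartiles have already been found. The function names and the data are hypothetical, chosen only for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that lie strictly outside the two limits."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical data set whose quartiles are taken to be Q1 = 10 and Q3 = 20.
print(outlier_fences(10, 20))                  # (-5.0, 35.0)
print(find_outliers([2, 12, 18, 40], 10, 20))  # [40]
```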
Data cleaning
The process of removing anomalies from a data set
Cumulative frequency
A running total of the frequencies: each value is the sum of all the frequencies up to and including that class
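A short sketch of cumulative frequency, using made-up frequencies. The last cumulative value should always equal the total frequency.

```python
from itertools import accumulate

# Hypothetical frequencies for three classes.
frequencies = [3, 5, 2]

# Each entry is the sum of all frequencies so far.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [3, 8, 10]
```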
Bar charts are for ………….. data while histograms are for ………….. data
Bar charts: discrete
Histograms: continuous
In histograms, the area of each bar is ………. to the frequency. What is the formula for that?
Proportional
Area = k * frequency
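A sketch of how the area rule gives the bar heights, assuming each class is described by its boundaries. When k = 1 the height is usually called the frequency density (that term is not on the card, it is added here). The function name and data are hypothetical.

```python
def bar_heights(boundaries, frequencies, k=1.0):
    """Height of each bar so that area (height * width) equals k * frequency."""
    heights = []
    for i, frequency in enumerate(frequencies):
        width = boundaries[i + 1] - boundaries[i]
        heights.append(k * frequency / width)
    return heights


# Hypothetical classes 0-10 and 10-30 with frequencies 5 and 10.
boundaries = [0, 10, 30]
frequencies = [5, 10]
heights = bar_heights(boundaries, frequencies)

print(heights)  # [0.5, 0.5]
```

The second class has twice the frequency but also twice the width, so both bars have the same height.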
To find the median in a histogram
We divide the total frequency by 2, then find the value at which the cumulative frequency reaches that number
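One way to sketch the median estimate is with linear interpolation inside the class that contains the halfway point. This assumes the values are spread evenly within each class, which is an assumption and not something stated on the card. The function name and data are hypothetical.

```python
def histogram_median(boundaries, frequencies):
    """Estimate the median of grouped data by linear interpolation."""
    half = sum(frequencies) / 2
    running = 0  # cumulative frequency before the current class
    for i, frequency in enumerate(frequencies):
        if frequency > 0 and running + frequency >= half:
            lower = boundaries[i]
            width = boundaries[i + 1] - lower
            # Move through the class in proportion to the frequency still needed.
            return lower + (half - running) / frequency * width
        running += frequency
    raise ValueError("frequencies must contain at least one positive value")


# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5.
print(histogram_median([0, 10, 20, 30], [5, 10, 5]))  # 15.0
```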
When drawing scatter graphs, on which axes do the dependent and independent variables go?
Dependent on the y-axis
Independent on the x-axis
For independent events, P(A) * P(B) is equal to?
P(A ∩ B)
Formula for P(A ∪ B)
P(A) + P(B) - P(A ∩ B)
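A quick check of the addition rule on one roll of a fair die. The events A and B are hypothetical examples, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}     # hypothetical event: the roll is even
B = {1, 2, 3, 4}  # hypothetical event: the roll is at most 4


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


direct = prob(A | B)                          # count the union directly
by_rule = prob(A) + prob(B) - prob(A & B)     # use the addition rule

print(direct, by_rule)  # 5/6 5/6
```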
What does it mean for A and B to be mutually exclusive?
A and B cannot happen at the same time
For independent events, what is the formula for P(A ∩ B)?
P(A) * P(B)
For mutually exclusive events, what is the value of P(A ∩ B)?
0
Because A and B cannot happen at the same time
Formula for P(A|B)
P(A ∩ B) / P(B)
Formula to determine whether two events are independent using conditional probability
P(A|B) = P(A)
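A sketch that ties the conditional probability, independence and mutually exclusive cards together, using one roll of a fair die. The events A, B and C are hypothetical examples.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}     # hypothetical event: even
B = {1, 2, 3, 4}  # hypothetical event: at most 4
C = {1, 3, 5}     # hypothetical event: odd (cannot happen together with A)


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(a & b) / prob(b)


p_a_given_b = conditional(A, B)

print(p_a_given_b)                       # 1/2
print(p_a_given_b == prob(A))            # True, so A and B are independent
print(prob(A & B) == prob(A) * prob(B))  # True, the product rule agrees
print(prob(A & C))                       # 0, A and C are mutually exclusive
```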
In a cumulative distribution function, what is always the value of the last probability?
1
What is meant by a uniform distribution?
When all the probabilities in a discrete probability distribution are equal
All the probabilities in a discrete probability distribution add up to?
1
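A small illustration with a fair die, which is a discrete uniform distribution. It checks that the probabilities add up to 1 and that the last cumulative probability is 1. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Fair die: every outcome has the same probability, so the distribution is uniform.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

total = sum(pmf.values())

# Cumulative distribution: running total of the probabilities in order of x.
cdf = list(accumulate(pmf[x] for x in sorted(pmf)))

print(total)    # 1
print(cdf[-1])  # 1
```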
The four conditions required for a binomial distribution
There must be a fixed number of trials
Each trial is independent of the others
There are only two outcomes (success or failure)
The probability of each outcome remains constant from trial to trial
For a binomial distribution, the formula for the mean (expected value) is?
Number of trials * probability of success
np
In a binomial distribution, the formula for the variance is?
np(1 - p)
Where n is the number of trials and p is the probability of success
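A sketch that compares the two binomial formulas with values computed directly from the probabilities. The values n = 10 and p = 0.5 are hypothetical, and the function name is chosen only for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # hypothetical number of trials and probability of success

mean = n * p
variance = n * p * (1 - p)

# The same quantities worked out from the definition, as a check on the formulas.
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance_from_pmf = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, variance)  # 5.0 2.5
```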
Is the expected value (np) ever rounded?
No
The normal distribution deals with ………. values
Continuous
In a normal distribution, the total area under the curve is
1
Formula for Z in a normal distribution
Z = (X - μ) / σ, where μ is the mean and σ is the standard deviation
What are the values of the mean and standard deviation of the standard normal distribution?
Mean = 0
Standard deviation = 1
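A sketch of standardising a value and checking that the standard normal distribution gives the same probability as the original one. It uses `NormalDist` from Python's standard `statistics` module. The mean, standard deviation and x value are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardise x: Z = (X - mean) / standard deviation."""
    return (x - mu) / sigma


mu, sigma = 100, 10  # hypothetical mean and standard deviation
x = 110

z = z_score(x, mu, sigma)

# P(X < 110) found directly, and P(Z < z) from the standard normal (mean 0, s.d. 1).
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(z)  # 1.0
print(round(p_direct, 4), round(p_standard, 4))
```

The two probabilities agree, which is why tables for the standard normal distribution can be used for any normal distribution.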
P(X = x) in a normal distribution is always?
0, because X is continuous, so only ranges of values have non-zero probability
What is the expression for the variance of a binomial distribution?
np(1 - p)
What is the expression for the mean of a binomial distribution?
np
In two-tailed tests, what do we do to the significance level?
We halve it, so that half applies to each tail
Significance level
The probability of rejecting the null hypothesis when in fact it is true
When the probability of the observed result (or one more extreme) is greater than the significance level, what do we do?
We fail to reject the null hypothesis
If our value is in the critical region, what do we do?
We reject the null hypothesis
If our value is outside the critical region, what do we do?
We fail to reject the null hypothesis
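A sketch of the decision rule for a binomial test, assuming an upper-tail test of p = 0.5 with 20 trials. The numbers and function names are hypothetical. For a two-tailed test the tail probability is compared with half the significance level.

```python
from math import comb


def upper_tail(observed, n, p):
    """P(X >= observed) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1))


def decide(tail_probability, significance, two_tailed=False):
    """Compare a tail probability with the (possibly halved) significance level."""
    threshold = significance / 2 if two_tailed else significance
    if tail_probability > threshold:
        return "fail to reject H0"
    return "reject H0"


# Hypothetical test: H0 is p = 0.5, with 15 successes observed in 20 trials.
tail_15 = upper_tail(15, 20, 0.5)
print(round(tail_15, 4))                       # 0.0207
print(decide(tail_15, 0.05))                   # reject H0
print(decide(tail_15, 0.05, two_tailed=True))  # reject H0

# With 14 successes the tail probability is above 5%.
print(decide(upper_tail(14, 20, 0.5), 0.05))   # fail to reject H0
```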
What does it mean to reject the null hypothesis?
It means we have enough evidence, at the chosen significance level, to support the alternative hypothesis. For example, when groups are being compared, there is a significant difference between them
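The value of r in the PMCC is between which values, and what does it tell us?
It’s between -1 and 1
It tells us how strongly linearly correlated our data is
The closer it is to 1, the closer it is to perfect positive correlation. The closer it is to -1, the closer it is to perfect negative correlation. A value near 0 means little or no linear correlation.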
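A sketch of calculating r by hand. The card does not give the formula, so this uses the standard definition r = Sxy / sqrt(Sxx * Syy), which is brought in here as an assumption. The function name and data are hypothetical.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y rises exactly in line with x, then falls exactly in line with x.
r_up = pmcc([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pmcc([1, 2, 3, 4], [8, 6, 4, 2])

print(r_up, r_down)  # close to 1 and close to -1
```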
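When calculating the standard deviation of a coded variable such as Y = aX + b, do we add the constant b?
No. Adding a constant moves every value by the same amount, so the spread does not change. Only the multiplier a scales the standard deviation.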
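A sketch showing that adding a constant leaves the standard deviation unchanged, while multiplying scales it. The data and the constants a and b are hypothetical, and `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from math import isclose
from statistics import pstdev

data = [2, 4, 4, 6]  # hypothetical data
a, b = 3, 10         # hypothetical multiplier and added constant

shifted = [x + b for x in data]     # add a constant only
coded = [a * x + b for x in data]   # multiply, then add a constant

sd = pstdev(data)
sd_shifted = pstdev(shifted)
sd_coded = pstdev(coded)

print(isclose(sd_shifted, sd))    # True: adding b changes nothing
print(isclose(sd_coded, a * sd))  # True: only the multiplier matters
```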