Stats 1 Flashcards
Define population
A complete set of data where every element is included
Define sample
A selection of the population
Define census, why is it rare?
Every element is surveyed, rare because it is expensive and labor-intensive
Define sampling units
The individual units of the population that can be selected for a sample
Define sampling frame
A list of sampling units
Define random sampling methods
Every element of the population has an equal chance of being selected
What are the 3 random sampling methods?
Simple random sample, systematic sample and stratified sampling
Define simple random sample
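A sample of size n in which every possible sample of size n has an equal chance of being selected, for example by using random numbers or drawing ‘names out of a hat’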
Define systematic sample
All elements numbered, the first element is chosen randomly then every nth element is chosen
Define stratified sampling
The population is divided into groups (strata) and a random sample is taken from each group in proportion to its size in the population
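The three random sampling methods above can be sketched in Python. This is only an illustration under assumptions: the sampling frame of 100 numbered units, the two strata names and the sample size of 10 are all made up, and the stratified version simply rounds each group's share, which may not add up to exactly k for other group sizes.

```python
import random

def simple_random_sample(frame, k, rng):
    # Every possible sample of size k is equally likely.
    return rng.sample(frame, k)

def systematic_sample(frame, k, rng):
    # Random starting point, then every n-th element, where n = len(frame) // k.
    step = len(frame) // k
    start = rng.randrange(step)
    return frame[start::step][:k]

def stratified_sample(strata, k, rng):
    # strata maps a group name to the list of sampling units in that group.
    # Each group contributes in proportion to its share of the population.
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(k * len(units) / total)
        sample[name] = rng.sample(units, share)
    return sample

rng = random.Random(1)                # fixed seed so a run can be repeated
frame = list(range(1, 101))           # hypothetical frame: units numbered 1 to 100
strata = {"year12": list(range(60)), "year13": list(range(40))}  # hypothetical groups

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 10, rng))
```

With 60 and 40 units in the two groups, a stratified sample of 10 takes 6 from the first group and 4 from the second.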
What is the danger of non-random sampling?
Bias
Define opportunity sampling
Your sample is whoever turns up
Define quota sampling
Numbers are calculated for each group then when groups are filled, others are ignored
Define qualitative data with examples
Groups of data, non-numerical, qualities like colours
Define quantitative data with examples
Numerical data, quantities like temperature
What two types of data can quantitative data be sectioned into?
Discrete and continuous
What is discrete data?
Whole numbers, can’t be decimals, like when you’re counting something
What is continuous data?
Data that can take any value in a range, including decimals; for example height, which can take any value between whole numbers of centimetres
What does n/a mean?
Not available, used when the data needed is not available
If one cell of data has n/a in it, what should you do to the whole row?
Ignore it
What does tr mean?
Trace: there was some (more than 0), but too little to give a measurable reading (less than 0.05 mm of rainfall)
If a cell of data has tr in it, what should you do to it? Why?
Change it to 0 so it can be included in means and other calculations
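The n/a and tr rules can be sketched as a small cleaning step in Python. The row layout and the key names "date" and "rain_mm" are assumptions made for this example and do not come from the real data set.

```python
def clean_rainfall(rows):
    # rows: list of dicts with hypothetical keys "date" and "rain_mm" (both strings).
    cleaned = []
    for row in rows:
        value = row["rain_mm"].strip().lower()
        if value == "n/a":
            continue              # data not available: ignore the whole row
        if value == "tr":
            value = "0"           # trace: treat as 0 so it can be used in calculations
        cleaned.append({"date": row["date"], "rain_mm": float(value)})
    return cleaned

rows = [                          # made-up readings
    {"date": "day 1", "rain_mm": "2.4"},
    {"date": "day 2", "rain_mm": "tr"},
    {"date": "day 3", "rain_mm": "n/a"},
    {"date": "day 4", "rain_mm": "0.6"},
]
cleaned = clean_rainfall(rows)
mean_rain = sum(r["rain_mm"] for r in cleaned) / len(cleaned)
print(len(cleaned), mean_rain)
```

The mean is taken over three rows, not four, because the n/a row is dropped while the tr row stays in as 0.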
How is daily mean temperature measured?
Values are noted in degrees (Celsius) and tenths, values below 0 are preceded by a minus sign
What is the daily mean temperature?
The average of the hourly air temperature readings during the period 0900-0900 GMT
How is daily total rainfall measured?
Millimetres (mm)
What is daily total rainfall?
The rainfall over the 24-hour period commencing at 0900 GMT on the day of entry; it includes any solid precipitation, such as snow or hail, which is melted and measured in the same way as rainfall
How is daily total sunshine measured?
In hours and tenths
What is daily total sunshine?
The amount of bright sunshine recorded on the day of entry, measured by an instrument that measures the amount of solar radiation exceeding a threshold
How is daily maximum relative humidity measured?
As a percentage
What is daily maximum relative humidity?
A measure of how close the air is to being saturated with water vapour. Values greater than 95% are associated with mist and fog
How is wind speed measured?
Knots
What is knots to mph?
1 knot = 1.15 mph
How is the daily mean wind speed measured?
Averaged over the 24 hours from 0000 GMT on the date given
What is maximum gust speed?
The maximum instantaneous speed that occurred during the 24 hours from 0000 GMT on the date given
How is the daily mean wind direction measured?
Averaged over the 24 hours from 0000 GMT on the date given, then rounded to the nearest 10 degrees
How is daily maximum gust direction measured?
In degrees from true north
What is daily maximum gust direction?
The direction from which the wind was blowing when the maximum gust during the hour commencing at the time of entry occurred
What is cloud cover?
The fraction of the celestial dome covered by cloud
How is cloud cover measured?
Measured in eighths (Oktas)
What is visibility?
The greatest distance at which an object can be seen and recognized in daylight, or at night could be seen and recognized if the general illumination was raised to daylight level
How is visibility measured?
Measured horizontally, values noted in decametres (Dm), a dash indicates data not available
How is pressure measured?
In hectopascals (hPa); 1 hPa = 100 Pa, where the pascal (Pa) is the SI unit of pressure
Which station is the furthest north?
Leuchars
Which station is the furthest south?
Perth
5 UK Weather Stations
Leuchars, Leeming, Heathrow, Hurn and Camborne
3 International Weather Stations
Perth (Australia), Beijing (China) and Jacksonville (USA)
What is better, range or IQR?
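IQR, because it is not affected by extreme values (outliers), whereas the range uses only the two most extreme values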
What is variance?
A calculation of spread using every piece of data
Standard deviation is the ____ _____ of variance
Square root
Variance =
Mean of the squares - square of the mean
Symbols for variance and standard deviation
σ for standard deviation and σ² for variance
What does an x with a bar on top mean?
The mean of x
Variance equation
((Σ(x²))/n) - ((Σx)/n)²
Standard deviation equation
√(((Σ(x²))/n) - ((Σx)/n)²)
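As a quick check of the two equations, here is a minimal Python sketch that computes the variance as "mean of the squares minus square of the mean" and compares it with the population functions in the standard `statistics` module. The data values are made up for the example.

```python
import math
import statistics

def variance(xs):
    # (sum of x squared) / n  minus  ((sum of x) / n) squared
    n = len(xs)
    mean_of_squares = sum(x * x for x in xs) / n
    square_of_mean = (sum(xs) / n) ** 2
    return mean_of_squares - square_of_mean

def standard_deviation(xs):
    # Standard deviation is the square root of the variance.
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: sum = 40, sum of squares = 232

print(variance(data), standard_deviation(data))
print(statistics.pvariance(data), statistics.pstdev(data))  # population versions (divide by n)
```

Here the mean of the squares is 232/8 = 29 and the square of the mean is 5² = 25, so the variance is 4 and the standard deviation is 2.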
How does coding by +/- affect the mean?
If y = x - a, then mean(y) = mean(x) - a
How does coding by ×/÷ affect the mean?
If y = x/b, then mean(y) = mean(x)/b
How does coding by +/- affect the standard deviation?
If y = x - a, then σy = σx (the standard deviation is unchanged)
How does coding by ×/÷ affect the standard deviation?
If y = x/b, then σy = σx/b (so the variance is divided by b²)
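The effect of coding can be checked numerically. In this sketch the data and the constants a = 100 and b = 10 are made up: subtracting a shifts the mean by a and leaves the standard deviation unchanged, while dividing by b divides both the mean and the standard deviation by b.

```python
import statistics

x = [110, 120, 130, 140, 150]      # made-up data
a, b = 100, 10                     # hypothetical coding constants

y_shift = [xi - a for xi in x]     # coding y = x - a
y_scale = [xi / b for xi in x]     # coding y = x / b

# Each line prints a pair of values that should match.
print(statistics.mean(y_shift), statistics.mean(x) - a)
print(statistics.pstdev(y_shift), statistics.pstdev(x))
print(statistics.mean(y_scale), statistics.mean(x) / b)
print(statistics.pstdev(y_scale), statistics.pstdev(x) / b)
```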
Another way to write variance
Variance = Sxx/n, where Sxx = Σ(x - x̄)²
Probability rule for mutually exclusive events
P(A or B) = P(A) + P(B)
Probability rule for independent events
P(A and B) = P(A) × P(B)
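Both rules can be verified by listing every outcome. This sketch assumes a made-up experiment, rolling two fair dice, and uses exact fractions so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes (first score, second score).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability = favourable outcomes / total outcomes.
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Mutually exclusive events: first die shows 1, first die shows 2.
p_a = prob(lambda o: o[0] == 1)
p_b = prob(lambda o: o[0] == 2)
p_a_or_b = prob(lambda o: o[0] in (1, 2))

# Independent events: first die shows 6, second die shows 6.
p_c = prob(lambda o: o[0] == 6)
p_d = prob(lambda o: o[1] == 6)
p_c_and_d = prob(lambda o: o == (6, 6))

print(p_a_or_b == p_a + p_b)     # addition rule
print(p_c_and_d == p_c * p_d)    # multiplication rule
```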
Define random variable
A variable whose value depends on the outcome of a random event
Define sample space
The set of all possible values that a random variable can take
What makes a variable random?
If the outcome is not known until the experiment is carried out
What does a probability distribution do?
Fully describes the probability of any outcome in the sample space
What is a discrete uniform distribution?
When every possible outcome has the same probability
Sum of P(X=x) =
1
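A small sketch of a discrete uniform distribution, assuming X is the score on a fair six-sided die (an example chosen here, not taken from the cards). It checks that every probability is the same and that the probabilities sum to 1.

```python
from fractions import Fraction

# Hypothetical example: X is the score on a fair six-sided die.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

all_equal = len(set(distribution.values())) == 1   # True for a discrete uniform distribution
total = sum(distribution.values())                 # sum of P(X = x) over the sample space

print(all_equal, total)
```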
Binomial distribution
B(n, p)
Binomial distribution formula
P(X = r) = (n choose r)(p^r)((1-p)^(n-r))
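The formula translates directly into Python using `math.comb` for "n choose r". The values n = 10 and p = 0.5 are assumptions picked for the example.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) for X ~ B(n, p)
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.5                      # hypothetical index and parameter

print(binomial_pmf(3, n, p))        # P(X = 3) = 120 / 1024
print(sum(binomial_pmf(r, n, p) for r in range(n + 1)))  # all probabilities add to 1
```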
What symbol is the index?
n
What symbol is the parameter?
p
When can you model x with a binomial distribution?
- There is a fixed number of trials, n
- There are two possible outcomes (success and failure)
- There is a fixed probability of success, p
- The trials are independent of each other
What does a cumulative probability function for a random variable x do?
Gives P(X ≤ x), the sum of all the individual probabilities for values up to and including x
How do you find the cumulative probability function?
Either through the given table or function on the calculator
This is the phrase, what does it mean and what is the calculation used? greater than 5
x > 5
1 - p(x ≤ 5)
This is the phrase, what does it mean and what is the calculation used? no more than 3
x ≤ 3
p(x ≤ 3)
This is the phrase, what does it mean and what is the calculation used? at least 7
x ≥ 7
1 - p(x ≤ 6)
This is the phrase, what does it mean and what is the calculation used? fewer than 10
x < 10
p(x ≤ 9)
This is the phrase, what does it mean and what is the calculation used? at most 8
x ≤ 8
p(x ≤ 8)
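The five phrases above all reduce to a cumulative probability P(X ≤ x). This sketch assumes X ~ B(20, 0.4), values chosen only for illustration, and builds the cumulative function by adding up the individual binomial probabilities rather than using tables or a calculator.

```python
from math import comb

def binomial_cdf(x, n, p):
    # P(X <= x) for X ~ B(n, p): sum of P(X = r) for r = 0, 1, ..., x
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

n, p = 20, 0.4                               # hypothetical distribution

greater_than_5 = 1 - binomial_cdf(5, n, p)   # P(X > 5)  = 1 - P(X <= 5)
no_more_than_3 = binomial_cdf(3, n, p)       # P(X <= 3)
at_least_7 = 1 - binomial_cdf(6, n, p)       # P(X >= 7) = 1 - P(X <= 6)
fewer_than_10 = binomial_cdf(9, n, p)        # P(X < 10) = P(X <= 9)
at_most_8 = binomial_cdf(8, n, p)            # P(X <= 8)

print(greater_than_5, no_more_than_3, at_least_7, fewer_than_10, at_most_8)
```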
A hypothesis is …
A statement made about the value of a population parameter, which you can test by carrying out an experiment or taking a sample from the population
A test statistic is …
The result of the experiment or the statistic that is calculated from the sample
What is the null hypothesis?
H₀, the hypothesis you assume to be correct
What is the alternative hypothesis?
H₁, tells you about the parameter if your assumption is shown to be wrong
One tailed tests are …
When your alternative hypothesis is in the form > or <
Two tailed tests are …
When your alternative hypothesis is in the form ≠
When do you reject the null hypothesis?
Assuming the null hypothesis is correct, you reject it if the probability of a result at least as extreme as the one observed is less than a given threshold, called the significance level
A critical region is …
A region of probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
The critical value is …
The first value to fall inside the critical region
The acceptance region is …
The region where we do not reject the null hypothesis
The actual significance level is …
The probability of incorrectly rejecting the null hypothesis
How many critical regions are there in a two tailed test?
Two, one at each end of the distribution
If you have to carry out a one tailed hypothesis test you need to:
- Formulate a model for the test statistic
- Identify suitable null and alternative hypotheses
- Calculate the probability of the test statistic taking the observed value or a more extreme one, assuming the null hypothesis is true
- Compare this to the significance level
- Write a conclusion in the context of the question
- Alternatively, you can find the critical region and see if the observed value of the test statistic lies inside it
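The steps above can be sketched for a one-tailed binomial test. Everything specific here is an assumption made for illustration: the model X ~ B(20, p), the hypotheses H₀: p = 0.5 and H₁: p > 0.5, the observed value 15 and the 5% significance level. The sketch also treats a probability exactly equal to the significance level as significant, which is a convention rather than something stated in the cards.

```python
from math import comb

def binomial_cdf(x, n, p):
    # P(X <= x) for X ~ B(n, p)
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def upper_tail_probability(observed, n, p0):
    # P(X >= observed), assuming H0: p = p0 is true
    return 1 - binomial_cdf(observed - 1, n, p0)

def upper_critical_value(n, p0, alpha):
    # Smallest c with P(X >= c) <= alpha; the critical region is X >= c.
    for c in range(n + 1):
        if upper_tail_probability(c, n, p0) <= alpha:
            return c
    return None   # no critical region exists at this significance level

n, p0, alpha, observed = 20, 0.5, 0.05, 15   # hypothetical test set-up

probability = upper_tail_probability(observed, n, p0)
reject_h0 = probability <= alpha             # compare with the significance level
critical_value = upper_critical_value(n, p0, alpha)

print(probability, reject_h0, critical_value)
```

Both routes agree in this example: P(X ≥ 15) is about 0.0207, which is below 0.05, and the observed value 15 lies in the critical region X ≥ 15, so H₀ would be rejected.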
For a two tailed test, what must you do to the significance level?
Halve it, so that each tail's critical region has half the significance level
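Halving the significance level can be sketched for a two-tailed binomial test. The set-up is assumed for illustration only: X ~ B(20, p), H₀: p = 0.5, H₁: p ≠ 0.5, and a 5% significance level, giving 2.5% in each tail. As a convention assumed here, a tail probability equal to the threshold counts as inside the critical region.

```python
from math import comb

def binomial_cdf(x, n, p):
    # P(X <= x) for X ~ B(n, p)
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def two_tailed_critical_values(n, p0, alpha):
    tail = alpha / 2   # half the significance level in each tail
    # Lower critical value: largest c with P(X <= c) <= tail.
    lower = max((c for c in range(n + 1) if binomial_cdf(c, n, p0) <= tail),
                default=None)
    # Upper critical value: smallest c with P(X >= c) <= tail.
    upper = min((c for c in range(n + 1) if 1 - binomial_cdf(c - 1, n, p0) <= tail),
                default=None)
    return lower, upper

n, p0, alpha = 20, 0.5, 0.05                 # hypothetical test set-up

lower, upper = two_tailed_critical_values(n, p0, alpha)
# Actual significance level: probability of landing in either critical region under H0.
actual_significance = binomial_cdf(lower, n, p0) + (1 - binomial_cdf(upper - 1, n, p0))

print(lower, upper, actual_significance)
```

In this example the critical regions are X ≤ 5 and X ≥ 15, and the actual significance level is about 0.0414, a little under the 5% target because X can only take whole-number values.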