Definitions Flashcards
DATA
Raw information from which statistics are created
POPULATION
The pool from which a statistical sample is drawn, e.g. all tech start-ups in Asia.
SAMPLES
A sample is a set of units collected from the statistical population.
SYSTEMATIC SAMPLING
Systematic sampling is where units are collected at regular intervals, e.g. every 10th person.
STRATIFIED SAMPLING
Dividing the population into strata (subgroups) and then taking a random sample from each stratum, normally in proportion to the share of the population that the stratum makes up.
CLUSTER SAMPLING
Cluster sampling begins by dividing the population into clusters, e.g. suburbs. Clusters are then selected at random, and every unit in the selected clusters is included.
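The three sampling schemes above (systematic, stratified, cluster) can be sketched in Python. This is a minimal illustration using only the standard library; the function names, the example strata and the suburb names are hypothetical and chosen just for this card.

```python
import random

def systematic_sample(units, step, start=0):
    # Take every `step`-th unit, beginning at index `start`.
    return units[start::step]

def stratified_sample(strata, fraction, rng):
    # Draw a simple random sample of the same fraction from every stratum,
    # so each stratum appears in proportion to its size.
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Pick whole clusters at random and keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

people = list(range(1, 101))
every_tenth = systematic_sample(people, 10, start=9)  # persons 10, 20, ..., 100

# Hypothetical strata: 60 small firms and 40 large firms.
strata = {"small firms": list(range(60)), "large firms": list(range(60, 100))}
stratified = stratified_sample(strata, 0.1, rng)  # 6 small + 4 large

# Hypothetical clusters: residents grouped by suburb.
suburbs = {"north": [1, 2, 3], "south": [4, 5], "east": [6, 7, 8, 9]}
clustered = cluster_sample(suburbs, 2, rng)

print(every_tenth)
print(len(stratified))
print(clustered)
```

Note that the stratified sample keeps the 60/40 split of the population, while the cluster sample keeps whole suburbs and ignores the rest.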
CATEGORICAL DATA
Categorical variables place individuals into categories, e.g. male/female, black/white, age group.
NUMERICAL DATA
Numerical data is data that can be measured, such as time, height, weight or amount.
DISCRETE DATA
A discrete variable is one where data is counted, e.g. how many eggs a hen lays each day. This count can never be negative, and there will never be half an egg. All possible values can be written down, and here they are whole numbers. Can be qualitative or quantitative.
CONTINUOUS VARIABLE
A continuous variable is one where data is measured, e.g. how many litres of milk a cow gives daily.
ORDINAL DATA
Ordinal data is data that can be arranged in order, but the differences between values have no meaning, e.g. "On a scale of 1-10, how happy are you?"
QUANTITATIVE
A quantitative variable has a value or numerical measurement.
QUALITATIVE
A qualitative variable describes an individual by placing it into a category or group, e.g. male or female.
SIMPLE RANDOM SAMPLE
Sample taken from a population randomly where each unit has the same chance of being selected.
REPRESENTATIVE
A representative sample is one whose characteristics reflect those of the population it was drawn from.
BIAS (Statistics)
The opposite of representative: a biased sample systematically over- or under-represents parts of the population.
Co-efficient of variation.
CV = sample standard deviation / sample mean × 100%. Used to compare the spread of data measured in different units, e.g. pounds and rupees.
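A short sketch of the coefficient of variation, taking CV as the sample standard deviation divided by the sample mean, times 100%. The prices and the exchange rate of 100 rupees per pound are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    # Sample standard deviation expressed as a percentage of the sample mean.
    return statistics.stdev(values) / statistics.mean(values) * 100

prices_pounds = [10, 12, 14]
prices_rupees = [p * 100 for p in prices_pounds]  # hypothetical exchange rate

cv_pounds = coefficient_of_variation(prices_pounds)
cv_rupees = coefficient_of_variation(prices_rupees)

print(round(cv_pounds, 2), round(cv_rupees, 2))
```

The standard deviations differ by a factor of 100, but the two CVs match, which is why CV is useful for comparing spread across units.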
Variance in regards to standard deviation.
The variance is the square of the standard deviation.
Descriptive statistics.
The description of data from a sample through the use of graphs and summary measures, e.g. averages, modes, etc.
Statistics
The collection, organisation, analysis and interpretation of data.
Inferential statistics
Using the data from a sample to infer information about a population.
Sampling frame
List of individuals from which the sample is drawn.
Sampling error vs non-sampling error
Sampling error is the difference between a measurement from the sample and the corresponding measurement from the population. Non-sampling error comes from poor sample design, sloppy data collection, faulty measuring equipment, etc.
Observational study vs Experiment.
Observational study is where observations and measurements are taken in a way that doesn’t change the response of the variable. Experiment is where a treatment is deliberately imposed on the individuals in order to observe a possible change.
Control group
This is the group that receives a dummy treatment to compare against the test group.
Lurking variable
A variable that is not part of the study but will generally have an effect on both the explanatory and response variables. It will generally be difficult to measure.
Confounding variable
A variable that cannot be controlled but will have an effect on what is being measured, and is taken into account when conducting an experiment. A variable that can produce effects that are confused or confounded with the effects of the independent variable.
Discrete probability distribution
A discrete probability distribution is a distribution where the possible outcomes are discrete, e.g. the roll of a die or the toss of a coin.
How do you know that a probability distribution is valid
Every probability lies between 0 and 1, and the probabilities add up to 1.
≤
Less than or equal to
How to write “Probability of between 1 and 3 happening?”
P(1≤X≤3)
µ or x bar.
µ (mu) represents the population mean. x̄ (x-bar) represents the sample mean.
Σ
The sum of
σ
Population Standard deviation
E(X) statistics
Expected value of X
What is a probability distribution
Describes the values that could occur and the probability that each value might occur.
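As a small worked example, a fair six-sided die can be written as a probability distribution and checked against the cards above: validity, E(X) and P(1≤X≤3). The helper names are hypothetical; exact fractions are used so the checks are not affected by rounding.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid(dist):
    # Every probability is in [0, 1] and the probabilities sum to 1.
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

def expected_value(dist):
    # E(X) = sum of x * P(X = x) over all possible values x.
    return sum(x * p for x, p in dist.items())

def prob_between(dist, low, high):
    # P(low <= X <= high), both ends included.
    return sum(p for x, p in dist.items() if low <= x <= high)

print(is_valid(die))            # True
print(expected_value(die))      # 7/2
print(prob_between(die, 1, 3))  # 1/2
```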
X~Bin (n,p)
5 properties?
Binomial distribution.
- Must have a set number (n) of trials.
- Each trial has only two possible outcomes, “success” or “failure”.
- Results of each trial are independent of other trials.
- Fixed probability (p) of “success” in each trial.
- X is defined as the number of successes in n trials.
At most
At least
At most means up to and including the number (≤).
At least means the number or greater (≥).
Short cut formulas for µ and σ of binomial distributions.
Mean µ = np
STDEV σ = √(np(1-p))
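The shortcut formulas, together with the "at most" and "at least" probabilities, can be checked with a few lines of Python. This is a sketch with hypothetical function names, using n = 10 trials and p = 0.5 as an arbitrary example.

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_mean(n, p):
    return n * p

def binomial_sd(n, p):
    return math.sqrt(n * p * (1 - p))

def at_most(k, n, p):
    # P(X <= k): add up the probabilities for 0, 1, ..., k successes.
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def at_least(k, n, p):
    # P(X >= k) = 1 - P(X <= k - 1)
    return 1 - at_most(k - 1, n, p)

n, p = 10, 0.5
print(binomial_mean(n, p))  # 5.0
print(binomial_sd(n, p))    # square root of 2.5
print(at_most(2, n, p))     # 56/1024 = 0.0546875
print(at_least(3, n, p))    # 0.9453125
```

"At most 2" and "at least 3" are complements, so the last two values add up to 1.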
X~N(µ,σ)
Notation for a normal distribution with mean µ and standard deviation σ.
Standardise formula (z score)
z = (x - µ) / σ
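A minimal example of standardising a value, assuming a made-up test score of 85 from a population with µ = 70 and σ = 10. `statistics.NormalDist` from the Python standard library is used to turn the z score into a probability.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    # Number of standard deviations that x lies above (or below) the mean.
    return (x - mu) / sigma

z = z_score(85, 70, 10)
p_below = NormalDist().cdf(z)  # P(Z <= z) for the standard normal

print(z)                  # 1.5
print(round(p_below, 4))  # 0.9332
```

Working with the original scale gives the same answer: `NormalDist(70, 10).cdf(85)` equals `p_below`.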
What is the standard deviation of the SAMPLING DISTRIBUTION OF THE MEAN called?
Standard error.
n=
sample size
SAMPLING DISTRIBUTION OF THE MEAN formula for changing standard deviation to standard error.
σ / √n
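To see why σ / √n is the standard deviation of the sampling distribution of the mean, the sketch below draws many samples and compares the spread of their means with the formula. The population values (µ = 100, σ = 15), the sample size of 36 and the 2000 repetitions are arbitrary assumptions.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    # Standard deviation of the sampling distribution of the mean.
    return sigma / math.sqrt(n)

rng = random.Random(1)  # fixed seed so the simulation is repeatable
mu, sigma, n = 100, 15, 36

# Draw 2000 samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(2000)
]

theoretical = standard_error(sigma, n)         # 15 / 6 = 2.5
simulated = statistics.stdev(sample_means)     # should be close to 2.5

print(theoretical)
print(simulated)
```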
e
e is the error amount (the margin of error).
Rules for CLT?
Sample must be large enough (a common rule of thumb is n ≥ 30).
Must be a random sample.
k
Critical value
Is it a random sample?
Not sure, read the question, ask the question.
What is the rule for T distribution use?
If n ≥ 30, use the normal distribution. If n < 30, use the t-distribution.
To use the t-distribution, the sample must come from a normally distributed population.
Quartile
Decile
Percentile
Quartile: distribution divided into 4 equal parts (steps of 0.25).
Decile: distribution divided into 10 equal parts (steps of 0.1).
Percentile: distribution divided into 100 equal parts (steps of 0.01).
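As an illustration, `statistics.quantiles` from the Python standard library returns the cut points that split a data set into n equal parts. The data set (the whole numbers 1 to 99) is made up so that the cut points are easy to read; other data and other interpolation methods will give less tidy values.

```python
import statistics

data = list(range(1, 100))  # 1, 2, ..., 99

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)
print(deciles)
print(len(percentiles))
```

Splitting into n parts always needs n - 1 cut points, and the second quartile, fifth decile and 50th percentile are all the median.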
Difference between x-bar and p-hat
You need to be a little cautious about assuming that particular symbols like xbar and phat will always have the same meaning, as they are just symbols. However, those two are quite common and consistent. The first is a mean which is the sum of the observations divided by the number of observations. The second is a proportion, the number of ‘successes’ divided by the number of ‘attempts’.
p
p is considered to be the exact probability of an event happening on a given trial.
Conditional probability
Where one event affects the next, e.g. if you have a bag of red and blue marbles, pulling one out (without putting it back) changes the probability of the colour of the next one.
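The marble example can be worked through with exact fractions. The bag contents (3 red, 2 blue) and the function name are assumptions made for this sketch, and the first marble is not put back.

```python
from fractions import Fraction

def prob_second_red(red, blue, first):
    # Remove the first marble from the bag, then look at what is left.
    if first == "red":
        red -= 1
    else:
        blue -= 1
    return Fraction(red, red + blue)

p_first_red = Fraction(3, 3 + 2)
p_red_given_red = prob_second_red(3, 2, "red")
p_red_given_blue = prob_second_red(3, 2, "blue")

print(p_first_red)       # 3/5
print(p_red_given_red)   # 1/2
print(p_red_given_blue)  # 3/4
```

The chance of a red marble on the second draw depends on what came out first, which is exactly what makes it a conditional probability.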
Contingency table
Table where the frequencies or proportions of events are laid out so that they can be cross-calculated.
Statistical independence
When one outcome does not affect another outcome or event.
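One way to check independence from a contingency table is to test whether P(A and B) = P(A) × P(B) holds for every cell. The sketch below does this for two made-up tables of counts (work shift against defect status); the labels, counts and function name are hypothetical.

```python
from fractions import Fraction

def is_independent(table):
    # table maps (row label, column label) -> count
    total = sum(table.values())
    rows = {r for r, _ in table}
    cols = {c for _, c in table}
    for r in rows:
        for c in cols:
            p_row = Fraction(sum(v for (rr, _), v in table.items() if rr == r), total)
            p_col = Fraction(sum(v for (_, cc), v in table.items() if cc == c), total)
            p_joint = Fraction(table[(r, c)], total)
            if p_joint != p_row * p_col:
                return False
    return True

# Same defect rate on both shifts: shift and defect are independent.
same_rate = {("morning", "defect"): 10, ("morning", "ok"): 40,
             ("evening", "defect"): 10, ("evening", "ok"): 40}

# Defect rate differs by shift: not independent.
different_rate = {("morning", "defect"): 10, ("morning", "ok"): 40,
                  ("evening", "defect"): 30, ("evening", "ok"): 20}

print(is_independent(same_rate))       # True
print(is_independent(different_rate))  # False
```

With real sample data the equality will rarely hold exactly, so in practice a statistical test is used rather than an exact comparison.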