Stats Flashcards
Most common observational study?
Surveys
What are surveys? (Observational study)
Questionnaires presented to individuals, selected from a POPULATION OF INTEREST
What is the role of surveys (what they can and can’t do)?
- Can only report relationships between variables
- Cannot claim CAUSE and EFFECT
What is an experiment?
A systematic procedure carried out under controlled conditions
What is the role of experiments (3)?
- To discover an unknown effect
- To illustrate a known effect
- To test OR establish a hypothesis
What should experiments be designed to do?
Minimise BIASES that might occur
When analysing a process, experiments are used to evaluate…
- Which PROCESS INPUTS have a significant impact on the PROCESS OUTPUTS
What is the collective name for the several different ways of collecting experimental process input/output information?
Design of Experiments (DOE)
Purpose of experimentation… (6)
- Comparing alternatives
- Identifying the significant inputs (factors) which affect the output (response), i.e. separating the vital few from the trivial many
- Achieving an OPTIMAL PROCESS OUTPUT (response)
- Reducing variability
- Minimizing, maximizing, or targeting an output
- Achieving product & process robustness
To minimize bias, you need to…
Select your sample of individuals randomly!
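A minimal sketch of drawing a simple random sample, assuming the population can be listed. The population of 100 ID numbers, the sample size of 10, and the seed are all made up for illustration.

```python
import random

# Hypothetical population of interest: 100 individuals labelled by ID.
population = list(range(1, 101))

# A fixed seed is used only so the sketch is repeatable; drop it in a real study.
rng = random.Random(42)

# Simple random sample of 10 individuals, drawn without replacement,
# so every individual has the same chance of being selected.
sample = rng.sample(population, k=10)
print(sample)
```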
What are the three types of data collected? (3)
- Categorical data
- Numerical data
- Ordinal data
What is Categorical data?
Records qualities or characteristics about the individual, such as eye color or opinions (agree/disagree)
(NB: if numbers are used as category labels, they do not have “real numerical meaning”)
What is Numerical data?
Records measurements or counts regarding each individual
What is Ordinal data?
Sits between categorical and numerical data: values fall into categories, but the categories have a meaningful order (e.g. rankings 1st to 5th, best to worst)
If the data set contains an even number of values… (median)
The median is the average of the two values that are in the middle
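The even-count rule can be sketched in a few lines of Python. The function name `median` and the data values are illustrative only; the standard library's `statistics.median` applies the same rule.

```python
import statistics

def median(values):
    """Median of a list: middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values in the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [7, 1, 5, 3]             # made-up values, even count
print(median(data))             # 4.0 (average of 3 and 5)
print(statistics.median(data))  # the standard library gives the same answer
```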
Standard Deviation? (definition)
Quantifies the typical distance of the values in the data set from the centre (the mean)
Standard Deviation (equation)
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
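As a rough check of the equation, the sketch below computes the standard deviation step by step with the n - 1 divisor and compares it with Python's `statistics.stdev`, which uses the same divisor. The helper name `sample_std` and the data are made up for illustration.

```python
import math
import statistics

def sample_std(values):
    """Standard deviation with the n - 1 divisor, following the equation."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared distances from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
print(sample_std(data))
print(statistics.stdev(data))    # should agree with the line above
```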
Properties of standard deviation
- Is never negative
- Smallest possible value is zero
- Affected by OUTLIERS
- Has the same UNITS as the original data
A random variable is…
a variable whose possible values are numerical outcomes of a RANDOM PHENOMENON
Types of random variables:
- Continuous
- Discrete
A probability distribution is…
a list of the possible values of a random variable, together with their probabilities
A binomial distribution is…
a frequency distribution of the possible number of successful outcomes in a given number of trials, each of which has the same probability of success (i.e. each trial ends in SUCCESS or FAILURE)
Characteristics of a Binomial Distribution (4)
- Must be a fixed number of trials (n)
- Only two outcomes: SUCCESS/FAILURE
- The probability of success, p, must remain the same for each trial
- The outcomes of each trial must be INDEPENDENT of each other
If a random variable X has a binomial distribution, PROBABILITIES for X can be calculated using the following formula:
$P(X = x) = \binom{n}{x} \, p^{x} (1 - p)^{n - x}$
Binomial Distribution parameters:
- n = number of trials
- x = number of successes
- n - x = number of failures
- p = probability of success (on any trial)
- 1 - p = probability of failure
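A short sketch of the binomial formula with these parameters, assuming 10 flips of a fair coin as the example. The function name `binomial_pmf` is illustrative; `math.comb` supplies the "n choose x" term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5  # assumed example: 10 flips of a fair coin

# Probability of exactly 3 heads: 120 / 1024
print(binomial_pmf(3, n, p))  # 0.1171875

# The probabilities over x = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(total)
```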
A binomial random variable X can take values between…
0 and n (the least and most successes possible in n trials)
For a binomial random variable the mean is:
$\mu = n p$
The variance of a random variable is…
The weighted average of the squared distances from the mean
The variance of a binomial random variable is… (formula)
$\sigma^2 = n p (1 - p)$
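The mean and variance shortcuts for a binomial random variable can be checked against the definitions: the mean as the probability-weighted average of the values, and the variance as the weighted average of the squared distances from the mean. The values n = 20 and p = 0.3 are assumed purely for illustration.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.3  # assumed example values

# Mean: each possible value weighted by its probability.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

# Variance: weighted average of the squared distances from the mean.
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)                # both close to 6.0
print(variance, n * p * (1 - p))  # both close to 4.2
```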
Discrete random variable:
A variable which can only take a countable number of values
Continuous random variable:
A random variable that takes on values within AN INTERVAL (or has so many possible values that they might as well be considered continuous)
The most widely used distribution for continuous random variables:
The normal distribution
The Normal Distribution: Definition
A random variable X follows a normal distribution if its values follow a continuous, symmetric, bell-shaped curve
The Normal Distribution: Fundamental characteristics (3)
- The area under the curve is EQUAL TO UNITY
- It has symmetry about the centre (i.e., it has 50% of values less than the mean and 50% greater than the mean)
- Each normal distribution is described by its mean, µ, and its standard deviation, σ
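These characteristics can be checked numerically with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are assumed values, and the area is only approximated with a midpoint sum over µ ± 8σ.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # assumed example parameters
dist = NormalDist(mu=mu, sigma=sigma)

# Symmetry: half of the probability lies below the mean.
below_mean = dist.cdf(mu)
print(below_mean)  # 0.5

# Area under the curve, approximated by a midpoint sum over mu +/- 8 sigma.
steps = 10_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
width = (hi - lo) / steps
area = sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width
print(area)  # very close to 1
```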
Saddle Points (more precisely, inflection points):
Where the bell-shaped curve changes from concave down to concave up.
Distance between the mean and the saddle points
1 σ
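A rough numerical check that the bell curve changes concavity one standard deviation from the mean. The sketch estimates the second derivative of the density with a central finite difference; µ = 0, σ = 2, the step size, and the helper name `second_derivative` are all assumptions for illustration.

```python
from statistics import NormalDist

mu, sigma = 0.0, 2.0  # assumed example parameters
dist = NormalDist(mu=mu, sigma=sigma)

def second_derivative(x, h=1e-3):
    """Central finite-difference estimate of the density's second derivative."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h**2

# Just inside mu + sigma the curve is concave down (negative second derivative).
inside = second_derivative(mu + 0.9 * sigma)

# Just outside mu + sigma the curve is concave up (positive second derivative).
outside = second_derivative(mu + 1.1 * sigma)

print(inside < 0, outside > 0)  # True True
```

By symmetry, the same sign change occurs at µ - σ.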