Overall Flashcards
What are the two major types of data?
Categorical (qualitative) and metric (quantitative)
What are the two subtypes of categorical (qualitative) data?
Nominal and ordinal
What are the two subtypes of metric (quantitative) data?
Continuous and discrete
What does nominal data relate to?
It labels variables without any order or quantitative value. It usually relates to named things, and there are no units of measurement. Each value is allocated to a specific category
What does ordinal data relate to?
The values can be meaningfully ordered and it is categorical because each value is assigned to a specific category
What does discrete data relate to?
The values are distinct and countable, typically integers, and they can have units of measurement
What does continuous data relate to?
Fractional numbers that result from measurement and they can have units of measurement
In a box (and whisker) plot, what are the adjacent values (defined in this specific course)?
Furthest away from the median but still within 1.5 times the interquartile range
In a box (and whisker) plot, what are the points outside the adjacent values?
Potential outliers
What is the interquartile range?
The upper quartile (Q3, the 3/4 point) minus the lower quartile (Q1, the 1/4 point)
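As a rough illustration of the box plot cards above, the sketch below computes the quartiles, the interquartile range, the adjacent values and the potential outliers for a made-up sample. It assumes the common Tukey convention, where the 1.5 x IQR distance is measured from the quartiles (the box edges), and the "inclusive" quartile method of Python's statistics module. The course may use a different convention, so treat the numbers as illustrative only.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles; other quartile conventions can give slightly different values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: the fences sit 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Adjacent values: the most extreme data points still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
adjacent_values = (min(inside), max(inside))

# Anything beyond the fences is flagged as a potential outlier.
potential_outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("IQR:", iqr)
print("Adjacent values:", adjacent_values)
print("Potential outliers:", potential_outliers)
```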
What is the sample standard deviation?
The square root of [the sum of (each value minus the mean) squared, divided by (the sample size - 1)], i.e. s = sqrt( sum (x_i - mean)^2 / (n - 1) )
What are the residuals in the standard deviation equation?
Value minus the mean
What is the variance when the sample standard deviation is s?
s squared
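The sample standard deviation and variance cards can be checked numerically. The sketch below uses a made-up sample and compares the hand-written formula with Python's statistics module; the variable names are illustrative.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Each value minus the mean (the cards call these residuals).
residuals = [x - mean for x in values]

# The sample variance divides by n - 1, not n.
variance = sum(r ** 2 for r in residuals) / (n - 1)
s = math.sqrt(variance)

print("s =", s)
print("s squared =", variance)

# Cross-check against the standard library's sample statistics.
print(math.isclose(s, statistics.stdev(values)))
```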
What is skewness and how is it measured?
A measure of the asymmetry of a distribution. It is measured by the skewness coefficient, which in this course varies between -1 and +1
What is the skewness coefficient for a symmetric distribution?
0
What is the skewness coefficient for a distribution with the mean to the left of the mode (most values are larger values in the range, long tail to the left in the negative direction)?
Closer to -1 (left or negatively skewed)
What is the skewness coefficient for a distribution with the mean to the right of the mode (most values are smaller values in the range, long tail to the right in the positive direction)?
Closer to +1 (right or positively skewed)
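The sign convention on the skewness cards can be seen with a small sketch. It assumes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is probably not the coefficient used in the course: it is not confined to the -1 to +1 range, so only the sign should be compared with the cards. The samples are made up.

```python
def moment_skewness(values):
    """Moment-based skewness: an assumed definition, not necessarily the course's."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 1, 1, 2, 2, 3, 10]       # long tail towards large values
left_tailed = [-x for x in right_tailed]     # mirror image: long tail towards small values
symmetric = [1, 2, 3, 4, 5]

print("right-tailed:", moment_skewness(right_tailed))  # positive
print("left-tailed: ", moment_skewness(left_tailed))   # negative
print("symmetric:   ", moment_skewness(symmetric))     # zero
```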
Probability theory is based on set theory. What is contained in the set S (called the space)?
All sets are subsets of set S
What is the null set?
The set that contains no elements
For experimental events, what is an event represented by and what is an impossible event?
An event is a set and an impossible event is the null set
If sets A and B are mutually exclusive, what is P(A+B) and the intersection of A and B?
P(A+B) = P(A) + P(B), and AB = {} (the null set)
The conditional probability of A given B is defined as: P(A|B) =
P(AB)/P(B)
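The set-based cards above can be tried out with Python sets. The sketch below assumes a fair six-sided die as the space S, so every outcome is equally likely; the events and the helper prob are illustrative. The union A | B in the code plays the role of A+B on the cards, and the intersection A & B plays the role of AB.

```python
from fractions import Fraction

# Space S: the outcomes of one roll of a fair six-sided die (an assumed example).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of S) when outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

# Mutually exclusive events: the intersection is the null set.
A = {1, 2}
B = {5, 6}
assert A & B == set()
assert prob(A | B) == prob(A) + prob(B)

# Conditional probability P(E|G) = P(EG) / P(G).
E = {2, 4, 6}   # the roll is even
G = {4, 5, 6}   # the roll is greater than 3
p_E_given_G = prob(E & G) / prob(G)
print(p_E_given_G)  # 2/3
```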
For conditional probability, does there have to be a causal or temporal relation between A and B?
No. A and B may or may not be causally or temporally related
What does it mean if conditioning on B has no effect on the probability of A, i.e. P(A|B) = P(A)?
Events A and B are statistically independent
If A and B are statistically independent, what is P(AB) equal to?
P(AB) = P(A)P(B)
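A quick way to test statistical independence is to compare P(AB) with P(A)P(B). The sketch below again assumes a fair six-sided die; the events and the helper names are made up for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # fair die, all outcomes equally likely (assumed example)

def prob(event):
    return Fraction(len(event), len(S))

def independent(X, Y):
    """True when P(XY) = P(X)P(Y)."""
    return prob(X & Y) == prob(X) * prob(Y)

A = {2, 4, 6}      # the roll is even
B = {1, 2, 3, 4}   # the roll is at most 4
C = {4, 5, 6}      # the roll is greater than 3

print(independent(A, B))  # True
print(independent(A, C))  # False

# For the independent pair, conditioning on B leaves P(A) unchanged.
print(prob(A & B) / prob(B) == prob(A))  # True
```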
What is Bayes’ theorem? P(A|B) = ?
P(B|A)P(A) / P(B)
What does Bayesian probability include?
It incorporates any prior knowledge that a researcher might have about a hypothesis
Why is Bayes’ theorem also called the theorem of probability of causes?
A is treated as the cause and B as the effect, so the theorem gives the probability of the cause given the observed effect
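A worked example may help with the Bayes' theorem cards. In the sketch below, A is the cause and B the effect, and all three input probabilities are hypothetical numbers picked for illustration. P(B) is obtained from the law of total probability, which the cards do not state explicitly.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_A = Fraction(1, 100)              # prior P(A): the cause is present
p_B_given_A = Fraction(9, 10)       # P(B|A): effect observed when the cause is present
p_B_given_not_A = Fraction(5, 100)  # P(B|not A): effect observed without the cause

# Total probability of the effect: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B).
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 2/13
```

Even with a fairly reliable effect, the small prior P(A) keeps the posterior P(A|B) low in this example.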
What is a random variable (RV)?
A number X(z) assigned to every outcome z of an experiment
What is the cumulative distribution function F(x)?
P{X<=x}
If the cumulative distribution function F(x) is continuous, what is its derivative?
The probability density function f(x) = dF(x)/dx
If the cumulative distribution function is discrete, what is f(x)?
A discrete distribution function, where f(x) = the sum over i of P{X=x_i} delta(x - x_i), where delta is an impulse function
Can we calculate a useful value of P(X=x) when the cumulative distribution function is continuous, and what do we do instead?
No, because P(X=x) = 0 when F(x) is continuous. Instead, we calculate the probability that X lies in a small interval around x by integrating f(x) across that interval
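The relationship between F(x), f(x) and interval probabilities can be checked numerically. The sketch below assumes an exponential random variable with a hypothetical rate of 2, for which F(x) = 1 - exp(-rate * x) and f(x) = rate * exp(-rate * x) for x >= 0.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution

def cdf(x):
    """F(x) = P{X <= x}."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def pdf(x):
    """f(x) = dF(x)/dx."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

x, width = 1.0, 0.01

# Probability that X falls in a small interval centred on x
# (the integral of f over that interval).
exact = cdf(x + width / 2) - cdf(x - width / 2)

# For a small interval this is close to f(x) times the interval width.
approx = pdf(x) * width

# A central difference of F approximates the density f.
h = 1e-5
numerical_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)

print(exact, approx)
print(numerical_derivative, pdf(x))
```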
What is the expectation or mean of a random variable when the cumulative distribution function is continuous?
The integral of x f(x) dx over all values of x
What is the expectation or mean of a random variable when the cumulative distribution function is discrete?
The sum of x_i multiplied by P(X=x_i)
What is the variance of a random variable in terms of the mean or expectation (E)?
sigma squared = E[X^2] - (E[X])^2
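The discrete expectation and the variance identity can be worked through exactly for a small case. The sketch below assumes X is the result of one roll of a fair six-sided die; the names are illustrative.

```python
from fractions import Fraction

# Assumed example: X is one roll of a fair die, so P(X = x_i) = 1/6 for each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum of x_i * P(X = x_i)
mean = sum(x * p for x, p in pmf.items())

# E[X^2] = sum of x_i^2 * P(X = x_i)
mean_of_square = sum(x ** 2 * p for x, p in pmf.items())

# Variance: sigma squared = E[X^2] - (E[X])^2
variance = mean_of_square - mean ** 2

print(mean)      # 7/2
print(variance)  # 35/12
```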
When do the sample-based approximations of mean and variance converge to the theoretical quantities?
When the sample size tends to infinity
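A small simulation gives a feel for this convergence. The sketch below assumes a continuous uniform random variable on [0, 1), whose theoretical mean is 1/2 and variance is 1/12, and uses a fixed seed so the run is repeatable. The estimates for small samples can be noticeably off, while those for large samples sit close to the theoretical values.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean_and_variance(n):
    """Sample mean and (n - 1)-denominator sample variance of n uniform draws."""
    xs = [random.random() for _ in range(n)]
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, variance

results = {n: sample_mean_and_variance(n) for n in (10, 1_000, 100_000)}

for n, (mean, variance) in results.items():
    print(n, mean, variance)

print("theory:", 0.5, 1 / 12)
```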
Are probability mass functions for discrete or continuous variables?
Discrete
Are probability density functions for discrete or continuous variables?
Continuous
What are the three types of probability mass functions that this course deals with?
Bernoulli, binomial and uniform (discrete)
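What are the four types of probability distribution functions that this course deals with?
Normal (Gaussian), Poisson, exponential, uniform (continuous). Note that the Poisson distribution is strictly a discrete distribution, so it is described by a probability mass function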
The Bernoulli distribution is a special case of what type of distribution and what is the special case?
Binomial distribution with a single trial (n=1)
What is a Bernoulli trial?
A single experiment with outcome 0 or 1
For a Bernoulli distribution, what is the probability of X=1 (P(1))?
p
For a Bernoulli distribution, what is the probability of X=0 (P(0))?
1-p
What is the mean (expected value) of a Bernoulli random variable?
p
What is the variance of a Bernoulli random variable?
p(1-p)
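The Bernoulli mean and variance follow directly from the two-point probability mass function. The sketch below assumes a hypothetical success probability p = 3/10 and uses exact fractions.

```python
from fractions import Fraction

p = Fraction(3, 10)  # hypothetical success probability

# Probability mass function of a Bernoulli random variable.
pmf = {1: p, 0: 1 - p}

# Mean: sum of x * P(X = x), which reduces to p.
mean = sum(x * prob for x, prob in pmf.items())

# Variance: E[X^2] - (E[X])^2, which reduces to p(1 - p).
variance = sum(x ** 2 * prob for x, prob in pmf.items()) - mean ** 2

print(mean)      # 3/10
print(variance)  # 21/100
```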
Since the Bernoulli distribution is a special case of the binomial distribution, what could the binomial distribution be thought of as?
The number of successes in a sequence of independent Bernoulli trials
What are the parameters of the binomial distribution?
n = number of trials, p = probability of success
How is a random variable with a Bernoulli distribution denoted?
X ~ Bernoulli(p)
How is a random variable with a binomial distribution denoted?
X ~ B(n,p)
What is the mean (expected value) of a binomially distributed random variable?
np
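The binomial cards can be checked with the standard binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), which is not written out on the cards. The sketch below assumes hypothetical parameters n = 10 and p = 3/10 and uses exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): k successes in n independent Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)  # hypothetical parameters

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                # probabilities sum to 1
mean = sum(k * pk for k, pk in enumerate(pmf))  # equals n * p

print(total)  # 1
print(mean)   # 3

# With a single trial (n = 1) the binomial reduces to the Bernoulli distribution.
print(binomial_pmf(1, 1, p) == p, binomial_pmf(0, 1, p) == 1 - p)
```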