Basic Concepts Flashcards
What is Statistics?
The art and science of answering questions and exploring ideas through the process of gathering data, describing data and making generalizations about a population on the basis of a smaller sample.
Basic concept: Unit
The basic objects on which the data is collected.
When conducting a research study, information is collected concerning units.
Basic concept: Variable
Characteristic of units that can take on different values (in other words, something that can vary).
Variables can be classified as (2 options)
Categorical (or qualitative) and Quantitative.
Basic concept: Categorical variables
Categorical (or qualitative) variables are variables whose values cannot be quantified; their values are categories or labels. These can be classified as:
Nominal (describes a name, label, or category without a logical order);
Ordinal (values are categories with a meaningful order between them).
Basic concept: Quantitative variables
Quantitative (or numerical) variables have numerical values with magnitudes that can be placed in a meaningful order with consistent intervals. These can be classified as:
Discrete (can take on only a countable set of distinct values, such as whole numbers);
Continuous (can take on any value within a range, including any value between two other values).
Categorical or Quantitative?
- Weight
- Favorite ice cream flavor
- Children per household
- Running distance
- Religion
- Satisfaction rating
- Quantitative (continuous)
- Categorical (nominal)
- Quantitative (discrete)
- Quantitative (continuous)
- Categorical (nominal)
- Categorical (ordinal)
Basic concept: Sample
A smaller subset of the population.
Basic concept: Statistics vs Parameters
Values concerning a sample are referred to as SAMPLE STATISTICS while values concerning a population are referred to as POPULATION PARAMETERS.
Basic concept: Sampling Bias
Systematic favoring of certain outcomes due to the methods employed to obtain the sample.
How to avoid sampling bias?
Use a probability-based sampling method. The most common one is the SIMPLE RANDOM SAMPLING method.
Basic concept: Simple random sampling
A method of obtaining a sample from a population in which every member of the population has an equal chance of being selected.
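A minimal sketch in Python, assuming a hypothetical population of 100 numbered units and a hypothetical helper `simple_random_sample`. It relies on the standard library's `random.Random.sample`, which draws without replacement so that every member (and every possible sample of size n) has the same chance of being chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has the same chance of selection."""
    rng = random.Random(seed)  # seed only makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population of 100 numbered units
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Passing a fixed `seed` makes the draw repeatable; it does not change the equal-chance property.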
Basic concept: Central tendency measures
Describe the “center” around which the data is distributed. The mean and median are two of the most commonly used.
Basic concept: Variability measures
Describe “data spread” or how far away the measurements are from the center.
The variance and standard deviation are the most commonly used.
Basic concept: Relative standing measures
Describe the relative position of specific measurements in the data. Examples: percentiles and quartiles.
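Percentiles and quartiles can be computed with Python's standard `statistics` module. This sketch uses made-up exam scores. There are several conventions for computing quartiles, and `method="inclusive"` is just one of them, so other software may report slightly different cut points. The hypothetical helper `percentile_rank` uses one common definition: the percentage of values less than or equal to a given value.

```python
import statistics

# Hypothetical exam scores
scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96]

# Quartiles: the three cut points that split the ordered data into four parts
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

def percentile_rank(data, value):
    """Percentage of observations less than or equal to value (one common definition)."""
    return 100 * sum(x <= value for x in data) / len(data)

print(q1, q2, q3)                     # 68.5 76.0 84.0
print(percentile_rank(scores, 85))    # 80.0 (8 of the 10 scores are <= 85)
```

The middle quartile `q2` coincides with the median of the scores.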
Basic concept: Mean
The numerical average; calculated as the sum of all of the data values divided by the number of values.
The sample mean is denoted x̄ (“x-bar”) and the population mean is denoted by the Greek letter μ (“mu”).
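The definition translates directly into code. A short Python sketch with a made-up sample of five values; the standard library's `statistics.mean` is shown alongside for comparison. The same formula gives μ when the values are the entire population rather than a sample.

```python
import statistics

# Hypothetical sample of five measurements
x = [4, 8, 15, 16, 23]

# Sample mean x̄: sum of the data values divided by the number of values
x_bar = sum(x) / len(x)

print(x_bar)               # 13.2
print(statistics.mean(x))  # 13.2
```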
Basic concept: Median
The middle of the distribution that has been ordered from smallest to largest; for distributions with an even number of values, this is the mean of the two middle values.
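A sketch of the median rule in Python. The helper `median` is hypothetical and written out to show both the odd and even cases; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(median([7, 1, 3]))      # 3   (sorted: 1, 3, 7)
print(median([7, 1, 3, 10]))  # 5.0 (sorted: 1, 3, 7, 10; mean of 3 and 7)
```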
Basic concept: Standard Deviation
Roughly the average difference between individual data values and the mean.
The standard deviation of a sample is denoted as s.
The standard deviation of a population is denoted as σ.
The standard deviation is equal to the square root of the variance.
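A worked sketch in Python with made-up data. One detail these cards leave implicit: the sample variance s² divides the sum of squared deviations from the mean by n − 1, while the population variance σ² divides by N; the standard deviation is the square root of each. The results are checked against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics

# Hypothetical measurements
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]

# Population variance σ²: divide by N
pop_var = sum(squared_devs) / len(data)
# Sample variance s²: divide by n - 1
samp_var = sum(squared_devs) / (len(data) - 1)

# Standard deviation = square root of the variance
sigma = math.sqrt(pop_var)  # population standard deviation σ
s = math.sqrt(samp_var)     # sample standard deviation s

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
print(math.isclose(sigma, statistics.pstdev(data)), math.isclose(s, statistics.stdev(data)))  # True True
```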
Basic concept: Correlation
A measure of the direction and strength of the linear relationship between two quantitative variables.
The correlation between two quantitative variables of a sample is denoted r.
The correlation between two quantitative variables of a population is denoted ρ (Greek letter “rho”).
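A sketch of the sample correlation r (the Pearson coefficient) in Python: the sum of products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the study-hours data are hypothetical; Python 3.10+ also provides `statistics.correlation`, which computes the same Pearson coefficient.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation r between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                  # spread of x
    syy = sum((b - my) ** 2 for b in y)                  # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]

r = pearson_r(hours, score)
print(round(r, 3))  # 0.982 (a strong positive association)
```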
6 properties of correlation
- −1 ≤ ρ ≤ +1.
- For a positive association, ρ > 0; for a negative association, ρ < 0; if there is no linear relationship, ρ = 0.
- The closer ρ is to 0, the weaker the relationship; the closer it is to +1 or −1, the stronger the relationship.
- The sign of the correlation provides direction only.
- Correlation is unit-free; the x and y variables do NOT need to be on the same scale (e.g., the correlation between height in cm and weight in lb).
- It doesn’t matter which variable you label as x or y. The correlation between x and y is equal to the correlation between y and x.
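The properties above can be checked numerically for the sample correlation r, which obeys the same rules as ρ. This Python sketch uses a hypothetical `pearson_r` helper and made-up height/weight data; each `assert` corresponds to one property.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
height_cm = [150, 160, 165, 172, 180]
weight_lb = [110, 130, 140, 155, 170]
height_in = [h / 2.54 for h in height_cm]  # same heights, different unit

r = pearson_r(height_cm, weight_lb)

assert -1 <= r <= 1                                       # bounded by -1 and +1
assert r > 0                                              # taller goes with heavier: positive association
assert math.isclose(r, pearson_r(weight_lb, height_cm))   # swapping x and y changes nothing
assert math.isclose(r, pearson_r(height_in, weight_lb))   # unit-free: cm vs inches gives the same r
assert math.isclose(pearson_r(height_cm, [-w for w in weight_lb]), -r)  # reversing one variable flips the sign only
print(round(r, 3))
```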