Why learn statistics? Flashcards
Prime my brain for higher level concepts and understanding
“Statistics” means…
Statistical procedures
The uses of statistics
- Organize and summarize information
- Determine exactly what conclusions are justified based on the results that were obtained
Goals of statistical procedures
- Accurate and meaningful interpretation
- Provide standardized evaluation procedures
Variable
Characteristic or condition that changes or has different values for different individuals
Data (Plural)
Measurements or observations of a variable
Data set
A collection of measurements or observations
A datum (singular)
- A single measurement or observation
- Score or Raw score
Parameter
- A value, usually a numerical value, that describes a population
- Derived from measurements of the individuals in the population
Statistic
- A value, usually a numerical value, that describes a sample
- Derived from measurements of individuals in the sample
Descriptive Statistics
- Summarize data
- Organize data
- Simplify data
E.g.,
- Tables
- Graphs
- Averages
Inferential Statistics
- Study samples to make generalizations about the population
- Interpret experimental data
Common terminology
- “Margin of error”
- “Statistically significant”
Sampling error
A sample is never identical to the population.
Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.
Example: Margin of error in polls
“This poll was taken from a sample of registered voters and has a margin of
error of plus-or-minus 4 percentage points”
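A small sketch of sampling error, assuming a made-up population of five scores and one hand-picked sample (both are hypothetical values, chosen only for illustration). It compares the sample statistic (here, the mean) with the corresponding population parameter.

```python
# Hypothetical population of N = 5 scores (illustrative values only)
population = [2, 4, 6, 8, 10]
# One possible sample of n = 3 scores taken from that population
sample = [2, 4, 10]

parameter = sum(population) / len(population)  # population mean
statistic = sum(sample) / len(sample)          # sample mean

# Sampling error: discrepancy between the sample statistic
# and the corresponding population parameter
sampling_error = statistic - parameter
print(parameter, statistic, sampling_error)
```

A different sample from the same population would usually give a different statistic, and so a different sampling error.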
Data Structure I: The Correlational Method
- One group of participants
- Measurement of two variables for each participant
- Goal is to describe the type and magnitude of the relationship
- Patterns in the data reveal relationships
- Non-experimental method of study
Note: cannot establish causation
Characteristics:
Strength
Form (Usually linear)
Direction
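One way to put numbers on strength and direction is the Pearson correlation. These notes do not name a specific statistic, so treating Pearson r as the measure is an assumption, and the scores below are hypothetical. Pearson r describes a linear form only.

```python
import math

# Hypothetical data: two variables measured for each of five participants
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sum of products of deviations, and sum of squared deviations for each variable
sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)

r = sp / math.sqrt(ss_x * ss_y)   # Pearson r, between -1 and +1

# Sign of r gives the direction; its size (closer to 1) gives the strength
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(r, 3), direction)
```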
Data Structure II: Comparing two (or more) groups of scores
- One variable defines groups
- Scores are measured on the second variable
- Both experimental and non-experimental studies use this structure
E.g., t-test and ANOVA
Experimental Method
Goal
- To demonstrate a cause-and-effect relationship
Manipulation
- The level of one variable (IV) is determined by the experimenter
Control - rules out influence of other variables (confounds)
- Participant variables
- Environmental variables
Independent variable
The variable manipulated by the researcher (independent because no other variable in the study influences its value)
Dependent variable
The variable that is observed to assess the effect of treatment (dependent because it is thought to depend on the value of the IV)
Experimental method: Methods of control
- Random assignment
- Matching of subjects
- Holding level of some potentially influencing variables constant
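Random assignment can be sketched as shuffling the participant list and splitting it in half. The participant IDs, the group names, and the fixed seed are all hypothetical, used only to make the example reproducible.

```python
import random

# Hypothetical participant IDs
participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

# Seeded generator so the example gives the same split every run (assumption
# for illustration; a real study would not need a fixed seed)
rng = random.Random(42)

shuffled = participants[:]   # copy, so the original list is left untouched
rng.shuffle(shuffled)        # chance alone decides the order

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]
print(treatment_group, control_group)
```

Because chance decides who lands in which group, participant variables should be spread roughly evenly across the groups.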
Experimental Method: Control condition
- Individuals do not receive the experimental treatment.
- They either receive no treatment or they receive a neutral, placebo treatment
- Purpose: to provide a baseline comparison with the experimental condition
Experimental Method: Experimental condition
Those who receive the experimental treatment
Non-experimental Methods
Non-equivalent Groups
- Researcher compares groups
- Researcher cannot control who goes into which group
Pre-test/Post-test
- Individuals measured at two points in time.
- Researcher cannot control influence of the passage of time
Independent variable is ‘Quasi-independent’
Quasi-independent
Cannot be controlled
e.g., age, sex, traits
Constructs
- Internal attributes or characteristics that cannot be directly observed
- Useful for describing and explaining behavior
Operational Definition
- Identifies the set of operations required to measure an external (observable) behavior
- Uses the resulting measurements as both a definition AND a measurement of a hypothetical construct
Discrete variable
- Has separate, indivisible categories
- No values can exist between two neighboring categories
Distinct numbers
e.g., 35, 1, 8, 3
Continuous variable
- Has an infinite number of possible values between any two observed values
- Every interval is divisible into an infinite number of equal parts
e.g., height, weight, temp, time
Scales of measurement
- Nominal
- Ordinal
- Interval
- Ratio
Nominal Scale
Characteristics:
- Label and categorize
- No quantitative distinctions (no numbers just names)
Examples:
- Gender
- Diagnosis
- Experimental or control
Ordinal Scale
Characteristics:
- Categorize observations
- Categories organized by size or magnitude
Examples:
- Rank in class
- Clothing sizes (S, M, L, XL)
- Olympic medals
Interval Scale
Characteristics:
- Ordered categories
- Intervals between categories are of equal size
- Arbitrary or absent zero point
Examples:
- Temperature
- IQ
- Golf scores (above/below par)
Ratio Scale
Characteristics:
- Ordered categories
- Equal interval between categories
- Absolute zero point
Examples:
- Number of correct answers
- Time to complete task
- Gain in height since last year
X
Independent variable
Y
Dependent variable
N
Number of scores in the population
n
Number of scores in the sample
Σ
Summation
- Done after operations in parentheses, squaring, multiplication and division.
- Done before other addition or subtraction
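The order of operations for Σ can be checked with a few expressions. The scores are hypothetical; the comments show which step comes first in each case.

```python
# Hypothetical scores
X = [3, 1, 7, 4]

sum_x = sum(X)                          # ΣX: add the scores
sum_x_squared = sum(x ** 2 for x in X)  # ΣX²: square each score first, then add
squared_sum_x = sum(X) ** 2             # (ΣX)²: parentheses first (add), then square
sum_x_minus_1 = sum(x - 1 for x in X)   # Σ(X - 1): subtract inside parentheses, then add
sum_x_then_minus_1 = sum(X) - 1         # ΣX - 1: summation is done before the subtraction

print(sum_x, sum_x_squared, squared_sum_x, sum_x_minus_1, sum_x_then_minus_1)
```

Note that ΣX² (75 here) and (ΣX)² (225 here) are different quantities.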
What is central tendency?
A statistical measure
A single score to define the center of the distribution
Purpose: find the single score that is the most typical or best represents the entire group
Central Tendency Measures
There’s no single concept of central tendency that is always the “best”
Different distribution shapes require different conceptualizations of “center”
Choose the one which best represents the scores in a specific situation
The mean is…
the sum of all scores divided by the number of scores in the data
Population: μ = ΣX / N
Sample: M = ΣX / n
Mean as a balance point
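A quick sketch of the mean and of the balance-point idea, using hypothetical scores: the total distance of the scores below the mean equals the total distance of the scores above it.

```python
# Hypothetical scores
scores = [1, 2, 6, 6, 10]

mean = sum(scores) / len(scores)   # sum of all scores divided by the number of scores

# Total distance below the mean and total distance above the mean
below = sum(mean - x for x in scores if x < mean)
above = sum(x - mean for x in scores if x > mean)
print(mean, below, above)
```

The two totals match, which is why the mean can be pictured as the point where the distribution balances.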
The Weighted Mean
- Combine two sets of scores
Three steps:
1. Determine the combined sum of all the scores.
2. Determine the combined number of scores
3. Divide the sum of scores by the total number of scores
Overall Mean = (ΣX1 + ΣX2) / (n1 + n2)
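The three steps can be sketched as follows, assuming two hypothetical samples described only by their size and mean (the numbers are made up for illustration).

```python
# Hypothetical samples: size and mean of each
n1, mean1 = 12, 6.0
n2, mean2 = 8, 7.0

# Recover each sample's sum of scores from its mean: sum = n * mean
sum1 = n1 * mean1
sum2 = n2 * mean2

combined_sum = sum1 + sum2                 # Step 1: combined sum of all the scores
combined_n = n1 + n2                       # Step 2: combined number of scores
overall_mean = combined_sum / combined_n   # Step 3: divide the sum by the total number
print(overall_mean)
```

The overall mean is not the simple average of the two means (6.5), because the larger sample carries more weight.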
Computing the Mean from a Frequency Distribution Table
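A minimal sketch, assuming a hypothetical frequency table stored as score -> frequency. Each score is multiplied by its frequency before summing, and the number of scores is the sum of the frequencies.

```python
# Hypothetical frequency distribution table: score X -> frequency f
freq_table = {10: 1, 9: 2, 8: 4, 7: 0, 6: 1}

n = sum(freq_table.values())                        # number of scores = Σf
sum_x = sum(x * f for x, f in freq_table.items())   # sum of scores = ΣfX
mean = sum_x / n
print(n, sum_x, mean)
```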
Characteristics of the Mean
- Changing the value of a score changes the mean
- Introducing a new score or removing a score ‘changes’ the mean (unless the score added or removed is ‘exactly’ equal to the mean)
- Adding or subtracting a constant from each score changes the mean by the same constant
- Multiplying or dividing each score by a constant multiplies or divides the mean by that constant
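These characteristics can be checked on a small hypothetical set of scores. The helper name `mean` is introduced here only for illustration.

```python
def mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)

scores = [2, 4, 6, 8]   # hypothetical scores

original = mean(scores)                       # starting mean
plus_const = mean([x + 3 for x in scores])    # add 3 to every score
times_const = mean([x * 2 for x in scores])   # multiply every score by 2
added_at_mean = mean(scores + [5])            # new score exactly equal to the mean
added_elsewhere = mean(scores + [10])         # new score different from the mean

print(original, plus_const, times_const, added_at_mean, added_elsewhere)
```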
The Median
- The median is the midpoint of the scores in a distribution when they are listed in order from smallest to largest
- The median divides the scores into two groups of equal size
Locating the Median (odd n)
- Put scores in order
- Identify the “middle” score to find median
3 5 8 10 11
“Middle” score is 8 so median = 8
Locating the Median (even n)
- Put scores in order
- Average middle pair to find median
1 1 4 5 7 9
(4 + 5) / 2 = 4.5
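Both cases (odd n and even n) can be sketched in one small function. The function name is hypothetical, and this simple version does not handle the interpolation sometimes used when several scores are tied at the middle.

```python
def median(scores):
    """Middle score for odd n; average of the middle pair for even n."""
    ordered = sorted(scores)          # put scores in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle pair

print(median([3, 5, 8, 10, 11]))    # 8
print(median([1, 1, 4, 5, 7, 9]))   # 4.5
```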
The Mode
The mode is the score or category that has the greatest frequency of any score in the frequency distribution
- Can be used with any scale of measurement
- Corresponds to an actual score in the data
It is possible to have more than one mode
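A minimal sketch of finding the mode (or modes), assuming hypothetical scores. It also works for nominal categories, since only frequencies are counted.

```python
from collections import Counter

def modes(scores):
    """Return every score or category that has the greatest frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([2, 3, 3, 4, 5]))         # [3]
print(modes([1, 2, 2, 3, 4, 4, 5]))   # [2, 4]
print(modes(["S", "M", "M", "L"]))    # ['M']
```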
Bimodal Distribution
Symmetrical Distributions
- Mean and median have same value
- If exactly one mode, it has same value as the mean and the median
- Distribution may have more than one mode or no mode at all
Central Tendency in Skewed Distributions
Mean, influenced by extreme scores, is found far toward the long tail (positive or negative)
Median, in order to divide scores in half, is found toward the long tail, but not as far as the mean
Mode is found near the short tail.
If Mean - Median > 0, the distribution is positively skewed.
If Mean - Median < 0, the distribution is negatively skewed.
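The Mean - Median rule can be sketched with Python's standard `statistics` module. The function name and the scores are hypothetical, and the rule is a rough indicator rather than a formal test of skew.

```python
import statistics

def skew_direction(scores):
    """Compare mean and median to suggest the direction of skew."""
    diff = statistics.mean(scores) - statistics.median(scores)
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "no skew indicated"

print(skew_direction([1, 2, 2, 3, 12]))      # mean 4, median 2
print(skew_direction([1, 10, 11, 11, 12]))   # mean 9, median 11
print(skew_direction([1, 2, 3, 4, 5]))       # mean 3, median 3
```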
Positive Skew
Negative Skew
Overview of variability
Variability is defined as:
1. Quantitative ‘distance’ measure based on the differences between scores
2. Describes ‘distance’ of the spread of scores or ‘distance’ of a score from the mean
Purpose
- Describe the distribution
- Measure how well an individual score represents the distribution
What are the Three Measures of variability?
- The range
- The variance
- The Standard deviation
The range
- Distance covered by the scores in the distribution (smallest value to the highest)
- For continuous data, real limits are used
- Based on two scores, not all data (an imprecise, unreliable measure of variability)
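A short sketch of the range with and without real limits. The scores are hypothetical, and the real limits assume a continuous variable measured to the nearest whole number (so each limit sits 0.5 beyond the extreme score).

```python
# Hypothetical scores, measured to the nearest whole number
scores = [3, 7, 12, 8, 5]

# Simple range: highest score minus lowest score
simple_range = max(scores) - min(scores)

# Range using real limits (assumes a measurement unit of 1)
upper_real_limit = max(scores) + 0.5
lower_real_limit = min(scores) - 0.5
range_real_limits = upper_real_limit - lower_real_limit

print(simple_range, range_real_limits)
```

Either way, only the two extreme scores are used; the other scores have no effect on the result.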
The most common and most important measure of variability is…
Standard deviation
- A measure of standard, or average, distance from the mean
- Describes whether the scores are clustered closely around the mean or widely scattered
The calculation differs for sample and population
Variance is a necessary ‘companion concept’ to standard deviation but NOT the same concept
Defining the Standard Deviation
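Step 1: Determine the deviation score for each X
X - μ
Step 2: Find the sum of the deviations
∑(X - μ)
(This sum always equals 0, because deviations above and below the mean cancel, so it cannot measure variability)
Step 2 (revised): First square each deviation, then sum the squared deviations (SS)
SS = ∑(X - μ)^2
Step 3: Variance = average of the squared deviations = SS / N (population)
Step 4: Standard deviation = sqrt(Variance)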
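The steps can be traced on a small hypothetical population. This sketch covers only the population calculation (dividing SS by N); the sample calculation divides by n - 1 instead.

```python
import math

# Hypothetical population of N = 5 scores
population = [1, 9, 5, 8, 7]
N = len(population)
mu = sum(population) / N                     # population mean

deviations = [x - mu for x in population]    # Step 1: deviation scores X - μ
sum_of_deviations = sum(deviations)          # Step 2: always 0, so not useful on its own
SS = sum(d ** 2 for d in deviations)         # Step 2 (revised): sum of squared deviations
variance = SS / N                            # Step 3: mean squared deviation
sd = math.sqrt(variance)                     # Step 4: square root of the variance

print(mu, sum_of_deviations, SS, variance, round(sd, 3))
```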
Variance is known as…
mean squared deviation
Standard deviation
Average distance of the scores from the mean
Sum of squares formulas