8 - Statistics and Hypothesis Testing in ABA Flashcards
Work from the top down, large to small: Theory, Hypothesis, Test the hypothesis, Specific answer.
Requires statistics to interpret large amounts of data (quantitative/hard numbers).
The majority of social science researchers have a ____ orientation.
Deductive Research Paradigm
AKA deductive approach
Work from the bottom up, small to large: Data, Analysis, Generalize.
(Results: come to conclusions that you can generalize to other people.)
Fluid, qualitative approach
Examples of qualitative research:
interviews
observation of cultures
Focus groups
Inductive approach – research
Research in ABA is typically ____ in that we do not test hypotheses,
but we are also quantitative.
Reversal designs are: flexible (ABA vs. ABAC), quantitative, and without a pre-determined outcome. Why the differences? Notwithstanding the differences, can we use the tools?
inductive
•Goal:
- To "describe" properties of the sample(s) you are working with.
- Can talk about the central tendency of the sample or population: what the most typical score in the sample or population looks like.
- Can talk about the variability around that measure of typicalness, be it the mean, median, or mode: what is the variability around that measure of central tendency?
- Can talk about effect size.
”Descriptive” statistics
- Complements visual analysis.
We already use them to describe:
- Level change
- IOA
Can be used in program evaluation by aggregating data across clients.
May open doors for funding.
Ex. effect size (can be compared to other effect sizes)
Descriptive statistics in ABA: reasons for using
May hide trends in behavior.
Descriptive statistics in ABA: reasons for not using
Goal:
- To use sample data as a basis for answering questions about the POPULATION. (We can't access whole populations; instead we collect samples.)
- Since we rely on samples, we must better understand how they relate to populations.
- Then we use HYPOTHESIS testing to make those inferences: t-tests, ANOVA, etc.
(The inferences drawn from the samples are about the population from which the sample was drawn.)
(And the inferences are about relationships or features of the population.)
Inferential statistics- Goal
Appropriate for certain types of research:
- ex. when ABA research does not use a single-case design, such as group-design contingency management studies
May open doors to funding.
- Hypothesis testing
Addresses a perceived weakness of reliance on visual analysis in ABA: is it inconsistent?
Reasons for using Inferential statistics - ABA
- Do not tell us how likely the results are to be replicated.
(In ABA we use an ABA design or MULTIPLE BASELINE design to replicate effects; with inferential statistics we are not operating under circumstances that allow us to REPLICATE an effect.)
- Do not tell us the probability that the results were due to CHANCE.
(They tell us a CONDITIONAL probability: the probability of the data given a true null hypothesis.)
- There are very few situations in which there is only randomness in data.
- The best way to increase your chances of significance is to increase the number of participants.
- A large number of variables that have very small effects become "important."
- Limits the reasons for doing experiments.
- Reduces scientific responsibility.
- Emphasizes population parameters at the expense of behavior.
"Behavior is something an individual does, not what a group average does."
•We should be attending to:
- value/social significance,
- durability of changes
- Number and characteristics of participants that improve in a socially significant manner.
Inferential - Some reasons for not using it in ABA
Looked at behavioral treatment and normal educational and intellectual functioning in young autistic children (Journal of Consulting and Clinical Psychology, 1987).
Hypothesis: the construction of a special, intense, and comprehensive learning environment for very young children with autism would allow them to catch up with their normal peers by first grade.
Subjects were young children diagnosed with autism:
- Group one: 19 subjects, 40 hours a week of ABA
- Group two: 19 subjects, 10 hours a week of ABA
- Group three: 21 subjects, other treatments
Groups one and two received two or more years of therapy.
Lovaas
Statistical analysis (MANOVA) was used to compare the DV (IQ) to show that the intensive group demonstrated a large increase relative to the other conditions.
He was a behavior analyst. Why hypothesis testing, statistics, and IQ as a dependent variable?
- It was an intensive, long-term study that used measures and analyses that others NOT in our field would pay attention to.
- Control groups allowed for strong conclusions
Inferential Statistics
Lovaas Study
- Nominal (name): refers to categories
Ex. school districts and colors
- Ordinal (order): quantities that have an order
Ex. physical fitness and pain scales
(Not a lot you can do with these two types of data)
- Interval: the difference between each value is equal
Ex. degrees Fahrenheit
- Ratio: the difference between each value is equal, and there is a true zero
Ex. time, weight, temperature in kelvin
Practically, interval and ratio are the types of data we are interested in.
Four types of data used in statistics
- Mean
- Median
- Mode
There is more than one measure because many different types of distributions are possible.
Three measures of central tendency
Descriptive statistics
The sum of the scores divided by the number of scores.
Advantage: every number in the distribution is used in its calculation.
However, changing a single score or adding a new score will change it, except when the new score equals the mean.
Most preferred measure:
- Every score is used in its calculation
- Used to calculate other statistics
However, there are some situations in which the mean cannot be calculated or is not the most representative measure.
Remember, the goal is to find a single value that best represents the entire distribution (hence the median and mode).
Mean
The score that divides the distribution exactly in half.
A ____ split gives researchers two groups of equal size:
- Low scores
- High scores
Median
Odd number of scores:
- List from lowest to highest
- The median is the middle score
Ex. (10, 11, 12, 13, 14): median = 12
Even number of scores:
- List from lowest to highest
- Add the middle two scores and divide by two
Ex. 2, 3, 5, 8, 10, 12: median = (5 + 8)/2 = 6.5
Calculate Median
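A quick sketch of these rules using Python's built-in statistics module, with the example scores from this card:

```python
import statistics

odd_scores = [10, 11, 12, 13, 14]
even_scores = [2, 3, 5, 8, 10, 12]

# Odd number of scores: the median is the middle score.
print(statistics.median(odd_scores))   # 12

# Even number of scores: average the middle two, (5 + 8) / 2.
print(statistics.median(even_scores))  # 6.5
```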
Use when there are:
- Extreme scores/skewed distributions
- Undetermined values
- Open-ended distributions
Median: When to use
The score or category that has the greatest frequency (the peak).
A distribution can have more than one mode:
- Bimodal
- Multimodal
Easy to find in basic frequency distribution tables.
NOT a frequency; it is a score or category.
Mode
Two modes/peaks;
Can be equal or major/minor
BiModal
More than two modes
Multimodal
Can be used in place of, or in conjunction with, other measures of central tendency. That is, use it when there are:
1. NOMINAL scales (the mode is the only measure of central tendency for nominal scales).
Ex. Are you male or female? 40 are male, 60 female. You can't calculate a mean or median, but you can say the most TYPICAL participant is female, because that's 60% of the sample.
2. Discrete variables ("what is the most typical score?"; remember the goal of measures of central tendency).
Ex. the number of golf clubs owned: the mean may not be a whole number, but the mode gives the most typical score.
3. Describing shape: the mode is easy to figure out.
Mode
Describes the distribution in terms of DISTANCE:
- How far is a person from the central tendency, whether mean, median, or mode?
- The distance between one score and another, or
- The distance between one score and the mean.
Describes how well each score, or a group of scores, describes the entire distribution.
Provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.
Variability
- Range
- Interquartile range
- Standard deviation (most important)
Three measures of variability
The distance between the largest score and the smallest score, plus 1.
A crude, unreliable measure of variability because:
- It does not consider ALL the scores in the distribution.
Calculate:
Ex. 1: 1, 4, 5, 8, 9, 10
10 - 1 + 1 = 10
Ex. 2: 10, 15, 20, 25, 30, 35, 40
40 - 10 + 1 = 31
It takes only the highest and lowest scores and ignores the others in between, so it says nothing detailed about variability.
Range – variability Measure
The most important measure of variability.
Measures the "typical" DISTANCE from the MEAN and uses ALL of the scores in the distribution:
- How far is a score from the mean?
Uses in ABA:
- Can be used to identify variability in behavioral data (autocorrelation can be used for this too).
- Can be used to identify important variability in IOA data.
The mean and range tell us nothing about which set of circumstances we have, which is why we should always report the standard deviation of IOA scores along with the mean.
Standard deviation – variability measure
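A minimal sketch of reporting an IOA mean together with its standard deviation, as suggested above; the session percentages are made up for the example:

```python
import statistics

# Hypothetical IOA percentages across six sessions (made-up data).
ioa_scores = [85, 92, 78, 95, 88, 90]

mean = statistics.mean(ioa_scores)
sd = statistics.stdev(ioa_scores)   # typical distance of scores from the mean

# Report the SD alongside the mean so variability stays visible.
print(f"IOA: mean = {mean:.1f}%, SD = {sd:.1f}%")
```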
The relationship between samples and populations:
- We cannot talk about the EXACT relationship between samples and populations...
- ...but we can talk about potential outcomes (i.e., probability).
Probability - inferential statistics
To make ”inferences” about Populations based on sample data
We are Sampling the population with a certain Probability
Two kinds:
- Subjective
- Objective
Inferential statistics – Role
Based on experience or intuition
-Chance of rain, likelihood of recession, chance of getting married in the next year, likelihood of Miami Heat winning another championship
Subjective probability
Based on mathematical concepts and theory
Objective probability – inferential statistics
P(event) = (number of outcomes classified as the event) / (total number of possible outcomes)
The probability of event A, p(A), is the ratio of the number of outcomes that include event A to the total number of possible outcomes.
Example: What is the probability that a randomly selected person has a birthday in October (assuming 365 days in a year)?
Step 1: How many chances are there to have a birthday in a year? 365.
Step 2: How many chances are there to have a birthday in October? 31.
Step 3: The probability that a randomly selected person has a birthday in October is:
P(October birthday) = 31/365 = 0.0849
Probability formula
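The formula translates directly into code; a small sketch of the October-birthday example:

```python
def probability(event_outcomes: int, total_outcomes: int) -> float:
    """P(event) = outcomes classified as the event / total possible outcomes."""
    return event_outcomes / total_outcomes

# Probability of an October birthday, assuming 365 equally likely days.
print(round(probability(31, 365), 4))  # 0.0849
```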
Contained in a limited range 0-1.
If P = 0, the event will not occur
If P = 1, the event will always occur
Can be expressed as fractions, decimals, or percentages.
These values are always positive.
Ex, P = 3/4, P = 0.75, P = 75%
(All these values are equal)
In order to apply these rules to samples and populations, we must satisfy two requirements:
- Each individual in the population must have an equal chance of being selected.
- If more than one individual is to be selected for the sample, there must be a constant probability for each and every selection (sampling with replacement).
Example: you draw a number out of a hat and record it; you put the number back so it can be chosen again.
Remember, probability and proportion are equivalent. Thus, whenever a population is presented in a frequency distribution graph, it is possible to represent probabilities as proportions of the graph.
Ex. What is the probability of drawing an exam with a grade of B or better out of a pile of 31? Number of students getting B or better = 24. 24/31 = .77, a proportion of 77%. So we can convert from a frequency distribution to a probability.
Probability values
Normal-shaped distributions are the most commonly occurring shape for population distributions.
Identify sections of a normal distribution using z-scores (e.g., 1 or 2 SD above the mean).
The normal shape can also be described by the proportions of area contained in each section of the distribution.
Ex. 1) The left and right sides of the distribution have the same proportions.
2) The proportions apply to any normal distribution.
Why is this important? We can now describe X values (raw scores) in terms of probability.
Ex. What is the probability of randomly selecting a person who is taller than 80 inches? 2.28% (see slide).
Ex. A raw score of 118 on an IQ test converts to z = 1.02. Look for the corresponding proportion in the table.
Where Do These Percentages Come From?
Example: Raw score of 118 on IQ test converts to z = 1.02
Look for corresponding proportion in table
Probability And frequency distribution
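Those table lookups can be reproduced with scipy's normal distribution functions; a sketch using the z values already given on this card, so no population parameters need to be assumed:

```python
from scipy.stats import norm

# Height example: 80 inches corresponds to z = 2.0 on the slide.
print(norm.sf(2.0))    # 0.0228 -> about 2.28% of people are taller

# IQ example: a raw score of 118 converts to z = 1.02.
print(norm.cdf(1.02))  # 0.8461 -> proportion of scores below 118
```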
A tool that allows you to see how you've done within a normal distribution.
Identify sections of a normal distribution using z-scores (e.g., 1 or 2 SD above the mean).
Collecting enough data tends to yield a normal distribution.
If I get a particular score, I can convert it to a z-score (if I know the SD and mean).
A z-score of 1.0 means I did better than 84% of the population.
Needing the population standard deviation and mean is a large limitation.
Why use z-scores? Because data are normally distributed along the axis: if I get a score and it translates into a z of 1, I know exactly how I did compared to everyone else. You can find the proportion for a z-score in a table.
Application of z-scores:
Test 1 scores: 908, 958, 962, 977, 1000, 1000, 1045, 1046, 1047, 1060 (mean = 1000, SD = 50)
Test 2 scores: 109, 121, 125, 145, 152, 158, 165, 170, 178, 180 (mean = 150, SD = 25)
What if I took both tests, with a Test 1 score of 1100 and a Test 2 score of 200?
Test 1: z = (1100 - 1000)/50 = 2. Test 2: z = (200 - 150)/25 = 2.
Both scores fall at the same point in their distributions: better than 98% of scores, with 2% above.
What if I get a z-score that is not a pretty number? Look up its proportion in a unit normal table.
Z-score review
Normally distributed population:
Mean (μ) = 24 years old
SD (σ) = 2 years
Normally distributed population: μ = 24, σ = 2
1. Draw a sample of 25 from the population: M = 22
2. Draw a second sample: M = 22
3. Draw a third sample: M = 20
4. Draw a fourth sample: M = 18
5. Draw a fifth sample: M = 26
6. Draw a sixth sample: M = 22
7. Draw a seventh sample: M = 24
8. Draw an eighth sample: M = 24
9. Draw a ninth sample: M = 26
10. Draw a tenth sample: M = 24
Normally distributed population: μ = 24, σ = 2. We now have 10 means from samples of 25:
22, 22, 20, 18, 26, 22, 24, 24, 26, 24
We take those 10 means and create a frequency distribution of the means.
Closer inspection of the distribution of sample means reveals a mean = 22.8 and an SD (called the standard error of the mean) = 2.52 (the average distance of the sample means from their mean).
Distribution of Sample Means
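This exercise can be replayed in code; a sketch that draws ten random samples of n = 25 from a normal population with μ = 24 and σ = 2 (the particular means will vary from run to run):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10 samples of size 25 from a N(mu=24, sigma=2) population
# and record each sample mean.
sample_means = [rng.normal(loc=24, scale=2, size=25).mean() for _ in range(10)]

print(np.round(sample_means, 1))
print("mean of the sample means:", round(np.mean(sample_means), 2))
print("SD of the sample means (standard error):",
      round(np.std(sample_means, ddof=1), 2))
```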
What did we just do?
- Used samples to provide information about a population.
- What do we already know about this process? Samples provide incomplete pictures of the population; that incompleteness is called SAMPLING ERROR.
We built a sampling distribution of the means.
Inferential statistics
(The difference between the mean of a sample and the mean of the population.)
Or...
The discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.
From the illustration:
- Population mean = 24, SD = 2
- Mean of the sample means = 22.8, SD (standard error) = 2.52
Samples will be different from the population because they contain different individuals and different scores, and therefore produce different sample means.
Sampling error
Sampling error raises three questions:
- How can you tell which sample best describes the population?
- Can you predict how well a sample will describe its population?
- What is the probability of selecting a sample that has a certain sample mean?
We answer these questions by establishing a set of rules that...
...relate samples to populations.
The collection of sample means for all possible random samples of a particular size (n) that can be obtained from a population.
E.g., our 10 samples yielded a collection of sample means, and each sample size was 25 (random samples of a particular size, n).
Different samples taken from the same population will yield different statistics
In most cases, it is possible to obtain thousands of different samples from one population
The sample means tend to pile up around the population mean
The distribution of sample means is approximately NORMAL in shape
We can use the distribution of sample means to answer PROBABILITY questions about the sample means
How can we predict characteristics of the sample?
It is not always possible to collect and compute ALL the possible sample means...
...so we need some general characteristics that describe a distribution of sample means. This leads to the Central Limit Theorem.
Distribution of sample means
Summary: the larger your sample size, the more normal your distribution of sample means will be.
For any population, the distribution of sample means will approach a normal distribution as n approaches infinity.
The shape of the distribution of sample means will be almost perfectly NORMAL if either one of the following conditions is satisfied:
- The population from which the samples are selected is normal, or
- The number of scores (n) in each sample is relatively LARGE (n > 30).
A sample mean is expected to be near its population mean.
Central Limit Theorem
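A small simulation can make the theorem concrete: as n grows, the standard deviation of the sample means shrinks toward σ/√n. A sketch assuming the same μ = 24, σ = 2 population used earlier:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 24, 2

for n in (4, 25, 100):
    # 10,000 sample means for samples of size n.
    means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}: simulated SE = {means.std(ddof=1):.3f}, "
          f"predicted sigma/sqrt(n) = {sigma / np.sqrt(n):.3f}")
```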
The larger the sample size, the more probable it is that the sample mean will be CLOSE to the population mean.
The primary use of the distribution of sample means is to find the probability associated with any specific sample.
The law of large numbers
A statistical method that uses sample data (statistics) to evaluate a hypothesis (a question) about a population parameter.
A basic, common inferential procedure that uses z-scores, probability, and the distribution of sample means.
Purpose: to help researchers differentiate between REAL patterns in data and RANDOM patterns in data.
Hypothesis testing
Begins with a population with KNOWN parameters.
Goal: to determine what happens to the population after the treatment is administered.
If the treatment has any effect, it is simply to add or subtract a constant amount to each individual score.
- The shape and standard deviation will remain the same.
Hypothesis testing
Because researchers must have a STANDARDIZED method for evaluating the results of their research studies. Not everyone will acknowledge visual analysis.
We need to disseminate and speak the language of those outside of behavior analysis.
Formalized testing procedure: why it is important
- Random sampling
- Independent observations: each individual data point must be independent of the next data point
- The value of SD is unchanged by the treatment, and the sampling distribution is normal
Assumptions for hypothesis tests with z-scores
1. State the hypothesis about a population.
2. Set the criteria for a decision: use the hypothesis to predict the characteristics the sample should have.
3. Collect data and compute sample statistics: obtain a random sample and compute its mean.
4. Make a decision: compare the obtained sample data with the prediction that was made from the hypothesis.
Hypothesis testing: four main steps
Determine the effect of a certain treatment on the population mean:
• What is the effect of verbal stimulation on the language development of an infant?
• What is the effect of the use of alcohol on visual and auditory perception?
•What is the effect of family therapy on the relapse hospitalization rate of schizophrenia patients?
• What is the effect of the empty chair gestalt technique on the expression of anger and sadness?
Example hypothesis
Step 1: State the hypothesis.
- Statements about the unknown population after treatment, in terms of population parameters.
- Two opposing hypotheses (non-directional):
1. NULL hypothesis, H0: predicts the independent variable (treatment) will have NO effect on the dependent variable (symbolic statement: H0: μ = the untreated population mean).
2. Alternative hypothesis, H1: predicts that the independent variable (treatment) WILL have an effect on the dependent variable (symbolic statement: H1: μ ≠ that value).
The null hypothesis H0 vs. the alternative hypothesis H1: two opposing hypotheses (non-directional).
Keep in mind these are non-directional hypotheses, so they refer to two-tailed hypothesis tests, which means they DO NOT predict the direction (increase or decrease) of change.
Step One: Hypothesis Testing
Think of being on trial for a crime and you are innocent
Your plea will be “not guilty”
Null hypothesis: No relationship (not guilty)
Alternative hypothesis: Guilty
You are presumed not guilty, the prosecutor must demonstrate that you are guilty
If the prosecutor is successful (found guilty) the jury has accepted the alternative hypothesis
If the prosecutor is unsuccessful (found not guilty) the jury has failed to accept the alternative hypothesis
The defense attorney did not prove you are innocent, there is just not enough evidence for the alternative hypothesis
Fail to Reject?
Example:
Null hypothesis =Treatment A is no better than Treatment B
Alternative=Treatment A is better than Treatment B
Possible researcher conclusions:
-If Treatment A is better than Treatment B,
accept the alternative hypothesis
- If Treatment A is NOT better than Treatment B, fail to reject the null hypothesis
Null Hypothesis
Step 2: Set the criteria for a decision...
...by using data from the sample to evaluate the credibility of the null hypothesis.
Create a distribution of sample means if the null hypothesis is true
Divide the distribution of sample means into two regions:
- Sample means likely to be obtained if H0 is true
- Sample means that are very unlikely to be obtained if H0 is true
Need to separate high probability samples from low probability samples
Step Two: Hypothesis Testing
Step three:
Collect data, compute sample statistics
Data are collected after the hypothesis is stated and the criteria are set.
- This ensures honest, objective handling of the data.
Sample mean computed
Sample mean compared with the mean stated in H0
Step Three: Hypothesis Testing
Step four:
Make a decision:
• 2 possible decisions using z scores:
1. Reject H0
2. Fail to reject H0 (accept H0)
Reject the null hypothesis H0 if:
- Sample mean falls in the critical region
- Big discrepancy between sample and H0 (Unlikely to occur if H0 is true)
-Demonstrate treatment effect
Fail to reject the null hypothesis if:
- Sample mean does not fall in the critical region
-Data is reasonably close to H0
-Treatment effect not demonstrated
Note: We do not ever talk about proving the alternative
Step Four: Hypothesis Testing
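A hedged end-to-end sketch of the four steps as a two-tailed z-test; the population parameters (μ = 100, σ = 15) and the sample are invented for illustration:

```python
import math
from scipy.stats import norm

# Step 1: state the hypotheses. H0: mu = 100 (no effect); H1: mu != 100.
mu0, sigma = 100, 15                  # assumed known population parameters

# Step 2: set the criteria. Two-tailed test at alpha = .05.
alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)  # 1.96

# Step 3: collect data, compute the sample statistic (made-up sample).
n, sample_mean = 36, 106
z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # z = 2.4

# Step 4: make a decision.
if abs(z) > z_critical:
    print(f"z = {z:.2f}: reject H0 (sample mean is in the critical region)")
else:
    print(f"z = {z:.2f}: fail to reject H0")
```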
Purpose: To determine whether the result of the research study (the obtained difference) is more than would be expected by CHANCE alone
Example: the z-score statistic used in hypothesis testing.
A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic, considered as a numerical summary of a data-set that reduces the data to one value that can be used to perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis.
An important property of a test statistic is that its sampling distribution under the null hypothesis must be calculable, either exactly or approximately, which allows p-values to be calculated. A test statistic shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution.
Two widely used test statistics are the t-statistic and the F-test.
Test Statistic-
Hypothesis Testing
Hypothesis testing can result in two types of errors:
Type I
Type II
Type I: rejecting the null hypothesis when it is actually true. That is...
- A treatment effect is found when no effect exists.
Consequences of making this error:
- False reports in the scientific literature.
P(Type I error) = α (alpha), chosen by the experimenter.
Type I error
Failing to reject the null hypothesis when it is actually false. That is...
- A treatment effect exists but it is not detected.
P(Type II error) = β (beta); it cannot be simply determined.
Type II error cannot be directly controlled; it is determined by many factors.
Type II error
Hypothesis Testing
The z-score's shortcoming as an inferential statistic:
- Its computation requires knowing the population standard deviation.
When the population standard deviation is unknown, we use the t-statistic, rather than the z-score, for hypothesis testing.
Z-scores vs. the t-test
____ with a sufficient sample from a population, and independent observations, we can test a hypothesis.
Used to COMPARE two MEANS:
- Tells you whether or not they are statistically different.
Completed using independent observations:
- The occurrence of the first event has no effect on the probability of the second event.
- Usually satisfied by using random samples.
Use when you have TWO groups (e.g., treatment vs. control).
Ex. null hypothesis: teaching strategy A produces no difference in standardized test scores when compared to standard teaching strategy B.
Use this, rather than a z-score, for hypothesis testing when the population standard deviation is unknown.
What if our samples were not independent? (See the related-samples t-test below.)
T-test
Study hint (T = “Two Mean”)
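A minimal sketch of an independent-samples t-test with scipy; the score lists are invented stand-ins for strategies A and B:

```python
from scipy import stats

# Made-up standardized test scores for two independent groups.
strategy_a = [78, 85, 82, 88, 75, 80, 86, 79]
strategy_b = [72, 74, 70, 78, 69, 75, 73, 71]

t_stat, p_value = stats.ttest_ind(strategy_a, strategy_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If p < .05, reject H0 that the two strategies produce equal mean scores.
```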
They eliminate the problem of individual differences between subjects.
- So they are also called WITHIN-SUBJECT designs.
They greatly reduce the sample variance, which can be inflated by differences between subjects that have nothing to do with treatment effects.
Related-Samples Studies t-test Advantage
- Carryover effects: the subject's response in the second treatment is altered by lingering aftereffects from the first treatment.
- Progressive error: the subject's performance changes consistently over time.
Two ways to deal with these potential problems:
1) Counterbalance the order of treatment presentation.
2) If substantial contamination is expected, use a different experimental design (i.e., independent measures).
Related-samples t-test
Assumptions of the test (contaminating factors that violate them can cause D to be statistically significant when there is actually no difference between the before and after conditions):
1) The observations within each treatment condition must be independent.
2) The population distribution of difference scores (D values) must be normal.*
(*This can be ignored if n is greater than or equal to 30.)
Assumptions of the Related-Samples t-Test
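For related samples, scipy's paired test works on the difference scores (D); a sketch with invented before/after data for the same subjects:

```python
from scipy import stats

# Made-up scores for the same six subjects before and after treatment.
before = [10, 12, 9, 14, 11, 13]
after = [14, 15, 11, 18, 13, 16]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Equivalent to a one-sample t-test on the D values (after - before).
```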
Multiple baseline or repeated measures? In some cases, a repeated-measures group design may be more appropriate for demonstrating skill acquisition than a traditional single-subject design.
In such a case, a repeated-measures design would probably have been more believable; beware "slavish devotion to design."
Resources would probably have been better spent on testing MORE participants, rather than testing the same few participants many times over:
- Establishes generality
- Is it effective?
- Calculate the effect size
Related samples t-test and ABA
If three groups instead of two:
- Example: the effects on language acquisition of therapy A, therapy B, and therapy C. You could compare A vs. B, B vs. C, and A vs. C using three t-tests, but running multiple t-tests inflates the chance of a Type I error.
This type of analysis tells you whether or not there is a significant difference between THREE or more groups (H0: A = B = C).
A follow-up MCP (multiple comparison procedure) tells you where the difference is located.
- Example: therapy A is better than B and C.
Analysis of Variance (ANOVA)
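A sketch of a one-way ANOVA for the three-therapy example, with invented language-acquisition scores; a follow-up comparison procedure would then be needed to locate any difference:

```python
from scipy import stats

# Made-up language-acquisition scores under three therapies.
therapy_a = [24, 28, 25, 30, 27]
therapy_b = [18, 20, 19, 22, 21]
therapy_c = [17, 19, 18, 21, 20]

f_stat, p_value = stats.f_oneway(therapy_a, therapy_b, therapy_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F means the group means are not all equal (rejects A = B = C);
# a follow-up procedure (e.g., Tukey's HSD) locates where the difference is.
```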
It depends on the research question.
We can use ABA within a group design.
We can use hypothesis testing for single-case data, with caution! Problems it may raise:
- Non-independence of observations
- Trends may be masked if we focus only on means
- Single-case effect sizes may tell a better story, and be better received by an ABA audience, than hypothesis testing
We can use ABA in a group design if, for example, we test early ABA intervention on socially important outcomes in children with autism.
Other tests:
- Factorial analysis of variance (more than two groups and more than one factor), MANOVA, and repeated-measures ANOVA
Are t-tests and ANOVAs appropriate for me? ABA and hypothesis testing
A statistical technique used to measure and describe a relationship between two variables
Tests relationships between quantitative variables or categorical variables.
- a measure of how things are related.
The study of how variables are correlated
Correlation
Type of data/question influences what you can do:
Ex. If I ask the question "Does smoking shorten your life?", can you do a t-test for this? I can look at how long you smoked and how many years you lived, and draw a "CORRELATION."
Correlations are useful because if you can find out what relationship variables have, you can make predictions about future behavior.
Knowing what the future holds is very important in the social sciences, such as government and healthcare. Businesses also use these statistics for budgets and business plans.
Why Correlations?
Three characteristics of the relationship between X and Y:
1. Direction (positive or negative)
- Positive correlation: X and Y change together, moving in the same direction (as X increases, Y also increases; as X decreases, Y also decreases).
- Negative correlation: X and Y change inversely (as X increases, Y decreases; as X decreases, Y increases).
2. Form of the relationship, which is LINEAR.
3. Degree (strength) of the relationship:
- +/- 1.00 is a perfect correlation (a straight line): perfectly consistent, predictable relations.
- 0.00 means no relationship between X and Y.
Note: the correlation coefficient (Pearson correlation, r) will always be between -1.00 and +1.00.
Examples: head size and memory; height and shoe size; anxiety and athletic performance; red cars and speeding tickets; IQ and social skills; GRE and college GPA; class attendance and grades.
What Does Correlation Measure?
You CANNOT determine cause-and-effect relations from a correlation.
A correlation coefficient is a way to put a VALUE on the relationship. Correlation coefficients have a value between -1 and 1. A "0" means there is no relationship between the variables at all, while -1 or 1 means there is a perfect negative or positive correlation (negative or positive here refers to the type of graph the relationship will produce).
Types:
The most common correlation coefficient is the Pearson correlation coefficient. It is used to test for LINEAR relationships between data.
Understanding and Interpreting Correlations
Describes the linear relationship between two or more variables.
Linear ____ equation: mostly a prediction formula.
- Builds upon correlations to make predictions.
Ex. two variables: a new test for high school seniors and first-year college GPA.
Are these variables related?
We can calculate a correlation easily.
From the graph: correlation = .89.
A very strong positive correlation, but now what? Effect size?
Regression- Correlation
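A sketch of the test-score/GPA example using scipy's linregress; the data are invented and chosen to give a strong positive correlation:

```python
from scipy import stats

# Made-up data: new test scores for seniors and first-year college GPA.
test_scores = [45, 52, 58, 63, 70, 76, 81, 90]
college_gpa = [2.1, 2.4, 2.7, 2.9, 3.1, 3.3, 3.6, 3.9]

result = stats.linregress(test_scores, college_gpa)
print(f"r = {result.rvalue:.2f}")  # strength and direction of the correlation
print(f"GPA ~ {result.slope:.3f} * score + {result.intercept:.2f}")

# The regression line turns the correlation into a prediction formula.
predicted = result.slope * 65 + result.intercept
print(f"predicted GPA for a test score of 65: {predicted:.2f}")
```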
A measure of the strength of a phenomenon.
Isn't that covered in significance testing? No; significance testing only informs us of "the probability of obtaining the results that were obtained in the study, given that the null hypothesis is true" (p. 167).
Examples:
p < 0.05
p < 0.01
Think about an ABAB reversal design.
Measures:
- r/R in correlations and regressions
- Cohen's d
(There are others, but they are not important here.)
Effect Size
Interpretation (in many fields) using Cohen's d:
- Small: .2 to .3; only visible through careful study
- Medium: .5
- Large: .8 or larger; easy to identify
But size should be interpreted based on your subject and question.
A reassuring scenario: use the correct tool (a t-test) with all of the parameters met, significance at .01, and an effect size of .89.
Effect Size
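A minimal sketch of Cohen's d for two groups, computed as the mean difference divided by the pooled standard deviation; the data are invented:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                        / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

treatment = [24, 28, 25, 30, 27, 26]  # made-up outcome scores
control = [20, 22, 19, 23, 21, 22]

print(f"d = {cohens_d(treatment, control):.2f}")  # .8 or above reads as 'large'
```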
Makes it relatively easy to calculate and interpret effects for group data.
Can be used to summarize data from many studies with different dependent variables.
Not dependent on sample size.
Drawbacks:
- Cannot be applied to behavioral data because of dependent observations and autocorrelation.
- The studies being combined can be too different.
Effect Size - Benefits
The best way to increase your chances of significance is to increase the number of participants.
A large number of variables that have very small effects become "important."
Limits the reasons for doing experiments (this only applies if we are working in the hypothetico-deductive model).
Reduces scientific responsibility.
Emphasizes population parameters at the expense of behavior.
The probability reported is a conditional probability under a TRUE NULL hypothesis.
- There are very few situations in which there is only randomness in data.
- The best way to increase the chances of significance is to increase the number of participants.
Inferential statistics - reason NOT to use in ABA
Behavior is something an individual does, not what a group average does.
We should be attending to:
• value/social significance
• Durability of changes
• number and characteristics of participants that improve in a socially significant manner
Inferential statistics in ABA - Reason not to use
Can be used to identify variability in behavioral data (autocorrelation can be used for this too).
Can be used to identify important variability in IOA data.
Standard deviation
Goal:
To find a single value that best represents the entire distribution (as do the median and mode).
Mean
Example: how many cups of coffee per day?
You may find the most common value is 2 cups, even though the mean and median might be different.
Discrete variable: what is the most typical score?
- No info on effect size
- Limits reasons for experiments
- Multiple subjects needed
- Minimizes social significance
- Minimizes the significance of the individual
Hypothesis
- refers to categories
Ex. School districts
Nominal (name)
Quantities that have an order
Ex. Physical fitness and pain scale
(Not a lot you can do with these two types of data)
Ordinal (order)
The difference between each value is equal
Ex. Degrees Fahrenheit
- Interval
When the difference between each value is equal, and there is a true zero
Ex. Time, weight, temperature in kelvin
- Ratio
The effect was based on the treatment versus random things in the environment.
REAL Patterns in data- hypothesis testing
Something that wasn't affecting all of the kids at the same time.
RANDOM Patterns in data – hypothesis testing
Variables that can only take on a finite number of values are called “discrete variables.”
All Qualitative variables are discrete.
Some quantitative variables are discrete, such as performance rated as 1,2,3,4, or 5, or temperature rounded to the nearest degree.
Discrete Variable
A distribution of statistics obtained by selecting ALL possible samples of a specific size from a population.
Sampling Distribution
We can use this to answer PROBABILITY questions about the sample mean
The distribution of sample means
We define "high" and "low" probability samples by selecting the ____.
The probability value that is used to define the very UNLIKELY sample outcomes if the null hypothesis is TRUE.
Commonly used ____ levels are .05, .01, and .001.
It defines the critical region:
- Critical region: the extreme sample values that are very UNLIKELY to be obtained if the null hypothesis is true.
Use the alpha level and the unit normal table to find the critical region.
Example:
α (alpha) = .05
We are 95% sure that we are not making a decision error when we reject the null hypothesis.
Before you run any statistical test, you must first determine your alpha level, also called the "significance level." By definition, the alpha level (α) is the probability of rejecting the null hypothesis when the null hypothesis is true.
For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
The alpha level (α)
AKA level of significance
Alpha Level
In inferential statistics, is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups
(in a statistical test) the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
Null hypothesis
With a sufficient SAMPLE from a population, and independent observations we can Test a
HYPOTHESIS
Can be distorted in two ways:
- Restricted range
- Outliers
Correlation
In statistics, an effect size is a quantitative measure of the magnitude of a phenomenon.
Examples of effect sizes:
- The correlation between two variables
- The regression coefficient in a regression
- The mean difference
- Even the risk with which something happens, such as how many people survive after a heart attack for every one person who does not survive.
For most types of effect size, a larger absolute value always indicates a stronger effect, with the main exception being if the effect size is an odds ratio.
Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. They are the first item (magnitude) in the MAGIC criteria for evaluating the strength of a statistical claim. Especially in meta-analysis, where the purpose is to combine multiple effect sizes, the standard error (S.E.) of the effect size is of critical importance.
The S.E. of the effect size is used to weigh effect sizes when combining studies, so that large studies are considered more important than small studies in the analysis. The S.E. of the effect size is calculated differently for each type of effect size, but generally only requires knowing the study’s sample size (N), or the number of observations in each group (n’s).
Effect Size
A type of inferential statistic used to determine if there is a significant difference between the MEANS of two groups, which may be RELATED in certain features.
It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances.
A t-test is used as a hypothesis testing tool, which allows testing of an assumption applicable to a population.
T-test
A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related in certain features.
The t-test is one of many tests used for the purpose of hypothesis testing in statistics.
Calculating a t-test requires three key data values.
- the difference between the mean values from each data set (called the mean difference),
- the standard deviation of each group
- the number of data values of each group.
There are several different types of t-test that can be performed depending on the data and type of analysis required.
KEY TAKEAWAYS
Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.
A t-test is appropriate for comparing means under relaxed conditions (less is assumed).
Test statistics - two types of tests.
Why is probability relevant to inferential statistics? Statistics are, in one sense, all about probabilities. Inferential statistics deal with establishing whether differences or associations exist between sets of data. The data comes from the sample we use, and the sample is taken from a population.
So we need to think about whether the sample represents the population from which it has been taken. The larger the sample we take the greater the probability that it is representative of the population. If we took the whole population for our study the probability would = 1 since the sample = the population.
A sample smaller than the whole population means that we cannot guarantee that it is similar to the population. There is a probability that it is not. We want to keep this probability of sampling error as small as possible, so researchers often set a limit of probability (p) of a sampling error at no more than 0.05. Some studies might be more stringent and set the chance of a sampling error at 0.01. And in very important studies where you want to be reasonably certain there is little chance of error - say, testing new drugs, some researchers may even use a probability of error being very small indeed at 0.001, saying that the chance of an error is one in a thousand.
Type I and Type II Errors
Probability and inferential statistics
Say we want to see if a group of patients, who have been given a new drug, have recovered more quickly than a group of patients who received the standard drug. We can use a statistical test to see if there is a difference. Whatever test we use we need to remember that the data we are analysing comes from groups that originally started off as similar to one another. If this were not the case we could not tell if the new drug had made the difference.
So if we find a difference, it might be due to the trial, but there is a possibility that it is due to sampling error. Another way of thinking about sampling errors is that it is the error that gives rise to the difference between the sets of data. If the error were not present then there would not be a difference. This type of sampling error, (known as a type 1 error ) says that a difference is found when no difference exists. It is one of the reasons why researchers publish the results of their research. This then enables other researchers to repeat the study to see if they find similar results. If the results were originally due to an error (which has a small chance of happening, ie less than 1 in 20, or 0.05) then repeating the study may not be able to reproduce the result.
Type one error – Probability Associated with inferential statistics
Type 2 errors
There is the possibility of a type of error, known as a type 2 error. Such possibilities have a probability of occurrence. They arise when it is reasonable to expect a difference and you find that the sampling has resulted in no difference being found.
Think about a drug trial, and in this instance think about the possibility that people taking the new drug will each react differently to the drug. Not everybody will respond in exactly the same way to the drug. Some will show a big improvement and for some it will be very minor, if any, improvement. So there is a probability that the trial group is unrepresentative if the sample that forms this group includes folk who do not respond to the new drug. In the end, the type 2 error means we find no difference when one should be found.
The probability of such an event can be determined. Researchers usually set the probability in this case at 0.2. That is, a one in five chance of a type 2 error.
Type ll error - probability associated with inferential statistics
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or ‘criterion variable’) changes when any one of the independent variables is varied, while the other independent variables are held fixed.
Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, a function of the independent variables called the regression function is to be estimated. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the prediction of the regression function using a probability distribution. A related but distinct approach is Necessary Condition Analysis (NCA), which estimates the maximum (rather than average) value of the dependent variable for a given value of the independent variable (ceiling line rather than central line) in order to identify what value of the independent variable is necessary but not sufficient for a given value of the dependent variable.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable.
Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
Regression- statistics
Your caloric intake and your weight.
Your eye color and your relatives’ eye colors.
The amount of time you study and your GPA.
Examples of HIGH correlation
Your sexual preference and the type of cereal you eat.
A dog’s name and the type of dog biscuit they prefer.
The cost of a car wash and how long it takes to buy a soda inside the station.
Examples of LOW correlation (or none at all)
Characteristics:
- Involves no manipulation or control
- Requires two scores for each individual (X and Y)
- Presented graphically in a scatter plot
Correlation
In statistical modeling, ____ analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or "predictors"). More specifically, this analysis helps one understand how the typical value of the dependent variable (or "criterion variable") changes when any one of the independent variables is varied, while the other independent variables are held fixed.
Regression