final Flashcards
WHAT IS STATISTICS?
It is a set of tools used to describe, organize, summarize and interpret data, draw conclusions & relate one data set to another, e.g. school scores, level of stress. Statistics help us understand the world around us.
What is descriptive statistics?
Tools used to organize and describe characteristics of a collection of data.
What is Inferential statistics?
The next step after descriptive tools. Used to INFER findings from a smaller group (sample) to a larger group (population)
What is an average?
It is the one value that best represents an entire group of scores
averages = measures of central tendency.
Define the mean
MOST USED type of average, MOST ACCURATELY reflects the population mean. very SENSITIVE TO EXTREME SCORES as these can pull the mean in one or the other direction & make it less representative of the set of scores and less useful
=TYPICAL, AVERAGE, MOST CENTRAL SCORE
formula to obtain the mean
The sum of all the values in a group, divided by the number of values in the group
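A minimal sketch of the formula above in Python. The function name `mean` and the example scores are made up for illustration.

```python
def mean(scores):
    """Sum of all the values divided by the number of values."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


scores = [7, 6, 5, 3, 4]  # made-up scores; they add up to 25
print(mean(scores))       # 25 / 5 = 5.0
```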
What is the difference between statistics and parameters?
PARAMETERS describe POPULATION;
e.g. average height of all WSU students
STATISTICS describe SAMPLES
e.g. average height of the students in our sample
What types of sampling methods do we have?
BIASED sample; just ask your friends
RANDOM sampling: everyone in the group has equal chance of being selected
Why is random sampling the best method?
- maximizes chances to have a sample that is BEST REPRESENTATIVE of population
- a representative sample allows us to GENERALIZE OUR RESULTS much more easily
Define variable
condition/characteristics that can have different values
Define value
possible number or category a score can have
Define score
A particular person’s value on a variable
difference between mean and median
The mean is the MIDDLE POINT OF A SET OF VALUES and the median is the MIDDLE POINT OF A SET OF CASES, as it cares about how many cases and not the values of those cases.
define median
defined as the MIDPOINT in a set of scores, where 50% of the scores fall ABOVE IT and 50% fall BELOW IT. It is the MIDDLE MOST VALUE. When there is an even number of values, the median is the mean of the two middle values.
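A small sketch of how the median could be computed by hand, covering both the odd and the even case described above. The function name and scores are illustrative only.

```python
def median(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle scores


print(median([3, 1, 2]))     # ordered: 1, 2, 3 -> 2
print(median([4, 1, 3, 2]))  # ordered: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```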
define mode
- Most GENERAL AND LEAST PRECISE
- helps understand the characteristics of a set of scores.
- value that OCCURS MOST FREQUENTLY
what is an extreme score?
Also known as “outliers”
Scores that do not “look like” the rest of the data/observations
Are “very different” from the group to which they belong (can be much higher or much lower)
PULL the value of the mean in either direction & make it less useful to know
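A quick illustration, using made-up scores and Python's standard `statistics` module, of how one extreme score pulls the mean while the median stays where it was.

```python
import statistics

typical = [20, 22, 24, 26, 28]        # made-up scores
with_outlier = [20, 22, 24, 26, 200]  # same scores, highest one replaced by an extreme score

# Without the outlier: mean 24, median 24
print(statistics.mean(typical), statistics.median(typical))

# With the outlier: mean 58.4, median still 24
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```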
characteristics of median
-Cares about how many data points, not the value of each of the data points.
-Insensitive to extreme scores (Outliers)
-Has a relationship with Percentile Points
“at the 50th %” - What does that mean?
Your score is at the median: about half of the scores fall below it
characteristics of mode
Possible to have no mode
Possible to have more than one mode
E.g. “bimodal distributions”
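A sketch of a mode finder that returns every most-frequent value, so it can report one mode, several modes, or none. The "no mode when every value ties" rule is a convention assumed here, and the function name is hypothetical.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent value(s), sorted; an empty list means no mode."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    winners = sorted(value for value, count in counts.items() if count == top)
    # Assumed convention: if all distinct values occur equally often, report no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


print(modes([1, 2, 2, 3]))            # [2]      one mode
print(modes([1, 1, 2, 2, 3]))         # [1, 2]   bimodal
print(modes([1, 2, 3]))               # []       no mode
print(modes(["red", "blue", "red"]))  # ['red']  works for categorical data
```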
characteristics of mean
- “BALANCES” the numbers(Values on either side are equal in weight)
- Same “TOTAL DISTANCE”
- Does not have to be a number in a set
When to use what?
The mean is more precise than the median & the median is more precise than the mode. ALL THINGS BEING EQUAL, USE THE MEAN
•Use mode for categorical data
•Use median when you have extreme scores
•Use mean when you have data that isn’t categorical and do not have extreme scores.
Define variability
Provides the FULL PICTURE, as it reflects HOW SCORES DIFFER FROM ONE ANOTHER, more precisely FROM THE MEAN, since the mean is the best representation of the average of a set of scores.
What are the measures of variability?
Three measures: range, standard deviation and variance.
Define the range
MOST GENERAL measure of variability, tells how far apart the scores are from one another
- subtract the lowest score from the highest score R=h-l
- not to be used as a conclusion, but as a part of a process
Define the standard deviation
MOST COMMONLY used measure of variability. It represents the AVERAGE AMOUNT OF VARIABILITY in a set of scores, i.e. the average distance from the mean.
Can we just add up the deviations from the mean?
- no, because they always sum to 0
Why do we square the deviations?
- to get rid of the negative sign
- to check validity of answer
Why do we take the square root?
- to return to the same units we started with.
Why do we divide by n-1 instead of n
because the sample SD is used to estimate the population SD, and dividing by n gives a biased estimate that is too small. Dividing by n-1 gives the unbiased estimate: it FORCES THE SD TO BE SLIGHTLY LARGER THAN IT WOULD OTHERWISE BE.
All other things being equal, the larger the sample, the smaller the difference between the biased and unbiased estimates of SD. The closer the sample is to the size of the population, the more accurate the estimate will be.
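The steps behind the standard deviation can be sketched as follows, using made-up scores. It shows that the deviations sum to zero, and compares the n-1 (unbiased) and n (biased) versions.

```python
import math

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]  # made-up scores
n = len(scores)
mean = sum(scores) / n                    # 60 / 10 = 6.0

# Step 1: deviation of each score from the mean (these always sum to 0)
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)       # 0.0

# Step 2: square each deviation to remove the negative signs, then add them up
sum_of_squares = sum(d ** 2 for d in deviations)  # 28.0

# Step 3: divide by n - 1 (or n), then take the square root to return to original units
sd_unbiased = math.sqrt(sum_of_squares / (n - 1))
sd_biased = math.sqrt(sum_of_squares / n)

print(round(sd_unbiased, 2), round(sd_biased, 2))  # 1.76 1.67
```

The n-1 version comes out slightly larger, and the gap shrinks as n grows.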
What if S=0
there is no variability, as the scores are all identical in value. Rare to find
What is the Variance?
SD squared, not commonly used by itself in research articles as it is difficult to interpret a squared number. It is still important because it is used as a concept and as a practical measure of variability.
What is the difference between the SD and the variance?
They are both measures of variability, dispersion or spread, but the SD (s) is expressed in the original units and the variance (s²) is expressed in squared units
Why do we use both variability and central tendency?
It is possible to have the same mean but different amounts of variability, which is why we need to know and report BOTH
Why is the range the most convenient measure of dispersion? When do we use it?
because you only need to do a simple subtraction. It ignores every value between the highest and lowest scores. USE WHEN YOU NEED A GROSS ESTIMATE
Why does the SD get smaller as the individuals in a group score more similarly on a test?
as individuals score more similarly, they are closer to the mean, so the deviations from the mean are smaller and the SD is smaller also
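A small illustration with two made-up groups that share a mean but differ in spread, using Python's standard `statistics` module (`stdev` divides by n-1).

```python
import statistics

group_a = [4, 5, 6]  # made-up scores, tightly clustered
group_b = [1, 5, 9]  # made-up scores, widely spread

for name, group in (("A", group_a), ("B", group_b)):
    print(name, statistics.mean(group), statistics.stdev(group))
# Both groups have a mean of 5, but the SD is 1.0 for A and 4.0 for B
```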
inclusive range
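r = h - l + 1 (the +1 counts both the highest and the lowest score as part of the range; without it one end point is left out)
Why n-1 as a denominator?
To make the sample SD slightly larger, so that it is a conservative estimate of the SD of the population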
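A short sketch comparing the exclusive range (h - l) with the inclusive range (h - l + 1) on made-up scores.

```python
scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]  # made-up scores

highest, lowest = max(scores), min(scores)
exclusive_range = highest - lowest       # 8 - 3 = 5
inclusive_range = highest - lowest + 1   # 8 - 3 + 1 = 6

print(exclusive_range, inclusive_range)  # 5 6
```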
Graphs
Graphs help examine how DIFFERENCES in measures of CENTRAL TENDENCY and those of VARIABILITY can RESULT in different looking distributions. Graphs are a VISUAL REPRESENTATION of a distribution of scores.
Tips: a graph should communicate only one idea
What is a frequency distribution?
It is a method of tallying and representing how often scores occur. In a FD, scores are grouped into class intervals (ranges of numbers).
steps to create a frequency distribution
- Order data sequentially
- Look at the Range of Values
- Decide how many intervals you want
- Divide the range by the number of intervals to get the width of each interval
- List intervals, largest to smallest.
- Start placing actual data points into the buckets.
- Calculate frequencies.
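The steps above can be sketched as a small function. It assumes whole-number scores and uses the inclusive range (h - l + 1) to set the interval width; the function name and data are made up for illustration.

```python
import math


def frequency_distribution(scores, n_intervals):
    """Return (start, end, frequency) rows, largest interval first. Assumes integer scores."""
    low, high = min(scores), max(scores)
    # Interval width: inclusive range divided by the number of intervals, rounded up
    width = math.ceil((high - low + 1) / n_intervals)
    rows = []
    for i in range(n_intervals):
        start = low + i * width
        end = start + width - 1
        # Tally the scores that fall into this interval
        frequency = sum(1 for s in scores if start <= s <= end)
        rows.append((start, end, frequency))
    return rows[::-1]  # list intervals from largest to smallest


scores = [1, 3, 4, 4, 6, 7, 8, 9, 10, 12, 12, 12]  # made-up scores
for start, end, frequency in frequency_distribution(scores, 3):
    print(f"{start}-{end}: {frequency}")
# 9-12: 5
# 5-8: 3
# 1-4: 4
```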
histogram
This is a VISUAL REPRESENTATION of the FD where frequencies are represented by bars
What do you need to create a histogram ?
i) Place values at equal distances on the x-axis, then identify their midpoint
ii) Draw a bar around each midpoint that represents the entire class interval to the height representing the frequency.
What is a frequency polygon?
A frequency polygon is a continuous line that represents the frequencies of scores within each class interval
What are cumulative frequencies and how do we create them?
A cumulative frequency distribution shows the CUMULATIVE NUMBER OF OCCURRENCES by class interval. It is created by adding the frequency in a class interval to all frequencies below it.
How frequency distributions differ
1) AVERAGE VALUE: the middle point in a distribution is only the average when the curve is a mirror image of itself
2) VARIABILITY
3) SKEWNESS: lack of symmetry, one tail of the distribution is longer than the other.
4) KURTOSIS: how flat or peaked a distribution appears.
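A minimal sketch of building cumulative frequencies with `itertools.accumulate`. The class intervals and their frequencies are hypothetical and are listed from the lowest interval up.

```python
from itertools import accumulate

# Hypothetical class intervals, lowest first, with their frequencies
intervals = ["1-4", "5-8", "9-12"]
frequencies = [4, 3, 5]

# Each cumulative frequency = this interval's frequency + all frequencies below it
cumulative = list(accumulate(frequencies))  # [4, 7, 12]

for label, f, cf in zip(intervals, frequencies, cumulative):
    print(f"{label:>5}  f={f}  cf={cf}")
```

The last cumulative frequency always equals the total number of scores.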
Types of Kurtosis
Two kinds
- PLATYKURTIC: distribution relatively FLAT compared to a normal bell-shaped distribution. They are more DISPERSE than those that are not.
- LEPTOKURTIC: distribution relatively PEAKED compared to a normal bell shaped distribution. They are LESS VARIABLE or disperse relative to others
Define correlation
It reflects how the VALUE OF ONE variable CHANGES WHEN THE VALUE of another variable changes. It reflects the dynamic quality of the relationship between two variables.
correlation values
Values range between -1 and +1
A correlation between two variables is called bivariate
Types of correlations
-Direct: or positive. When x increases in value y increases in value
When x decreases in value y decreases in value
-Indirect: or negative when x increases in value y decreases in value
When x decreases in value y increases in value
Absolute value
The absolute value REFLECTS THE STRENGTH of the correlation
correlation coefficient?
PEARSON PRODUCT MOMENT CORRELATION
(r) is any value between -1 &+1
Number that reflects the STRENGTH AND DIRECTION OF THE RELATIONSHIP between 2 variables
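A sketch of the Pearson correlation computed from deviations around each mean. The function name and the data are made up, and this is one of several equivalent ways to write the formula.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations, and sums of squared deviations
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfect direct relationship: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfect indirect relationship: -1.0
print(pearson_r([1, 2, 3], [1, 3, 2]))               # weaker direct relationship: 0.5
```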
correlation matrix?
a tool for organizing BIVARIATE correlations between a set of variables
coefficient of determination?
PERCENTAGE OF VARIANCE IN one variable that is accounted for by the variance in the other variable
what happens when we square r?
we find out how much of the VARIABILITY in one variable can be accounted for by the other variable
advantage of stronger correlation
THE STRONGER (LARGER) THE CORRELATION, the more shared variance = the more INFORMATION about PERFORMANCE on one score can be explained by the other score
are all correlations linear?
no, because not all relationships are linear
correlation versus causation
two variables can be correlated without a change in one causing the change in the other. e.g. when ice cream consumption increases, the crime rate increases, and both decrease together too, but the only thing they share is outside temperature.
What are the signs in a correlation for?
They show the DIRECTION of the relationship: + means the variables change in the same direction, - means they change in different directions
what does a perfect relationship mean?
if you know the value of one, you know the value of the other
What is the coefficient of alienation?
the amount of UNEXPLAINED VARIANCE between variables
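A small worked example with a hypothetical r of 0.70. It takes the definition on this card at face value, i.e. unexplained variance = 1 - r squared; note that some texts instead define this coefficient as the square root of that quantity.

```python
r = 0.70  # hypothetical correlation between two variables

coefficient_of_determination = r ** 2  # share of variance accounted for
unexplained_variance = 1 - r ** 2      # share of variance NOT accounted for

print(round(coefficient_of_determination, 2))  # 0.49 -> 49% shared variance
print(round(unexplained_variance, 2))          # 0.51 -> 51% unexplained
```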