STATISTICS Flashcards
Branch of Mathematics that focuses on the organization, analysis and interpretation of a group of numbers
Statistics
Used to summarize and describe a group of numbers from a research study
Descriptive statistics
Used to draw conclusions and to make inferences that are based on the numbers from a research study but that go beyond the numbers
Inferential statistics
The entire set of the individuals of interest for a particular research question
Population
Set of individuals selected from a population, usually intended to represent the population in a research study
Sample
Characteristic or condition that can have different values
Variable
Internal characteristic that cannot be directly observed
Construct
Possible number or category that a variable can have
Value
A particular person's value on a variable (datum)
Score
Collection of measurements or observations; the complete set of scores
Data/data set
Value, usually a numerical value, that describes a population
Parameter
Value, usually a numerical value, that describes a sample
Statistic
Naturally occurring discrepancy that exists between a sample statistic and the corresponding population parameter
Sampling error
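Sampling error can be seen by comparing a sample mean with the mean of the population it was drawn from. The sketch below is only an illustration: the population is made-up data generated with a fixed seed, and the names `population`, `sample`, and `sampling_error` are hypothetical.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population of 1,000 scores (made-up data, not from a real study)
population = [random.gauss(100, 15) for _ in range(1000)]
mu = mean(population)        # population parameter

# Draw a sample of 25 individuals without replacement
sample = random.sample(population, 25)
M = mean(sample)             # sample statistic

# Discrepancy between the sample statistic and the population parameter
sampling_error = M - mu
print(f"mu = {mu:.2f}, M = {M:.2f}, sampling error = {sampling_error:.2f}")
```

Drawing a different sample (for example, by changing the seed) will generally give a different sampling error.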
The values are names or categories; the scores are not numerical
Nominal scale
The numbers stand only for relative ranking (rank-ordered variable); has magnitude but no equal intervals
Ordinal scale
Measures magnitude with equal intervals between the values, but has no true zero
Interval scale
Measures magnitude with equal intervals between the values and has a true (absolute) zero
Ratio scale
One that has specific values and cannot have values between those specific values
Discrete variable
One that has an infinite number of possible values between any two values
Continuous variable
Naturally formed (male/female)
True dichotomous
Reflects an underlying continuous scale forced into a dichotomy (passed/failed)
Artificial dichotomous
Income (low, middle, high)
Ordinal scale
Likert scale (strongly disagree to strongly agree)
Ordinal scale
Brand preferences
Ordinal scale
Degree of agreement
Ordinal scale
IQ scores
Interval scale
Calendar years
Interval scale
Time of day
Interval scale
Standardized test scores
Interval scale
Measuring income as a range (0-999; 1000-1999)
Interval scale
Dates
Interval scale
Grade levels in a school (1st grader, 2nd grader, 3rd grader)
Ordinal scale (grade levels are ranks; the intervals between them are not guaranteed to be equal)
Weight measured in kilograms, milligrams, ounces, or pounds
Ratio scale
Number of children in a family
Ratio (discrete)
Number of dogs that belong to one owner
Ratio (discrete)
Number of stores run by the same owner
Ratio (discrete)
Human height
Ratio (continuous)
Weight of a person (e.g., 55 kg)
Ratio (continuous)
Time spent on the task
Ratio (continuous)
Temperature (Celsius or Fahrenheit)
Interval scale (only the Kelvin scale, which has an absolute zero, is a ratio scale)
An organized tabulation of the number of individuals located in each category on the scale of measurement
Frequency distribution
Ordered listing of the number of individuals (subjects, respondents) having each of the different values for a particular variable
Frequency table
Measures the fraction of the total group that is associated with each score
Proportion
Proportion formula
$p = \frac{f}{n}$
An amount of something, often expressed as a number out of 100
Percentage
Percentage formula
$\text{percentage} = p \times 100 = \frac{f}{n} \times 100$
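The frequency table, proportion, and percentage cards can be tied together in one short sketch. The scores are made up for illustration, and `table` is a hypothetical name; each row holds a value, its frequency f, its proportion f/n, and its percentage.

```python
from collections import Counter

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical scores
n = len(scores)
freq = Counter(scores)                    # frequency f of each value

table = []
print("value   f      p       %")
for value in sorted(freq):
    f = freq[value]
    p = f / n                 # proportion: fraction of the total group
    pct = p * 100             # percentage: proportion out of 100
    table.append((value, f, p, pct))
    print(f"{value:>5} {f:>3} {p:>6.2f} {pct:>6.1f}%")
```

The frequencies should add up to n and the proportions to 1, which is a quick check on any hand-built table.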
Range of values in a grouped frequency table that are grouped together
Interval
Barlike graph of a frequency distribution in which the values are plotted along the horizontal axis and the height of each bar is the frequency of the value; the bars are usually placed next to each other without spaces, giving the appearance of a city skyline
Histogram
Continuous line that represents the frequency of scores within each class interval, based on a histogram; used for continuous data
Frequency polygon
A data visualization where each category is represented by a rectangle, with the height of the rectangle being proportional to the values being plotted
Column chart
Identical to column charts, but in this chart, categories are organized vertically on the y-axis and values are shown horizontally on the x-axis
Bar graph
Also called a line plot or line chart; a graph that uses lines to connect individual points that display quantitative values over a specified time interval
Line graph
A statistical measure that attempts to determine the single value, usually located in the center of a distribution, that is most typical or most representative of the entire set of scores
Central tendency
Sum of all the scores in the distribution divided by the number of scores
Mean
Mean formula
$M = \frac{\sum X}{n}$
An average in which each observation in the data set is multiplied by an assigned weight before summing to a single average value
Weighted mean
Weighted mean formula
$M_w = \frac{\sum wX}{\sum w}$
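A minimal sketch of the mean and weighted mean formulas, assuming plain Python lists. The function names and the grades/units data are hypothetical, chosen only to show how weights shift the average.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def weighted_mean(scores, weights):
    """Sum of (weight * score) divided by the sum of the weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * x for x, w in zip(scores, weights)) / sum(weights)

grades = [80, 90, 70]   # hypothetical course grades
units = [3, 2, 1]       # hypothetical credit units used as weights

print(mean(grades))                  # ordinary mean: every grade counts equally
print(weighted_mean(grades, units))  # grades with more units count more
```

With equal weights the weighted mean reduces to the ordinary mean.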
The middle score when all the scores in a distribution are arranged from lowest to highest
Median
Score or category that has the greatest frequency
Mode
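The median and mode can be checked with Python's standard `statistics` module. This is a small sketch on made-up scores; `multimode` requires Python 3.8 or later and returns every value tied for the greatest frequency.

```python
from statistics import median, multimode

odd_scores = [7, 3, 9, 3, 5, 8, 2]   # hypothetical scores, odd count
even_scores = [4, 1, 3, 2]           # hypothetical scores, even count

# median() sorts the scores internally and returns the middle one;
# with an even count it averages the two middle scores.
mdn_odd = median(odd_scores)
mdn_even = median(even_scores)

# multimode() returns a list, so a bimodal set yields two values.
modes_one = multimode(odd_scores)
modes_two = multimode([1, 1, 2, 2, 3])

print(mdn_odd, mdn_even, modes_one, modes_two)
```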
Frequency distribution with one value clearly having a larger frequency than any other; has only one high point
Unimodal distribution
Frequency distribution with two approximately equal frequencies, each clearly larger than any others; has two roughly equal high points
Bimodal distribution
Distribution with two or more high points
Multimodal distribution
Frequency distribution in which all values have approximately the same frequency
Rectangular distribution
The pattern of frequencies on the left and right side are mirror images of each other
Symmetrical distribution
Lack of symmetry
Skewness
Majority of the scores are at the low end of the distribution
Floor effect (positively skewed)
Majority of the scores are at the high end of the distribution
Ceiling effect (negatively skewed)
Extent to which a frequency distribution deviates from a normal curve in terms of whether its curve in the middle is more peaked or flatter than the normal curve
Kurtosis
Scores are concentrated towards the mean
Leptokurtic
Normal curve
Mesokurtic
The scores are widely spread out from the mean, so the curve is flatter than the normal curve
Platykurtic
Frequency table in which the number of individuals is given for each interval of values
Grouped frequency table
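A grouped frequency table can be built by assigning each score to an interval of fixed width. The sketch below assumes non-negative integer scores and intervals that start at multiples of the width; the function name `grouped_frequency` and the data are hypothetical.

```python
from collections import Counter

def grouped_frequency(scores, width):
    """Return [((low, high), frequency), ...] for intervals of the given width.

    Assumes integer scores; intervals with no scores are omitted.
    """
    # Lower limit of the interval each score falls into
    counts = Counter((x // width) * width for x in scores)
    return [((low, low + width - 1), counts[low]) for low in sorted(counts)]

scores = [3, 7, 12, 15, 18, 21, 24, 29, 29, 34]   # hypothetical scores
table = grouped_frequency(scores, 10)

for (low, high), f in table:
    print(f"{low:>2}-{high:<2}  {f}")
```

A wider interval gives a shorter table but hides more detail about individual values.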
The best type of scale to use depends on:
- The nature of the variable
- How much measurement precision you desire
Two elements of a frequency distribution
- Set of categories that make up the original measurement scale
- Record of frequency, or number of individuals in each category
Provides a quantitative measure of the differences between the scores in a distribution
Variability
It describes the degree to which the scores are spread out or clustered together
Variability
Measures of variability include ___, ___, ___, and ___
Range
Variance
Standard deviation
Median absolute deviation
Distance covered by the scores in a distribution, from the lowest to the highest score
Range
Range formula
$R = X_{\max} - X_{\min}$
The average of each score's squared difference from the mean
Variance
Variance formula
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ (population variance, where $\mu$ is the population mean and $N$ is the number of values)
Uses the mean of the distribution as a reference point and measures variability by considering the distance between each score and the mean
Standard deviation
Standard deviation formula
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ (population standard deviation)
Variations of the Formula
• population variance
• population standard deviation
$\sigma^2 = \frac{SS}{N}$
$\sigma = \sqrt{\frac{SS}{N}}$
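The population variance and standard deviation can be computed from the sum of squares (SS) in a few lines. This is a sketch on made-up scores; the function names are hypothetical, and the whole list is treated as the population.

```python
import math

def sum_of_squares(scores):
    """SS: sum of squared deviations of each score from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def population_variance(scores):
    """SS divided by N, the number of scores in the population."""
    return sum_of_squares(scores) / len(scores)

def population_sd(scores):
    """Square root of the population variance."""
    return math.sqrt(population_variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical population, mean = 5
print(sum_of_squares(scores))        # SS
print(population_variance(scores))   # SS / N
print(population_sd(scores))         # sqrt(SS / N)
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.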
Easier to use for figuring by hand, but it does not directly show the meaning of the procedure
Computational formula
More time-consuming, but it directly shows the meaning of the procedure
Definitional formula
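To compare the two approaches, the sketch below computes the sum of squares both ways. The computational form used here, SS = ΣX² − (ΣX)²/N, is the standard textbook version and is an assumption about which formula the card refers to; the function names and data are hypothetical.

```python
def ss_definitional(scores):
    """Sum of squared deviations from the mean (shows the meaning directly)."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def ss_computational(scores):
    """Sum of squared scores minus (sum of scores)^2 / N (easier by hand)."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - (sum(scores) ** 2) / n

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
print(ss_definitional(scores), ss_computational(scores))
```

Both functions should give the same SS apart from floating-point rounding.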
A statistic that consistently overestimates or underestimates the corresponding population parameter
Biased statistic
Formula for samples
• sum of squares for sample
• sample variance
• sample standard deviation
$SS = \sum (X - M)^2$
$s^2 = \frac{SS}{n - 1}$
$s = \sqrt{\frac{SS}{n - 1}}$
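The sample formulas differ from the population formulas only in dividing by n − 1 instead of N. A minimal sketch, with hypothetical function names and made-up scores treated as a sample:

```python
import math

def sample_variance(scores):
    """SS divided by n - 1, where SS uses the sample mean M."""
    n = len(scores)
    M = sum(scores) / n
    ss = sum((x - M) ** 2 for x in scores)
    return ss / (n - 1)

def sample_sd(scores):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(scores))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, SS = 32, n = 8
print(sample_variance(sample))      # 32 / 7
print(sample_sd(sample))
```

Dividing by n − 1 makes the result slightly larger than dividing by n, which corrects the tendency of SS/n to underestimate the population variance.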
A robust measure of how spread out the data are; useful when the assumptions of the standard deviation are not met
Median absolute deviation
Median absolute deviation (MAD) formula
$\text{MAD} = \operatorname{Mdn}\left(\lvert X - \operatorname{Mdn}(X) \rvert\right)$
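The MAD formula can be followed step by step: find the median, take each score's absolute deviation from it, then take the median of those deviations. The sketch below uses made-up data with one extreme outlier; the function name is hypothetical.

```python
from statistics import median, pstdev

def median_absolute_deviation(scores):
    """Median of the absolute deviations from the median."""
    mdn = median(scores)
    return median(abs(x - mdn) for x in scores)

data = [1, 2, 3, 4, 100]   # hypothetical scores; 100 is an outlier

mad = median_absolute_deviation(data)
sd = pstdev(data)

# The outlier inflates the standard deviation but barely affects the MAD.
print(f"MAD = {mad}, SD = {sd:.2f}")
```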
When to use the mean?
Approximately normally distributed data
Continuous data
No significant outliers
When to use median?
With a rank-ordered variable
Non-normal or skewed distribution
When a distribution has one or more outliers
It is the number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution
Z score
A ___ is an ordinary score as opposed to a Z score
Raw score
The mean of Z scores is always equal to ___
0
The standard deviation of Z scores is always equal to __
1
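The two Z score facts above (mean of 0, standard deviation of 1) can be verified numerically. This sketch converts made-up raw scores to Z scores using the population mean and standard deviation; `z_scores` is a hypothetical helper name.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to Z scores: (X - mean) / standard deviation."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)   # population standard deviation
    return [(x - mu) / sigma for x in raw_scores]

raw = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores (mean 5, SD 2)
z = z_scores(raw)

print(z)
print(mean(z), pstdev(z))        # mean and SD of the Z scores
```

A raw score of 9 is two standard deviations above the mean here, so its Z score is 2.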
It is a specific, mathematically defined, bell-shaped frequency distribution that is symmetrical and unimodal
The normal curve
The normal curve is also called the ___
Gaussian distribution
A ___ distribution is a frequency distribution that follows a normal curve
Normal
The Gaussian distribution is named after ____; however, the original concept comes from ____
Carl Friedrich Gauss
Abraham de Moivre
____ refers to how spread out a data set is about the mean
Dispersion
What does statistics focus on? (3)
Organization
Analysis
Interpretation