Statistics Flashcards
Also called a categorical variable. Simple classification: we do not need to count to distinguish one item from another, and the categories are mutually exclusive.
Nominal
The only scale of measurement that is discrete-only.
Nominal
The only scale of measurement that stands alone as continuous, or has 0.5 as its smallest unit.
Ordinal
Cases are ranked or ordered. Represents position in a group where the order matters but not the difference between values.
Ordinal
It uses intervals equal in amount, where the difference between two values is meaningful.
Interval
Similar to interval but includes a true zero point and relative proportions on the scale make sense.
Ratio
Which among the scales of measurement are parametric and which are non-parametric?
P- Interval & ratio
NP- Nominal & Ordinal
What are the 4 scales of measurement?
Nominal
Ordinal
Interval
Ratio
Refers to the analysis of data of an entire population merely using numbers to describe a known data set.
Descriptive Statistics
Value in a group of values which is the most typical for the group, or the score around which all the other scores cluster. The average or midmost score.
Measures of Central Tendency
What are the measures of central tendency?
Mean
Median
Mode
The average/arithmetic mean: the sum of a set of measurements divided by the number of measurements in the set. Data is interval only.
Mean
Central value of a set such that half the observations fall above it and half fall below it. The middle score in the distribution. Uses ordinal and interval data.
Median
Modal value of a set; the most frequently occurring value. For grouped data, it is the midpoint of the class interval with the largest frequency. Uses nominal, ordinal, and interval data.
Mode
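The three measures of central tendency above can be sketched with Python's built-in statistics module (a minimal illustration; the scores are hypothetical):

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 7, 8]  # hypothetical test scores

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # middle score: half fall above, half below
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)  # median is 7, mode is 8
```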
Measures of how much or how little the rest of the values tend to vary around the central or typical value. Variation or error.
Measures of variability/Dispersion
What are the measures of variability/dispersion?
Standard deviation
Variance
Range
What level of data do all measures of variability/dispersion use?
Interval (some books include ratio)
Square root of the variance. Shows the spread of the measurements.
Standard deviation
(SD)²
Variance
Simplest measure of variation. Difference between the largest and smallest measurement.
Range
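The three measures of variability above, sketched with Python's statistics module on hypothetical data:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

sd = statistics.pstdev(scores)           # population standard deviation
variance = statistics.pvariance(scores)  # variance = (SD)^2
rng = max(scores) - min(scores)          # range: largest minus smallest measurement

print(sd, variance, rng)  # 2.0 4 7
```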
Used to describe the position of a particular observation in relation to the rest of the data set.
Measures of Location
In measures of location, the pth percentile of a data set is a value such that at least p percent of the observations take on this value or less and at least _ percent of the observations take on this value or more.
100-p
What are the measures of location?
Percentiles
Quartiles
Deciles
Frequency Distribution
Percentage of the total number of observations that are less than the given value. Identifies the point below which a specific percentage of the cases fall.
Percentiles
The data can be divided into 4 parts instead of two. This is what you call the cut points.
Quartiles
The data can be divided into 10 parts instead of two or four. This is what you call the cut points.
Deciles
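Percentiles, quartiles, and deciles can all be computed as cut points with statistics.quantiles (a sketch on hypothetical scores 1 through 100; note that different interpolation methods give slightly different cut points):

```python
import statistics

data = list(range(1, 101))  # hypothetical scores 1..100

quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)        # Q2 equals the median
print(percentiles[89])  # the 90th percentile
```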
A classification of data that may help in understanding important features of the data; it may be graphically presented in the form of a histogram, polygon, etc.
Frequency Distribution
This measure of location presents 2 elements:
Set of categories that make up the original measurement scale.
A record of the frequency, or number of individuals in each category.
Frequency Distribution
All measures of location use ordinal, interval, and ratio level of data except _ which uses all levels of data.
Frequency Distribution
Measurement of the extent to which pairs of related values on 2 variables tend to change together; gives measure of the extent to which one variable can be predicted from values on the other variable.
Measures of correlation.
If one variable increases with the other, the correlation is positive (near _). If the relationship is inverse, it is a negative correlation (near _). A lack of correlation is signified by a value close to _.
+1
-1
0
What are the measures of correlation?
Pearson’s Product moment correlation
Spearman’s Rho Rank-order
Kendall’s Coefficient of Concordance W
Point-Biserial Coefficient rpb
Phi or Fourfold Coefficient
Lambda
A measure of correlation for 2 groups, using the interval level of data. Data must be in the form of related pairs of scores. The higher the r, the higher the correlation.
Pearson’s Product Moment Correlation (r)
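A minimal hand computation of Pearson's r from its defining formula (hypothetical related pairs of interval-level scores; the function name is illustrative):

```python
import math

# Hypothetical related pairs of interval-level scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # y rises perfectly with x, so r should be close to +1

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))   # co-deviation
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

print(pearson_r(x, y))  # ~1.0
```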
A measure of correlation for 2 groups, using the ordinal level of data. Data must be in the form of related pairs of scores; used for fewer than 30 pairs. Easy to calculate but non-parametric.
Spearman’s Rho Rank-order
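Spearman's rho can be computed from the rank-difference formula rho = 1 − 6·Σd² / (n(n² − 1)) (hypothetical rankings from two raters, no ties):

```python
# Spearman's rho via the rank-difference formula (valid when there are no ties).
# Hypothetical rankings of 5 items by two raters.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 4, 3, 5]

n = len(rater_a)
d_sq = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b))  # sum of squared rank differences
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(rho)  # 0.8
```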
A measure of correlation for ≥ 3 groups, using the ordinal level of data. Data must be ≥ 3 sets of ranks. Easy to calculate but non-parametric.
Kendall’s Coefficient Concordance W
A measure of correlation for 2 groups, using one continuous variable and one dichotomous nominal variable.
Point-Biserial Coefficient rpb
A measure of correlation for 2 groups, using 2 dichotomous nominal variables.
Phi or Fourfold Coefficient
A measure of correlation for ≥ 2 groups, using nominal (dependent/independent) levels of data. It is also known as Guttman’s Coefficient of Predictability. Gives an indication of the reduction of errors made in a prediction scheme.
Lambda
A non-parametric measure of the agreement between two rankings.
Tau Coefficient
Tests for statistical dependence.
Kendall’s Tau Coefficient
An index of interrater reliability of ordinal data.
Coefficient of Concordance (W)
Methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population.
Inferential statistics
What are the inferential statistics tests?
Z-test of one sample mean
T-test
Variation of t-test
Independent samples
Dependent samples
Proportions/Percentages
Variances
2 correlation coefficients
What level of data do all tests for inferential statistics use?
Interval
A measure for inferential statistics for 1 group.
N ≥ 30; used to test whether a population parameter is significantly different from some hypothesized value.
Z-test of one sample mean
A measure for inferential statistics when n < 30.
T-test
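A sketch of the one-sample t statistic computed by hand (the sample values and hypothesized mean are made up; the result would be compared against a t table, which is not shown here):

```python
import math
import statistics

# Hypothetical small sample (n < 30) and hypothesized population mean.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
mu0 = 12.0

n = len(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
t = (statistics.mean(sample) - mu0) / se
print(t)  # compare against the critical t value with n - 1 degrees of freedom
```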
This kind of t-test is for 2 groups. It assesses whether the means of 2 groups are statistically different from each other.
Independent samples
This kind of t-test is for 1 group. It is used when the subjects making up the 2 samples are matched on some variable before being put into the 2 groups, or when the 2 groups are the same subjects administered a pretest and posttest.
Dependent samples
This kind of t-test is for 1 group. It is used to test the hypothesis that an observed proportion is equal to a pre-specified proportion.
Proportions/Percentages
This kind of t-test uses the F-test, for equal and unequal variances.
Variances
This kind of t-test is for 2 groups. It is used to assess the significance of the difference between two correlation coefficients found in 2 independent samples.
2 correlation coefficients
It is used for problems of predicting one variable from a knowledge of another or possibly several other variables. It is always the regression of the predicted value on the known variable.
Regression Equation
What are the regression equations?
Linear regression of y on x
Linear regression of x on y
Standard error of estimate (SEE)
Standard deviation of errors of prediction. An indication of the variability about the regression line in the population wherein predictions are being made.
Standard error of estimate (SEE)
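A sketch of least-squares linear regression of y on x together with the standard error of estimate (hypothetical paired data; standard slope/intercept formulas, not a method prescribed by the deck):

```python
import math

# Hypothetical related pairs; least-squares regression of y on x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

# SEE: standard deviation of the errors of prediction about the regression line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
print(slope, intercept, see)
```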
Between ANOVA and t-tests, which organizes and directs the analysis and has easier interpretation of the results?
ANOVA
Performing repeated t-tests increases the probability of _?
Type I error
ANOVA needs to be followed by what test?
Post hoc test
What does the post hoc test determine?
Which groups differ from each other.
We should not conduct a post hoc test unless the null hypothesis is _?
Rejected.
A test designed for a situation with equal sample size per group, but can be adapted to unequal sample sizes as well.
Tukey’s (Honestly Significant Difference or HSD) Test
Descriptive measure of the utility of the regression equation for making predictions; the square of the linear correlation coefficient.
Coefficient of Determination
In determining the coefficient of determination, the nearer the value is to _, the more useful the regression equation is in making predictions.
1
Used to test the significance of the differences among means obtained from independent samples (a parametric test), where > 2 conditions are used or even when several independent variables are involved.
Analysis of Variance (ANOVA)
What are the types of ANOVA?
One-Way ANOVA
Two-Way ANOVA
Three-Way ANOVA
Tests used if 2 or more samples were drawn from the same population by comparing means, or if data from several groups have a common mean. There is 1 IV and 1 DV, and an interval level of data.
One-Way ANOVA
It tests the hypothesis that the means of 2 variables (factors) from 2 or more groups (2 IV, 1 DV) are equal (drawn from populations with the same mean).
Two-way ANOVA
It has a similar purpose to the other kinds of ANOVA, except that the groups here have 3 categories of defining characteristics. It must have 3 IV and 1 DV.
Three-Way ANOVA
It corrects alpha not just for all pair-wise or simple comparisons of means, but also for complex comparisons (contrast of more than 2 at a time) of means.
Scheffe’s Test
The most popular of the post hoc procedures; the most flexible and most conservative, but the least statistically powerful procedure.
Scheffe’s Test
A versatile formula; data must be presented in frequencies. It is categorized as a non-parametric test but can also be used as a parametric test.
Chi-square
What are the 2 chi-square tests?
Goodness of Fit
Independence
Also called one-sample or one-variable chi-square. Involves 1 variable of ≥ 2 categories. It compares the distribution of measures for deviation from a hypothesized distribution. Nominal level of data.
Goodness of Fit
A chi-square test that involves 2 variables consisting of ≥ 2 categories. It determines whether the 2 variables are related or not. Reveals only the relationship but not the magnitude of the relationship.
Independence
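The goodness-of-fit computation can be sketched directly from its definition, comparing observed frequencies against a hypothesized distribution (the die-roll counts are made up):

```python
# Chi-square goodness of fit: one variable, observed vs. expected frequencies.
# Hypothetical 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # compare against the chi-square critical value with 5 df
```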
Parametric Test or Non-Parametric test?
Random selection of subjects from a normal population with equal variances.
Parametric Tests
Parametric Test or Non-Parametric test?
Whether the groups or samples to be compared are independent samples or correlated.
Both
Parametric Test or Non-Parametric test?
Whether the number of groups to be compared is ≥ 2.
Non-parametric Test
Parametric Test or Non-Parametric test?
More power, higher power efficiency.
Parametric Test
Parametric Test or Non-Parametric test?
Simple and easier to calculate.
Non-parametric Test
Parametric Test or Non-Parametric test?
No need to meet data requirements at all.
Non-parametric Test
What are some non-parametric tests?
Median test
Fisher’s Sign Test
Wilcoxon Rank Sum Test
Mann-Whitney (U) Test
Wilcoxon Signed Ranks Test (T)
Kruskal-Wallis H Test
Friedman Rank
Non-parametric test that compares the medians of 2 independent (uncorrelated) samples. Only considers the number of cases above and below the median. Presented as ordinal data.
Median Test
Non-parametric test that compares 2 correlated samples by obtaining the differences between each pair of observations. Considers the signs of the differences between paired observations rather than their sizes.
Fisher’s Sign Test
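The sign-test idea on the card above can be sketched as counting only the signs of paired differences (hypothetical pretest/posttest scores; the full test would then compare the smaller count against a binomial table, not shown here):

```python
# Sign-test idea: keep only the signs of the paired differences, not their sizes.
# Hypothetical pretest/posttest scores for the same six subjects.
pre = [10, 12, 9, 15, 11, 13]
post = [12, 15, 9, 18, 10, 16]

diffs = [b - a for a, b in zip(pre, post)]
plus = sum(1 for d in diffs if d > 0)   # improvements
minus = sum(1 for d in diffs if d < 0)  # declines; zero differences are dropped
print(plus, minus)  # 4 1
```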
What level of data do non-parametric tests have?
Ordinal
Non-parametric test that is used for comparing 2 independent samples using rank data.
Wilcoxon Rank Sum Test
Non-parametric test that is used with independently drawn random samples, the sizes of which need not be the same.
Mann-Whitney (U) Test
Non-parametric test that is used for correlated samples; the difference, d, between each pair is calculated (data subjected to computation).
Wilcoxon Signed Ranks Test (T)
Non-parametric test that is used to test whether or not a group of independent samples is from the same or different populations. Compares 3 or more independent samples with respect to an ordinal variable.
Kruskal-Wallis H Test
Non-parametric test that is used to test whether or not the data are from the same sample under 3 different conditions.
Friedman Rank
The act of assigning numbers or symbols to characteristics of things according to rules.
Measurement
A set of numbers or symbols whose properties model empirical properties of the objects to which the numbers are assigned.
Scale
Permit classification and rank ordering on some characteristic. Have no absolute zero point.
Ordinal scales
A set of test scores arrayed for recording or study.
Distribution
A straightforward, unmodified accounting of performance that is usually numerical.
Raw score
All scores are listed alongside the number of times each score occurred.
Simple Frequency Distribution
Class intervals replace the actual test scores.
Grouped frequency distribution
A graph with vertical lines drawn at the true limits of each test score or class interval forming a series of contiguous rectangles.
Histogram
Expressed by continuous line connecting the points where the test scores or class intervals (X axis) meet frequencies (Y axis).
Frequency Polygon
If the distribution is normal, the mean is the most appropriate measure of central tendency for what level of data?
Interval or ratio
There are two scores that occur with the highest frequency. It is theoretically possible for this distribution to have two modes, which may fall at the high or low end of the distribution.
Bimodal distribution
An indication of how scores in a distribution are scattered or dispersed.
Variability
Statistics that describe the amount of variation in a distribution.
Measures of variability
It is a measure of variability equal to the difference between Q3 and Q1.
Interquartile range
It is equal to the interquartile range divided by 2.
Semi-interquartile range
Quarter refers to an _.
Interval
The dividing points between the 4 quarters in the distribution. It refers to a specific point.
Quartiles
Q2 and the _ are exactly the same.
Median
In a perfectly symmetrical distribution, Q1 and Q3 will be exactly the same distance from the _.
Median
To obtain this, all the deviation scores (taken as absolute values) are summed and divided by the total number of scores.
Average deviation
A measure of variability equal to the square root of the variance.
Standard deviation
It is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
Variance
It is the nature and extent to which symmetry is absent.
Skewness
Relatively few of the scores fall at the high end of the distribution.
Positive skew
Relatively few of the scores fall at the low end of the distribution.
Negative skew
Refers to the steepness of a distribution in its center.
Kurtosis
What are the 3 general types of curves and what do they mean?
Platykurtic- relatively flat
Leptokurtic- relatively peaked
Mesokurtic- somewhere in the middle
Distributions that have _ kurtosis have a high peak and fatter tails compared to a normal distribution.
High
Distributions with _ kurtosis values have a rounded peak and thinner tails.
Lower
The development of the concept of a normal curve began in the middle of the 18th century with the work of _ and later the Marquis de Laplace.
Abraham de Moivre
Through the early 19th century, scientists referred to the Normal curve as the _.
Laplace-Gaussian curve
He was credited with being the first to refer to the curve as the normal curve.
Karl Pearson
The distribution of the normal curve ranges from _ to _.
Negative infinity to positive infinity.
The normal curve is perfectly symmetrical with _ skewness.
No skewness
A raw score that has been converted from one scale to another scale where the latter scale has some arbitrarily set mean and standard deviation.
Standard scores
These scores are more easily interpretable than raw scores.
Standard scores
It results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
Z scores or zero plus or minus one scale
The T score was devised by W. A. McCall, who named it in honor of his professor _.
E. L. Thorndike
This standard score system is composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean. Mean = 50; SD = 10.
T scores or fifty plus or minus ten scale
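The z-score and T-score conversions above can be sketched in a few lines (hypothetical raw scores; T = 50 + 10z follows directly from the card's mean and SD):

```python
import statistics

# Hypothetical raw-score distribution.
scores = [50, 60, 70, 80, 90]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

raw = 80
z = (raw - mean) / sd  # standard deviation units above/below the mean
t_score = 50 + 10 * z  # T scale: mean 50, SD 10
print(z, t_score)
```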
A standard score which has a mean of 5 and an SD of 2.
Stanine
The 5th stanine indicates performance in the _.
Average range
It is an expression of the degree and direction of correspondence between two things.
Correlation
A number that provides us with the index of the strength of the relationship between two things.
Coefficient of Correlation (r)
The meaning of correlation coefficient is interpreted by its _ and _.
Sign and magnitude
The two ways to describe a perfect correlation between two variables are as either _ or _.
+1
-1
Magnitude is a number anywhere between _ and _.
+1
-1
He devised the Pearson r.
Karl Pearson
Can be the statistical tool of choice when the relationship between the variables is linear and when the two variables being correlated are continuous.
Pearson r/ Pearson Correlation Coefficient/ Pearson Product-moment Coefficient of Correlation
The Spearman Rho was developed by _.
Charles Spearman
A measure of correlation that is frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal form.
Spearman Rho
A simple graphic of the coordinate points for the values of the X-variable (horizontal axis) and the Y-variable (vertical axis). They are useful because they provide a quick indication of the direction and magnitude of the relationship between the 2 variables and also reveal the presence of curvilinearity.
Scatterplot
An eyeball gauge of how curved a graph is.
Curvilinearity
An extremely atypical point located at a relatively long distance from the rest of the coordinate points in a scatter plot.
Outlier
A statistic useful in describing sources of test score variability.
Variance
Variance from true differences.
True variance
Variance from irrelevant random sources.
Error variance
The greater the proportion of the total variance that is true variance, the more _ the test.
Reliable
Refers to all the factors associated with the process of measuring some variable other than the variable being measured.
Measurement error
A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Random Error
A source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic error
Give a source of variance during test construction.
Item sampling/Content sampling
A validity coefficient that is used when correlating self rankings of performance.
Spearman Rho Rank-order Correlation