Statistics Flashcards
Also called a categorical variable. Simple classification: we do not need to count to distinguish one item from another, and the categories are mutually exclusive.
Nominal
The only scale of measurement that is discrete-only.
Nominal
The only scale of measurement that stands alone as continuous, or has 0.5 as its smallest unit.
Ordinal
Cases are ranked or ordered. Represents position in a group where the order matters but not the difference between values.
Ordinal
It uses intervals equal in amount, where the difference between two values is meaningful.
Interval
Similar to interval but includes a true zero point and relative proportions on the scale make sense.
Ratio
Which among the scales of measurement are parametric and which are non-parametric?
P- Interval & ratio
NP- Nominal & Ordinal
What are the 4 scales of measurement?
Nominal
Ordinal
Interval
Ratio
Refers to the analysis of data of an entire population merely using numbers to describe a known data set.
Descriptive Statistics
Value in a group of values which is the most typical for the group, or the score around which all the other scores cluster. The average or midmost score.
Measures of Central Tendency
What are the measures of central tendency?
Mean
Median
Mode
The average/arithmetic mean: the sum of a set of measurements divided by the number of measurements in the set. Data is interval only.
Mean
Central value of a set such that half the observations fall above it and half fall below it. The middle score in the distribution. Uses ordinal and interval data.
Median
Modal value of a set; the most frequently occurring value. For grouped data, it is the midpoint of the class interval with the largest frequency. Uses nominal, ordinal, and interval data.
Mode
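The three measures of central tendency above can be sketched with Python's built-in statistics module (a minimal illustration; the scores are hypothetical):

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 7, 8]  # hypothetical test scores

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # middle score: half fall above, half below
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)  # median is 7, mode is 8
```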
Measures of how much or how little the rest of the values tend to vary around the central or typical value. Variation or error.
Measures of variability/Dispersion
What are the measures of variability/dispersion?
Standard deviation
Variance
Range
What level of data do all measures of variability/dispersion use?
Interval (some books include ratio)
Square root of the variance. Shows the spread of the measurements.
Standard deviation
(SD)²
Variance
Simplest measure of variation. Difference between the largest and smallest measurement.
Range
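The three measures of variability above, sketched with Python's statistics module on hypothetical data:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

sd = statistics.pstdev(scores)           # population standard deviation
variance = statistics.pvariance(scores)  # variance = (SD)^2
rng = max(scores) - min(scores)          # range: largest minus smallest measurement

print(sd, variance, rng)  # 2.0 4 7
```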
Used to describe the position of a particular observation in relation to the rest of the data set.
Measures of Location
In measures of location, the pth percentile of a data set is a value such that at least p percent of the observations take on this value or less and at least _ percent of the observations take on this value or more.
100-p
What are the measures of location?
Percentiles
Quartiles
Deciles
Frequency Distribution
Percentage of the total number of observations that are less than the given value. Identifies the point below which a specific percentage of the cases fall.
Percentiles
The data can be divided into 4 parts instead of two. This is what you call the cut points.
Quartiles
The data can be divided into 10 parts instead of two or four. This is what you call the cut points.
Deciles
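Percentiles, quartiles, and deciles can all be computed as cut points with statistics.quantiles (a sketch on hypothetical scores 1 through 100; note that different interpolation methods give slightly different cut points):

```python
import statistics

data = list(range(1, 101))  # hypothetical scores 1..100

quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)        # Q2 equals the median
print(percentiles[89])  # the 90th percentile
```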
A classification of data that may help in understanding important features of the data; it may be graphically presented in the form of a histogram, polygon, etc.
Frequency Distribution
This measure of location presents 2 elements:
Set of categories that make up the original measurement scale.
A record of the frequency, or number of individuals in each category.
Frequency Distribution
All measures of location use ordinal, interval, and ratio level of data except _ which uses all levels of data.
Frequency Distribution
Measurement of the extent to which pairs of related values on 2 variables tend to change together; gives measure of the extent to which one variable can be predicted from values on the other variable.
Measures of correlation.
If one variable increases with the other, the correlation is positive (near _). If the relationship is inverse, it is a negative correlation (near _). A lack of correlation is signified by a value close to _.
+1
-1
0
What are the measures of correlation?
Pearson’s Product moment correlation
Spearman’s Rho Rank-order
Kendall’s Coefficient of Concordance W
Point-Biserial Coefficient rpb
Phi or Fourfold Coefficient
Lambda
A measure of correlation for 2 groups, using the interval level of data. Data must be in the form of related pairs of scores. The higher the r, the higher the correlation.
Pearson’s Product Moment Correlation (r)
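A minimal hand computation of Pearson's r from its defining formula (hypothetical related pairs of interval-level scores; the function name is illustrative):

```python
import math

# Hypothetical related pairs of interval-level scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # y rises perfectly with x, so r should be close to +1

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))   # co-deviation
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

print(pearson_r(x, y))  # ~1.0
```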
A measure of correlation for 2 groups, using the ordinal level of data. Data must be in the form of related pairs of scores; used for fewer than 30 pairs. Easy to calculate but non-parametric.
Spearman’s Rho Rank-order
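Spearman's rho can be computed from the rank-difference formula rho = 1 − 6·Σd² / (n(n² − 1)) (hypothetical rankings from two raters, no ties):

```python
# Spearman's rho via the rank-difference formula (valid when there are no ties).
# Hypothetical rankings of 5 items by two raters.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 4, 3, 5]

n = len(rater_a)
d_sq = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b))  # sum of squared rank differences
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(rho)  # 0.8
```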
A measure of correlation for ≥ 3 groups, using the ordinal level of data. Data must be ≥ 3 sets of ranks. Easy to calculate but non-parametric.
Kendall’s Coefficient Concordance W
A measure of correlation for 2 groups, using one continuous variable and one dichotomous nominal variable.
Point-Biserial Coefficient rpb
A measure of correlation for 2 groups, using 2 dichotomous nominal variables.
Phi or Fourfold Coefficient
A measure of correlation for ≥ 2 groups, using nominal (dependent/independent) levels of data. It is also known as Guttman’s Coefficient of Predictability. Gives an indication of the reduction of errors made in a prediction scheme.
Lambda
A non-parametric measure of the agreement between two rankings.
Tau Coefficient
Tests for statistical dependence.
Kendall’s Tau Coefficient
An index of interrater reliability of ordinal data.
Coefficient of Concordance (W)
Methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population.
Inferential statistics
What are the inferential statistics tests?
Z-test of one sample mean
T-test
Variation of t-test
Independent samples
Dependent samples
Proportions/Percentages
Variances
2 correlation coefficients
What level of data do all tests for inferential statistics use?
Interval
A measure for inferential statistics for 1 group.
N ≥ 30; used to test whether a population parameter is significantly different from some hypothesized value.
Z-test of one sample mean
A measure for inferential statistics when n < 30.
T-test
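A sketch of the one-sample t statistic computed by hand (the sample values and hypothesized mean are made up; the result would be compared against a t table, which is not shown here):

```python
import math
import statistics

# Hypothetical small sample (n < 30) and hypothesized population mean.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
mu0 = 12.0

n = len(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
t = (statistics.mean(sample) - mu0) / se
print(t)  # compare against the critical t value with n - 1 degrees of freedom
```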
This kind of t-test is for 2 groups. It assesses whether the means of 2 groups are statistically different from each other.
Independent samples
This kind of t-test is for 1 group. It is used when the subjects making up the 2 samples are matched on some variable before being put into the 2 groups, or when the 2 groups are the same subjects administered a pretest and posttest.
Dependent samples
This kind of t-test is for 1 group. It is used to test the hypothesis that an observed proportion is equal to a pre-specified proportion.
Proportions/Percentages
This kind of t-test uses the F-test, for equal and unequal variances.
Variances
This kind of t-test is for 2 groups. It is used to assess the significance of the difference between two correlation coefficients found in 2 independent samples.
2 correlation coefficients
It is used for problems of predicting one variable from a knowledge of another or possibly several other variables. It is always the regression of the predicted value on the known variable.
Regression Equation
What are the regression equations?
Linear regression of y on x
Linear regression of x on y
Standard error of estimate (SEE)
Standard deviation of errors of prediction. An indication of the variability about the regression line in the population wherein predictions are being made.
Standard error of estimate (SEE)
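A sketch of least-squares linear regression of y on x together with the standard error of estimate (hypothetical paired data; standard slope/intercept formulas, not a method prescribed by the deck):

```python
import math

# Hypothetical related pairs; least-squares regression of y on x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

# SEE: standard deviation of the errors of prediction about the regression line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
print(slope, intercept, see)
```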
Between ANOVA and t-tests, which organizes and directs the analysis and has easier interpretation of the results?
ANOVA
Performing repeated t-tests increases the probability of _?
Type I error
ANOVA needs to be followed by what test?
Post hoc test
What does the post hoc test determine?
Which groups differ from each other.
We should not conduct a post hoc test unless the null hypothesis is _?
Rejected.
A test designed for a situation with equal sample size per group, but can be adapted to unequal sample sizes as well.
Tukey’s (Honestly Significant Difference or HSD) Test
Descriptive measure of the utility of the regression equation for making predictions; the square of the linear correlation coefficient.
Coefficient of Determination
In determining the coefficient of determination, the nearer the value is to _, the more useful the regression equation is in making predictions.
1
Used to test the significance of the differences among means obtained from independent samples (a parametric test), where > 2 conditions are used or even when several independent variables are involved.
Analysis of Variance (ANOVA)
What are the types of ANOVA?
One-Way ANOVA
Two-Way ANOVA
Three-Way ANOVA
Tests used if 2 or more samples were drawn from the same population by comparing means, or if data from several groups have a common mean. There is 1 IV and 1 DV, and an interval level of data.
One-Way ANOVA
It tests the hypothesis that the means of 2 variables (factors) from 2 or more groups (2 IV, 1 DV) are equal (drawn from populations with the same mean).
Two-way ANOVA
It has a similar purpose to the other kinds of ANOVA, except that the groups here have 3 categories of defining characteristics. It must have 3 IV and 1 DV.
Three-Way ANOVA
It corrects alpha not just for all pair-wise or simple comparisons of means, but also for complex comparisons (contrast of more than 2 at a time) of means.
Scheffe’s Test
The most popular of the post hoc procedures; the most flexible and most conservative, but the least statistically powerful procedure.
Scheffe’s Test
A versatile formula; data must be presented in frequencies. It is categorized as a non-parametric test but can also be used as a parametric test.
Chi-square
What are the 2 chi-square tests?
Goodness of Fit
Independence
Also called one-sample or one-variable chi-square. Involves 1 variable of ≥ 2 categories. It compares the distribution of measures for deviation from a hypothesized distribution. Nominal level of data.
Goodness of Fit
A chi-square test that involves 2 variables consisting of ≥ 2 categories. It determines whether the 2 variables are related or not. Reveals only the relationship but not the magnitude of the relationship.
Independence
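The goodness-of-fit computation can be sketched directly from its definition, comparing observed frequencies against a hypothesized distribution (the die-roll counts are made up):

```python
# Chi-square goodness of fit: one variable, observed vs. expected frequencies.
# Hypothetical 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # compare against the chi-square critical value with 5 df
```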
Parametric Test or Non-Parametric test?
Random selection of subjects from a normal population with equal variances.
Parametric Tests
Parametric Test or Non-Parametric test?
Whether the groups or samples to be compared are independent samples or correlated.
Both
Parametric Test or Non-Parametric test?
Whether the number of groups to be compared is ≥ 2.
Non-parametric Test
Parametric Test or Non-Parametric test?
More power, higher power efficiency.
Parametric Test
Parametric Test or Non-Parametric test?
Simple and easier to calculate.
Non-parametric Test
Parametric Test or Non-Parametric test?
No need to meet data requirements at all.
Non-parametric Test
What are some non-parametric tests?
Median test
Fisher’s Sign Test
Wilcoxon Rank Sum Test
Mann-Whitney (U) Test
Wilcoxon Signed Ranks Test (T)
Kruskal-Wallis H Test
Friedman Rank
Non-parametric test that compares the medians of 2 independent (uncorrelated) samples. Only considers the number of cases above and below the median. Presented as ordinal data.
Median Test
Non-parametric test that compares 2 correlated samples by obtaining the differences between each pair of observations. Considers the signs of the differences between paired observations rather than their sizes.
Fisher’s Sign Test
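The sign-test idea on the card above can be sketched as counting only the signs of paired differences (hypothetical pretest/posttest scores; the full test would then compare the smaller count against a binomial table, not shown here):

```python
# Sign-test idea: keep only the signs of the paired differences, not their sizes.
# Hypothetical pretest/posttest scores for the same six subjects.
pre = [10, 12, 9, 15, 11, 13]
post = [12, 15, 9, 18, 10, 16]

diffs = [b - a for a, b in zip(pre, post)]
plus = sum(1 for d in diffs if d > 0)   # improvements
minus = sum(1 for d in diffs if d < 0)  # declines; zero differences are dropped
print(plus, minus)  # 4 1
```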
What level of data do non-parametric tests have?
Ordinal
Non-parametric test that is used for comparing 2 independent samples using rank data.
Wilcoxon Rank Sum Test
Non-parametric test that is used with independently drawn random samples, the sizes of which need not be the same.
Mann-Whitney (U) Test
Non-parametric test that is used for correlated samples; the difference, d, between each pair is calculated (data subjected to computation).
Wilcoxon Signed Ranks Test (T)
Non-parametric test that is used to test whether or not a group of independent samples is from the same or different populations. Compares 3 or more independent samples with respect to an ordinal variable.
Kruskal-Wallis H Test
Non-parametric test that is used to test whether or not the data are from the same sample under 3 different conditions.
Friedman Rank
The act of assigning numbers or symbols to characteristics of things according to rules.
Measurement
A set of numbers or symbols whose properties model empirical properties of the objects to which the numbers are assigned.
Scale
Permit classification and rank ordering on some characteristic. Have no absolute zero point.
Ordinal scales
A set of test scores arrayed for recording or study.
Distribution
A straightforward, unmodified accounting of performance that is usually numerical.
Raw score
All scores are listed alongside the number of times each score occurred.
Simple Frequency Distribution
Class intervals replace the actual test scores.
Grouped frequency distribution
A graph with vertical lines drawn at the true limits of each test score or class interval forming a series of contiguous rectangles.
Histogram
Expressed by continuous line connecting the points where the test scores or class intervals (X axis) meet frequencies (Y axis).
Frequency Polygon
If the distribution is normal, the mean is the most appropriate measure of central tendency for what level of data?
Interval or ratio
There are two scores that occur with the highest frequency. It is theoretically possible for this distribution to have two modes, which may fall at the high or low end of the distribution.
Bimodal distribution
An indication of how scores in a distribution are scattered or dispersed.
Variability
Statistics that describe the amount of variation in a distribution.
Measures of variability
It is a measure of variability equal to the difference between Q3 and Q1.
Interquartile range
It is equal to the interquartile range divided by 2.
Semi-interquartile range
Quarter refers to an _.
Interval
The dividing points between the 4 quarters in the distribution. It refers to a specific point.
Quartiles
Q2 and the _ are exactly the same.
Median
In a perfectly symmetrical distribution, Q1 and Q3 will be exactly the same distance from the _.
Median
To obtain this, all the deviation scores (taken as absolute values) are summed and divided by the total number of scores.
Average deviation
A measure of variability equal to the square root of the variance.
Standard deviation
It is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
Variance
It is the nature and extent to which symmetry is absent.
Skewness
Relatively few of the scores fall at the high end of the distribution.
Positive skew
Relatively few of the scores fall at the low end of the distribution.
Negative skew
Refers to the steepness of a distribution in its center.
Kurtosis
What are the 3 general types of curves and what do they mean?
Platykurtic- relatively flat
Leptokurtic- relatively peaked
Mesokurtic- somewhere in the middle
Distributions that have _ kurtosis have a high peak and fatter tails compared to a normal distribution.
High
Distributions with _ kurtosis values have a rounded peak and thinner tails.
Lower
The development of the concept of a normal curve began in the middle of the 18th century with the work of _ and later the Marquis de Laplace.
Abraham de Moivre
Through the early 19th century, scientists referred to the Normal curve as the _.
Laplace-Gaussian curve
He was credited with being the first to refer to the curve as the normal curve.
Karl Pearson
The distribution of the normal curve ranges from _ to _.
Negative infinity to positive infinity.
The normal curve is perfectly symmetrical with _ skewness.
No skewness
A raw score that has been converted from one scale to another scale where the latter scale has some arbitrarily set mean and standard deviation.
Standard scores
These scores are more easily interpretable than raw scores.
Standard scores
It results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
Z scores or zero plus or minus one scale
The T score was devised by W. A. McCall, who named it in honor of his professor _.
E. L. Thorndike
This standard score system is composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean. Mean = 50; SD = 10.
T scores or fifty plus or minus ten scale
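The z-score and T-score conversions above can be sketched in a few lines (hypothetical raw scores; T = 50 + 10z follows directly from the card's mean and SD):

```python
import statistics

# Hypothetical raw-score distribution.
scores = [50, 60, 70, 80, 90]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

raw = 80
z = (raw - mean) / sd  # standard deviation units above/below the mean
t_score = 50 + 10 * z  # T scale: mean 50, SD 10
print(z, t_score)
```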
A standard score which has a mean of 5 and an SD of 2.
Stanine
The 5th stanine indicates performance in the _.
Average range
It is an expression of the degree and direction of correspondence between two things.
Correlation
A number that provides us with the index of the strength of the relationship between two things.
Coefficient of Correlation (r)
The meaning of correlation coefficient is interpreted by its _ and _.
Sign and magnitude
The two ways to describe a perfect correlation between two variables are as either _ or _.
+1
-1
Magnitude is a number anywhere between _ and _.
+1
-1
He devised the Pearson r.
Karl Pearson
Can be the statistical tool of choice when the relationship between the variables is linear and when the two variables being correlated are continuous.
Pearson r/ Pearson Correlation Coefficient/ Pearson Product-moment Coefficient of Correlation
The Spearman Rho was developed by _.
Charles Spearman
A measure of correlation that is frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal form.
Spearman Rho
A simple graphic of the coordinate points for the values of the X-variable (horizontal axis) and the Y-variable (vertical axis). They are useful because they provide a quick indication of the direction and magnitude of the relationship between the 2 variables and also reveal the presence of curvilinearity.
Scatterplot
An eyeball gauge of how curved a graph is.
Curvilinearity
An extremely atypical point located at a relatively long distance from the rest of the coordinate points in a scatter plot.
Outlier
A statistic useful in describing sources of test score variability.
Variance
Variance from true differences.
True variance
Variance from irrelevant random sources.
Error variance
The greater the proportion of the total variance that is true variance, the more _ the test.
Reliable
Refers to all the factors associated with the process of measuring some variable other than the variable being measured.
Measurement error
A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Random Error
A source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic error
Give a source of variance during test construction.
Item sampling/Content sampling
A validity coefficient that is used when correlating self rankings of performance.
Spearman Rho Rank-order Correlation