Statistics Flashcards
sample
A sample is selected to represent the population in a research study; helps answer questions about a population.
variable
characteristic or condition that can change or take on different values.
discrete variables
(such as class size) consist of separate, indivisible categories; a fractional value (e.g., 2.5 students) is not meaningful
continuous variables
(such as time or weight) are infinitely divisible into whatever units a researcher may choose, so any fractional value can be legitimately measured.
goal of experiment
Goal of an experiment is to demonstrate a cause-and-effect relationship between two variables; that is, to show that changing the value of one variable causes changes to occur in a second variable.
IV and DV
In an experiment, the manipulated variable is called the independent variable and the observed variable is the dependent variable.
non-experimental or quasiexperimental
non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores. These studies do not use a manipulated variable to differentiate the groups. Instead, the variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).
Similar to correlational research because they simply demonstrate and describe relationships
positively skewed
In a positively skewed distribution, the scores tend to pile up on the left side of the distribution with the tail tapering off to the right.
negatively skewed
In a negatively skewed distribution, the scores tend to pile up on the right side and the tail points to the left.
percentile rank
The percentile rank for a particular X value is the percentage of individuals with scores equal to or less than that X value. When an X value is described by its rank, it is called a percentile.
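A quick way to check a percentile rank in code – a minimal Python sketch with made-up scores (scipy's percentileofscore with kind="weak" counts scores at or below X):

```python
from scipy import stats

scores = [2, 3, 5, 5, 6, 8, 9, 9, 10, 12]  # hypothetical scores

# Percentage of individuals with scores equal to or less than X = 8
pr = stats.percentileofscore(scores, 8, kind="weak")
print(f"percentile rank of X=8: {pr:.0f}%")  # 60% of scores are <= 8
```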
nominal
name only, categorical; only permit you to determine whether two individuals are the same or different. (i.e. male/female; diagnosis)
ordinal
rank ordered (e.g. height shortest to tallest); tell you the direction of difference between two individuals.
the Spearman correlation is used with ordinal data (e.g., class rank)
interval
consistent intervals between numbers but no absolute zero (e.g., IQ); identify the direction and magnitude of a difference
ratio
interval plus absolute zero – height in inches; identify the direction and magnitude of differences and allow ratio comparisons of measurements
reliability
same results with repeated administrations
validity
measures what it says it measures- taps into the construct
standard error of the mean (SEM)
measure of variability; the average expected difference between a sample mean and the population mean (or between two sample means, M1 – M2)
confidence interval
a type of interval estimate of a population parameter, used to indicate the reliability of an estimate. Factors that affect the width of a confidence interval include sample size, level of confidence, and population variability. A larger sample size normally leads to a better estimate of the population parameter.
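A minimal sketch of a 95% CI for a population mean, assuming a small hypothetical sample and the t distribution (scipy):

```python
import numpy as np
from scipy import stats

data = np.array([4.8, 5.2, 5.9, 6.1, 5.5, 4.9, 6.3, 5.7])  # hypothetical sample

m = data.mean()
sem = stats.sem(data)  # standard error of the mean (uses n - 1)

# 95% interval estimate for the population mean
lo, hi = stats.t.interval(0.95, df=len(data) - 1, loc=m, scale=sem)
print(f"mean = {m:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Rerun with a larger sample and the interval narrows, matching the point about sample size above.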
sampling error
The discrepancy between a sample statistic and its population parameter is called sampling error.
central tendency in normal distribution
a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that is the best representative for the entire set of data. Allows researchers to summarize or condense a large set of data into a single value (thus a descriptive statistic).
mean and a mean in skewed data
Mean: the average; most commonly used; requires scores that are numerical values measured on an interval or ratio scale
When a distribution contains a few extreme scores (or is very skewed), the mean will be pulled toward the extremes (displaced toward the tail).
median
Median: If the scores in a distribution are listed in order from smallest to largest, the median is defined as the midpoint of the list; values measured on an ordinal, interval, or ratio scale
Relatively unaffected by extreme scores
mode
the most frequently occurring category or score in the distribution; any scale of measurement: nominal, ordinal, interval, or ratio
what will be equal in a symmetrical distribution
the mean and median will always be equal
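To see the mean get pulled toward the tail while the median stays put (and the mode for completeness), a small sketch with made-up numbers using Python's statistics module:

```python
import statistics as st

symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]  # one extreme high score (positive skew)

print(st.mean(symmetric), st.median(symmetric))  # 6, 6 -> equal in symmetry
print(st.mean(skewed), st.median(skewed))        # 24, 6 -> mean pulled to tail
print(st.mode([1, 2, 2, 3]))                     # 2: most frequent score
```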
variability
goal is to obtain a measure of how spread out the scores are in a distribution; describes how the scores are scattered around that central point. In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population. Measuring distance.
range and interquartile range
Range: the distance from the smallest score to the largest score (largest minus smallest)
The interquartile range is the distance covered by the middle 50% of the distribution (the difference between Q1 and Q3).
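A minimal sketch computing both measures with numpy (hypothetical data):

```python
import numpy as np

data = np.array([3, 5, 7, 8, 9, 11, 13, 14, 16, 20])  # hypothetical scores

rng = data.max() - data.min()            # range: largest minus smallest
q1, q3 = np.percentile(data, [25, 75])   # lower and upper quartiles
iqr = q3 - q1                            # distance covered by the middle 50%
print(rng, q1, q3, iqr)
```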
standard deviation
average distance between scores and the mean (the square root of the variance)
variance
average squared deviation of scores from the mean (it is the standard deviation squared)
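The chain SS -> variance -> standard deviation as a small sketch (sample formulas with n - 1; hypothetical scores):

```python
import numpy as np

scores = np.array([2, 4, 6, 8, 10])

deviations = scores - scores.mean()  # distance between each score and the mean
ss = np.sum(deviations ** 2)         # SS: sum of squared deviations
variance = ss / (len(scores) - 1)    # sample variance
sd = np.sqrt(variance)               # standard deviation = sqrt(variance)
print(ss, variance, sd)              # 40.0, 10.0, ~3.16
```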
negative skew vs positive skew outliers
Positive Skew – extreme outlier(s) in the positive/ high end
Negative Skew – extreme outlier(s) in the negative/ low end
implication of skew for parameters
violates the parametric assumptions needed for the parametric tests (t-tests, anova, pearson correlation)
descriptive statistics
methods for organizing and summarizing data
-mean, median, mode, standard deviation, variance
parameter vs statistic
A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.
frequency distribution
organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.
regular frequency distribution
when a frequency distribution table lists all of the individual categories (X values) it is called a regular frequency distribution.
grouped frequency distribution
In a grouped frequency distribution, the X column lists groups of scores, called class intervals, rather than individual values (too many different X values).
why are frequency distribution graphs useful
Frequency distribution graphs are useful because they show the entire set of scores. At a glance, you can determine the highest score, the lowest score, where the scores are centered, most common score.
inferential statistics
methods for using sample data to make general conclusions (inferences) about populations
effect size
measure of the strength of a phenomenon
hypothesis testing
general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results from a research study. A technique to help determine whether a specific treatment has an effect on the individuals in a population.
null hypothesis
H0, always states that the treatment has no effect (no change, no difference). According to the null hypothesis, the population mean after treatment is the same as it was before treatment.
Test statistic in critical region → reject the null = p<0.05 = there is an effect, difference
Test statistic in the body (outside of the critical region) → fail to reject the null = there is no effect/ no difference
alternative hypothesis
a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis
alpha level: establishes a criterion, or “cut-off”, for making a decision about the null hypothesis. The alpha level also determines the risk of a Type I error.
Power - probability that the test will reject the null hypothesis when the null hypothesis is false (find an effect when there is one)
-Influenced by: Alpha level; Sample size; Sensitivity of test; Effect size
type 1 error
occurs when the sample data appear to show a treatment effect when, in fact, there is none. In this case the researcher will reject the null hypothesis and falsely conclude that the treatment has an effect.
what causes type 1 error
caused by unusual, unrepresentative samples. Just by chance the researcher selects an extreme sample with the result that the sample falls in the critical region even though the treatment has no effect.
The hypothesis test is structured so that Type I errors are very unlikely; specifically, the probability of a Type I error is equal to the alpha level.
type 2 error
occurs when the sample does not appear to have been affected by the treatment when, in fact, the treatment does have an effect. In this case, the researcher will fail to reject the null hypothesis and falsely conclude that the treatment does not have an effect.
what causes type 2 errors
commonly the result of a very small treatment effect. Although the treatment does have an effect, it is not large enough to show up in the research study.
directional test or one tailed test
A directional test or a one-tailed test includes the directional prediction in the statement of the hypotheses and in the location of the critical region.
what is recommended that the hypothesis test be accompanied by
effect size
We use Cohen’s d as a standardized measure of effect size. Much like a z-score, Cohen’s d measures the size of the mean difference in terms of the standard deviation (Impact of the IV).
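A minimal sketch of Cohen's d, assuming hypothetical means and a pooled standard deviation:

```python
# Hypothetical treatment study: mean difference in standard-deviation units
m_treatment, m_control = 54.0, 50.0
sd_pooled = 8.0  # assumed pooled standard deviation

d = (m_treatment - m_control) / sd_pooled
print(f"Cohen's d = {d:.2f}")  # 0.50: the means differ by half an SD
```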
what are the three parametric assumptions we made to engage in inferential statistics
1) Independent observations: random selection gives a representative sample; you can't just bring friends to the experiment, because that biases the sample (not representative)
2) Normally distributed: the populations from which samples are selected are normally distributed (if not, these tests can't be run)
3) Homogeneity of variance: the populations from which samples are selected have equal variances (the treatment should shift scores equally, without changing their spread)
independent measures between subjects t test
(2 separate groups)
An independent-measures design can be used to test for mean differences between two distinct populations (such as men versus women) or between two different treatment conditions (such as drug versus no-drug).
repeated measures design t test
single group of individuals is obtained and each individual is measured in both of the treatment conditions being compared. Thus, the data consist of two scores for each individual.
related sample t -test, matched subjects design
each individual in one treatment is matched one-to-one with a corresponding individual in the second treatment
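The three designs above map onto two scipy calls; a sketch with hypothetical scores (ttest_rel covers both repeated-measures and matched-subjects, since both yield paired scores):

```python
from scipy import stats

# Independent-measures (between-subjects): two separate groups
drug    = [12, 14, 11, 15, 13, 16]   # hypothetical scores
no_drug = [10, 11,  9, 12, 10, 13]
t_ind, p_ind = stats.ttest_ind(drug, no_drug)

# Repeated-measures / matched-subjects: two scores per individual (or pair)
pre  = [10, 12,  9, 14, 11]
post = [13, 14, 12, 15, 13]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"independent: t={t_ind:.2f}, p={p_ind:.3f}")
print(f"related:     t={t_rel:.2f}, p={p_rel:.3f}")
```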
ANOVA?
comparing 3 or more treatment conditions, more than 1 IV (factor), or an IV with more than 2 levels
what does analysis of variance do in ANOVA
controls the risk of a Type I error in situations where a study is comparing more than two population means
why use post hoc tests
ANOVA simply establishes that differences exist, it does not indicate exactly which treatments are different. Specifically, you must follow the ANOVA with additional tests, called post hoc tests, to determine exactly which treatments are different and which are not.
examples of post hoc tests
The Scheffé test and Tukey's HSD are examples of post hoc tests. They indicate exactly where the difference is.
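A sketch of the ANOVA-then-post-hoc workflow on hypothetical groups (scipy; tukey_hsd needs a recent SciPy version):

```python
from scipy import stats

g1 = [3, 4, 5, 4, 3]   # hypothetical scores in three treatment conditions
g2 = [6, 7, 6, 8, 7]
g3 = [6, 6, 7, 5, 6]

f, p = stats.f_oneway(g1, g2, g3)  # ANOVA: do differences exist somewhere?
print(f"F = {f:.2f}, p = {p:.4f}")

if p < 0.05:
    # post hoc: WHICH pairs of treatments differ (recent SciPy required)
    print(stats.tukey_hsd(g1, g2, g3))
```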
what does the repeated measure design do
the repeated measures design eliminates individual differences from the between treatments variability because the same subjects are used in every treatment condition
MANOVA
ANOVA with the addition of multiple Dependent variables
ANCOVA
ANOVA with a covariate – a variable whose influence is statistically controlled
-covariate: another variable that has a relation to the DV
when are non parametric tests used
used when the parametric assumptions are violated (or the data are NOT interval or ratio level)
chi square tests
tests the shape of the distribution of nominal/categorical data; not a parametric test
goodness of fit in chi square
are there roughly the same number of subjects in each category? OR does the distribution fit a predetermined distribution (e.g., 40% male and 60% female)?
test for independence
similar to correlation in that it looks at the relationship between 2 variables but uses NOMINAL data
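Both chi-square tests in one sketch, with made-up counts (scipy):

```python
from scipy import stats

# Goodness of fit: do observed counts match a predetermined distribution?
observed = [28, 52]   # hypothetical counts (n = 80)
expected = [32, 48]   # 40% male / 60% female of n = 80
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.3f}")

# Test for independence: relationship between two NOMINAL variables
table = [[20, 30],    # rows = variable 1, columns = variable 2
         [25, 15]]
chi2, p, df, exp = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, p = {p:.3f}, df = {df}")
```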
Mann Whitney U
analogous to independent measures t-test
Friedman tests
analogous to repeated measures anova
Wilcoxon test
analogous to repeated measures t-test
Kruskal-Wallis
analogous to independent measures (one-way) anova
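The four analogs above, as one scipy sketch with hypothetical (rank-friendly) data:

```python
from scipy import stats

a = [1, 3, 2, 5, 4]   # hypothetical scores
b = [6, 8, 7, 9, 5]
c = [2, 4, 3, 6, 5]

print(stats.mannwhitneyu(a, b))          # ~ independent-measures t test
print(stats.wilcoxon(a, b))              # ~ repeated-measures t test (paired)
print(stats.kruskal(a, b, c))            # ~ one-way independent-measures ANOVA
print(stats.friedmanchisquare(a, b, c))  # ~ repeated-measures ANOVA
```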
correlation
tests the relationship between 2 variables that occur naturally: relationship only; no cause and effect; determine whether there is a relationship between two variables and to describe the relationship
sign and strength
positive or negative tells nothing about the strength of the relationship but tells about the direction (both go up together or one goes up the other goes down)
-Negative correlation: one variable increases, other decreases
-Positive correlation: one variable increases, the other increases
Strength – between 0 and 1 (closer to 1 = stronger)
restriction of range
can make the relationship seem weaker because you are only getting a small snapshot of the full relationship between the two variables (i.e. looking at the relationship between age and reaction time if you only use 19-22 year olds you won’t find a relationship)
pearson product moment correlation
used for parametric data/ interval or ratio level data with a linear relationship
spearman correlation
used when the relationship is monotonic but not linear, or when data is ordinal/ranked (note: a non-monotonic pattern such as an inverted U is not captured well by either correlation)
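A sketch contrasting the two coefficients on hypothetical data; Spearman ranks the scores first, so it only needs a monotonic trend:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1.0, 1.4, 2.5, 6.0, 15.0, 40.0])  # monotonic but NOT linear

r, _ = stats.pearsonr(x, y)     # parametric: assumes a linear relationship
rho, _ = stats.spearmanr(x, y)  # rank-based: perfectly monotonic -> rho = 1.0
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```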
regression
used for prediction; a statistical technique that relates a dependent variable to one or more independent (explanatory) variables. A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables
multiple regression
same as regression but using multiple variables to make the prediction (i.e. SAT scores, HS GPA, and quality of essay to predict college GPA)
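A minimal least-squares sketch of multiple regression (hypothetical SAT/HS GPA predictors, numpy only):

```python
import numpy as np

# Hypothetical predictors and outcome
sat   = np.array([1.1, 1.3, 0.9, 1.4, 1.2])   # SAT (rescaled)
hsgpa = np.array([3.0, 3.4, 2.8, 3.8, 3.2])   # high-school GPA
cgpa  = np.array([2.9, 3.3, 2.7, 3.7, 3.1])   # college GPA (the DV)

X = np.column_stack([np.ones_like(sat), sat, hsgpa])  # intercept + predictors
coef, *_ = np.linalg.lstsq(X, cgpa, rcond=None)       # least-squares fit

print("coefficients:", coef.round(2))
print("predicted college GPA:", (X @ coef).round(2))
```

Drop one predictor column and the same code does simple regression.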
path analysis
allows for the evaluation of the causal flow of relationships because a priori predictions are made about the correlations
logistic regression
prediction of group membership (doctor, lawyer, tradesperson) from interval or ratio level predictors (scores usually)
discriminant analysis
same as Logistic regression but predictors can be any kind of variable (dichotomous, interval, ratio, nominal)
statistics
branch of mathematics used to summarize, analyze, and interpret a group of numbers or observations
research method or scientific method
set of systematic techniques used to acquire, modify, and integrate knowledge concerning observable and measurable phenomena
experiments
use of methods and procedures to make observations in which a researcher fully controls the conditions and experiences of participants by applying three required elements of control (manipulation, randomization, and comparison/control) to isolate cause-and-effect relationships between variables
correlations
examines the relationship between variables by measuring pairs of score for each individual. This method can determine whether a relationship exists between variables, but it lacks the appropriate controls needed to demonstrate cause and effect.
quasi experiments
this study does not include a manipulated independent variable and/or lacks a comparison/control group
quasi independent variable
preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups
population
set of all individuals, items, or data of interest. This is the group about which scientists will generalize
sample
set of individuals, items, or data selected from a population of interest.
independent variable
variable that is manipulated in an experiment. This variable remains unchanged (or “independent”) between conditions being observed in an experiment. The specific conditions of the IV are referred to as levels.
dependent variable
the variable that is measured in each group of a study, and it is believed to change in the presence of the independent variable. It is the “presumed effect.”
descriptive statistics
procedures used to summarize, organize, and make sense of a set of scores called data. They are typically presented graphically or in tables.
inferential statistics
procedures used that allow researchers to infer or generalize observations made with samples to the larger population from which they are selected
population parameter
a characteristic (usually numeric) that describes a population. **Will usually never have this!
sample statistic
a characteristic (usually numeric) that describes a sample
nominal
are measurements in which a number is assigned to represent something or someone (a name, has no numerical properties)
ordinal
measurements that convey order or rank alone (does not tell us space between ranks)
interval
measurements that have no true zero and are distributed in equal units (equidistant, ex. Likert Scale)
ratio
measurements that have a true 0 and are distributed in equal units
qualitative
varies by class. This variable is often represented as a label and describes nonnumeric aspects of phenomena.
quantitative
varies by amount. This variable is measured numerically and is often collected by measuring or counting.
continuous
is measured along a continuum, at any place beyond the decimal point, and can thus be measured in fractional units (ex. time)
discrete
measured in whole units or categories that are not distributed along a continuum
central tendency
statistical measures for locating a single score that is most representative or descriptive of all scores in a distribution
mean and when best to use
(also called the average) is the sum of a set of scores in a distribution, divided by the total number of scores summed.
**Best to use the mean in a normal distribution and with interval and ratio scales. In normal distributions, the mean, median, and mode are all the same
median and when best to use
is the middle value in a distribution of data listed in numeric order (aka the 50th percentile)
**Best to use the median with data that has outliers or is skewed, and with ordinal scales
mode and when best to use
is the value in a data set that occurs most often or most frequently (can have multiple modes)
**Best to use the mode when working with a bimodal/multimodal distribution (curve sinks in the middle) and with nominal scales
normal distribution
(bell-shaped) a theoretical distribution in which scores are symmetrically distributed above and below the mean, median, and mode at the center of the distribution
skewed distribution
distribution of scores that includes scores that fall substantially above or below most other scores in a data set
-Positive Skew: a distribution of scores that includes scores that are substantially larger than most other score (tail/flat part will be on the positive/ right side)
-Negative Skew: a distribution of scores that includes scores that are substantially smaller than most other scores (tail/flat part will be on the negative/ left side)
bimodal distribution
a distribution of scores in which two scores occur most often or most frequently (a high hump where one mode is, a dip in the center (mean/median), and then another high hump where the second mode is, refer to pg. 95 in Privitera book for visual so you don’t have to keep listening to me talk about humps)
variability
is a measure of the dispersion or spread of scores in a distribution that ranges from 0 to positive infinity
range
is the difference between the largest value (L) and smallest value (S) in a data set.
interquartile range
is the range of values between the upper (Q3) and the lower (Q1) quartiles of a data set
Lower Quartile: the median value of the lower half of a data set at the 25th percentile of a distribution
Upper Quartile: is the median value of the upper half of the data set at the 75th percentile of a distribution
SS (sum of squares)
is the sum of the squared deviations of scores from their mean. The SS is the numerator in the variance formula
variance
measure of variability for the average squared difference that scores deviate from their mean
standard deviation
measure of variability for the average distance that scores deviate from their mean. It is calculated by taking the square root of the variance.
z score
value on the x-axis of a standard normal distribution. The numerical value of a z score specifies the distance or the number of standard deviations that a value is above or below the mean
characteristics of normal distribution
-Mathematically defined
-Theoretical (no distribution is perfect, it is approximate)
-Mean, median, and mode all in the 50th percentile
-It is symmetrical
-The total area under the curve is equal to 1 or 100%
sampling distribution
a distribution of sample means or sample variances that could be obtained in samples of a given size from the same population
**The sampling distribution of sample means is an unbiased estimator and follows the central limit theorem
unbiased estimator
any sample statistic, such as the sample variance when we divide SS by n-1, obtained from randomly selected samples, that equals the value of its respective population parameter (such as the population variance) on average
central limit theorem
explains that regardless of the distribution of scores in a population, the sampling distribution of sample means selected at random from that population will approach the shape of a normal distribution as the number of samples in the sampling distribution increases
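A simulation sketch: even from a very non-normal (exponential) population, the sample means pile up around the population mean with spread sigma/sqrt(n), which previews the standard error card below:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # heavily skewed

n = 50
sample_means = [rng.choice(population, n).mean() for _ in range(2_000)]

print(np.mean(sample_means))          # ~ population mean (2.0): unbiased
print(np.std(sample_means))           # spread of the sampling distribution
print(population.std() / np.sqrt(n))  # theoretical standard error, sigma/sqrt(n)
```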
standard error of the mean
the standard deviation of a sampling distribution of sample means. It is the standard error, or distance, that sample mean values deviate from the value of the population mean
sampling error
the extent to which sample means selected from the same population differ from the mean of the sampling distribution of sample means (SDSM)
-The question is: if you randomly select one sample mean, how close will it be to the mean of the SDSM?
-Sampling error is therefore a measure of how narrow your SDSM is; a lot of error means a randomly drawn sample mean may fall far from the mean of the SDSM
factors that decrease standard error
The larger the population standard deviation, the larger the standard error
-As numerator gets bigger, quantity as a whole gets bigger
-The wider the distribution, the less likely you are to get a random sample mean that is close to the SDSM
As sample size increases, standard error decreases
-As denominator gets bigger, quantity gets smaller
-Narrower distribution
bigger population = less error
z scores for sample means
tells you the probability of sampling a particular mean from a particular population (NEED the population standard deviation)
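A sketch using hypothetical IQ-style parameters (mu = 100, sigma = 15 assumed known):

```python
from math import sqrt
from scipy import stats

mu, sigma = 100, 15   # known population parameters (assumed)
m, n = 106, 25        # hypothetical sample mean and sample size

sem = sigma / sqrt(n)      # standard error of the mean: 15/5 = 3
z = (m - mu) / sem         # z score for the sample mean: 2.0
p = 1 - stats.norm.cdf(z)  # probability of a mean this high by chance
print(f"z = {z:.2f}, one-tailed p = {p:.4f}")  # ~0.023
```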
hypothesis testing
a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample. In this method, we test a hypothesis by determining the likelihood that a sample statistic would be selected if the hypothesis regarding the population parameter were true
-Goal: to see if differences are significant
null hypothesis
is a statement about a population parameter, such as the population mean, that is assumed to be true
alternative hypothesis
a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis
level of significance
is a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true (p value)
significance
describes a decision made concerning a value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance. When the null hypothesis is retained, we fail to reach significance.
type 1 error
is the probability of rejecting a null hypothesis that is actually true. Researchers directly control the probability of committing this type of error (using alpha). So, saying there is significance and rejecting the null when the null was actually true (false positive)
alpha level
the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis (usually .05)
type 2 error
is the probability of retaining a null hypothesis that is actually false. So, retaining the null and saying your results are not significant when really the alternative hypothesis is true (false negative)
power
the probability that a randomly selected sample will show that the null hypothesis is false when the null hypothesis is indeed false
critical value
a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true
z statistic
an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis
**To do a z-test, you need a POPULATION standard deviation, which is very unlikely to be known. That is why t tests are more commonly used.
effect size
a statistical measure of the size of an effect in a population, which allows researchers to describe how far scores shifted in the population, or the percent of variance that can be explained by a given variable (Cohen's d)
Cohen’s d effects: Small= d < 0.2, Medium= 0.2 < d < 0.8, Large= d > 0.8
one tailed
alternative hypothesis is stated as specifically greater or less than; the region of rejection is in either the upper or the lower portion of the sampling distribution; makes it easier to reject the null and has greater power BUT is harder to justify (everybody will be like… that's some whack ass research)
two tailed
alternative hypothesis is stated as not equal, region of rejection is upper AND lower portion of sampling distribution, ALWAYS USE TWO TAILS
t statistic
an inferential statistic used to determine the number of standard deviations in a t distribution that a sample mean deviates from the mean value or mean difference stated in the null hypothesis
assumptions for t test
Normality
Random Sampling
Independence
Equal Variances
-(larger s2 divided by smaller s2, should be less than 2 to show equality)
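That last rule of thumb as a two-line check (hypothetical groups):

```python
import numpy as np

g1 = [12, 14, 11, 15, 13]   # hypothetical groups
g2 = [10, 11,  9, 13, 10]

s1, s2 = np.var(g1, ddof=1), np.var(g2, ddof=1)  # sample variances
ratio = max(s1, s2) / min(s1, s2)  # < 2 suggests roughly equal variances
print(f"variance ratio = {ratio:.2f}")
```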
one independent sample t test
a statistical procedure used to compare a mean value measured in a sample to a known value in the population (or some other arbitrary value). It is specifically used to test hypotheses concerning the mean in a single population with an unknown variance.
two independent sample t test
a statistical procedure used to compare the mean difference between two independent groups. This test is specifically used to test hypotheses concerning the difference between two population means, where the variance in one or both populations is unknown
related samples
participants are related
repeated measures design
research design in which the same participants are observed in each treatment
pre post test design
a type of repeated-measures design in which researchers measure a dependent variable from participants before and after a treatment
within subjects design
type of repeated-measures design in which researchers observe the same participants across many treatments but not necessarily before and after a treatment
matched pairs design and the two ways it can be done
research design in which participants are selected and then matched based on common characteristics or traits
Experimentally: ex. measure intelligence and then match the two participants who scored the highest (put one in the drug group and one in the placebo group)
Naturally: ex. testing twins
advantages for selecting related samples
More practical
Minimizes standard error
-Most important advantage, it reduces standard error by removing individual differences and isolating effect
Increases power
disadvantages for selecting related samples
Requires individuals to be exposed to more than one treatment
-The 1st test may influence the score on the 2nd test (aka order effects)
-People get tired the more tests you give them
one sample z test
a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance
independent sample
a type of sample in which different participants are independently observed one time in each group
one independent sample t test
only one group of participants is given the treatment; the group mean is then compared to the population mean or some arbitrary value
two independent sample t test
participants in each group or sample are unrelated: selected from 2 or more populations, or from a single population and randomly assigned to different groups; the groups are then compared to each other
related or dependent sample
participants are related
repeated measures design
is a research design in which the same participants are observed in each treatment. Two types of repeated-measures designs are the pre-post design and the within-subjects design
pre-post design
types of design in which researchers measure a dependent variable for participants before (pre) and after (post) a treatment
within subject design
is a type of repeated-measures design in which researchers observe the same participants across many treatments but not necessarily before and after a treatment
matched pairs design
a within-subjects research design in which participants are selected and then matched, experimentally or naturally, based on common characteristics or traits
Experimentally: ex. Measure intelligence then match the two participants who scored the highest with each other and so on
Naturally: ex. Match based on genetics (twins)
difference score
score or value obtained by subtracting one score from another. In a related-samples t test, difference scores are obtained prior to computing the test statistic
error
in a t test, this refers to any unexplained difference that cannot be attributed to, or caused by, having different treatments. The standard error of the mean is used to measure error or unexplained differences in a statistical design.
levels of the factor
symbolized as K, are the number of groups or different ways in which an independent or quasi-independent variable is observed
analysis of variance (ANOVA)
statistical procedure used to test hypotheses for one or more factors concerning the variance among two or more group means (k ≥ 2), where the variance in one or more populations is unknown
one way between subjects ANOVA
a statistical procedure used to test hypotheses for one factor with two or more levels concerning the variance among group means. This test is used when different participants are observed at each level of a factor and the variance in any one population is unknown.
source of variation
variation that can be measured in a study. In the one-way between-subjects ANOVA, there are two sources of variation: variation attributed to differences between group means and variation attributed to error.
within groups variation
is the variation attributed to mean differences within each group. This source of variation cannot be attributed to, or caused by, having different groups and is therefore called error variation (this is bad variation)
between groups variation
variation attributed to the mean differences between groups (this is good variation). This is where you will see treatment effects.
post hoc test
statistical procedure computed following a significant ANOVA to determine which pair or pairs of group means significantly differ. This test is only necessary when k >2 because multiple comparisons are needed. When k=2, only one comparison is made because only one pair of group means can be compared.
factorial design
research design in which participants are observed across the combination of levels of two or more factors
two way ANOVA
statistical procedure used to test hypotheses concerning the variance of groups created by combining the levels of two factors. This test is used when the variance in any one population is unknown.
1 between 1 within design or mixed design
research design in which different participants are observed at each level of the between-subjects factor and the same participants are observed across the levels of the within-subjects factor.
cell
combination of one level from each factor. Each cell is a group in a research study
complete factorial design
research design in which each level of one factor is combined or crossed with each level of the other factor, with participants observed in each cell or combination of levels
main effect
source of variation associated with mean differences across the levels of a single factor. In the two-way ANOVA, there are two factors and therefore two main effects: one for Factor A and one for Factor B.
interaction
source of variation associated with the variance of group means across the combination of levels of two factors. It is a measure of how cell means at each level of one factor change across the levels of a second factor.
levels
number of different ways a variable is manipulated or measured
conditions
unique combinations of different levels of different independent variables
simple main effects test
are hypothesis tests used to analyze a significant interaction by comparing the mean differences or simple main effects of one factor at each level of a second factor
**Main effects and interactions occur independently of each other
eyeballing
add the cell means, then divide by how many cells; the marginal means will inform you about main effects and the cell means will inform you about interactions
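The same eyeballing as a sketch on a hypothetical 2x2 table of cell means:

```python
import numpy as np

# Hypothetical cell means: rows = levels of Factor A, columns = Factor B
cells = np.array([[10.0, 20.0],
                  [12.0, 14.0]])

print(cells.mean(axis=1))  # marginal means of A -> main effect of A?
print(cells.mean(axis=0))  # marginal means of B -> main effect of B?

# Interaction: does the B1 - B2 difference change across levels of A?
print(cells[:, 0] - cells[:, 1])  # [-10, -2]: not parallel -> interaction
```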
one way vs two way anova
one way: uses one variable (e.g. average heights of plants grown with different fertilizers)
two way: uses two independent variables, also assesses for interaction between the two independent variables (e.g. worker productivity based on gender and department)