IDE 620 Week 1 Flashcards

1
Q

The type of statistical analysis focused on describing, summarizing, or explaining a set of data.

A

Descriptive statistics

2
Q

The type of statistical analysis focused on making inferences about populations based on sample data

A

Inferential statistics

3
Q

A set of data where the rows are “cases” and the columns are “variables”

A

Data set

4
Q

Data arrangement in which the frequency of each unique data value is shown

A

Frequency distribution
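
A frequency distribution can be tallied in a few lines. The sketch below uses made-up data (the `courses` list is hypothetical, not from the course materials) and Python's standard `collections.Counter`.

```python
from collections import Counter

# Hypothetical data: number of courses taken by ten students.
courses = [3, 4, 4, 2, 3, 4, 5, 3, 4, 2]

# Counter tallies how many times each unique value occurs.
frequency = Counter(courses)

# Print each unique value next to its frequency, in ascending order.
for value in sorted(frequency):
    print(value, frequency[value])
```

The frequencies always sum to the number of cases, which is a quick check that no value was missed.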

5
Q

Graphs that use vertical bars to represent the data values of a categorical variable

A

Bar graph

6
Q

Graph depicting frequencies and distribution of a quantitative variable

A

Histogram

7
Q

A graph relying on the drawing of one or more lines connecting data points.

A

Line graph

8
Q

A graphical depiction of the relationship between two quantitative variables

A

Scatterplot

9
Q

Numerical value expressing what is typical of the values of a quantitative variable

A

Measures of Central Tendency

10
Q

The most frequently occurring number

A

Mode

11
Q

The center point in an ordered set of numbers

A

Median

12
Q

The arithmetic average

A

Mean

13
Q

Numerical value expressing how spread out or how much variation is present in the values of a quantitative variable

A

Measures of variability

14
Q

Highest number minus the lowest number

A

Range

15
Q

The average deviation of data values from their mean in squared units

A

Variance

16
Q

The square root of the variance

A

Standard deviation
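
The measures of central tendency and variability on the preceding cards can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up list of scores. It assumes the population formulas (dividing by N); `statistics.variance` and `statistics.stdev` give the sample versions, which divide by N - 1.

```python
import statistics

# Hypothetical scores, already in ascending order.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(scores)            # most frequently occurring value
median = statistics.median(scores)        # center point of the ordered values
mean = statistics.mean(scores)            # arithmetic average
value_range = max(scores) - min(scores)   # highest minus lowest

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(scores)
# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print(mode, median, mean, value_range, variance, std_dev)
```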

17
Q

A theoretical distribution that follows the 68, 95, 99.7 percent rule

A

Normal distribution

18
Q

Rule stating percentage of cases falling within 1, 2, and 3 standard deviations from the mean on a normal distribution

A

68, 95, 99.7
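
The rule can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Proportions within 1, 2, and 3 standard deviations of the mean.
shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share * 100, 1))
```

The exact proportions are roughly 68.3%, 95.4%, and 99.7%, so the rule's figures are rounded.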

19
Q

A score that has been transformed into standard deviation units

A

z score
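
A z score is the raw score minus the mean, divided by the standard deviation. The numbers below (a raw score of 85 on a test with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(raw, mean, sd):
    """Express a raw score as a distance from the mean in SD units."""
    return (raw - mean) / sd

# A raw score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(z)  # 1.5, i.e. one and a half standard deviations above the mean
```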

20
Q

The difference between two means in the variables’ natural units

A

Unstandardized difference between means

21
Q

The difference between two means in standard deviation units

A

Cohen’s d
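
The sketch below contrasts the unstandardized difference between means with Cohen's d for two made-up groups. It assumes the pooled standard deviation as the standardizer, which is one common convention; other choices exist.

```python
import math
import statistics

# Hypothetical scores for two groups.
group_a = [6, 7, 8, 9, 10]
group_b = [4, 5, 6, 7, 8]

# Unstandardized difference: in the variable's natural units.
raw_difference = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled SD: weighted average of the two sample variances, then square root.
n_a, n_b = len(group_a), len(group_b)
pooled_variance = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / (n_a + n_b - 2)
pooled_sd = math.sqrt(pooled_variance)

# Cohen's d: the same difference expressed in standard deviation units.
cohens_d = raw_difference / pooled_sd

print(raw_difference, round(cohens_d, 2))
```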

22
Q

Index of magnitude or strength of a relationship or difference between means

A

Effect size indicator

23
Q

Index indicating the strength and direction of linear relationship between two quantitative variables

A

Correlation coefficient
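
A minimal sketch of the Pearson correlation coefficient, computed from its definition on made-up data. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to run from -1 to +1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

r = pearson_r(hours, grades)
print(round(r, 2))  # positive: the two variables tend to rise together
```

The sign gives the direction of the linear relationship and the absolute value gives its strength.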

24
Q

Correlation in which values of two variables tend to move in opposite directions

A

Negative correlation

25
Q

Correlation in which values of two variables tend to move in the same direction

A

Positive correlation

26
Q

A nonlinear (curved) relationship between two quantitative variables

A

Curvilinear relationship

27
Q

The type of regression analysis that can accurately model curved relationships

A

Curvilinear regression

28
Q

The correlation between two quantitative variables controlling for one or more variables

A

Partial correlation coefficient

29
Q

Use of one or more quantitative variables to explain or predict the values of a single quantitative dependent variable

A

Regression analysis

30
Q

Regression analysis with one dependent variable and one independent variable

A

Simple regression

31
Q

Regression analysis with one dependent variable and two or more independent variables

A

Multiple regression

32
Q

The equation that defines a regression line

A

Regression equation

33
Q

The line of “best” fit based on a regression equation

A

Regression line

34
Q

The point at which a regression line crosses the vertical (y) axis

A

Y intercept

35
Q

The slope or change in y given a one unit change in x

A

regression coefficient
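
The regression equation, Y intercept, and regression coefficient fit together as in the sketch below. It assumes ordinary least squares with one independent variable on made-up data; the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    slope = sum_xy / sum_xx               # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x   # predicted y where the line crosses x = 0
    return intercept, slope

# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x_values, y_values)

# The regression equation gives the predicted value for any x.
predicted_at_6 = intercept + slope * 6
print(round(intercept, 2), round(slope, 2), round(predicted_at_6, 2))
```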

36
Q

The regression coefficient in a multiple regression equation

A

Partial regression coefficient

37
Q

Table used to examine the relationship between categorical variables

A

Contingency table

38
Q

Percentage of people in a group that have a particular characteristic

A

Rates

39
Q

Any of several methods used when the variables, especially the *dependent variables, to be analyzed are categorical rather than continuous (measured on an *interval or *ratio scale). These include the *chi-square test, *log-linear analyses, *logistic regression, and *probit regression.

A

Categorical Data Analysis 

40
Q

A variable that distinguishes among subjects by sorting them into a limited number of categories, indicating type or kind, as religion can be categorized: Buddhist, Christian, Jewish, Muslim, Other, None. Breaking a continuous variable, such as age, to make it categorical is a common practice, but since this involves discarding information, it is usually not a good idea.

A

Categorical Variable. The categories of a categorical variable should be exhaustive (cover all cases) and mutually exclusive (no case can fit into more than one category). Also called “discrete” or “nominal” variable. Compare *attribute, *continuous variable, *nominal scale.

41
Q

A graphic representation of the alternatives in a decision-making problem.

A

Decision Tree

42
Q

(a) Assigning numbers or symbols to things, usually to characteristics of *variables. (b) The subdiscipline concerned with how to assign numbers or symbols to variables. Compare *coding.

A

measurement: People often think of measurement and statistics as being the same thing, but there is a distinction: Measurement is how we get the numbers upon which we then perform statistical operations. If we do not have good measurement, the result is *GIGO. See *level of measurement.

43
Q

Inaccuracy due to flaws in a measuring instrument, due to mistakes of those using it, or simply due to random or chance factors.

A

Measurement error: Measurement error is inevitable since perfect precision is impossible. If measurement errors are *random, they will cancel one another out in the long run; however, if they are systematic, they will result in bias or invalidity rather than just reduced reliability. See *random error, *sampling error. Compare *bias.

44
Q

Either (a) a limit or boundary or (b) a characteristic or an element. The word has many general and technical uses.

A

Parameter. In statistics, a common use of “parameter” is for a characteristic of a population, or of a distribution of scores, described by a statistic such as a mean or a standard deviation. For example, the mean (average) score on the midterm exam in Psychology 201 is a parameter. It describes the population composed of all those who took the exam. Population parameters are usually symbolized by Greek letters, such as σ (lowercase sigma), not Roman letters such as s, which are used for sample statistics.
In computers, the most common use of “parameter” is as an instruction limiting or specifying what you want the computer to do. For example, if you typed the following into your computer: “delete files 4, 7, & 9,” “delete” would be the command, and “files 4, 7, & 9” would be the parameters.
In mathematics, “parameter” means an unknown that may vary. In the equation Y = bX + e, b and e are parameters.

45
Q

A group of persons (or institutions, events, or other subjects of study) that one wants to describe or about which one wants to generalize.

A

Population. In order to generalize about a population, one often studies a *sample that is meant to be *representative of the population. Also called *target population and a “universe.”

46
Q

Selecting a group of subjects (a sample) for study from a larger group (population) so that each individual (or other *unit of analysis) is chosen entirely by chance.

A

Random sampling. When used without qualification, random sampling means *simple random sampling. Also sometimes called “equal probability sample,” since every member of the population has an equal *probability of being included in the sample. A random sample is not the same thing as a haphazard or whimsical or *accidental sample. Using random sampling reduces the likelihood of *bias. Compare *probability sample, *cluster sample, *quota sample, *stratified random sample. See *random number generator for software useful for drawing random samples.

47
Q

Sample

A

A group of *subjects or *cases selected from a larger group in the hope that studying this smaller group (the sample) will reveal important information about the larger group (the *population).

48
Q

 A number—such as a *mean or a *correlation coefficient—that describes some characteristic of (the “status” of) a *variable or of a group of *data.

A

Statistic

49
Q

 (a) A measure or value that is the same for all units of analysis. (b) A quantity that does not change value in a particular context

A

Constant. In a *regression equation, the *intercept (also called *regression constant and y-intercept) is often referred to as “the constant”; the *beta coefficients are also constants, but are less often so called. Compare *variable, *universal constant.
For example, (a) in research that studied variables explaining unemployment among women only, sex would be a constant; all subjects (units of analysis) are female. An example of (b) would be the sex of the inmates in a federal prison for men, which does not vary over time. In the regression sense, the value of a would be the constant in the regression equation Ŷ = a + bX + e.

50
Q

A variable that can be expressed by a large (often infinite) number of points or values. Loosely, a variable that can be measured on an *interval or a *ratio scale. Compare *categorical variable.

A

Continuous Variable

Deciding whether to treat data as continuous can have important consequences for choosing statistical techniques. Ordinal data are often treated as continuous when there are many ranks in the data, but as categorical when there are few. See *discrete variable for further discussion.

For example, height and grade point average (GPA) are continuous variables. Persons’ heights could be 69.38 inches, 69.39 inches, and so on; GPAs could be 3.17, 3.18, and so on. In fact, since values always have to be rounded, theoretically continuous variables are measured as discrete variables. There is an infinite number of values between 69.38 and 69.39 inches, but the limits of our ability to measure or the limits of our interest in precision lead us to round off continuous values.

GPA is a good example of the difficulty of making these distinctions. It is routinely treated as a continuous variable, but it is constructed out of a rank order scale (A, B, C, etc.). Numbers are assigned to those ranks, and the numbers are then treated as though they were an interval scale.

51
Q

(a) The presumed effect in a study, so called because it “depends” on another variable. (b) The variable whose values are predicted or explained by the *independent variable, whether or not caused by it. Also called *outcome, *criterion, and *response variable.

A

Dependent variable:

For example, in a study to see if there were a relationship between students’ drinking of alcoholic beverages and their grade point averages, the drinking behavior would probably be the presumed cause (independent variable); the grade point average would be the effect (dependent variable). But it could be the other way around—if, for instance, one wanted to study whether students’ grades drive them to drink.
Note: Some authors use the term “dependent variable” only for *experimental research; for *nonexperimental research they might use (or argue that others should use) *criterion variable or *outcome variable. Most commonly, however, dependent variable is used in both experimental and nonexperimental research.

52
Q

Commonly, another term for *categorical (or *nominal) variable. Compare *continuous variable.

A

Discrete variable:

For example, the number of people in a family is clearly a discrete variable. So is the outcome of flips of a coin; if you flip a coin 10 times, you can’t get 3.27 tails. But the distinction is not always so clear. Take personal income. It looks like a continuous variable, and it is usually treated as one in research. Millions of possible values stretch from zero to Bill Gates’s income. More strictly, however, income is discrete. Income does not come in units smaller than one cent; there is only one value between $411.01 and $411.03 ($411.02). Thus, while income is measured on a *ratio scale, it is a discrete variable. By contrast, weight is a truly or a strictly continuous variable. No matter how close two individuals’ weight, there is always an intermediate value, although an ordinary scale might not capture it. Because of limits in how accurately we can measure, all measurements are discrete in practice.

53
Q

The presumed *cause in a study. Also, a variable that can be used to predict or explain the values of another variable. A variable manipulated by an experimenter who predicts that the manipulation will have an effect on another variable (the *dependent variable).

A

Independent variable:

Some authors use the term “independent variable” for experimental research only. For these authors, the key criterion is whether the researcher can manipulate the variable; for nonexperimental research, these authors use the term *predictor variable or *explanatory variable. However, most writers say “independent variable” when they mean any causal variable, whether in experimental or nonexperimental research. Some even use it in pure *forecasting, where no causal connection is implied, as when variations in the starting date of the migrating season are used to predict the severity of winter temperatures.

54
Q

A scale or measurement that describes variables in such a way that the distance between any two adjacent units of measurement (or “intervals”) is the same, but in which there is no absolute or true zero point. Strictly speaking, scores on an interval scale can meaningfully be added and subtracted, but not multiplied and divided. Compare *ratio scale.

A

Interval scale:

For example, the Fahrenheit temperature scale is an interval scale because the difference, or interval, between (say) 72 and 73 degrees is the same as that between 20 below and 21 below. Since there is no true zero point (zero is just a line on the thermometer), it is an interval, not a ratio scale. There is a zero, of course, but it is not a true zero; when it’s zero degrees outside, there is still some warmth, more than when it’s 20 below.

To take another example, if on a 20-item vocabulary test Mr. A got 12 right and Mr. B got 6 right, it would be correct to say that A answered two times as many correctly, but it would not be correct to say that A’s vocabulary was twice as large as B’s—unless the test measured all vocabulary knowledge and getting a zero on it meant that a person had no vocabulary at all (in that case, the test would be an example of a ratio scale).

55
Q

 (a) Assigning numbers or symbols to things, usually to characteristics of *variables. (b) The subdiscipline concerned with how to assign numbers or symbols to variables. Compare *coding.

A

Measurement

People often think of measurement and statistics as being the same thing, but there is a distinction: Measurement is how we get the numbers upon which we then perform statistical operations. If we do not have good measurement, the result is *GIGO. See *level of measurement.

56
Q

A scale of measurement in which numbers stand for names but have no order or value. See *categorical variable.

A

Nominal scale:

For example, coding female = 1 and male = 2 would be a nominal scale; females do not come first, two females do not add up to a male, and so on. The numbers are merely labels.

57
Q

A way of measuring that ranks subjects (puts them in an order) on some variable. The differences between the ranks need not be equal (as they are in an *interval scale). Team standings or categories on an attitude scale (highly concerned, very concerned, concerned, etc.) are examples.

A

Ordinal scale:

A question that sometimes arises in statistical analyses is whether ordinal variables ought to be considered *continuous. A rule of thumb is if there are many ranks, it is permissible to treat the variable as continuous, but such rules of thumb leave much room for disagreement.

58
Q

A measurement or scale in which any two adjoining values are the same distance apart and in which there is a true zero point.

A

Ratio Scale

The scale gets its name from the fact that one can make ratio statements about variables measured on a ratio scale. See *interval scale, *level of measurement.

For example, height measured in inches is measured on a ratio scale. This means that the size of the difference between being 60 and 61 inches tall is the same as between being 66 and 67 inches tall. And, because there is a true zero point, 70 inches is twice as tall as 35 inches (ratio of 2 to 1). The same kind of ratio statements cannot be made, for example, about measures on an *interval or *ordinal scale. The person who is second tallest in a group is probably not twice as tall as the person who is fourth tallest.

59
Q

Another term for *level of measurement. A term used to describe measurement scales in terms of how much information they convey about the differences among values—the higher the level, the more information.

A

Scale of Measurement:

According to the popular measurement typology developed by S. S. Stevens, there are four levels of measurement. Arranged in order of mathematical strength, from the highest to the lowest, they are *ratio, *interval, *ordinal, and *nominal. It is possible to describe data gathered at a higher level with a lower level of measurement, but the reverse is not true. For example, one can express income in dollars and cents (ratio level) or with ordinal descriptions like high, medium, and low income.

It is important to be aware of the level of measurement you are using because statistical techniques appropriate at one level might produce ridiculous results at another. For example, in a study of religious affiliation, you might number your variables as follows: 1 = Catholic, 2 = Jewish, 3 = Protestant, 4 = Other, 5 = None. The religion variable is measured at the nominal level. The numbers are just convenient labels or names; one cannot treat them as if they mean something at the interval level; one should not add together a Jewish person (2) and a Protestant person (3) to get an atheist (5).
Considerable controversy exists concerning which statistics can validly be used to analyze variables measured at different levels of measurement. The debates usually revolve around questions of how serious a distortion occurs when one violates particular *assumptions presumed by certain statistical techniques. As in constitutional law, so too in statistics there are strict and loose constructionists in the interpretation of adherence to assumptions.

60
Q

(a) The uppercase sigma usually means “sum of” and is thus an indication that the numbers following it are to be (or have been) added together. (b) Lowercase sigma is often used to symbolize the *standard deviation of a *population. See *standard score. (c) Lowercase sigma squared means population *variance.

A

Sigma [Σ , σ] 

61
Q

 (a) A condition or characteristic that can take on different categories, levels, or values. (b) Loosely, anything studied by a researcher. (c) Any finding that can change, that can vary, or that can be expressed as more than one value or in various values or categories.

A

Variable

The opposite of a variable is a constant. (d) In algebra, a variable is an unknown.
Here are some of the major types of variables: *categorical, *continuous, *dependent, *independent, *moderator, *mediating, *intervening, *endogenous, *exogenous, and *random.

Examples of variables include anything that can be measured or assigned a number, such as unemployment rate, religious affiliation, experimental *treatment, grade point average, and so on. Much of social science is aimed at discovering and demonstrating how differences in some variables are related to or explain differences in others.

62
Q

The set of cases selected from the population

A

Sample

63
Q

The full group to which one wants to generalize

A

Population

64
Q

A numerical index based on sample data

A

Statistic

65
Q

A numerical characteristic of a population

A

Parameter

66
Q

The theoretical probability distribution of the values of a statistic that would result if you selected all possible samples of a particular size from a population

A

Sampling distribution

67
Q

The theoretical probability distribution of the means of all possible samples of a particular size selected from a population

A

Sampling distribution of the mean

68
Q

The standard deviation of a sampling distribution

A

Standard error
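
For the sampling distribution of the mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch on made-up data:

```python
import math
import statistics

# Hypothetical sample of eight scores.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(sample)

# Estimated standard error of the mean.
standard_error = sample_sd / math.sqrt(len(sample))

print(round(standard_error, 3))
```

Larger samples shrink the standard error, because the divisor grows with the square root of n.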

69
Q

A statistic that follows a known sampling distribution and is used in significance testing

A

Test statistic

70
Q

The branch of inferential statistics focused on obtaining estimates of the values of population parameters

A

Estimation

71
Q

Use of the value of a sample statistic as one’s estimate of the value of a population parameter

A

Point estimation

72
Q

Placement of a range of numbers around a point estimate

A

Interval estimation

73
Q

An interval estimate inferred from sample data that has a certain probability of including the true population parameter

A

Confidence interval
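
Point and interval estimation can be sketched together: the sample mean is the point estimate, and the confidence interval places a range around it. The example below assumes a normal critical value for simplicity. With a sample this small, a t critical value would normally be used and would give a wider interval.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of eight scores.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

point_estimate = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Critical value that leaves 2.5% in each tail of the standard normal.
critical_value = NormalDist().inv_cdf(0.975)

# 95% confidence interval: point estimate plus or minus the margin of error.
margin = critical_value * standard_error
lower, upper = point_estimate - margin, point_estimate + margin

print(point_estimate, round(lower, 2), round(upper, 2))
```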

74
Q

The process of testing a predicted relationship or hypothesis by making observations and then comparing the observed facts with the hypothesis or predicted relationship; the branch of inferential statistics focused on determining when the null hypothesis can or cannot be rejected in favor of the alternative hypothesis

A

Hypothesis testing

75
Q

Typically the hypothesis of no difference between means or no relationship in the population

A

Null hypothesis

76
Q

The logical opposite of the null hypothesis

A

Alternative hypothesis

77
Q

The point at which one would reject the null hypothesis and accept the alternative hypothesis

A

Alpha

78
Q

Another name for the alpha level

A

Level of significance

79
Q

The significance test of the difference between two means that uses the t probability distribution

A

Independent samples t test
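
The test statistic for an independent samples t test is the difference between the two means divided by its standard error. The sketch below assumes equal variances (the pooled-variance form) and made-up data. It stops at the t value and degrees of freedom, because the Python standard library has no t distribution for looking up the p value.

```python
import math
import statistics

# Hypothetical scores for two independent groups.
group_a = [6, 7, 8, 9, 10]
group_b = [4, 5, 6, 7, 8]

n_a, n_b = len(group_a), len(group_b)
mean_difference = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance: weighted average of the two sample variances.
pooled_variance = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / (n_a + n_b - 2)

# Standard error of the difference between the two means.
se_difference = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))

t_statistic = mean_difference / se_difference
degrees_of_freedom = n_a + n_b - 2

print(round(t_statistic, 2), degrees_of_freedom)
```

The t value is then compared with the critical value for the chosen alpha level and these degrees of freedom.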

80
Q

The area on a null hypothesis sampling distribution where the observed value of the statistic, if it fell in this area, would be considered a rare event

A

Critical region

81
Q

The likelihood of the observed value (or a more extreme value) of a statistic, if the null hypothesis were true

A

Probability value (usually called p value)

82
Q

Shorter name for probability value in significance testing

A

P value

83
Q

Conclusion that an observed finding would be very unlikely if the null hypothesis were true

A

Statistically significant

84
Q

Claim made when statistically significant finding seems large enough to be important

A

Practical significance

85
Q

A type of practical significance

A

Clinical significance

86
Q

An index of magnitude or strength of relationship

A

Effect size indicators

87
Q

The amount of variance in the dependent variable uniquely explained by a single independent variable

A

eta squared

88
Q

An alternative hypothesis that includes the “not equal to” sign

A

Nondirectional alternative hypothesis

89
Q

An alternative hypothesis that contains a less than sign or a greater than sign

A

Directional alternative hypothesis

90
Q

The probability of correctly rejecting the null hypothesis when it is false

A

Statistical power

91
Q

The five steps in the process of significance testing

A

Logic of hypothesis testing

92
Q

Rejection of a true null hypothesis

A

Type I Error

93
Q

Failure to reject a false null hypothesis

A

Type II Error

94
Q

Statistical test used to determine if a correlation coefficient is statistically significant

A

t test for correlation coefficient

95
Q

The number of values that are “free to vary;” it’s used when computing a statistic to be used in inferential statistics

A

Degrees of freedom

96
Q

Statistical test used when you have one quantitative DV and one categorical IV

A

One-way analysis of variance

97
Q

Abbreviation for analysis of variance

A

ANOVA

98
Q

Follow-up test to one-way ANOVA when the categorical IV has three or more levels used to determine which pairs of means are significantly different

A

Post hoc tests

99
Q

Statistical test used when you have one quantitative DV and a mixture of categorical and quantitative IVs

A

Analysis of covariance (ANCOVA)

100
Q

Statistical test used when you have one quantitative DV and two categorical IVs

A

Two-way analysis of variance

101
Q

Statistical test used when you have one quantitative DV and one repeated measures IV

A

One-way repeated measures analysis of variance

102
Q

Statistical test used to determine if regression coefficient is statistically significant

A

t test for regression coefficients

103
Q

The amount of variance in the dependent variable uniquely explained by a single quantitative independent variable

A

Semi-partial correlation squared

104
Q

Statistical test used to determine if a relationship observed in a contingency table is statistically significant

A

chi-squared test for contingency tables
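
The chi-squared statistic compares the observed counts in a contingency table with the counts expected if the two categorical variables were unrelated. The table below is made up, and the sketch computes only the statistic and its degrees of freedom, not the p value.

```python
# Hypothetical 2 x 2 contingency table: rows are groups, columns are outcomes.
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the row and column variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi_square, degrees_of_freedom)
```

A larger statistic means the observed counts depart further from what independence would predict.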