IDE 620 Week 2 Flashcards

1
Q

A way of depicting frequency distributions for categorical (nominal) variables, such as religious affiliation, ethnic group, or state of residence

A

Bar Graph

Note that the bars do not touch in a bar graph, as they do in a histogram.

2
Q

A distribution having two modes or peaks.

A

Bimodal Distribution

Strictly speaking, for a distribution to be called bimodal, the peaks should be the same height. However, it is quite common to call any two-humped distribution bimodal, even when the high points are not exactly equal.

3
Q

Any of several methods, pioneered by John Tukey, of discovering unanticipated patterns and relationships, often by presenting quantitative data visually. The stem-and-leaf display and the box-and-whisker diagram are well-known examples.

A

Exploratory Data Analysis (EDA)

4
Q

A tally of the number of times each score occurs in a group of scores. More formally, a way of presenting data that shows the number of cases having each of the attributes of a particular variable.

A

Frequency Distribution
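
As a rough illustration of the tally described on this card, the sketch below counts how often each score occurs. The function name `frequency_distribution` and the scores are made up for the example; they are not part of the course material.

```python
from collections import Counter


def frequency_distribution(scores):
    """Tally how many times each score occurs, ordered by score."""
    counts = Counter(scores)
    return dict(sorted(counts.items()))


# Made-up quiz scores, for illustration only.
scores = [3, 5, 3, 4, 5, 3]
for score, count in frequency_distribution(scores).items():
    print(score, count)
```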

5
Q

A graph with frequency shown by the height of contiguous bars, used for variables measured at the interval and ratio levels.

A

Histogram

Because the data in a histogram are interval or ratio, the bars should touch; in a bar graph for nominal or ordinal data, the bars do not touch.

6
Q

A graphic depiction of data relying on one or more lines.

A

Line graph

The lines can be linear or curvilinear. For example, the business cycle over the past 50 years might be plotted on a line graph.

7
Q

The most common (most frequent) score in a set of scores.

A

Mode

8
Q

A distribution of scores or measures that, when plotted on a graph, produce a nonsymmetrical curve.

A

Skewed Distribution 

A positively (or upward or right) skewed distribution is one in which the infrequent scores are on the high or right side of the x-axis, such as the scores on a difficult test. A negatively (or downward or left) skewed distribution is one in which the rare values are on the low or left side of the x-axis, such as the scores on an easy test. One way to sort out which is which is to remember that a skewer is a pointy thing; when the pointy end of the distribution is on the right, it is right skewed, and conversely for left skewed.

9
Q

The points falling between half a measurement unit below and half a unit above the number.

A

Real Limits (of a Number)  
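
A minimal sketch of the definition on this card, assuming the measurement unit is known. The helper name `real_limits` and its `unit` parameter are hypothetical, chosen only for this example.

```python
def real_limits(value, unit=1.0):
    """Return (lower, upper) real limits: half a measurement unit either side."""
    half = unit / 2
    return (value - half, value + half)


# A score of 10 measured in whole units spans 9.5 to 10.5.
print(real_limits(10))        # (9.5, 10.5)
# A score of 7.5 measured to the nearest half unit spans 7.25 to 7.75.
print(real_limits(7.5, 0.5))  # (7.25, 7.75)
```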

10
Q

The degree to which measures or scores are bunched on one side of a central tendency and trail out (become pointy, like a skewer) on the other.

A

Skewness

The more skewness in a distribution, the more variability in the scores.

Computer programs often compute indexes of skewness. Positive values indicate a positive or right skew. Negative values indicate a negative or left skew.
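
The card does not say which index of skewness a program computes. As one common choice (an assumption here), the sketch below uses the moment-based coefficient: the third central moment divided by the second central moment raised to the power 1.5. The scores are made up.

```python
def skewness(scores):
    """Moment-based skewness index: m3 / m2**1.5 (one of several indexes in use)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return m3 / m2 ** 1.5


right_tail = [1, 1, 1, 2, 10]        # rare high scores: positive (right) skew
left_tail = [-x for x in right_tail]  # mirror image: negative (left) skew
print(skewness(right_tail) > 0, skewness(left_tail) < 0)
```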

11
Q

A way of recording the values of a variable, created by John Tukey, that presents raw numbers in a visual, histogram-like display. It is a histogram in which the bars are built out of numbers.

A

Stem-and-Leaf Display
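
To make the idea of "bars built out of numbers" concrete, here is a small sketch that assumes non-negative whole-number scores, with the tens digit as the stem and the ones digit as the leaf. The function names and the data are illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    return dict(rows)


def render(rows):
    """Format the display as text, one stem per line."""
    lines = []
    for stem, leaves in sorted(rows.items()):
        leaf_text = " ".join(str(leaf) for leaf in leaves)
        lines.append(f"{stem} | {leaf_text}")
    return "\n".join(lines)


# Made-up scores, for illustration only.
print(render(stem_and_leaf([23, 12, 30, 15, 21, 23])))
```

Longer rows of leaves correspond to taller bars in a histogram of the same data.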

12
Q

A distribution with only one mode

A

Unimodal

13
Q

Any of several statistical summaries that, in a single number, represent the typical or average number in a group of numbers. Examples include the mean, mode, and median.

A

Measures of central tendency

A batting average is a well-known measure of central tendency in the United States. A grade point average might be a more important example for many college students.
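
As a quick sketch, the three measures named on this card can be computed with Python's standard `statistics` module. The wrapper name `central_tendency` and the scores are made up for the example.

```python
import statistics


def central_tendency(scores):
    """Return the mean, median, and mode(s) of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),  # a list, since ties are possible
    }


# Made-up scores: the single high score pulls the mean above the median.
print(central_tendency([2, 3, 3, 4, 8]))
```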

14
Q

A variable that can take on many possible values

A

Continuous variable

15
Q

A variable that takes on only a few possible values

A

Discrete variable

16
Q

The variable you are measuring

A

Dependent variable

17
Q

The variable that you manipulate

A

Independent variable

18
Q

Person who created the stem-and-leaf display and EDA (exploratory data analysis)

A

John Tukey

19
Q

Leftmost digits of a number

A

Leading digits (most significant digits)

20
Q

Vertical axis of a stem-and-leaf display

A

Stem

21
Q

Digits to the right of the leading digits

A

Trailing digits (less significant digits)

22
Q

Horizontal axis of display containing the trailing digits

A

Leaves

23
Q

Average; most popular measure of location or central tendency; has the desirable mathematical property of minimizing the sum of squared deviations (no other value gives a smaller sum).

A

Mean

24
Q

Computed by taking the nth root of the product of n scores (e.g., the square root of the product of 2 scores, the cube root of the product of 3, etc.).

A

Geometric mean

25
Q

Calculated by dividing n by the sum of the reciprocals of the n numbers.

A

Harmonic mean (for positive numbers, never larger than the geometric or arithmetic mean)
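
The geometric and harmonic mean definitions can be checked with a short sketch. It assumes all scores are positive; the function names and the two scores are chosen only for illustration.

```python
import math


def geometric_mean(xs):
    """nth root of the product of n positive scores."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """n divided by the sum of the reciprocals of n positive scores."""
    return len(xs) / sum(1 / x for x in xs)


scores = [2, 8]  # made-up example
arithmetic = sum(scores) / len(scores)
# For these scores the harmonic mean is about 3.2, the geometric mean about 4,
# and the arithmetic mean 5.
print(harmonic_mean(scores), geometric_mean(scores), arithmetic)
```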

26
Q

The middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves; the 50th percentile.

A

Median

27
Q

The most common (most frequent) score in a set of scores.

A

Mode

28
Q

A mean computed after removing the extreme observations.

A

Trimmed Mean

Thus, a trimmed mean is a measure of central tendency that allows the researcher to deal separately with a distribution’s outliers.
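
The card does not say how many observations count as "extreme". The sketch below assumes a simple rule, dropping the `k` lowest and `k` highest scores before averaging; both the rule and the name `trimmed_mean` are assumptions made for this example.

```python
def trimmed_mean(scores, k=1):
    """Mean after dropping the k smallest and k largest scores."""
    if k < 0 or 2 * k >= len(scores):
        raise ValueError("k must leave at least one score to average")
    ordered = sorted(scores)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


scores = [1, 4, 5, 6, 100]  # made-up data with one extreme value at each end
print(sum(scores) / len(scores))  # ordinary mean, pulled up by 100
print(trimmed_mean(scores, k=1))  # mean of 4, 5, 6
```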

29
Q

Anything that produces systematic error in a research finding; causes of bias can range from poor data collection to flawed measurements to inappropriate statistical analysis. While the distortions due to systematic error continue to grow in the long run, random errors tend to balance out in the long run.

A

Bias

30
Q

The effects of any factor that the researcher did not expect to influence the dependent variable.

A

Bias

31
Q

A type of graph in which boxes and lines show a distribution’s shape, central tendency, and variability. The “boxplot,” as it is often called, gives an informative picture of the values of a single variable and is helpful for indicating whether a distribution is skewed and has outliers.

A

Box-and-whisker diagram

32
Q

The upper and lower boundaries of each box in a box-and-whisker diagram

A

Hinges

33
Q

The number of values “free to vary” when computing an inferential statistic. It’s the number of pieces of information that can vary independently of one another or, alternatively stated, the number of unconstrained observations used in calculating an estimate

A

Degrees of freedom

34
Q

A statistic showing the amount of variation or spread in the scores for, or values of, a variable.

A

Measure of dispersion

When the dispersion is large, the scores or values are widely scattered; when it is small, they are tightly clustered. Two commonly used measures of dispersion are the variance and the standard deviation. A measure of dispersion always implies the presence of a measure of central tendency, such as a mean. For example, the standard deviation measures deviation from the mean.

35
Q

The mean value of a variable in repeated samplings or trials

A

Expected value

36
Q

The mean of the sampling distribution of a statistic.

A

Expected value
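
One way to see what "the mean of the sampling distribution" means is to list every possible sample from a tiny population. The sketch below assumes a made-up population of three values and samples of size 2 drawn with replacement, then averages all the sample means.

```python
from itertools import product


def sampling_distribution_of_mean(population, n):
    """Mean of every possible ordered sample of size n, drawn with replacement."""
    return [sum(sample) / n for sample in product(population, repeat=n)]


population = [1, 2, 3]  # made-up population
sample_means = sampling_distribution_of_mean(population, 2)
expected_value = sum(sample_means) / len(sample_means)
# The expected value of the sample mean matches the population mean.
print(expected_value, sum(population) / len(population))
```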

37
Q

The middle half of a distribution. A measure of dispersion calculated by taking the difference between the first and third quartiles (that is, the 25th and 75th percentiles). Also called “midspread.”

A

Interquartile Range (IQR) 

38
Q

A subject or other unit of analysis that has an extreme value on a variable or a combination of variables or has a large residual value.

A

Outlier

Outliers are important because they can distort the interpretation of data or make misleading a statistic that summarizes values (such as a mean). Outliers may also indicate that a sampling error has occurred by including a case from a population different than the target population.

39
Q

Divisions of the total rank-ordered cases or observations in a study into four groups of equal size. Technically, the three points that divide a series of ordered scores into four groups.

A

Quartiles
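
As a sketch, the three quartile points and the interquartile range built from them can be found with `statistics.quantiles` from the Python standard library. Textbooks and software use several slightly different rules for locating quartiles, so other tools may give different values for the same data. The wrapper name and scores are made up.

```python
import statistics


def quartiles_and_iqr(scores):
    """Return the three quartile cut points and the interquartile range."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    return (q1, q2, q3), q3 - q1


# Made-up scores, for illustration only.
(q1, q2, q3), iqr = quartiles_and_iqr([1, 2, 3, 4, 5, 6, 7])
print(q1, q2, q3, iqr)
```

The second quartile is the median, and the IQR is the distance between the first and third quartiles.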

40
Q

A measure of variability, of the spread or the dispersion of values in a series of values.

A

Range

To get the range of a set of scores, you subtract the lowest value or score from the highest.
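
The subtraction described above, as a one-line sketch (the function name and scores are made up):

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


print(score_range([2, 4, 4, 4, 5, 5, 7, 9]))  # 9 - 2 = 7
```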

41
Q

A statistic that shows the spread, variability, or dispersion of scores in a distribution of scores. It is a measure of the average amount the scores in a distribution deviate from the mean.

A

Standard deviation

The standard deviation is the square root of the variance. As a variable, it is symbolized as SD, Sd, s, or lowercase sigma (σ).

42
Q

A measure of the spread of scores in a distribution of scores, that is, a measure of dispersion.

A

Variance

The larger the variance, the farther the individual cases are from the mean. The smaller the variance, the closer the individual scores are to the mean.

Specifically, the variance is the mean of the squared deviations from the mean score: the sum of the squared deviations divided by the number of scores. That is, it’s the average squared distance from the mean. (See sum of squares for an example.) Taking the square root of the variance gives you the standard deviation (i.e., it converts the variance into regular, nonsquared units). A variance cannot be less than zero, nor can the standard deviation.
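
A minimal worked sketch of the calculation, dividing by the number of scores as this card describes. Formulas for estimating a population's variance from a sample divide by n - 1 instead, which this sketch does not do. The function names and scores are made up.

```python
import math


def variance(scores):
    """Sum of squared deviations from the mean, divided by the number of scores."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def standard_deviation(scores):
    """Square root of the variance, back in the original (nonsquared) units."""
    return math.sqrt(variance(scores))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores with mean 5
# Squared deviations: 9, 1, 1, 1, 0, 0, 4, 16 -> sum 32 -> 32 / 8 = 4
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```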