Ch. 12 Flashcards

1
Q

Descriptive statistics

A

Refers to a set of techniques for summarizing and displaying data.

2
Q

distribution

A

The way scores are distributed across levels of a variable.

Every variable has a distribution.

3
Q

Frequency Tables

A

A display of each value of a variable and the number of participants with that value.

The first column lists the values of the variable—the possible scores on the Rosenberg scale—and the second column lists the frequency of each score.

There are a few other points worth noting about frequency tables.

First, the levels listed in the first column usually go from the highest at the top to the lowest at the bottom, and they usually do not extend beyond the highest and lowest scores in the data.

Second, when there are many different scores across a wide range of values, it is often better to create a grouped frequency table, in which the first column lists ranges of values and the second column lists the frequency of scores in each range.

In a grouped frequency table, the ranges must all be of equal width, and there are usually between five and 15 of them.

Finally, frequency tables can also be used for categorical variables, in which case the levels are category labels.

The order of the category labels is somewhat arbitrary, but they are often listed from the most frequent at the top to the least frequent at the bottom.
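
As an illustration of the points above, here is a minimal Python sketch that builds an ungrouped and a grouped frequency table. The scores are made up, the function names are hypothetical, and the sketch assumes whole-number scores.

```python
from collections import Counter

# Hypothetical scores for 12 participants (made-up data).
scores = [22, 24, 24, 21, 22, 24, 19, 22, 23, 24, 20, 22]


def frequency_table(values):
    """Return (value, frequency) rows, highest value at the top.

    Rows run only from the highest to the lowest observed score.
    """
    counts = Counter(values)
    return [(v, counts[v]) for v in range(max(values), min(values) - 1, -1)]


def grouped_frequency_table(values, width):
    """Return ((low, high), frequency) rows using ranges of equal width."""
    low = (min(values) // width) * width
    rows = []
    while low <= max(values):
        high = low + width - 1
        rows.append(((low, high), sum(low <= v <= high for v in values)))
        low += width
    return rows[::-1]  # highest range first


for value, freq in frequency_table(scores):
    print(f"{value:>5} {freq:>5}")

for (low, high), freq in grouped_frequency_table(scores, width=2):
    print(f"{low}-{high} {freq:>5}")
```

With real data, the width would be chosen so that the table ends up with roughly five to 15 ranges.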

4
Q

Histograms

A

A graphical display of a frequency distribution.

It presents the same information as a frequency table but in a way that is even quicker and easier to grasp.

The x-axis of the histogram represents the variable and the y-axis represents frequency.

Above each level of the variable on the x-axis is a vertical bar that represents the number of individuals with that score.

When the variable is quantitative, there is usually no gap between the bars.

When the variable is categorical, however, there is usually a small gap between them.

5
Q

Distribution Shapes

A

When the distribution of a quantitative variable is displayed in a histogram, it has a shape.

There is a peak somewhere near the middle of the distribution and “tails” that taper in either direction from the peak.

Another characteristic of the shape of a distribution is whether it is symmetrical or skewed.

symmetrical: When a histogram’s left and right halves are mirror images of each other.

skewed: When a histogram’s peak is either shifted toward the upper end of its range and has a relatively long negative tail (Negatively Skewed) or the peak is shifted toward the lower end of its range and has a relatively long positive tail (Positively Skewed).

6
Q

outlier

A

An extreme score that is much higher or lower than the rest of the scores in the distribution.

7
Q

Measures of Central Tendency and Variability

A

It is also useful to be able to describe the characteristics of a distribution more precisely. Here we look at how to do this in terms of two important characteristics: its central tendency and its variability.

8
Q

Central Tendency

A

Is the middle of a distribution—the point around which the scores in the distribution tend to cluster. (Another term for central tendency is average.)

9
Q

mean

A

The average of a distribution of scores (symbolized M), computed by dividing the sum of the scores by the number of scores.

10
Q

median

A

The midpoint of a distribution of scores in the sense that half the scores in the distribution are less than it and half are greater than it.

11
Q

mode

A

The most frequently occurring score in a distribution.
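
The three measures of central tendency (mean, median, mode) can be sketched with Python's standard `statistics` module. The scores below are made up for illustration.

```python
import statistics

# Hypothetical scores for nine participants (made-up data).
scores = [3, 5, 5, 6, 7, 8, 8, 8, 13]

mean = statistics.mean(scores)      # sum of scores / number of scores = 63 / 9
median = statistics.median(scores)  # middle score once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

Here the mean and median are both 7 and the mode is 8, so the three measures need not agree.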

12
Q

variability

A

The extent to which the scores vary around their central tendency in a distribution.

13
Q

range

A

A measure of dispersion that measures the distance between the highest and lowest scores in a distribution.

14
Q

Standard deviation

A

Is, roughly, the average distance between the scores and the mean in a distribution. It is computed as the square root of the variance.

15
Q

variance

A

The mean of the squared differences between the scores and the mean of the distribution. The standard deviation is the square root of the variance.
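
The range, variance, and standard deviation can be worked through step by step. This is a minimal sketch with made-up scores. It assumes the descriptive formula that divides by the number of scores N; some software divides by N - 1 instead, which gives slightly larger values.

```python
import math
import statistics

# Hypothetical scores (made-up data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)               # highest minus lowest

m = sum(scores) / len(scores)                          # mean
squared_differences = [(x - m) ** 2 for x in scores]   # (X - M)^2 for each score
variance = sum(squared_differences) / len(scores)      # mean squared difference (divide by N)
sd = math.sqrt(variance)                               # standard deviation

print(score_range, variance, sd)

# The standard library's population versions use the same divide-by-N formula.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(sd, statistics.pstdev(scores))
```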

16
Q

percentile rank

A

For any given score, the percentage of scores in the distribution that are lower than that score.
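
The definition above translates directly into a short Python sketch. The function name and the scores are hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the distribution that are lower than `score`."""
    lower = sum(s < score for s in scores)
    return 100 * lower / len(scores)


# Hypothetical scores (made-up data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(percentile_rank(5, scores))  # four of the eight scores are below 5
print(percentile_rank(9, scores))  # seven of the eight scores are below 9
```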

17
Q

z score

A

Is the difference between that individual’s score and the mean of the distribution, divided by the standard deviation of the distribution. It represents the number of standard deviations the score is from the mean.
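
As a small illustration, the sketch below computes z scores for made-up data. It assumes the standard deviation is computed by dividing by the number of scores (`statistics.pstdev`), and the function name is hypothetical.

```python
import statistics


def z_score(score, scores):
    """Number of standard deviations that `score` lies from the mean."""
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # divide-by-N standard deviation (an assumption)
    return (score - m) / sd


# Hypothetical scores with a mean of 5 and a standard deviation of 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, scores))  # two standard deviations above the mean
print(z_score(2, scores))  # one and a half standard deviations below the mean
```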

18
Q

Differences Between Groups or Conditions

A

Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition.

It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size.

The most widely used measure of effect size for differences between group or condition means is called Cohen’s d, which is the difference between the two means divided by the standard deviation:

19
Q

Cohen’s d

A

d = (M1 − M2) / SD

In this formula, it does not really matter which mean is M1 and which is M2.

If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2.

Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive.

Indeed Cohen’s d values should always be positive so it is the absolute difference between the means that is considered in the numerator.

The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled within-groups standard deviation.

To compute the pooled within-groups standard deviation, add the sum of the squared differences for Group 1 to the sum of squared differences for Group 2, divide this by the sum of the two sample sizes, and then take the square root of that.

Informally, however, the standard deviation of either group can be used instead.

Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units.

A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation).

A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations.
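
The computation described in this card can be sketched in Python as follows. The data and function names are hypothetical. The pooled within-groups standard deviation follows the description above (divide by the sum of the two sample sizes); other sources divide by a slightly different quantity.

```python
import math


def sum_of_squares(scores):
    """Sum of squared differences between each score and the group mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


def pooled_sd(group1, group2):
    """Pooled within-groups standard deviation, as described in the card."""
    total_ss = sum_of_squares(group1) + sum_of_squares(group2)
    return math.sqrt(total_ss / (len(group1) + len(group2)))


def cohens_d(group1, group2):
    """Absolute difference between the group means in standard deviation units."""
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    return abs(m1 - m2) / pooled_sd(group1, group2)


# Hypothetical treatment and control scores (made-up data).
treatment = [6, 6, 10, 10]  # mean 8
control = [3, 3, 7, 7]      # mean 5

print(cohens_d(treatment, control))  # means differ by 3, pooled SD is 2
```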

20
Q

Cohen’s d

How should we interpret these values in terms of the strength of the relationship or the size of the difference between the means?

A

Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large.

Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research.

Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on.

A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction time measured in milliseconds, number of siblings, or diastolic blood pressure measured in millimeters of mercury.

Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.

21
Q

Correlations Between Quantitative Variables

linear relationships

A

Relationships between two variables whereby the points on a scatterplot fall close to a single straight line.

22
Q

Correlations Between Quantitative Variables

Nonlinear relationships

A

Relationships between two variables in which the points on a scatterplot do not fall close to a single straight line, but often fall along a curved line.

23
Q

restriction of range

A

When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.
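
To see how restriction of range can change a correlation, the sketch below computes Pearson's r for a made-up data set, first over the full range of x and then over only its lowest values. The data and the helper function are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around the two means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y rises with x overall but zigzags from point to point.
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson_r(x, y)

# Keep only the participants with x of 4 or less (a restricted range).
restricted = [(a, b) for a, b in zip(x, y) if a <= 4]
r_restricted = pearson_r([a for a, _ in restricted], [b for _, b in restricted])

print(round(r_full, 2), round(r_restricted, 2))
```

In this made-up example the correlation is noticeably weaker in the restricted sample than in the full one, even though both come from the same relationship.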

24
Q

Presenting Descriptive Statistics in Writing

A

Use words only for numbers less than 10 that do not represent precise statistical results, and use numerals for numbers 10 and higher.

However, statistical results are always presented in the form of numerals rather than words and are usually rounded to two decimal places.

25
Q

Presenting Descriptive Statistics in Figures

Figures

A

When you have a large number of results to report, you can often do it more clearly and efficiently with a graphical depiction of the data, such as pie charts, bar graphs, or scatterplots.

Figures are graphical depictions of data, such as pie charts, bar graphs, or scatterplots, used to report a number of results clearly and efficiently.

26
Q

When you prepare figures for an APA-style research report, there are some general guidelines that you should keep in mind.

A

First, the figure should always add important information rather than repeat information that already appears in the text or in a table (if a figure presents information more clearly or efficiently, then you should keep the figure and eliminate the text or table.)

Second, figures should be as simple as possible.

For example, the Publication Manual discourages the use of color unless it is absolutely necessary (although color can still be an effective element in posters, slide show presentations, or textbooks.)

Third, figures should be interpretable on their own.

A reader should be able to understand the basic result based only on the figure and its caption and should not have to refer to the text for an explanation.

27
Q

There are also several more technical guidelines for the presentation of figures, including the following:

A

Layout of graphs
In general, scatterplots, bar graphs, and line graphs should be slightly wider than they are tall.
The independent variable should be plotted on the x-axis and the dependent variable on the y-axis.
Values should increase from left to right on the x-axis and from bottom to top on the y-axis.
The x-axis and y-axis should begin with the value zero.

Axis Labels and Legends
Axis labels should be clear and concise and include the units of measurement if they do not appear in the caption.
Axis labels should be parallel to the axis.
Legends should appear within the figure.
Text should be in the same simple font throughout and no smaller than 8 point and no larger than 14 point.

Captions
Captions are titled with the word “Figure”, followed by the figure number in the order in which it appears in the text, and terminated with a period. This title is italicized.
After the title is a brief description of the figure terminated with a period (e.g., “Reaction times of the control versus experimental group.”)
Following the description, include any information needed to interpret the figure, such as any abbreviations, units of measurement (if not in the axis label), units of error bars, etc.

28
Q

bar graphs

A

A graphical presentation of data as bars of varying size, generally used to present and compare the mean scores for two or more groups or conditions.

29
Q

error bars

A

Bars that represent the variability in each group or condition.

30
Q

standard error

A

The standard deviation of the group divided by the square root of the sample size of the group.
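
A minimal sketch of this formula in Python follows. The numbers are made up, and drawing error bars one standard error above and below the group mean is one common choice that is assumed here.

```python
import math


def standard_error(sd, n):
    """Standard deviation of the group divided by the square root of its sample size."""
    return sd / math.sqrt(n)


# Hypothetical group: mean 5.0, standard deviation 2.0, 16 participants.
group_mean = 5.0
se = standard_error(sd=2.0, n=16)

# Assumed convention: error bars extend one standard error either side of the mean.
error_bar = (group_mean - se, group_mean + se)

print(se, error_bar)
```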

31
Q

Line Graphs

A

Graphs used when the independent variable is measured in a more continuous manner (e.g., time) or to present correlations between quantitative variables when the independent variable has, or is organized into, a relatively small number of distinct levels.

32
Q

Scatterplots

A

A graph that presents correlations between two quantitative variables, one on the x-axis and one on the y-axis.

Scores are plotted at the intersection of the values on each axis.

Each point in a scatterplot represents an individual rather than the mean for a group of individuals, and there are no lines connecting the points.

First, when the variables on the x-axis and y-axis are conceptually similar and measured on the same scale (for example, when they are measures of the same variable on two different occasions), this can be emphasized by making the axes the same length.

Second, when two or more individuals fall at exactly the same point on the graph, one way this can be indicated is by offsetting the points slightly along the x-axis.

Other ways are by displaying the number of individuals in parentheses next to the point or by making the point larger or darker in proportion to the number of individuals.

Finally, the straight line that best fits the points in the scatterplot, which is called the regression line, can also be included.

33
Q

Expressing Descriptive Statistics in Tables

A

Like graphs, tables can be used to present large amounts of information clearly and efficiently.

The same general principles apply to tables as apply to graphs.

They should add important information to the presentation of your results, be as simple as possible, and be interpretable on their own.

The most common use of tables is to present several means and standard deviations—usually for complex research designs with multiple independent and dependent variables.

An APA-style table includes horizontal lines spanning the entire table at the top and bottom, and just beneath the column headings. Furthermore, every column has a heading, including the leftmost column, and there are additional headings that span two or more columns that help to organize the information and present it more efficiently.

Finally, notice that APA-style tables are numbered consecutively starting at 1 (Table 1, Table 2, and so on) and given a brief but clear and descriptive title.

34
Q

correlation matrix

A

Shows the correlation coefficient between pairs of variables in the study.

Another common use of tables is to present correlations—usually measured by Pearson’s r—among several variables.

Only half of a correlation matrix is filled in because the other half would have identical values.

35
Q

Even when you understand the statistics involved, analyzing data can be a complicated process.

A

It is likely that for each of several participants, there are data for several different variables: demographics such as sex and age, one or more independent variables, one or more dependent variables, and perhaps a manipulation check.

Furthermore, the “raw” (unanalyzed) data might take several different forms—completed paper-and-pencil questionnaires, computer files filled with numbers or text, videos, or written notes—and these may have to be organized, coded, or combined in some way.

There might even be missing, incorrect, or just “suspicious” responses that must be dealt with.

36
Q

Prepare Your Data for Analysis

Whether your raw data are on paper or in a computer file (or both), there are a few things you should do before you begin analyzing them.

A

First, be sure they do not include any information that might identify individual participants and be sure that you have a secure location where you can store the data and a separate secure location where you can store any consent forms.

Unless the data are highly sensitive, a locked room or password-protected computer is usually good enough.

It is also a good idea to make photocopies or backup files of your data and store them in yet another secure location—at least until the project is complete.

Professional researchers usually keep a copy of their raw data and consent forms for several years in case questions about the procedure, the data, or participant consent arise after the project is completed.

37
Q

Prepare Your Data for Analysis

raw data

A

Unanalyzed data that has several different forms—completed paper-and-pencil questionnaires, computer files filled with numbers or text, videos, or written notes which may have to be organized, coded, or combined in some way.

Next, you should check your raw data to make sure that they are complete and appear to have been accurately recorded.

At this point, you might find that there are illegible or missing responses, or obvious misunderstandings (e.g., a response of “12” on a 1-to-10 rating scale).

You will have to decide whether such problems are severe enough to make a participant’s data unusable.

If information about the main independent or dependent variable is missing, or if several responses are missing or suspicious, you may have to exclude that participant’s data from the analyses.

If you do decide to exclude any data, do not throw them away or delete them because you or another researcher might want to see them later.

Instead, set them aside and keep notes about why you decided to exclude them because you will need to report this information.

38
Q

Prepare Your Data for Analysis

data file

A

Now you are ready to enter your data in a spreadsheet program or, if it is already in a computer file, to format it for analysis.

You can use a general spreadsheet program like Microsoft Excel or a statistical analysis program like SPSS to create your data file.

Data that has been entered into a spreadsheet and formatted in order to be analyzed.

The most common format is for each row to represent a participant and for each column to represent a variable (with the variable name at the top of each column).

Categorical variables can usually be entered as category labels (e.g., “M” and “F” for male and female) or as numbers (e.g., “0” for negative mood and “1” for positive mood). Although category labels are often clearer, some analyses might require numbers.

If you have multiple-response measures, such as the self-esteem measure, you could combine the items by hand and then enter the total score in your spreadsheet.

However, it is much better to enter each response as a separate variable in the spreadsheet and use the software to combine them.

Not only is this approach more accurate, but it allows you to detect and correct errors, to assess internal consistency, and to analyze individual responses if you decide to do so later.
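
The layout described above (one row per participant, one column per variable, each item response kept as its own variable) can be sketched in plain Python. The column names and values are hypothetical.

```python
# Each dictionary is one row (participant); each key is one column (variable).
rows = [
    {"participant": 1, "mood": 0, "item1": 3, "item2": 4, "item3": 4},
    {"participant": 2, "mood": 1, "item1": 2, "item2": 2, "item3": 3},
    {"participant": 3, "mood": 1, "item1": 4, "item2": 4, "item3": 4},
]

item_columns = ["item1", "item2", "item3"]

# Let the software combine the items instead of adding them by hand.
for row in rows:
    row["total"] = sum(row[column] for column in item_columns)

for row in rows:
    print(row["participant"], row["total"])
```

Because the individual responses stay in the data, they remain available later for checking errors or assessing internal consistency.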

39
Q

Preliminary Analyses

Before turning to your primary research questions, there are often several preliminary analyses to conduct.

A

For multiple-response measures, you should assess the internal consistency of the measure.

Statistical programs like SPSS will allow you to compute Cronbach’s α or Cohen’s κ.

If this is beyond your comfort level, you can still compute and evaluate a split-half correlation.
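
For readers curious about what the software is doing, here is a rough sketch of Cronbach's α using its standard formula: α = k/(k − 1) × (1 − sum of the item variances / variance of the total scores), where k is the number of items. The responses and the function name are hypothetical, and this is an illustration rather than a replacement for a statistics package.

```python
import statistics

# Hypothetical responses: one row per participant, one column per item.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(rows[0])                  # number of items
    items = list(zip(*rows))          # regroup the responses by item
    sum_item_variances = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```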

Next, you should analyze each important variable separately. (This step is not necessary for manipulated independent variables, of course, because you as the researcher determined what the distribution would be.)

Make histograms for each one, note their shapes, and compute the common measures of central tendency and variability.

Be sure you understand what these statistics mean in terms of the variables you are interested in.

Now is the time to identify outliers, examine them more closely, and decide what to do about them.

You might discover that what at first appears to be an outlier is the result of a response being entered incorrectly in the data file, in which case you only need to correct the data file and move on.

Alternatively, you might suspect that an outlier represents some other kind of error, misunderstanding, or lack of effort by a participant.

If you choose to exclude such data, do not literally throw away or delete them. Just set them aside because you or another researcher might want to see them later.

Keep in mind that outliers do not necessarily represent an error, misunderstanding, or lack of effort.

They might represent truly extreme responses or participants.

One strategy here would be to use the median and other statistics that are not strongly affected by the outliers.

Another would be to analyze the data both including and excluding any outliers.

If the results are essentially the same, which they often are, then it makes sense to leave the outliers.

If the results differ depending on whether the outliers are included or excluded, then both analyses can be reported and the differences between them discussed.
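
The two strategies above (relying on the median, and analyzing the data both with and without outliers) can be sketched as follows. The scores are made up, and the 2.5 standard deviation cutoff is an arbitrary assumption chosen for illustration, not a rule from the text.

```python
import statistics

# Hypothetical scores; 30 is far from the rest (made-up data).
scores = [2, 3, 3, 4, 5, 5, 5, 6, 30]

CUTOFF = 2.5  # arbitrary assumption: flag scores more than 2.5 SDs from the mean

m = statistics.mean(scores)
sd = statistics.pstdev(scores)

outliers = [x for x in scores if abs(x - m) / sd > CUTOFF]
kept = [x for x in scores if x not in outliers]  # outliers are set aside, not deleted

# Analyze the data both including and excluding the outliers.
print("including:", statistics.mean(scores), statistics.median(scores))
print("excluding:", statistics.mean(kept), statistics.median(kept))
```

In this made-up example the mean drops from 7 to 4.125 when the outlier is set aside, while the median only moves from 5 to 4.5, which shows why the median is less affected by extreme scores.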

40
Q

planned analysis

A

Used to test a relationship that you expected in your hypothesis.

For example, if you expected a difference between group or condition means, you can compute the relevant group or condition means and standard deviations, make a bar graph to display the results, and compute Cohen’s d.

If you expected a correlation between quantitative variables, you can make a line graph or scatterplot (be sure to check for nonlinearity and restriction of range) and compute Pearson’s r.

41
Q

exploratory analysis

A

An analysis used to examine the possibility that there might be relationships in the data that you did not hypothesize.

Once you have conducted your planned analyses, you can move on to examine the possibility there might be relationships in the data that you did not hypothesize.

These analyses will help you explore your data for other interesting results that might provide the basis for future research (and material for the discussion section of your paper).

It is important to differentiate planned from exploratory analyses in writing your results and discussion sections of your report.

This is because complex sets of data are likely to include “patterns” that occurred entirely by chance, and every time you do another unplanned analysis on these data, you increase the likelihood that these chance patterns will appear to be real patterns, which is referred to as a “Type I” error.

Thus results discovered while doing exploratory analyses should be viewed skeptically and replicated in at least one new study before being presented.

But, if you do find interesting relationships you did not expect in the data, explain that they might be worthy of additional research.

42
Q

Understand Your Descriptive Statistics

A

Although inferential statistics are important, beginning researchers sometimes forget that their descriptive statistics really tell “what happened” in their study.

For example, imagine that a treatment group of 50 participants has a mean score of 34.32 (SD = 10.45), a control group of 50 participants has a mean score of 21.45 (SD = 9.22), and Cohen’s d is an extremely strong 1.31.

Although conducting and reporting inferential statistics (like a t test) would certainly be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the treatment worked.
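
The Cohen's d in the example above can be checked from the summary statistics alone. This sketch assumes each reported standard deviation was computed by dividing by the group size, so that each group's sum of squared differences equals n × SD²; the variable names are hypothetical.

```python
import math

n = 50                        # participants per group
m_treatment, sd_treatment = 34.32, 10.45
m_control, sd_control = 21.45, 9.22

# Pooled within-groups SD: combined sums of squares divided by the combined sample size.
pooled_sd = math.sqrt((n * sd_treatment**2 + n * sd_control**2) / (n + n))

d = (m_treatment - m_control) / pooled_sd
print(round(d, 2))
```

Under that assumption the result rounds to the 1.31 reported in the example.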

Or imagine that a scatterplot shows an indistinct “cloud” of points and Pearson’s r is a trivial −.02.

Again, although conducting and reporting inferential statistics would be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the variables are essentially unrelated.

The point is that you should always be sure that you thoroughly understand your results at a descriptive level first, and then move on to the inferential statistics.