Chapter 3 - Central Tendency Flashcards

1
Q

What is central tendency?

A

It is the center of the distribution.

It is the single value that is most typical/representative of the collected data.

Statistics that measure the average values of data sets.

Usual measures include the mean, median, and mode.

It is a score representative of an entire distribution

2
Q

What are the 3 different definitions of center in regard to central tendency?

A

Mean = The average score (our focus for this class)

Median = The middle score

Mode = The score that occurs most often

3
Q

What is mean?

A

The average score

The mean evenly distributes the total amount of a variable across the sample members.

The “center of gravity” of the data set.

You find the mean by adding up all of your scores and dividing them by the number of scores you have. For example: If you have the following 6 scores 6, 12, 18, 24, 30, and 36 you’d add all of them together and divide that number by 6 (126/6 = 21). The mean is 21.
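
As a quick check of the arithmetic above, here is a minimal R sketch. The variable names (`scores`, `total`, `n`) are only illustrative and do not come from the course materials.

```r
# The six example scores from this card
scores <- c(6, 12, 18, 24, 30, 36)

total <- sum(scores)       # add up all of the scores
n <- length(scores)        # count how many scores there are
mean_by_hand <- total / n  # divide the total by the number of scores

# R's built-in mean() does the same calculation in one step
mean_builtin <- mean(scores)

print(mean_by_hand)
print(mean_builtin)
```

Both approaches should agree, which is a handy way to confirm a hand calculation.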

We focus on the mean in this class; applications
of the median and mode are less common in
psychological research.

You can only calculate one mean

The mean is the measure most sensitive to the tail of a distribution (graph): extreme values pull it toward the “tail”.

You cannot calculate it for nominal data or for purely ordinal data

The mean is appropriate for numeric (interval/ratio) data ONLY, typically displayed with a histogram or density plot, and for
quasi-interval rating scales (e.g., Likert)

Basically, you can ONLY use it for numerical data (ratio/interval)

4
Q

What is median?

A

The middle score, with half of the observations smaller and half of the observations larger.

The score that divides the distribution in half
(also called the 50th percentile)

To find the middle score (median) you’ll need to:
1. Order the scores from high to low
2. Find the middle score

If two numbers share the middle you’ll need to add those two numbers together and divide them by two
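
The steps above can be sketched in R as follows. The two small data sets are made up for illustration, and the manual indexing is only there to mirror the steps; R's built-in `median()` handles both cases automatically.

```r
# Odd number of scores: a single middle score exists
odd_scores <- c(7, 3, 9, 1, 5)
sorted_odd <- sort(odd_scores, decreasing = TRUE)  # step 1: order high to low
middle_odd <- sorted_odd[3]                        # step 2: the 3rd of 5 scores

# Even number of scores: two scores share the middle
even_scores <- c(8, 2, 6, 4)
sorted_even <- sort(even_scores, decreasing = TRUE)
middle_even <- (sorted_even[2] + sorted_even[3]) / 2  # add the two and divide by two

# The built-in function gives the same answers
print(median(odd_scores))
print(median(even_scores))
```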

You can only calculate one median

The median is most sensitive to the “body” of a distribution (graph): it stays close to where most of the scores fall and is pulled toward the tail less than the mean is.

You cannot calculate it for nominal data

The median is especially useful for ordinal variables
with binned responses (shown in a bar plot or frequency distribution) and for skewed numeric (interval/ratio) data (shown in a histogram or density plot)

Basically you can use it for numerical data and ordinal data ONLY

  • If using a frequency table it’s best to find the median by using the cumulative percent data!
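
A minimal R sketch of the cumulative-percent approach, assuming a made-up frequency table for a 1 to 5 rating scale (the counts are hypothetical). The median is taken as the first category whose cumulative percent reaches 50%; if a category lands on exactly 50%, the median falls between that category and the next one, which this simple sketch does not handle.

```r
# Hypothetical frequency table for a 1-5 rating scale
ratings <- 1:5
counts <- c(2, 3, 6, 5, 4)

# Cumulative percent: running total of the counts as a share of all responses
cum_percent <- 100 * cumsum(counts) / sum(counts)

# The median is the first rating whose cumulative percent reaches 50
median_rating <- ratings[which(cum_percent >= 50)[1]]

print(data.frame(ratings, counts, cum_percent))
print(median_rating)
```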
5
Q

What is mode?

A

The mode is the score(s) that occurs most often (e.g., the peak of the distribution)

The mode is always located at the peak (the highest point) of the distribution

There can be more than one mode: a distribution can have two, three, or more modes!

To find the mode you’ll need to:
1. Order the scores from high to low
2. Find the scores that occur the most often/most frequently
3. Those scores are your mode
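
The steps above can be sketched in R with a frequency count. The scores are made up for illustration. Note that R has no built-in function for the statistical mode (its `mode()` function reports how an object is stored), so the sketch counts frequencies with `table()` instead.

```r
# Made-up scores with two values tied for most frequent
scores <- c(5, 3, 7, 3, 5, 2)

sorted_scores <- sort(scores, decreasing = TRUE)  # step 1: order high to low
counts <- table(sorted_scores)                    # step 2: count how often each score occurs

# step 3: keep every score whose count equals the largest count
modes <- as.numeric(names(counts)[counts == max(counts)])

print(counts)
print(modes)
```

Because 3 and 5 each occur twice, this data set has two modes.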

You can use it with nominal data - in fact, the mode (the peak/highest point) is the only measure of central tendency you can use for nominal data

You can also use it with ordinal data, but the median is especially useful there

Mode is especially useful for categorical variables
with qualitatively different responses.

Basically, you can use mode for EVERYTHING!!!

6
Q

In a symmetric distribution, what do the mean, median, and mode equal?

A

Extreme scores in both tails cancel out, and the
mean, median, and mode are roughly equal.

Mean = median = mode

7
Q

In a negatively skewed asymmetric distribution, how do the mean, median, and mode compare?

A

Mode > median > mean: going from mode to median to mean, each measure sits closer to the left (negative) tail.

  • See slide 10
8
Q

In a positively skewed asymmetric distribution, how do the mean, median, and mode compare?

A

Mean > median > mode: going from mode to median to mean, each measure sits closer to the right (positive) tail.

  • See slide 10
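
A small made-up data set can illustrate how the three measures line up under skew. In this R sketch, `find_mode` is a hypothetical helper written for the example (R has no built-in statistical mode), and the scores are invented so that one extreme value forms a right tail; mirroring them produces a left tail.

```r
# Hypothetical helper: returns the single most frequent value
find_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Positively skewed: most scores are low, one extreme score on the right
pos_skew <- c(1, 1, 1, 2, 2, 3, 4, 10)
pos_summary <- c(mode = find_mode(pos_skew),
                 median = median(pos_skew),
                 mean = mean(pos_skew))

# Negatively skewed: the mirror image, with the extreme score on the left
neg_skew <- 11 - pos_skew
neg_summary <- c(mode = find_mode(neg_skew),
                 median = median(neg_skew),
                 mean = mean(neg_skew))

print(pos_summary)  # the mean is pulled furthest toward the right tail
print(neg_summary)  # the mean is pulled furthest toward the left tail
```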
9
Q

True or false: Mean, median, and mode are always the same?

A

False

10
Q

True or false: The mean is the most widely
used in psychological research, where we want to compare averages of two or more groups?

A

True

11
Q

Behavioral scientists are concerned with
learning something about:

A: The entire population
B: The sample population
C: Both A & B

A

A: The entire population

12
Q

What is the population mean?

A

The mean or average of all values in a given population.

It is calculated as the sum of all values in the population, written ∑x (the Greek capital letter sigma, ∑, represents the sum and x represents the scores), divided by the number of values in the population, denoted by Npop (see slide 26 for an image of the formula).

Population mean is denoted by “Mu” = μ

μ = ∑x ÷ Npop
* So basically the sum of all your data divided by the number of data points you have

This data is very hard to obtain because we almost never have access to an entire population

13
Q

What is the sample mean?

A

It is the mean of a sample of data drawn from a larger population.

It is calculated as the sum of all values in the sample, written ∑x (the Greek capital letter sigma, ∑, represents the sum and x represents the scores), divided by the number of values in the sample, denoted by N (see slide 26 for an image of the formula).

The sample mean is denoted by “x bar” = x̄

x̄ = ∑x ÷ N
* So basically the sum of all your data divided by the number of data points you have

14
Q

What is a population?

A

A population is the collection of all possible
participants we are interested in studying.

The population size is denoted by Npop (e.g., the number of individuals in the U.S. population)

Example: All 18-year-olds in the United States

  • We usually can’t know the population mean because
    populations are usually too large to study
15
Q

What is a parameter?

A

Parameter = Population

The parameter is the value of a statistic such as
the mean computed from the full population.

It describes the whole population

The population mean is denoted by Mu = μ (mean of all data)

It’s everything related to the population (mean, median, mode, kurtosis, etc.).

  • We usually can’t know the population mean because
    populations are usually too large to study
16
Q

What is a sample?

A

A sample is a subset of individuals (hopefully
representative) from the population.

We draw a sample and compute the sample
mean

The sample size is denoted by “N”

17
Q

What is an estimate?

A

Estimate = sample

The value of a statistic (e.g., the mean) computed
from a sample

The sample mean is an estimate — an
approximation or best guess from the data (estimates the population mean)

The sample mean is denoted by “x bar” = x̄ (mean of a sample)

Sample mean, median, mode, etc. are all estimates of the corresponding population values, BUT sample estimates usually differ from population parameters

μ ≠ x̄ (in general)

  • Kurtosis and skewness are also examples of estimates
18
Q

What is a sampling error?

A

The difference between the population/true mean
and our sample mean is a random sampling error

Error means difference

Although they share the same formula, the
population and sample means usually differ

The sample mean could be higher or lower than
the population/true mean of all possible
participants, meaning you can get either a positive or a negative sampling error

You calculate the sampling error by:
* Subtracting the population mean (μ) from the sample mean (x̄)
* x̄ - μ

Examples:
* Positive sampling error: x̄ = 3.5 and μ = 3, so 3.5 - 3 = +0.5
* Negative sampling error: x̄ = 2.25 and μ = 3, so 2.25 - 3 = -0.75
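
The two examples above can be reproduced with a short R sketch. The function name `sampling_error` is hypothetical and chosen only for this illustration.

```r
# Hypothetical helper: sampling error = sample mean minus population mean
sampling_error <- function(sample_mean, population_mean) {
  sample_mean - population_mean
}

# The sample mean is above the population mean: positive error
positive_error <- sampling_error(3.5, 3)

# The sample mean is below the population mean: negative error
negative_error <- sampling_error(2.25, 3)

print(positive_error)
print(negative_error)
```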

19
Q

True or false: Monte Carlo computer simulation generates artificial scores from a hypothetical population.

A

True

20
Q

True or False: A simple R script can mimic the process of drawing a sample of scores from a population.

A

True - You can use it to simulate the data for a study

It will look something like this:
sampledata <- rnorm(150, mean = 3, sd = 1)
* rnorm = R function that samples scores from a normal distribution
* 150 = Number of artificial scores / number of participants in this sample
* mean = 3, sd = 1 = Population parameters (the 3 was hypothetical; we made it up).
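
A slightly fuller sketch of the same idea: draw the sample, compute its mean, and compare it with the population mean of 3. The call to `set.seed()` and the particular seed value are assumptions added here so the random draw is repeatable; they are not part of the line above.

```r
set.seed(123)  # assumption: any seed works; it only makes the draw repeatable

# Draw 150 artificial scores from a normal population with mean 3 and sd 1
sampledata <- rnorm(150, mean = 3, sd = 1)

# The sample mean is our estimate of the population mean
sample_mean <- mean(sampledata)

# Sampling error: sample mean minus the (made-up) population mean
sampling_error <- sample_mean - 3

print(sample_mean)
print(sampling_error)
```

Changing the seed gives a different sample, so the sample mean and the sampling error will change from run to run.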

21
Q

What is the simple definition of standard deviation?

A

The average distance of the scores from the center of a distribution.

For instance, for the scores 1, 2, 3, 4, 5 the center is 3 and the distances from it are 2, 1, 0, 1, 2, so the typical distance is a little over 1 (the sample standard deviation works out to about 1.58).
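
To see how the "average distance" idea relates to what R actually computes, here is a minimal sketch using the scores 1 through 5. The variable names are illustrative. Note that `sd()` squares the distances, divides by N - 1, and takes the square root, so it is not exactly the plain average distance.

```r
x <- c(1, 2, 3, 4, 5)
center <- mean(x)  # the center of these scores is 3

# Plain average of the absolute distances from the center
avg_distance <- mean(abs(x - center))

# Sample standard deviation, computed step by step and with sd()
sd_by_hand <- sqrt(sum((x - center)^2) / (length(x) - 1))
sd_builtin <- sd(x)

print(avg_distance)
print(sd_by_hand)
print(sd_builtin)
```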

22
Q

True or false: Estimates vary across samples?

A

True - The population is fixed, but you might draw several different samples from it in a single study and compare each one against your population.

Those samples will be made up of different scores, so the estimates (e.g., the sample mean x̄) won’t be the same for each sample
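
A minimal R sketch of this idea, assuming the same hypothetical population used in the `rnorm` card (mean 3, sd 1) and an arbitrary seed: draw five samples of 150 scores each and compare their means.

```r
set.seed(42)  # assumption: arbitrary seed so the example is repeatable

# Draw 5 separate samples of 150 scores and record each sample's mean
sample_means <- replicate(5, mean(rnorm(150, mean = 3, sd = 1)))

print(round(sample_means, 3))  # five different estimates of the same population mean
print(range(sample_means))     # lowest and highest estimate
```

The population mean stays fixed at 3, yet each sample gives a slightly different estimate.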

23
Q

What do you think will happen to the size of the sampling error if we use a large sample (10^5 = 100,000) compared to a sample size of 150?

A

Increasing the sample size decreases the sampling
error (the difference between the population/true mean
and our sample mean).

This is because as we increase the sample size, the sample represents more and more of the population and the random ups and downs in individual scores average out.

So the sample mean will be closer to the population mean.
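
This can be tried out with a short simulation. The sketch below assumes the hypothetical population from the `rnorm` card (mean 3, sd 1) and an arbitrary seed. A single run is only suggestive: the larger sample tends to have the smaller error, but any one small sample can land close to the population mean by chance.

```r
set.seed(2024)  # assumption: arbitrary seed so the example is repeatable
mu <- 3         # hypothetical population mean

small_sample <- rnorm(150, mean = mu, sd = 1)  # N = 150
large_sample <- rnorm(1e5, mean = mu, sd = 1)  # N = 100,000

# Sampling error = sample mean minus population mean
error_small <- mean(small_sample) - mu
error_large <- mean(large_sample) - mu

print(error_small)
print(error_large)
```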

24
Q

Which of the following statements is NOT correct?

A. A sample mean can be higher than the population mean

B. A sample mean can be lower than the population mean

C. A sample mean can be equal to the population mean

D. Sampling error is calculated as the sample mean minus the population mean

E. Sampling error is calculated as the population mean minus the sample mean

A

E

25
Q

What does “N” represent in an equation for finding the mean?

A

It’s used to refer to the number of observations that we’re averaging in the sample

Data example:
* 1, 6, 9, 23, 67, 78
* N = 6 (because there are 6 data points)

26
Q

What does “X” represent in an equation for finding the mean?

A

We need to attach a label to the observations themselves.

It’s traditional to use X for this, and to use subscripts to indicate which observation we’re actually talking about.

We’ll use X1 to refer to the first observation, X2 for the second observation, and so on all the way up to XN for the last one.

Data example:
* 1, 6, 9, 23, 67, 78
* X1, X2, X3, X4, X5, X6 (because there are 6 data points)
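
In R the subscripts correspond to positions in a vector. A minimal sketch using the data example above (the variable names are illustrative):

```r
# The six observations from the data example
X <- c(1, 6, 9, 23, 67, 78)

N <- length(X)    # number of observations
first_obs <- X[1] # X1, the first observation
third_obs <- X[3] # X3, the third observation
last_obs <- X[N]  # XN, the last observation

print(N)
print(c(first_obs, third_obs, last_obs))
```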

27
Q

What does “X̄” represent in an equation for finding the mean?

A

Notation for the sample mean - it’s the total of the observations divided by the number of observations (N)

You would just write X̄ = ……..

28
Q

What does “Σ” represent in an equation for finding the mean?

A

It’s a summation symbol and it’s used to shorten a written equation with many observations (X1, X2, X3, etc.)

In the formula for the mean, it’s really just a compact way of writing the same thing we said in words: add all the values up (that’s the Σ part) and then divide by the total number of items.
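
A minimal R sketch of what the summation symbol stands for, reusing the six example scores from the earlier data example (1, 6, 9, 23, 67, 78). The variable names are illustrative.

```r
X <- c(1, 6, 9, 23, 67, 78)

# Writing the sum out the long way, one observation at a time
long_hand <- X[1] + X[2] + X[3] + X[4] + X[5] + X[6]

# sum() plays the role of the summation symbol
sigma_x <- sum(X)

# The sample mean: the sum divided by the number of observations
x_bar <- sigma_x / length(X)

print(long_hand)
print(sigma_x)
print(x_bar)
```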

29
Q

What is the role of descriptive statistics?

A

The role of descriptive statistics is to concisely summarise what we DO know.

30
Q

What is the role of inferential statistics?

A

The role of inferential statistics is to “learn what we do NOT know from what we do”.

The questions that lie at the heart of inferential statistics are traditionally divided into two “big ideas”:
* Estimation
* Hypothesis testing

31
Q

What is a simple random sample?

A

A procedure in which every member of the population has the same chance of being selected

32
Q

What does sampling without replacement mean?

A

Drawing an observation and not putting it back, so it cannot be sampled more than once.

Most psychology experiments tend to be sampling without replacement because the same person is not allowed to participate in the experiment twice.

However, most statistical theory is based on the assumption that the data arise from a simple random sample with replacement.

EXAMPLE:
* A bag of chips with letters on them.
* If you do not put the chips back in the bag after pulling them out this means that you can’t observe the same thing twice, and in such cases, the observations are said to have been sampled without replacement.
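
The chip example can be mimicked in R with `sample()` and `replace = FALSE`. The bag contents and the seed below are made up for illustration.

```r
set.seed(1)  # assumption: arbitrary seed so the example is repeatable

# A hypothetical bag of five lettered chips
bag <- c("A", "B", "C", "D", "E")

# Draw 3 chips without putting any of them back
draws <- sample(bag, size = 3, replace = FALSE)

print(draws)
print(anyDuplicated(draws))  # 0 means no chip was drawn twice
```

With `replace = FALSE`, R refuses to draw more chips than the bag holds, which matches the idea that each chip can only be observed once.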

33
Q

What does sampling with replacement mean?

A

Drawing an observation, putting it back, drawing another, putting it back, and so on. The same observation can be drawn more than once.

Data sets generated in this way are still simple random samples, but because we put each observation back immediately after drawing it, it’s referred to as a sample with replacement

The difference between this (with replacement) and without replacement is that it is possible to observe the same population member multiple times
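
A minimal R sketch of sampling with replacement, using a made-up bag of four lettered chips and an arbitrary seed. Drawing 10 times from a bag of 4 is only possible because each chip goes back in after it is drawn, so repeats are guaranteed here.

```r
set.seed(1)  # assumption: arbitrary seed so the example is repeatable

# A hypothetical bag of four lettered chips
bag <- c("A", "B", "C", "D")

# Draw 10 times, putting the chip back after every draw
draws <- sample(bag, size = 10, replace = TRUE)

# TRUE when at least one chip was observed more than once
repeated <- anyDuplicated(draws) > 0

print(draws)
print(table(draws))  # how many times each chip came up
print(repeated)
```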

Most psychology experiments tend to be sampling without replacement because the same person is not allowed to participate in the experiment twice.

However, most statistical theory is based on the assumption that the data arise from a simple random sample with replacement.