Descriptive Stats / Intro Flashcards

1
Q

A summary table that shows the number of occurrences (frequency) of different values or ranges of values in a dataset.

A

A frequency distribution table
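
As a rough illustration, a frequency table can be built by counting how often each value occurs. The data below are made up for the example (hypothetical sibling counts), and `Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by ten respondents
values = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

# Counter maps each distinct value to the number of times it occurs
freq = Counter(values)

print("value  frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
```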

2
Q

A graphical representation of the distribution of a dataset, displaying the frequencies of data values within specific intervals or bins.

A

A histogram

3
Q

A table that shows the frequencies or proportions up to a certain point in a dataset, providing a running total of the frequencies.

A

A cumulative distribution table
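
A minimal sketch of the "running total" idea, using made-up data: the cumulative frequency at each value is the sum of the frequencies of that value and all smaller values. The variable names are illustrative only.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of siblings reported by ten respondents
values = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

freq = Counter(values)
ordered = sorted(freq)                      # distinct values, smallest first
counts = [freq[v] for v in ordered]         # plain frequencies
cumulative = list(accumulate(counts))       # running total of frequencies
proportions = [c / len(values) for c in cumulative]

for v, c, p in zip(ordered, cumulative, proportions):
    print(f"up to {v}: {c} observations ({p:.0%})")
```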

4
Q

A distribution with kurtosis equal to that of the normal distribution, indicating moderate peakedness and tail behaviour.

A

Mesokurtic (normal kurtosis)

5
Q

A distribution with a higher peak and heavier tails than the normal distribution, indicating more extreme values.

A

Leptokurtic (positive kurtosis)

6
Q

A distribution with a lower peak and lighter tails than the normal distribution, indicating fewer extreme values.

A

Platykurtic (negative kurtosis)
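
The sign of kurtosis can be checked numerically. The sketch below assumes the simple population formula for excess kurtosis (fourth central moment divided by the squared variance, minus 3, so that the normal distribution scores 0). Statistics packages often apply a sample-size correction, so their numbers may differ slightly. The two small datasets are invented for illustration.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                        # evenly spread, no extreme values
heavy_tailed = [-4, 0, 0, 0, 0, 0, 0, 0, 4]   # mostly central, two extremes

print(excess_kurtosis(flat))          # negative: platykurtic
print(excess_kurtosis(heavy_tailed))  # positive: leptokurtic
```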

7
Q

What is the difference between inferential and descriptive statistics?

A

Descriptive statistics summarise and describe data, while inferential statistics make predictions or inferences about a population based on a sample.

8
Q

What are descriptive statistics?

A

Descriptive statistics are methods used to summarise and describe the main aspects of a dataset, such as central tendency, variability, and distribution.

9
Q

What are the main aspects of a dataset that descriptive statistics summarise?

A

Central tendency
Variability
Distribution

10
Q

If the numbering scheme is arbitrary then it’s probably best to use the —– as a measure of central tendency.

A

Mode

11
Q

If your data are ordinal scale you’re more likely to want to use the ——- as a measure of central tendency.

A

median

(The median only makes use of the order information in your data (i.e., which numbers are bigger) but doesn’t depend on the precise numbers involved. That’s exactly the situation that applies when your data are ordinal scale. The mean, on the other hand, makes use of the precise numeric values assigned to the observations, so it’s not really appropriate for ordinal data.)

12
Q

The —— has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data). But it’s very sensitive to extreme, outlying values.

A

mean

13
Q

——- of the data. That is, how “spread out” are the data? How “far” away from the mean or median do the observed values tend to be?

A

variability

14
Q

The 50th percentile is the same as the ——– value.

A

median

15
Q

The —— ——– (—-) is like the range, but instead of taking the difference between the biggest and smallest values, it takes the difference between the 75th percentile and the 25th percentile.

A

The interquartile range (IQR)
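
A small sketch comparing the range with the IQR on made-up data containing one outlier. It assumes Python's `statistics.quantiles` with its default ("exclusive") rule. Other software may use a slightly different percentile rule and report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical data with one extreme value (50)
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 50]

# n=4 returns the three cut points: 25th, 50th and 75th percentiles
q1, q2, q3 = quantiles(data, n=4)

data_range = max(data) - min(data)  # driven entirely by the extremes
iqr = q3 - q1                       # spread of the "middle half" of the data

print(f"range = {data_range}, IQR = {iqr}")
```

With these numbers the single outlier inflates the range but leaves the IQR untouched.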

16
Q

Variability: what is the mean absolute deviation?

A

The absolute deviations from the mean, added up and averaged
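
The calculation can be sketched in three steps: find the mean, take each observation's absolute distance from it, then average those distances. The data and the function name are invented for the example.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    mean = sum(data) / len(data)
    abs_deviations = [abs(x - mean) for x in data]  # drop the sign of each deviation
    return sum(abs_deviations) / len(data)

scores = [1, 2, 3, 4, 10]  # hypothetical data, mean = 4
print(mean_absolute_deviation(scores))  # deviations 3, 2, 1, 0, 6 average to 2.4
```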

17
Q

What is the RMSD?

A

“root mean squared deviation”

18
Q

Properties of distributions.

A
  • What the central tendency is (mean, median or mode).
  • How symmetrical the data is either side of the mean (skew).
  • How variable the data is (e.g. data range, standard deviation and kurtosis).
  • If it’s a “normal distribution”.

19
Q

It’s often extremely useful to try to condense the data into a few simple “summary” statistics. In most situations, the first thing that you’ll want to calculate is a measure of ——- ———

A

central tendency

20
Q

If your data are nominal scale, why shouldn’t you be using either the mean or the median?

A

Both the mean and the median rely on the idea that the numbers assigned to values are meaningful. If the numbering scheme is arbitrary then it’s probably best to use the mode instead.

21
Q

If your data are ordinal scale you’re more likely to want to use the median than the mean because

A

The median only makes use of the order information in your data but doesn’t depend on the precise numbers involved.

(The mean makes use of the precise numeric values, so it’s not really appropriate for ordinal data.)

22
Q

For interval and ratio scale data you can use:

A

Either one (median or mean) is generally acceptable.

(Which one you pick depends a bit on what you’re trying to achieve. The mean has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data). But it’s very sensitive to extreme, outlying values.)

23
Q

There are systematic differences between the mean and the median when

A

the histogram is asymmetric (skewed)

(Average income example: the median is more appropriate, as the mean will give an exaggerated view.)
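
The income example can be sketched with invented figures (in thousands). One very large income drags the mean upwards, while the median stays close to what a typical person earns.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; the last value is an extreme earner
incomes = [20, 22, 25, 27, 30, 32, 35, 400]

print(f"mean   = {mean(incomes)}")    # pulled up by the single large value
print(f"median = {median(incomes)}")  # midpoint of 27 and 30
```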

24
Q

The mean can be remembered as the

A

centre of gravity or the balancing point of the data

25
Q

out of “mean absolute deviation” (from the mean)
and
“median absolute deviation” (from the median).

which seems to be the better of the two?

A

The measure based on the median seems to be used in statistics and does seem to be the better of the two.

(But to be honest I don’t think I’ve seen it used much in psychology.)

26
Q

X bar (X̄) equals

A

The mean

27
Q

Deviation from the mean

A

Score − X̄

(first step in calculating the absolute deviation from the mean)

28
Q

Mean absolute deviation from the mean

A

The absolute values of the deviations from the mean, averaged

29
Q

The variance of a data set X is sometimes written as Var(X), but it’s more commonly denoted

A

s² (s-squared)

30
Q

How does jamovi calculate variance differently?

A

It divides the sum of squared deviations by N − 1 rather than N.
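
The two versions of the variance can be compared side by side. This sketch uses Python's `statistics` module, where `pvariance` divides by N and `variance` divides by N − 1. The data are made up, and the square roots at the end show how each standard deviation follows from its variance.

```python
from math import sqrt
from statistics import pvariance, variance

scores = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical data, mean = 4

n = len(scores)
mean = sum(scores) / n
sum_sq = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

var_n = sum_sq / n               # divide by N
var_n_minus_1 = sum_sq / (n - 1) # divide by N - 1

# The library functions should agree with the hand calculation
print(var_n, pvariance(scores))
print(var_n_minus_1, variance(scores))

# Standard deviation = square root of the variance
print(sqrt(var_n), sqrt(var_n_minus_1))
```

Dividing by N − 1 always gives a slightly larger value, and the gap shrinks as N grows.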

31
Q

The RMSD (root mean squared deviation) is

A

the square root of the variance

32
Q

What are the two categories of descriptive stats?

A

Measures of central tendency and measures of dispersion

33
Q

The square root of the variance is known as the standard deviation, also called the

A

“root mean squared deviation”, or RMSD.

34
Q

Range, variance and standard deviation are all measures of

A

dispersion

35
Q

the standard deviation is derived from the ——-

A

variance

36
Q

In general, you should expect –% of the data to fall within 1 standard deviation of the mean, –% of the data to fall within 2 standard deviations of the mean, and –% of the data to fall within 3 standard deviations of the mean.

A

68, 95, 99.7

(This is approximately correct but not exact: it is calculated on the assumption that the histogram is symmetric and “bell shaped”.)
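
These percentages can be reproduced from the normal distribution itself. The sketch below assumes an exactly normal ("bell shaped") distribution and uses `statistics.NormalDist` from the Python standard library. Real data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```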

37
Q

Gives you the full spread of the data. It’s very vulnerable to outliers and as a consequence it isn’t often used unless you have good reasons to care about the extremes in the data.

A

Range

38
Q

Tells you where the “middle half” of the data sits. It’s pretty robust and complements the median nicely. This is used a lot.

A

Interquartile range

39
Q

Tells you how far “on average” the observations are from the mean. It’s very interpretable but has a few minor issues (not discussed here) that make it less attractive to statisticians than the standard deviation. Used sometimes, but not often.

A

Mean absolute deviation

39
Q

Tells you the average squared deviation from the mean. It’s mathematically elegant and is probably the “right” way to describe variation around the mean, but it’s completely uninterpretable because it doesn’t use the same units as the data. Almost never used except as a mathematical tool, but it’s buried “under the hood” of a very large number of statistical tools.

A

Variance

40
Q

the — and the —— ——– are easily the two most common measures used to report the variability of the data.

A

IQR and the standard deviation

But there are situations in which the others are used. I’ve described all of them in this book because there’s a fair chance you’ll run into most of these somewhere.

40
Q

This is the square root of the variance. It’s fairly elegant mathematically and it’s expressed in the same units as the data so it can be interpreted pretty well. In situations where the mean is the measure of central tendency, this is the default. This is by far the most popular measure of variation.

A

Standard deviation

41
Q

A standard score is also referred to as a

A

z-score

42
Q

The standard score is defined as

A

the number of standard deviations above the mean that my score lies

43
Q

standard score (z-score) =

A

(raw score − mean) divided by the standard deviation; e.g. for a raw score of 35, (35 − mean) ÷ standard deviation
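
A worked sketch of the formula for a raw score of 35. The mean (17) and standard deviation (5) are assumed values chosen for illustration; they are not given on the card.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies above the mean."""
    return (raw - mean) / sd

# Assumed values for illustration only
raw_score = 35
sample_mean = 17
sample_sd = 5

z = z_score(raw_score, sample_mean, sample_sd)
print(z)  # (35 - 17) / 5
```

A negative result would mean the score lies below the mean.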

44
Q

—– ——- allow you to interpret a raw score in relation to a larger population (and thereby allow you to make sense of variables that lie on arbitrary scales).

A

standard scores (z-scores)

45
Q

Standard scores can also be used to

A

compare scores with one another in situations (such as polls) where the raw scores are scaled differently from each other.
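
A short sketch of that comparison, using two invented questionnaires with different scales. The raw scores cannot be compared directly, but their z-scores can. All numbers and names here are hypothetical.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies above the mean."""
    return (raw - mean) / sd

# Hypothetical questionnaire A: scored 0-100, mean 60, SD 10
z_a = z_score(70, mean=60, sd=10)

# Hypothetical questionnaire B: scored 0-40, mean 24, SD 3
z_b = z_score(30, mean=24, sd=3)

# 70 looks larger than 30, but relative to each population the
# score on questionnaire B is the more unusual one
print(z_a, z_b)
```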