Descriptive Stats / Intro Flashcards
A frequency distribution table is a summary table that shows the number of occurrences (frequency) of different values or ranges of values in a dataset.
A frequency distribution table
A graphical representation of the distribution of a dataset, displaying the frequencies of data values within specific intervals or bins.
A histogram
A table that shows the frequencies or proportions up to a certain point in a dataset, providing a running total of the frequencies.
A cumulative distribution table
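The frequency and cumulative tables above can be sketched with the standard library; the example values here are invented for illustration.

```python
# Build a frequency table and a cumulative frequency table for a small dataset.
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4]

freq = Counter(data)          # value -> frequency (how often each value occurs)
cumulative = {}
running = 0
for value in sorted(freq):
    running += freq[value]
    cumulative[value] = running   # running total of frequencies up to this value

print(freq[3])        # 3 occurs three times
print(cumulative[3])  # six observations are <= 3
```

`Counter` gives the frequency distribution directly; the running sum over sorted values is exactly the "running total" a cumulative distribution table shows.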
A distribution with kurtosis equal to that of the normal distribution, indicating moderate peakedness and tail behaviour.
Mesokurtic kurtosis (normal Kurtosis)
A distribution with a higher peak and heavier tails than the normal distribution, indicating more extreme values.
Leptokurtic kurtosis (positive kurtosis)
A distribution with a lower peak and lighter tails than the normal distribution, indicating fewer extreme values.
Platykurtic kurtosis (negative kurtosis)
What is the difference between inferential and descriptive statistics?
Descriptive statistics summarise and describe data, while inferential statistics make predictions or inferences about a population based on a sample.
What are descriptive statistics?
Descriptive statistics are methods used to summarise and describe the main aspects of a dataset, such as central tendency, variability, and distribution.
What are the main aspects of a dataset that descriptive statistics summarise?
Central tendency
Variability
Distribution
If the numbering scheme is arbitrary then it’s probably best to use the —– as a measure of central tendency.
Mode
If your data are ordinal scale you’re more likely to want to use the ——- as a measure of central tendency.
median
(The median only makes use of the order information in your data (i.e., which numbers are bigger) but doesn’t depend on the precise numbers involved. That’s exactly the situation that applies when your data are ordinal scale. The mean, on the other hand, makes use of the precise numeric values assigned to the observations, so it’s not really appropriate for ordinal data.)
The —— has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data). But it’s very sensitive to extreme, outlying values.
mean
——- of the data. That is, how “spread out” are the data? How “far” away from the mean or median do the observed values tend to be?
variability
the 50th percentile is the same as the ——– value
median
The —— ——– (—-) is like the range, but instead of the difference between the biggest and smallest value the difference between the 25th percentile and the 75th percentile is taken.
The interquartile range (IQR)
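The IQR can be computed with the standard library's `statistics.quantiles`; note that different software (jamovi included) may use slightly different quantile methods, so results can differ a little at the edges. The data here are made up.

```python
# Interquartile range: 75th percentile minus 25th percentile.
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1
print(q1, q3, iqr)   # 2.5 7.5 5.0
```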
Variability
Mean absolute deviation
absolute deviations from the mean, added and averaged
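The mean absolute deviation follows that recipe directly: deviations from the mean, sign dropped, then averaged. The example data are invented.

```python
# Mean absolute deviation: average distance of the scores from the mean.
import statistics

data = [2, 4, 4, 4, 6]
mean = statistics.mean(data)                 # 4.0
deviations = [x - mean for x in data]        # [-2.0, 0.0, 0.0, 0.0, 2.0]
mad = sum(abs(d) for d in deviations) / len(data)
print(mad)   # 0.8
```

Without the absolute value, the deviations from the mean always sum to zero, which is why the sign has to be dropped first.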
what is the RMSD
“root mean squared deviation”
Properties of distributions.
- What the central tendency is (mean, median or mode).
- How symmetrical the data are on either side of the mean (skew).
- How variable the data are (e.g. range, standard deviation and kurtosis).
- Whether it's a "normal distribution".
It’s often extremely useful to try to condense the data into a few simple “summary” statistics. In most situations, the first thing that you’ll want to calculate is a measure of ——- ———
central tendency
If your data are nominal scale, why shouldn't you be using either the mean or the median?
Both the mean and the median rely on the idea that the numbers assigned to values are meaningful. If the numbering scheme is arbitrary then it’s probably best to use the Mode instead.
If your data are ordinal scale you're more likely to want to use the median than the mean because
The median only makes use of the order information in your data but doesn’t depend on the precise numbers involved.
(The mean makes use of the precise numeric values, so it's not really appropriate for ordinal data.)
For interval and ratio scale data you can use :
either one (median or mean) is generally acceptable.
(Which one you pick depends a bit on what you’re trying to achieve. The mean has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data). But it’s very sensitive to extreme, outlying values.)
there are systematic differences between the mean and the median when
the histogram is asymmetric (Skew and kurtosis)
(Average income example: the median is more appropriate, as the mean will give an exaggerated view.)
The mean can be remembered as the
centre of gravity, or balancing point, of the data
out of “mean absolute deviation” (from the mean)
and
“median absolute deviation” (from the median).
which seems to be the better of the two?
The measure based on the median is the one used in statistics and does seem to be the better of the two.
(But to be honest I don’t think I’ve seen it used much in psychology.)
X̄ (X-bar) equals
The mean
deviation from the mean
Score - X̄ (the score minus the mean)
(first step in computing the absolute deviation from the mean)
Absolute deviation from the mean -
the absolute deviations from the mean, averaged
The variance of a data set X is sometimes written as Var(X), but it's more commonly denoted
s² (s-squared)
How does jamovi calculate variance differently?
It divides by N-1 instead of N.
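Python's standard library exposes both conventions, which makes the N versus N-1 difference easy to see. The example data are invented.

```python
# Variance two ways: dividing by N (population) vs. N-1 (sample, as in jamovi).
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]       # mean is 5; sum of squared deviations is 32
pop_var = statistics.pvariance(data)  # 32 / N     = 32 / 8
samp_var = statistics.variance(data)  # 32 / (N-1) = 32 / 7
print(pop_var)              # 4
print(round(samp_var, 3))   # 4.571
```

Dividing by N-1 gives a slightly larger number, which corrects the tendency of the N-divisor formula to underestimate the population variance from a sample.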
RMSD (root mean squared deviation) is
the square root of the variance
What are the two categories of descriptive stats?
Measures of central tendency and measures of dispersion
The square root of the variance, known as the standard deviation, is also called the
“root mean squared deviation”, or RMSD.
Range, variance and standard deviation are all measures of
Measures of Dispersion
the standard deviation is derived from the ——-
variance
In general, you should expect –% of the data to fall within 1 standard deviation of the mean, –% to fall within 2 standard deviations of the mean, and –% to fall within 3 standard deviations of the mean
68, 95, 99.7
(This isn't exact: it's calculated on the assumption that the histogram is symmetric and "bell shaped", so it's only approximately correct.)
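Where the 68/95/99.7 figures come from: for a normal distribution, the proportion of data within k standard deviations of the mean is erf(k / √2), which the standard library can evaluate directly.

```python
# Derive the 68-95-99.7 rule from the normal distribution.
import math

for k in (1, 2, 3):
    proportion = math.erf(k / math.sqrt(2))   # P(|Z| <= k) for standard normal Z
    print(k, round(100 * proportion, 1))      # 68.3, 95.4, 99.7
```

So the commonly quoted 68/95/99.7 are themselves rounded values, which is why real data only match them approximately even when roughly bell shaped.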
Gives you the full spread of the data. It’s very vulnerable to outliers and as a consequence it isn’t often used unless you have good reasons to care about the extremes in the data.
Range
Tells you where the “middle half” of the data sits. It’s pretty robust and complements the median nicely. This is used a lot.
Interquartile range
Tells you how far “on average” the observations are from the mean. It’s very interpretable but has a few minor issues (not discussed here) that make it less attractive to statisticians than the standard deviation. Used sometimes, but not often.
Mean absolute deviation
Tells you the average squared deviation from the mean. It’s mathematically elegant and is probably the “right” way to describe variation around the mean, but it’s completely uninterpretable because it doesn’t use the same units as the data. Almost never used except as a mathematical tool, but it’s buried “under the hood” of a very large number of statistical tools.
Variance
the — and the —— ——– are easily the two most common measures used to report the variability of the data.
IQR and the standard deviation
But there are situations in which the others are used. I’ve described all of them in this book because there’s a fair chance you’ll run into most of these somewhere.
This is the square root of the variance. It’s fairly elegant mathematically and it’s expressed in the same units as the data so it can be interpreted pretty well. In situations where the mean is the measure of central tendency, this is the default. This is by far the most popular measure of variation.
Standard deviation
A standard score is also referred to as a
Z-score
The standard score is defined as
the number of standard deviations above the mean that my score lies
standard score (z-score) =
(raw score - mean) divided by the standard deviation, e.g. (35 - mean) / sd for a raw score of 35
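A quick sketch of the z-score calculation for a raw score of 35; the dataset is invented, and the population standard deviation is used here (a sample SD would give a slightly different z).

```python
# Standard score (z-score): how many standard deviations above the mean a score lies.
import statistics

data = [20, 25, 30, 35, 40]
mean = statistics.mean(data)    # 30
sd = statistics.pstdev(data)    # population SD, about 7.07
z = (35 - mean) / sd
print(round(z, 3))   # 0.707: the score 35 is about 0.7 SDs above the mean
```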
—– ——- allow you to interpret a raw score in relation to a larger population (and thereby make sense of variables that lie on arbitrary scales).
standard scores (z-scores)
Standard scores can also be used to
compare scores to one another when the raw scores are scaled differently (e.g. in polls that use different scales).