Unit 2 Flashcards

Univariate descriptive statistics

1
Q

What is univariate descriptive statistics?

A

Univariate descriptive statistics provides a summarized description and analysis of a single variable. It aims to answer questions like:
●What are the scores in the variable?
●Are there significant differences among its values?
●What is the proportion of subjects exceeding a certain value?

2
Q

What is absolute frequency (fi)?

A

Absolute frequency (fi) is the number of times a specific value of a variable occurs in a dataset.

3
Q

What is relative frequency (f’i)?

A

Relative frequency (f’i) is the proportion of a particular value’s frequency compared to the total sample size. It is calculated as: f’i = (frequency of a value) / (total sample size)

4
Q

What is percentage (pi)?

A

Percentage (pi) expresses the proportion of a value within the sample as a percentage out of 100. It is calculated by multiplying the relative frequency (f’i) by 100: pi = f’i * 100
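The three quantities defined so far (fi, f’i, and pi) can be computed together as a small frequency table. The following is a minimal sketch; the `scores` sample and the variable names are made up for illustration and are not part of the deck.

```python
from collections import Counter

# Hypothetical sample: number of siblings reported by 10 subjects.
scores = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

n = len(scores)             # total sample size
absolute = Counter(scores)  # fi: how many times each value occurs

# One row per distinct value: (value, fi, f'i, pi)
table = []
for value in sorted(absolute):
    fi = absolute[value]
    rel = fi / n            # f'i = fi / n
    pct = rel * 100         # pi = f'i * 100
    table.append((value, fi, rel, pct))

for value, fi, rel, pct in table:
    print(f"value={value}  fi={fi}  f'i={rel:.2f}  pi={pct:.0f}%")
```

The relative frequencies of all values always add up to 1, and the percentages to 100, which is a quick check on a hand-built table.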

5
Q

What is cumulative relative frequency (F’i)?

A

Cumulative relative frequency (F’i) represents the cumulative proportion of values up to a certain point in the distribution. It is calculated as: F’i = (cumulative frequency of a value) / (total sample size)

6
Q

What is cumulative percentage (Pi)?

A

Cumulative percentage (Pi) expresses the cumulative relative frequency (F’i) as a percentage. It is calculated as: Pi = F’i * 100.
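As a rough illustration of the cumulative columns, the sketch below accumulates the absolute frequencies and then applies the two formulas. The sample data and the names `Fi`, `F_rel`, and `P_cum` are assumptions chosen for this example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of 10 scores.
scores = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
n = len(scores)

counts = Counter(scores)
values = sorted(counts)            # distinct values in ascending order
fi = [counts[v] for v in values]   # absolute frequencies
Fi = list(accumulate(fi))          # cumulative absolute frequencies
F_rel = [f / n for f in Fi]        # F'i = cumulative frequency / n
P_cum = [f * 100 for f in F_rel]   # Pi = F'i * 100

for v, cum_rel, cum_pct in zip(values, F_rel, P_cum):
    print(f"value<={v}  F'i={cum_rel:.2f}  Pi={cum_pct:.0f}%")
```

The last cumulative relative frequency is always 1 (100%). The proportion of subjects exceeding a given value is 1 minus that value's F’i.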

7
Q

What is a cyclogram/pie chart? When is it used?

A

A cyclogram, also known as a pie chart, is a circular graph divided into slices, with each slice’s size representing the frequency of a corresponding value. It can be used to display absolute frequency, relative frequency, or percentages. Cyclograms are suitable for nominal, ordinal, and discrete quantitative variables with relatively few distinct values.

8
Q

What is a bar chart? When is it used?

A

A bar chart uses bars of varying heights to represent the frequencies of different values. The height of each bar corresponds to the frequency of the value it represents. Bar charts can display absolute, relative, or percentage frequencies. They are suitable for nominal, ordinal, and discrete quantitative variables with a limited number of values.

9
Q

What is a polygon of frequencies? When is it used?

A

A polygon of frequencies, also known as a frequency polygon, is a line graph that connects points representing the frequencies of different values. Points on the graph represent the frequency of each value and are connected by lines. It is particularly useful for comparing groups or illustrating data profiles, typically for quantitative variables, especially discrete ones.

10
Q

What is a histogram? When is it used?

A

A histogram resembles a bar chart but uses adjacent bars, with no gaps between them, to represent the frequencies of continuous data grouped into intervals. The touching bars highlight the continuous nature of the variable. Histograms are suitable for continuous quantitative variables; the data are grouped into class intervals when there are many distinct values.

11
Q

What is a stem-and-leaf diagram? When is it used?

A

A stem-and-leaf diagram is a way to visualize data by separating each data point into a ‘stem’ and a ‘leaf,’ revealing the distribution’s shape. It is helpful in identifying potential outliers or unusual patterns in the variable’s distribution.
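A text-only stem-and-leaf display can be sketched in a few lines. This example assumes non-negative integer scores and uses the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` and the sample data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return display lines for non-negative integers: stem = tens, leaf = units."""
    groups = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # e.g. 47 -> stem 4, leaf 7
        groups[stem].append(leaf)
    lines = []
    # Keep empty stems so that gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

scores = [12, 15, 21, 23, 23, 28, 31, 34, 35, 39, 42, 47, 78]
lines = stem_and_leaf(scores)
print("\n".join(lines))
```

In this made-up sample the empty stems 5 and 6 make the score 78 stand out as a possible outlier, which is the kind of pattern the diagram is meant to reveal.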

12
Q

What is a box plot? When is it used?

A

A box plot, also known as a box-and-whisker plot, provides a visual summary of a dataset’s distribution based on quartiles, effectively showing the data’s form, including its symmetry and potential outliers.
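The numbers behind a box plot can be sketched with Python's standard `statistics` module. Two assumptions are made here that the card does not state: the whisker limits use the common 1.5 x IQR rule, and the quartiles come from `statistics.quantiles`, whose default "exclusive" method is based on (n + 1) positions with linear interpolation. The sample data are invented.

```python
import statistics

# Hypothetical sample of 11 scores with one extreme value.
scores = sorted([2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 30])

# Quartiles: edges of the box (Q1, Q3) and the line inside it (Q2, the median).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 x IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(f"min={scores[0]} Q1={q1} median={q2} Q3={q3} max={scores[-1]}")
print(f"IQR={iqr} outliers={outliers}")
```

A median that sits off-center in the box, or whiskers of clearly unequal length, suggests an asymmetrical distribution.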

13
Q

What are the four properties of a frequency distribution?

A

●Central tendency: The point around which the data tends to cluster.
●Variability: The degree to which data points are spread out from the center.
●Skewness: The extent to which data is distributed symmetrically or asymmetrically around the central tendency.
●Kurtosis: The peakedness or flatness of the distribution, indicating data concentration around the center.

14
Q

What are the types of kurtosis?

A

●Mesokurtic: Kurtosis similar to that of a normal distribution (kurtosis statistic close to 0).
●Leptokurtic: Positive kurtosis, characterized by a tall and narrow peak, indicating data concentration at the center.
●Platykurtic: Negative kurtosis, characterized by a flatter distribution in which the data are less concentrated around the center.

15
Q

How can you determine skewness from a SPSS output?

A

●|Statistic| < (Std. Error x 2) = Symmetrical distribution
●|Statistic| > (Std. Error x 2) = Asymmetrical distribution
●Negative sign (-) on the statistic = Negative skewness
●Positive sign (+) on the statistic = Positive skewness
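The rule of thumb above can be written as a small helper. This is only a sketch: the function name `describe_skewness` is hypothetical, the comparison is read as applying to the absolute value of the statistic, and the example numbers are invented rather than taken from real SPSS output.

```python
def describe_skewness(statistic, std_error):
    """Apply the 2 x standard-error rule of thumb to a skewness statistic."""
    # Assumption: the rule compares the size of the statistic, ignoring its sign.
    if abs(statistic) < 2 * std_error:
        return "symmetrical"
    # The sign is only interpreted once the distribution counts as asymmetrical.
    return "negative skew" if statistic < 0 else "positive skew"

print(describe_skewness(0.30, 0.25))    # 0.30 < 0.50
print(describe_skewness(-0.80, 0.25))   # 0.80 > 0.50, negative sign
print(describe_skewness(0.90, 0.25))    # 0.90 > 0.50, positive sign
```

The card does not say what to do when the statistic equals exactly twice the error; this sketch treats that case as asymmetrical.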

16
Q

How can you determine kurtosis from a SPSS output?

A

●Between -0.5 and 0.5 = Mesokurtic
●< -0.5 = Platykurtic
●> 0.5 = Leptokurtic
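These cutoffs can be sketched as a classifier. The function name `classify_kurtosis` is hypothetical, and the leptokurtic branch (statistic above 0.5) is an assumption inferred from the definition of leptokurtic as positive kurtosis.

```python
def classify_kurtosis(statistic):
    """Label a kurtosis statistic using the -0.5 / 0.5 cutoffs."""
    if statistic < -0.5:
        return "platykurtic"
    if statistic > 0.5:
        # Assumption: inferred from "leptokurtic = positive kurtosis".
        return "leptokurtic"
    return "mesokurtic"

for value in (-1.2, 0.1, 0.9):   # invented example statistics
    print(value, classify_kurtosis(value))
```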

17
Q

What are measures of position? When are they used?

A

Measures of position, also known as quantiles, pinpoint a value’s location within a distribution relative to other values. They divide datasets into equal parts, helping to understand a value’s relative standing. These measures are commonly used in psychological test scales and require variables to be at least on an ordinal scale.

18
Q

What are the types of quantiles?

A

●Centiles/Percentiles (Ck or Pk): Values that divide the data into 100 equal parts, with a specific centile representing the value below which a certain percentage of the sample falls.
●Deciles (Dk): Values that divide the data into 10 equal parts, each representing 10% of the sample.
●Quartiles (Qk): Values that divide the data into 4 equal parts, each representing 25% of the sample.

19
Q

What is the formula to calculate the position of a percentile?

A

The position of a percentile in the data, sorted from smallest to largest, is: Position of Pk = (k / 100) * (n + 1) Where:
●k = The desired percentile
●n = The sample size

20
Q

What is the formula to calculate the position of a decile?

A

The position of a decile in the data, sorted from smallest to largest, is: Position of Dk = (k / 10) * (n + 1) Where:
●k = The desired decile
●n = The sample size

21
Q

What is the formula to calculate the position of a quartile?

A

The position of a quartile in the data, sorted from smallest to largest, is: Position of Qk = (k * (n + 1)) / 4 Where:
●k = The desired quartile
●n = The sample size
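The percentile, decile, and quartile position formulas share one pattern: k * (n + 1) divided by the number of parts (100, 10, or 4). A minimal sketch, with a hypothetical helper name and invented sample sizes:

```python
def quantile_position(k, n, parts):
    """1-based position of the k-th quantile when sorted data are split into `parts` parts."""
    # Algebraically the same as (k / parts) * (n + 1).
    return k * (n + 1) / parts

n = 19                                        # hypothetical sample size
p25_pos = quantile_position(25, n, 100)       # percentile 25 -> position 5.0
d3_pos = quantile_position(3, n, 10)          # decile 3     -> position 6.0
q3_pos = quantile_position(3, n, 4)           # quartile 3   -> position 15.0

# With n = 10 the position of Q1 is not a whole number (2.75).
q1_pos_small = quantile_position(1, 10, 4)

print(p25_pos, d3_pos, q3_pos, q1_pos_small)
```

The result is a position in the ordered data, not the quantile itself. A whole-number position points directly at a data value; a decimal position such as 2.75 falls between two values.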

22
Q

What is interpolation and when is it used with quantiles?

A

When the calculated position of a quantile (percentile, decile, or quartile) falls between two data points (resulting in a decimal), interpolation is used to determine the precise quantile value.

23
Q

What is the formula for interpolation when calculating quantiles?

A

The interpolation formula is: Quantile (Pk, Dk, or Qk) = E1 + ((E2 - E1) * e) Where:
●E1 = The value at the whole-number part of the calculated position
●E2 = The following value in the sorted dataset
●e = The decimal portion of the calculated position
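A worked example may help here. The sketch below computes the first quartile of ten invented scores: the position is (1 * 11) / 4 = 2.75, so E1 is the 2nd value, E2 is the 3rd value, and e = 0.75. The function name `quantile_value` is hypothetical, and the sketch assumes the position lies between 1 and n.

```python
import math

def quantile_value(sorted_data, position):
    """Value at a 1-based, possibly fractional position: E1 + (E2 - E1) * e."""
    whole = math.floor(position)    # whole-number part of the position
    e = position - whole            # decimal part of the position
    e1 = sorted_data[whole - 1]     # E1 (Python lists are 0-based, positions are 1-based)
    if e == 0:
        return e1                   # the position points exactly at a data value
    e2 = sorted_data[whole]         # E2: the following value
    return e1 + (e2 - e1) * e

data = sorted([3, 5, 7, 8, 12, 13, 14, 18, 21, 25])   # n = 10
position = 1 * (len(data) + 1) / 4                    # Q1 position = 2.75
q1 = quantile_value(data, position)                   # 5 + (7 - 5) * 0.75
print(position, q1)
```

Here Q1 works out to 6.5, three quarters of the way from the 2nd value (5) to the 3rd value (7).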

24
Q

What are measures of central tendency? What are some examples?

A

Measures of central tendency summarize a dataset with a single typical value around which the data cluster. The most common measures of central tendency are:
●Mode (Mo): The value that occurs most frequently in the distribution.
●Median (Mdn): The middle value when the data is arranged in order.
●Mean (M): The sum of all values divided by the number of values (the average).
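All three measures are available in Python's standard `statistics` module. A short sketch with invented scores:

```python
import statistics

scores = [4, 7, 7, 8, 9, 10, 11]     # hypothetical sample, n = 7

mode = statistics.mode(scores)       # most frequent value
median = statistics.median(scores)   # middle value of the sorted data
mean = statistics.mean(scores)       # sum of values / number of values

print(f"Mo={mode}  Mdn={median}  M={mean}")
```

In this sample the median and mean are both 8 while the mode is 7, so the three measures do not have to coincide.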

25
Q

What is the median?

A

The median is the middle value in a dataset when it is sorted from smallest to largest. The median splits the data in half, with 50% of the values above it and 50% below it. In any distribution, it is equivalent to the 50th percentile (P50), the 5th decile (D5), and the 2nd quartile (Q2).

26
Q

What is the mean?

A

The mean is the arithmetic average of a dataset, calculated by summing all values and dividing by the number of values. It is often represented by the symbol x̄. The mean is sensitive to extreme values (outliers), which can distort its representation of the typical value.
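The effect of an outlier on the mean can be seen by adding one extreme score to a small sample and comparing it with the median. The data below are invented for illustration.

```python
import statistics

scores = [4, 5, 5, 6, 7]         # hypothetical sample
with_outlier = scores + [60]     # same sample plus one extreme value

mean_before = statistics.mean(scores)              # 27 / 5
mean_after = statistics.mean(with_outlier)         # 87 / 6
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)     # average of the two middle values

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

One extreme score moves the mean from 5.4 to 14.5, a value larger than five of the six scores, while the median only shifts from 5 to 5.5.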

27
Q

What are measures of variability?

A

Measures of variability describe the spread or dispersion of data points in a dataset. They tell us how much individual values differ from each other and from the central tendency. They provide information about the data’s homogeneity or heterogeneity.

28
Q

What are some common measures of variability?

A

Common measures of variability include:
●Range: The difference between the highest and lowest values in the data.
●Amplitude: Similar to range, it represents the difference between the maximum and minimum values.
●Interquartile Range (IQR): The range of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
●Variance: The average squared deviation of each data point from the mean.
●Standard Deviation: The square root of the variance. It is expressed in the same units as the data and indicates how far data points typically lie from the mean.
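These measures can be sketched with the standard `statistics` module. Two choices here are assumptions: the variance is the population version (dividing by n, matching "average squared deviation"), and the quartiles use the module's default "exclusive" method, which is based on (n + 1) positions. The scores are invented.

```python
import statistics

scores = sorted([2, 4, 4, 4, 5, 5, 7, 9])      # hypothetical sample, mean = 5

data_range = scores[-1] - scores[0]            # highest value - lowest value
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles from (n + 1) positions
iqr = q3 - q1                                  # spread of the middle 50%
variance = statistics.pvariance(scores)        # mean squared deviation (divides by n)
std_dev = statistics.pstdev(scores)            # square root of the variance

print(f"range={data_range}  IQR={iqr}  variance={variance}  SD={std_dev}")
```

Statistical packages often report the sample variance instead, which divides by n - 1 (`statistics.variance` and `statistics.stdev` in Python), so their values come out slightly larger than the ones computed here.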