Unit 2 Flashcards

Univariate descriptive statistics

1
Q

What does ‘univariate’ mean in statistics?

A

it refers to the analysis of one single variable

2
Q

What are univariate descriptive statistics?

A

they provide a summarized description and analysis of one single variable

3
Q

What kind of questions can univariate descriptive statistics answer?

A

questions like “What are the scores in the variable X?”, “Are there many differences among its values?”, and “What is the proportion of subjects over 10 points?”

4
Q

What is data?

A

Values that define the features of the participants in a set of variables

5
Q

What are the 7 steps in data analysis?

A
  • building the dataset
  • label and identify the variables
  • exploratory analysis and solutions
  • description of the variables and the sample
  • inferences and hypothesis testing
  • presentation of results
  • Interpretation
6
Q

In the dataset: what does each column represent?

A

represents a variable (e.g.: ID, gender, age, alcohol consumption, average grade)

7
Q

Why is it important to label and identify variables in a dataset?

A

It ensures clarity on what each variable represents, making it easier to conduct analyses and interpret the results

8
Q

What do we have to do in order to draw conclusions?

A

start counting

9
Q

What do we have to do in the first 2 steps (building the dataset, labeling and identifying the variables)?

A

labeling the rows and columns, writing in the variables

10
Q

What is absolute frequency (fi)?

A

it is the number of times a value of a variable is repeated in a dataset
-> frequency

11
Q

What is the absolute frequency of females (gender = 1) in a dataset where there are 3 females and 2 males?

A

The absolute frequency of females is 3

12
Q

What does the subscript ‘i’ stand for (as in fi)?

A

Category or value being analyzed

13
Q

What is Relative frequency (f’i)?

A

Proportion (over 1) of the frequency of a certain value with respect to the total sample

14
Q

How do you calculate the Relative frequency (f’i)?

A

f’i = fi : N

15
Q

What is the Percentage (pi)?

A

Proportion (over 100) that the value represents in the sample

16
Q

How do you calculate the Percentage?

A

pi = f’i x 100

17
Q

What is the Cumulative absolute frequency (Fi)?

A

Number of times a value or lower values are repeated in the sample

18
Q

What is Cumulative relative frequency (F’i)?

A

cumulative proportion

19
Q

How do you calculate the cumulative relative frequency (F’i)?

A

F’i = Fi : N

20
Q

How do you calculate the Cumulative percentage (Pi)?

A

Pi = F’i x 100
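
The six frequency measures defined in the cards above (fi, f’i, pi, Fi, F’i, Pi) can be computed together in one pass. The following is a minimal sketch, not part of the original deck: the function name `frequency_table`, the dictionary keys, and the coding of males as 2 are assumptions made for illustration (the deck only states that females are coded 1).

```python
from collections import Counter


def frequency_table(values):
    """Return one row per distinct value with fi, f'i, pi, Fi, F'i and Pi."""
    n = len(values)                  # N: total sample size
    counts = Counter(values)         # absolute frequency of each value
    rows = []
    cumulative = 0
    for value in sorted(counts):     # cumulative measures need ordered values
        fi = counts[value]
        cumulative += fi
        rows.append({
            "value": value,
            "fi": fi,                          # absolute frequency
            "rel": fi / n,                     # f'i = fi : N
            "pct": fi / n * 100,               # pi = f'i x 100
            "Fi": cumulative,                  # cumulative absolute frequency
            "cum_rel": cumulative / n,         # F'i = Fi : N
            "cum_pct": cumulative / n * 100,   # Pi = F'i x 100
        })
    return rows


# Hypothetical coding: 1 = female (as in the deck), 2 = male (assumed).
gender = [1, 1, 1, 2, 2]
table = frequency_table(gender)
for row in table:
    print(row)
```

With 3 females and 2 males, the first row reports fi = 3 and f’i = 0.6, and the last row always ends at Fi = N and Pi = 100.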

21
Q

What is (fi)?

A

Absolute frequency

22
Q

What is (f’i)?

A

Relative frequency

23
Q

What is (pi)?

A

Percentage

24
Q

What is (Fi)?

A

Cumulative absolute frequency

25
Q

What is (F’i)?

A

Cumulative relative frequency

26
Q

What purpose does Graphical representation have?

A

facilitating the understanding of the data and their characteristics

27
Q

What kind of charts are there?

A
  • Cyclograms / Pie charts
  • Bar chart / frequencies
  • Polygons
  • Histograms
  • Stem and leaf diagram
  • Box plot
28
Q

How are Cyclograms / Pie charts structured?

A

form of a circle divided into portions proportional to the frequency of the value

29
Q

What types of frequencies can be shown in a pie chart?

A

absolute, relative frequency or percentage

30
Q

For which types of variables is a pie chart typically used?

A

Pie charts are used for nominal, ordinal, and discrete quantitative variables with a few distinct values.

31
Q

What is a bar chart?

A

Bars representing the frequency (ordinate axis, Y-axis) of each value (abscissa axis, X-axis).

32
Q

What types of frequencies can be shown in a bar chart?

A

absolute, relative frequency or percentage

33
Q

What types of variables are bar charts typically used for?

A

Bar charts are used for nominal, ordinal, and discrete quantitative variables with few values.

34
Q

What does a polygon of frequencies represent?

A

the frequency of each value, where points are plotted and connected by lines to show the distribution of the data.

35
Q

What is the polygon of frequencies useful for?

A

comparing groups or describing profiles

36
Q

What kind of variables are best suited for a frequency polygon?

A

Frequency polygons are most useful for quantitative variables, preferably discrete ones.

37
Q

What does a histogram represent in a frequency distribution?

A

A histogram uses bars to represent the frequency (on the Y-axis) of each value or class interval (on the X-axis)

38
Q

Why are the bars in a histogram unseparated?

A

to represent the continuity of the variable

39
Q

What types of frequencies can a histogram display?

A

A histogram can display absolute, relative, or percentage frequencies

40
Q

How are large amounts of quantitative data handled in a histogram?

A

they are grouped into class intervals or classes to simplify the representation

41
Q

What type of variables are best suited for a histogram?

A

continuous quantitative variables

42
Q

What is the purpose of a stem and leaf diagram?

A

Shows the order and shape of the data

43
Q

What is the Stem and leaf diagram useful for?

A

evaluating possible anomalies in the distribution of the variable

44
Q

What does a box plot show?

A

the distribution of a variable using position indexes like the median and quartiles, providing information on symmetry and outliers

45
Q

What are the four properties that characterize the shape of a frequency distribution?

A

Central tendency
Variability
Skewness
Kurtosis

46
Q

What is Central tendency?

A

Place where the distribution is centered. Where the data are grouped.
e.g.: index of central tendency is the mean

47
Q

What is Variability?

A

Degree of dispersion/concentration of observations with respect to the mean or the rest of values.

48
Q

What is the difference between high and low variability?

A
  • Low: the data differ little from each other. They are more concentrated.
  • High: data differ a lot from each other. They are more scattered/dispersed.
49
Q

What is Skewness?

A

degree to which most of the values presented by the participants are evenly distributed above and below the central tendency

50
Q

There are 3 types of distributions in terms of skewness. Which ones?

A

Symmetrical
Positively skewed
Negatively skewed

51
Q

What is a symmetrical distribution in terms of skewness?

A

occurs when the mean divides the distribution into two identical and symmetrical halves

52
Q

What does a positively skewed distribution indicate?

A

data are concentrated in lower values, with most data grouped on the left side of the distribution.

53
Q

What does a negatively skewed distribution indicate?

A

data are concentrated in higher values, with most data grouped on the right side of the distribution

54
Q

What is Kurtosis?

A

degree of concentration of the data with respect to the central values. How flat or peaked it is.

55
Q

There are 3 kinds of distributions in terms of kurtosis. Which ones?

A

Mesokurtic
Leptokurtic
Platykurtic

56
Q

Which graph is Mesokurtic?

A

Normal distribution. Balanced

57
Q

Which graph is Leptokurtic?

A

Positive kurtosis. greater concentration on the center (peak). pointy.

58
Q

Which graph is Platykurtic?

A

Negative kurtosis. Less concentration around the center. Flatter shape.

59
Q

Can the same data be shown in different ways (in more than one of the charts)?

A

Yes

60
Q

What is a normal distribution in terms of frequency distribution?

A

symmetric and mesokurtic, no skewness and a moderate peak in data

61
Q

How is asymmetry (skewness) determined in SPSS?

A
  • |Statistic| < (Standard error x 2) = Symmetric.
  • |Statistic| > (Standard error x 2) = Asymmetric. The sign of the statistic gives the direction:
    • Negative skewness
    • Positive skewness
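
The rule of thumb above can be written as a small function. This is only a sketch of one reading of the card: comparing the absolute value of the statistic with twice its standard error, and treating the exact boundary as asymmetric, are both assumptions, and `classify_skewness` is a hypothetical name, not an SPSS function.

```python
def classify_skewness(statistic, std_error):
    """Classify skewness from the statistic and its standard error.

    Assumption: the comparison uses the absolute value of the statistic,
    and a statistic exactly equal to 2 x error counts as asymmetric.
    """
    if abs(statistic) < 2 * std_error:
        return "symmetric"
    return "positively skewed" if statistic > 0 else "negatively skewed"


print(classify_skewness(0.30, 0.20))    # symmetric
print(classify_skewness(1.00, 0.20))    # positively skewed
print(classify_skewness(-1.00, 0.20))   # negatively skewed
```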
62
Q

What values represent which distribution in kurtosis?

A
  • -0.5 to 0.5 = Mesokurtic
  • < -0.5 = Platykurtic
  • > 0.5 = Leptokurtic
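
The cut-offs above translate directly into code. A minimal sketch, assuming the boundary values -0.5 and 0.5 themselves count as mesokurtic; `classify_kurtosis` is a hypothetical helper name.

```python
def classify_kurtosis(statistic):
    """Classify a kurtosis statistic using the -0.5 / 0.5 cut-offs from the card."""
    if statistic < -0.5:
        return "platykurtic"    # flatter than the normal curve
    if statistic > 0.5:
        return "leptokurtic"    # more peaked than the normal curve
    return "mesokurtic"         # between -0.5 and 0.5 (boundaries included by assumption)


print(classify_kurtosis(0.1))    # mesokurtic
print(classify_kurtosis(1.2))    # leptokurtic
print(classify_kurtosis(-0.8))   # platykurtic
```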
63
Q

What are quantiles used for?

A

to locate values within a data set, dividing the data into equal parts and showing where a value lies relative to the total.

64
Q

What are the most common types of quantiles?

A

Quartiles (4 parts)
Deciles (10 parts)
Percentiles (100 parts)

65
Q

Where are quantiles often used?

A

in psychological test scales and other ordinal scale variables to divide data into equal parts

66
Q

What are centiles (Ck) or percentiles (Pk)?

A

Percentiles (Pk) or centiles (Ck) show how much of the data is below a certain value. They range from 1 to 99 and tell you what percentage of the data is smaller than that value.

67
Q

What does the value of a percentile indicate?

A

It indicates the POSITION in the data where a certain percentage (K%) of the observations fall below it

68
Q

How do you calculate the position of a percentile (Pk)?

A

Pk = k x (N + 1) : 100

Pk is percentile
k is the percentile number
N is the total number of observations

69
Q

What are deciles (Dk) in quantiles?

A

Deciles divide the data into 10 equal parts, where each portion represents 10% of the data.
- range from 1 to 9, indicate percentage of data below a specific value

70
Q

What is the formula to calculate the position of a decile (Dk)?

A

Dk = k x (N + 1) : 10

k is decile number
N is total number of observations

71
Q

What are quartiles (Qk) in quantiles?

A

Quartiles divide the data into 4 equal portions -> each section representing 25% of the data

72
Q

What is the formula to calculate the position of a quartile (Qk)?

A

Qk = k x (N + 1) : 4

k is quartile number
N is total number of observations

73
Q

What is step one before calculating a quantile?

A

ordering the values from lowest to highest, so that each one has a position

74
Q

What do we do, when the calculated position lies between two numbers (decimal) instead of an exact position?

A

additionally calculating with the interpolation formula:

P/D/Qk = E1 + (E2 - E1) x e

E1 = Value at the whole-number part of the calculated position.
E2 = Following value.
e = Decimal part of the position.

75
Q

Recall all 3 formulas (Pk, Dk and Qk)

A

Pk = k x (N + 1) : 100
Dk = k x (N + 1) : 10
Qk = k x (N + 1) : 4
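
The three position formulas and the interpolation formula from the cards above fit in one function. This is a hedged sketch: the name `quantile`, the `parts` parameter, the example scores, and the clamping of positions that fall outside the data are assumptions, not part of the deck.

```python
def quantile(values, k, parts):
    """Value of the k-th quantile.

    parts = 100 for percentiles, 10 for deciles, 4 for quartiles.
    """
    ordered = sorted(values)             # step one: order the values
    n = len(ordered)
    position = k * (n + 1) / parts       # 1-based position: k x (N + 1) : parts
    whole = int(position)                # whole-number part of the position
    e = position - whole                 # decimal part of the position
    # Assumption: positions outside the data are clamped to the extremes.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    e1 = ordered[whole - 1]              # E1: value at the whole position
    e2 = ordered[whole]                  # E2: following value
    return e1 + (e2 - e1) * e            # E1 + (E2 - E1) x e


scores = [2, 4, 5, 7, 8, 10, 12, 15]     # hypothetical scores, N = 8
q1 = quantile(scores, 1, 4)              # position 2.25 -> 4 + (5 - 4) x 0.25 = 4.25
q3 = quantile(scores, 3, 4)              # position 6.75 -> 10 + (12 - 10) x 0.75 = 11.5
print(q1, q3)
print(quantile(scores, 50, 100), quantile(scores, 5, 10), quantile(scores, 2, 4))
```

The last line prints the same value three times (7.5), since P50, D5 and Q2 all share position 4.5 in this sample.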

76
Q

What do measures of central tendency represent?

A

the average magnitude of all observed values of a variable, summarizing the dataset with a single value.

77
Q

What is central tendency regarding values?

A

Typical or most representative value of a group of scores.

78
Q

What do measures of central tendency establish in a dataset?

A

they establish a middle point or point of balance
-> center of the distribution.

79
Q

Why are measures of central tendency important in descriptive analysis?

A

They are the most used measure in descriptive analysis because they summarize the characteristics of a variable and allow for comparisons between datasets

80
Q

How do different types of central tendency measures help?

A

Different indices (mean, median, mode) are more appropriate depending on the characteristics and types of variables, helping to capture where the data are concentrated.

81
Q

What is the mode (Mo) in a dataset?

A

it is the value with the greatest frequency in the distribution, representing the most commonly occurring score

82
Q

Can the mode (Mo) be absent or have multiple values?

A

Yes, the mode may not exist at all, or there may be multiple modes:
Bimodal: 2 modes
Trimodal: 3 modes
Multimodal: More than 3 modes

83
Q

For which types of variables is the mode applicable?

A

can be used with nominal, ordinal and quantitative variables

84
Q

What is the median (Mdn) in a dataset?

A

the middle score when all scores are arranged from lowest to highest, effectively dividing the distribution into two equal halves (50%).

85
Q

How is the median represented in terms of percentiles and quartiles?

A

In any distribution, the median is equivalent to:
P50 = D5 = Q2

86
Q

What must be done before calculating the median?

A

The values must be lined up in ascending order

87
Q

For which types of variables is the median applicable?

A

ordinal and quantitative

88
Q

What is the mean (M)?

A

The average value of the distribution

89
Q

What is the formula for the Mean?

A

x̄ = ∑xi : n

𝛴 = Sum of all scores
𝑥 = Scores of the variable
𝑖 = Position of each observation
𝑛 = Sample size

90
Q

What is the mean useful for?

A

Useful to compare the same variable between groups.

91
Q

What is the mean sensitive to?

A

outliers
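
A short numerical check can make the three indices, and the mean's sensitivity to outliers, concrete. This sketch uses Python's standard `statistics` module; the scores and the outlier value of 60 are hypothetical.

```python
import statistics

scores = [4, 5, 5, 6, 7, 9]                       # hypothetical scores

mean = sum(scores) / len(scores)                  # sum of xi divided by n -> 6.0
median = statistics.median(scores)                # middle of the ordered scores -> 5.5
modes = statistics.multimode(scores)              # most frequent value(s) -> [5]

# Add one extreme score and recompute.
with_outlier = scores + [60]
mean_out = sum(with_outlier) / len(with_outlier)  # 96 / 7, about 13.71
median_out = statistics.median(with_outlier)      # 6

print(mean, median, modes)
print(round(mean_out, 2), median_out)
```

One extreme score more than doubles the mean (6.0 to about 13.71), while the median barely moves (5.5 to 6).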

92
Q

What do we need to complement the mean?

A

measures of dispersion (how the data are distributed)

93
Q

What do measures of variability indicate?

A

They indicate the degree to which the values observed in the variable move away from/close to a central tendency value

94
Q

In Variability, which degrees of the values can be observed in a dataset?

A

spread out or close to a central tendency (mean, median, or mode)
-> more or less distant, dispersed

95
Q

What does greater dispersion in a dataset suggest?

A

the dataset is more heterogeneous

96
Q

What does low variability in a dataset suggest?

A

the dataset is more homogeneous

97
Q

What are the absolute measures of variability?

A
  • range
  • amplitude
  • interquartile range
  • variance
  • standard deviation
98
Q

What are the relative measures of variability?

A
  • variance quotient
  • standard score (Z)
99
Q

What does "absolute" mean for a measure of variability?

A

direct measure, no comparison

100
Q

What does "relative" mean for a measure of variability?

A

it is compared or associated with other values in the distribution, or with respect to a specific context

101
Q

What are the simplest ways to observe the lowest and highest values?

A

range and amplitude

102
Q

What does range mean in measures of variability?

A

from which value to which other value the data goes

103
Q

What does amplitude mean in measures of variability?

A

difference between the highest and lowest value
-> it is affected/distorted if there are very extreme values (95-12 = 83)

104
Q

What is one way to solve the distortion by extreme values in amplitude?

A

The Interquartile Range

105
Q

What is the formula in Interquartile Range?

A

IQR = Q3 - Q1
-> Distance between Q3 and Q1

106
Q

When is the semi-interquartile deviation usually used?

A

when the median is provided as a measure of central tendency (asymmetric distributions)

107
Q

What is the formula in semi-interquartile deviation?

A

SIR = (Q3 - Q1) : 2
-> average value of the distance between the two quartiles
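
The amplitude, interquartile range and semi-interquartile deviation can be compared on one sample. This is a sketch on hypothetical scores chosen so that the extremes match the deck's example (95 - 12 = 83). It relies on `statistics.quantiles` from the standard library, whose default "exclusive" method places quartiles at position k x (N + 1) : 4 with linear interpolation.

```python
import statistics

scores = [12, 40, 45, 50, 52, 55, 58, 95]    # hypothetical scores with two extreme values

amplitude = max(scores) - min(scores)        # 95 - 12 = 83

# Default method is "exclusive": positions k x (N + 1) : 4, interpolated.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1                                # IQR = Q3 - Q1
sir = (q3 - q1) / 2                          # SIR = (Q3 - Q1) : 2

print(amplitude)   # 83
print(q1, q3)      # 41.25 57.25
print(iqr, sir)    # 16.0 8.0
```

The amplitude (83) is driven entirely by the two extreme scores, while the IQR (16) describes the spread of the central half of the data.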

108
Q

What do higher values in measures of variability indicate?

A

higher variability

109
Q

What do variance and standard deviation measure in statistics?

A

degree to which the observed values of the variable deviate from the mean

110
Q

What are variance and standard deviation based on?

A

the average of the distances from the mean

111
Q

What is the first step in calculating variance and standard deviation?

A

calculate the mean - if not already provided

112
Q

What is the next step after calculating the mean in finding variance and standard deviation?

A

calculating the difference of each score from the mean

113
Q

The distances from the mean always sum to zero (∑ = 0). What do we have to do to solve this?

A

square the distances from the mean and then sum them

114
Q

How do we get the variance out of the sum?

A

we divide the sum of squared distances by the number of observations

115
Q

What is the variance?

A

the average of the squared distances from the mean

116
Q

How do we get the units on the same scale (not squared)?

A

we calculate the square root of the variance
-> results in standard deviation

117
Q

What is the formula for the standard deviation?

A

SD = sx = √s²x
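
The steps in the cards above (mean, distances, squaring, averaging, square root) can be followed line by line. A minimal sketch on hypothetical scores; it divides by the number of observations, as the deck describes, whereas some textbooks divide by n - 1 for samples.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical scores

n = len(scores)
mean = sum(scores) / n                         # step 1: the mean -> 5.0
deviations = [x - mean for x in scores]        # step 2: distance of each score from the mean
sum_of_deviations = sum(deviations)            # always 0, which is why we square
squared = [d ** 2 for d in deviations]         # step 3: squared distances
variance = sum(squared) / n                    # step 4: 32 / 8 = 4.0
sd = math.sqrt(variance)                       # step 5: back to the original units -> 2.0

print(mean, sum_of_deviations, variance, sd)   # 5.0 0.0 4.0 2.0
```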

118
Q

What are 3 characteristics of the variance?

A
  • reliable and frequently used
  • may suffer big changes if there are outliers
  • cannot be calculated if the mean is unknown
119
Q

What are main characteristics of standard deviation?

A
  • not different from the variance - it is calculated from it
  • BUT: provides a value in the same units as the distribution
  • the most used along with the mean
120
Q

What can we say if the variance or standard deviation is low?

A

the mean represents the data well

121
Q

What is the variance quotient used for?

A

It relates the standard deviation to the mean (it is also known as the coefficient of variation), allowing comparison of dispersion between two distributions.
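
The deck gives no formula for this quotient. The sketch below assumes it is the standard deviation divided by the mean, expressed as a percentage, which is how the coefficient of variation is usually defined. The function name and both groups of scores are hypothetical.

```python
import statistics


def variation_quotient(values):
    """Assumed formula: (standard deviation : mean) x 100."""
    sd = statistics.pstdev(values)       # divides by N, matching the deck's variance
    return sd / statistics.mean(values) * 100


group_a = [2, 4, 4, 4, 5, 5, 7, 9]       # mean 5, SD 2
group_b = [x * 10 for x in group_a]      # same shape on a scale ten times larger

print(variation_quotient(group_a))
print(variation_quotient(group_b))
```

Both groups give 40 (up to floating-point rounding): group_b has a standard deviation ten times larger, but relative to its mean its dispersion is the same.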
