Unit 2 Flashcards
Univariate descriptive statistics
What does ‘univariate’ mean in statistics?
it refers to the analysis of one single variable
What are univariate descriptive statistics?
they provide a summarized description and analysis of one single variable
What kind of questions can univariate descriptive statistics answer?
questions like “What are the scores in the variable X?”, “Are there many differences among its values?”, and “What is the proportion of subjects over 10 points?”
What is data?
Values that define the features of the participants in a set of variables
What are the 7 steps in data analysis?
- building the dataset
- labeling and identifying the variables
- exploratory analysis and solutions
- description of the variables and the sample
- inferences and hypothesis testing
- presentation of results
- interpretation
In the dataset: what does each column represent?
Each column represents a variable (e.g., ID, gender, age, alcohol consumption, average grade)
Why is it important to label and identify variables in a dataset?
It ensures clarity on what each variable represents, making it easier to conduct analyses and interpret the results
What do we have to do in order to draw conclusions?
start counting
What do we have to do in the first 2 steps (building the dataset, labeling and identifying the variables)?
labeling the rows and columns, writing in the variables
What is absolute frequency (fi)?
it is the number of times a value of a variable is repeated in a dataset
-> frequency
What is the absolute frequency of females (gender = 1) in a dataset where there are 3 females and 2 males?
The absolute frequency of females is 3
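The count in this card can be reproduced with a short tally. This is only a sketch: the card gives the code gender = 1 for females, while the code 0 for males and the order of the five values are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical gender codes for five participants.
# 1 = female comes from the card; 0 = male is an assumed code.
gender = [1, 0, 1, 1, 0]

# Counter tallies how many times each value appears,
# which is the absolute frequency (fi) of each value.
absolute_freq = Counter(gender)

print("fi for females (gender = 1):", absolute_freq[1])
print("fi for males (gender = 0):", absolute_freq[0])
```

The absolute frequencies of all values add up to the sample size N, here 5.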
What does the ‘i’ stand for?
Category or value being analyzed
What is Relative frequency (f’i)?
Proportion (over 1) of the frequency of a certain value with respect to the total sample
How do you calculate the Relative frequency (f’i)?
f’i = fi : N
What is the Percentage (pi)?
Proportion (over 100) that the value represents in the sample
How do you calculate the Percentage?
pi = f’i x 100
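The two formulas above (f’i = fi : N and pi = f’i x 100) can be checked with a small worked example. The data are the same hypothetical gender codes as before (1 = female, 0 = male, an assumed coding), and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical sample: 3 females (1) and 2 males (0).
gender = [1, 0, 1, 1, 0]
n = len(gender)  # N, the total sample size

# Absolute frequency fi of each value.
fi = Counter(gender)

# Relative frequency: f'i = fi / N (a proportion over 1).
rel = {value: count / n for value, count in fi.items()}

# Percentage: pi = f'i * 100 (a proportion over 100).
pct = {value: r * 100 for value, r in rel.items()}

for value in sorted(fi):
    print(f"value={value}  fi={fi[value]}  f'i={rel[value]:.2f}  pi={pct[value]:.1f}%")
```

The relative frequencies of all values add up to 1, and the percentages add up to 100.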
What is the Cumulative absolute frequency (Fi)?
Number of times a value or any lower value appears in the sample
What is Cumulative relative frequency (F’i)?
Cumulative proportion (over 1): the proportion of the sample with a given value or any lower value
How do you calculate the cumulative relative frequency (F’i)?
F’i = Fi : N
How do you calculate the Cumulative percentage (Pi)?
Pi = F’i x 100
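The cumulative measures can be sketched as a running total over the values sorted from lowest to highest. The scores below are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical scores for eight participants.
scores = [2, 3, 3, 4, 4, 4, 5, 5]
n = len(scores)  # N

fi = Counter(scores)
values = sorted(fi)                    # values from lowest to highest
abs_freqs = [fi[v] for v in values]    # fi for each value

# Fi: running total of fi up to and including each value.
cum_abs = list(accumulate(abs_freqs))

# F'i = Fi / N and Pi = F'i * 100.
cum_rel = [F / n for F in cum_abs]
cum_pct = [Fr * 100 for Fr in cum_rel]

for v, f, F, Fr, P in zip(values, abs_freqs, cum_abs, cum_rel, cum_pct):
    print(f"value={v}  fi={f}  Fi={F}  F'i={Fr:.3f}  Pi={P:.1f}%")
```

For the highest value, Fi always equals N, F’i equals 1, and Pi equals 100.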
What is (fi)?
Absolute frequency
What is (f’i)?
Relative frequency
What is (pi)?
Percentage
What is (Fi)?
Cumulative absolute frequency
What is (F’i)?
Cumulative relative frequency
What purpose does Graphical representation have?
facilitating the understanding of the data and their characteristics
What kind of charts are there?
- Cyclograms / Pie charts
- Bar chart / frequencies
- Polygons
- Histograms
- Stem and leaf diagram
- Box plot
How are Cyclograms / Pie charts structured?
form of a circle divided into portions proportional to the frequency of the value
What types of frequencies can be shown in a pie chart?
absolute, relative frequency or percentage
For which types of variables is a pie chart typically used?
Pie charts are used for nominal, ordinal, and discrete quantitative variables with a few distinct values.
What is a bar chart?
Bars representing the frequency (ordinate axis, Y-axis) of each value (abscissa axis, X-axis).
What types of frequencies can be shown in a bar chart?
absolute, relative frequency or percentage
What types of variables are bar charts typically used for?
Bar charts are used for nominal, ordinal, and discrete quantitative variables with few values.
What does a polygon of frequencies represent?
the frequency of each value, where points are plotted and connected by lines to show the distribution of the data.
What is the polygon of frequencies useful for?
comparing groups or describing profiles
What kind of variables are best suited for a frequency polygon?
Frequency polygons are most useful for quantitative variables, preferably discrete ones.
What does a histogram represent in a frequency distribution?
A histogram uses bars to represent the frequency (on the Y-axis) of each value or class interval (on the X-axis)
Why are the bars in a histogram unseparated?
to represent the continuity of the variable
What types of frequencies can a histogram display?
A histogram can display absolute, relative, or percentage frequencies
How are large amounts of quantitative data handled in a histogram?
they are grouped into class intervals or classes to simplify the representation
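Grouping into class intervals can be sketched as follows. The data, the interval width of 1.0, and the starting point of 0.0 are all assumptions for illustration; in practice the width and limits are chosen to suit the variable.

```python
# Hypothetical continuous measurements (e.g. reaction times in seconds).
reaction_times = [0.8, 1.2, 1.9, 2.1, 2.4, 2.7, 3.3, 3.6, 4.5]

lower = 0.0   # assumed lower limit of the first interval
width = 1.0   # assumed width of every class interval

# Count how many observations fall into each interval [start, end).
counts = {}
for x in reaction_times:
    k = int((x - lower) // width)                      # index of the interval
    interval = (lower + k * width, lower + (k + 1) * width)
    counts[interval] = counts.get(interval, 0) + 1

# Each line is one bar of the histogram: interval on X, frequency on Y.
for (start, end), f in sorted(counts.items()):
    print(f"[{start:.1f}, {end:.1f})  fi={f}  {'#' * f}")
```

Each interval then plays the role of a single value on the X-axis, with its absolute frequency as the height of the bar.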
What type of variables are best suited for a histogram?
continuous quantitative variables
What is the purpose of a stem and leaf diagram?
Shows the order and shape of the data
What is the Stem and leaf diagram useful for?
evaluating possible anomalies in the distribution of the variable
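A stem and leaf diagram can be built by hand or with a few lines of code. The sketch below assumes hypothetical two-digit scores, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Hypothetical two-digit scores.
scores = [12, 15, 21, 23, 23, 28, 34, 35, 47]

# Split each score into stem (tens) and leaf (units), keeping sorted order.
stems = defaultdict(list)
for s in sorted(scores):
    stem, leaf = divmod(s, 10)
    stems[stem].append(leaf)

# One line per stem, with its leaves listed in ascending order.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Reading down the stems shows the order of the data, and the length of each row shows the shape of the distribution, so an isolated row far from the others points to a possible anomaly.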
What does a box plot show?
the distribution of a variable using position indexes like the median and quartiles, providing information on symmetry and outliers
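The position indexes behind a box plot can be computed as a rough sketch. The data are hypothetical, `statistics.quantiles` uses its default "exclusive" method (other methods can give slightly different quartiles), and the 1.5 x IQR rule for flagging outliers is a common convention assumed here rather than something stated in these cards.

```python
import statistics

# Hypothetical scores with one unusually high value.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# Quartiles: Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Interquartile range: the width of the box.
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("IQR:", iqr)
print("outliers:", outliers)
```

If the median sits closer to one edge of the box than the other, the distribution is not symmetric.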
What are the four properties that characterize the shape of a frequency distribution?
Central tendency
Variability
Skewness
Kurtosis
What is Central tendency?
Place where the distribution is centered. Where the data are grouped.
e.g., the mean is an index of central tendency
What is Variability?
Degree of dispersion/concentration of observations with respect to the mean or the rest of the values.
What is the difference between high and low variability?
- Low: the data differ little from each other. They are more concentrated.
- High: data differ a lot from each other. They are more scattered/dispersed.
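The difference between low and high variability can be illustrated with two hypothetical samples that share the same mean. The variance is used here as one possible measure of dispersion around the mean; that choice is an assumption for the example.

```python
import statistics

# Two hypothetical samples with the same central tendency (mean = 10).
low_var = [9, 10, 10, 10, 11]    # values close to each other
high_var = [2, 6, 10, 14, 18]    # values far from each other

mean_low = statistics.mean(low_var)
mean_high = statistics.mean(high_var)

# Population variance: average squared distance from the mean.
var_low = statistics.pvariance(low_var)
var_high = statistics.pvariance(high_var)

print("means:", mean_low, mean_high)
print("variances:", var_low, var_high)
```

Both samples are centered in the same place, but the second one is far more scattered, which is why central tendency and variability are described separately.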