Task 2 Flashcards
Which 2 main types of data are there?
Categorical data
Quantitative data
Which 2 categories of categorical data do you know?
Nominal (naming)
Ordinal (ordering)
Which type of categories does nominal data have?
Name an example
Categories of equal rank (without order), for instance gender: male / female
Which type of categories does ordinal data have?
Name an example
Ordered categories (with order), for instance education level: low / middle / high
For categorical data, are the distances on the scale meaningful?
no
Which suitable table do you use for categorical data?
frequency table
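A frequency table simply counts how often each category occurs. A minimal Python sketch, assuming the categorical scores are stored as a list of strings (the education-level data below is made up for illustration):

```python
from collections import Counter

# Hypothetical ordinal data: education level of 8 respondents
education = ["low", "middle", "high", "middle", "middle", "low", "high", "middle"]

# Count how often each category occurs
freq = Counter(education)

# Print the table in the natural (ordinal) order of the categories
order = ["low", "middle", "high"]
n = len(education)
print(f"{'category':<8} {'freq':>4} {'percent':>8}")
for category in order:
    count = freq[category]
    print(f"{category:<8} {count:>4} {100 * count / n:>7.1f}%")
```

For nominal data the order of the rows is arbitrary; for ordinal data it makes sense to list the categories in their natural order, as done here.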
Which graphs are suitable for nominal data?
Name the 2 classical representations
Pie chart
Bar chart
Which suitable measure of central tendency do you use for nominal data?
mode
Which suitable measures of central tendency do you use for ordinal data?
Mode
Median
Which suitable measures of central tendency do you use for quantitative data (interval & ratio)?
Mode
Median
Mean
Is there a suitable measure of dispersion for categorical data?
no
Which suitable measures of dispersion (Streuung) do you use for quantitative data?
Interquartile range (IQR)
Standard deviation
Name the 2 main measurement levels (scales) of quantitative data
Interval (distance)
Ratio (rate)
Which suitable table do you use for quantitative data?
Stem and leaf plot
Which suitable graphs do you use for quantitative data?
Name 2 classical representations
Histogram
boxplot
What do you know about the distances between values on an interval scale? Give an example
The distances between the numbers are meaningful, for instance IQ (50-150)
The distance between consecutive units is always the same
What do you know about a ratio scale?
An interval variable with an absolute zero point, for instance age
When do you use a stem and leaf plot?
If you have a small number of observations
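For a small data set, a stem-and-leaf plot can be built by hand or with a few lines of code. A minimal Python sketch, assuming non-negative integer scores where the stem is the tens digit and the leaf is the units digit (the scores are hypothetical):

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return {stem: [leaves]} for non-negative integer scores (stem = tens digit)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # e.g. 57 -> stem 5, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical small data set (10 test scores)
scores = [42, 57, 51, 63, 48, 55, 70, 66, 59, 45]
plot = stem_and_leaf(scores)

# Print every stem in the range, including empty ones
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because every individual score stays visible, the plot quickly becomes unreadable for large data sets, which is why it suits small samples.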
What are the advantages of a histogram?
easy to represent many data points
easy to interpret data roughly (ungefähr)
Which 3 features of a histogram are of special interest?
Extreme values (outliers)
Number of peaks (modes)
Symmetry
When do you use a boxplot?
visual representation of a large data set
fast recognition of outliers
easy comparison of several groups
Name 3 middle values / suitable measures of central tendency
mean
median
mode
(not the IQR: the IQR is a measure of dispersion)
Mean
Definition?
Calculation?
How does it handle outliers?
Example
• Average value / the center of mass of the scores
• Calculation: add all values and divide the sum by the total number of observations
• Sensitive to outliers
(be careful using it with skewed distributions or outliers)
Example:
1-3-3-4-9
Sum=20
Mean= 20/5= 4
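The calculation above, and the mean's sensitivity to outliers, can be checked with a short Python sketch. The second data set (9 replaced by a hypothetical extreme value of 90) is made up for illustration:

```python
def mean(scores):
    """Arithmetic mean: sum of all values divided by the number of observations."""
    return sum(scores) / len(scores)

scores = [1, 3, 3, 4, 9]
print(mean(scores))        # 4.0

# Sensitivity to outliers: replace the largest score by an extreme (hypothetical) value
with_outlier = [1, 3, 3, 4, 90]
print(mean(with_outlier))  # 20.2
```

A single extreme score pulls the mean from 4 to 20.2, even though four of the five scores are unchanged.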
Usage of mean
for quantitative data (interval variables and up)
tells you something about the frequency, order and value of scores
Median
Definition?
Determination?
How does it handle outliers?
Example
- Middle value (50% of the scores lie below it and 50% above)
- Determining: order all values from small to large and pick the middle one, at position (N+1)/2
- Robust towards outliers
Example:
1-2-4-4-5
• Median: 4
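The (N+1)/2 rule can be sketched in Python as follows. The card only covers an odd number of scores; averaging the two middle scores when N is even is the usual convention and is an assumption added here:

```python
def median(scores):
    """Middle value of the ordered scores, at 1-based position (N+1)/2."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle score
        return ordered[position - 1]
    # Even N (assumed convention): (N+1)/2 falls between two scores, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([1, 2, 4, 4, 5]))    # 4
print(median([1, 2, 4, 4, 500]))  # 4, robust to the outlier
```

Replacing the largest score by an extreme value leaves the median unchanged, unlike the mean.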
Usage of median
okay for ordinal variables and up (interval and ratio)
tells you something about the frequency and order of scores
Mode
Definition?
Determination?
How does it handle outliers?
Example
- The value with the highest frequency
- Determining: find the value or category that occurs most often
- Robust towards outliers
Example:
1-2-2-4-12
• Mode: 2
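Finding the mode amounts to counting frequencies and picking the largest. A minimal Python sketch; returning a list so that ties (several modes) are visible is a design choice made here, not something the card specifies:

```python
from collections import Counter

def modes(scores):
    """Return all values that occur with the highest frequency (there may be ties)."""
    counts = Counter(scores)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 4, 12]))              # [2]
print(modes(["male", "female", "female"]))  # ['female']
```

Because it only counts, the same function works for numbers and for category labels, which is why the mode is always okay.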
usage of mode
is always okay
tells you something about the frequency of scores
German word for distribution
Verteilung
German word for skewed
schief
What is a Z-score?
- A standardized score, expressed in units of standard deviations
- Example: if SD = 15 and a person scores 15 points above the mean, their Z-score = +1
- Measures how many standard deviations an observed value lies above or below the mean
- Standardising does not change the shape of the distribution
- The measures of center and spread do change (the mean becomes 0 and the SD becomes 1)
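The Z-score formula from the example can be written as a one-line Python function. The mean of 100 is an assumption (the usual IQ scaling); the card only gives SD = 15:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example from the card: SD = 15, score 15 points above the mean.
# The mean of 100 is an assumption (typical IQ scaling).
print(z_score(115, mean=100, sd=15))  # 1.0
print(z_score(70, mean=100, sd=15))   # -2.0
```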
What is the standard deviation?
Roughly, the average distance of the scores from the mean
its square is called the variance
How do you calculate the standard deviation?
Take the difference of each score from the mean, square these differences, sum them up, divide the sum by the total number of scores minus 1, and take the square root of the result
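The steps above translate directly into Python. This sketch reuses the data from the mean card and cross-checks the result against the standard library's `statistics.stdev`, which also divides by N - 1:

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation, following the steps on the card."""
    m = sum(scores) / len(scores)
    squared_diffs = [(x - m) ** 2 for x in scores]    # squared deviations from the mean
    variance = sum(squared_diffs) / (len(scores) - 1)  # divide by N - 1
    return math.sqrt(variance)                         # square root gives the SD

scores = [1, 3, 3, 4, 9]
print(sample_sd(scores))         # 3.0
print(statistics.stdev(scores))  # same result from the standard library
```

Worked by hand: the deviations are -3, -1, -1, 0, 5; their squares sum to 36; 36 / 4 = 9 (the variance); the square root is 3.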
Usage of standard deviation
okay for quantitative data (interval variables and up)
is not resistant to outliers
Interquartile range (IQR)
the range within which the middle 50% of the scores fall
IQR = Q3 - Q1
for interval variables
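A minimal Python sketch of IQR = Q3 - Q1. The quartiles come from `statistics.quantiles` with its default ("exclusive") method; textbooks and other software sometimes use slightly different quartile rules, so hand-calculated values may differ a little. The data are hypothetical:

```python
import statistics

def iqr(scores):
    """Interquartile range Q3 - Q1 (quartiles via statistics.quantiles, default method)."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1

scores = [1, 2, 3, 4, 5, 6, 7, 8]
print(iqr(scores))  # 4.5 (Q1 = 2.25, Q3 = 6.75)
```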
Five-number-summary
five numbers in a row that summarize the complete distribution
minimum-Q1(25%)-median-Q3(75%)-maximum
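The five-number summary can be sketched in Python, again assuming `statistics.quantiles` with its default method for the quartiles (other quartile rules can give slightly different Q1 and Q3); the data are hypothetical:

```python
import statistics

def five_number_summary(scores):
    """Minimum, Q1, median, Q3, maximum."""
    q1, med, q3 = statistics.quantiles(scores, n=4)
    return min(scores), q1, med, q3, max(scores)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
# (1, 2.25, 4.5, 6.75, 8)
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3, the line inside it is the median.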
1.5 * IQR criterion
used to identify a score as an outlier
downward outlier:
Q1 - 1.5 * IQR
if a score is even lower than this value, we consider it an outlier
upward outlier:
Q3 + 1.5 * IQR
if a score is even higher than this value, we consider it an outlier
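The two fences can be checked with a short Python sketch. Quartiles are again taken from `statistics.quantiles` (default method), which is an assumption about the quartile rule; the data, including the outlier 30, are hypothetical:

```python
import statistics

def outliers_iqr(scores):
    """Flag scores below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 30]
print(outliers_iqr(scores))  # [30]
```

Here Q1 = 2.25 and Q3 = 6.75, so IQR = 4.5 and the fences are -4.5 and 13.5; only 30 lies outside them.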
Centring
shift all scores such that the mean becomes 0
C = X - mean
the shape of the distribution is not affected at all; the only thing that changes is the mean
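Centring can be illustrated in Python with the scores from the mean card. The sketch shows that the mean becomes 0 while the standard deviation (and hence the spread) stays the same:

```python
import statistics

scores = [1, 3, 3, 4, 9]
m = sum(scores) / len(scores)        # mean = 4.0

# Centring: C = X - mean
centred = [x - m for x in scores]

print(centred)                       # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(sum(centred) / len(centred))   # 0.0, the new mean
print(statistics.stdev(scores), statistics.stdev(centred))  # 3.0 3.0, SD unchanged
```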
Standardising
shift all the scores such that the mean becomes 0, and then rescale them such that the standard deviation becomes 1 as well
(Z-formula: Z = (X - mean) / SD)
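Standardising applies the Z-formula to every score. A minimal Python sketch, assuming the sample standard deviation (N - 1) from the standard-deviation card is used as SD:

```python
import statistics

scores = [1, 3, 3, 4, 9]
m = statistics.mean(scores)
sd = statistics.stdev(scores)   # sample SD (N - 1)

# Standardising: Z = (X - mean) / SD
z = [(x - m) / sd for x in scores]

print(z)
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```

The mean and SD of the Z-scores are 0 and 1 up to floating-point rounding, while the relative positions of the scores (the shape) are unchanged.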
Multiplying
multiply all scores by a constant a
X' = a * X
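The effect of multiplying can be checked in Python. The factor a = 2 is a hypothetical choice; in general the mean is multiplied by a and the standard deviation by |a|:

```python
import statistics

scores = [1, 3, 3, 4, 9]
a = 2  # hypothetical multiplication factor

# Multiplying: X' = a * X
scaled = [a * x for x in scores]

# The mean is multiplied by a, the SD by |a|
print(statistics.mean(scores), statistics.mean(scaled))    # mean doubles: 4 -> 8
print(statistics.stdev(scores), statistics.stdev(scaled))  # SD doubles: 3.0 -> 6.0
```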
What is the difference between a histogram and a bar graph?
A bar graph presents separate groups (categorical data) and has spaces between its bars; a histogram shows quantitative data grouped into adjacent intervals (bins), so its bars touch
Can the variance be negative? Explain
No, it can't: the variance is the standard deviation squared, and a squared value can never be negative
In a normal distribution, what percentage of the scores lies within Z = ±1 (one SD around the mean)?
68%
In a normal distribution, what percentage of the scores lies within Z = ±2?
95%
In a normal distribution, what percentage of the scores lies within Z = ±3?
99.7%
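These percentages (the 68-95-99.7 rule) can be reproduced with the standard normal distribution from Python's `statistics.NormalDist`: the share within ±k SDs is cdf(k) - cdf(-k).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within Z = ±{k}: {share:.1%}")
# within Z = ±1: 68.3%
# within Z = ±2: 95.4%
# within Z = ±3: 99.7%
```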