Task 2: The characteristic score. Flashcards
Statistics is
the science of learning from data.
Data can be
Numerical and qualitative
Cases are
the objects described by a set of data.
A label is a
special variable used in some data sets to distinguish the different cases.
A variable is
a characteristic of a case.
Variables can be
Categorical (places the case into a group)
Quantitative (takes numerical values for which arithmetic operations make sense).
Distribution of a variable tells us
what values it takes and how often it takes these values.
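As a small illustration of this card, the sketch below tabulates a distribution by counting how often each value occurs. The function name `distribution` and the grades are made up for the example.

```python
from collections import Counter


def distribution(values):
    """Return a dict mapping each distinct value to how often it occurs."""
    return dict(Counter(values))


# Made-up categorical data, for illustration only.
grades = ["A", "B", "A", "C", "B", "A"]
for value, count in sorted(distribution(grades).items()):
    print(value, count)
```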
Exploratory data analysis
Statistical tools and ideas that help us examine data to describe their main features.
Graphical displays for categorical variables are
Bar graphs (more flexible) and pie charts (include all categories that make up a whole).
Quantitative variables are displayed with the help of..
Stemplots (work best for a small number of observations that are all greater than 0);
and
Histograms (columns don’t have spaces between them).
Tails of the distribution contain..
the extreme values.
The two principles of data examination:
1- plot your data
2- look for an overall pattern and any striking deviations.
When examining a distribution, take the following steps:
1- Look for the overall pattern + striking deviations
2- Look at the shape (does it have modes = major peaks?)
3- Is it symmetric? (the two sides are mirror images)
4- Is it skewed? (skewed to the right if the right tail is longer)
5- Look for outliers.
An outlier is
an individual value that falls outside the overall pattern.
We can measure the centre with the help of:
The mean x̄ and the median M.
Characteristics of the mean.
- average value;
- is sensitive to the influence of outliers => NOT a resistant measure of the centre.
Characteristics of the median.
- midpoint of a distribution;
- more resistant than the mean;
- a typical value.
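The difference in resistance between the two measures can be seen in a quick sketch. The numbers are invented, and the standard-library `statistics` module is used only for convenience.

```python
from statistics import mean, median

# Invented data: the second list replaces the largest value with an outlier.
data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

# The mean is pulled towards the outlier; the median does not move.
print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```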
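How to find the median?
- arrange all obs. from the smallest to the largest.
- the location of M is (n+1)/2, counting from the smallest obs.
- if n is even, M is the mean of the two centre obs.
Note that (n+1)/2 gives the location, not the actual median.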
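The steps on this card can be sketched as follows. The function names `median_location` and `median_of` are hypothetical, and the sketch assumes the usual convention of averaging the two centre observations when n is even.

```python
def median_location(n):
    """Location of the median in the ordered list, counting from 1."""
    return (n + 1) / 2


def median_of(observations):
    """Median found by sorting and taking the centre observation(s)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the location (n+1)/2 is a whole number.
        return ordered[mid]
    # Even n: the location falls halfway between two observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_location(5), median_of([5, 1, 3, 9, 7]))
print(median_location(4), median_of([4, 1, 3, 2]))
```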
Methods for measuring spread:
1- Quartiles Q1 and Q3;
2- Standard deviation s.
To calculate Q1 and Q3:
- Find the M.
- Q1 (1/4 of the obs.) = the M of the observations whose position is to the left of the location of the overall M.
- Q3 (3/4 of the obs.) = the M of those obs. that are located to the right of the overall M.
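A minimal sketch of this procedure is given below. It assumes that, when n is odd, the overall median itself is left out of both halves. Statistical software often uses slightly different quartile rules, so results may not match a library exactly. The names `quartiles` and `_median_sorted` are hypothetical.

```python
def _median_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]  # observations left of the location of M
    # Observations right of the location of M (skip M itself when n is odd).
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median_sorted(lower), _median_sorted(ordered), _median_sorted(upper)


q1, m, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])
print(q1, m, q3)
```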
The five-number summary consists of:
Minimum Q1 M Q3 Maximum.
The interquartile range is used for describing..
skewed distributions.
IQR =
Q3 - Q1
The 1.5 x IQR rule is used for detecting
suspected outliers (obs. that fall more than 1.5 x IQR below Q1 or above Q3).
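The rule can be sketched as below: an observation is flagged when it lies more than 1.5 x IQR below Q1 or above Q3. The quartiles are computed with the median-of-halves method, which is an assumption of this sketch, and the function names are hypothetical.

```python
def _median_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def suspected_outliers(observations):
    """Return the observations flagged by the 1.5 x IQR rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = _median_sorted(ordered[:half])
    q3 = _median_sorted(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


# Invented data with one unusually large value.
print(suspected_outliers([1, 2, 3, 4, 5, 6, 7, 30]))
```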
Variance s² is the..
average of the squares of the deviations of the observations from their mean (the sum of squares is divided by n - 1).
Standard deviation s is the
square root of the variance.
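A short sketch of both quantities, assuming the usual sample convention in which the sum of squared deviations is divided by n - 1 rather than n. The function names are hypothetical.

```python
from math import sqrt


def variance(observations):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)


def standard_deviation(observations):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(variance(observations))


print(variance([1, 2, 3, 4, 5]), standard_deviation([1, 2, 3, 4, 5]))
# With no spread at all, both the variance and s are 0.
print(standard_deviation([3, 3, 3]))
```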
The sum of the deviations of the observations from their mean will always be..
0
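Why this is always 0 follows directly from the definition of the mean. A short derivation, writing x_i for the n observations (notation assumed here):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```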
When there is no spread s =
0
We use s as a measure of spread only when the centre is measured by the..
Mean.
s is not
resistant.
Linear transformations change…
the original variable x into the new variable x new.
x new =
a + bx
The characteristics of linear transformation are:
- it does not change the shape of a distribution.
- it changes the origin if a is not equal to 0.
- multiplying by b > 0 changes the size of the unit of measurement.
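As an illustration, converting Celsius to Fahrenheit is a linear transformation with a = 32 and b = 1.8. The sketch below uses invented temperatures and a hypothetical helper `linear_transform`. It also shows a standard consequence not stated on the card: the mean becomes a + b times the old mean, and s is multiplied by b.

```python
from statistics import mean, stdev


def linear_transform(values, a, b):
    """Apply x_new = a + b * x to every value."""
    return [a + b * x for x in values]


# Invented temperatures in degrees Celsius.
celsius = [0, 10, 20, 30]
fahrenheit = linear_transform(celsius, 32, 1.8)

print(fahrenheit)
print(mean(celsius), "->", mean(fahrenheit))
print(stdev(celsius), "->", stdev(fahrenheit))
```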