Descriptive Stats Flashcards

1
Q

What are measures of central tendency?

A

Methods used to identify the central (most typical) point of a distribution.

2
Q

Mode

A
  • The score with the highest frequency in a frequency distribution (visually, it is the
    highest point of the graph).
  • For a grouped distribution (histogram), it is defined as the most frequently occurring interval (or
    the midpoint of that interval).
  • It is applicable to almost all kinds of data sets, including nominal (categorical) data.
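A minimal Python sketch of the mode, assuming raw scores (or category labels) in a list and, for grouped data, a dictionary mapping hypothetical (lower, upper) interval bounds to frequencies. The helper names `mode_of` and `grouped_mode` are illustrative only; ties are resolved by taking the first value encountered.

```python
from collections import Counter

def mode_of(scores):
    """Return the most frequent value (the first one encountered if tied)."""
    counts = Counter(scores)
    value, _freq = counts.most_common(1)[0]
    return value

def grouped_mode(freq_by_interval):
    """freq_by_interval maps (lower, upper) interval bounds to frequencies;
    return the midpoint of the most frequent interval."""
    (lower, upper), _freq = max(freq_by_interval.items(), key=lambda kv: kv[1])
    return (lower + upper) / 2

print(mode_of([2, 3, 3, 5, 7, 3, 5]))            # 3
print(mode_of(["red", "blue", "red", "green"]))  # red (nominal data works too)
print(grouped_mode({(0, 10): 4, (10, 20): 9, (20, 30): 5}))  # 15.0
```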
3
Q

Disadvantages of mode

A
  • Lack of reliability
  • Lack of precision in some cases
4
Q

Unimodal distribution

A

A distribution with a single highest peak (one mode).

5
Q

Bimodal distribution

A

A distribution with two highest peaks (two modes).
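A sketch that labels a distribution by counting how many values share the highest frequency, using `statistics.multimode` (Python 3.8+). The `modality` helper is hypothetical, and it applies a strict reading of the definition: in practice "bimodal" is also used for two distinct peaks of unequal height, which this sketch would not detect.

```python
from statistics import multimode

def modality(scores):
    """Classify scores as unimodal, bimodal, or multimodal by tied top frequencies."""
    modes = multimode(scores)  # all values with the highest frequency
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(modality([1, 2, 2, 3, 4]))     # ('unimodal', [2])
print(modality([1, 1, 2, 3, 3, 4]))  # ('bimodal', [1, 3])
```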

6
Q

Properties of mean

A
  • If a constant is added to (or subtracted from) every score in a distribution, the mean increases
    (or decreases) by that constant.
  • If every score is multiplied (or divided) by the same constant, the mean is multiplied (or
    divided) by the same constant.
  • The sum of deviations from the mean is equal to zero.
  • The sum of squared deviations from the mean is smaller than the sum of squared
    deviations around any other point in the distribution.
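The four properties can be checked numerically on a small made-up data set; this is a sanity check on one example, not a proof. `ss_about` is a hypothetical helper that computes the sum of squared deviations about any chosen point.

```python
from statistics import mean

scores = [2, 4, 6, 8]
m = mean(scores)  # 5

# Adding a constant shifts the mean by that constant.
assert mean([x + 10 for x in scores]) == m + 10
# Multiplying by a constant multiplies the mean by that constant.
assert mean([x * 3 for x in scores]) == m * 3
# Deviations from the mean sum to zero.
assert sum(x - m for x in scores) == 0

def ss_about(point):
    """Sum of squared deviations of the scores about an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)

# The mean gives a smaller sum of squared deviations than other points tried.
assert all(ss_about(m) < ss_about(c) for c in [3, 4, 4.5, 5.5, 6])
print(m, ss_about(m))
```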
7
Q

Median

A
  • The middle score of a distribution after the data are arranged in ascending order.
  • Corresponds to the 50th percentile of a distribution.
  • If a distribution has an odd number of scores, the median is literally the middle value
    (provided the data are arranged in ascending order).
  • If a distribution has an even number of scores, the median is the average of the two middle scores.
  • In principle, it divides a distribution into two equal halves.
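A minimal sketch of the odd/even rule above; the hand-written `median` function is for illustration, and Python's `statistics.median` returns the same results.

```python
def median(scores):
    """Middle score of the sorted data; mean of the two middle scores when n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # odd n: 5
print(median([7, 1, 5, 3]))  # even n: (3 + 5) / 2 = 4.0
```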
8
Q

Disadvantages of median

A
  • It is not applicable to all kinds of data sets
    (e.g., the median cannot be identified for nominal (categorical) data, because such data cannot be
    logically ordered).
  • The median is most informative when there are few tied scores and the distribution is skewed.
9
Q

What happens if there is a great deal of variability?

A

No measure of central tendency is very representative of the scores if the
distribution contains a great deal of variability.

10
Q

Different measures of variability

A
  • Range
  • Semi-Interquartile range
  • Mean deviation
  • Variance
  • Standard deviation
11
Q

Range

A
  • Measures the width of a distribution by subtracting the lowest score (its lower real limit) from the
    highest score (its upper real limit).
  • Its advantage is that it takes the whole distribution into account.
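A sketch of both versions of the range, assuming scores are measured to the nearest whole unit so that each score's real limits lie half a unit above and below it; `unit` is a hypothetical parameter for other measurement precisions.

```python
def simple_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

def range_real_limits(scores, unit=1):
    """Upper real limit of the highest score minus lower real limit of the
    lowest score; assumes scores were measured to the nearest `unit`."""
    upper = max(scores) + unit / 2
    lower = min(scores) - unit / 2
    return upper - lower

scores = [3, 7, 8, 12]
print(simple_range(scores))       # 9
print(range_real_limits(scores))  # 12.5 - 2.5 = 10.0
```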
12
Q

Disadvantages of range

A
  • The major disadvantage of range is that, just like mode, it’s unreliable.
  • The range can be changed drastically by removing or adding just one score in the
    distribution.
13
Q

Semi-interquartile range

A
  • This measure of variability can be used for open-ended distributions.
  • The interquartile range is obtained by subtracting the 25th percentile from the 75th
    percentile. The semi-interquartile range is half the interquartile range.
  • It is not affected much by adding extreme scores to, or removing them from, a
    distribution.
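A sketch contrasting the semi-interquartile range with the range when one extreme score is added, using `statistics.quantiles` (Python 3.8+). Percentile conventions differ between textbooks and software; `method="inclusive"` is one choice among several, so other tools may give slightly different quartiles.

```python
from statistics import quantiles

def semi_interquartile_range(scores):
    """Half the distance between the 75th and 25th percentiles."""
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = base + [100]  # one extreme score added

print(semi_interquartile_range(base), max(base) - min(base))  # 2.0 8
print(semi_interquartile_range(with_extreme),
      max(with_extreme) - min(with_extreme))                  # 2.25 99
```

The single extreme score barely moves the semi-interquartile range but changes the range drastically.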
14
Q

Mean deviation

A
  • It measures the distance of every score from the mean (the midpoint of the distribution) and averages these distances, ignoring their sign.
15
Q

Deviation score = ?

A

Individual score - Mean

16
Q

Mean deviation calculation

A
  • Compute the deviation score of every individual score.
  • Take the mean of all deviation scores, using their absolute values (otherwise they would sum to zero).
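The two steps above, sketched in Python; the `mean_deviation` name is hypothetical and the data are made up.

```python
from statistics import mean

def mean_deviation(scores):
    """Average absolute deviation of the scores from their mean."""
    m = mean(scores)
    deviations = [x - m for x in scores]     # step 1: deviation scores
    return mean(abs(d) for d in deviations)  # step 2: mean of absolute deviations

# |deviations| are 3, 1, 1, 3, so the mean deviation is 2
print(mean_deviation([2, 4, 6, 8]))
```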
17
Q

What is variance also referred to as?

A

Mean square

18
Q

SS = ?

A

Summation of (individual score - mean)^2, i.e., the sum of squared deviations from the mean

19
Q

Variance = ?

A

SS / N (population variance; for a sample, SS is divided by the degrees of freedom, N - 1)
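A sketch of SS and the population variance (SS / N) on a small made-up data set; the function names are hypothetical, and `statistics.pvariance` gives the same value.

```python
def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def population_variance(scores):
    """Variance (mean square) = SS / N."""
    return sum_of_squares(scores) / len(scores)

scores = [1, 3, 5, 7]
print(sum_of_squares(scores))       # 20.0
print(population_variance(scores))  # 5.0
```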

20
Q

Standard deviation

A
  • Calculated by taking the square root of the variance
  • Also called the root mean square (deviation)
  • Strongly affected by scores with large deviations, because deviations are squared before averaging
21
Q

Properties of standard deviation

A
  • If a constant is added to (or subtracted from) every score in a distribution, the standard deviation
    is not affected.
  • If every score is multiplied (or divided) by the same constant, the standard deviation is
    multiplied (or divided) by that constant (by its absolute value if the constant is negative).
  • The standard deviation computed around the mean is smaller than the root mean square deviation
    around any other point in the distribution.
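A numerical check of these properties on one made-up data set, using `statistics.pstdev` (population SD). `rms_deviation_about` is a hypothetical helper for the root mean square deviation about an arbitrary point.

```python
from statistics import pstdev

scores = [1, 3, 5, 7]
sd = pstdev(scores)  # population SD: square root of SS / N

# Adding a constant to every score leaves the SD unchanged.
shifted_sd = pstdev([x + 100 for x in scores])
# Multiplying every score by a constant multiplies the SD by that constant.
scaled_sd = pstdev([x * 2 for x in scores])

def rms_deviation_about(point):
    """Root mean square deviation of the scores about an arbitrary point."""
    return (sum((x - point) ** 2 for x in scores) / len(scores)) ** 0.5

print(sd, shifted_sd, scaled_sd)
# The RMS deviation about the mean (4) is smaller than about, e.g., 5.
print(rms_deviation_about(4), rms_deviation_about(5))
```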
22
Q

What is positive skewness?

A

Positive skewness describes an asymmetrical
distribution with a long right tail.

23
Q

What is negative skewness?

A

Negative skewness describes an
asymmetrical distribution with a long left tail.

24
Q

Skewness = ?

A

Summation of (individual score - mean)^3 / N, divided by the cube of the standard deviation (σ^3) so that the result is unitless
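A sketch of the standardized (population) skewness: the mean cubed deviation divided by σ cubed. The `skewness` name and the data are illustrative; statistical software may apply small-sample corrections that this sketch omits.

```python
def skewness(scores):
    """Population skewness: mean cubed deviation divided by sigma cubed."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n  # population variance
    m3 = sum((x - mu) ** 3 for x in scores) / n  # mean cubed deviation
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))         # 0.0 (symmetric)
print(skewness([1, 1, 2, 2, 10]))        # positive: long right tail
print(skewness([-10, -2, -2, -1, -1]))   # negative: long left tail
```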

25
Q

Central tendencies of a skewed distribution

A
  • When the distribution is negatively skewed, the mean will be to the left of the median
  • When the distribution is positively skewed, the mean will be to the right of the median
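A small made-up example showing the mean pulled toward the long tail relative to the median; the negatively skewed set is simply the mirror image of the positively skewed one.

```python
from statistics import mean, median

positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long right tail
negatively_skewed = [-x for x in positively_skewed]  # long left tail

print(mean(positively_skewed), median(positively_skewed))  # 4.75 3.0
print(mean(negatively_skewed), median(negatively_skewed))  # -4.75 -3.0
```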
26
Q

Important distinction

A

Two distributions can both be symmetric (i.e., skewness equals
zero), unimodal, and bell-shaped and yet not be identical in shape (for example, they can differ in kurtosis).

27
Q

How can kurtosis be measured?

A

By raising deviations from the mean to the fourth power,
taking their average, and then dividing by the square of the population variance.

28
Q

What does negative kurtosis indicate?

A

Relatively thin tails and less peakedness in the middle (a
platykurtic distribution).

29
Q

What does positive kurtosis indicate?

A

Relatively fat tails and
more peakedness in the middle of the distribution (a leptokurtic distribution).

30
Q

What is a mesokurtic distribution?

A

A distribution with the same kurtosis as the normal distribution: 3 under the fourth-power formula,
or zero when the measure is set relative to the normal distribution by subtracting 3 from that formula.

31
Q

What is kurtosis measured relative to?

A

Relative to the kurtosis of a normal distribution, which
is 3. Therefore, we are typically interested in the “excess” kurtosis.

32
Q

Excess kurtosis = ?

A

Excess kurtosis = sample kurtosis – 3
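A sketch of the fourth-power kurtosis measure and excess kurtosis, computed with population moments (no small-sample correction, which some software applies). Function names and data are illustrative: a flat, evenly spread set gives negative excess kurtosis, while a set concentrated at the centre with two extreme scores gives positive excess kurtosis.

```python
def kurtosis(scores):
    """Mean fourth-power deviation divided by the squared population variance."""
    n = len(scores)
    mu = sum(scores) / n
    var = sum((x - mu) ** 2 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    return m4 / var ** 2

def excess_kurtosis(scores):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(scores) - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]              # platykurtic
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 10]   # leptokurtic
print(excess_kurtosis(flat), excess_kurtosis(heavy_tailed))
```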

33
Q

What is kurtosis used for quantifying?

A

Non-normality, i.e., how far a distribution deviates from a normal distribution.

34
Q

What does a kurtosis value of 3 or more indicate?

A

A large departure from normality.

35
Q

What does a very small value of kurtosis indicate?

A

A deviation from normality, but one that is
considered benign.

36
Q

What is population analysis?

A

Statistics applied to the whole data set (the entire population).

37
Q

What is sample analysis?

A

Statistics applied to a subset of the whole data set.

38
Q

Sample variance can be

A

Larger or smaller than population variance

39
Q

What equals the population variance?

A

The average of infinitely many sample variances (each computed with N - 1 in the denominator).
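Instead of infinitely many samples, this sketch enumerates every possible sample of size 2 drawn with replacement from a tiny hypothetical population [1, 2, 3] (all nine samples equally likely). Averaging the sample variances computed with N - 1 recovers the population variance exactly, while dividing by N underestimates it. `Fraction` keeps the arithmetic exact; the `variance` helper and its `ddof` parameter are illustrative names.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]

def variance(values, ddof):
    """SS divided by (n - ddof), kept exact with Fraction."""
    n = len(values)
    m = Fraction(sum(values), n)
    ss = sum((Fraction(x) - m) ** 2 for x in values)
    return ss / (n - ddof)

sigma2 = variance(population, ddof=0)

# Every possible sample of size 2 drawn with replacement.
samples = list(product(population, repeat=2))
avg_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)
avg_n = sum(variance(s, ddof=0) for s in samples) / len(samples)

print(sigma2, avg_n_minus_1, avg_n)  # 2/3 2/3 1/3
```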

40
Q

What are degrees of freedom?

A

The number of deviations that are free to vary

41
Q

df = ?

A

N - 1 (where N is the number of scores in the sample)

42
Q

What is confidence interval?

A

The range of likely (plausible) values of a population parameter, estimated from sample data at a stated confidence level (e.g., 95%).

43
Q

What is the standard error of the mean?

A

The standard deviation divided by the square root of the number
of observations in the sample (the sample size).
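A sketch of the standard error of the mean and an approximate 95% confidence interval for the mean, on made-up data. It uses a normal (z) critical value from `statistics.NormalDist` as a simplifying assumption; for small samples a t critical value would usually be more appropriate. Function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(sample) / len(sample) ** 0.5

def approx_ci(sample, level=0.95):
    """Normal-approximation confidence interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    m, se = mean(sample), standard_error(sample)
    return m - z * se, m + z * se

sample = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
print(standard_error(sample))  # about 0.5
print(approx_ci(sample))       # roughly (12.52, 14.48)
```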

44
Q

What is the variance?

A

The average of the squared deviations from the mean across all
observations.

45
Q

What are outliers?

A

Observations that differ strongly (have different properties) from the other data
points in a sample of a population.

46
Q

Sources of outliers

A

  • Human errors (wrong data entry)
  • Measurement errors (faulty system or tool)
  • Data manipulation errors (faulty data pre-processing)
  • Sampling errors (creating samples from heterogeneous sources)

47
Q

Methods for indicating outliers

A
  1. Tukey’s fences (or quartile method)
  2. Z-score
  3. Local Outlier Factor (LOF)
  4. Angle-Based Outlier Detection (ABOD)
  5. Silhouette (k-means clustering)
  6. Confidence interval (CI) of fit
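A sketch of the first two methods only (Tukey's fences and the z-score rule) on made-up data; the others are not shown. The multiplier 1.5 and the threshold 3.0 are common conventions, not fixed rules, and quartiles come from `statistics.quantiles` with `method="inclusive"`, one of several percentile conventions.

```python
from statistics import mean, pstdev, quantiles

def tukey_outliers(scores, k=1.5):
    """Flag scores outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < low or x > high]

def z_score_outliers(scores, threshold=3.0):
    """Flag scores whose |z| = |x - mean| / SD exceeds the threshold."""
    m, sd = mean(scores), pstdev(scores)
    return [x for x in scores if abs(x - m) / sd > threshold]

data = [10, 11, 12, 13, 14] * 4 + [60]
print(tukey_outliers(data))    # [60]
print(z_score_outliers(data))  # [60]
```

Because an extreme score inflates the standard deviation it is judged against, the z-score rule can miss outliers in small samples; the quartile-based fences are less affected.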
48
Q

What is the H-spread?

A

The length of the box in a box plot; it is equal to the interquartile range, not the semi-interquartile range.

49
Q

What are the inner fences?

A

The limits in a box plot used to flag outliers: scores beyond them are plotted individually rather than covered by the whiskers.

50
Q

Inner fence is equal to

A

1.5 times the H-spread beyond each hinge (lower hinge - 1.5 × H-spread; upper hinge + 1.5 × H-spread)

51
Q

The whiskers do not generally extend to the

A

inner fences

52
Q

The most extreme scores still inside the upper and lower inner fences are known as

A

The upper adjacent value and the lower adjacent value (the whiskers end at these values).
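A sketch tying the box-plot cards together: H-spread, inner fences, adjacent values, and scores outside the fences. It approximates the hinges by the 25th and 75th percentiles (`statistics.quantiles`, `method="inclusive"`); Tukey's original hinges can differ slightly for some sample sizes. The `box_plot_summary` name and dictionary keys are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(scores):
    """Return H-spread, inner fences, adjacent values and outside values."""
    lower_hinge, _median, upper_hinge = quantiles(scores, n=4, method="inclusive")
    h_spread = upper_hinge - lower_hinge  # length of the box (IQR)
    step = 1.5 * h_spread
    lower_fence, upper_fence = lower_hinge - step, upper_hinge + step  # inner fences
    inside = [x for x in scores if lower_fence <= x <= upper_fence]
    return {
        "h_spread": h_spread,
        "inner_fences": (lower_fence, upper_fence),
        # Whiskers end at the adjacent values, not at the fences themselves.
        "adjacent_values": (min(inside), max(inside)),
        "outside_values": sorted(x for x in scores if x < lower_fence or x > upper_fence),
    }

print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Here the whiskers run from 1 to 9 while the inner fences sit at -3.5 and 14.5, so the whiskers do not reach the fences.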