descriptive statistics (A) Flashcards

1
Q

what is meant by descriptive statistics?

A

Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either an entire population or a sample of it.

2
Q

which descriptive statistics are covered in this set?

A
  • measures of central tendency
  • measures of spread (dispersion)

3
Q

what are the measures of central tendency?

A
  • mean / average
  • mode / popular
  • median / middle
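
The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the list `scores` is made-up example data, not taken from the cards.

```python
import statistics

# Hypothetical example data (not from the original cards)
scores = [2, 3, 3, 5, 7, 10]

mean_score = statistics.mean(scores)      # sum of the values divided by how many there are
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

With an even number of values, `statistics.median` averages the two middle values.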
4
Q

what are the measures of dispersion?

A
  • range
  • IQR
  • standard deviation
5
Q

what is meant by measures of central tendency?

A

the term average is used in everyday life to express an amount that is typical for a group of people / things

it describes the typical tendency of a group

a measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that data set

6
Q

why is average useful?

A

useful indicator of the general trend

  • summarise large amounts of data
  • indicate that there is some variability around the single value within original data
7
Q

what are the pros of the mean?

A
  • most popular measure
  • yields one distinct answer
  • useful for comparing data sets
8
Q

what are the cons of the mean?

A
  • answer can be affected by extreme values (outliers) or when skewed data is present
  • mean has tendency to be pulled towards extreme values
9
Q

what are the pros of the median?

A
  • unaffected by extreme values
  • if the data is skewed, it may be a more informative descriptive measure than the mean
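
To illustrate how an extreme value pulls the mean but barely moves the median, here is a small sketch. The `incomes` values are invented for the example.

```python
import statistics

# Hypothetical values (in thousands); not from the original cards
incomes = [20, 22, 25, 27, 30]
with_outlier = incomes + [500]  # add one extreme value

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves a long way towards the extreme value, while the median shifts only slightly.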

10
Q

what are the cons of the median?

A
  • less amenable to further statistical analysis than the mean
11
Q

what are the pros of the mode?

A
  • easy to obtain
  • the only measure that can be used for data on a nominal scale

12
Q

what are the cons of the mode?

A
  • it is not very stable from sample to sample
  • there may be more than one mode for a particular data set

13
Q

what is non skewed data?

A

data that is distributed symmetrically around the centre (e.g. a normal distribution)

the mean is preferred

14
Q

what is skewed data?

A

data with an asymmetrical distribution, where extreme values (outliers) stretch out one side

the median is less affected by skewed data and is generally considered to be the best measure

15
Q

when do you use the mode?

A

for nominal data
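
As a sketch of the mode on nominal data, where a mean or median would be meaningless: the category labels below are invented, and `statistics.multimode` (Python 3.8+) is used to show that a data set can have more than one mode.

```python
import statistics

# Hypothetical nominal data: categories with no numeric order
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
most_common = statistics.mode(eye_colours)

# A data set can have several modes; multimode returns all of them
all_modes = statistics.multimode(["a", "a", "b", "b", "c"])

print(most_common, all_modes)
```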

16
Q

when do you use the median?

A

ordinal

interval/ratio (skewed)

17
Q

when do you use the mean?

A

interval/ratio (non skewed)

18
Q

what are measures of spread?

A

describe how similar or varied the set of observed values are for a particular variable

large spread = large differences
seen as positive if there is little deviation

19
Q

what are the types of measures of spread?

A
  • range
  • quartiles
  • deviation (absolute deviation, variance, standard deviation)
20
Q

what are the types of deviation?

A

absolute deviation, variance and standard deviation

21
Q

what is the range?

A

difference between the highest and the lowest value
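
A one-line sketch of the calculation, using made-up values:

```python
# Hypothetical example values (not from the original cards)
ages = [18, 21, 25, 34, 47]

value_range = max(ages) - min(ages)  # highest value minus lowest value
print(value_range)
```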

22
Q

what are the pros of range?

A
  • useful when measuring variable with a critical low/high threshold that should not be crossed e.g. drinking age
  • easy to calculate
23
Q

what are the cons of the range?

A
  • value is sensitive to outliers
24
Q

what are quartiles?

A

tell us about the spread of a data set by breaking it into quarters, just as the median breaks it in half

25
Q

how many quartiles are there?

A

three:
Q1 (lower quartile)
Q2 (the median)
Q3 (upper quartile)

26
Q

how do you work out the quartiles?

A
  • find the median of the data set; this is Q2
  • then work out the median of the values below it to find Q1, and the median of the values above it to find Q3
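
The median-of-halves method described above can be sketched as follows. The function name `quartiles_by_halves` and the sample data are hypothetical, and the choice to leave the middle value out of both halves when the count is odd is an assumption; textbooks and software use several slightly different quartile conventions.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two values. When the number of values is odd,
    the middle value is excluded from both halves (one common convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

q1, q2, q3 = quartiles_by_halves([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)
```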

27
Q

what is the interquartile range?

A

the difference between 1st and 3rd quartile

IQR = Q3 - Q1

28
Q

what is a box plot?

A

visual description of the distribution based on the minimum, Q1, median, Q3 and maximum
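
The five values a box plot is drawn from can be gathered like this. It is only a sketch: `five_number_summary` is a hypothetical helper, and `statistics.quantiles` uses its default "exclusive" method, which may give slightly different quartiles from a hand calculation.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # three cut points
    return min(values), q1, median, q3, max(values)

# Hypothetical example data
summary = five_number_summary([1, 3, 5, 7, 9, 11, 13])
print(summary)
```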

29
Q

what is an outlier?

A

an observation which does not appear to belong with the other data (e.g. due to a measurement or recording error, or equipment failure)
outliers need to be identified because they can skew the data

30
Q

how do you find outliers?

A

lower fence = Q1 - 1.5 x IQR
upper fence = Q3 + 1.5 x IQR

anything outside of these fences is considered an outlier
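
A minimal sketch of the fence rule, assuming quartiles from `statistics.quantiles` with its default method. The helper name `find_outliers` and the data are made up for illustration.

```python
import statistics

def find_outliers(values):
    """Return the values lying outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                  # interquartile range: Q3 - Q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data with one suspiciously large value
outliers = find_outliers([10, 12, 13, 14, 15, 16, 18, 60])
print(outliers)
```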

31
Q

what is the disadvantage to quartiles?

A

does not take every score into account; the IQR only describes the spread of the middle 50% of the data

32
Q

what is the difference between mean absolute deviation and standard deviation?

A

both look at the distance of each data point from the mean
standard deviation uses the square of each difference

mean absolute deviation uses only the absolute difference

33
Q

how do you work out mean absolute deviation?

A
  • find the mean of all values
  • find the distance of each value from that mean (subtract the mean from each value and ignore any minus sign)
  • find the mean of those distances

SUM OF |x - u| / n

x = each value
u = mean
n = number of values
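
The steps above can be sketched in a few lines. The function name and the sample data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean, ignoring sign."""
    mean = sum(values) / len(values)             # step 1: mean of all values
    distances = [abs(x - mean) for x in values]  # step 2: absolute distances
    return sum(distances) / len(distances)       # step 3: mean of the distances

# Hypothetical example data
mad = mean_absolute_deviation([2, 4, 6, 8])
print(mad)
```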
34
Q

what is the variance?

A

the average of the squared differences from the mean

35
Q

how do you calculate the variance?

A
  • work out the mean
  • for each number subtract the mean and square the result
  • then work out the average of those squared differences, dividing by n - 1 instead of n

SUM OF (x - u)^2 / (n - 1)
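
As a sketch, the same calculation written out step by step and compared with the standard library. `sample_variance` is a hypothetical helper and `data` is invented; `statistics.variance` divides by n - 1, while `statistics.pvariance` divides by n.

```python
import statistics

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)

# Hypothetical example data
data = [2, 4, 6, 8]

by_hand = sample_variance(data)
from_library = statistics.variance(data)  # sample variance (n - 1)
population = statistics.pvariance(data)   # population variance (n)

print(by_hand, from_library, population)
```

Dividing by n - 1 always gives a slightly larger result than dividing by n.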

36
Q

why do you use n-1 for standard deviation and variance?

A

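to overcome bias: a sample tends to underestimate the spread of the population, and dividing by n - 1 instead of n corrects for this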

37
Q

what are the problems with sample variance?

A
  • as the differences are squared, this gives more weight to extreme scores
  • if the data contains outliers, the variance may not represent the data as a whole
  • not in the same units as the data (units are squared), so it cannot be directly related to the original values
38
Q

what is standard deviation?

A

measure of the spread of scores within a data set

use in conjunction with the mean to summarise continuous data
- normally appropriate if the data is not skewed and has no outliers

39
Q

how do you work out standard deviation?

A
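(square root of the variance)
  • work out the mean
  • subtract the mean from each value and square the result
  • add up all the squared differences
  • find the mean of those squared differences (dividing by n - 1 for a sample)
  • find the square root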
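
The steps above can be sketched as follows, assuming the sample form (dividing by n - 1). `sample_std_dev` is a hypothetical helper and the data is invented; the result is checked against `statistics.stdev`.

```python
import math
import statistics

def sample_std_dev(values):
    """Square root of the sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n                             # work out the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # subtract mean, square
    variance = sum(squared_diffs) / (n - 1)            # average, using n - 1
    return math.sqrt(variance)                         # square root

# Hypothetical example data
data = [2, 4, 6, 8]

by_hand = sample_std_dev(data)
from_library = statistics.stdev(data)
print(by_hand, from_library)
```

Unlike the variance, the result is in the same units as the original data.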