Statistics formula Flashcards

1
Q

What is the formula for standard deviation?

A

sqrt(Sxx / (n-1))

2
Q

What is sxx?

A

(X1-meanx)^2 + (X2-meanx)^2 + … + (Xn-meanx)^2

- Also Sigma(x^2) - n(xbar)^2, where xbar is the mean
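
A minimal sketch of the two ways of working out Sxx, then the sample variance and standard deviation from it. The data values and the function names are made up for illustration.

```python
import math

def sxx(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

def sxx_shortcut(data):
    """Equivalent form: sum of x^2 minus n times the mean squared."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) - n * mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean is 5

s_xx = sxx(data)                    # 32
variance = s_xx / (len(data) - 1)   # sample variance, divides by n - 1
sd = math.sqrt(variance)            # sample standard deviation
```

Both functions should agree up to floating-point rounding; the shortcut form is usually quicker by hand.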

3
Q

What is variance?

A

s^2, so Sxx / (n-1)

4
Q

When do you use n and when do you use n-1?

A
  1. For a full population, n
  2. For a sample, n-1 (usually this)
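
As a rough check of the n versus n-1 rule, Python's standard `statistics` module has separate functions for each divisor. The data values below are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

pop_var = statistics.pvariance(data)   # population: divides Sxx by n
samp_var = statistics.variance(data)   # sample: divides Sxx by n - 1
pop_sd = statistics.pstdev(data)       # square root of pop_var
samp_sd = statistics.stdev(data)       # square root of samp_var
```

The sample versions come out slightly larger, because they divide the same Sxx by a smaller number.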

5
Q

How do you work out the probability of (AnB) if they are independent?

A

P(AnB) = P(A) x P(B)

Independent means one event does not affect the other, e.g. selecting with replacement

6
Q

How do you work out the probability of (AnB) if they are mutually exclusive?

A

P(AnB) = 0

7
Q

How do you work out P(AuB), regardless of whether they are independent or mutually exclusive?

A

P(AuB) = P(A) + P(B) - P(AnB)

8
Q

How do you work out P(A|B) always? And if independent? And if mutually exclusive?

A

P(A|B) = P(AnB) / P(B)
If independent, it is just P(A|B) = P(A), since P(AnB) = P(A) x P(B) and the P(B) cancels out
If mutually exclusive, P(A|B) = 0, as you are dividing 0 by something
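
A small sketch of the intersection, union and conditional probability rules, using one roll of a fair six-sided die. The events A and B are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

outcomes = range(1, 7)                     # one fair six-sided die
A = {x for x in outcomes if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}         # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

p_a_and_b = prob(A & B)                        # count outcomes in both
p_a_or_b = prob(A) + prob(B) - p_a_and_b       # addition rule
p_a_given_b = p_a_and_b / prob(B)              # conditional probability
independent = p_a_and_b == prob(A) * prob(B)   # test for independence
```

Here P(AnB) = 1/3 but P(A) x P(B) = 1/4, so these two events are not independent, and P(A|B) = 2/3 is not the same as P(A) = 1/2.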

9
Q

How do you always work out P(AnB)?

A

P(AnB) = P(B|A) x P(A)

10
Q

What is the linear coding for Fahrenheit and Celsius?

A
F = (9/5)C + 32
C = (5/9)(F - 32)
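
A short sketch of the two conversions, plus what linear coding does to the mean and standard deviation. The temperature readings are made up, and the coding fact used (for y = ax + b the mean becomes a x mean + b, while the sd is only multiplied by a) is the standard linear coding result rather than something stated on this card.

```python
import statistics

def c_to_f(c):
    """Celsius to Fahrenheit: F = (9/5)C + 32."""
    return 9 / 5 * c + 32

def f_to_c(f):
    """Fahrenheit to Celsius: C = (5/9)(F - 32)."""
    return 5 / 9 * (f - 32)

celsius = [10, 15, 20, 25, 30]             # made-up readings
fahrenheit = [c_to_f(c) for c in celsius]

# The mean is coded like any single value; the sd ignores the + 32.
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)
```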
11
Q

How do you calculate outliers?

A
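  • Data which is more than 1.5 x IQR below Q1 or above Q3
  • Lower limit: Q1 - 1.5 x IQR
  • Upper limit: Q3 + 1.5 x IQR
  • Or, using the mean, lower limit: mean - 2 x sd
  • Upper limit: mean + 2 x sd
  • If test scores, include all outliers (the distribution is continuous), except zeros where the candidates did not take the test
  • If you do not include an outlier, say why, e.g. a bad experiment or an answer given as a joke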
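
The two outlier rules above can be sketched as follows. The data, the quartile values passed in and the function names are all made up; the quartiles are supplied by hand because different methods for finding them give slightly different answers.

```python
import statistics

def iqr_outliers(data, q1, q3):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def sd_outliers(data):
    """Values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]

data = [3, 5, 6, 7, 8, 9, 10, 12, 30]   # made-up values

by_iqr = iqr_outliers(data, q1=5.5, q3=11)   # limits are -2.75 and 19.25
by_sd = sd_outliers(data)                    # mean is 10, sd is about 7.97
```

For this data both rules flag only the value 30, but the two rules will not always agree.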
12
Q

How do you find quartiles?

A

Look at notes

13
Q

How do you work out mean?

A

Sigma x / n

14
Q

How do you find the IQR of an even set of numbers?

A
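  • Put the data in order and split it down the middle
  • If there is an odd number of values on both sides, each half has a single middle value
  • Find the middle-position number of both halves (Q1 and Q3), then IQR = Q3 - Q1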
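
The splitting method above can be sketched as below. The data are made up, and for an odd-sized set this sketch assumes the middle value is left out of both halves, which is one common convention but not the only one.

```python
def median(sorted_values):
    """Middle value of an already sorted list (mean of the middle two if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) by splitting the ordered data into two halves."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    # Assumption: when n is odd, the median itself is excluded from both halves.
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return median(lower), median(values), median(upper)

data = [1, 3, 4, 6, 7, 9, 12, 15, 18, 20]   # ten made-up values
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
```

With ten values each half has five values, so Q1 and Q3 are single data values (4 and 15) and the IQR is 11.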
15
Q

What is msd? (Mean squared deviation)

A

sxx / n

16
Q

What is rmsd? (Root mean squared deviation)?

A

sqrt ( sxx / n )

17
Q

What is the modal group? What is the mode? What is bimodal?

A
  • The modal group is the range containing the largest proportion of the sample, i.e. the group with the most members
  • The single most frequent value is called the mode
  • Bimodal means there is a second peak, even if one frequency is higher than the other
  • Look at notes
18
Q

What is the skew if the hump is closest to the y axis?

A

Positive Skewness

19
Q

What is the skew if the hump is further from the y axis?

A

Negative skewness

20
Q

What is frequency density?

A

frequency / class width
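
A quick illustration with made-up class boundaries and frequencies, assuming each class is given as (lower boundary, upper boundary).

```python
# Each entry: ((lower boundary, upper boundary), frequency), all made up
classes = [((0, 10), 5), ((10, 20), 12), ((20, 50), 15)]

densities = [freq / (upper - lower) for (lower, upper), freq in classes]
```

The last class has the highest frequency but a width of 30, so its frequency density (0.5) is lower than that of the middle class (1.2).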

21
Q

Which boundary do you plot for cumulative frequency?

A

Upper boundary

22
Q

When do you use a vertical line graph?

A
  • Used to illustrate discrete data
  • The scores can only take integer values
  • Y axis is labelled frequency
23
Q

How do you work out the median of any data?

A

The median Q2 is the value of the (n+1)/2 th item of data, once the data are in order
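
A minimal sketch of the position rule, assuming that when (n+1)/2 is not a whole number you take the mean of the two items either side of it. The function name is made up.

```python
def median_by_position(data):
    """Median as the value at position (n + 1) / 2 in the ordered data."""
    values = sorted(data)
    position = (len(values) + 1) / 2      # 1-based position, may end in .5
    lower = values[int(position) - 1]     # item at or just below the position
    if position.is_integer():
        return lower
    upper = values[int(position)]         # item just above the position
    return (lower + upper) / 2

odd_example = median_by_position([7, 1, 5, 3, 9])   # position 3
even_example = median_by_position([1, 3, 5, 7])     # position 2.5
```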

24
Q

What are the four measures of central tendency?

A

Mean, median, mode and midrange

25
Q

If A and B are independent then P(B|A) is?

A

P(B)

26
Q

If A and B are dependent then P(B|A) is?

A
  • Not equal to P(B|A’)
  • P(B|A) = P(BnA) / P(A)

27
Q

If A and B are dependent then what is P(AnB)?

A

P(A) x P(B|A)

28
Q

If A and B are independent then what is P(AnB)?

A

P(A) x P(B)

29
Q

How do you work out P(B)?

A

P(B) = P(AnB) + P(A’nB)

30
Q

How do you work out E(X)?

A

Sigma r x P(X=r)

  • r x P(X=r) all added
  • also known as mu
31
Q

How do you work out E(X^2)?

A

Sigma r^2 x P(X=r)

32
Q

How do you work out variance?

A

E(X^2) - mu^2

33
Q

How else can you work out variance?

A

E([X-mu]^2)

Sigma (r-mu)^2 x P(X=r)
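
A small sketch of E(X), E(X^2) and both variance formulas for a discrete distribution. The distribution used (number of heads in two fair coin tosses) is a made-up example, and exact fractions are used so the two variance formulas match exactly.

```python
from fractions import Fraction

# P(X = r) for the number of heads in two fair coin tosses
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(r * p for r, p in dist.items())              # E(X)
e_x2 = sum(r ** 2 * p for r, p in dist.items())       # E(X^2)
var_shortcut = e_x2 - mu ** 2                         # E(X^2) - mu^2
var_definition = sum((r - mu) ** 2 * p for r, p in dist.items())  # E([X-mu]^2)
```

Both formulas give a variance of 1/2 here; the E(X^2) - mu^2 version usually needs less arithmetic.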

34
Q

When do you reject the null hypothesis in binomial testing?

A

When your observed value is in the critical region, i.e. the probability of a result at least as extreme (assuming Ho is true) is less than the significance level

35
Q

When do you use binomial and when do you use normal?

A
  1. Binomial:
    - success/fail
    - discrete
    - constant p(success)
36
Q

When do you use PD or CD?

A

PD: P(X=Y)
CD: (X_Y) = 1-(X_

37
Q

What is the layout of a binomial hypothesis test?

A
  1. State Ho and H1
  2. Define p and X
  3. State type of test (one tailed or two) and significance level
  4. State the test observation
  5. Find the critical region
  6. Conclude whether the result is significant and whether you reject the null hypothesis
  7. Conclude in context
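
The critical region step can be sketched for a one-tailed (lower tail) test as below. Everything specific here is a made-up example: Ho: p = 0.5, H1: p < 0.5, n = 20 trials, a 5% significance level and an observed value of 4. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) <= alpha, or None if there is no such c."""
    c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p) <= alpha:
            c = k
        else:
            break
    return c

n, p0, alpha = 20, 0.5, 0.05             # made-up test set-up
c = lower_critical_value(n, p0, alpha)   # critical region is X <= c
observed = 4                             # made-up test observation
reject_h0 = c is not None and observed <= c
```

Here P(X <= 5) is about 0.0207 and P(X <= 6) is about 0.0577, so the critical region is X <= 5 and an observation of 4 leads to rejecting Ho at the 5% level.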
38
Q

What are the properties of binomial?

A
  • Events independent of each other
  • Only two possible outcomes

39
Q

What are the properties of normal?

A
  • Rough bell shape
  • Symmetrical around mean
  • Almost all data (about 99.7%) within 3sd of the mean
  • Points of inflexion are each 1sd from the mean
40
Q

What is random sampling?

A

In a simple random sample, every person or item in the population has an equal chance of being selected and each selection is independent of every other selection

41
Q

How do you choose a random sample?

A
  1. Give a number to each population member from a full list of the population (every single possible sample is equally likely)
  2. Generate a list of random numbers using a calculator, computer, dice or random number generator
  3. Match these numbers to the population numbers to select your sample
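
The three steps above can be sketched with Python's standard `random` module. The population of 100 numbered members, the sample size of 10 and the seed are all made up; the seed is only there so the example gives the same sample each time it is run.

```python
import random

# Step 1: a full numbered list of the population (made up here)
population = [f"member_{i}" for i in range(1, 101)]

# Steps 2 and 3: pick 10 distinct members at random
rng = random.Random(42)                 # seeded generator, for repeatability
sample = rng.sample(population, k=10)   # no member is chosen twice
```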
42
Q

What are the advantages of random sampling?

A
  1. Every member of population has equal chance of being selected so completely unbiased
43
Q

What are the disadvantages of random sampling?

A
  1. Inconvenient if the population is spread over a large area, and it may be difficult to track down the selected members (e.g. in a nationwide sample)
44
Q

How do you choose in systematic sampling?

A
  • Every Nth member selected
    1. Give a number to each population member from a full list of the population
    2. Calculate a regular interval to use by dividing the population size by sample size e.g. every tenth
    3. Generate a random starting point that is less than or equal to size of interval. Corresponding member of population is first member of sample. Keep adding interval to the starting point to select your sample
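
A minimal sketch of these steps, assuming the population size divides exactly by the sample size. The population, sample size, seed and function name are made up for illustration.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Every nth member, starting from a random point within the first interval."""
    interval = len(population) // sample_size   # e.g. 100 // 10 gives every tenth
    start = rng.randint(1, interval)            # 1-based start, at most the interval
    positions = range(start, len(population) + 1, interval)
    return [population[pos - 1] for pos in positions][:sample_size]

population = list(range(1, 101))   # members numbered 1 to 100 (made up)
sample = systematic_sample(population, 10, rng=random.Random(1))
```

Only the starting point is random; after that every member of the sample is fixed by the interval.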
45
Q

What are the advantages of systematic sample?

A
  1. Can be used for quality control on a production line (a machine can be set up to sample every nth item)
  2. Should also give an unbiased sample
46
Q

What are the disadvantages of systematic sample?

A
  1. Regular interval could coincide with a pattern and lead to bias (e.g. if every tenth item is faulty and you sample every tenth item, it could seem like every item is faulty)
47
Q

What is opportunity sampling? (convenience sampling)

A

Where the sample is chosen from a section of the population at a particular place and time (whatever is convenient for the sampler)

48
Q

What are advantages of opportunity sampling?

A

Data gathered quickly and easily

49
Q

What are the disadvantages of opportunity sampling?

A

It is not random, so it can be very biased and unrepresentative

50
Q

What is stratified sampling?

A
  • Population divided into categories (e.g. age or gender)
  • Uses the same proportion of each category in the sample as there is in the population

51
Q

How do you choose stratified sampling?

A
  1. Divide population into categories
  2. Calculate total population
  3. Calculate the number needed for each category in the sample using size of category in sample = (size of category in pop / total size of pop) x total sample size
  4. Select the sample for each category at random
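
The calculation in step 3 can be sketched as below. The category names, sizes and sample size are made up, and rounding to the nearest whole number is an assumption: with awkward numbers the rounded values may not add up to the exact sample size and would need adjusting by hand.

```python
def stratified_allocation(category_sizes, sample_size):
    """Number to sample from each category, in proportion to its size."""
    total = sum(category_sizes.values())
    return {
        name: round(size * sample_size / total)
        for name, size in category_sizes.items()
    }

# Made-up population of 1000 split by age group
categories = {"under 18": 200, "18 to 64": 600, "65 and over": 200}
allocation = stratified_allocation(categories, 50)
```

Each category then has its own random sample taken, of the size given in `allocation`.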
52
Q

What are advantages of stratified sampling?

A
  1. If population can be divided up into distinct categories (e.g. age) it is likely to give a representative sample and it is useful when result may vary depending on category
53
Q

What are disadvantages of stratified sampling?

A
  1. It is not useful when there are not any obvious categories
  2. It can be expensive because of the extra detail involved

54
Q

How do you choose a quota sample?

A
  1. Divide the population into categories
  2. Give each category a quota (no of members to sample)
  3. Collect data until quotas are met in all categories (without using random sampling)
55
Q

What are the advantages of quota sampling?

A
  1. It can be done when there is not a full list of the population
  2. Every sample member responds because the interviewer continues to sample until all quotas are met
56
Q

What are the disadvantages of quota sampling?

A
  1. Can be easily biased by the interviewer, e.g. they could exclude some of the population
57
Q

How do you choose a cluster sample?

A
  1. Divide the population into clusters covering the whole population, where no member of the population belongs to multiple clusters
  2. Randomly select clusters to use in the sample, based on the required sample size
  3. Either use all of the members of the selected clusters (a one stage cluster sample) or randomly sample within each cluster to form the sample (a two stage cluster)
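
A minimal sketch of one stage and two stage cluster sampling. The clusters (four schools of five pupils each), the seeds and the function name are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, rng=random):
    """Randomly pick clusters, then take all members or a random few from each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                           # one stage
        else:
            sample.extend(rng.sample(members, per_cluster))  # two stage
    return sample

# Made-up clusters: no pupil belongs to more than one school
clusters = {
    f"school_{s}": [f"school_{s}_pupil_{i}" for i in range(1, 6)]
    for s in "abcd"
}

one_stage = cluster_sample(clusters, 2, rng=random.Random(0))
two_stage = cluster_sample(clusters, 2, per_cluster=2, rng=random.Random(0))
```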
58
Q

What is the difference between clusters and categories?

A
  • Categories should be groups that you expect to give different results to each other
  • While clusters should give similar results
59
Q

What is the difference between quota and stratified sampling?

A

No attempt is made to be random in quota sampling and it is often used in market research

60
Q

When should the correlation coefficient and rank correlation be used?

A
  • The correlation coefficient measures linear correlation
  • Rank correlation is linked to association rather than correlation, and should be used if the points are close to a curve