Stats Flashcards

1
Q

median is also known as

A

Q2, second quartile

2
Q

how to calculate the median, and rules for calculating the median

A

take n and divide it by 2
- if n/2 is a decimal, round up and take the value at that position
- if n/2 is a whole number, take the average of the value at that position and the next one

3
Q

what does n represent

A

the sample size

4
Q

how to calculate the mean from raw data

A

sum of x / n

5
Q

how to calculate the mean from a frequency table

A

sum of fx / sum of f
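
A minimal Python sketch of this formula, assuming the table is held as (x, f) pairs. The function name and the table values are made up for illustration.

```python
def mean_from_frequency_table(table):
    """table is a list of (x, f) pairs; returns sum of fx / sum of f."""
    total_fx = sum(x * f for x, f in table)
    total_f = sum(f for _, f in table)
    return total_fx / total_f

# Made-up table: the value x occurs f times
table = [(1, 2), (2, 5), (3, 3)]
print(mean_from_frequency_table(table))  # sum of fx = 21, sum of f = 10
```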

6
Q

median
lower quartile
upper quartile equations

A
  • median (Q2): n/2
  • lower quartile (Q1): n/4
  • upper quartile (Q3): 3n/4
7
Q

how to find the xth percentile

A

position of Px = x * n / 100 (Px is the data value at that position)

8
Q

what is discrete data
what is continuous data

A

discrete data can only take certain separate values (a countable number of them)

continuous data can take any value in a range (infinitely many possible values)

9
Q

examples of continuous data

A
  • grouped data
  • time
10
Q

rules for discrete data

A

if you get a decimal number, round up

if you get a whole number, take the average of the value at that position and the value after it
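
A short Python sketch of these two rules, applied to the positions n/4, n/2 and 3n/4. The helper name `value_at_position` and the data values are made up, and positions are counted from 1.

```python
import math

def value_at_position(sorted_data, pos):
    """Apply the discrete-data rule to a 1-based position such as n/4 or n/2."""
    if pos != int(pos):
        # decimal position: round up and take that value
        return sorted_data[math.ceil(pos) - 1]
    k = int(pos)
    # whole-number position: average the kth value and the one after it
    return (sorted_data[k - 1] + sorted_data[k]) / 2

data = sorted([2, 4, 4, 5, 7, 8, 9, 11, 12, 15])  # made-up values, n = 10
n = len(data)
q1 = value_at_position(data, n / 4)      # position 2.5 -> 3rd value
q2 = value_at_position(data, n / 2)      # position 5 -> average of 5th and 6th
q3 = value_at_position(data, 3 * n / 4)  # position 7.5 -> 8th value
print(q1, q2, q3, q3 - q1)               # last number is the interquartile range
```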

11
Q

rules for continuous data

A
  • check for gaps in the data
  • take exact values
  • use linear interpolation
12
Q

what to do if your data has gaps

A

rewrite the frequency classes in boundary form

eg 300 - 349 would be 299.5 - 349.5
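
A rough Python sketch of closing the gaps and then using linear interpolation on the exact position. It assumes the values were recorded to the nearest whole number, so each boundary moves by 0.5. The function names, classes and frequencies are made up.

```python
def close_gaps(classes):
    """Turn gapped classes like (300, 349) into boundary form (299.5, 349.5)."""
    return [(low - 0.5, high + 0.5) for low, high in classes]

def interpolate(boundaries, freqs, pos):
    """Estimate the value at the exact position pos by linear interpolation."""
    cumulative = 0
    for (low, high), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= pos:
            # go the fraction (pos - cumulative) / f of the way across the class
            return low + (pos - cumulative) / f * (high - low)
        cumulative += f
    raise ValueError("position is beyond the total frequency")

classes = [(300, 349), (350, 399), (400, 449)]  # made-up classes
freqs = [10, 20, 10]                            # made-up frequencies
bounds = close_gaps(classes)
n = sum(freqs)
print(bounds[0])                                # first class in boundary form
print(interpolate(bounds, freqs, n / 2))        # median estimate, position 20
```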

13
Q

INTERQUARTILE RANGE

A

Q3 - Q1

14
Q

How to find the distance between the nth and mth percentiles (interpercentile range)

A

Pm - Pn

15
Q

what are the variance and standard deviation of a data set

A

both measure how spread out the data values are from the mean
- variance is the average squared distance from the mean, so it is in squared units
- standard deviation is the square root of the variance
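
A small Python sketch of both measures. It assumes the variance is found by dividing by n (some courses divide by n - 1 for a sample). The data values are made up.

```python
import math

def variance(values):
    """Mean of the squared distances from the mean (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
var = variance(data)
sd = math.sqrt(var)              # standard deviation = square root of variance
print(var, sd)
```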

16
Q

what is random sampling

A

when each sampling unit in our sampling frame has an equal chance of being chosen in order to avoid bias

17
Q

what is simple random sampling, and what are its advantages and disadvantages

A
  • every sampling unit in sampling frame has equal chance of being selected
  • each item in sampling frame has an identifying number
  • use random number generator

adv:
- bias free
- easy/cheap to implement
- fair

disadv:
- not suitable when population size is large

18
Q

what is systematic sampling
what are its advantages and disadvantages

A
  • when required elements are chosen at regular intervals in an ordered list

adv:
- simple/quick to use
- suitable for larger samples

disadv:
- you need to have a list
- can introduce bias if sampling frame not random

19
Q

how to carry out systematic sampling

A
  • work out the interval k = pop size / sample size
  • pick a random starting number from 1 to k
  • eg if k = 50, pick 17
  • then select every 50th person, eg 17, 67, 117
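
A minimal Python sketch of these steps, assuming the population size divides exactly by the sample size. The function name and the list of IDs are hypothetical.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Pick a random start in the first interval, then take every kth item."""
    k = len(population) // sample_size  # interval = pop size / sample size
    start = rng.randint(1, k)           # random start from 1 to k (1-based)
    positions = range(start, len(population) + 1, k)
    return [population[p - 1] for p in positions][:sample_size]

people = list(range(1, 1001))           # hypothetical ordered list of 1000 IDs
sample = systematic_sample(people, 20)  # interval k = 50
print(sample[:3])                       # three IDs, each 50 apart
```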
20
Q

what is stratified sampling
how is it carried out
advantages and disadvantages

A
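  • population divided into groups (strata) and a simple random sample is carried out in each group
  • used when sample is large and population naturally divides into groups
  • take the same proportion (sample size / pop size) from each stratum, ie number sampled = stratum size x sample size / pop size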

adv:
- reflects population structure
- guarantees proportional representation of groups within population

disadv:
- population must be clearly classified into distinct strata
- selection within each stratum suffers from the same disadvantages as simple random sampling
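
A short Python sketch of how the sample might be shared between the strata in proportion to their sizes. The function name and the year-group figures are hypothetical, and rounding can leave the total slightly off the target sample size.

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """Number to sample from each stratum = stratum size x sample size / pop size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * sample_size / population)
        for name, size in strata_sizes.items()
    }

# Hypothetical school with three year groups, sample of 30 from 300 pupils
strata = {"Year 7": 120, "Year 8": 100, "Year 9": 80}
print(stratum_sample_sizes(strata, 30))
```

A simple random sample of the stated size would then be taken inside each group.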

21
Q

why would random sampling be problematic

A

might not know the sampling frame

22
Q

non random sampling - quota sampling
adv and disadv

A
  • divide population into groups according to characteristic of interest
  • choose people within each group via suitable means until the quota for each group is filled

adv:
- allows small sample to still be representative of population
- no sampling frame required
- quick, easy, inexpensive
- allows for easy comparison between different groups in population

disadv:
- non random sampling can introduce bias
- population must be divided into groups, which can be costly or inaccurate
- increasing scope of study increases number of groups, adding time/expense
- non responses are not recorded

23
Q

non random sampling - opportunity/convenience sampling

A
  • sample taken from people who are available at the time of study
  • interviewer selects the actual sampling units according to the set criteria

adv:
- easy and inexpensive to carry out

disadv:
- unlikely to provide a representative sample
- highly dependent on the individual researcher

24
Q

types of data

A
  • qualitative/categorical - non numerical values
  • quantitative - numerical data
  • discrete - can only take specific values eg shoe size - part of quantitative
  • continuous - can take any decimal value - part of quantitative
25
Q

locations of the 5 uk weather stations
3 international weather stations

A

UK
- Heathrow
- Camborne - on the coast
- Hurn - on the coast
- Leeming
- Leuchars - on the coast

INTERNATIONAL
- Beijing - northern hemisphere
- Jacksonville - northern hemisphere, in Florida, hot/tropical, lots of hurricanes and tornadoes
- Perth - southern hemisphere

26
Q

difference between northern and southern hemisphere

A

when northern is in winter southern is in summer and vice versa

27
Q

rules for events which are independent

A

Two events A and B are independent if the occurrence of one event does not affect the probability of the other. This means:

P(A ∩ B) = P(A) × P(B)

  • The probability of both events occurring is the product of their individual probabilities.
  • Independent events can overlap—they are not necessarily disjoint
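
A small Python check of the product rule, using two fair dice as a made-up example. The events and helper names are hypothetical. Exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

p_a = prob(first_is_even)
p_b = prob(second_is_six)
p_both = prob(lambda o: first_is_even(o) and second_is_six(o))
# The events overlap (p_both is not 0) and still satisfy the product rule
print(p_a, p_b, p_both, p_both == p_a * p_b)
```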
28
Q

rules for mutually exclusive events

A
  • Two events A and B are mutually exclusive if they cannot both occur at the same time. This means:
    P(A ∩ B) = 0
    and
    P(A ∪ B) = P(A) + P(B)
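
A small Python check of both conditions, using one roll of a fair die as a made-up example. The events and helper names are hypothetical.

```python
from fractions import Fraction

faces = range(1, 7)  # one roll of a fair six-sided die

def prob(event):
    """Probability of an event given as a true/false test on a face."""
    return Fraction(sum(1 for x in faces if event(x)), len(faces))

def less_than_3(x):
    return x < 3

def more_than_4(x):
    return x > 4

p_a = prob(less_than_3)
p_b = prob(more_than_4)
p_and = prob(lambda x: less_than_3(x) and more_than_4(x))  # cannot both happen
p_or = prob(lambda x: less_than_3(x) or more_than_4(x))
print(p_and == 0, p_or == p_a + p_b)
```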
29
Q

why might you include an outlier in the readings
and
why might you not include it

A
  • include: it is still part of the data set
  • include: it may be a genuine reading rather than an anomaly
  • exclude: it is an outlier, so it may not be representative of the typical data
  • exclude: it might be a recording mistake
30
Q

in y = a + bx, what do the coefficients represent

A
  • the coefficient a is the y-intercept, the value of y when x = 0
  • the coefficient b is the gradient, the change in y for each unit change in x
31
Q

what is interpolation (scatter diagram) and what is extrapolation

A
  • interpolation is making a prediction based on a value inside the range of the data
  • extrapolation is making a prediction based on a value outside the range of the data, and gives a less reliable estimate
32
Q

how predictions are made with scatter diagrams

A
  • only make predictions for the dependent variable
  • the independent variable is used to make the prediction
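
A minimal Python sketch of predicting the dependent variable y from the independent variable x with a line y = a + bx, and labelling the prediction as interpolation or extrapolation. The function name, the line and the data range are all hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y from x with y = a + b*x and say whether x is inside the data range."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical regression line y = 2 + 0.5x fitted to x values from 10 to 40
print(predict(2, 0.5, 20, 10, 40))  # x = 20 is inside the range
print(predict(2, 0.5, 60, 10, 40))  # x = 60 is outside the range, less reliable
```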