Statistics Year 1 Flashcards

1
Q

population

A

the whole set of items that are of interest

2
Q

census

A

+ It should give a completely accurate result

  • Time consuming and expensive
  • Cannot be used when the testing process destroys the item
  • Hard to process large quantity of data
3
Q

sample

A

+ Less time consuming and expensive than a census
+ Fewer people have to respond
+ Less data to process than in a census

  • The data may not be as accurate
  • The sample may not be large enough to give information about small subgroups of the population
4
Q

population
census
sampling frame
sampling units

A

example: finding the average height of people at a school
population: everyone who has ever walked into the school
census: you would have to find everyone who has ever walked into the school, which is impossible!
sampling frame: a practical list from which you can pick people to survey, e.g. a list of all students and teachers; a narrowed-down focus that is the best we can get of the population
sampling units: the individual people on that list

5
Q

bias

A

sample doesn’t represent population fairly

6
Q

random sampling

A

every sampling unit has an equal chance of being picked

+ Free of bias
+ Easy and cheap to implement for small populations and small samples
+ Each sampling unit has a known and equal chance of selection

  • Not suitable when the population size or the sample size is large
  • A sampling frame is needed
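A minimal sketch of simple random sampling, assuming the sampling frame is just a list of ID numbers (the frame, the function name and the seed are made up for illustration):

```python
import random

# Hypothetical sampling frame: ID numbers 1 to 50 for 50 students.
sampling_frame = list(range(1, 51))

def simple_random_sample(frame, size, seed=None):
    """Pick `size` distinct sampling units, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(frame, size)

sample = simple_random_sample(sampling_frame, 5, seed=1)
print(sample)
```

Every ID needs to be on the list before anything can be picked, which is why a sampling frame is needed.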
7
Q

systematic sampling

A

the required elements are chosen at regular intervals from an
ordered list

+ Simple and quick to use
+ Suitable for large samples and large
populations

  • People who are picked may not want to take part in the survey
  • A sampling frame is needed
  • It can introduce bias if the sampling frame is not random
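A rough sketch of systematic sampling, assuming an ordered list of 100 people and a random starting point within the first interval (the list and function name are hypothetical):

```python
import random

def systematic_sample(frame, size, seed=None):
    """Take every k-th item from an ordered list, starting at a random point."""
    k = len(frame) // size        # interval between chosen items
    rng = random.Random(seed)     # the seed only makes the example repeatable
    start = rng.randrange(k)      # random start among the first k items
    return [frame[start + i * k] for i in range(size)]

frame = list(range(1, 101))       # hypothetical ordered list of 100 people
chosen = systematic_sample(frame, 10, seed=0)
print(chosen)
```

Here k = 100 / 10 = 10, so every 10th person is taken after the random start.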
8
Q

stratified

A

the population is divided into mutually exclusive strata (males and
females, for example) and a random sample is taken from each.

+ Sample accurately reflects the population structure
+ Guarantees proportional representation of groups within a population

  • Population must be clearly classified into distinct strata
  • Selection within each stratum suffers from the same disadvantages as simple random sampling
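One way to work out how many to sample from each stratum is (number in stratum / number in population) × overall sample size. A small sketch with made-up year-group sizes:

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """Number to sample from each stratum, proportional to the stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

school = {"Year 12": 120, "Year 13": 80}   # hypothetical strata
sizes = stratum_sample_sizes(school, 20)
print(sizes)  # {'Year 12': 12, 'Year 13': 8}
```

With less convenient numbers the rounded values may not add up exactly to the sample size, so they may need adjusting by hand.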

9
Q
A
10
Q

quota sampling

A

an interviewer or researcher selects a sample that reflects the
characteristics of the whole population.

+ Allows a small sample to still be representative of the population
+ No sampling frame required
+ Quick, easy and inexpensive
+ Allows for easy comparison between different groups within a population

  • Non-random sampling can introduce bias
  • Population must be divided into groups,
    which can be costly or inaccurate
  • Increasing scope of study increases number
    of groups, which adds time and expense
  • Non-responses are not recorded as such
11
Q

difference between quota and stratified

A

q: you meet the people and select them yourself, no sampling frame involved, and allocate each person to the appropriate quota

s: if you want 5 tall people, you randomly pick 5 from a list (sampling frame) of all the tall people

12
Q

opportunity sampling

A

+Easy to carry out
+ Inexpensive

  • Unlikely to provide a representative sample
  • Highly dependent on individual researcher
13
Q

continuous variable

A

■ A variable that can take any value in a given range is a continuous variable.
For example, time can take any value, e.g. 2 seconds, 2.1 seconds, 2.01 seconds etc

e.g. foot size

14
Q

discrete

A

A variable that can take only specific values in a given range is a discrete variable.
For example, the number of girls in a family is a discrete variable as you can’t have 2.65 girls in a family

e.g. shoe size, which goes up in halves

15
Q

mode/ modal class

A

■ The mode or modal class is the value or class that occurs most often.

16
Q

median

A

■ The median is the middle value when the data values are put in order.

17
Q

For data given in a frequency table, the
mean can be calculated using the formula

A

x̄ = ∑xf / ∑f
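A short sketch of the frequency-table mean, using a made-up table of values x and frequencies f:

```python
def frequency_mean(values, freqs):
    """Mean from a frequency table: sum of x*f divided by sum of f."""
    total_xf = sum(x * f for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_xf / total_f

x = [0, 1, 2, 3]      # hypothetical values
f = [2, 5, 2, 1]      # hypothetical frequencies
mean = frequency_mean(x, f)
print(mean)           # (0 + 5 + 4 + 3) / 10 = 1.2
```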

18
Q

mean calculated

A

x̄ = ∑ x/ n

19
Q

how to find the quartiles

A

Q1 = the (1/4 × n)th value
Q2 = the (1/2 × n)th value
Q3 = the (3/4 × n)th value

the position found tells you where in the ordered data the lower quartile, median or upper quartile lies
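A sketch of finding quartiles for a small list of discrete data. It assumes one common convention, which is not stated on the card: if the position is a whole number, take the midpoint of that value and the next one; otherwise round the position up.

```python
import math

def quartile(sorted_data, fraction):
    """Quartile of ordered discrete data; fraction is 0.25, 0.5 or 0.75."""
    n = len(sorted_data)
    position = fraction * n
    if position == int(position):
        k = int(position)
        # Whole-number position: halfway between the k-th and (k+1)-th values.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Otherwise round the position up to the next whole number.
    return sorted_data[math.ceil(position) - 1]

data = [2, 4, 5, 7, 8, 9, 11, 12]   # hypothetical data, already in order
q1 = quartile(data, 0.25)           # position 2 -> midpoint of 4 and 5
q2 = quartile(data, 0.5)            # position 4 -> midpoint of 7 and 8
q3 = quartile(data, 0.75)           # position 6 -> midpoint of 9 and 11
print(q1, q2, q3)
```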

20
Q

interpolation

A

if the class is 34-36

number line runs from 33.5 to 36.5 (the class boundaries)

21
Q

interpolation steps

A
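  1. find the position of the value you're interested in (e.g. the median)
  2. find the class it's in
  3. draw a number line for that class, rounding the first boundary down and the second up (e.g. 34-36 becomes 33.5 to 36.5)
  4. on the bottom write the cumulative frequency at the start and at the end of the class
  5. find the fraction of the way along the number line where the value lies
  6. multiply the fraction by the class width (the difference between the two ends of the number line) and add the result to the lower boundary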
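The steps above can be sketched as a small function. The class boundaries and cumulative frequencies below are invented for illustration:

```python
def interpolate(lower_bound, upper_bound, cf_before, cf_end, position):
    """Estimate the value at `position`, assuming data are spread evenly in the class."""
    # Fraction of the way through the class, measured in cumulative frequency.
    fraction = (position - cf_before) / (cf_end - cf_before)
    # Move that fraction of the class width along from the lower boundary.
    return lower_bound + fraction * (upper_bound - lower_bound)

# Hypothetical table: class 34-36 (boundaries 33.5 and 36.5),
# cumulative frequency 10 before the class and 25 by the end of it,
# and the median is the 20th value of 40.
median = interpolate(33.5, 36.5, 10, 25, 20)
print(median)   # about 35.5, since 33.5 + (10/15) * 3
```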
22
Q

interpolation data is

A

assumed to be evenly spread out across each class

23
Q

spread

A

dispersion

24
Q

variance and standard deviation

A

takes account of all pieces of data

25
Q

Variance

A

σ² = Σ(x − x̄)² / n, the mean squared distance from the mean

can't just average x − x̄, as the positive and negative values cancel out

easier formula to calculate:

σ² = Σx² / n − (Σx / n)²
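A quick check, on a made-up data set, that the mean squared distance from the mean, Σ(x − x̄)² / n, and the easier formula give the same variance:

```python
def variance_definition(data):
    """Mean of the squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean 5
print(variance_definition(data))     # 32 / 8 = 4.0
print(variance_shortcut(data))       # 232 / 8 - 5**2 = 4.0
```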

26
Q

lower case sigma

A

σ

27
Q

difference between sd and v

A

σ - standard deviation, a measure of spread
σ² - variance, the square of the standard deviation

28
Q

variance and standard deviation frequency

A

σ² = Σfx² / Σf − (Σfx / Σf)²
σ = square root of [Σfx² / Σf − (Σfx / Σf)²]
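A sketch of the frequency-table version, where every sum is weighted by the frequency f and divided by the total frequency. The table is invented for illustration:

```python
import math

def frequency_variance(values, freqs):
    """Variance from a frequency table: mean of the squares minus square of the mean."""
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / total_f
    return mean_of_squares - mean ** 2

x = [1, 2, 3]          # hypothetical values
f = [1, 2, 1]          # hypothetical frequencies
variance = frequency_variance(x, f)   # 18/4 - (8/4)**2 = 0.5
sd = math.sqrt(variance)
print(variance, sd)
```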

29
Q

coding

A

makes numbers easier to deal with

30
Q

coding measure of spread

A

when converting a measure of spread back, ignore any adding or taking away in the code, as this doesn't change the measure of spread; only undo the multiplying or dividing
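A small illustration, assuming a code of the form y = (x − a) / b with made-up values a = 1000 and b = 10. The standard deviation only needs multiplying by b, while the mean needs both steps undoing:

```python
import statistics

a, b = 1000, 10                          # hypothetical coding constants
x = [1010, 1020, 1030, 1040, 1050]       # hypothetical raw data
y = [(value - a) / b for value in x]     # coded data: 1, 2, 3, 4, 5

sd_y = statistics.pstdev(y)              # standard deviation of coded data
mean_y = statistics.mean(y)

sd_x = b * sd_y                          # spread: undo the dividing only
mean_x = b * mean_y + a                  # mean: undo dividing and subtracting

print(sd_x, statistics.pstdev(x))        # the two should agree
print(mean_x, statistics.mean(x))
```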

31
Q

outliers

A

greater than Q3 + k(Q3 − Q1)
less than Q1 − k(Q3 − Q1)

Q3 − Q1 is the interquartile range (IQR); the standard value for k is 1.5
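A sketch of the outlier rule with k = 1.5. The quartiles and data values are made up, and the quartiles are taken as given rather than calculated:

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits; values beyond them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = outlier_limits(20, 30)       # IQR = 10, so limits are 5 and 45
data = [3, 18, 22, 25, 29, 31, 50]       # hypothetical data
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)               # 5.0 45.0 [3, 50]
```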

32
Q

Anomalies

A

Anomalies can be the result of experimental or recording
error, or could be data values which are not relevant to
the investigation.

we don't want to include anomalies as they aren't representative of the population

33
Q

box plots and outliers

A

the lower whisker of the box plot ends at the first data point that is not an outlier (the data point right after the outlier)

34
Q

Area of histogram

A

proportional to the frequency

35
Q

f.d. (frequency density)

A

dividing the frequency by the class width
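A short sketch of the frequency density calculation for a made-up grouped table, where each class is given as (lower boundary, upper boundary, frequency):

```python
def frequency_density(frequency, lower, upper):
    """Frequency divided by class width."""
    return frequency / (upper - lower)

# Hypothetical classes: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 8)]
densities = [frequency_density(freq, lower, upper)
             for lower, upper, freq in classes]
print(densities)   # [0.5, 1.0, 0.8]
```

Multiplying each density by its class width gives the frequency back, which is why the area of each bar is proportional to the frequency.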

36
Q

frequency polygon

A

join the midpoints of the tops of the bars

37
Q

regression line (of y on x) to work out an x value

A

you can't predict x from a y value, as the regression is of y on x, not a regression of x on y

38
Q

independent variable

A

x axis

39
Q

dependent variable

A

y axis

40
Q

can you use the graph to predict points outside the graph

A

you shouldn't extrapolate outside of the range of the data

41
Q

interpolation

A

using the data to predict a value inside the range of the data set

42
Q

an experiment is

A

a repeatable process that gives rise to a number of outcomes

43
Q

an event

A

An event is a collection of one or more outcomes.

44
Q

sample space

A

A sample space is the set of all possible outcomes.

45
Q

The event A and B

A

intersection

A ∩ B

46
Q

The event A or B

A

A ∪ B

47
Q

A not B

A

B’

48
Q

Venn diagrams filling values

A

always work out centre first, if unknown make it x

49
Q

Mutually exclusive events

A

P(A ∪ B) = P(A) + P(B)

50
Q

independent events

A

P(A ∩ B) = P(A) × P(B)
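A sketch of checking the two rules from given probabilities. The numbers are made up, and mutually exclusive is taken to mean P(A and B) = 0, which is what makes P(A or B) = P(A) + P(B) hold:

```python
import math

def is_mutually_exclusive(p_a_and_b):
    """Mutually exclusive events can never happen together."""
    return math.isclose(p_a_and_b, 0.0, abs_tol=1e-12)

def is_independent(p_a, p_b, p_a_and_b):
    """Independent events satisfy P(A and B) = P(A) x P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=1e-12)

p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2      # hypothetical probabilities
print(is_independent(p_a, p_b, p_a_and_b))   # True, since 0.5 x 0.4 = 0.2
print(is_mutually_exclusive(p_a_and_b))      # False, since P(A and B) is not 0
```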

51
Q

probability questions

A

which diagram to draw?

  • venn
  • tree
  • simple sample space diagram

try one, if it doesn’t work try the other

52
Q

random variable

A

the ‘thing’ whose value is the outcome of the experiment

  • all the outcomes of the experiment
  • the result of the experiment
53
Q

P(X=4)

A

the probability that the random variable X takes the value 4

54
Q

X~B(n,p)

A

n= number of trials
p = probability of success on each trial
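A minimal sketch of a binomial probability, taking n as the number of trials and p as the probability of success on each trial. The values n = 10 and p = 0.5 are just an example:

```python
import math

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    # Number of ways to choose which r trials succeed, times the
    # probability of r successes and n - r failures.
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

prob = binomial_pmf(10, 0.5, 4)
print(prob)   # 210 / 1024 = 0.205078125
```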

55
Q
A
56
Q

■ You can model X with a binomial distribution, B(n, p), if:

A

● there are a fixed number of trials, n
● there are two possible outcomes (success and failure)
● there is a fixed probability of success, p
● the trials are independent of each other

57
Q

can’t look up in tables:

A

a strict inequality such as P(X < 5); the tables give P(X ≤ x), so it has to be rewritten as less than or equal to, e.g. P(X < 5) = P(X ≤ 4)
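A sketch of rewriting inequalities in terms of P(X ≤ x) before using cumulative values. It uses X ~ B(10, 0.5) as an example, and the function name is made up:

```python
import math

def binomial_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), the quantity the tables list."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(x + 1))

n, p = 10, 0.5
p_less_than_5 = binomial_cdf(n, p, 4)        # P(X < 5)  = P(X <= 4)
p_at_least_5 = 1 - binomial_cdf(n, p, 4)     # P(X >= 5) = 1 - P(X <= 4)
print(p_less_than_5, p_at_least_5)           # 0.376953125 0.623046875
```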

58
Q

Null hypothesis

A

H0, is the hypothesis that you
assume to be correct.

59
Q

Alternative hypothesis

A

H1, tells you about the
parameter if your assumption is shown to be wrong.

60
Q

Steps to answering a hypothesis testing question

A
  1. state what probability we are looking at
    e.g. we are looking at whether the probability of landing heads is less than 0.5
  2. let X be the number of times the coin lands on heads
    X~B(10, 0.5)
  3. test statistic is X = 0 (the observed number of heads)
  4. let p be the probability of getting heads
  5. H0: p = 1/2
    H1: p < 1/2
  6. set significance level at 5%
    P(X ≤ 0) = P(X = 0) = 0.0010
    0.0010 < 0.05
  7. we can reject H0 as there is sufficient evidence at the 5% level to suggest that the coin is biased against heads
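The coin example can be checked with a short sketch. It assumes a one-tailed test of whether heads is less likely than 0.5, with 0 heads observed in 10 tosses:

```python
import math

def binomial_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(x + 1))

n, p_null = 10, 0.5           # X ~ B(10, 0.5) if H0 is true
observed = 0                  # test statistic: number of heads seen
significance = 0.05

# Probability of a result at least as extreme as the one observed,
# which for this lower-tail test is P(X <= 0).
p_value = binomial_cdf(n, p_null, observed)
reject_h0 = p_value < significance

print(round(p_value, 4), reject_h0)   # 0.001 True
```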
61
Q

reject H0 if

A

the probability is smaller than the significance level (e.g. 0.05)
the test statistic lies inside the critical region

62
Q
A