Statistics Year 1 Flashcards by Emine Matviyenko

population

the whole set of items that are of interest

census

+ It should give a completely accurate result

- Time consuming and expensive
- Cannot be used when the testing process destroys the item
- Hard to process a large quantity of data

sample

+ Less time consuming and expensive than a census
+ Fewer people have to respond
+ Less data to process than in a census

- The data may not be as accurate
- The sample may not be large enough to give information about small subgroups of the population

population
census
sampling frame
sampling units

example: finding average heights
population: everyone that's ever walked into the school
census: you'd have to find everyone who's walked into the school, which is impossible!
sampling frame: a practical list from which you can pick people to survey, e.g. a list of all students and teachers; a narrowed-down focus that is the best we can get of the population
sampling units: the individual people who can be sampled (here, each student or teacher on the list)

bias

sample doesn’t represent population fairly

random sampling

every sampling unit has an equal chance of being picked

+ Free of bias
+ Easy and cheap to implement for small populations and small samples
+ Each sampling unit has a known and equal chance of selection

- Not suitable when the population size or the sample size is large
- A sampling frame is needed
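
A minimal sketch of simple random sampling, assuming the sampling frame is just a Python list. The function name, the `student_…` labels and the seed are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    # random.sample chooses without replacement, so no unit is picked twice
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: a numbered list of 100 students
frame = [f"student_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Note that the code needs the full list up front, which is the "sampling frame is needed" disadvantage in practice.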

systematic sampling

the required elements are chosen at regular intervals from an ordered list

+ Simple and quick to use
+ Suitable for large samples and large populations

- People who are picked may not want to take part in the survey
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
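
A rough sketch of systematic sampling, assuming the ordered list is a Python list and the sample size is no bigger than the population. The interval k is the population size divided by the sample size, and the first unit is picked at random from the first k; the names are hypothetical.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th element after a random start within the first k."""
    k = len(ordered_list) // sample_size      # the regular interval
    start = random.Random(seed).randrange(k)  # random starting index, 0 to k-1
    return [ordered_list[start + i * k] for i in range(sample_size)]

# Hypothetical ordered list of 100 numbered items, sample of 10 (so k = 10)
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```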

stratified

the population is divided into mutually exclusive strata (males and females, for example) and a random sample is taken from each.

+ Sample accurately reflects the population structure
+ Guarantees proportional representation of groups within a population

- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
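
A sketch of stratified sampling, assuming each stratum is a list and the number taken from a stratum is (stratum size ÷ population size) × sample size, rounded to the nearest whole number. The strata sizes (60 and 40) are made-up figures.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Randomly sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number sampled = (stratum size / population size) x sample size
        n = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population: 60 males and 40 females, overall sample of 10
strata = {
    "males": [f"m{i}" for i in range(60)],
    "females": [f"f{i}" for i in range(40)],
}
chosen = stratified_sample(strata, 10, seed=1)
print({name: len(units) for name, units in chosen.items()})
```

With awkward stratum sizes the rounded numbers may not add up to the sample size exactly, so check the total by hand.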

quota sampling

an interviewer or researcher selects a sample that reflects the characteristics of the whole population.

+ Allows a small sample to still be representative of the population
+ No sampling frame required
+ Quick, easy and inexpensive
+ Allows for easy comparison between different groups within a population

- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, which adds time and expense
- Non-responses are not recorded as such

difference between quota and stratified

quota: you meet the people and select them yourself, no sampling frame involved, and allocate each person to the appropriate quota

stratified: if you want 5 tall people, you randomly pick 5 from a list of all the tall people

opportunity sampling

+ Easy to carry out
+ Inexpensive

- Unlikely to provide a representative sample
- Highly dependent on individual researcher

continuous variable

■ A variable that can take any value in a given range is a continuous variable.
For example, time can take any value, e.g. 2 seconds, 2.1 seconds, 2.01 seconds etc

e.g. foot size

discrete

A variable that can take only specific values in a given range is a discrete variable.
For example, the number of girls in a family is a discrete variable as you can’t have 2.65 girls in a family

e.g. shoe size, which goes up in halves

mode/ modal class

■ The mode or modal class is the value or class that occurs most often.

median

■ The median is the middle value when the data values are put in order.

For data given in a frequency table, the
mean can be calculated using the formula

x̄ = ∑fx / ∑f
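
A small worked sketch of the frequency-table mean, x̄ = ∑fx / ∑f. The table values below are invented for illustration.

```python
def mean_from_frequency_table(values, frequencies):
    """Mean of data in a frequency table: sum of f*x divided by sum of f."""
    sum_fx = sum(x * f for x, f in zip(values, frequencies))
    sum_f = sum(frequencies)
    return sum_fx / sum_f

# Hypothetical table: number of siblings (x) and how many pupils have that many (f)
values = [0, 1, 2, 3]
frequencies = [2, 5, 2, 1]
mean = mean_from_frequency_table(values, frequencies)
print(mean)  # sum of fx = 12, sum of f = 10
```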

mean calculated

x̄ = ∑ x/ n

how to find the quartiles

Q1 position = 1/4 × n
Q2 position = 1/2 × n
Q3 position = 3/4 × n

the value at that position in the ordered data is where the lower quartile, median or upper quartile lies

interpolation

if the class is 34-36

number line = 33.5 to 36.5 (the class boundaries)

interpolation steps

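1. find the position of the median (or other value) you're interested in
2. find the class it's in
3. draw a number line for that class, rounding the first value down and the second up to get the class boundaries
4. on the bottom, write the running total (cumulative frequency) at the start and end of the class
5. find the fraction of the way along the number line where your value lies
6. multiply the fraction by the class width (the difference between the number line values) and add it to the lower boundary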
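
The interpolation steps can be sketched as one formula: lower boundary + fraction × class width. The class boundaries 33.5 and 36.5 come from the 34-36 example; the cumulative frequencies (20 and 30) and the position (25) are made-up numbers.

```python
def interpolate(lower_boundary, upper_boundary, cf_start, cf_end, position):
    """Estimate the value at a given position inside a class by linear interpolation.

    cf_start and cf_end are the cumulative frequencies at the start and
    end of the class; position is the item you want (e.g. n/2 for the median).
    """
    fraction = (position - cf_start) / (cf_end - cf_start)
    class_width = upper_boundary - lower_boundary
    return lower_boundary + fraction * class_width

# Hypothetical: class 34-36, 20 items before the class, 30 by the end of it, want the 25th
estimate = interpolate(33.5, 36.5, cf_start=20, cf_end=30, position=25)
print(estimate)  # halfway along the class
```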

interpolation assumes the data is

evenly spread out within each class

spread

dispersion

variance and standard deviation

takes account of all pieces of data

Variance

σ² = Σ(x − x̄)² / n, the mean squared distance from the mean
can't just average x − x̄, as some values will be negative and cancel out the positive ones
easier formula to calculate: σ² = Σx² / n − (Σx / n)²
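
A quick check, on made-up data, that the "mean of the squared distances" definition and the easier formula Σx² / n − (Σx / n)² give the same variance. The function names are just for illustration.

```python
from math import sqrt

def variance_definition(data):
    """Mean of the squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_easier(data):
    """'Mean of the squares minus the square of the mean'."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean = 5
var_def = variance_definition(data)
var_easy = variance_easier(data)
standard_deviation = sqrt(var_easy)
print(var_def, var_easy, standard_deviation)
```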

lower case sigma

difference between standard deviation and variance

σ: standard deviation, a measure of spread
σ²: variance, the square of the standard deviation

variance and standard deviation frequency

σ² = Σfx² / Σf − (Σfx / Σf)²
σ = √( Σfx² / Σf − (Σfx / Σf)² )

coding

makes numbers easier to deal with

coding measure of spread

ignore any adding or taking away when converting back, as this doesn't change the measure of spread; only multiplying or dividing changes it
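
A small illustration of coding and spread, assuming the coding y = (x − 1000) / 10 on some made-up data. Subtracting 1000 leaves the standard deviation alone, so converting back only needs the multiply by 10.

```python
from statistics import pstdev  # population standard deviation (divides by n)

data = [1010, 1020, 1030, 1040, 1050]     # hypothetical awkward numbers
coded = [(x - 1000) / 10 for x in data]   # coded values: 1, 2, 3, 4, 5

sd_coded = pstdev(coded)
sd_original = 10 * sd_coded   # undo the divide by 10; ignore the subtract 1000

print(sd_original, pstdev(data))  # the two should agree
```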

outliers

greater than Q3 + k(Q3 − Q1)
or less than Q1 − k(Q3 − Q1)
Q3 − Q1 is the interquartile range (IQR); the standard value is k = 1.5
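
A sketch of the outlier rule with k = 1.5. The data set and its quartiles (Q1 = 10.5, Q3 = 14.5) are made up and passed in by hand rather than calculated.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return the lower and upper limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """List every value below the lower limit or above the upper limit."""
    low, high = outlier_limits(q1, q3, k)
    return [x for x in data if x < low or x > high]

data = [3, 10, 11, 12, 13, 14, 15, 30]    # hypothetical data
limits = outlier_limits(10.5, 14.5)       # IQR = 4, so limits are 4.5 and 20.5
outliers = find_outliers(data, q1=10.5, q3=14.5)
print(limits, outliers)
```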

Anomalies

Anomalies can be the result of experimental or recording error, or could be data values which are not relevant to the investigation. We don't want to include anomalies, as they aren't representative of the population.

box plots and outliers

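the end of the whisker goes on the data point right after the outlier, i.e. the most extreme value that is not an outlier; the outlier itself is marked separately with a cross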

Area of histogram

proportional to the frequency

f.d. (frequency density)

found by dividing the frequency by the class width
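
A short sketch of frequency density = frequency ÷ class width, using invented classes. Because bar height is the frequency density, height × width (the area) gets you back to the frequency.

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

# Hypothetical classes: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
densities = [frequency_density(f, lo, hi) for lo, hi, f in classes]
areas = [d * (hi - lo) for d, (lo, hi, f) in zip(densities, classes)]
print(densities, areas)
```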

frequency polygon

join the middle of the top of each bar

using a regression line of y on x to work out x from a y value

you can't predict x this way, as the regression is of y on x, not x on y

independent variable

x axis

dependent variable

y axis

can you use the graph to predict points outside the graph?

you shouldn't extrapolate outside the range of the data

interpolation

using values in the range to predict a value within the range of the data

an experiment is

a repeatable process that gives rise to a number of outcomes

an event

An event is a collection of one or more outcomes.

sample space

A sample space is the set of all possible outcomes.

The event A and B

the intersection, A ∩ B

The event A or B

the union, A ∪ B

A not B

Venn diagrams: filling in values

always work out the centre (the intersection) first; if it's unknown, make it x

Mutually exclusive events

P(A ∪ B) = P(A) + P(B)

independent events

P(A ∩ B) = P(A) × P(B)
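
A small check of both rules on one roll of a fair dice, assuming equally likely outcomes. The events A (even), B (1 or 2) and C (odd) are made-up examples; fractions are used so the comparisons are exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair dice

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # even number
B = {1, 2}      # 1 or 2
C = {1, 3, 5}   # odd number

# Independent: P(A n B) = P(A) x P(B)
independent = prob(A & B) == prob(A) * prob(B)

# Mutually exclusive: no overlap, so P(A u C) = P(A) + P(C)
mutually_exclusive = prob(A & C) == 0
addition_rule_holds = prob(A | C) == prob(A) + prob(C)

print(independent, mutually_exclusive, addition_rule_holds)
```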

probability questions

which diagram to draw?
- Venn
- tree
- simple sample space diagram
try one, if it doesn't work try the other

random variable

the 'thing' whose value is the result of the experiment; its possible values cover all the outcomes of the experiment

P(X=4)

what's the probability of the random variable X being 4

X~B(n,p)

n = number of trials
p = probability of success for each trial

■ You can model X with a binomial distribution, B(n, p), if:

● there are a fixed number of trials, n
● there are two possible outcomes (success and failure)
● there is a fixed probability of success, p
● the trials are independent of each other
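
A minimal sketch of binomial probabilities for X ~ B(n, p), written from the standard formula P(X = x) = nCx × p^x × (1 − p)^(n − x) rather than looked up in tables. The function names and the choice of B(10, 0.5) are for illustration only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), the cumulative probability that the tables give."""
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

# Example with X ~ B(10, 0.5)
p_equals_4 = binomial_pmf(4, 10, 0.5)    # P(X = 4)
p_at_most_4 = binomial_cdf(4, 10, 0.5)   # P(X <= 4), which is also P(X < 5)
print(p_equals_4, p_at_most_4)
```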

can't look up in tables:

strictly <; it has to be less than or equal to, so rewrite it first, e.g. P(X < 5) = P(X ≤ 4)

Null hypothesis

H0, is the hypothesis that you assume to be correct.

Alternative hypothesis

H1, tells you about the parameter if your assumption is shown to be wrong.

Steps to answering a hypothesis testing question

1. state what probability we are looking at, e.g. we are testing whether the probability of landing heads is less than 0.5
2. let X be the number of times the coin lands on heads: X ~ B(10, 0.5)
3. the test statistic is the observed value, X = 0
4. let p be the probability of getting heads
5. H0: p = 1/2, H1: p < 1/2
6. set the significance level at 5%: P(X ≤ 0) = P(X = 0) = 0.0010, and 0.0010 < 0.05
7. we can reject H0, as there is sufficient evidence at the 5% level to suggest that the coin is biased against heads
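
A sketch of the coin example as a one-tailed (lower tail) binomial test, assuming 10 tosses, 0 heads observed, H0: p = 0.5 and a 5% significance level. The variable names are made up.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))

n = 10                      # number of tosses
p_null = 0.5                # probability of heads assumed under H0
observed = 0                # test statistic: number of heads seen
significance_level = 0.05

# Lower-tail test (H1: p < 0.5): how likely is a result this low or lower?
probability = binomial_cdf(observed, n, p_null)   # 0.5**10, about 0.0010
reject_h0 = probability < significance_level

print(round(probability, 4), reject_h0)
```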

reject H0 if

the probability is smaller than the significance level (e.g. 0.05), i.e. the test statistic is inside the critical region

sample space for the number of successes when a dice is thrown 4 times

{0,1,2,3,4}

Units for cloud cover

Oktas

Units for mean visibility

Units of mean pressure

hPa (hectopascals)

Large data set UK

Camborne (coastal, windier)
Hurn
Heathrow
Leeming
Leuchars (coastal, windier)
From south to north they are in alphabetical order, except for Heathrow and Hurn

When do we have data from the large data set

We only have data from May to October