STATS Flashcards

1
Q

What is a population?

A

The whole set of items of interest

2
Q

What is a census?

A

A census observes every member of a population

3
Q

What is a sample?

A

A selection of observations taken from a subset of the population, used to find out information about the population as a whole

4
Q

Pros and cons of a census

A

It should give a completely accurate result

Time consuming
Cannot be used when testing involves destruction of item
Hard to process the large quantity of data

5
Q

Pros and cons of a sample

A

Less time consuming than a census
Fewer people required to respond
Less data to process than in a census

Data may not be as accurate as census
Sample may not be large enough to give information about small subgroups of the population

6
Q

How can the size of a sample affect its validity

A

Larger sample - more accurate, more resources needed
A varied population needs a larger sample than a uniform population
Different samples lead to different conclusions due to natural variation in population

7
Q

What is a sampling unit?

A

Individual units of a population

8
Q

What is a sampling frame?

A

A list in which sampling units of a population are named/numbered

9
Q

Characteristics of random sampling
+ what are the three types

A

Every member of the population has an equal chance of being selected - sample is therefore representative, and should be free of bias

Simple random
Systematic
Stratified

10
Q

How to perform simple random sampling
+pros/cons

A

Use a sampling frame, e.g. a list of the units. Each unit is allocated a unique number, then numbers are selected at random until the sample is full. Can be done using a random number generator (if you generate a repeated number, ignore it and go again)

No bias, easy to implement for small populations/samples, each unit has a known and equal chance of selection.

Not suitable for large populations/sample sizes
Sampling frame required
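
The procedure on this card can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the 50-unit frame and the `seed` parameter are assumptions made for the example, not part of the card.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw random numbers, ignoring repeats, until sample_size units are chosen."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample cannot be larger than the sampling frame")
    rng = random.Random(seed)
    chosen_numbers = []
    while len(chosen_numbers) < sample_size:
        number = rng.randint(1, len(sampling_frame))  # units are numbered 1..N
        if number in chosen_numbers:
            continue  # repeated number: ignore and go again
        chosen_numbers.append(number)
    # Unit number k sits at list index k - 1
    return [sampling_frame[number - 1] for number in chosen_numbers]

# Hypothetical sampling frame of 50 numbered units
frame = [f"unit {i}" for i in range(1, 51)]
sample = simple_random_sample(frame, 5, seed=1)
```

In practice `random.sample(frame, 5)` does the same job in one call; the loop is written out here only to mirror the "ignore and go again" step.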

11
Q

How to carry out systematic sampling

+pros/cons

A

Required units are chosen at regular intervals from a randomly ordered list.

If a sample of size 20 was required from a population of 100, the interval is 100 ÷ 20 = 5: choose a random starting number between 1 and 5, then continue to pick every 5th item.

Simple, suitable for large samples/populations
Sampling frame needed, bias could be present if list isn’t randomly ordered
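
A minimal sketch of the same idea in Python, assuming a population of 100 so that a sample of 20 gives an interval of 5. The function name `systematic_sample` and the `seed` parameter are made up for the example.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Pick a random start within the first interval, then every interval-th item."""
    rng = random.Random(seed)
    interval = len(ordered_list) // sample_size   # e.g. 100 // 20 = 5
    start = rng.randint(1, interval)              # random number between 1 and interval
    positions = range(start, start + interval * sample_size, interval)
    # Position p (counting from 1) sits at list index p - 1
    return [ordered_list[p - 1] for p in positions]

population = list(range(1, 101))   # hypothetical list of 100 numbered units
sample = systematic_sample(population, 20, seed=0)
```

If the list is not randomly ordered to begin with, any pattern in it that lines up with the interval carries straight into the sample, which is the bias the card warns about.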

12
Q

How to carry out stratified sampling
+pros/cons

A

Divide the population into mutually exclusive strata, e.g. males and females, then take a random sample from each stratum in proportion to its size

Sample accurately reflects population structure. Proportional representation guaranteed.

Population must be clearly classified into distinct strata. Selection in each stratum suffers same cons as simple random
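
As a rough sketch, the number taken from each stratum is (stratum size ÷ population size) × sample size. The Python below assumes a made-up population of 60 males and 40 females; the function name `stratified_sample` is hypothetical.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Take a simple random sample from each stratum, proportional to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Number to take from this stratum, rounded to a whole unit
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 males and 40 females
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
sample = stratified_sample(strata, 10, seed=0)
```

Here a sample of 10 takes 6 males and 4 females. When the proportions are not whole numbers, rounding can make the total differ slightly from the intended sample size, so the counts may need adjusting by hand.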

13
Q

Types of non random sampling

A

Quota sampling - interviewer selects a sample that reflects characteristics of the whole population. Divide population into groups according to a given characteristic; the size of each group determines the proportion of the sample with each characteristic. Interviewer would meet people, assess their group, interview them, then allocate them to the correct quota. Repeat until each quota is filled. If someone refuses to answer, just move on to the next person.

Opportunity sampling - taking the sample from people available at the time who fit the criteria, e.g. standing outside Tesco to ask people if they shop at Tesco 3x a week

14
Q

Pros/cons of quota sampling

A

Allows for comparison between different groups
Allows a small sample to still be representative of population
No sampling frame needed
Quick, easy, cheap

Non random sampling can introduce bias
Population must be divided into groups - could be costly or inaccurate
Increasing scope of study increases number of groups - can be costly/time consuming
Non responses aren’t recorded

15
Q

Pros/cons of opportunity sampling

A

Easy to carry out, inexpensive

Unlikely to provide a representative sample, results depend heavily on the individual interviewer (e.g. their charisma)

16
Q

Quantitative vs Qualitative data

Discrete vs continuous

A

Quantitative - variables/data associated with numerical values
Qualitative - associated with non numerical values

Continuous - can take any value in a given range
Discrete - can only take specific values

17
Q

What is variance?

remember proof

A

The average squared distance from the mean

(A measure of spread that takes all values into account, using the fact that each data point varies from the mean by an amount x - x̄)

https://www.youtube.com/watch?v=9EgRztlWQH4

18
Q

Variance equation + units

A

Variance (σ²) = Σ(x - x̄)²/n
= Sₓₓ/n (Sₓₓ is a summary statistic)

or Σx²/n - (Σx/n)²

units are in units of data squared
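
For the proof mentioned on the previous card, one standard way to show that the two forms of the variance agree is to expand the square and use Σx = n·x̄. This is a sketch of that derivation, not necessarily the version in the linked video.

```latex
\begin{aligned}
\sigma^2 &= \frac{\sum (x - \bar{x})^2}{n}
          = \frac{\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2}{n} \\
         &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2
          \qquad \text{since } \sum x = n\bar{x} \\
         &= \frac{\sum x^2}{n} - \bar{x}^2
          = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2
\end{aligned}
```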

19
Q

What is standard deviation?

A

The square root of the variance: σ = √(σ²)

20
Q

Variance for grouped data

A

Σf(x - x̄)² ÷ Σf

basically find the mean of the data, then multiply each squared distance from the mean by its frequency, add these up and divide by the total frequency
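
A small Python sketch of the grouped formula, using made-up values and frequencies. For grouped classes, x would be the class midpoint; the function name `grouped_variance` is hypothetical.

```python
def grouped_variance(values, frequencies):
    """Variance of grouped data: sum of f(x - mean)^2 divided by sum of f."""
    total_f = sum(frequencies)
    # Mean of grouped data: sum of f*x divided by sum of f
    mean = sum(f * x for x, f in zip(values, frequencies)) / total_f
    return sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total_f

midpoints = [1, 2, 3]   # made-up x values (class midpoints)
freqs = [1, 2, 1]       # made-up frequencies
variance = grouped_variance(midpoints, freqs)
```

With these numbers the mean is 2 and the variance is (1 + 0 + 1) ÷ 4 = 0.5.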

21
Q

Effects of different linear coding on standard deviation and mean

A

Addition/subtraction - shifts the mean by the amount added or subtracted. No change to standard deviation
Multiplication/division - scales the mean by the multiplier, and scales the standard deviation by the same multiplier

Any scale factor applied to the standard deviation is squared for the variance

Note: squaring all values won’t square the mean
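
These rules can be checked numerically. The sketch below uses a made-up data set and the coding y = 3x + 10, chosen only for illustration, with Python's `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 6]                    # made-up data
coded = [3 * x + 10 for x in data]     # linear coding y = 3x + 10

mean_x, sd_x = statistics.mean(data), statistics.pstdev(data)
mean_y, sd_y = statistics.mean(coded), statistics.pstdev(coded)

# Mean follows the full coding; standard deviation only follows the multiplier
mean_matches = math.isclose(mean_y, 3 * mean_x + 10)
sd_matches = math.isclose(sd_y, 3 * sd_x)

# Variance is scaled by the multiplier squared (3 squared = 9)
variance_matches = math.isclose(
    statistics.pvariance(coded), 9 * statistics.pvariance(data)
)

# Squaring is not a linear coding: the mean of the squares is not the squared mean
mean_of_squares = statistics.mean([x ** 2 for x in data])
```

`pstdev` and `pvariance` are the population versions (dividing by n), which match the variance formula used on these cards.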

22
Q

For a random sample of size n

A

Every member of the population is equally likely to be included, and all subsets of the population of size n must be possible
or that
every possible sample of size n must be equally likely to occur.

23
Q

Snowball sampling

A

Interview one person, who refers further people to be sampled. Each person can refer one other person (linear) or several (exponential)
Used when participants are hard to find, e.g. to reach hidden populations
Short duration

May only be able to reach a small population. Can lead to sampling bias

Exponential discriminative: each person gives several referrals, but only one of them is recruited

24
Q

Cluster sampling

A

Cluster sampling is where a population is split into groups (clusters) and then only one or a few of the groups, chosen at random, are used as the sample

25
Q

Quota and convenience sampling

A

Quota sampling is similar to stratified except the members from each group are not chosen randomly. For example, 15 fish from a lake are required for a sample. 10 should be trout and 5 should be cod. The person doing the experiment might just use the first 10 trout and the first 5 cod they catch as their sample.

Convenience sampling is where the person doing the experiment uses whatever is the easiest method. For example, they could ask the first 100 people to walk past them on a street.

26
Q

Simple vs unrestricted random sampling

A

Simple - each subject can be selected only once (sampling without replacement)
Unrestricted - a subject can be selected multiple times (sampling with replacement)
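
In Python's standard library the two behaviours correspond roughly to `random.sample`, which never returns the same item twice, and `random.choices`, which may return the same item more than once. The subject labels below are made up for the example.

```python
import random

rng = random.Random(0)                  # seeded only so the example is repeatable
subjects = ["A", "B", "C", "D", "E"]    # hypothetical subjects

# Simple random sampling: a subject can appear at most once
simple = rng.sample(subjects, 3)

# Unrestricted random sampling: the same subject may appear more than once
unrestricted = rng.choices(subjects, k=3)
```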

27
Q

Plotting cumulative frequency vs frequency polygon

A

Cumulative frequency is plotted at the upper class boundary, a frequency polygon is plotted at the class midpoint

28
Q

What to comment on when comparing data sets

A

Measures of location
Measures of spread

29
Q

Correlation is only used

A

To describe linear relationships. Variables with no linear correlation could still have a non-linear relationship

30
Q

Bivariate data

A

Data with pairs of values for two variables

31
Q

Least squares regression line

A

Line that minimises the sum of the squares of the vertical distances (residuals) from each data point to the line
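
For a line y = a + bx, the standard result is b = Sxy/Sxx and a = ȳ - b·x̄, where Sxy = Σ(x - x̄)(y - ȳ). Sxy is not defined elsewhere in this deck, so treat it as an added assumption. A minimal Python sketch with made-up data:

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx            # gradient
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Made-up bivariate data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)
```

Because these points lie exactly on a line, the sketch recovers a = 1 and b = 2. With real data the points scatter around the line and the squared residuals are minimised rather than zero.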