Collecting, presenting and summarising data Flashcards

Question 1

Q

Random variables (definition)

Answer

A

The quantities measured in a study

Question 2

Q

Data (definition)

Answer

A

A collection of observations of a random variable

Question 3

Q

Observation (definition)

Answer

A

A particular outcome, i.e. a single measured value of a random variable

Question 4

Q

Population (definition)

Answer

A

The collection of all possible outcomes

Question 5

Q

Example

A study on the height of students on A&F courses at Newcastle

What would be the random variable?
What would Joe Bloggs's measured height be called?

What would be our data?

This would be a _______ from the ____________ which consists of all students registered on A&F degrees

Answer

A

Our random variable is “the height of students on A&F courses at Newcastle”.
If Joe Bloggs is an A&F student, and we measured his height, then that value would be a single observation.
If we measured the height of every first year A&F student, we would have a collection of such observations which would be our data.
This would be a sample from the population, which consists of all students registered on A&F degrees.

Question 6

Q

Ideally, to get a true idea of what is going on, we’d like to observe the whole population (take a _______). However, this can be difficult:

Why would it be difficult?

Answer

A

A census

If the population is huge, then this would take ages!
And it would be very costly!
In reality, we usually observe a subset of the population… but how do we choose who to observe?

Question 7

Q

Quantitative variables (2 types and explanation)

Answer

A

Discrete random variables

can only take a sequence of distinct values (usually integers);
are usually countable - e.g. the number of people attending a tutorial group;
can be ordinal - where the outcomes are ordered.

Continuous random variables

can take any value over some continuous scale - e.g. height or weight.
can be measured to a very high degree of accuracy, provided we have the equipment to do so (often giving decimal values), although we can never say, for example, precisely how much someone weighs;
might be measured to the nearest whole number, and so could “look” discrete, so be careful!

Question 8

Q

Sampling

What is a sample?
What is the difficulty?
What is a biased sample?

Answer

A

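A subset of the whole population
Obtaining a sample that is representative of the population
One that is unrepresentative of the population (and so unfair), e.g. because some groups are systematically over- or under-represented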

Question 9

Q

What are the general forms of sampling techniques?

Answer

A

Random sampling - where the members of the sample are chosen by some random (i.e. unpredictable) mechanism.
Quasi-random sampling - where the mechanism for choosing the sample is only partly random.
Non-random sampling - where the sample is specifically selected rather than randomly selected.

Question 10

Q

Simple Random Sampling disadvantages

Answer

A

It needs a complete list of the population, which we often don’t have
Not all elements of the population are equally accessible
By chance, you could pick an unrepresentative sample

Question 11

Q

Stratified sampling

What is it?
What is its main idea?

Answer

A

A form of random sampling used where clearly defined groups, or strata, exist within the population
If we know the overall proportion of the population that falls into each of these groups, we can take a simple random sample from each of the groups and then adjust the results according to the known proportions
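A minimal Python sketch of the "sample each stratum, then adjust by known proportions" idea, assuming the quantity of interest is a population mean. The function name `stratified_mean`, the stratum labels and the toy values are hypothetical. Each stratum's sample mean is weighted by that stratum's known population proportion.

```python
import random
from statistics import mean


def stratified_mean(strata, proportions, n_per_stratum, seed=None):
    """Estimate a population mean from a stratified random sample.

    strata:        dict of stratum name -> list of values in that stratum
    proportions:   dict of stratum name -> known share of the population
    n_per_stratum: size of the simple random sample drawn from each stratum
    """
    rng = random.Random(seed)
    estimate = 0.0
    for name, members in strata.items():
        # Simple random sample (without replacement) within this stratum
        sample = rng.sample(members, n_per_stratum)
        # Weight the stratum's sample mean by its known population proportion
        estimate += proportions[name] * mean(sample)
    return estimate


# Hypothetical strata with constant values, so the result does not depend on the draw
strata = {"A": [10] * 100, "B": [20] * 50}
print(stratified_mean(strata, {"A": 0.6, "B": 0.4}, n_per_stratum=5, seed=1))  # 14.0
```

Weighting by the known proportions means that a stratum which is over-represented in the combined sample does not pull the estimate towards itself.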

Question 12

Q

Systematic sampling

What is it a form of?
Example?
Disadvantage?

Answer

A

A form of quasi-random sampling
For example, picking every 10th item to come off the production line
It is not entirely random and can be biased, e.g. if a pattern in the population repeats at the same interval as the sampling
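A small Python sketch of "every k-th item". It assumes a random starting point within the first k items, which is a common way to add the partly random element; the card itself only says "every 10th item". The function name and the item IDs are hypothetical.

```python
import random


def systematic_sample(items, k, seed=None):
    """Take every k-th item, starting from a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start: 0, 1, ..., k - 1
    return items[start::k]    # after the start, every choice is fixed


production_line = list(range(1, 101))  # hypothetical item IDs 1..100
sample = systematic_sample(production_line, k=10, seed=42)
print(len(sample))  # 10
```

Only the starting point is random; once it is chosen, the whole sample is determined, which is why this counts as quasi-random.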

Question 13

Q

Multi–stage Sampling

What is it a form of?
When is it common?
How does it work?
Example?
Advantage?
Disadvantage?

Answer

A

This is another form of quasi–random sampling.
These types of sampling schemes are common where the population is spread over a wide geographic area which might be difficult or expensive to sample from.
Multi–stage sampling works, for example, by dividing the area into geographically distinct smaller areas, randomly selecting one (or more) of these areas and then sampling, whether by random, stratified or systematic sampling schemes within these areas.
For example, if we were interested in sampling school children, we might take a random (or stratified) sample of education authorities, then, within each selected authority, a random (or stratified) sample of schools, then, within each selected school, a random (or stratified) sample of pupils.
This is likely to save time and cost less than sampling from the whole population.
The sample can be biased if the stages are not carefully selected. Indeed, the whole scheme needs to be carefully thought through and designed to be truly representative.
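The school-children example can be sketched in Python as a three-stage scheme, assuming simple random sampling at every stage (the card allows stratified sampling at each stage too). The nested-dictionary layout, the function name and the generated IDs are all hypothetical.

```python
import random


def multi_stage_sample(authorities, n_auth, n_schools, n_pupils, seed=None):
    """Three-stage simple random sample: authorities -> schools -> pupils.

    authorities: dict of authority -> dict of school -> list of pupils
    """
    rng = random.Random(seed)
    chosen = []
    # Stage 1: random sample of education authorities
    for auth in rng.sample(sorted(authorities), n_auth):
        schools = authorities[auth]
        # Stage 2: random sample of schools within each selected authority
        for school in rng.sample(sorted(schools), n_schools):
            # Stage 3: random sample of pupils within each selected school
            chosen.extend(rng.sample(schools[school], n_pupils))
    return chosen


# Hypothetical structure: 4 authorities x 5 schools x 30 pupils
authorities = {
    f"LA{a}": {f"LA{a}-S{s}": [f"LA{a}-S{s}-P{p}" for p in range(30)] for s in range(5)}
    for a in range(4)
}
sample = multi_stage_sample(authorities, n_auth=2, n_schools=2, n_pupils=3, seed=7)
print(len(sample))  # 12
```

Only the selected authorities and schools ever need to be visited, which is where the time and cost saving comes from.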

Question 14

Q

Cluster Sampling

What is it?
What does it differ from?
Advantage?
Disadvantage?
Example?

Answer

A

This is a method of non–random sampling. For example, a geographic area is sub–divided into clusters and all the members of a particular cluster are then surveyed.

This differs from multi–stage sampling covered in Section 3.2.4 where the members of the cluster were sampled randomly. Here, no random sampling occurs.

The advantage of this method is that, because the sampling takes place in a concentrated area, it is relatively inexpensive to perform.
The very fact that small clusters are picked, so that an entire cluster can be surveyed, introduces a strong possibility of bias within the sample. If you were interested in the take-up of organic foods and sampled via the cluster method, you could easily get biased results:
if, for example, you picked an economically deprived area, the proportion of those surveyed who ate organically might be very low, while if you picked a middle-class suburb, the proportion is likely to be higher than in the population overall.
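To illustrate the organic-food point, here is a Python sketch in which the surveyor picks one cluster and surveys every member of it, with no random sampling. The area names and the 0/1 responses are hypothetical toy values, not real survey data.

```python
def cluster_survey(clusters, chosen):
    """Survey every member of one chosen cluster (no random sampling)."""
    return list(clusters[chosen])


def proportion(responses):
    """Share of respondents answering 1 (here: eats organic food)."""
    return sum(responses) / len(responses)


# Hypothetical toy data: 1 = eats organic food, 0 = does not
clusters = {
    "area_1": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "area_2": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    "area_3": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
}
everyone = [x for members in clusters.values() for x in members]
print(proportion(everyone))                            # 0.4 (whole population)
print(proportion(cluster_survey(clusters, "area_1")))  # 0.1 (one cluster only)
```

Surveying "area_2" instead would give 0.7, so the estimate depends heavily on which cluster is chosen.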

Question 15

Q

Judgemental sampling

What is it?
Advantage?
Example?
Disadvantage?

Answer

A

Here, the person interested in obtaining the data decides whom they are going to ask.
This can provide a coherent and focused sample by choosing people with experience and relevant knowledge to provide their opinions.
For example, the head of a service department might suggest particular clients to survey based on his judgement. They might be people he believes will be honest or have strong opinions.
This methodology is non–random and relies on the judgement of the person making the choice. Hence, it cannot be guaranteed to be representative. It is prone to bias.

Question 16

Q

Accessibility sampling

What is it?
Disadvantage?
Example?

Answer

A

Here, only the most easily accessible individuals are sampled.
This is clearly prone to bias and only has convenience and cheapness in its favour.
For example, a sample of grain taken from the top of a silo might be quite unrepresentative of the silo as a whole in terms of moisture content.

Question 17

Q

Quota Sampling

How is it similar/different?
What do we do?
Example?
Advantages?
Disadvantages?

Answer

A

This method is similar to stratified sampling, but uses judgemental (or some other non-random) sampling rather than random sampling within groups.
We classify the population by whatever criteria we choose, then select individuals who fit each group and stop once we have reached our quota.
For example, if we were interested in the purchasing habits of 18–23 year old male students, we would stop likely candidates in the street; if they matched the requirements, we would ask our questions, continuing until we had reached our quota of 50 such students.
This type of sampling is specifically targeted, which saves time and expense, and it can lead to very accurate results.
The accurate identification of the appropriate quotas can be problematic. This method is highly reliant on the individual interviewer selecting people to fill the quota. If this is done poorly bias can be introduced into the sample.
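The street-interview procedure can be sketched in Python as "approach people in turn, keep those who match, stop at the quota". The function names, the record fields and the passers-by are hypothetical, and the quota is set to 2 rather than 50 to keep the example short.

```python
def quota_sample(candidates, matches, quota):
    """Approach candidates in turn, keep those who fit, stop once the quota is met."""
    selected = []
    for person in candidates:
        if len(selected) >= quota:
            break  # quota reached: stop interviewing
        if matches(person):
            selected.append(person)
    return selected


# Hypothetical passers-by, in the order the interviewer happens to stop them
passers_by = [
    {"age": 19, "sex": "M", "student": True},
    {"age": 35, "sex": "F", "student": False},
    {"age": 22, "sex": "M", "student": True},
    {"age": 21, "sex": "F", "student": True},
    {"age": 18, "sex": "M", "student": True},
]


def is_target(p):
    # Criteria from the example: male students aged 18-23
    return 18 <= p["age"] <= 23 and p["sex"] == "M" and p["student"]


print([p["age"] for p in quota_sample(passers_by, is_target, quota=2)])  # [19, 22]
```

Which people end up in the sample depends entirely on the order in which the interviewer approaches them, which is where interviewer bias can enter.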

Question 18

Q

Frequency tables for categorical data

Answer

A

This gives us a much clearer picture of the methods of transport used. Also of interest might be the relative frequency of each of the modes of transport. The relative frequency is simply the frequency expressed as a proportion of the total number of students surveyed. If this is given as a percentage, as here, this is known as the percentage relative frequency.

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20043741.png?csf=1&web=1&e=w2g0be
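A short Python sketch that builds a frequency table with frequency, relative frequency and percentage relative frequency for categorical data. The transport responses are hypothetical and are not the data in the linked screenshot.

```python
from collections import Counter


def frequency_table(observations):
    """Rows of (category, frequency, relative frequency, % relative frequency)."""
    counts = Counter(observations)
    total = len(observations)
    return [
        (category, freq, freq / total, 100 * freq / total)
        for category, freq in counts.most_common()  # most frequent first
    ]


# Hypothetical responses, not the data in the screenshot
transport = ["bus", "walk", "car", "bus", "walk", "bus", "bike", "walk", "bus", "car"]
for category, freq, rel, pct in frequency_table(transport):
    print(f"{category:<5} {freq:>3} {rel:>6.2f} {pct:>6.1f}%")
```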

Question 19

Q

Frequency tables for continuous data

What are some things to think about:

Answer

A

With discrete data, and especially with small data sets, it is easy to count the quantities in the defined categories. With continuous data this is not possible: strictly speaking, no two observations are precisely the same. With such observations we group the data together into classes.

Some things to think about:

Often, for simplicity, we would write the class intervals up to the number of decimal places in the data and avoid using inequalities; for example, 20 up to 29.999 if we were working to 3 decimal places.
We need to include the full range of data in our table, so we need to identify the minimum and maximum points (sometimes our last class might be “greater than such and such”).
The class interval width should be a convenient number, for example 5, 10 or 100, depending on the data. Obviously we do not want so many classes that each one has only one or two observations in it.
The appropriate number of classes will vary from data set to data set; however, with simple examples that you would work through by hand, it is unlikely that you would have more than ten to fifteen classes.

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20044132.png?csf=1&web=1&e=SflZ6g
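A Python sketch of grouping continuous data into equal-width classes, assuming the first class starts at a chosen value no larger than the minimum. Classes are half-open, so a value of exactly 30 falls in "30 up to 39.999", matching the labelling convention above. The function name and the measurements are hypothetical.

```python
import math


def group_continuous(data, start, width):
    """Count observations in equal-width classes [start, start + width), ...

    Returns (lower, upper, frequency) tuples; a value equal to `upper`
    belongs to the next class.
    """
    if start > min(data):
        raise ValueError("start must not exceed the smallest observation")
    n_classes = math.floor((max(data) - start) / width) + 1  # covers the full range
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


data = [21.4, 25.0, 29.999, 30.0, 34.2, 47.8, 52.1]  # hypothetical measurements
for lower, upper, freq in group_continuous(data, start=20, width=10):
    print(f"[{lower}, {upper}): {freq}")
```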

Question 20

Q

Stem and Leaf plots

Answer

A

Stem and leaf plots are a quick and easy way of representing data graphically. They can be used with both discrete and continuous data.

Extra digits are cut off (truncated), not rounded.

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20044336.png?csf=1&web=1&e=ersoHH
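A Python sketch of a stem and leaf plot that truncates rather than rounds, assuming non-negative data and an integer leaf unit (1, 10, 100, ...). The function name and the values are hypothetical. Note that 23.7 gives a leaf of 3, not 4.

```python
from collections import defaultdict


def stem_and_leaf(data, leaf_unit=1):
    """Stem-and-leaf display for non-negative data.

    Each value is truncated (not rounded) to a whole number of leaf units;
    its last digit is the leaf and the remaining digits form the stem.
    An integer leaf_unit avoids floating-point surprises.
    """
    plot = defaultdict(list)
    for x in sorted(data):
        truncated = int(x // leaf_unit)  # cut off extra digits, no rounding
        stem, leaf = divmod(truncated, 10)
        plot[stem].append(leaf)
    # Keep empty stems so that gaps in the data remain visible
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


data = [23.7, 25.2, 31.9, 34.0, 34.5, 48.1]  # hypothetical values
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 2 | 3 5
# 3 | 1 4 4
# 4 | 8
```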

Question 21

Q

Why use percentage relative frequency?

Answer

A

It puts samples of different sizes on the same scale, so they can be compared directly

Question 22

Q

How do you draw a frequency polygon?

Answer

A

Join the midpoints of the tops of the histogram bars with straight lines
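A Python sketch that computes the points of a frequency polygon from equal-width grouped classes: each frequency is plotted at its class midpoint. It also assumes the common convention of closing the polygon with zero-frequency points one class width beyond each end. The function name and classes are hypothetical; the points can be passed to any plotting library.

```python
def frequency_polygon(classes):
    """Return (x, y) points of a frequency polygon.

    classes: list of (lower, upper, frequency) tuples for equal-width classes
    """
    width = classes[0][1] - classes[0][0]
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]  # frequency at each midpoint
    first_mid, last_mid = points[0][0], points[-1][0]
    # Close the polygon at zero, one class width beyond each end
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


print(frequency_polygon([(20, 30, 3), (30, 40, 2), (40, 50, 1)]))
# [(15.0, 0), (25.0, 3), (35.0, 2), (45.0, 1), (55.0, 0)]
```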

Question 23

Q

How do you draw a cumulative relative frequency polygon?

Answer

A

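Add the frequencies cumulatively, so each class is stacked on top of the ones before it, and express the running totals as relative frequencies
Plot each cumulative value at the upper endpoint of its class instead of the midpoint
Start at 0 at the lower endpoint of the first class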
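A Python sketch of the three steps: start at 0, keep a running total, and plot it as a proportion at each class's upper endpoint. The function name and classes are hypothetical; multiply the y-values by 100 for a percentage version.

```python
def cumulative_relative_polygon(classes):
    """Return (x, y) points of a cumulative relative frequency polygon.

    classes: list of (lower, upper, frequency) tuples in ascending order
    """
    total = sum(f for _, _, f in classes)
    points = [(classes[0][0], 0.0)]  # start at 0 at the lower end of the first class
    running = 0
    for lo, hi, f in classes:
        running += f                             # stack this class on the previous ones
        points.append((hi, running / total))    # plot at the upper endpoint
    return points


print(cumulative_relative_polygon([(20, 30, 3), (30, 40, 2), (40, 50, 1)]))
```

The last point is always 1 (or 100%), since by then every observation has been counted.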

Question 24

Q

How do you calculate the mean of grouped data?

Answer

A

Multiply each class midpoint by its frequency, add these products together and divide by the total frequency (the sum of the frequencies)
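The rule above, mean ≈ Σ(f × midpoint) / Σf, as a short Python sketch. The result is an estimate, because the individual values within each class are unknown and the midpoint stands in for all of them. The function name and classes are hypothetical.

```python
def grouped_mean(classes):
    """Estimate the mean from grouped data.

    classes: list of (lower, upper, frequency) tuples
    mean ≈ sum(frequency * midpoint) / sum(frequency)
    """
    total = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total


# 3*25 + 2*35 + 1*45 = 190, divided by 6 observations
print(round(grouped_mean([(20, 30, 3), (30, 40, 2), (40, 50, 1)]), 2))  # 31.67
```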

Question 25

Q

What to note about quartiles?

Answer

A

If there are 20 observations, the lower quartile is the (20 + 1)/4 = 21/4 = 5.25th (5 1/4th) smallest observation, i.e. a quarter of the way from the 5th to the 6th smallest.

The upper quartile is then the 3 × 21/4 = 63/4 = 15.75th (15 3/4th) smallest observation.
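A Python sketch of the (n + 1) position rule, interpolating linearly between the two neighbouring observations when the position is fractional. The function name and the data (1 to 20) are hypothetical. For this example, Python's `statistics.quantiles`, whose default method is "exclusive" and uses the same (n + 1) positions, gives the same quartiles.

```python
import statistics


def quartile(data, which):
    """Quartile by the (n + 1) position rule; which = 1 (lower) or 3 (upper)."""
    x = sorted(data)
    n = len(x)
    pos = (n + 1) * which / 4  # e.g. n = 20: 21/4 = 5.25, 63/4 = 15.75
    k = int(pos)               # whole part: the k-th smallest observation
    frac = pos - k             # fractional part: how far towards the next one
    if k < 1:
        return x[0]
    if k >= n:
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])


data = list(range(1, 21))  # hypothetical: 20 observations, 1 to 20
print(quartile(data, 1), quartile(data, 3))  # 5.25 15.75
print(statistics.quantiles(data, n=4))       # [5.25, 10.5, 15.75]
```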

Question 26

Q

How to calculate variance

Answer

A

Determine the mean of your data.
Find the difference of each value from the mean.
Square each difference.
Add up the squared differences (the sum of squares).
Divide this sum of squares by n – 1 (sample) or N (population).
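The steps above as a Python sketch, with a flag choosing between the sample (n − 1) and population (N) divisor. The function name and data values are hypothetical. For these values the mean is 5 and the sum of squares is 32.

```python
def variance(data, sample=True):
    """Variance: sum of squares divided by n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n                       # mean of the data
    squared = [(x - mean) ** 2 for x in data]  # squared difference of each value from the mean
    ss = sum(squared)                          # sum of squares
    return ss / (n - 1) if sample else ss / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(variance(data, sample=False))  # 4.0
print(variance(data))                # 32 / 7, about 4.571
```

The standard library's `statistics.variance` (sample) and `statistics.pvariance` (population) can be used to cross-check results.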

Question 27

Q