Collecting, presenting and summarising data Flashcards

1
Q

Random variables (definition)

A

The quantities measured in a study

2
Q

Data (definition)

A

A collection of such observations

2
Q

Observation (definition)

A

A particular outcome

3
Q

Population (definition)

A

The collection of all possible outcomes

4
Q

Example

Study on the height of students of A&F course at Newcastle

What would be the random variable?
What would Joe Bloggs’s measured height be called?

What would be our data?

This would be a _______ from the ____________ which consists of all students registered on A&F degrees

A
  • Our random variable is “the height of students on A&F courses at Newcastle”.
  • If Joe Bloggs is an A&F student, and we measured his height, then that value would be a single observation.
  • If we measured the height of every first year A&F student, we would have a collection of such observations which would be our data.
  • This would be a sample from the population which consists of all students registered on A&F degrees
5
Q

Ideally, to get a true idea of what is going on, we’d like to observe the whole population (take a _______). However, this can be difficult:

Why would it be difficult?

A

A census

  • If the population is huge, then this would take ages!
  • And it would be very costly!
  • In reality, we usually observe a subset of the population… but how do we choose who to observe?
6
Q

Quantitative variables (2 types and explanation)

A

Discrete random variables

  • can only take a sequence of distinct values (usually integers);
  • are usually countable - e.g. the number of people attending a tutorial group;
  • can be ordinal - where the outcomes are ordered.

Continuous random variables

  • can take any value over some continuous scale - e.g. height or weight.
  • can be measured to a very high degree of accuracy (provided we have the equipment to do so) (often decimals)
  • however, we can never measure a continuous quantity exactly: someone’s weight, for example,
    might be measured to the nearest whole number - and so could “look” discrete - be careful!
7
Q

Sampling

What is a sample?
What is the difficulty?
What is a biased sample?

A
  • Subset of the whole population
  • Obtaining a representative sample
  • Unrepresentative and unfair
8
Q

What are the general forms of sampling techniques?

A
  1. Random sampling - where the members of the sample are chosen by some random (i.e. unpredictable) mechanism.
  2. Quasi-random sampling - where the mechanism for choosing the sample is only partly random.
  3. Non-random sampling - where the sample is specifically selected rather than randomly selected.
9
Q

Simple Random Sampling disadvantages

A
  • We don’t have a complete list of the population
  • Not all elements of the population are equally accessible
  • By chance, you could pick an unrepresentative sample
10
Q

Stratified sampling

What is it?
What is its main idea?

A
  • Form of random sample where clearly defined groups or strata exist within the population
  • If we know the overall proportion of the population that falls into each of these groups, we can take a simple random sample from each of the groups and then adjust the results according to the known proportions
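
A minimal Python sketch of the stratified idea. The strata, the heights and the proportions are hypothetical values chosen for illustration, and `stratified_mean` is not a standard library function. It takes a simple random sample within each stratum, then weights each stratum's sample mean by its known population proportion.

```python
import random

def stratified_mean(strata, proportions, n_per_stratum, seed=0):
    """Estimate a population mean from simple random samples of each stratum.

    strata: dict mapping stratum name -> list of values (hypothetical data)
    proportions: dict mapping stratum name -> known population proportion
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimate = 0.0
    for name, values in strata.items():
        sample = rng.sample(values, n_per_stratum)   # simple random sample within the stratum
        sample_mean = sum(sample) / len(sample)
        estimate += proportions[name] * sample_mean  # adjust by the known proportion
    return estimate

# Hypothetical heights (cm) for two strata
strata = {
    "first_year": [160, 165, 170, 175, 180],
    "second_year": [162, 168, 172, 178, 182],
}
proportions = {"first_year": 0.6, "second_year": 0.4}
estimate = stratified_mean(strata, proportions, n_per_stratum=3)
```

Weighting by the known proportions means a stratum that is over-sampled or under-sampled does not distort the overall estimate.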
11
Q

Systematic sampling

What is it a form of?
Example?
Disadvantage?

A
  • Form of quasi-random sampling
  • For example picking every 10th item to come off the production line
  • Not entirely random and can be biased
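
A short Python sketch of systematic sampling, assuming a hypothetical production line of 100 numbered items. Choosing the starting position at random is an assumption added here; the card only says "every 10th item".

```python
import random

def systematic_sample(items, k, seed=0):
    """Pick every k-th item, starting from a random position among the first k."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    start = rng.randrange(k)    # the only random part of the scheme
    return items[start::k]

production_line = list(range(1, 101))  # hypothetical item IDs 1..100
sample = systematic_sample(production_line, k=10)
```

Once the start is fixed, every other selection is determined, which is why the scheme is only partly random: any pattern in the line that repeats every 10 items would bias the sample.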
12
Q

Multi–stage Sampling

What is it a form of?
When is it common?
How does it work?
Example?
Advantage?
Disadvantage?

A
  • This is another form of quasi–random sampling.
  • These types of sampling schemes are common where the population is spread over a wide geographic area which might be
    difficult or expensive to sample from.
  • Multi–stage sampling works, for example, by dividing the area into geographically distinct smaller areas, randomly selecting one (or more) of these areas and then sampling, whether by random, stratified or systematic sampling schemes within these areas.
  • For example, if we were interested in sampling school children, we might take a random (or stratified) sample of education authorities, then, within each selected authority, a random (or stratified) sample of schools, then, within each selected school, a random (or stratified) sample of pupils.
  • This is likely to save time and cost less than sampling from the whole population.
  • The sample can be biased if the stages are not carefully selected. Indeed, the whole scheme needs to be carefully thought through and designed to be truly representative.
13
Q

Cluster Sampling

What is it?
What does it differ from?
Advantage?
Disadvantage?
Example?

A
  • This is a method of non–random sampling. For example, a geographic area is
    sub–divided into clusters and all the members of a particular cluster are then surveyed.

This differs from multi–stage sampling covered in Section 3.2.4 where the members of the cluster were sampled randomly. Here, no random sampling occurs.

  • The advantage of this method is that,
    because the sampling takes place in a concentrated area, it is relatively inexpensive to perform.
  • The very fact that small clusters are picked to allow an entire cluster to be surveyed introduces the strong possibility of bias within the sample.
  • For example, if you were interested in the take up of organic foods and were sampling via the cluster method you could easily get biased results: if you picked an economically deprived area, the proportion of those surveyed that ate organically might be very low, while if you picked a middle class suburb the proportion is likely to be higher than in the overall population
14
Q

Judgemental sampling

What is it?
Advantage?
Example?
Disadvantage?

A
  • Here, the person interested in obtaining the data decides whom they are going to ask.
  • This can provide a coherent and focused sample by choosing people with experience
    and relevant knowledge to provide their opinions.
  • For example, the head of a service
    department might suggest particular clients to survey based on his judgement. They
    might be people he believes will be honest or have strong opinions.
  • This methodology is non–random and relies on the judgement of the person making the choice. Hence, it cannot be guaranteed to be representative. It is prone to bias
15
Q

Accessibility sampling

What is it?
Disadvantage?
Example?

A
  • Here, only the most easily accessible individuals are sampled.
  • This is clearly prone to bias and only has convenience and cheapness in its favour.
  • For example, a sample of grain taken from the top of a silo might be quite unrepresentative of the silo as a whole
    in terms of moisture content.
16
Q

Quota Sampling

How is it similar/different?
What do we do?
Example?
Advantages?
Disadvantages?

A
  • This method is similar to stratified sampling but uses judgemental (or some other)
    sampling rather than random sampling within groups.
  • We would classify the population by any set of criteria we choose, then sample individuals who match those criteria and stop when we have reached our quota.
  • For example, if we were interested in the purchasing habits of 18–23 year old male students, we would stop likely candidates in the street; if they matched the requirements we would ask our questions until we had reached our quota of 50 such students.
  • This type of sampling can lead to very accurate results as it is specifically targeted, which saves time and expense.
  • The accurate identification of the appropriate quotas can be problematic. This method is highly reliant on the individual interviewer selecting people to fill the quota. If this is done poorly bias can be introduced into the sample.
17
Q

Frequency tables for categorical data

A

This gives us a much clearer picture of the methods of transport used. Also of interest
might be the relative frequency of each of the modes of transport. The relative
frequency is simply the frequency expressed as a proportion of the total number of
students surveyed. If this is given as a percentage, as here, this is known as the
percentage relative frequency

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20043741.png?csf=1&web=1&e=w2g0be
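
As an illustration, a frequency table with percentage relative frequencies could be built in Python as follows. The survey responses are hypothetical and are not the data from the table in the image.

```python
from collections import Counter

def percentage_relative_frequencies(observations):
    """Map each category to its frequency as a percentage of the total."""
    counts = Counter(observations)   # frequency of each category
    total = sum(counts.values())     # total number surveyed
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical survey responses on mode of transport
responses = ["bus", "walk", "bus", "car", "walk", "bus", "metro", "bus"]
table = percentage_relative_frequencies(responses)
```

The percentages always sum to 100, whatever the sample size.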

18
Q

Frequency tables for continuous data

What are some things to think about:

A

With discrete data, and especially with small data sets, it is easy to count the
quantities in the defined categories. With continuous data this is not possible. Strictly
speaking, no two observations are precisely the same. With such observations we group
the data together

Some things to think about:

  • Often for simplicity we would write the class intervals up to the number of
    decimal places in the data and avoid using the inequalities; for example, 20 up to
    29.999 if we were working to 3 decimal places.
  • We need to include the full range of data in our table and so we need to identify
    the minimum and maximum points (sometimes our last class might be “greater
    than such and such”).
  • The class interval width should be a convenient number – for example 5, 10, or
    100, depending on the data. Obviously we do not want so many classes that each
    one has only one or two observations in it.
  • The appropriate number of classes will vary from data set to data set; however,
    with simple examples that you would work through by hand, it is unlikely that
    you would have more than ten to fifteen classes

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20044132.png?csf=1&web=1&e=SflZ6g
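
A minimal Python sketch of grouping continuous data into class intervals of a convenient width. The data, the starting point and the width are hypothetical, and the function assumes every observation is at least `start`.

```python
def grouped_frequency_table(data, start, width):
    """Count observations in classes [start, start + width), [start + width, ...), ...

    Returns a list of (lower, upper, frequency) tuples.
    """
    n_classes = int((max(data) - start) // width) + 1  # enough classes to cover the maximum
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1         # index of the class containing x
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical measurements recorded to at most 3 decimal places
data = [20.5, 23.1, 29.999, 30.0, 34.2, 41.7, 45.0, 47.3]
table = grouped_frequency_table(data, start=20, width=10)
```

Here 29.999 falls in the class "20 up to 29.999" while 30.0 starts the next class, matching the convention described above.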

19
Q

Stem and Leaf plots

A

Stem and leaf plots are a quick and easy way of representing data graphically. They
can be used with both discrete and continuous data

Extra digits are cut and not rounded

https://newcastle-my.sharepoint.com/:i:/r/personal/c3023551_newcastle_ac_uk/Documents/Pictures/Screenshot%202023-12-09%20044336.png?csf=1&web=1&e=ersoHH
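
A small Python sketch of a stem and leaf plot, assuming non-negative hypothetical data with tens as stems and units as leaves. Note that `int()` cuts the extra digits rather than rounding them.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} with tens as stems and units as leaves."""
    plot = defaultdict(list)
    for x in sorted(data):
        truncated = int(x)                  # cut the decimals, do not round
        stem, leaf = divmod(truncated, 10)  # tens digit(s) and units digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data: 23.8 is cut to 23, not rounded to 24
data = [12.9, 15.2, 21.7, 23.0, 23.8, 34.5]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```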

20
Q

Why use percentage relative frequency?

A
  • It puts both samples on the same scale
21
Q

How do you draw frequency polygons?

A
  • Join the midpoints of the tops of the histogram bars with straight lines
22
Q

How do you draw cumulative relative frequency polygons?

A
  • Add the relative frequencies on top of each other to get a running total
  • Plot against the class endpoints instead of the midpoints
  • Start with 0
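
The points of such a polygon could be computed as in this Python sketch. The class bounds and frequencies are hypothetical, and `cumulative_relative_polygon` is an illustrative helper, not a library function.

```python
from itertools import accumulate

def cumulative_relative_polygon(class_bounds, frequencies):
    """Return (x, y) points: class endpoints against cumulative percentage.

    class_bounds has one more entry than frequencies (the first lower bound
    followed by every upper bound).
    """
    total = sum(frequencies)
    cumulative = [0] + list(accumulate(frequencies))  # start with 0 at the first lower bound
    return [(x, 100 * c / total) for x, c in zip(class_bounds, cumulative)]

bounds = [20, 30, 40, 50]  # hypothetical classes 20-30, 30-40, 40-50
freqs = [3, 2, 3]
points = cumulative_relative_polygon(bounds, freqs)
```

The polygon starts at 0% at the lowest bound and reaches 100% at the highest.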
23
Q

What to do with grouped data for means

A

Multiply each class midpoint by its frequency, add these products together and divide by the total frequency
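
A minimal Python sketch of the grouped mean, assuming hypothetical classes given as (lower, upper, frequency) tuples:

```python
def grouped_mean(classes):
    """Approximate the mean of grouped data from class midpoints and frequencies."""
    total_frequency = sum(f for _, _, f in classes)
    # midpoint of each class times its frequency, summed over all classes
    weighted_sum = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return weighted_sum / total_frequency

classes = [(20, 30, 3), (30, 40, 2), (40, 50, 3)]  # hypothetical grouped data
mean = grouped_mean(classes)  # (25*3 + 35*2 + 45*3) / 8
```

This is only an approximation to the mean of the raw data, since every observation in a class is treated as if it sat at the midpoint.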

24
Q

What to note about quartiles?

A

With n observations, the lower quartile is the (n+1)/4th smallest observation. If there are 20 observations, this is the (20+1)/4 = 21/4 = 5 1/4th smallest observation

The upper quartile is the 3(n+1)/4th smallest observation: 3 × 21/4 = 63/4 = 15 3/4th smallest observation
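
A Python sketch of these positions. The card does not say how to read off a fractional position such as 5 1/4, so the linear interpolation between neighbouring observations used here is an assumption. The helper names are hypothetical, and the sketch assumes at least 3 observations.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    lower_index = int(position)        # whole part of the position
    fraction = position - lower_index  # fractional part
    lower_value = sorted_data[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_data[lower_index]
    return lower_value + fraction * (upper_value - lower_value)

def quartiles(data):
    """Lower and upper quartiles at positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3

data = list(range(1, 21))  # hypothetical data: 20 observations, 1..20
q1, q3 = quartiles(data)   # positions 5.25 and 15.75
```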

25
Q

How to calculate variance

A
  1. Determine the mean of your data.
  2. Find the difference of each value from the mean.
  3. Square each difference.
  4. Add up the squared differences.
  5. Divide this sum of squares by n – 1 (sample) or N (population).
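
The steps can be sketched in Python as follows; the data set is hypothetical.

```python
def variance(data, sample=True):
    """Variance following the five steps: mean, differences, squares, sum, divide."""
    n = len(data)
    mean = sum(data) / n                                   # step 1
    squared_differences = [(x - mean) ** 2 for x in data]  # steps 2 and 3
    sum_of_squares = sum(squared_differences)              # step 4
    return sum_of_squares / (n - 1 if sample else n)       # step 5

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
sample_variance = variance(data)                      # divides by n - 1 = 7
population_variance = variance(data, sample=False)    # divides by N = 8
```

Python's standard library offers `statistics.variance` and `statistics.pvariance` for the same two calculations.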