Wk2 L1 Introduction to statistical variables Flashcards
Define Random?
Something is random if its outcome is uncertain
Define what a variable is?
A variable is something that can take many different values
Define what a random variable is?
A random variable is something that can take many values, where you can’t predict the outcome in advance
Define what a domain is?
A Domain is the set of all possible values (outcomes) of a random variable
Define what an event is?
An event is the observation of one of the possible outcomes in the domain
Define what frequency is?
Frequency is the number of times a particular outcome is observed
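A minimal sketch of counting frequencies. The list of goals below is made-up illustrative data, not from the lecture; `Counter` is from Python's standard library.

```python
from collections import Counter

# Hypothetical observations of a random variable:
# goals scored in ten football matches (illustrative data only).
goals = [0, 2, 1, 1, 3, 0, 1, 2, 1, 0]

# Frequency: the number of times each outcome was observed.
frequency = Counter(goals)

# Print each observed outcome alongside its frequency.
for outcome in sorted(frequency):
    print(outcome, frequency[outcome])
```

The frequencies always add up to the total number of observations.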
What are the 2 types of Statistical Variable?
The 2 types of Statistical Variable are:
1) Categorical (Qualitative)
2) Numerical (Quantitative)
What are the 2 types of Categorical (Qualitative) Variable?
The 2 types of Categorical (Qualitative) Variable are:
1) Nominal (or binary)
2) Ordinal
What are the 2 types of Numerical (Quantitative) Variable?
The 2 types of Numerical (Quantitative) Variable are:
1) Ratio scale
2) Interval scale
What is a nominal variable?
A nominal variable is just a name used to represent a category, class or group
{Spain, France, Italy, Germany,…}
{Red, Yellow, Green, Blue, …}
{Jo, Alex, Sara, Dan, …}
It does not indicate size or level
It makes no sense to do numerical calculations with nominal variables (e.g. Spain - France =??)
What are binary nominal variables?
Binary variables are nominal variables with exactly 2 categories
{True, False}; {Alive, Dead}
What are Ordinal categorical variables?
Ordinal categorical variables are categories that have a logical order
{Gold, Silver, Bronze}
We still can’t do meaningful arithmetic: most numerical calculations directly using ordinal data are meaningless
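A small sketch of what ordinal data does allow: ranking and comparing, but not arithmetic. The `MEDAL_ORDER` list and the `results` data are assumptions made for illustration.

```python
# Ordinal categories listed from best to worst (order assumed for illustration).
MEDAL_ORDER = ["Gold", "Silver", "Bronze"]

# Map each category to its position so categories can be compared.
rank = {medal: position for position, medal in enumerate(MEDAL_ORDER)}

# Hypothetical observations.
results = ["Bronze", "Gold", "Silver", "Gold"]

# Sorting and finding the best category are meaningful for ordinal data.
sorted_results = sorted(results, key=rank.__getitem__)
best = min(results, key=rank.__getitem__)

# Note: the positions in `rank` are only labels for the order.
# "Gold" minus "Silver" still has no meaning.
print(sorted_results, best)
```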
What are cardinal variables?
Cardinal (quantitative) variables are variables where each level denotes an actual number and there are equal intervals between consecutive levels
What are interval variables?
Interval scale variables have no “natural” or true zero: the zero is chosen arbitrarily
E.g. the interval between 12°C and 13°C is the same as the interval between 55.1°C and 56.1°C
We can’t use ratios for interval variables: it doesn’t make sense to say 32°C is twice as hot as 16°C
What are ratio variables?
A ratio variable is the same as an interval variable, except now there is a true, universal zero, so ratios are meaningful
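A rough illustration of why ratios fail on an interval scale, using the 16 and 32 degree temperatures from the interval card. Kelvin is not mentioned in these cards; it is brought in here as an example of a temperature scale with a true zero, and `celsius_to_kelvin` is a helper defined only for this sketch.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15

cold_c, hot_c = 16.0, 32.0

# The ratio of the Celsius numbers is 2.0, but it depends on an arbitrary zero.
celsius_ratio = hot_c / cold_c

# On the Kelvin scale the same two temperatures have a ratio of about 1.055.
kelvin_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cold_c)

# Differences (intervals) are the same on both scales.
diff_c = hot_c - cold_c
diff_k = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cold_c)

print(celsius_ratio, round(kelvin_ratio, 3), diff_c, round(diff_k, 6))
```

The interval survives the change of zero, while the ratio does not, which is why "twice as hot" is not a valid statement in Celsius.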
What is discrete data?
Discrete data is where each data point is distinct and separate, and there are gaps between the points, e.g. small, medium or large T-shirt sizes
Discrete variables are made up of naturally distinct events: count data which can only take certain values, with gaps between them
Number of goals scored in a football match; number of songs in a playlist; shoe sizes
Usually whole numbers, but not always (e.g. half shoe sizes)
What is continuous data?
Continuous data is where there are no gaps anywhere between data points, e.g. height rather than shoe size
It can take any value between some (possibly theoretical) minimum and maximum, with no gaps
Time taken to run a marathon; latitude and longitude on a map; speed of a F1 racing car
What are the three sources of primary data?
The three Primary Sources of Data Collection are:
1) Observation
2) survey
3) experimentation
What are the two sources of secondary data?
The two sources of secondary data are:
1) print
2) electronic
what is the advantage of observation?
The advantage of observation is that it is reliable
what is the disadvantage of observation?
The disadvantage of observation is that it is subjective: it depends on the measurer
what are the two advantages of face to face interview?
the two advantages of face to face interview are:
1) High response rate
2) clarifies questions
what are the two disadvantages of face to face interview?
The two disadvantages of face to face interview are:
1) costly
2) open to interpretation by interviewer
what are the two disadvantages of a telephone interview?
The two disadvantages of telephone interview are:
1) annoys people
2) bias to those with phones
what are the two advantages of a telephone interview?
The two advantages of telephone interview are:
1) high response rate
2) cheaper than face to face
What is the advantage of a postal interview?
The advantage of a postal interview is that it is relatively cheap
What are the two disadvantages of a postal interview?
the two disadvantages of a postal interview are:
1) bias to people who open their mail
2) low response rate
What is the advantage of a panel survey?
The advantage of a panel survey is that it is dynamic over time
What are the three disadvantages of a panel survey?
the three disadvantages of a panel survey are:
1) expensive
2) panel may shrink
3) panellists become experts (bias)
What is the advantage of a longitudinal survey?
The advantage of a longitudinal survey is that it shows long-term change
What is the disadvantage of a longitudinal survey?
the disadvantage of a longitudinal survey is small sample sizes
what are the two disadvantages of internet/email survey?
the two disadvantages of internet/email survey are:
1) low response rate
2) bias to those with internet access
Define a population?
A population is the whole, complete set; e.g. all students at the University of Southampton
define a sample?
A sample is a subset (a part) of the population: e.g. all students taking MANG 1019
What is bias?
A sample is biased if some parts of the population have a better chance of being represented than others
What is a quota sample?
A quota sample specifies that we must include a given number of members from a particular subset of the population
E.g. we may be curious to know how many Man Utd supporters actually come from Manchester, so a quota sample would specify that at least 20 (say) students in the sample must be from Manchester
What is simple random sampling?
We first need a complete sampling frame, i.e. a list of all members of the population, which we can then use to select our random sample
E.g. as we have an email address for every Southampton student, we could easily pick a random sample to ask
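A minimal sketch of simple random sampling from a complete sampling frame. The email addresses, the frame size of 1000 and the sample size of 50 are invented for illustration; the seed is fixed only so the sketch is repeatable.

```python
import random

# Hypothetical sampling frame: one entry for every member of the population.
sampling_frame = [f"student{i}@example.org" for i in range(1, 1001)]

# Seeded generator so the same sample is drawn on every run (illustration only).
rng = random.Random(42)

# Draw 50 members without replacement; every member has the same chance.
sample = rng.sample(sampling_frame, k=50)

print(len(sample), sample[:3])
```

`Random.sample` draws without replacement, so no student can be picked twice.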
What is Stratified random sampling?
Divide the population into “strata”
Each stratum is a group or segment that is distinctive from others
The elements in each stratum are “homogeneous”, i.e. have similar characteristics
E.g. undergrad/postgrad (2 strata); by Faculty (5 strata); by academic discipline (at least 30 strata)
Select a simple random sample from each stratum
Guarantees a “representative” sample
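The stratified procedure above can be sketched as follows. The strata sizes and the 10% sampling fraction are assumptions, and taking the same fraction from every stratum (proportional allocation) is one common choice rather than something these cards specify. `stratified_sample` is a hypothetical helper written for this sketch.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Take a simple random sample of the given fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        # At least one member per stratum so no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population split into 2 strata (undergrad / postgrad).
strata = {
    "undergrad": [f"ug{i}" for i in range(800)],
    "postgrad": [f"pg{i}" for i in range(200)],
}

rng = random.Random(0)  # fixed seed for repeatability (illustration only)
chosen = stratified_sample(strata, fraction=0.1, rng=rng)

for name, members in chosen.items():
    print(name, len(members))
```

Because each stratum is sampled separately, every stratum is present in the final sample in the same proportion as in the population.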