Chapter 1: Statistics Flashcards
What is a measure of central tendency?
The central point around which the data seems to be clustered.
What is the most important measure of central tendency?
The arithmetic mean (average).
What is a measure of dispersion?
A measure of how closely clustered (or how spread out) the data points are around the central tendency.
What is the most important measure of dispersion?
Standard deviation
The bigger the SD the bigger the what?
The bigger the standard deviation, the greater the dispersion around the arithmetic mean. In other words, the bigger the standard deviation, the more spread out the data will be.
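Here is a minimal Python sketch using the standard-library `statistics` module that makes this concrete. The two data sets are made up for illustration. Both have a mean of 50, but the second is far more spread out, so its standard deviation is larger. The sketch assumes each list is a complete population (`pstdev`); `statistics.stdev` would give the sample standard deviation, which divides by n - 1 instead of n.

```python
import statistics

# Two made-up data sets with the same mean (50) but different spreads
tight = [48, 49, 50, 51, 52]
spread = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("spread", spread)]:
    mean = statistics.mean(data)   # arithmetic mean
    sd = statistics.pstdev(data)   # population standard deviation
    print(f"{name}: mean={mean}, sd={sd:.2f}")
```

The first line printed is `tight: mean=50, sd=1.41` and the second is `spread: mean=50, sd=14.14`. The means are identical, so only the standard deviation reveals the difference in spread.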
What is primary data?
Data that an investigator has collected themselves.
What are the advantages of primary data?
The investigator knows the conditions under which the data was collected and is aware of any limitations it may contain.
What is secondary data?
Data collected by someone other than the investigator who uses it. Secondary data is collected by many organisations, such as companies, government agencies and other bodies formed specifically to gather economic and social data in a convenient form. The Office for National Statistics (ONS), for example, collects economic data on inflation and employment.
What are the disadvantages of using secondary data?
Users of secondary data may not have a full understanding of the background and circumstances under which the data was initially collected. Consequently, users of secondary data may be unaware of any limitations it may contain.
What are some other sources of secondary data?
- Bank of England
- HM Treasury
- Credit rating agencies, such as Fitch, Moody’s and S&P
What is discrete data?
Data where the units of measurement cannot be split up. For example, if the data refers to the number of people using a particular tube station each day, then the recorded figures might be 824 or 825 people, but never 824 ½.
What is categorical data?
Data that can be put into groups or categories. For example, the answers to a question could be coded 1 for yes, 2 for no and 3 for maybe. This process separates the responses to form categorical data.
What are descriptive statistics?
Descriptive statistics are used to describe the basic features of the data. They provide simple summaries about the sample and the measures.
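As a minimal illustration, this sketch uses Python's standard-library `statistics` module to produce a simple summary of a small sample. The commuting distances are invented for the example. The sketch treats them as a sample, which is why it uses `stdev` (dividing by n - 1).

```python
import statistics

# Hypothetical sample: one-way commuting distances (miles) for eight people
miles = [3.2, 5.0, 5.1, 7.4, 2.8, 5.0, 9.6, 4.3]

summary = {
    "n": len(miles),
    "mean": statistics.mean(miles),
    "median": statistics.median(miles),
    "sd": statistics.stdev(miles),  # sample standard deviation (n - 1)
    "min": min(miles),
    "max": max(miles),
}

for key, value in summary.items():
    print(key, round(value, 2))
```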
Is it typically possible to apply descriptive statistics to categorical data?
No! It is not generally possible to apply descriptive statistics directly to categorical data, because the numbers used to code the categories are arbitrary.
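The short Python sketch below shows why. It uses the yes/no/maybe coding from the categorical data card, applied to an invented set of responses. Counting each category is meaningful. The mean of the codes is not, because simply recoding "no" from 2 to 5 changes it without changing any of the answers.

```python
from collections import Counter

# Survey answers coded 1 = yes, 2 = no, 3 = maybe (the codes are arbitrary labels)
LABELS = {1: "yes", 2: "no", 3: "maybe"}
responses = [1, 2, 1, 3, 1, 2, 1, 3]

# Counting how often each category occurs is meaningful
counts = Counter(LABELS[code] for code in responses)
print(counts.most_common())

# The mean of the codes is not: recoding "no" as 5 changes it
print(sum(responses) / len(responses))
recoded = [5 if code == 2 else code for code in responses]
print(sum(recoded) / len(recoded))
```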
What is ordinal data?
Categorical data may be ranked or ordered according to set criteria, e.g. a first- or second-class degree. It is the order of these categories that matters. This is known as ordinal data. Ordinal data allows descriptive statistics, such as the median, to be used to compare the data using numbers and scales.
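This Python sketch shows one descriptive statistic that respects order, the median. The degree results are hypothetical. Each class is converted to its rank position, and the median rank is taken. `median_low` is used so that the answer is always an actual degree class, even when there is an even number of results.

```python
import statistics

# Degree classes ranked from lowest to highest; only the order matters
RANKS = ["third", "lower second", "upper second", "first"]
results = ["upper second", "first", "lower second", "upper second", "third"]

# Convert each class to its rank position, then take the median rank
positions = [RANKS.index(r) for r in results]
median_rank = statistics.median_low(positions)
print(RANKS[median_rank])
```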
What is continuous data?
Continuous data is where the units have a constant scale and all points between the units have meaning. For example, the distance travelled by a person to work can be expressed as 5 miles, 5.1 miles, 5.12 miles and so on, to an unlimited number of decimal places.
What does the level of accuracy in recording continuous data depend on?
The precision of the measuring device itself.
What is a population?
A population is the entire set of items which have the desired characteristics under investigation. For example, if the TV viewing habits of males under 40 years of age were under investigation, then the population would be all males under 40 years of age.
What are an advantage and a disadvantage of using a population?
A population gives a complete set of data, but it can be very difficult and time-consuming to collect.
What is a sample?
A sub-set of items taken from the population with the characteristics under investigation.
What are the two ways a sample can be selected?
On a random or non-random basis.
What is a random sample?
A sample selected in such a way that every member of the population has an equal chance of being selected.
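The following is a minimal Python sketch of drawing a random sample with the standard-library `random` module. The population of 1,000 employee IDs and the sample size of 50 are assumptions for illustration. `random.Random.sample` draws without replacement, and every member has the same chance of being chosen. The fixed seed only makes the example repeatable.

```python
import random

# Hypothetical population of 1,000 employee IDs
population = list(range(1, 1001))

rng = random.Random(42)                 # fixed seed so the example is repeatable
sample = rng.sample(population, k=50)   # without replacement; each member equally likely
print(sorted(sample)[:10])
```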
What is another name for non-random sampling?
Non-probability method of selection.
What is quota-sampling?
A form of non-random selection, often used in market research, in which the investigator must fill a quota of respondents. The quota is usually categorised into different types of individual, e.g. professional or manual workers, with ‘sub-quotas’ for each type.
Explain sampling vs. quota sampling vs. stratified sampling
Sampling might involve interviewing the first 100 people (i.e. the quota) an investigator meets in a city centre. Quota sampling might involve using data on the first 52 women and 48 men interviewed, in order to reflect the gender split of the UK. If the 52 women and 48 men were selected randomly, this would be called stratified sampling. Stratified sampling is designed to reduce sampling error, which it does by selecting a sample that represents the population.
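The contrast between quota and stratified sampling can be sketched in Python, using the 52 women / 48 men split above. The stream of passers-by is invented and simply alternates women and men. The function names `quota_sample` and `stratified_sample` are hypothetical. The quota version takes whoever is met first until each sub-quota is full. The stratified version groups everyone into strata first, then draws each sub-sample at random.

```python
import random

# Hypothetical stream of 400 passers-by, in the order the interviewer meets them
passers_by = [(f"person{i}", "woman" if i % 2 == 0 else "man") for i in range(400)]

QUOTAS = {"woman": 52, "man": 48}

def quota_sample(stream, quotas):
    """Take the first people met until each sub-quota is filled (non-random)."""
    filled = {group: [] for group in quotas}
    for person, group in stream:
        if len(filled[group]) < quotas[group]:
            filled[group].append(person)
    return filled

def stratified_sample(stream, quotas, rng):
    """Split into strata, then draw a random sample of the required size from each."""
    strata = {group: [] for group in quotas}
    for person, group in stream:
        strata[group].append(person)
    return {group: rng.sample(strata[group], quotas[group]) for group in quotas}

quota = quota_sample(passers_by, QUOTAS)
strat = stratified_sample(passers_by, QUOTAS, random.Random(7))
print(len(quota["woman"]), len(quota["man"]))   # 52 48
print(len(strat["woman"]), len(strat["man"]))   # 52 48
```

Both samples have the same gender split. Only the stratified one gives every woman (and every man) an equal chance of selection.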
What is systematic sampling?
It’s another form of non-random sampling, in which researchers select every nth record of a population. For example, to analyse how far employees travel to work on average, a researcher might ask every fifth person on an alphabetical list of employees.
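Systematic selection maps directly onto Python's slice step, as this minimal sketch shows. The staff names are invented, and `systematic_sample` is a hypothetical helper. Starting at index 4 picks the 5th, 10th and 15th names on the alphabetical list.

```python
# Hypothetical staff list, sorted alphabetically
staff = sorted(["Patel", "Adams", "Nguyen", "Brown", "Okafor", "Clarke", "Evans",
                "Hughes", "Jones", "Khan", "Lewis", "Murphy", "Davies", "Green", "Ivanova"])

def systematic_sample(items, n, start=0):
    """Return every nth item, beginning at index `start` (0-based)."""
    return items[start::n]

print(systematic_sample(staff, 5, start=4))   # ['Evans', 'Khan', 'Patel']
```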
What is convenience sampling?
Choosing the sample that is easiest to collect information from. Choosing people in your local town to represent the UK, for example.
What is judgement sampling?
Using the investigator's judgement to choose the sample that would best represent the population, for example, deciding that Swindon is a good representation of the UK.
What is snowball sampling?
This is typically used when the subjects of the study are rare or hard to reach. It relies on initial subjects referring further subjects, so the sample grows like a snowball.
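Here is a rough Python sketch of the referral process. It models snowball sampling as a breadth-first walk through a referral network. The network, the seed participant "A" and the helper `snowball_sample` are all hypothetical. The sketch recruits the initial subjects first, then follows their referrals wave by wave, never recruiting anyone twice, until the target size is reached or referrals run out.

```python
from collections import deque

# Hypothetical referral network: who each participant can put the researcher in touch with
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["A"],
}

def snowball_sample(seeds, referrals, target):
    """Recruit the seeds, then follow referrals wave by wave until `target` people are recruited."""
    recruited, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < target:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:   # don't recruit anyone twice
                seen.add(contact)
                queue.append(contact)
    return recruited

print(snowball_sample(["A"], referrals, target=5))   # ['A', 'B', 'C', 'D', 'E']
```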
What is a relative frequency distribution table?
A relative frequency distribution table shows each category's frequency in comparison with the total frequency: each frequency is expressed as a percentage of the whole.
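As a minimal sketch, this Python code builds a relative frequency table from invented survey answers about how people travel to work. It uses `collections.Counter`. Each category's count is divided by the total and expressed as a percentage, so the relative frequencies add up to 100%.

```python
from collections import Counter

# Hypothetical survey: mode of transport used to get to work
answers = ["car", "bus", "car", "train", "walk", "car", "bus", "car", "train", "car"]

counts = Counter(answers)
total = sum(counts.values())
relative = {category: 100 * freq / total for category, freq in counts.items()}

print(f"{'Category':<10}{'Frequency':>10}{'Relative %':>12}")
for category, freq in counts.most_common():
    print(f"{category:<10}{freq:>10}{relative[category]:>11.1f}%")
```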
What are the two main methods used to present discrete data?
Bar charts and pie charts.
What are the 4 main methods of visually presenting continuous data?
- Histograms
- Time series graphs
- Semi-log graphs
- Scatter diagrams
Histograms and bar charts look similar, but what is the difference between the two?
The area (not the height) of the bar on a histogram represents the frequency of occurrence, whereas on a bar chart the height of the bar represents the frequency.
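The difference matters when classes have unequal widths. In this Python sketch, the height of each histogram bar is the frequency density (frequency divided by class width), so height times width gives back the frequency. The commuting-distance classes and frequencies are invented, and `frequency_densities` is a hypothetical helper. The 10-20 mile class has the same frequency as the 5-10 mile class, but it is twice as wide, so its bar is half as tall.

```python
# Hypothetical grouped data: commuting distance classes (miles) and their frequencies
classes = [(0, 5, 20), (5, 10, 30), (10, 20, 30)]   # (lower, upper, frequency)

def frequency_densities(classes):
    """Bar height for each class: frequency divided by class width."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), height in zip(classes, frequency_densities(classes)):
    area = height * (upper - lower)   # bar area equals the class frequency
    print(f"{lower}-{upper} miles: height={height}, area={area}")
```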
What is a time-series graph?
A time series graph displays the path of a variable (e.g. a share price) in chronological order.
What is a log (semi-log) graph used to illustrate?
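A (semi-) log graph is used to illustrate the rate of change of a variable. Equal vertical distances represent equal proportional (percentage) changes, so a straight line shows a constant rate of growth, and a line that gets steeper shows that growth is accelerating over time.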
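This Python sketch shows why constant growth plots as a straight line on a semi-log graph. The series is invented and grows at a constant 10% per year. The absolute rises get bigger each year. After taking base-10 logarithms, which is what the vertical axis of a semi-log graph plots, the year-on-year steps are all the same size, namely log10(1.1).

```python
import math

# Hypothetical series growing at a constant 10% per year
values = [100 * 1.10 ** year for year in range(6)]

# On a semi-log graph the vertical axis effectively plots log(value)
logs = [math.log10(v) for v in values]
steps = [b - a for a, b in zip(logs, logs[1:])]

print([round(v, 1) for v in values])   # absolute rises get bigger each year
print([round(s, 4) for s in steps])    # but the steps on the log scale are equal
```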