Quantitative Methods - Sources of Data Flashcards
How may primary data be obtained?
Scientific investigation or research
Observation
Discussion
Questionnaires and market research
What are the types of analysis?
Fundamental, technical and quantitative analysis
What is primary data?
Collection or generation of new data
What are two key characteristics of primary data?
It is both time consuming and expensive to produce
What is secondary data?
Data obtained from an existing source, rather than newly collected or generated
What is important about secondary data?
It is readily available and relatively low cost
Where is secondary data available (in what media)?
In written publications and on computer databases
What are some frequently used sources of published/written secondary data?
Office for National Statistics - provides economic, financial, social and employment data
Bank of England Quarterly Bulletin - shows interest rates, inflation rates etc
Federal Reserve Bulletin - similar information to the Bank of England's, produced monthly
IMF’s International Financial Statistics
Bank for International Settlements - publishes data on international banking cash flows
World Bank and the Organisation for Economic Co-operation and Development (OECD)
What are some computerised sources of secondary data?
Datastream (part of Thomson Reuters) and Bloomberg provide historical data regarding securities and a range of other economic variables
Extel (part of Thomson Reuters) provides summarised data and stats such as P/E ratios obtained from company accounts
What is a population?
All members of a specifically defined group e.g. all FTSE 100 companies or everyone of voting age living in London
What is the primary reason samples are used more than populations?
Samples are much cheaper
What is a sample?
A subset of the full population e.g. the CPI is based on a sample of goods, even though it is taken to be representative of all goods
What is important about choosing a sample?
As we are drawing conclusions about the full population, the sample must be representative of the full population. Great care is therefore needed in the selection and size of the sample
What are the key sampling methods?
Random
Non-random
Quota sampling (non-random)
Panelling
Postal or telephone surveys (non-random, even though they try to be random)
What are the two types of samples?
Random
Non-random
What is random sampling?
Where every item in the population has an equal chance of being selected.
What is important about random sampling?
If a sample is large enough it should be representative of the population. Indeed the margin for error can be statistically evaluated when such a technique is used correctly.
It is hard to achieve a purely random sample e.g. a survey conducted in a city centre would probably exclude the possibility of obtaining the views of a substantial part of the population. Therefore this is not a pure random sample
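A simple random draw can be sketched in a few lines. This is a minimal illustration, assuming a made-up population of 100 numbered companies and a fixed seed chosen only so the draw is repeatable:

```python
import random

# Hypothetical population: 100 numbered companies (illustrative names only).
population = [f"Company {i}" for i in range(1, 101)]

# Fixed seed (an assumption for repeatability); omit it for a fresh draw each run.
rng = random.Random(42)

# sample() draws without replacement, and every item has an equal chance of selection.
sample = rng.sample(population, k=10)
print(sample)
```

The draw is only as random as the list it is taken from: if part of the population never makes it into the list, as in the city centre survey, the result is not a pure random sample.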
What is non-random sampling?
A sample selected on a basis that will, to a degree, involve an element of judgement
What is Panelling?
A hopefully representative sample is selected to provide continuous info over a period of time. TV figures are obtained this way. A panel of individuals report on their viewing habits - TV companies don’t know what everyone is watching all the time!
What are Postal or telephone surveys?
Conducted by phone or post. However it is not random as not everyone has a telephone (hence they can’t be selected) and not everyone will choose to respond to a postal survey.
There is a reasonable chance of obtaining an atypical response since the average person may not reply but those with strong views may.
What is an appropriate sample size?
Based on statistical theory, a random sample of around 1,000 is considered adequate and reliable in relation to individuals in the UK
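One way to see why around 1,000 is often treated as enough is the standard approximation for the margin of error of a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case p = 0.5; these assumptions and the function name are illustrative and do not come from the flashcards:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n (z = 1.96 corresponds to 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 10000):
    print(n, round(margin_of_error(n) * 100, 1), "percentage points")
```

Under these assumptions a sample of 1,000 gives a margin of roughly 3 percentage points, and a sample ten times larger only narrows it to about 1.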
What are the types of data?
Continuous
Discrete
Categorical or nominal
Ordinal
What is continuous data?
Data that can take any value whatsoever, stats such as height, weight, temperature etc fall into this category
What is discrete data?
Data which can only take certain specific values, such as whole numbers. In the financial markets, data is most frequently in this form as money changes hands in whole units.
What is categorical or nominal data?
Data classified into a number of distinct categories. Collection of this data is seen on census forms and market research questionnaires, where a box is ticked in response to questions such as ‘which of the following newspapers do you read?’ (followed by a list of popular dailies), ‘do you drive a car?’, ‘did you vote in the last election?’ etc
When processing this data on a computer, we may assign a number to each category. However, this number does not convey any other information and cannot be used to calculate statistics such as the standard deviation. Such data can only be used for simple statistics, such as 30% of people drive cars
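The kind of simple statistic that categorical data supports is a count or a share. A minimal sketch, using ten invented questionnaire answers:

```python
from collections import Counter

# Hypothetical answers to "Do you drive a car?" (invented for illustration).
answers = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]

counts = Counter(answers)                      # tally each category
share_yes = counts["yes"] / len(answers)       # proportion answering "yes"
print(f"{share_yes:.0%} of respondents drive a car")
```

Coding "yes" as 1 and "no" as 2 would not change this: the numbers would only be labels, so averaging them would be meaningless.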
What is ordinal data?
Data classified into a number of distinct ranked categories. Examples are the star system for hotels and the classification of university degrees
When assigning numbers to this type of data for processing, care should be taken in trying to draw statistical conclusions. Standard deviation would be inappropriate and only such measures which are based on the position within the order, such as the median (also covered later), should be considered.
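A position-based measure such as the median can be taken directly from the ranked categories. The sketch below uses invented star ratings for nine hotels; `statistics.median_low` is chosen here because it always returns a value that actually occurs in the data, so the answer is a real category:

```python
import statistics

# Hypothetical star ratings for nine hotels (1 star < 2 stars < ... < 5 stars).
ratings = [3, 4, 2, 5, 3, 3, 4, 1, 4]

# The median depends only on the order of the ratings, not on the gaps between them.
median_rating = statistics.median_low(ratings)
print(median_rating)
```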
What is frequency distribution?
It groups data into bands of specific value and displays the frequency of occurrence of each band.
Tabulating into a frequency distribution represents a very powerful way of presenting and summarising data, though care needs to be taken in the selection of the size of the bands.
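Grouping raw figures into bands can be sketched as follows. The sales figures and the band width of 10 are invented for illustration, and `band_label` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical sales figures for 12 stores (whole units).
sales = [21, 35, 28, 42, 39, 25, 31, 47, 22, 36, 33, 29]
band_width = 10  # assumed band size; a different width gives a different picture

def band_label(value, width=band_width):
    """Return the label of the band containing value, e.g. 25 -> '20-29'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

frequency = Counter(band_label(v) for v in sales)
for band in sorted(frequency):
    print(band, frequency[band])
```

Rerunning with a width of 5 or 20 shows how much the choice of band size changes the summary.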
What is a relative frequency distribution?
It displays the same data as a percentage of the sample or population size, rather than as actual observed frequencies
A relative frequency distribution would, possibly, be more appropriate where a more direct comparison between bandings is desired or where the sample size has been exceptionally large and the scale of the numbers may obscure their understanding
Relative frequency distribution is useful for determining the relative historical frequency of occurrence that may, in turn, be useful for determining the probable future distribution. In this context it may be referred to as a probability distribution.
In a probability distribution, the sum of all the probabilities must add up to 1 or 100%.
Probability and relative frequency are synonymous. For example, the probability of a fair coin landing on heads when tossed is 0.5 or 50%: there are two sides and it will land on each with equal frequency.
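Converting observed frequencies into relative frequencies is a matter of dividing each count by the total. A minimal sketch, with invented band counts:

```python
# Hypothetical observed frequencies per band (invented for illustration).
frequency = {"20-29": 5, "30-39": 5, "40-49": 2}
total = sum(frequency.values())

# Each relative frequency is the band's share of all observations.
relative = {band: count / total for band, count in frequency.items()}
for band, share in relative.items():
    print(band, f"{share:.1%}")

# The shares add up to 1 (rounded here to hide floating-point noise).
print(round(sum(relative.values()), 10))
```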
What is a cumulative frequency distribution?
Shows the number/percentage of a sample or population with a value less than or equal to a given figure. It can be used in addition to either frequency distribution or relative frequency distribution.
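A cumulative distribution is a running total of the band frequencies. The sketch below assumes three invented bands and counts:

```python
from itertools import accumulate

# Hypothetical bands and their observed frequencies.
bands = ["20-29", "30-39", "40-49"]
counts = [5, 5, 2]

cumulative = list(accumulate(counts))  # running total across the bands
total = cumulative[-1]

for band, running in zip(bands, cumulative):
    upper = band.split("-")[1]         # upper limit of the band
    print(f"<= {upper}: {running} ({running / total:.0%})")
```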
What is quota sampling?
A popular non-random sampling technique where a sample is selected which is believed to be representative of the full population.
Help for this may be obtained from census info which enables us to get a picture of the proportion of the population exhibiting a range of characteristics. A sample can then be selected which displays these characteristics and, hence, should be reasonably representative.
This is the typical approach used in market research
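The idea can be sketched as two steps: turn census proportions into quotas, then accept respondents until each quota is full. The age groups, shares, sample size and the `fill_quotas` helper below are all hypothetical, not real census figures or a standard method:

```python
# Hypothetical census proportions by age group (illustrative, not real data).
census_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_size = 200

# Quota for each group = its share of the population times the sample size.
quotas = {group: round(share * sample_size) for group, share in census_share.items()}
print(quotas)

def fill_quotas(respondents, quotas):
    """Accept (name, group) respondents in the order met until each quota is full."""
    remaining = dict(quotas)
    selected = []
    for name, group in respondents:
        if remaining.get(group, 0) > 0:
            selected.append(name)
            remaining[group] -= 1
    return selected

# Tiny usage example: the second 18-34 respondent is turned away.
people = [("A", "18-34"), ("B", "18-34"), ("C", "55+")]
print(fill_quotas(people, {"18-34": 1, "55+": 1}))
```

Respondents are taken in the order they are met rather than drawn by chance, which is why the method counts as non-random.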
What is key about data interval or band width?
Selection is very important
It is perhaps an even more acute problem when we are considering continuous data where we must be very careful to ensure that all items are included within a band, but only within one band.
Bands must be described thus: greater than or equal to 20 but less than 24, etc
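The rule that every item falls in exactly one band can be sketched with half-open intervals, where the lower edge is included and the upper edge excluded. The band edges and the `band_of` helper are assumptions made for illustration:

```python
from bisect import bisect_right

# Hypothetical band edges giving the bands [20, 24), [24, 28) and [28, 32).
edges = [20, 24, 28, 32]

def band_of(value):
    """Return the (lower, upper) band holding value: lower <= value < upper."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} falls outside all bands")
    i = bisect_right(edges, value) - 1  # index of the last edge <= value
    return (edges[i], edges[i + 1])

print(band_of(23.999))  # (20, 24)
print(band_of(24))      # (24, 28)
```

A value of exactly 24 lands in the second band only, so no item is counted twice or left out.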
What is an issue with tables?
The detail in numbers may obscure the understanding slightly. It is possible that the same level of info can be conveyed more easily by charts and graphs
What is a pie chart?
Method of representing relative frequency by dividing a circle into sections whose area is proportionate to the relative frequency. It is of most use when communicating categorical data.
1% represents 3.6 degrees on a pie chart
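The conversion from percentages to pie chart angles follows directly from 360 degrees covering 100%. A minimal sketch, with invented readership shares:

```python
# Hypothetical readership shares in percent (invented; they sum to 100).
shares = {"Paper A": 45, "Paper B": 30, "Paper C": 25}

# 100% is the full circle, so each 1% is 360 / 100 = 3.6 degrees.
angles = {name: 360 * pct / 100 for name, pct in shares.items()}
for name, angle in angles.items():
    print(name, angle, "degrees")
```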
What is a bar chart?
A chart which represents, through the height of the bar, the number or percentage of items displaying a particular characteristic.
What is a component bar chart?
It conveys more info than a simple bar chart: each column is broken down to show e.g. the number of stores in the north who sold between 20-29, 30-39 etc, whilst still showing the total number of stores in the north
What is a histogram?
It displays the number or percentage of items falling within a given band through the area of a bar.
Usually a histogram is used to describe circumstances where one bar is used to represent a range of values for continuous data. Where discrete data is grouped, it may be represented as a histogram as if it were continuous.
What is a potential problem with histograms?
When extreme bands are described as greater or less than something. Bands must be bounded, but what width should they be made?
There are no rules for this and it requires judgement by the researcher. If a definite upper and lower limit is known, then these will provide obvious bounds. If they are unknown, a bound must be assumed since the histogram cannot be drawn unless all areas are bounded.
What is a consequence of bounding histograms?
How tall they need to be made. If any bands are wider than others, their heights will have to be scaled down proportionally to ensure that the area of that band still reflects the number of items.
This problem is most likely to arise in the context of extreme bands; however, it may also be applicable to other bands of differing width
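The scaling can be sketched as dividing each count by how many standard widths the band covers. The bands, counts and standard width of 10 below are invented for illustration:

```python
# Hypothetical bands as (lower, upper, count); the last band is twice as wide.
bands = [(20, 30, 5), (30, 40, 5), (40, 60, 4)]
standard_width = 10  # assumed width of an ordinary band

heights = []
for lower, upper, count in bands:
    width = upper - lower
    # Scale the height down in proportion to the extra width, so area tracks count.
    height = count * standard_width / width
    heights.append(height)
    print(f"{lower}-{upper}: count={count}, bar height={height}")
```

The last band holds 4 items but is drawn at height 2.0, because it is two standard widths wide; its area still corresponds to 4 items.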
In graphs which is the x-axis and which is the y-axis?
The x-axis is the horizontal one and the y-axis is the vertical one
What is the independent variable?
The variable thought to be responsible for causing the change
What is the term for the variable thought to be responsible for causing the change?
Independent variable
What is the dependent variable?
The variable whose value is driven by the x value and whose change we are seeking to predict
What is the name of the variable whose value is driven by the x-value and whose change we are seeking to predict?
The dependent variable
Which variable is placed on the x-axis and which is placed on the y-axis?
The independent variable is plotted along the x-axis whilst the dependent variable is plotted on the y-axis
What is frequently one of the variables plotted and how does this affect the graph?
A frequent requirement is to plot how something has changed with time. Here the item alters with time, meaning the item is the dependent variable and time is the independent one. Time can never be altered and therefore is always the independent variable and always on the x-axis
What is an issue with curved graphs?
They do not lend themselves very well to extrapolation, or predicting forward, meaning this may not be the preferred presentation and a semi-logarithmic graph may be more useful
What is important about choosing the values of the y-axis?
The value should be representative of the data being graphed
What does a semi-logarithmic scale do?
It plots the log of the value instead of the value itself on the y-axis
How will a semi-logarithmic graph read?
If something is growing at a constant rate it will appear as a straight line, which is much more useful for prediction purposes. Any move away from steady growth would be shown: if growth increased the line would get steeper, whilst it would get flatter if growth decreased
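The straight-line property can be checked numerically: for constant growth, the log of the value rises by the same amount each period. The sketch below assumes an invented series starting at 100 and growing 10% per period, with base-10 logs:

```python
import math

# Hypothetical series growing at a constant 10% per period.
values = [100 * 1.10 ** t for t in range(6)]

# Take logs, as a semi-logarithmic y-axis would.
logs = [math.log10(v) for v in values]

# Equal steps between successive logs mean the plotted points lie on a straight line.
steps = [later - earlier for earlier, later in zip(logs, logs[1:])]
print([round(s, 4) for s in steps])
```

Each step equals log10(1.10), about 0.0414, so the slope of the line reflects the growth rate; a faster rate would give larger steps and a steeper line.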