Statistics Flashcards
Which of the following correctly describes the formula for a line of best fit?
A y=bx
B y=(a+b)x
C y=a+bx
D y=a+b+x
C - y=a+bx
y is the dependent variable and x is the independent variable.
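As a rough illustration of where a and b might come from, the sketch below fits y = a + bx by ordinary least squares. The flashcard does not name a fitting method, so least squares is an assumption here, and both the data points and the helper name fit_line are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # gradient
    a = y_bar - b * x_bar  # intercept on the y axis
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```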
If 10% of the workforce earn £30,000 or less, 13% earn between £30,000 and £40,000, 42% earn between £40,000 and £50,000, 25% earn between £50,000 and £60,000, and 10% earn in excess of £60,000, what is the cumulative percentage earning less than £50,000?
A 35%
B 90%
C 10%
D 65%
D - 65%
Cumulative percentage earning less than £50,000 is 10% + 13% + 42% = 65%.
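The running total can be sketched in a few lines of Python. The band labels are paraphrased from the question and the variable names are illustrative only.

```python
from itertools import accumulate

# Percentage of the workforce in each salary band, lowest band first
bands = [
    "£30,000 or less",
    "£30,000 to £40,000",
    "£40,000 to £50,000",
    "£50,000 to £60,000",
    "over £60,000",
]
shares = [10, 13, 42, 25, 10]

# Running total of the percentages, band by band
cumulative = list(accumulate(shares))
for band, total in zip(bands, cumulative):
    print(f"up to and including '{band}': {total}%")

# The first three bands cover everyone earning less than £50,000
below_50k = cumulative[2]
```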
Andy Nyman, a research analyst, collects monthly share price data for a company. How can this data be best categorised?
A Cross-sectional
B Continuous
C Discrete
D Categorical
B - Continuous
Share prices are continuous data, limited only by the precision of the equipment used to measure (or report) them.
The best way to compare two variables is to use:
A A scatter diagram (scattergram)
B A histogram
C A pie chart
D A cumulative frequency graph
A - A scatter diagram (scattergram)
A scatter diagram plots one variable against another in order to determine whether there is any relationship between the two. The independent variable is always plotted on the x-axis.
The sample standard deviation of a series is:
A The square of the sample variance
B The sample variance divided by the number of values in the series
C The sample variance divided by the number of values in the series minus one
D The square root of the sample variance
D - The square root of the sample variance
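A minimal check of this relationship using Python's standard statistics module, on a made-up series (the numbers are not from the flashcards):

```python
import math
import statistics

# Hypothetical series of five observations
data = [2.0, 4.0, 4.0, 5.0, 7.0]

# Sample variance divides the squared deviations by n - 1
sample_variance = statistics.variance(data)
# Sample standard deviation is the square root of the sample variance
sample_sd = statistics.stdev(data)

print(sample_variance, sample_sd)
print(math.isclose(sample_sd, math.sqrt(sample_variance)))
```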
Which of the following is true of scattergrams?
A They are used to represent the rate of change over time
B The vertical axis is the dependent variable
C The horizontal axis is the dependent variable
D The area under a scattergram represents the frequency
B - The vertical axis is the dependent variable
Scattergrams show the relationship between variables.
A commodity has annual inflation rates for the last five years as follows:
5% 11% -3% -9% 22%
What is the average rate of inflation?
4.65%
The ‘average rate’ means the geometric mean.
First, convert each rate to a growth factor (1 + rate) and multiply the factors together: 1.05 x 1.11 x 0.97 x 0.91 x 1.22 = 1.25512. Next, take the fifth root of this number: 1.0465. Lastly, subtract 1: 0.0465, or 4.65%.
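The same calculation can be sketched in Python. The variable names are illustrative, and math.prod assumes Python 3.8 or later.

```python
import math

# Annual inflation rates from the question, as decimals
rates = [0.05, 0.11, -0.03, -0.09, 0.22]

# Multiply the growth factors (1 + rate) together
growth = math.prod(1 + r for r in rates)

# Take the n-th root, then subtract 1 to get back to a rate
average = growth ** (1 / len(rates)) - 1

print(f"{growth:.5f}")   # 1.25512
print(f"{average:.2%}")  # 4.65%
```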
Calculate the mean and median of the following returns on 11 shares:
-1.2, 2.8, 3.1, 4.1, 4.3, 4.4, 4.5, 6.7, 6.7, 5.2, 12.2
A Mean = 4.4, median = 4.8
B Mean = 4, median = 4.4
C Mean = 4.8, median = 4.4
D Mean = 4.8, median = 6.7
C - Mean = 4.8, median = 4.4
The mean is the sum of all the returns divided by the number of returns:
(-1.2 + 2.8 + 3.1 + 4.1 + 4.3 + 4.4 + 4.5 + 6.7 + 6.7 + 5.2 + 12.2 ) / 11
= 4.8
The median is the central value after putting them all in numerical order:
-1.2 2.8 3.1 4.1 4.3 4.4 4.5 5.2 6.7 6.7 12.2
i.e. 4.4.
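For reference, a short Python sketch that reproduces both figures with the standard statistics module:

```python
import statistics

# Returns on the 11 shares, in the order given in the question
returns = [-1.2, 2.8, 3.1, 4.1, 4.3, 4.4, 4.5, 6.7, 6.7, 5.2, 12.2]

mean_return = statistics.mean(returns)
# median() sorts the data internally and picks the 6th of 11 values
median_return = statistics.median(returns)

print(round(mean_return, 1), median_return)
```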
In a positively skewed distribution:
A The mode will be less than the median and mean
B The mean and median are equal, but less than the mode
C The mean and mode are equal, but greater than the median
D Mean, mode and median are equal
A - The mode will be less than the median and mean
A positively skewed distribution is not symmetrical. Its peak is to the left of the spread of data, and it has a gently sloping tail to the right. The peak represents the mode, the median comes next (the central value), and above this is the mean.
Which of the following is not true of bivariate linear regression?
A x is the independent axis
B a is the intersect with the x axis
C b is the coefficient of the gradient
D y is the dependent variable
B - a is the intersect with the x axis
In y = a + bx, a is the intercept on the y axis (the value of y when x = 0), not the x axis.
The inter-quartile range is:
A A measure of distribution dominated by extreme values
B A measure of distribution not dominated by extreme values
C The variance of the range
D A measure of central tendency
B - A measure of distribution not dominated by extreme values
The inter-quartile range is the difference between the third and first quartiles, so it covers only the middle 50% of the data (the SECOND and THIRD quarters) and is therefore not dominated by extreme values.
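A small sketch of why the inter-quartile range resists extreme values, using made-up data. Quartile conventions vary between textbooks and software; this example uses the default method of Python's statistics.quantiles, and the helper name iqr is hypothetical.

```python
import statistics


def iqr(data):
    """Inter-quartile range: third quartile minus first quartile."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


# Hypothetical data sets, identical except for one extreme value
base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

# The range reacts strongly to the extreme value; the IQR does not
print(max(base) - min(base), iqr(base))
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))
```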
Jane, an analyst, wants to present what percentage of a particular gilt issue is held by each category of investor, such as hedge funds, insurance companies, etc.
Which of the following would best represent this information?
A Scattergram
B Histogram
C Tubularogram
D Pie Chart
D - Pie Chart
A pie chart shows relative percentages and is ideal for situations where the total adds up to 100%, such as the asset allocation of a portfolio, or the example in the question.
Given the following information regarding employees’ salaries:
Salary (£,000s) - Number of employees
Less than 30 - 15
31 to 40 - 25
41 to 50 - 35
51 to 60 - 15
61 or above - 10
What is the cumulative percentage of employees with an annual salary less than or equal to £50,000?
A 25%
B 35%
C 75%
D 80%
C - 75%
The total number of employees is 15 + 25 + 35 + 15 + 10 = 100. The number of employees with a salary of £50,000 or less is 15 + 25 + 35 = 75. The cumulative percentage is therefore 75 / 100 = 0.75 or 75%.
An analyst wishes to plot a graph showing rates of change of share prices over time. Which of the following graphs would be the most appropriate?
A Scatter diagram
B Lorenz curve
C Pie chart
D Semi-log graph
D - Semi-log graph
Semi-log graphs (or log graphs) illustrate the RATE OF CHANGE over time.
Calculate the median of the following returns achieved by seven fund managers:
2.0% 3.2% 5.0% 6.6% 2.0% 3.3% 6.2%
A 2.0%
B 3.3%
C 6.6%
D There is no median
B - 3.3%
The median is the value of the central number in an array of data. First put the data in numerical order (an array), and then find the item in the centre:
2.0 2.0 3.2 3.3 (median) 5.0 6.2 6.6
NB: 2.0 is the mode
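The ordering step and both results can be checked with a short Python sketch (variable names are illustrative):

```python
import statistics

# Fund manager returns (%) in the order given in the question
returns = [2.0, 3.2, 5.0, 6.6, 2.0, 3.3, 6.2]

ordered = sorted(returns)                 # the array, in numerical order
median_return = statistics.median(returns)  # middle (4th) of 7 values
mode_return = statistics.mode(returns)      # most frequent value

print(ordered)
print(median_return, mode_return)
```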
What is the most frequently occurring value in a set of data?
A The mean
B The median
C The mode
D The standard deviation
C - The mode
The mode is the most frequently occurring number in a set of data. There might be more than one mode or perhaps none at all.
The total GDP of four economies is £200m, which is made as follows: UK £100m, US £50m, France £30m, and Spain £20m. If the information were to be displayed as a pie chart, what would be the respective angles?
A 50, 25, 15, 10
B 180, 90, 74, 36
C 180, 90, 54, 36
D 100, 50, 30, 20
C - 180, 90, 54, 36
The angle for each country is calculated as follows:
GDP of country / total GDP x 360 degrees.
So the angles are:
UK: 100 / 200 x 360 = 180 degrees.
US: 50 / 200 x 360 = 90 degrees.
France: 30 / 200 x 360 = 54 degrees.
Spain: 20 / 200 x 360 = 36 degrees.
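The same proportion-of-360 rule, sketched in Python with the figures from the question (the dictionary and variable names are illustrative):

```python
# GDP in £m for each economy, as given in the question
gdp = {"UK": 100, "US": 50, "France": 30, "Spain": 20}

total = sum(gdp.values())

# Each slice takes the same share of 360 degrees as its share of total GDP
angles = {country: value * 360 / total for country, value in gdp.items()}

for country, angle in angles.items():
    print(f"{country}: {angle:.0f} degrees")
```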
How are values derived from a bar chart?
A Area of the bar
B Height multiplied by width
C Width of bars
D Height of bars
D - Height of bars
Bar charts are not to be confused with histograms (which appear very similar to bar charts) where frequency is determined from the AREA of the bar.
A disadvantage of using the mode as a statistical tool is:
A It is possible to have more than one mode
B Modes are distorted by extremely high values
C Modes are distorted by extremely low values
D The mode might not be an actual value that appears in the data
A - It is possible to have more than one mode
The mode is the most frequently occurring value in a set of data. Several values may occur equally often, which means that there may be more than one mode. If there are too many modes, then their usefulness as a measure of central tendency becomes questionable.
Which of the following are advantages of the geometric mean over the arithmetic mean?
I It is less affected by extreme observations
II It is intuitively better when used for quantities that increase at a constant rate over time
III It can always be used when observations are negative
A I only
B I and II
C II and III
D I, II, and III
B - I and II
A geometric mean measures the RATE OF CHANGE of its constituents. This makes it less affected by extreme values. Handling negative values is not an advantage of the geometric mean: the arithmetic mean copes with negative observations without difficulty, whereas the geometric mean only works if every growth factor (1 + rate) is positive.
Market research companies use which type of sampling technique?
A Action sampling
B Number sampling
C Quota sampling
D Taste sampling
C - Quota sampling
Quota sampling is a popular type of non-random sampling.
When R squared is low:
A The skilled user will have to question the validity of the model being used
B It shows improvement in the accuracy of the model being used
C Errors (observations away from the line of best fit) are low
D The observations are generally close to the line of best fit
A - The skilled user will have to question the validity of the model being used
A low R-squared shows that the model has poor predictive power. R-squared is the coefficient of determination and gives us an impression of the accuracy of our forecasts. It ranges from 0 to 1 (0% to 100%), where the higher the number, the more accurate the predictive power.
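To make the contrast concrete, the sketch below computes R-squared for a least-squares line fitted to two made-up data sets, one close to a straight line and one widely scattered. The data and the helper name r_squared are hypothetical, and the formula used (1 minus residual sum of squares over total sum of squares) is the usual definition for a simple linear regression.

```python
from statistics import mean


def r_squared(xs, ys):
    """R-squared of the least-squares line y = a + b*x, on a 0 to 1 scale."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Squared errors around the fitted line vs. around the mean of y
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
close_fit = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical: nearly on a line
scattered = [5, 1, 6, 2, 5]             # hypothetical: little linear pattern

print(f"{r_squared(xs, close_fit):.3f}")
print(f"{r_squared(xs, scattered):.3f}")
```

The first data set should give a value close to 1 and the second a value close to 0, which is the situation in which the model's validity has to be questioned.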
An analyst working for your firm has worked very hard on historical trading data and identified a strategy that would have delivered abnormal returns based on the data analysed. When the strategy is applied, however, the strategy does not seem to work.
The quantitative analyst is most likely to have engaged in:
A Data anomaly
B Data dependency
C Data incursion
D Data mining
D - Data mining
Data mining is the use of large amounts of information to try to discover relationships. The technique can be a valuable way to identify previously hidden causation, but the large-scale, indiscriminate processing of data means that correlations may be purely coincidental. Just because a particular trading strategy would have yielded abnormal returns on a particular set of securities at a particular period in history does not necessarily mean that the same strategy will yield abnormal returns in the future.
Which measure of dispersion measures the variability of a set of data about the mean of that data?
A Range
B Quartile deviation
C Standard deviation
D Covariance
C - Standard deviation
A high standard deviation indicates greater variation in a data set.