Statistics Flashcards

1
Q

Which of the following correctly describes the formula for a line of best fit?

A y=bx
B y=(a+b)x
C y=a+bx
D y = a+b+x

A

C - y=a+bx

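y is the dependent variable and x is the independent variable; a is the intercept (the value of y where the line crosses the y-axis) and b is the gradient (slope) of the line.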
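
The card does not say how a and b are found. One common method, assumed here rather than stated in the card, is ordinary least squares: b is the covariance of x and y divided by the variance of x, and a is then chosen so the line passes through the point of means. A minimal Python sketch with hypothetical data (`fit_line` is an illustrative name):

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a (intercept) and b (slope) in y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(x, y) / variance(x); the 1/n factors cancel, so sums are enough
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

With real data the points scatter around the line, and a and b are the values that minimise the sum of squared vertical distances from it.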

2
Q

If 10% of the workforce earn £30,000 or less, 13% earn between £30,000 and £40,000, 42% earn between £40,000 and £50,000, 25% earn between £50,000 and £60,000, and 10% earn in excess of £60,000, what is the cumulative percentage earning less than £50,000?

A 35%
B 90%
C 10%
D 65%

A

D - 65%

Cumulative percentage earning less than £50,000 is 10% + 13% + 42% = 65%.

3
Q

Andy Nyman, a research analyst, collects monthly share price data for a company. How can this data be best categorised?

A Cross-sectional
B Continuous
C Discrete
D Categorical

A

B - Continuous

Share prices are continuous data, limited only by the precision of the equipment used to measure (or report) them.

4
Q

The best way to compare two variables is to use:

A A scatter diagram (scattergram)
B A histogram
C A pie chart
D A cumulative frequency graph

A

A - A scatter diagram (scattergram)

A scatter diagram plots one variable against another in order to determine whether there is any relationship between the two. By convention, the independent variable is plotted on the x-axis.

5
Q

The sample standard deviation of a series is:

A The square of the sample variance

B The sample variance divided by the number of values in the series

C The sample variance divided by the number of values in the series minus one

D The square root of the sample variance

A

D - The square root of the sample variance
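
A minimal Python sketch of the relationship, using hypothetical data: the sample variance divides the sum of squared deviations from the mean by n - 1, and the sample standard deviation is its square root. The function names are illustrative.

```python
import math


def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std_dev(xs):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
print(sample_variance(returns), sample_std_dev(returns))
```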

6
Q

Which of the following is true of scattergrams?

A They are used to represent the rate of change over time

B The vertical axis is the dependent variable

C The horizontal axis is the dependent variable

D The area under a scattergram represents the frequency

A

B - The vertical axis is the dependent variable

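Scattergrams show the relationship between variables. By convention, the dependent variable is plotted on the vertical (y) axis and the independent variable on the horizontal (x) axis.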

7
Q

A commodity has annual inflation rates for the last five years as follows:
5% 11% -3% -9% 22%

What is the average rate of inflation?

A

4.65%

The ‘average rate’ means the geometric mean.
First, multiply the price relatives together: 1.05 x 1.11 x 0.97 x 0.91 x 1.22 = 1.25512
Take the fifth root of this number: = 1.0465
Lastly, subtract the 1: = 0.0465 or 4.65%.
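
The same three steps (multiply the price relatives, take the n-th root, subtract 1) can be sketched in Python as follows. `geometric_mean_return` is a hypothetical helper name, rates are entered as decimals, and `math.prod` needs Python 3.8 or later.

```python
import math


def geometric_mean_return(rates):
    """Average compound rate: multiply the price relatives (1 + r),
    take the n-th root, then subtract 1."""
    product = math.prod(1 + r for r in rates)
    return product ** (1 / len(rates)) - 1


inflation = [0.05, 0.11, -0.03, -0.09, 0.22]
print(f"{geometric_mean_return(inflation):.2%}")  # 4.65%
```

The same helper reproduces the other geometric-mean cards in this set, for example `geometric_mean_return([0.10, 0.15, 0.20, 0.25])` for the four weekly share price rises.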

8
Q

Calculate the mean and median of the following returns on 11 shares:
-1.2, 2.8, 3.1, 4.1, 4.3, 4.4, 4.5, 6.7, 6.7, 5.2, 12.2

A Mean = 4.4, median = 4.8
B Mean = 4, median = 4.4
C Mean = 4.8, median = 4.4
D Mean = 4.8, median = 6.7

A

C - Mean = 4.8, median = 4.4

The mean is the sum of all the returns divided by the number of returns:
(-1.2 + 2.8 + 3.1 + 4.1 + 4.3 + 4.4 + 4.5 + 6.7 + 6.7 + 5.2 + 12.2 ) / 11
= 4.8

The median is the central value after putting them all in numerical order:
-1.2 2.8 3.1 4.1 4.3 4.4 4.5 5.2 6.7 6.7 12.2
i.e. 4.4.
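
A small Python sketch of both calculations. The helper names are illustrative, and the even-count rule (average the two middle values) is the usual convention rather than something this card needs, since 11 values have a single central value.

```python
def mean(xs):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


returns = [-1.2, 2.8, 3.1, 4.1, 4.3, 4.4, 4.5, 6.7, 6.7, 5.2, 12.2]
print(round(mean(returns), 2), median(returns))  # 4.8 4.4
```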

9
Q

In a positively skewed distribution:

A The mode will be less than the median and mean

B The mean and median are equal, but less than the mode

C The mean and mode are equal, but greater than the median

D Mean, mode and median are equal

A

A - The mode will be less than the median and mean

A positively skewed distribution is not symmetrical. Its peak is to the left of the spread of data, and it has a gently sloping tail to the right. The peak represents the mode, the median comes next (the central value), and above this is the mean.
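
The ordering can be checked on a small sample with Python's standard `statistics` module. The data below are hypothetical, chosen to have a cluster of low values and a long tail to the right.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: a peak at low values and a tail of large ones
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]

# Expect mode < median < mean for a positively skewed sample like this one
print(mode(data), median(data), mean(data))
```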

10
Q

Which of the following is not true of bivariate linear regression?

A x is the independent axis
B a is the intersect with the x axis
C b is the coefficient of the gradient
D y is the dependent variable

A

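B - a is the intersect with the x axis

This statement is false: a is the intercept on the y-axis (the value of y when x = 0), not the x-axis.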

11
Q

The inter-quartile range is:

A A measure of distribution dominated by extreme values

B A measure of distribution not dominated by extreme values

C The variance of the range

D A measure of central tendency

A

B - A measure of distribution not dominated by extreme values

The inter-quartile range is the difference between the upper (third) quartile and the lower (first) quartile. It therefore covers only the middle 50% of the data and is not dominated by extreme values.
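
A rough Python illustration with hypothetical data: replacing the largest value with an extreme one changes the range sharply but leaves the inter-quartile range unchanged. It uses `statistics.quantiles` (Python 3.8+) with its default method; other quartile conventions can give slightly different cut points.

```python
from statistics import quantiles


def value_range(xs):
    """Range: maximum value less minimum value."""
    return max(xs) - min(xs)


def iqr(xs):
    """Inter-quartile range: upper quartile (Q3) less lower quartile (Q1)."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1


base = [2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data
with_outlier = base[:-1] + [90]   # replace the largest value with an extreme one

print(value_range(base), value_range(with_outlier))  # 7 88
print(iqr(base), iqr(with_outlier))                  # 4.5 4.5
```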

12
Q

Jane, an analyst, wants to present what percentage of a particular gilt issue is held by each category of investor, such as hedge funds, insurance companies, etc.

Which of the following would best represent this information?

A Scattergram
B Histogram
C Tubularogram
D Pie Chart

A

D - Pie Chart

A pie chart shows relative percentages and is ideal for situations where the total adds up to 100%, such as the asset allocation of a portfolio, or the example in the question.

13
Q

Given the following information regarding employees’ salaries:

Salary (£,000s) - Number of employees
Less than 30 - 15
31 to 40 - 25
41 to 50 - 35
51 to 60 - 15
61 or above - 10

What is the cumulative percentage of employees with an annual salary less than or equal to £50,000?

A 25%
B 35%
C 75%
D 80%

A

C - 75%

The total number of employees is 15 + 25 + 35 + 15 + 10 = 100. The number of employees with a salary of £50,000 or less is 15 + 25 + 35 = 75. The cumulative percentage is therefore 75 / 100 = 0.75 or 75%.
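
The running totals behind a cumulative percentage can be sketched in Python with `itertools.accumulate`, using the counts from the salary table; the band labels and output format are only illustrative.

```python
from itertools import accumulate

# Salary bands (£,000s) and employee counts from the table
bands = ["Less than 30", "31 to 40", "41 to 50", "51 to 60", "61 or above"]
counts = [15, 25, 35, 15, 10]

total = sum(counts)
cumulative = list(accumulate(counts))  # running totals: 15, 40, 75, 90, 100

for band, cum in zip(bands, cumulative):
    # Cumulative percentage of employees up to and including this band
    print(f"{band:>12}: {cum / total:.0%}")
```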

14
Q

An analyst wishes to plot a graph showing rates of change of share prices over time. Which of the following graphs would be the most appropriate?

A Scatter diagram
B Lorenz curve
C Pie chart
D Semi-log graph

A

D - Semi-log graph

Semi-log graphs (or log graphs) illustrate the RATE OF CHANGE over time.

15
Q

Calculate the median of the following returns achieved by seven fund managers:
2.0% 3.2% 5.0% 6.6% 2.0% 3.3% 6.2%

A 2.0%
B 3.3%
C 6.6%
D There is no median

A

B - 3.3%

The median is the value of the central number in an array of data. First put the data in numerical order (an array), and then find the item in the centre:
2.0 2.0 3.2 3.3(median) 5.0 6.2 6.6
NB: 2.0 is the mode

16
Q

What is the most frequently occurring value in a set of data?

A The mean
B The median
C The mode
D The standard deviation

A

C - The mode

The mode is the most frequently occurring number in a set of data. There might be more than one mode or perhaps none at all.

17
Q

The total GDP of four economies is £200m, made up as follows: UK £100m, US £50m, France £30m and Spain £20m. If the information were to be displayed as a pie chart, what would be the respective angles?

A 50, 25, 15, 10
B 180, 90, 74, 36
C 180, 90, 54, 36
D 100, 50, 30, 20

A

C - 180, 90, 54, 36

Each angle is the country's share of total GDP multiplied by 360 degrees:
angle = GDP of country / total GDP x 360

UK: 100 / 200 x 360 = 180 degrees
US: 50 / 200 x 360 = 90 degrees
France: 30 / 200 x 360 = 54 degrees
Spain: 20 / 200 x 360 = 36 degrees
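
The same calculation as a short Python sketch; `pie_angles` is a hypothetical helper. Multiplying by 360 before dividing keeps these particular figures exact in floating point.

```python
def pie_angles(values):
    """Angle of each slice: value / total x 360 degrees."""
    total = sum(values)
    return [v * 360 / total for v in values]


gdp = {"UK": 100, "US": 50, "France": 30, "Spain": 20}  # £m
print(dict(zip(gdp, pie_angles(list(gdp.values())))))
# {'UK': 180.0, 'US': 90.0, 'France': 54.0, 'Spain': 36.0}
```

The helper works for any set of categories, such as the cheese-preference survey later in this set.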

18
Q

How are values derived from a bar chart?

A Area of the bar
B Height multiplied by width
C Width of bars
D Height of bars

A

D - Height of bars

Values in a bar chart are read from the HEIGHT of the bars. Bar charts should not be confused with histograms, which look very similar but in which frequency is represented by the AREA of each bar.

19
Q

A disadvantage of using the mode as a statistical tool is:

A It is possible to have more than one mode

B Modes are distorted by extremely high values

C Modes are distorted by extremely low values

D The mode might not be an actual value that appears in the data

A

A - It is possible to have more than one mode

The mode is the most frequently occurring value in a stream of data. Several values may occur equally often, which means that there may be more than one mode. If there are too many modes, their usefulness as a measure of central tendency becomes questionable.
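
Python's `statistics.multimode` (3.8+) returns every value that ties for most frequent, which makes the problem easy to see. The data are hypothetical.

```python
from statistics import multimode

# Hypothetical data in which two values tie for most frequent
data = [2, 3, 3, 5, 7, 7, 9]
print(multimode(data))  # [3, 7]

# When every value appears once, every value ties as a "mode"
print(multimode([1, 2, 3, 4]))  # [1, 2, 3, 4]
```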

20
Q

Which of the following are advantages of the geometric mean over the arithmetic mean?

I It is less affected by extreme observations

II It is intuitively better when used for quantities that increase at a constant rate over time

III It can always be used when observations are negative

A I only
B I and II
C II and III
D I, II, and III

A

B - I and II

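A geometric mean measures the RATE OF CHANGE of its constituents, which makes it less affected by extreme values and better suited to quantities that compound over time. Statement III is false: the geometric mean cannot be taken directly of negative observations (returns are first converted to price relatives, 1 + r, which only works while each return is above -100%), whereas the arithmetic mean handles negative values without difficulty.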

21
Q

Market research companies use which type of sampling technique?

A Action sampling
B Number sampling
C Quota sampling
D Taste sampling

A

C - Quota sampling

Quota sampling is a popular type of non-random sampling.

22
Q

When R squared is low:

A The skilled user will have to question the validity of the model being used

B It shows improvement in the accuracy of the model being used

C Errors (observations away from the line of best fit) are low

D The observations are generally close to the line of best fit

A

A - The skilled user will have to question the validity of the model being used

A low R-squared shows poor predictive power of the model. R-squared is the coefficient of determination and gives an impression of the accuracy of our forecasts. It ranges from 0 to 1 (often quoted as 0% to 100%); the higher the number, the greater the predictive power.

23
Q

An analyst working for your firm has worked very hard on historical trading data and identified a strategy that would have delivered abnormal returns based on the data analysed. When the strategy is applied, however, the strategy does not seem to work.
The quantitative analyst is most likely to have engaged in:

A Data anomaly
B Data dependency
C Data incursion
D Data mining

A

D - Data mining

Data mining is the use of large amounts of information to try to discover relationships. The technique can be a valuable way to identify previously hidden relationships, but the large-scale, indiscriminate processing of data means that correlations may be purely coincidental. Just because a particular trading strategy would have yielded abnormal returns on a particular set of securities over a particular period in history does not mean that the same strategy will yield abnormal returns in the future.

24
Q

Which measure of dispersion measures the variability of a set of data about the mean of that data?

A Range
B Quartile deviation
C Standard deviation
D Covariance

A

C - Standard deviation

A high standard deviation indicates greater variation in a data set around its mean.

25
Q

Regarding modern portfolio statistics, which of the following best describes R squared?

A A fund’s sensitivity in relation to a benchmark index

B The difference between a fund’s actual and expected performance

C The risk adjusted performance of a fund

D The percentage of a fund’s movement that is explained by the benchmark’s movement

A

D - The percentage of a fund’s movement that is explained by the benchmark’s movement

R squared ranges from 0 to 100 and is expressed as a percentage. An R squared of 100 means that the benchmark's movements explain all the movements in the fund.

26
Q

In bivariate regression, which of the following is true if the independent and dependent variables are negatively correlated?

A The intersect on the y-axis will be positive

B The intersect on the y-axis will be negative

C The correlation coefficient will be 1

D The slope of the line will be negative

A

D - The slope of the line will be negative

It is not possible to ascertain whether the intercept will be positive or negative from the information given. However, the gradient of the line (coefficient b) will always be negative.

27
Q

When evaluating the usefulness of a Beta, R squared:

A Is irrelevant as long as there are enough observations to be statistically significant

B Needs to be as low as possible

C Needs to be negative or greater than 1

D Needs to be as high as possible

A

D - Needs to be as high as possible

R-squared is the coefficient of determination and gives an impression of the accuracy of our forecasts. It ranges from 0 to 1 (often quoted as 0% to 100%); the higher the number, the greater the predictive power.

28
Q

Which of the following is true regarding the mode?

A It can be easily distorted by extreme observations

B There can be more than one mode in any series

C It can be difficult to calculate

D It is a measure of dispersion

A

B - There can be more than one mode in any series

The mode is the most frequently occurring number in a set of data. However, one of its limitations as a measure of central tendency is that there can be more than one mode in a data series.

29
Q

The percentage price change of nine shares over the last two years is as follows:
20% 91% -10% 14% 28% 32% 45% 12% 85%

What is the median price change?

A

28%

The median is the central value of a stream of data. The first thing to do is to place all the returns in numerical order:
-10% 12% 14% 20% 28% 32% 45% 85% 91%
Then take the central value, in this case 28%.

30
Q

A sample of 400 people was surveyed for their preference in cheese. The results were: Cheddar - 180; Gloucester - 120; Stilton - 100.

If the data were to be presented on a pie chart, what angle would represent those who prefer Gloucester?

A

108 degrees

The proportion of Gloucester fans is 120/400 = 0.3 or 30%. There is a total of 360 degrees in a circle. Gloucester fans would be represented by an angle of 108 degrees (0.3 x 360).

31
Q

Calculate the arithmetic mean for the following series of equity returns:

8% 9% -6% 3% 12% -25%

A

0.17%

Simply add up all of the returns and divide by the total number of returns:
(8 + 9 - 6 + 3 + 12 - 25) / 6
= 1 / 6
= 0.167%, or 0.17% to two decimal places

32
Q

Which of the following statements apply to the arithmetic mean?

A It might be distorted by outliers
B It is never discrete
C It is always discrete
D It is the same as the mode

A

A - It might be distorted by outliers

The arithmetic mean is only a good indication of a typical value from a set of data that is grouped nicely together. Values that are extremes (outliers) will pull the mean towards them and away from the typical value.

33
Q

A share price rises by 10% in the first week, 15% in the second, 20% in the third and 25% in the fourth week after issue. What is the geometric mean for the four weeks?

A

17.37%

Multiply the increases together:
1.1 x 1.15 x 1.20 x 1.25 = 1.8975
Take the fourth root of this number:
= 1.1737
Finally subtract the 1:
= 0.1737 or 17.37%.

34
Q

Six firms have earnings growth predicted by a financial analyst as follows:

1%, 30%, -20%, -5%, 19%, 10%

What is the geometric mean of their predicted earnings growth?

A 10.14%
B 5.83%
C 4.55%
D 5.00%

A

C - 4.55%

First, multiply the price relatives together: 1.01 x 1.3 x 0.8 x 0.95 x 1.19 x 1.1 = 1.3062
Take the sixth root of this number: = 1.0455
Subtract the 1: = 0.0455 or 4.55%.

35
Q

Which of the following are measures of dispersion?

I Range
II Median
III Standard deviation
IV Mean

A I, II, III and IV
B I, II and III
C I, II and IV
D I and III

A

D - I and III

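The range is the difference between the highest and lowest values in the data; the standard deviation measures the typical distance of values from the mean (it is the square root of the average squared deviation from the mean). The median and mean are measures of central tendency, not dispersion.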

36
Q

A security has achieved the following returns over the last seven years:
14% 6% 3% 9% 12% 22% 24%

What are the median and the geometric mean returns?

A 9 and 12.63
B 12 and 12.86
C 12 and 12.63
D 9 and 12.86

A

C - 12 and 12.63

The median is the central value after putting the data in numerical order:
3% 6% 9% 12% 14% 22% 24%
12% is the central value, or the median.
The geometric mean is calculated by multiplying all seven price relatives together, taking the seventh root and subtracting 1:
1.14 x 1.06 x 1.03 x 1.09 x 1.12 x 1.22 x 1.24 = 2.2987
7th root = 1.1263
Subtract the 1: = 0.1263 or 12.63%.

37
Q

Define what is meant by range:

A Maximum value less minimum value
B Arithmetic mean of all values
C All values divided by two
D Difference between mean and median

A

A - Maximum value less minimum value

Because it depends only on the two most extreme values, the range may be distorted by outliers.