Data analytics test Flashcards

Question 1

Q

Name a measure of central tendency that is robust to outliers.

Answer

A

Median

Question 2

Q

Name a measure of spread that is sensitive to outliers.

Answer

A

Standard deviation
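The sketch below shows why the standard deviation is sensitive to outliers while the interquartile range (IQR) is not. It uses the Twitter sale amounts from Question 12 and a hypothetical copy in which the largest value is replaced by an extreme one (500). The quartiles come from Python's `statistics.quantiles` (default method); other software may use a slightly different quartile rule.

```python
import statistics

def spread_summary(data):
    """Return (sample standard deviation, interquartile range) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(data, n=4)
    return statistics.stdev(data), q3 - q1

sales = [70, 75, 76, 81, 83, 94, 95]                # Twitter sale amounts (€)
sales_with_outlier = [70, 75, 76, 81, 83, 94, 500]  # hypothetical extreme value

sd, iqr = spread_summary(sales)
sd_out, iqr_out = spread_summary(sales_with_outlier)
print(f"no outlier:   s = {sd:.2f}, IQR = {iqr:.2f}")
print(f"with outlier: s = {sd_out:.2f}, IQR = {iqr_out:.2f}")
```

The single extreme value inflates s many times over, while the IQR does not change because the quartiles ignore the tails.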

Question 3

Q

A researcher found the mean and median of a sample were approximately equal. What (if anything) can you say about the distribution? Select one of the below.
(i) It is skewed left
(ii) It is skewed right
(iii) It is symmetric
(iv) Nothing, there is not enough information given.

Answer

A

(iii) It is symmetric

Question 4

Q

P(A) = 0.3, P(B) = 0.4 and P(A∪B) = 0.7. What can you say about A and B? Select one of the below.
(i) A and B are independent
(ii) A and B are mutually exclusive
(iii) A and B are both independent and mutually exclusive
(iv) A and B are neither independent nor mutually exclusive

Answer

A

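(ii) A and B are mutually exclusive

P(A∩B) = P(A) + P(B) - P(A∪B) = 0.3 + 0.4 - 0.7 = 0, so A and B cannot occur together. They are not independent, because independence would require P(A∩B) = P(A)P(B) = 0.3 × 0.4 = 0.12.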
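The classification can be checked numerically with a small helper. `classify` is a hypothetical name; it applies the addition rule to get P(A∩B) and then tests the two definitions, using a small tolerance because of floating-point rounding.

```python
def classify(p_a, p_b, p_union, tol=1e-9):
    """Return P(A ∩ B) and whether A, B are mutually exclusive / independent."""
    # Addition rule rearranged: P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
    p_inter = p_a + p_b - p_union
    exclusive = abs(p_inter) < tol                # A and B cannot occur together
    independent = abs(p_inter - p_a * p_b) < tol  # P(A ∩ B) = P(A) P(B)
    return p_inter, exclusive, independent

p_inter, exclusive, independent = classify(0.3, 0.4, 0.7)
print(f"P(A ∩ B) = {p_inter:.2f}, mutually exclusive: {exclusive}, independent: {independent}")
```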

Question 5

Q

P(A) = 0.25, P(B|A) = 0.5 and P(B) = 0.5. What is P(A∩B)? Select one of the below.
(i) 0.5
(ii) 0.125
(iii) 0.25
(iv) there is not enough information given

Answer

A

(ii) 0.125

P(B|A) = P(A∩B) / P(A)
Rearranging the formula, we get:
P(A∩B) = P(B|A) * P(A)
Substituting the given values, we have:
P(A∩B) = 0.5 * 0.25 = 0.125
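The same calculation as a short Python sketch of the multiplication rule. Note that P(B) = 0.5 is not needed to answer the question.

```python
p_a = 0.25          # P(A)
p_b_given_a = 0.5   # P(B | A)

# Multiplication rule: P(A ∩ B) = P(B | A) * P(A)
p_a_and_b = p_b_given_a * p_a
print(p_a_and_b)  # 0.125
```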

Question 6

Q

A menu has 3 options for starter, 2 for main and 3 for dessert. How many meal
selections are possible, eating in this order?

Answer

A

3 x 2 x 3 = 18
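The multiplication (counting) principle can be confirmed by listing every meal. The option labels below are hypothetical; only the counts 3, 2 and 3 come from the question.

```python
from itertools import product

starters = ["S1", "S2", "S3"]   # hypothetical labels
mains = ["M1", "M2"]
desserts = ["D1", "D2", "D3"]

# Each meal is one (starter, main, dessert) triple; product() yields every combination.
meals = list(product(starters, mains, desserts))
print(len(meals))  # 18
```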

Question 7

Q

In how many ways can a shopper visit 6 shops, visiting each shop once?

Answer

A

6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
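Each visiting order is a permutation of the 6 shops, so enumerating the permutations should give 6!. The shop labels are hypothetical.

```python
import math
from itertools import permutations

shops = ["A", "B", "C", "D", "E", "F"]  # hypothetical shop labels

# Every ordering that visits each shop exactly once
routes = list(permutations(shops))
print(len(routes), math.factorial(len(shops)))  # 720 720
```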

Question 8

Q

Consider the following events A, B and C. If P(A) = 0.3, P(B) = 0.5, P(C) = 0.2, P(A∪B) = 0.8 and P(B∪C) = 0.6, which pairs of events are mutually exclusive? Select one of the below.
(i) A and B
(ii) B and C
(iii) A and C
(iv) All the events are mutually exclusive
(v) None of the events are mutually exclusive

Answer

A

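(i) A and B

P(A∩B) = P(A) + P(B) - P(A∪B) = 0.3 + 0.5 - 0.8 = 0, so A and B are mutually exclusive.
P(B∩C) = 0.5 + 0.2 - 0.6 = 0.1 ≠ 0, so B and C are not mutually exclusive.
P(A∪C) is not given, so A and C cannot be checked; of the options, only (i) is consistent with the information provided.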
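A minimal sketch that applies the addition rule to each pair for which P(X∪Y) is given. The pair A, C is skipped because P(A∪C) is not provided.

```python
probs = {"A": 0.3, "B": 0.5, "C": 0.2}
unions = {("A", "B"): 0.8, ("B", "C"): 0.6}  # P(A ∪ C) is not given

results = {}
for (x, y), p_union in unions.items():
    # Addition rule rearranged: P(X ∩ Y) = P(X) + P(Y) - P(X ∪ Y)
    p_inter = probs[x] + probs[y] - p_union
    results[(x, y)] = abs(p_inter) < 1e-9  # True means mutually exclusive
    print(f"P({x} ∩ {y}) = {p_inter:.2f}, mutually exclusive: {results[(x, y)]}")
```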

Question 9

Q

A random variable X has the following probability distribution:
x        | 1 | 2   | 3   | 4   | 5
P(X = x) | ? | 0.1 | 0.3 | 0.2 | 0.1
Find the missing probability, P(X = 1).

Answer

A

We are given P(X = x) for x = 2, 3, 4 and 5. The probabilities of a distribution must add up to 1:

P(X = 1) + 0.1 + 0.3 + 0.2 + 0.1 = 1

P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = 0.7

P(X = 1) = 1 - 0.7 = 0.3
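The same step in Python: subtract the known probabilities from 1. The result is rounded because 0.1 and 0.3 are not exact in floating point.

```python
# Known probabilities P(X = x) for x = 2..5; P(X = 1) is unknown.
known = {2: 0.1, 3: 0.3, 4: 0.2, 5: 0.1}

# All probabilities in a distribution sum to 1.
p1 = 1 - sum(known.values())
print(round(p1, 2))  # 0.3
```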

Question 10

Q

A random variable, X, is normally distributed with mean μ and standard deviation σ. According to the empirical rule, what statement can be made about the range of values, μ − 3σ < X < μ + 3σ?

Answer

A

By the empirical rule, approximately 99.7% of the values of X fall within this range, i.e. P(μ − 3σ < X < μ + 3σ) ≈ 0.997. In other words, for a normal distribution with mean μ and standard deviation σ, almost all values lie within three standard deviations of the mean.
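The 68-95-99.7 figures can be reproduced from the standard normal distribution: P(μ − kσ < X < μ + kσ) = P(|Z| < k) = erf(k/√2). A sketch using only `math.erf`; `prob_within` is a hypothetical helper name.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # prints approximately 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
    print(f"within {k} sd: {prob_within(k):.4f}")
```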

Question 11

Q

An online retailer wishes to know which social network generates the most return. To measure this, advertising campaigns were run through its Facebook, Twitter and Snapchat accounts, and the sales generated were recorded. The length of time customers spent on the site was also recorded.
The amounts of the sales sampled are displayed in the figure below, where a separate boxplot is drawn for the observations from each of the three social networks. Descriptive statistics of the sale amounts sampled from each social network are also provided.

Facebook: mean 82.50, standard deviation 22.69
Twitter: mean 82.00, standard deviation 9.52
Snapchat: mean 69.00, standard deviation 22.07

Identify each of the following variables as either qualitative or quantitative.
(i) the sale amount
(ii) the social network
(iii) the length of time spent on the site.

Answer

A

(i) Quantitative
(ii) Qualitative
(iii) Quantitative

Question 12

Q

The recorded sale amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.

(i) Verify that the sample mean for the Twitter advertising campaign's sales is x̄ = 82.00, as reported in the table, showing all workings.

Answer

A

70, 75, 76, 81, 83, 94, 95.

The sample mean is calculated by summing all the observations and dividing by the sample size, n = 7:

x̄ = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82.00

So the sample mean for the Twitter advertising campaign's sales is 82.00, as reported in the table.
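A quick way to check the arithmetic for the Twitter sample mean:

```python
sales = [70, 75, 76, 81, 83, 94, 95]  # Twitter sale amounts (€)

total = sum(sales)           # sum of the observations
mean = total / len(sales)    # divide by the sample size n = 7
print(total, f"{mean:.2f}")  # 574 82.00
```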

Question 13

Q

The recorded sale amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.

(ii) Verify that the sample standard deviation of the Twitter advertising campaign's sales is s = 9.52, as reported in the table, showing all workings.

Answer

A

To calculate the sample standard deviation, we need to follow these steps:

Calculate the sample mean:
x̄ = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82

Subtract the sample mean from each observation and square the differences:
(70 - 82)^2 = 144
(75 - 82)^2 = 49
(76 - 82)^2 = 36
(81 - 82)^2 = 1
(83 - 82)^2 = 1
(94 - 82)^2 = 144
(95 - 82)^2 = 169

Calculate the sum of the squared differences:
144 + 49 + 36 + 1 + 1 + 144 + 169 = 544

Divide the sum of the squared differences by n - 1, where n is the sample size:
s^2 = 544 / (7 - 1) = 90.67

Take the square root of the sample variance to get the sample standard deviation:
s = sqrt(90.67) = 9.52

Therefore, we have verified that the sample standard deviation of the Twitter advertising campaign's sales is s = 9.52, as reported in the table.
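The steps above can be sketched in Python and cross-checked against `statistics.stdev`, which also divides by n - 1:

```python
import math
import statistics

sales = [70, 75, 76, 81, 83, 94, 95]  # Twitter sale amounts (€)
mean = sum(sales) / len(sales)        # 82.0

squared_devs = [(x - mean) ** 2 for x in sales]  # squared differences from the mean
ss = sum(squared_devs)                           # sum of squares
variance = ss / (len(sales) - 1)                 # divide by n - 1
s = math.sqrt(variance)                          # sample standard deviation

print(ss, round(variance, 2), round(s, 2))  # 544.0 90.67 9.52
print(round(statistics.stdev(sales), 2))    # 9.52
```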

Question 14

Q

Describe and compare the distributions observed for the sale amount for each of the advertising campaigns. Include comments about central tendency, spread and the shape of the distributions, referencing the boxplots and the descriptive statistics provided.

Answer

A

Facebook:
The mean sale amount is 82.50, slightly higher than the median, which suggests a right-skewed distribution.
The standard deviation (22.69) is the largest of the three, indicating a wide spread of values.
The boxplot shows several outliers at the upper end of the distribution, which could pull the mean upwards.

Twitter:
The mean sale amount is 82.00, close to the median of 81 (the middle of the seven recorded values), which suggests a roughly symmetric distribution.
The standard deviation (9.52) is the smallest of the three and the IQR is relatively small, indicating a narrow spread of values.
The boxplot does not show any outliers, indicating a relatively consistent distribution.

Snapchat:
The mean sale amount is 69.00, the lowest of the three, and is lower than the median, suggesting a slightly left-skewed distribution.
The IQR is relatively large and the standard deviation is 22.07, indicating a wide spread of values.
The boxplot shows several outliers at the lower end, which could pull the mean downwards.

Facebook and Twitter have similar centres (means of 82.50 and 82.00), while Snapchat's is lower (69.00).

Facebook's distribution appears to be the most spread out and is right-skewed; Twitter's is the most consistent and symmetric.

Snapchat's distribution is almost as spread out as Facebook's, but skewed in the opposite direction (left-skewed).

Question 15

Q

The correlation coefficient between sale amount and time spent on the site for this sample was calculated to be r = 0.89. Interpret the meaning of this value in terms of this application.

Answer

A

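r = 0.89 indicates a strong positive linear relationship between the time a customer spends on the site and the sale amount: customers who spend longer on the site tend to make larger purchases. Correlation alone does not show that spending more time on the site causes larger purchases.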
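For reference, a sketch of how r is computed from its definition. The sample data behind r = 0.89 is not given, so the `minutes` and `sale` lists below are purely hypothetical illustration values. (Python 3.10+ also provides `statistics.correlation`.)

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: time on site (minutes) and sale amount (€)
minutes = [2, 5, 8, 11, 14]
sale = [40, 55, 50, 80, 85]
print(round(pearson_r(minutes, sale), 2))  # 0.93
```

A value close to +1 means the points lie close to an upward-sloping straight line.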

Question 16

Q

Briefly explain the difference between a parameter and a statistic. Give an example of each to illustrate the difference.

Answer

A

Parameter: a numerical value that describes a population, e.g. the population mean μ.
Statistic: a numerical value that describes a sample, e.g. the sample mean x̄. A statistic is typically used to estimate the corresponding parameter.
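A small simulation can make the distinction concrete. The population below is entirely hypothetical (1,000 sale amounts generated with `random.uniform`); its mean is a parameter, while the mean of one random sample of 30 is a statistic that estimates it.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 1,000 sale amounts (€), generated only for illustration
population = [round(rng.uniform(40, 120), 2) for _ in range(1000)]
mu = statistics.mean(population)   # parameter: describes the whole population

sample = rng.sample(population, k=30)
x_bar = statistics.mean(sample)    # statistic: describes this sample, estimates mu

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
```

Drawing a different sample would change x̄, but μ stays fixed.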

Question 17

Q

A lecturer wishes to take a sample of size 20 from a large class. The lecturer is short of time and considers choosing the first 20 students that turn up to class.
(b)(i) What is this type of sampling called? [1 mark]
(ii) Give one reason why this way of taking a sample is problematic.

(c) Alternatively, the lecturer considers emailing a survey to the entire class. State one disadvantage of this method and explain why it is a disadvantage.
(i) Give one reason why this method is an improvement.
(ii) Name a further method that is an improvement on the methods in (b) and (c) above.

Answer

A

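(b)(i) Convenience sampling.
(ii) It is likely to be biased: the first 20 students to arrive may not be representative of the whole class (e.g. they may be the most punctual or engaged), so conclusions drawn from them may not be valid for the class as a whole.

(c) Disadvantage: low response rate. Students who choose to reply may differ from those who do not, so the responses may be biased (non-response bias).
(i) It is an improvement because every student in the class is given the chance to take part, not just those who arrive first.
(ii) Simple random sampling is an improvement, as it ensures that every member of the population has an equal chance of being selected, which helps to reduce bias.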
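A simple random sample of 20 can be drawn with `random.Random.sample`, which picks without replacement so that every student is equally likely to be chosen. The class list of 200 names is hypothetical, since the question only says the class is large.

```python
import random

# Hypothetical class list; the real class size is not given.
class_list = [f"student_{i:03d}" for i in range(1, 201)]

rng = random.Random(2024)               # fixed seed so the draw can be reproduced
sample = rng.sample(class_list, k=20)   # simple random sample, no repeats
print(sample)
```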

Question 18

Q

Name a measure of central tendency that is sensitive to outliers.

Answer

A

Mean
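A short comparison of the mean and median, using the Twitter sale amounts from Question 12 and a hypothetical copy with one extreme value (500) in place of 95:

```python
import statistics

sales = [70, 75, 76, 81, 83, 94, 95]                # Twitter sale amounts (€)
sales_with_outlier = [70, 75, 76, 81, 83, 94, 500]  # hypothetical extreme value

for label, data in (("no outlier", sales), ("with outlier", sales_with_outlier)):
    # The mean uses every value; the median depends only on the middle value.
    print(label, round(statistics.mean(data), 2), statistics.median(data))
```

The mean jumps from 82 to about 139.86, while the median stays at 81.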

Question 19

Q

Name a measure of spread that is robust to outliers

Answer

A

Interquartile range (IQR)

Question 20

Q

If P(A) = 0.2, P(B) = 0.5 and P(A∪B) = 0.6, what can you say about A and B? Select one of the below.
(i) A and B are independent
(ii) A and B are mutually exclusive
(iii) A and B are both independent and mutually exclusive
(iv) A and B are neither independent nor mutually exclusive

Answer

A

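(i) A and B are independent

P(A∩B) = P(A) + P(B) - P(A∪B) = 0.2 + 0.5 - 0.6 = 0.1, which equals P(A)P(B) = 0.2 × 0.5 = 0.1, so A and B are independent. They are not mutually exclusive, since P(A∩B) ≠ 0.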
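The check can be sketched in Python; `math.isclose` absorbs the floating-point rounding in 0.2 + 0.5 - 0.6.

```python
import math

p_a, p_b, p_union = 0.2, 0.5, 0.6

p_inter = p_a + p_b - p_union                          # addition rule
independent = math.isclose(p_inter, p_a * p_b)         # P(A ∩ B) = P(A) P(B)?
exclusive = math.isclose(p_inter, 0.0, abs_tol=1e-9)   # P(A ∩ B) = 0?
print(f"P(A ∩ B) = {p_inter:.2f}, independent: {independent}, mutually exclusive: {exclusive}")
```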

Question 21

Q

You wish to calculate the probability of receiving 5 text messages today. You know that on average you receive 8 text messages per day. Which of the following distributions might you use?
(i) Normal
(ii) Poisson
(iii) Binomial
(iv) Uniform

Answer

A

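(ii) Poisson

The Poisson distribution models the number of events (here, text messages) occurring in a fixed interval (one day) at a known average rate, λ = 8.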
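As a sketch, the Poisson probability P(X = k) = e^(−λ) λ^k / k! with λ = 8 and k = 5 can be computed directly. `poisson_pmf` is a hypothetical helper built only on the standard `math` module.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

p5 = poisson_pmf(5, 8)   # exactly 5 texts when the daily average is 8
print(round(p5, 4))      # 0.0916
```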

Question 22

Q

What is the relationship between the standard deviation and the variance? Why do we prefer to use the standard deviation instead of the variance?

Answer

A

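Standard deviation is the square root of variance. We usually prefer the standard deviation because it is measured in the same units as the original data (e.g. €), whereas the variance is in squared units (e.g. €²), so the standard deviation is easier to interpret.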
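Using the Twitter sale amounts from Question 12, a sketch showing that the standard deviation is the square root of the variance, and that it returns to the data's original units:

```python
import math
import statistics

sales = [70, 75, 76, 81, 83, 94, 95]  # Twitter sale amounts (€)

variance = statistics.variance(sales)  # sample variance, in €² (squared units)
sd = statistics.stdev(sales)           # sample standard deviation, back in €

print(round(variance, 2), round(sd, 2))  # 90.67 9.52
print(math.isclose(sd, math.sqrt(variance)))  # True
```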

Question 23

Q

You are waiting for an important email which is due to arrive between 1pm and 2pm. You know that it is equally likely to arrive any time during the time period. Which of the following distributions might you use to calculate the probability that the email arrives between 1.10 and 1.25pm?
(i) Normal
(ii) Poisson
(iii) Binomial
(iv) Uniform

Answer

A

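(iv) Uniform

The email is equally likely to arrive at any time between 1pm and 2pm, so its arrival time is uniformly distributed over those 60 minutes. The probability that it arrives between 1.10pm and 1.25pm is 15/60 = 0.25.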
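For a continuous uniform distribution on [a, b], the probability of an interval is its length divided by b − a. A sketch measuring time in minutes after 1pm; `uniform_prob` is a hypothetical helper.

```python
def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X ~ Uniform(a, b), assuming a <= lo <= hi <= b."""
    return (hi - lo) / (b - a)

# Arrival time in minutes after 1pm is uniform on [0, 60]; 1.10pm to 1.25pm is [10, 25].
p = uniform_prob(0, 60, 10, 25)
print(p)  # 0.25
```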

Question 24

Q

Briefly explain the difference between Descriptive and Inferential Statistics. Give an example of each to illustrate the difference.

Answer

A

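Descriptive statistics: summarize and describe the main features of a data set, e.g. the mean and standard deviation of the Twitter sale amounts (82.00 and 9.52).
Inferential statistics: use sample data to draw conclusions (inferences) about a population, e.g. using the Twitter sample to estimate the mean sale amount for all sales generated by the Twitter campaign.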
