Data analytics test Flashcards

1
Q

Name a measure of central tendency that is robust to outliers.

A

Median

2
Q

Name a measure of spread that is sensitive to outliers.

A

Standard deviation

3
Q

A researcher found the mean and median of a sample were approximately equal. What (if anything) can you say about the distribution? Select one of the below.
(i) It is skewed left
(ii) It is skewed right
(iii) It is symmetric
(iv) Nothing, there is not enough information given.

A

(iii) It is symmetric

4
Q

P(A) = 0.3, P(B) = 0.4 and P(A∪B) = 0.7. What can you say about A and B? Select one of the below.
(i) A and B are independent
(ii) A and B are mutually exclusive
(iii) A and B are both independent and mutually exclusive
(iv) A and B are neither independent nor mutually exclusive

A

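(ii) A and B are mutually exclusive

By the addition rule, P(A∩B) = P(A) + P(B) − P(A∪B) = 0.3 + 0.4 − 0.7 = 0, so the two events cannot occur together. They are not independent, because independence would require P(A∩B) = P(A) × P(B) = 0.12.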

5
Q

P(A) = 0.25, P(B|A) = 0.5 and P(B) = 0.5. What is P(A∩B)? Select one of the below.
(i) 0.5
(ii) 0.125
(iii) 0.25
(iv) there is not enough information given

A

(ii) 0.125

  • P(B|A) = P(A∩B) / P(A)
  • Rearranging the formula, we get:
  • P(A∩B) = P(B|A) * P(A)
  • Substituting the given values, we have:
  • P(A∩B) = 0.5 * 0.25 = 0.125
6
Q

A menu has 3 options for starter, 2 for main and 3 for dessert. How many meal
selections are possible, eating in this order?

A

3 x 2 x 3 = 18
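
The multiplication rule used above can be checked by listing every meal explicitly. This is a minimal sketch; the dish labels (S1, M1, D1, ...) are placeholders invented for illustration and are not part of the original question.

```python
from itertools import product

# Placeholder labels for the 3 starters, 2 mains and 3 desserts (hypothetical names)
starters = ["S1", "S2", "S3"]
mains = ["M1", "M2"]
desserts = ["D1", "D2", "D3"]

# Every (starter, main, dessert) combination, eaten in that order
meals = list(product(starters, mains, desserts))
n_meals = len(meals)

print(n_meals)
```

The count of enumerated meals matches the product of the number of options at each course.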

7
Q

In how many ways can a shopper visit 6 shops, visiting each shop once?

A

6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
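
As a quick check of the factorial, the sketch below counts the orderings of six shops two ways: with the formula and by enumerating them. The shop names A to F are assumed labels, not taken from the question.

```python
import math
from itertools import permutations

# Hypothetical labels for the six shops
shops = ["A", "B", "C", "D", "E", "F"]

# Formula: n! orderings of n distinct items
n_orders = math.factorial(len(shops))

# Brute-force count of every possible visiting order
n_enumerated = sum(1 for _ in permutations(shops))

print(n_orders, n_enumerated)
```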

8
Q

Consider the following events A, B and C. If P(A) = 0.3, P(B) = 0.5, P(C) = 0.2, P(A∪B) = 0.8 and P(B∪C) = 0.6, which pairs of events are mutually exclusive? Select one of the below.
(i) A and B
(ii) B and C
(iii) A and C
(iv) All the events are mutually exclusive
(v) None of the events are mutually exclusive

A

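(i) A and B

P(A∩B) = P(A) + P(B) − P(A∪B) = 0.3 + 0.5 − 0.8 = 0, so A and B are mutually exclusive.
P(B∩C) = P(B) + P(C) − P(B∪C) = 0.5 + 0.2 − 0.6 = 0.1, which is greater than 0, so B and C are not mutually exclusive.
P(A∪C) is not given, so nothing can be concluded about A and C.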

9
Q

A random variable X has the following probability distribution:

x          1     2     3     4     5
P(X = x)   ?     0.1   0.3   0.2   0.1

Find the missing probability, P(X = 1).

A

The probabilities P(X = 2), P(X = 3), P(X = 4) and P(X = 5) are given, and the probabilities over all values of X must sum to 1:

P(X = 1) + 0.1 + 0.3 + 0.2 + 0.1 = 1

P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = 0.7

P(X = 1) = 1 − 0.7 = 0.3
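
The same calculation can be sketched in a few lines of Python. The dictionary layout and variable names are illustrative choices only; the probabilities are those given in the question.

```python
import math

# Known probabilities from the question, keyed by the value of X
known = {2: 0.1, 3: 0.3, 4: 0.2, 5: 0.1}

# A probability distribution must sum to 1, so the missing value is the remainder
p1 = 1 - sum(known.values())

# Full distribution, including the recovered P(X = 1)
distribution = {1: p1, **known}

# Rounded for display because of floating-point arithmetic
print(round(p1, 2))
```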

10
Q

A random variable, X, is normally distributed with mean μ and standard deviation σ. According to the empirical rule, what statement can be made about the range of values, μ − 3σ < X < μ + 3σ?

A

Approximately 99.7% of the values fall within this range. In other words, for a normal distribution with mean μ and standard deviation σ, almost all of the data is expected to lie within three standard deviations of the mean.
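
The empirical rule can be checked numerically with the standard library's `statistics.NormalDist`. The helper name `prob_within` is an assumption made for this sketch; the result does not depend on the particular μ and σ chosen.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Probability of falling within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The three values correspond to the familiar 68%, 95% and 99.7% figures of the empirical rule.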

11
Q

An online retailer wishes to know which social network generates the most return. To measure this, advertising campaigns were run through their Facebook, Twitter and Snapchat accounts and the sales generated were recorded. The length of time customers spent on the site was also recorded.
The amounts of the sales sampled are displayed in the figure below, where a separate boxplot is drawn for the observations from each of the three social networks. Descriptive statistics of the sale amounts sampled from each social network are also provided.

Network     Mean     Standard deviation
Facebook    82.50    22.69
Twitter     82.00    9.52
Snapchat    69.00    22.07

Identify each of the following variables as either qualitative or quantitative.
(i) the sale amount
(ii) the social network
(iii) the length of time spent on the site.

A

(i) Quantitative
(ii) Qualitative
(iii) Quantitative

12
Q

The recorded sales amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.

(i) Verify that the sample mean for the Twitter advertising campaign's sales is x̄ = 82.00, as reported in the table, showing all workings.

A

70, 75, 76, 81, 83, 94, 95.

The sample mean is calculated by summing up all the observations and dividing by the sample size:

x̄ = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82

So the sample mean for the Twitter advertising campaign's sales is x̄ = 82.00, as reported in the table.

13
Q

The recorded sales amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.

(ii) Verify that the sample standard deviation of the Twitter advertising campaign's sales is s = 9.52, as reported in the table, showing all workings.

A

To calculate the sample standard deviation, we need to follow these steps:

Calculate the sample mean:
x = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82

Subtract the sample mean from each observation and square the differences:
(70 - 82)^2 = 144
(75 - 82)^2 = 49
(76 - 82)^2 = 36
(81 - 82)^2 = 1
(83 - 82)^2 = 1
(94 - 82)^2 = 144
(95 - 82)^2 = 169

Calculate the sum of the squared differences:
144 + 49 + 36 + 1 + 1 + 144 + 169 = 544

Divide the sum of the squared differences by n - 1, where n is the sample size:
s^2 = 544 / (7 - 1) = 90.67

Take the square root of the sample variance to get the sample standard deviation:
s = sqrt(90.67) = 9.52

Therefore, we have verified that the sample standard deviation for the Twitter advertising campaign's sales is s = 9.52, as reported in the table.
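
The steps above can be reproduced with a short Python sketch that follows the same workings and then cross-checks the result against `statistics.stdev`, which also divides by n − 1. Variable names are illustrative.

```python
import math
import statistics

# Twitter campaign sale amounts from the question
sales = [70, 75, 76, 81, 83, 94, 95]

# Step 1: sample mean
mean = statistics.mean(sales)

# Step 2: squared deviations from the mean, and their sum
squared_diffs = [(x - mean) ** 2 for x in sales]
sum_sq = sum(squared_diffs)

# Step 3: sample variance (divide by n - 1) and sample standard deviation
sample_variance = sum_sq / (len(sales) - 1)
sample_sd = math.sqrt(sample_variance)

print(mean, sum_sq, round(sample_variance, 2), round(sample_sd, 2))
```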

14
Q

Describe and compare the distributions observed for the sale amount for each of the advertising campaigns. Include comments about central tendency, spread and the shape of the distributions, referencing the boxplots and the descriptive statistics provided.

A

Facebook:
The mean sale amount is 82.50, slightly higher than the median, which suggests a right-skewed distribution. The standard deviation (22.69) indicates a wide spread of values.
The boxplot shows several outliers at the upper end of the distribution, which could inflate the mean.

Twitter:
The mean sale amount is 82.00, close to the sample median of 81, which suggests a roughly symmetric distribution.
The IQR is relatively small and the standard deviation (9.52) is the lowest of the three, indicating a narrow spread of values.
The boxplot does not show any outliers, indicating a relatively consistent distribution.

Snapchat:
The mean sale amount is 69.00, which is lower than the median, suggesting a slightly left-skewed distribution.
The IQR is relatively large and the standard deviation is 22.07, indicating a wide spread of values.
The boxplot shows several outliers at the lower end, which could pull the mean down.

Comparison:
The distribution for Facebook appears to be the most spread out and is right-skewed.
Twitter is the most consistent and symmetric.
Snapchat is also spread out, but skewed in the opposite direction to Facebook.

15
Q

The correlation coefficient between sale amount and time spent on the site for this sample was calculated to be r = 0.89. Interpret the meaning of this value in terms of this application.

A

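r = 0.89 indicates a strong positive linear relationship between time spent on the site and sale amount: customers who spend longer on the site tend to have higher sale amounts. Correlation alone does not show that spending longer on the site causes higher sales.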

16
Q

Briefly explain the difference between a parameter and a statistic. Give an example of each to illustrate the difference.

A

Parameter: a numerical value that describes a population, e.g. the population mean μ.
Statistic: a numerical value that describes a sample, e.g. the sample mean x̄.

17
Q

A lecturer wishes to take a sample of size 20 from a large class. The lecturer is short of time and considers choosing the first 20 students that turn up to class.
(i) What is this type of sampling called? [1 mark]
(ii) Give one reason why this way of taking a sample is problematic.

(c) Alternatively, the lecturer instead considers emailing a survey to the entire class. State one disadvantage of this method and explain why it is a disadvantage.
(i) Give one reason as to why this method is an improvement.
(ii) Name a further method that is an improvement on the methods in (b) and (c) above.

A

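(i) Convenience sampling.
(ii) It is prone to bias: the first 20 students to arrive may differ systematically from the rest of the class, so the sample may not be representative and conclusions drawn from it may not be valid.

(c) Disadvantage: low response rates. Students who choose to respond may differ from those who do not, which introduces non-response bias.
(i) Every student in the class has the chance to take part, not only those who arrive first.
(ii) Simple random sampling, which gives every member of the population an equal chance of being selected and so helps to reduce bias.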

18
Q

Name a measure of central tendency that is sensitive to outliers

A

Mean

19
Q

Name a measure of spread that is robust to outliers

A

Interquartile range (IQR)

20
Q

If P(A)=0.2, P(B)=0.5 and P(A∪B)=0.6 What can you say about A and B? Select one of the below.
(i) A and B are independent
(ii) A and B are mutually exclusive
(iii) A and B are both independent and mutually exclusive
(iv) A and B are neither independent nor mutually exclusive

A

(i) A and B are independent
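
The reasoning behind this answer can be sketched in code: the addition rule gives P(A∩B), which is then compared with 0 (mutually exclusive) and with P(A) × P(B) (independent). The function name `classify` and the tolerance are assumptions made for this illustration.

```python
import math

def classify(p_a, p_b, p_union, tol=1e-9):
    """Return (P(A and B), mutually_exclusive, independent) from P(A), P(B), P(A or B)."""
    # Addition rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)
    p_both = p_a + p_b - p_union
    # Mutually exclusive events have no overlap
    mutually_exclusive = math.isclose(p_both, 0.0, abs_tol=tol)
    # Independent events satisfy P(A and B) = P(A) * P(B)
    independent = math.isclose(p_both, p_a * p_b, abs_tol=tol)
    return p_both, mutually_exclusive, independent

p_both, is_exclusive, is_independent = classify(0.2, 0.5, 0.6)
print(round(p_both, 2), is_exclusive, is_independent)
```

Here P(A∩B) = 0.1, which equals 0.2 × 0.5, so the events are independent; since it is not 0, they are not mutually exclusive.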

21
Q

You wish to calculate the probability of receiving 5 text messages today. You know that on average you receive 8 text messages per day. Which of the following distributions might you use?
(i) Normal
(ii) Poisson
(iii) Binomial
(iv) Uniform

A

(ii) Poisson
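
For illustration, the probability itself can be computed from the Poisson probability mass function, P(X = k) = e^(−λ) λ^k / k!, with λ = 8 and k = 5. The helper name `poisson_pmf` is an assumption for this sketch.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of exactly 5 messages when the daily average is 8
p5 = poisson_pmf(5, 8)
print(round(p5, 4))
```

The result is roughly 0.09, so receiving exactly 5 messages is fairly unlikely when the average is 8.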

22
Q

What is the relationship between the standard deviation and the variance? Why do we prefer to use the standard deviation instead of the variance?

A

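The standard deviation is the square root of the variance. The standard deviation is usually preferred because it is expressed in the same units as the data, whereas the variance is in squared units, so it is easier to interpret.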

23
Q

You are waiting for an important email which is due to arrive between 1pm and 2pm. You know that it is equally likely to arrive any time during the time period. Which of the following distributions might you use to calculate the probability that the email arrives between 1.10 and 1.25pm?
(i) Normal
(ii) Poisson
(iii) Binomial
(iv) Uniform

A

(iv) Uniform
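
Under a continuous uniform distribution, the probability of an interval is its length divided by the length of the whole range. The sketch below measures time in minutes after 1pm; the function name `uniform_prob` is an assumption for illustration.

```python
def uniform_prob(a, b, lo, hi):
    """P(lo <= X <= hi) for X uniformly distributed on [a, b]."""
    # Clip the interval of interest to the support [a, b]
    lo, hi = max(a, lo), min(b, hi)
    return max(0.0, hi - lo) / (b - a)

# Email arrives uniformly between 0 and 60 minutes after 1pm;
# interval of interest is 1.10pm to 1.25pm, i.e. minutes 10 to 25
p_email = uniform_prob(0, 60, 10, 25)
print(p_email)
```

The interval covers 15 of the 60 minutes, giving a probability of one quarter.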

24
Q

Briefly explain the difference between Descriptive and Inferential Statistics. Give an example of each to illustrate the difference.

A

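Descriptive statistics: summarizes and describes the data collected, e.g. reporting the mean and standard deviation of the sale amounts in a sample.
Inferential statistics: uses sample data to make inferences about a population, e.g. using a sample mean to estimate the population mean with a confidence interval.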