Data analytics test Flashcards
Name a measure of central tendency that is robust to outliers.
Median
Name a measure of spread that is sensitive to outliers.
Standard deviation
A researcher found the mean and median of a sample were approximately equal. What (if anything) can you say about the distribution? Select one of the below.
(i) It is skewed left
(ii) It is skewed right
(iii) It is symmetric
(iv) Nothing, there is not enough information given.
(iii) It is symmetric
P(A) = 0.3, P(B) = 0.4 and P(A∪B) = 0.7. What can you say about A and B? Select one of the below.
(i) A and B are independent
(ii) A and B are mutually exclusive
(iii) A and B are both independent and mutually exclusive
(iv) A and B are neither independent nor mutually exclusive
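(ii) A and B are mutually exclusive
- P(A∩B) = P(A) + P(B) - P(A∪B) = 0.3 + 0.4 - 0.7 = 0
- Since P(A∩B) = 0, the events cannot occur together, so they are mutually exclusive.
- They are not independent, because independence would require P(A∩B) = P(A) * P(B) = 0.12.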
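One way to check both properties numerically is to recover P(A∩B) by inclusion-exclusion and compare it with 0 and with P(A) * P(B). The sketch below is illustrative only; the helper name `classify_events` is an assumption, not part of the original question, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def classify_events(p_a, p_b, p_union):
    """Return (mutually_exclusive, independent) for two events.

    Hypothetical helper: probabilities are passed as decimal strings
    so they can be converted to exact fractions.
    """
    p_a, p_b, p_union = Fraction(p_a), Fraction(p_b), Fraction(p_union)
    # Inclusion-exclusion: P(A∩B) = P(A) + P(B) - P(A∪B)
    p_both = p_a + p_b - p_union
    mutually_exclusive = p_both == 0
    independent = p_both == p_a * p_b
    return mutually_exclusive, independent


# Values from the question: P(A) = 0.3, P(B) = 0.4, P(A∪B) = 0.7
print(classify_events("0.3", "0.4", "0.7"))  # (True, False)
```

The first flag is true when the intersection has probability zero; the second is true when the intersection equals the product of the two probabilities.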
P(A) = 0.25, P(B|A) = 0.5 and P(B) = 0.5. What is P(A∩B)? Select one of the below.
(i) 0.5
(ii) 0.125
(iii) 0.25
(iv) there is not enough information given
(ii) 0.125
- P(B|A) = P(A∩B) / P(A)
- Rearranging the formula, we get:
- P(A∩B) = P(B|A) * P(A)
- Substituting the given values, we have:
- P(A∩B) = 0.5 * 0.25 = 0.125
A menu has 3 options for starter, 2 for main and 3 for dessert. How many meal
selections are possible, eating in this order?
3 x 2 x 3 = 18
In how many ways can a shopper visit 6 shops, visiting each shop once?
6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
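Both counting answers can be confirmed by brute-force enumeration. This is a small sketch using Python's standard library; the variable names are illustrative, and the options are represented only by index numbers.

```python
import math
from itertools import permutations, product

# Menu: 3 starters, 2 mains, 3 desserts (multiplication rule)
starters, mains, desserts = 3, 2, 3
meal_count = starters * mains * desserts
# Enumerate every (starter, main, dessert) combination as a cross-check
enumerated_meals = len(list(product(range(starters), range(mains), range(desserts))))

# Shops: orderings of 6 distinct shops (permutations, 6!)
shop_orders = math.factorial(6)
# Enumerate every ordering as a cross-check
enumerated_orders = sum(1 for _ in permutations(range(6)))

print(meal_count, enumerated_meals)    # 18 18
print(shop_orders, enumerated_orders)  # 720 720
```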
Consider the following events A, B and C. If P(A) = 0.3, P(B) = 0.5, P(C) = 0.2, P(A∪B) = 0.8 and P(B∪C) = 0.6, which pairs of events are mutually exclusive? Select one of the below.
(i) A and B
(ii) B and C
(iii) A and C
(iv) All the events are mutually exclusive
(v) None of the events are mutually exclusive
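(i) A and B
- Two events are mutually exclusive when the probability of their union equals the sum of their probabilities.
- A and B: P(A) + P(B) = 0.3 + 0.5 = 0.8 = P(A∪B), so they are mutually exclusive.
- B and C: P(B) + P(C) = 0.5 + 0.2 = 0.7, but P(B∪C) = 0.6, so P(B∩C) = 0.1 and they are not mutually exclusive.
- A and C: P(A∪C) is not given, so nothing can be concluded about this pair.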
A random variable X has the following probability distribution. Find the missing probability, P(X = 1).
x: 1, 2, 3, 4, 5
P(X = x): ?, 0.1, 0.3, 0.2, 0.1
We are given P(X = 2), P(X = 3), P(X = 4) and P(X = 5).
The probabilities of all possible values must sum to 1:
P(X = 1) + 0.1 + 0.3 + 0.2 + 0.1 = 1
P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = 0.7
1 - 0.7 = 0.3
P(X = 1) = 0.3
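The same subtraction can be written as a short check. This is a minimal sketch; the dictionary name `known` is illustrative, and exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Known probabilities from the table, keyed by the value of x
known = {2: "0.1", 3: "0.3", 4: "0.2", 5: "0.1"}

# All probabilities sum to 1, so the missing one is the remainder
known_total = sum(Fraction(p) for p in known.values())
missing = 1 - known_total

print(float(known_total))  # 0.7
print(float(missing))      # 0.3
```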
A random variable, X, is normally distributed with mean μ and standard deviation σ. According to the empirical rule, what statement can be made about the range of values, μ − 3σ < X < μ + 3σ?
Approximately 99.7% of the values fall within this range. In other words, for a normal distribution with mean μ and standard deviation σ, almost all of the data is expected to lie within three standard deviations of the mean.
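The empirical rule percentages can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library (available from Python 3.8); the function name `coverage` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because any normal variable can be standardised, the same proportions hold for every choice of μ and σ.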
An online retailer wishes to know which social network generates the most return. To measure this, advertising campaigns were run through their Facebook, Twitter and Snapchat accounts and the sales generated were recorded. The length of time customers spent on the site was also recorded.
The amounts of the sales sampled are displayed in the figure below, where a separate boxplot is drawn for the observations from each of the three social networks. Descriptive statistics of the sale amounts sampled from each social network are also provided.
Facebook: mean 82.50, standard deviation 22.69
Twitter: mean 82.00, standard deviation 9.52
Snapchat: mean 69.00, standard deviation 22.07
Identify each of the following variables as either qualitative or quantitative.
(i) the sale amount
(ii) the social network
(iii) the length of time spent on the site.
(i) Quantitative
(ii) Qualitative
(iii) Quantitative
The recorded sales amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.
(i) Verify that the sample mean for the Twitter advertising campaign's sales is x̄ = 82.00, as reported in the table, showing all workings.
70, 75, 76, 81, 83, 94, 95.
The sample mean is calculated by summing up all the observations and dividing by the sample size:
x̄ = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82
So the sample mean for the Twitter advertising campaign's sales is 82.00, as reported in the table.
The recorded sales amounts (in €) for the sample of sales obtained from the Twitter advertising campaign are as follows:
70, 75, 76, 81, 83, 94, 95.
(ii) Verify that the sample standard deviation of the Twitter advertising campaign's sales is s = 9.52, as reported in the table, showing all workings.
To calculate the sample standard deviation, we need to follow these steps:
Calculate the sample mean:
x = (70 + 75 + 76 + 81 + 83 + 94 + 95) / 7 = 574 / 7 = 82
Subtract the sample mean from each observation and square the differences:
(70 - 82)^2 = 144
(75 - 82)^2 = 49
(76 - 82)^2 = 36
(81 - 82)^2 = 1
(83 - 82)^2 = 1
(94 - 82)^2 = 144
(95 - 82)^2 = 169
Calculate the sum of the squared differences:
144 + 49 + 36 + 1 + 1 + 144 + 169 = 544
Divide the sum of the squared differences by n - 1, where n is the sample size:
s^2 = 544 / (7 - 1) = 90.67
Take the square root of the sample variance to get the sample standard deviation:
s = sqrt(90.67) = 9.52
Therefore, the sample standard deviation for the Twitter advertising campaign's sales is s = 9.52, as reported in the table.
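The hand calculation above can be cross-checked with Python's `statistics` module, whose `stdev` function uses the same n - 1 divisor. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

# Twitter campaign sale amounts from the question
twitter_sales = [70, 75, 76, 81, 83, 94, 95]

sample_mean = statistics.mean(twitter_sales)
# Sum of squared deviations from the mean, as in the worked steps
squared_deviations = sum((x - sample_mean) ** 2 for x in twitter_sales)
sample_variance = statistics.variance(twitter_sales)  # divides by n - 1
sample_sd = statistics.stdev(twitter_sales)           # square root of the sample variance

print(sample_mean)                # 82
print(squared_deviations)         # 544
print(round(sample_variance, 2))  # 90.67
print(round(sample_sd, 2))        # 9.52
```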
Describe and compare the distributions observed for the sale amount for each of the advertising campaigns. Include comments about central tendency, spread and the shape of the distributions, referencing the boxplots and the descriptive statistics provided.
Facebook:
The mean sale amount is 82.50, which is slightly higher than the median, suggesting a right-skewed distribution.
The standard deviation of 22.69 is the largest of the three, indicating a wide spread of values.
The boxplot shows several outliers on the upper end of the distribution, which could pull the mean upward.
Twitter:
The mean sale amount is 82.00, close to the median of 81 for the sampled values, which suggests a roughly symmetric distribution.
The IQR is relatively small and the standard deviation of 9.52 is the lowest of the three, indicating a narrow spread of values.
The boxplot does not show any outliers, indicating a relatively consistent distribution.
Snapchat:
The mean sale amount is 69.00, which is lower than the median, suggesting a slightly left-skewed distribution.
The IQR is relatively large and the standard deviation is 22.07, indicating a wide spread of values.
The boxplot shows several outliers on the lower end, which could pull the mean downward.
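Comparison:
Facebook and Twitter have similar mean sale amounts (82.50 and 82.00), while the Snapchat mean is noticeably lower (69.00).
The Facebook distribution appears to be the most spread out and is right-skewed.
Twitter is the most consistent and symmetric.
Snapchat is also spread out, but skewed in the opposite direction to Facebook.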
The correlation coefficient between sale amount and time spent on the site for this sample was calculated to be r = 0.89. Interpret the meaning of this value in terms of this application.
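r = 0.89 indicates a strong positive linear relationship between time spent on the site and sale amount: customers who spend longer on the site tend to have higher sale amounts. Because every observation in the sample is a sale, the value says nothing about how likely a customer is to make a purchase, and correlation alone does not show that spending longer on the site causes larger sales.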