Assignments Met ChatGPT Flashcards

1
Q

What is the median and how is it found in a dataset?

A

The median is the middle value when a dataset is ordered from smallest to largest. If there is an even number of data points, it is the average of the two middle values.
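A minimal sketch of this rule in Python, used here as a stand-in for the R functions this deck mentions. The helper name median is illustrative; Python's built-in statistics.median returns the same values.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of data points is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


print(median([7, 1, 3]))      # sorted [1, 3, 7] -> 3
print(median([7, 1, 3, 10]))  # sorted [1, 3, 7, 10] -> (3 + 7) / 2 = 5.0
```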

2
Q

What is the mode in a dataset?

A

The mode is the value that appears most frequently in a dataset.
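A short Python sketch (hypothetical helper modes) that returns every most-frequent value, since a dataset can have more than one mode. Python 3.8+ also provides statistics.multimode for the same purpose.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency.
    If every value occurs exactly once, all values are returned,
    which is usually read as 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


print(modes([2, 4, 4, 5, 7]))  # [4]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
```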

3
Q

How do you calculate the mean absolute deviation (MAD)?

A

The mean absolute deviation is calculated by finding the average of the absolute differences between each data point and the mean of the dataset.
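A minimal Python sketch of the MAD calculation described above; the helper name is illustrative. Note that R's built-in mad() computes the median absolute deviation (scaled by a constant by default), which is a different statistic.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each data point from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# mean is 5, absolute deviations are 3, 1, 1, 3 -> 8 / 4 = 2.0
print(mean_absolute_deviation([2, 4, 6, 8]))
```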

4
Q

What is variance and how is it calculated?

A

Variance measures the spread of data points around the mean. It is calculated from the squared differences between each data point and the mean: divide their sum by N for a whole population, or by n − 1 for a sample (the sample variance).
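A Python sketch showing both divisors and, by taking the square root, the standard deviation. The data values are made up for illustration; R's var() and sd() use the sample (n − 1) version.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 for a
    sample or by n for a whole population."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n


data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical data, mean 5
print(variance(data, sample=False))             # 32 / 8 = 4.0
print(math.sqrt(variance(data, sample=False)))  # standard deviation 2.0
print(variance(data))                           # 32 / 7, about 4.571
```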

5
Q

What is standard deviation and how is it related to variance?

A

Standard deviation is the square root of variance and indicates how much data points typically deviate from the mean.

6
Q

How do you identify outliers using z-scores?

A

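An outlier can be identified by its z-score: a common rule of thumb flags a data point whose z-score is greater than 3 or less than -3 (some courses use ±2), meaning it lies unusually far from the mean. The cutoff is a convention rather than a fixed rule.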
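A minimal Python sketch of z-score outlier screening. It assumes a cutoff of 3 and uses the population standard deviation of the data; both choices are illustrative, and the helper name and data are hypothetical.

```python
import statistics


def z_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold (assumed cutoff: 3)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD of the data
    if sigma == 0:
        return []                      # all values identical: nothing stands out
    return [x for x in values if abs((x - mu) / sigma) > threshold]


data = [9, 10, 11] * 6 + [10, 40]  # hypothetical data, 20 values
print(z_outliers(data))            # [40]  (z is about 4.3)
```

Because an outlier inflates the mean and standard deviation it is measured against, very small datasets may never reach |z| > 3: with n values and the population standard deviation, the largest possible |z| is √(n − 1).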

7
Q

What is a z-score and how is it calculated?

A

A z-score represents the number of standard deviations a data point lies from the mean, calculated as $z = \frac{x - \mu}{\sigma}$, where $x$ is the data point, $\mu$ is the mean, and $\sigma$ is the standard deviation.

8
Q

How can z-scores be used to infer the probability of data points in a normal distribution?

A

Once a value is converted to a z-score, the standard normal distribution gives the probability of observing a value below or above it, using a standard normal table or a function such as pnorm() in R.
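A Python sketch using the standard library's NormalDist (Python 3.8+) as a stand-in for pnorm(). The example mean of 170 and standard deviation of 10 are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Hypothetical: a value of 185 in a normal population with mean 170 and SD 10
z = (185 - 170) / 10                     # z = 1.5
p_below = std_normal.cdf(z)              # P(Z <= 1.5), like pnorm(1.5) in R
p_above = 1 - p_below                    # P(Z > 1.5), like pnorm(1.5, lower.tail = FALSE)
p_between = std_normal.cdf(1.96) - std_normal.cdf(-1.96)  # about 0.95

# Standardizing first gives the same answer as using the original scale directly
same = NormalDist(mu=170, sigma=10).cdf(185)

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))  # 0.9332 0.0668 0.95
```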

9
Q

What is the pnorm() function used for in statistics?

A

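The pnorm() function in R calculates the cumulative probability that a normally distributed random variable is less than or equal to a given value q. Its mean and sd arguments default to 0 and 1 (the standard normal distribution); setting lower.tail = FALSE returns the probability of exceeding q instead.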

10
Q

What assumption must be made when using pnorm() for probability calculations?

A

The data must be assumed to follow a normal distribution.

11
Q

How do you create a histogram and what key statistics should be marked on it?

A

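A histogram is created by dividing the range of the data into bins (usually of equal width) and drawing a bar for each bin whose height is the number of data points it contains, which visualizes the frequency distribution. Key statistics to mark are the mean, median, and mode, each drawn as a vertical line.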
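A plotting library is not needed to see how binning works. This Python sketch (hypothetical helper bin_counts, made-up data) counts values per equal-width bin, draws the bars as text, and computes the three statistics that would be marked.

```python
import statistics


def bin_counts(values, n_bins=5):
    """Split the data range into equal-width bins and count values per bin.
    The maximum value is placed in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        i = int((x - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1  # clamp the maximum into the last bin
    return lo, width, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical data
lo, width, counts = bin_counts(data, n_bins=4)
for k, c in enumerate(counts):
    print(f"{lo + k * width:4.1f}-{lo + (k + 1) * width:4.1f} | {'#' * c}")
print("mean", statistics.mean(data), "median", statistics.median(data),
      "mode", statistics.mode(data))
```

To draw the actual chart, R's hist() with abline(v = ...) or matplotlib's plt.hist() with plt.axvline() add the bars and the vertical marker lines. In this data, the single large value (9) pulls the mean (3.6) above the median (3.0).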

12
Q

What is the significance of plotting mean, median, and mode on a histogram?

A

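Marking these values helps visualize the central tendency and symmetry of the distribution: in a symmetric distribution they roughly coincide, while in a right-skewed distribution the mean is typically pulled above the median (and below it in a left-skewed one).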

13
Q

How can you use statistical measures to summarize a dataset?

A

By calculating the mean, median, mode, variance, standard deviation, and mean absolute deviation, you can summarize the central tendency and variability of the dataset.

14
Q

What is the standard error and how is it calculated for a sample mean?

A

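The standard error measures how much the sample mean is expected to vary from one sample to another, that is, how far it typically lies from the population mean. It is calculated as $SE = \frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation and $n$ is the sample size; when $\sigma$ is unknown, the sample standard deviation $s$ is used in its place.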
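A minimal Python sketch of the formula; the helper name and sample values are hypothetical. In R the sample-based version is sd(x) / sqrt(length(x)).

```python
import math
import statistics


def standard_error(sample, sigma=None):
    """SE of the sample mean: sigma / sqrt(n). If the population SD is not
    supplied, the sample SD s is used as an estimate (s / sqrt(n))."""
    n = len(sample)
    sd = sigma if sigma is not None else statistics.stdev(sample)
    return sd / math.sqrt(n)


sample = [4, 8, 6, 5, 7]                # hypothetical data: s**2 = 2.5
print(standard_error(sample))           # sqrt(2.5 / 5), about 0.707
print(standard_error(sample, sigma=2))  # 2 / sqrt(5), about 0.894
```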

15
Q

What is a confidence interval and how is it interpreted?

A

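A confidence interval is a range of values that likely contains the population mean. A 95% confidence interval means that if we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true mean; informally, we are 95% confident that this interval contains it.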
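The "repeated sampling" reading can be checked by simulation. This Python sketch assumes a normal population with a known mean of 50 and standard deviation of 10 (both hypothetical), builds many 95% z-intervals, and counts how often they capture the true mean.

```python
import random
import statistics

random.seed(1234)  # reproducible draws
TRUE_MEAN, SIGMA, N, REPS = 50.0, 10.0, 25, 2000  # hypothetical settings
se = SIGMA / N ** 0.5  # sigma is known here, so a z-interval is appropriate

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = statistics.fmean(sample)
    if m - 1.96 * se <= TRUE_MEAN <= m + 1.96 * se:
        hits += 1  # this interval captured the true mean

coverage = hits / REPS
print(coverage)  # close to 0.95, though not exactly
```

Any single interval either contains the true mean or does not; the 95% describes the long-run success rate of the procedure.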

16
Q

How do you calculate a 95% confidence interval for a sample mean?

A

Use $\bar{x} \pm z \cdot SE$, where $\bar{x}$ is the sample mean, $z$ is the critical value for the 95% confidence level (approximately 1.96, which leaves 2.5% in each tail of the standard normal distribution), and $SE$ is the standard error.
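A minimal Python sketch of this interval, assuming the population standard deviation is known; the helper name and data are hypothetical. When σ is unknown and the sample is small, the z value is normally replaced by a critical t-value.

```python
import math
import statistics


def z_confidence_interval(sample, sigma, z=1.96):
    """Return (lower, upper) = mean -/+ z * SE, with SE = sigma / sqrt(n).
    z = 1.96 corresponds to roughly 95% confidence."""
    m = statistics.fmean(sample)
    se = sigma / math.sqrt(len(sample))
    return m - z * se, m + z * se


lo, hi = z_confidence_interval([98, 102, 100, 101, 99], sigma=2)  # hypothetical data
print(round(lo, 3), round(hi, 3))  # 98.247 101.753
```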

17
Q

What is a critical z-value and when is it used?

A

A critical z-value is the threshold for deciding significance in a z-test, which compares a sample mean against a population mean: if the calculated z-value lies beyond it, the null hypothesis is rejected. For alpha = 0.05 (two-tailed), the critical z-values are approximately ±1.96.
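Critical values come from the inverse of the standard normal CDF. A Python sketch (hypothetical helper critical_z) using NormalDist.inv_cdf; in R, qnorm(0.975) gives the same two-tailed cutoff for alpha = 0.05.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_tailed=True):
    """Standard normal cutoff for significance level alpha.
    A two-tailed test splits alpha equally between both tails."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.05), 2))                    # 1.96
print(round(critical_z(0.05, two_tailed=False), 3))  # 1.645
print(round(critical_z(0.01), 3))                    # 2.576
```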

18
Q

What is a one-sample t-test and when is it used?

A

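A one-sample t-test determines whether the mean of a sample differs significantly from a known or hypothesized population mean when the population standard deviation is unknown. The calculated t-value is compared against the critical t-value with n − 1 degrees of freedom.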

19
Q

How do you calculate the t-value for a one-sample t-test?

A

The t-value is calculated as $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation, and $n$ is the sample size.
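A minimal Python sketch of the t-value formula; the helper name, data, and hypothesized mean are hypothetical. R's t.test(x, mu = 5) reports the same statistic.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return t = (xbar - mu) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    return (xbar - mu) / (s / math.sqrt(n)), n - 1


t, df = one_sample_t([4, 8, 6, 5, 7], mu=5)  # hypothetical data
print(round(t, 3), df)  # xbar = 6, s / sqrt(n) = sqrt(0.5) -> 1.414 4
```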

20
Q

What does it mean if the sample t-value exceeds the critical t-value?

A

If the absolute value of the calculated t-value exceeds the critical t-value (in a two-tailed test), we reject the null hypothesis, indicating a significant difference between the sample mean and the population mean.
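The two-tailed decision rule as a small Python sketch (hypothetical helper t_decision). The critical value for 9 degrees of freedom at alpha = 0.05 is taken from a t-table; in R, qt(0.975, df = 9) gives about 2.262.

```python
def t_decision(t_value, t_critical):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return "reject H0" if abs(t_value) > t_critical else "fail to reject H0"


T_CRIT_DF9 = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a t-table)
print(t_decision(2.80, T_CRIT_DF9))   # reject H0
print(t_decision(-2.80, T_CRIT_DF9))  # reject H0 (the sign does not matter)
print(t_decision(1.50, T_CRIT_DF9))   # fail to reject H0
```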

21
Q

How do you perform an F-test for comparing variances between two samples?

A

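An F-test compares the variances of two samples by calculating the ratio of their sample variances, $F = s_1^2 / s_2^2$ (usually with the larger variance in the numerator), and comparing it to a critical value from the F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.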
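A Python sketch of the F statistic with the larger variance on top; the helper name and data are hypothetical. The critical value still has to come from an F-table or from qf() in R (for example qf(0.975, 4, 3) for a two-sided test at alpha = 0.05 with this arrangement); R's var.test() runs the whole test.

```python
import statistics


def f_ratio(sample_a, sample_b):
    """Return the F statistic (larger sample variance / smaller) and the
    (numerator, denominator) degrees of freedom."""
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    if va >= vb:
        return va / vb, (len(sample_a) - 1, len(sample_b) - 1)
    return vb / va, (len(sample_b) - 1, len(sample_a) - 1)


a = [4, 8, 6, 5, 7]  # hypothetical data, variance 2.5
b = [5, 6, 6, 7]     # hypothetical data, variance 2/3
F, dfs = f_ratio(a, b)
print(round(F, 2), dfs)  # 3.75 (4, 3)
```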

22
Q

When should you use a two-sample t-test?

A

A two-sample t-test is used to compare the means of two independent samples and see whether they differ significantly. Which version to use depends on whether the two variances can be assumed equal (the pooled, or Student's, t-test) or not (Welch's t-test), which can be checked with an F-test.
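A Python sketch of both versions of the statistic; the helper name and data are hypothetical. In R, t.test(x, y, var.equal = TRUE) gives the pooled test, and the default (var.equal = FALSE) gives Welch's test.

```python
import math
import statistics


def two_sample_t(x, y, equal_var=True):
    """Return (t, df) for two independent samples.
    equal_var=True: pooled (Student) test; False: Welch test."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    if equal_var:
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
        return diff / math.sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2
    se2x, se2y = vx / nx, vy / ny
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2x + se2y) ** 2 / (se2x ** 2 / (nx - 1) + se2y ** 2 / (ny - 1))
    return diff / math.sqrt(se2x + se2y), df


x = [4, 8, 6, 5, 7]  # hypothetical data, mean 6
y = [2, 3, 3, 4]     # hypothetical data, mean 3
print(two_sample_t(x, y))                   # pooled: t about 3.42, df 7
print(two_sample_t(x, y, equal_var=False))  # Welch: t about 3.67, df about 6.19
```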

23
Q

How do you interpret paired sample t-test results?

A

A paired sample t-test compares two related sets of observations (e.g., measurements before and after an intervention) by testing whether the mean of their pairwise differences is zero. If the absolute value of the calculated t-value exceeds the critical t-value, the difference between the two sets is significant.
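A paired test is a one-sample t-test on the differences. This Python sketch uses a hypothetical helper and made-up measurements; R's t.test(after, before, paired = TRUE) computes the same statistic.

```python
import math
import statistics


def paired_t(before, after):
    """t-statistic for the differences (after - before) tested against 0,
    with n - 1 degrees of freedom."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    return statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


before = [10, 12, 9, 11, 13]  # hypothetical measurements
after = [12, 13, 11, 12, 16]
t, df = paired_t(before, after)
print(round(t, 2), df)  # differences 2, 1, 2, 1, 3 -> 4.81 4
```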

24
Q

What function in statistical software can be used to cross-check manual t-test calculations?

A

The t.test() function in R or equivalent functions in other statistical software can be used for cross-checking t-test results.

25
Q

What is the purpose of setting a seed (e.g., set.seed(1234)) when generating samples in statistics?

A

Setting a seed ensures reproducibility by controlling the random number generation process, allowing others to replicate the same sample results.
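The same idea in Python's random module, shown as a sketch: reseeding with the same value replays the same draws. Python and R use different generators and seeding schemes, so random.seed(1234) will not reproduce the numbers from R's set.seed(1234).

```python
import random

random.seed(1234)
first = [random.randint(1, 100) for _ in range(5)]

random.seed(1234)  # reset to the same seed
second = [random.randint(1, 100) for _ in range(5)]

print(first == second)  # True: same seed, same sequence of draws
```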

26
Q

What does it mean if a sample mean differs significantly from the population mean in hypothesis testing?

A

It means that the observed difference is unlikely to be due to random sampling variation alone, suggesting that the population the sample came from has a mean different from the hypothesized population mean.

27
Q

How do you check if sample means differ significantly from population means using z-values?

A

For each sample mean, calculate $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ and compare it to the critical z-value (e.g., ±1.96 for alpha = 0.05, two-tailed). If the z-value falls outside this range, the difference is significant.
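A Python sketch of this check, assuming the population standard deviation is known; the helper name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(xbar, mu, sigma, n):
    """Distance of the sample mean from the population mean in
    standard-error units: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for alpha = 0.05, two-tailed
z = z_for_sample_mean(xbar=103, mu=100, sigma=15, n=36)  # SE = 15 / 6 = 2.5
print(round(z, 2), abs(z) > z_crit)  # 1.2 False -> not significant
```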

28
Q
A