Assignments with ChatGPT Flashcards
What is the median and how is it found in a dataset?
The median is the middle value when a dataset is ordered from smallest to largest. If there is an even number of data points, it is the average of the two middle values.
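As an illustration, the sort-then-pick-the-middle rule can be sketched in Python. The function name and sample data are made up for this card; the standard library's statistics.median does the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

odd_data = [7, 1, 5]       # sorted: 1, 5, 7
even_data = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7
print(median(odd_data))    # 5
print(median(even_data))   # 4.0
```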
What is the mode in a dataset?
The mode is the value that appears most frequently in a dataset.
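A minimal sketch of finding the mode by counting occurrences, assuming ties should all be reported (the function name modes is hypothetical):

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 2, 2, 9]))   # [1, 2]
```

A dataset can have more than one mode, which is why this sketch returns a list.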
How do you calculate the mean absolute deviation (MAD)?
The mean absolute deviation is calculated by finding the average of the absolute differences between each data point and the mean of the dataset.
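The calculation can be sketched in two steps, using made-up data: find the mean, then average the distances from it.

```python
def mean_absolute_deviation(values):
    """Average of |x - mean| over all data points."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 6, 8]                    # mean is 5
print(mean_absolute_deviation(data))   # (3 + 1 + 1 + 3) / 4 = 2.0
```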
What is variance and how is it calculated?
Variance measures the spread of data points around the mean. It is calculated by averaging the squared differences between each data point and the mean (dividing by n for a population, or by n - 1 for a sample).
What is standard deviation and how is it related to variance?
Standard deviation is the square root of variance and indicates how much data points typically deviate from the mean.
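A short sketch of both quantities on made-up data. It computes the population versions by hand, then uses the standard library's statistics.variance, which divides by n - 1 for a sample.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                                        # 5.0

# Population variance: average of squared differences from the mean.
population_variance = sum((x - mean) ** 2 for x in data) / len(data)
population_sd = math.sqrt(population_variance)                      # square root of variance

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(population_variance)   # 4.0
print(population_sd)         # 2.0
```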
How do you identify outliers using z-scores?
A data point is commonly flagged as an outlier if its z-score is greater than 3 or less than -3 (some courses use a looser cutoff such as 2), which indicates it lies unusually far from the mean.
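A minimal sketch of z-score outlier flagging. The cutoff is passed in explicitly because conventions differ; the 2.5 used below and the sample data are assumptions for illustration only.

```python
import statistics

def z_score_outliers(values, threshold):
    """Return the values whose |z-score| exceeds the given threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    if sd == 0:
        return []                    # all values identical: nothing can be an outlier
    return [x for x in values if abs((x - mean) / sd) > threshold]

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
print(z_score_outliers(data, threshold=2.5))   # [50]
```

In small datasets a single extreme value inflates the standard deviation, so its own z-score stays modest (here just under 3).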
What is a z-score and how is it calculated?
A z-score represents the number of standard deviations a data point is from the mean, calculated as $z = \frac{x - \mu}{\sigma}$, where $x$ is the data point, $\mu$ is the mean, and $\sigma$ is the standard deviation.
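A worked example of the formula as a sketch, with made-up numbers (mean 70, standard deviation 10):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(85, 70, 10))   # 1.5  (85 is 1.5 standard deviations above the mean)
print(z_score(60, 70, 10))   # -1.0 (60 is 1 standard deviation below the mean)
```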
How can z-scores be used to infer the probability of data points in a normal distribution?
Z-scores can be used to determine the probability of data points falling below or above a certain value using the standard normal distribution table or functions like pnorm().
What is the pnorm() function used for in statistics?
The pnorm() function is used to calculate the cumulative probability that a normally distributed random variable is less than or equal to a given value.
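pnorm() is an R function. As a rough Python stand-in, the standard library's statistics.NormalDist exposes the same cumulative probability through its cdf method. The wrapper below and its parameter names are an illustrative assumption, not part of either library.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """P(X <= q) for X ~ Normal(mean, sd); the upper tail if lower_tail is False."""
    p = NormalDist(mu=mean, sigma=sd).cdf(q)
    return p if lower_tail else 1.0 - p

print(round(pnorm(1.96), 3))                          # 0.975
print(round(pnorm(85, mean=70, sd=10), 3))            # 0.933, same as pnorm(1.5)
print(round(pnorm(1.96, lower_tail=False), 3))        # 0.025
```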
What assumption must be made when using pnorm() for probability calculations?
The data must be assumed to follow a normal distribution.
How do you create a histogram and what key statistics should be marked on it?
A histogram is created by grouping data values into bins and drawing a bar for each bin whose height is the number of values in it, which visualizes the frequency distribution. Key statistics to mark include the mean, median, and mode, each drawn as a vertical line.
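The binning step can be sketched without a plotting library. This text-only version uses made-up data and an assumed bin width of 2, counts values per bin, and prints the three statistics that would be marked on the chart.

```python
import statistics

def bin_counts(values, bin_width, start):
    """Count values in half-open bins [left, left + bin_width), keyed by left edge."""
    counts = {}
    for x in values:
        k = int((x - start) // bin_width)   # index of the bin containing x
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))     # empty bins are simply absent

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = bin_counts(data, bin_width=2, start=0)
for left, count in counts.items():
    print(f"[{left}, {left + 2}): " + "#" * count)

print("mean:", statistics.mean(data))       # 3.6
print("median:", statistics.median(data))   # 3.0
print("mode:", statistics.mode(data))       # 3
```

In a graphical plot, each of the three values would be drawn as a vertical line at its x position.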
What is the significance of plotting mean, median, and mode on a histogram?
Marking these values helps visualize the central tendency and symmetry of the data distribution.
How can you use statistical measures to summarize a dataset?
By calculating the mean, median, mode, variance, standard deviation, and mean absolute deviation, you can summarize the central tendency and variability of the dataset.
What is the standard error and how is it calculated for a sample mean?
The standard error measures the variability of the sample mean from the population mean. It is calculated as $SE = \frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the standard deviation and $n$ is the sample size.
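A small sketch of the formula with made-up numbers. When the population standard deviation is unknown, the sample standard deviation is commonly substituted as an estimate; that substitution is an assumption in the second function.

```python
import math
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

def standard_error_from_sample(sample):
    """Estimate using the sample standard deviation in place of sigma (an assumption)."""
    return standard_error(statistics.stdev(sample), len(sample))

print(standard_error(10, 25))    # 2.0
print(standard_error(10, 100))   # 1.0  (a larger sample gives a smaller standard error)
```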
What is a confidence interval and how is it interpreted?
A confidence interval is a range of values that likely contains the population mean. For example, a 95% confidence interval comes from a procedure that, over repeated sampling, produces intervals containing the true mean about 95% of the time.
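As a sketch, a confidence interval for a mean can be built as the sample mean plus or minus a critical z-value times the standard error. This assumes a normal sampling distribution and a known standard deviation; the numbers are made up.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based interval: sample_mean +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(sample_mean=50, sigma=10, n=25)
print(round(low, 2), round(high, 2))   # 46.08 53.92
```

With small samples and an estimated standard deviation, a t-based critical value is usually used instead of z.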