DSE1101 Flashcards
What is a variable
characteristics observed in a study.
When is a variable categorical
observation belongs to a set of categories.
When is a variable quantitative
observations take on numerical values that represent different magnitudes
What is also called independent variable
Explanatory variable
What is also called dependent variable
Response variable
What is the mean
The average; it is one way to measure the center of a distribution.
What is sample mean
The sample mean is a sample statistic and serves as a point estimate of the population mean.
What kind of variable does a histogram show?
distribution of a continuous variable.
What is modality
The number of peaks in your data. With one peak, the data follow a single general pattern and are called unimodal.
What is unimodal?
1 peak
What is 2 peaks
bimodal
What is more than 2 peaks
multimodal data
What is it called when all values are about equally frequent (no distinct peak)
uniform data
Where is the peak on negatively skewed data
“Long tail on left
Peak on right”
Give an example of negatively skewed data
“GPA
Age of death”
Where is the peak on positively skewed data
“Longer tail on right
Peak on left”
If a question asks whether data is left or right skewed, do we remove outliers first?
Yes
If you find some people in the data who spent $1000 at a supermarket, is it an error?
No, take them aside to be analysed separately
Why use median over mean?
More robust to outliers
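A minimal sketch of why the median is more robust, using made-up supermarket spending figures (the numbers are illustrative assumptions, not course data). Adding one $1000 shopper drags the mean far upward while the median barely moves.

```python
from statistics import mean, median

# Hypothetical spending in dollars (made-up numbers for illustration).
spending = [20, 25, 30, 35, 40]
with_outlier = spending + [1000]  # one shopper who spent $1000

# Without the outlier, mean and median agree.
print("no outlier:  ", mean(spending), median(spending))

# With the outlier, the mean jumps but the median shifts only slightly.
print("with outlier:", mean(with_outlier), median(with_outlier))
```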
What are the cons of using the median
The median is harder to compute than the mean: it requires sorting the data, so it needs more computing power. The mean needs no sorting.
If distribution is skewed or has some extreme values, where is the center
median
If a distribution is left skewed, where is the median in relation to the mean
“The mean is smaller than the median
The median is typically closer to the peak”
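A quick numerical check of the left-skew card, assuming a made-up set of GPA-like values with a long tail on the left (the data are hypothetical). The few low values pull the mean below the median.

```python
from statistics import mean, median

# Hypothetical left-skewed GPAs: most values near 4.0, a few low ones.
gpas = [1.5, 2.8, 3.4, 3.6, 3.7, 3.8, 3.8, 3.9, 4.0]

m = mean(gpas)
med = median(gpas)

# In left-skewed data the long left tail drags the mean down.
print(f"mean = {m:.2f}, median = {med:.2f}, mean < median: {m < med}")
```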
What is variance?
the average squared deviation from the sample mean.
What is the formula for variance?
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 (sample variance, dividing by n - 1)
Why do we square deviations instead of taking absolute values for variance
Squaring takes less computational power and still gets rid of negative values.
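The variance cards above can be checked by hand. This sketch assumes the usual sample-variance convention of dividing by n - 1 (an assumption; some courses divide by n) and uses made-up data. It also shows why the deviations must be squared: left as they are, the positive and negative deviations cancel to zero.

```python
from statistics import mean, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
x_bar = mean(data)

# Raw deviations from the mean always sum to zero, so they cannot measure spread.
raw_sum = sum(x - x_bar for x in data)

# Squaring removes the signs; dividing by n - 1 gives the sample variance.
squared_devs = [(x - x_bar) ** 2 for x in data]
sample_var = sum(squared_devs) / (len(data) - 1)

print("sum of raw deviations:", raw_sum)
print("sample variance by hand:", sample_var)
print("statistics.variance:   ", variance(data))
```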
What is the interquartile range
The range from Q1 to Q3, i.e. IQR = Q3 - Q1
Where do the whiskers of a box plot extend up to
1.5 x IQR away from lower and upper quartile
What is Tukey's rule
outliers are values more than 1.5 times the IQR from the quartiles — either below Q1 - 1.5IQR, or above Q3 + 1.5IQR.
Where are outliers?
more than 1.5 times the IQR from the quartiles — either below Q1 - 1.5IQR, or above Q3 + 1.5IQR.
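A small sketch of the 1.5 x IQR rule on made-up data. It assumes the default quartile method of Python's `statistics.quantiles`; other software may compute Q1 and Q3 slightly differently, which can shift the fences a little.

```python
from statistics import quantiles

data = [12, 15, 18, 20, 22, 25, 27, 30, 33, 1000]  # made-up sample

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences: anything outside them is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```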
What are robust statistics for center and spread
Median (center) and IQR (spread)
What to do to extremely skewed data?
natural log transformation
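A rough illustration of the log transformation on made-up, strongly right-skewed income figures (hypothetical values; the transformation requires strictly positive data). The gap between mean and median, a simple symptom of skew, shrinks sharply after taking natural logs.

```python
import math
from statistics import mean, median

# Hypothetical incomes with a very long right tail.
incomes = [1_000, 2_000, 3_000, 5_000, 8_000, 20_000, 100_000, 1_000_000]

# Natural log of each value; only valid for values greater than zero.
log_incomes = [math.log(x) for x in incomes]

# Compare mean/median ratio before and after the transformation.
ratio_raw = mean(incomes) / median(incomes)
ratio_log = mean(log_incomes) / median(log_incomes)

print(f"raw: mean/median = {ratio_raw:.2f}")
print(f"log: mean/median = {ratio_log:.2f}")
```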
The horizontal axis of a histogram is ____
discrete (the continuous values are grouped into bins)
What is denoted by omega
sample space
What does a probability model describe
the uncertainty of a random process.
What is an outcome
mutually exclusive and collectively exhaustive results of a random process.
What is an event
collection of one or more outcomes. It is a subset of the sample space.
What is the probability distribution?
lists all possible outcomes and the probabilities with which each of them occurs.
What is cumulative probability distribution
“probability that a variable is less than or equal to a particular value.
P(X<=2)”
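To make P(X<=2) concrete, here is a sketch that assumes X is the roll of a fair six-sided die (an example chosen for illustration, not taken from the cards). The probability distribution lists each outcome with its probability, and the cumulative distribution adds up everything at or below a value.

```python
from fractions import Fraction

# Probability distribution of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """Cumulative probability P(X <= x) for the die above."""
    return sum(p for value, p in pmf.items() if value <= x)

print("P(X <= 2) =", cdf(2))
print("P(X <= 6) =", cdf(6))
```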
What are disjoint outcomes?
cannot happen at the same time
What does it mean for 2 events A and B to be independent
The occurrence of B provides no information about A, i.e. P(A | B) = P(A).
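A sketch of checking independence by enumerating a sample space, assuming two fair dice and two illustrative events (A: first die shows 6, B: second die is even). These events are hypothetical examples; the check is that knowing B happened leaves the probability of A unchanged.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in omega if event(outcome))
    return Fraction(favourable, len(omega))

def event_a(outcome):
    return outcome[0] == 6        # first die shows 6

def event_b(outcome):
    return outcome[1] % 2 == 0    # second die is even

p_a = prob(event_a)
p_b = prob(event_b)
p_a_and_b = prob(lambda o: event_a(o) and event_b(o))
p_a_given_b = p_a_and_b / p_b     # conditional probability P(A | B)

print("P(A) =", p_a, " P(A | B) =", p_a_given_b)
print("independent:", p_a_given_b == p_a)
```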