Seminar 1 Flashcards
Name three measures of central tendency (averages)
Mean, Median, Mode
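A minimal Python sketch of the three averages, using the standard-library statistics module on a small made-up data set (the numbers are illustrative only):

```python
import statistics

# Hypothetical sample of scores (illustrative data only)
scores = [3, 5, 5, 6, 9, 14]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

The single large value (14) pulls the mean above the median, which is why the median is often preferred for skewed data.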
Name three ways to measure the spread of a data set
Standard Deviation, Range, Variance
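A sketch of the three spread measures on the same made-up data, assuming Python's statistics module. statistics.variance and statistics.stdev use the sample formula (dividing by n - 1); statistics.pvariance is the population version (dividing by n).

```python
import statistics

# Hypothetical sample of scores (illustrative data only)
scores = [3, 5, 5, 6, 9, 14]

data_range = max(scores) - min(scores)        # largest minus smallest: 14 - 3 = 11
variance = statistics.variance(scores)        # sample variance: sum of squared deviations / (n - 1)
std_dev = statistics.stdev(scores)            # square root of the sample variance
pop_variance = statistics.pvariance(scores)   # population variance: divides by n instead

print(data_range, variance, round(std_dev, 2), pop_variance)
```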
Which type of central tendency can you use for categorical variables?
The mode (the most frequent category)
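For a categorical variable the mode is simply the most frequent label. A sketch using collections.Counter on hypothetical survey responses (the categories and counts are invented for illustration):

```python
from collections import Counter

# Hypothetical survey responses: a categorical (nominal) variable
transport = ["bus", "car", "bike", "car", "walk", "car", "bus"]

counts = Counter(transport)                         # frequency of each category
mode_category, frequency = counts.most_common(1)[0] # the single most frequent category
print(mode_category, frequency)
```

A mean has no meaning for unordered labels like these, which is why the mode is the average used here.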
You are comparing average customer satisfaction between two products. Can you rely on descriptive statistics to generalise your findings?
No. Descriptive statistics only summarise the sample you collected, so they cannot be used to generalise findings to the wider population; that requires inferential statistics (e.g., a formal hypothesis test).
Name the three stages of data analytics
Descriptive analytics, Predictive analytics, Prescriptive analytics
Describe Descriptive analytics.
Essentially descriptive statistics: analysing historical data to summarise events that have already taken place.
Describe Predictive analytics
Building mathematical, computational, and statistical models to make predictions using existing data. In most cases we look at making numerical (regression) or categorical (classification) predictions.
Describe Prescriptive analytics
Building data-driven solutions to control or change the outcome of an event (e.g., helping customers build credit, or directing salespeople to the best targets for advertising merchandise).
A team of medical researchers is testing a new drug on hypertensive patients. They hypothesise that the drug will alter systolic blood pressure. Write out the null and alternative hypotheses.
Null hypothesis (denoted H0): there is no difference in the means at the population level, i.e., any difference seen in the sample is due to chance.
Alternative hypothesis (denoted H1): there is a difference in the means at the population level.
H0: The drug has no effect on systolic BP
H1: The drug has an effect on systolic BP
Can you generalise from means and use them to draw conclusions?
No. We cannot generalise from the means alone. Even if there is a large difference between them, we need to run formal statistical tests first.
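To make "formal tests" concrete for the drug example's H0 and H1, here is a sketch of one possible test, a two-sided permutation test, written with the Python standard library. The blood-pressure readings are invented, and choosing a permutation test (rather than, say, a t-test) is an assumption made for illustration.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: group labels do not matter (no effect), so shuffling them
    should often produce differences as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels, as if H0 were true
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical systolic BP readings (mmHg), illustrative only
drug = [128, 131, 125, 122, 130, 127, 124, 126]
placebo = [138, 135, 141, 133, 137, 140, 136, 139]

p = permutation_test(drug, placebo)
print(f"p = {p:.4f}")
```

A small p-value (conventionally below 0.05) is evidence against H0; the two sample means on their own would not tell you this.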
What is random sampling?
Sampling at random from the population, so every member has an equal chance of being selected. Simple, but by chance it can overrepresent certain groups and produce unequal group sizes.
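A minimal sketch of simple random sampling with Python's random module; the population of customer IDs is hypothetical:

```python
import random

# Hypothetical population of 100 customer IDs
population = list(range(1, 101))

rng = random.Random(42)                # fixed seed so the example is reproducible
sample = rng.sample(population, k=10)  # without replacement; every member equally likely
print(sample)
```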
What is stratified sampling?
Sampling separately within predefined groups (strata, e.g., sex or profession), which gives control over group sizes. Needs careful planning: if small groups receive the same sample size as large ones, they become overrepresented and their importance is overstated.
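A sketch of stratified sampling with an equal sample size per stratum. The function name stratified_sample and the nurse/doctor population are hypothetical. Doctors make up 10% of this population but half of the resulting sample, which illustrates the overrepresentation risk described above; proportional allocation would avoid it.

```python
import random


def stratified_sample(population, key, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size from each stratum (hypothetical helper)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)  # group members by stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # random draw within each stratum
    return sample


# Hypothetical population: 90 nurses and 10 doctors
population = [("nurse", i) for i in range(90)] + [("doctor", i) for i in range(10)]
sample = stratified_sample(population, key=lambda m: m[0], n_per_stratum=5)
print(sample)
```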
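What is cluster sampling?
The population is divided into naturally occurring groups (clusters), usually based on geography and proximity; some clusters are chosen and members are sampled only from those. E.g., sampling patients from a few local hospitals rather than from all hospitals.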
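A sketch of one-stage cluster sampling: whole clusters are picked at random and every member of each chosen cluster is kept. The hospital names and patient IDs are hypothetical; a two-stage design would instead subsample within the chosen clusters.

```python
import random

# Hypothetical population grouped into clusters (patients by hospital)
clusters = {
    "hospital_A": ["A1", "A2", "A3"],
    "hospital_B": ["B1", "B2"],
    "hospital_C": ["C1", "C2", "C3", "C4"],
    "hospital_D": ["D1", "D2", "D3"],
}

rng = random.Random(1)
chosen = rng.sample(sorted(clusters), k=2)  # randomly pick 2 whole clusters
sample = [patient for c in chosen for patient in clusters[c]]  # keep every member of each chosen cluster
print(chosen, sample)
```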
What is systematic sampling?
Selecting every kth member from an ordered list, starting from a random point within the first k. Especially useful in industrial settings (e.g., production lines) where order may matter.
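A minimal sketch of systematic sampling using a random start and Python slicing; the helper name systematic_sample and the production-line data are hypothetical:

```python
import random


def systematic_sample(population, k, seed=0):
    """Take every k-th member, starting at a random offset in [0, k) (hypothetical helper)."""
    start = random.Random(seed).randrange(k)  # random starting point within the first k
    return population[start::k]               # then every k-th item after it


# Hypothetical production line: 50 items in manufacturing order
items = list(range(50))
sample = systematic_sample(items, k=10)
print(sample)
```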
What is the difference between a parameter and a statistic?
Parameters are summaries of population data; statistics are summaries of sample data. In some fields (e.g., data science) the two terms are often used interchangeably.
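A sketch contrasting a parameter with a statistic, assuming a simulated population (normally distributed heights with mean 170 cm and SD 10 cm, entirely hypothetical). In real studies the population is rarely observable, so the sample mean is used to estimate the population mean, and a different sample would give a different estimate.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population of 10,000 heights (cm); in practice rarely fully observable
population = [rng.gauss(170, 10) for _ in range(10_000)]

mu = statistics.fmean(population)   # parameter: summarises the whole population
sample = rng.sample(population, k=50)
x_bar = statistics.fmean(sample)    # statistic: summarises one sample, used to estimate mu

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```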