Seminar 1 Flashcards

1
Q

Name three types of central tendency (averages)

A

Mean, Median, Mode
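
As a minimal sketch, the three averages can be computed with Python's standard `statistics` module. The `scores` list is invented purely for illustration.

```python
import statistics

# Hypothetical satisfaction scores (invented for illustration).
scores = [3, 4, 4, 5, 2, 4, 5]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value once sorted
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```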

2
Q

Name three ways to measure the spread of a data set

A

Standard Deviation, Range, Variance
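
A short sketch of the three spread measures, using invented data. It assumes the values are a whole population, so it uses `pvariance` and `pstdev`; for a sample, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.

```python
import statistics

# Hypothetical data set (invented for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)   # largest minus smallest value
variance = statistics.pvariance(values)  # mean squared deviation from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

print(data_range, variance, std_dev)
```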

3
Q

Which type of central tendency can you use for categorical variables?

A

The mode (the most frequent category)
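
For example, the mode of a categorical variable can be found by counting categories. The `eye_colours` list below is made up for illustration.

```python
from collections import Counter

# Hypothetical categorical observations (invented for illustration).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colours)
# most_common(1) returns a list holding the single most frequent (category, count) pair.
modal_category, modal_count = counts.most_common(1)[0]

print(modal_category, modal_count)
```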

4
Q

You are comparing average customer satisfaction between two products. Can you rely on descriptive statistics to generalise your findings?

A

No. Descriptive statistics only summarise the sample at hand, so on their own they can never be used to generalise findings to the wider population. This is particularly true when talking about continuous variables (i.e., real numbers).

5
Q

Name the three stages of data analytics

A

Descriptive analytics, Predictive analytics, Prescriptive analytics

6
Q

Describe Descriptive analytics.

A

Essentially descriptive statistics: analysing and summarising historical data (events that have already taken place).

7
Q

Describe Predictive analytics

A

Building mathematical, computational, and statistical models to make predictions using existing data. In most cases we look at making numerical (regression) or categorical (classification) predictions.

8
Q

Describe Prescriptive analytics

A

Building data-driven solutions to control or change the outcome of an event (e.g., helping customers build credit, or directing salespeople to the best targets for advertising merchandise).

9
Q

A team of medical researchers is testing a new drug on hypertensive patients. They hypothesise that the drug will alter systolic blood pressure. Write out the null and alternative hypotheses.

A

Null hypothesis (denoted H0): there is no difference in the means at the population level, i.e., no significant difference.

Alternative hypothesis (denoted H1): there is a difference in the means. Because the researchers expect the drug to "alter" blood pressure, in either direction, the alternative is two-sided.

H0: There is no effect of the drug on systolic BP

H1: There is an effect of the drug on systolic BP
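
One way such hypotheses might be tested is sketched below. The card does not specify the study design, so this assumes two independent groups (drug and placebo), uses invented blood-pressure changes, and swaps in a two-sided permutation test as one option among many. The function name `permutation_p_value` and the 0.05 threshold are illustrative choices, not part of the seminar material.

```python
import random

# Hypothetical change in systolic BP (mmHg) per patient; invented data.
drug = [-12, -9, -15, -7, -11, -10, -14, -8]
placebo = [-2, 1, -3, 0, -1, 2, -4, -1]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

p_value = permutation_p_value(drug, placebo)
reject_h0 = p_value < 0.05  # conventional threshold, assumed here
print(p_value, reject_h0)
```

Taking the absolute difference is what makes the test two-sided, matching an H1 that allows the drug to raise or lower blood pressure.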

10
Q

Can you generalise from sample means and use them to draw conclusions?

A

No. We cannot generalise using the means alone. Even if there is a large difference, we need to run some formal tests first.

11
Q

What is random sampling?

A

Sampling members at random from the population, so that every member has an equal chance of being selected. Simple, yet it risks overrepresenting certain groups and producing unequal group sizes.
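
A minimal sketch of simple random sampling, assuming a hypothetical population of 100 numbered members. The seed is fixed only to make the example reproducible.

```python
import random

# Hypothetical population of member IDs (invented for illustration).
population = list(range(1, 101))

rng = random.Random(42)
# Draw 10 distinct members; each has the same chance of selection.
random_sample = rng.sample(population, k=10)

print(random_sample)
```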

12
Q

What is stratified sampling?

A

Allows control of group sizes by sampling within predefined groups, or strata (e.g., sex, profession, etc.). Needs careful planning, as the chosen group sizes may overrepresent otherwise smaller groups and so bias the overall picture by overstating their importance.
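
The idea can be sketched as follows, assuming a hypothetical population of 80 nurses and 20 surgeons and an illustrative helper named `stratified_sample`.

```python
import random
from collections import defaultdict

# Hypothetical population: (profession, id) pairs, invented for illustration.
population = [("nurse", i) for i in range(80)] + [("surgeon", i) for i in range(20)]

def stratified_sample(units, key, per_stratum, seed=0):
    """Draw the same number of units at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

strata_sample = stratified_sample(population, key=lambda unit: unit[0], per_stratum=5)
print({name: len(members) for name, members in strata_sample.items()})
```

Equal allocation makes surgeons half of the sample although they are a fifth of the population, which is the overrepresentation risk noted above. Sampling in proportion to stratum size, or weighting the results, are common ways to address it.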

13
Q

What is clustered sampling?

A

Usually based on geography and proximity: whole groups (clusters) are selected rather than individual members. E.g., sampling from a local hospital, rather than from all hospitals.
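
A rough sketch, assuming four hypothetical hospitals as clusters: two hospitals are chosen at random and every patient in them is included.

```python
import random

# Hypothetical clusters: hospital name -> patient IDs (invented for illustration).
hospitals = {
    "Hospital A": ["a1", "a2", "a3"],
    "Hospital B": ["b1", "b2"],
    "Hospital C": ["c1", "c2", "c3", "c4"],
    "Hospital D": ["d1", "d2", "d3"],
}

rng = random.Random(1)
# Randomly pick whole clusters, then take every member of each chosen cluster.
chosen_clusters = rng.sample(sorted(hospitals), k=2)
cluster_sample = [patient for name in chosen_clusters for patient in hospitals[name]]

print(chosen_clusters, cluster_sample)
```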

14
Q

What is systematic sampling?

A

Taking every kth member. Especially useful in industrial domains where order may matter.
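
As an illustration, the sketch below takes every 10th item from a hypothetical production line of 50 items. The random starting offset and the helper name `systematic_sample` are assumptions added for the example.

```python
import random

def systematic_sample(items, k, seed=0):
    """Take every kth item, starting from a random offset below k."""
    start = random.Random(seed).randrange(k)
    return items[start::k]

# Hypothetical items in production order (invented for illustration).
production_line = list(range(1, 51))
systematic = systematic_sample(production_line, k=10)

print(systematic)
```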

15
Q

What is the difference between a parameter and a statistic?

A

Parameters are summaries of population data. Statistics are summaries of sample data. In many places they are used interchangeably (e.g., in data science).
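
To make the distinction concrete, here is a small sketch with a simulated population of heights. The normal distribution, its mean of 170 and its spread of 10 are arbitrary assumptions for illustration.

```python
import random
import statistics

rng = random.Random(7)

# Simulated population of 10,000 heights in cm (arbitrary, invented values).
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter_mean = statistics.fmean(population)  # parameter: summary of the population

# A sample of 50 drawn from that population.
sample = rng.sample(population, k=50)
statistic_mean = statistics.fmean(sample)      # statistic: summary of the sample

print(parameter_mean, statistic_mean)
```

The sample mean estimates the population mean and will usually differ from it slightly because of sampling variation.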

16
Q

Name two types of qualitative data.

A

Categorical, Ordinal

17
Q

Describe Categorical data

A

No particular order to them. E.g., countries, eye colours, sex, etc.

18
Q

Describe Ordinal data

A

An order has been applied. E.g., Likert-scaled questions from surveys, countries by population size, products by amount sold, etc.

19
Q

Name two types of quantitative data

A

Discrete, Continuous

20
Q

Describe Discrete data

A

Whole numbers (integers). E.g., number of emails sent.

21
Q

Describe Continuous data

A

Real numbers, such as temperature, weight, height, etc. There are an infinite number of real numbers even between two numbers such as 0 and 1.