Intro to data science (wk 2) Flashcards

1
Q

What is a population?

A

The complete set of objects of interest, e.g. all UG students

2
Q

What is a sample?

A

A subset of a given population, e.g. the students in this module out of all UG students

3
Q

What is a variable?

A

A variable is a set of related events (or a characteristic) that can take on more than one value

4
Q

What is an independent variable?

A

-The IV is the variable that is changed or manipulated.
-It is controlled or selected to determine its relationship with an observed outcome

5
Q

What is a dependent variable?

A

-The DV is the outcome observed when the IV is manipulated.
-It is something that may depend on the IV

6
Q

What are the levels of an IV?

A

-An IV can be composed of different categories
-These are called levels, conditions or treatments
-The number of levels is different from the number of IVs: a participant belongs to only one level of a given IV, but a study can have multiple IVs

7
Q

What is a control variable?

A

-Variables that are kept constant so that they do not influence the effect of the IV on the DV
-Critical for study design (e.g. recruitment criteria for participants)

8
Q

What is nominal data (categorical)?

A

-Categories with no inherent order: values cannot be ranked or used in arithmetic (only the frequency of each category can be counted) e.g. country, gender, occupation

9
Q

What is ordinal data?

A

-Can be ordered, but values cannot be meaningfully added or subtracted e.g. satisfaction rating, education level

10
Q

What is interval data?

A

-Can be ordered, and the difference between two values can be measured, but you cannot compute a ratio between two values (no meaningful zero exists) e.g. exam date

11
Q

What is ratio data?

A

-Has all the properties of interval data, and the ratio between two values is also meaningful (has a meaningful zero) e.g. distance, height, annual income

12
Q

Which 2 data types are qualitative and categorical?

A

Nominal and ordinal

13
Q

Which 2 data types are quantitative and continuous?

A

Interval and ratio

14
Q

What is a histogram?

A

A histogram visualises how data are distributed: values are grouped into bins, and the height of each bar shows how many values fall in that bin
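
As a rough illustration of what a histogram counts, the sketch below groups values into equal-width bins and prints a text bar for each bin. The function name `histogram_counts`, the bin width of 10 and the exam marks are all made up for this example; empty bins are simply skipped.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each equal-width bin.

    Bins are keyed by their left edge, e.g. 40 for the bin [40, 50).
    """
    # Floor-divide to find the bin, then scale back to get its left edge.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


marks = [42, 47, 51, 55, 58, 58, 61, 63, 67, 72, 88]  # made-up exam marks
counts = histogram_counts(marks, 10)

for left_edge, count in counts.items():
    # One '#' per value in the bin gives a sideways text histogram.
    print(f"{left_edge:>3} to {left_edge + 9:<3} {'#' * count}")
```

Changing the bin width changes how the same data looks, which is why the choice of bins matters when reading a histogram.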

15
Q

What are the 3 measures of central tendency?

A

Mode, median, mean

16
Q

What is the mode?

A

-The mode is the most frequently occurring value
-There can be multiple modes
-Can be used for all types of variables, and is often used for nominal and ordinal variables

17
Q

What is the median?

A

-The median is the middle value when the data are sorted
-It cannot be obtained for nominal variables; it can be obtained only for ordered variables (i.e. ordinal, interval and ratio)

18
Q

What is the mean?

A

-The mean is the sum of all values divided by the number of values
-It can only be defined for interval and ratio variables
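
A small sketch of the three measures using Python's standard `statistics` module. The heights and degree labels are invented data; the point is that the mean and median need numeric (or at least ordered) values, while the mode also works on nominal labels.

```python
import statistics

heights_cm = [160, 165, 165, 170, 172, 180, 195]    # ratio data (made up)
degrees = ["BSc", "BA", "BSc", "MSc", "BA", "BSc"]  # nominal data (made up)

mean_height = statistics.mean(heights_cm)      # sum of values / number of values
median_height = statistics.median(heights_cm)  # middle value of the sorted data
mode_height = statistics.mode(heights_cm)      # most frequent value

# The mode is the only one of the three that is defined for nominal data.
mode_degree = statistics.mode(degrees)

# multimode returns every value tied for most frequent.
all_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_height, median_height, mode_height, mode_degree, all_modes)
```

Note how the single tall value (195) pulls the mean above the median, while the median and mode are unaffected by it.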

19
Q

What is the 2nd moment?

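A

-The variance (the 2nd central moment), which measures how widely the data are spread around the mean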
20
Q

What is the 3rd moment (skewness)?

A

-Skewness is another parameter describing the shape of a distribution, in addition to the mean and variance
-Zero skewness means the data are symmetrically distributed
-High skewness (large in magnitude) means the distribution is highly asymmetrical
-The sign of the skewness indicates the direction in which the data are skewed: positive means a longer right tail, negative a longer left tail
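
A minimal sketch of how skewness can be computed, assuming the common population definition (3rd central moment divided by the 2nd central moment to the power 1.5). The helper names and data are made up, and other definitions (e.g. with sample-size corrections) give slightly different numbers.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Population skewness: 3rd central moment / variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]           # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]     # mirror image of the data above

print(skewness(symmetric))     # zero for symmetric data
print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```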

21
Q

What is the 4th moment?

A

-Kurtosis (the 4th standardised moment)
22
Q

What is kurtosis?

A

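-Kurtosis describes the sharpness of the peak of a distribution
-It is always positive by definition, but 3 (the kurtosis of a normal distribution) is normally subtracted from it; the result is called the excess kurtosis
-Higher kurtosis -> a thinner, sharper peak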
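
A sketch of kurtosis and excess kurtosis, assuming the population definition (4th central moment divided by the squared variance). The function names and the two tiny data sets are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def kurtosis(values):
    """Population kurtosis: 4th central moment / variance ** 2 (never negative)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores about 0."""
    return kurtosis(values) - 3


flat = [-1, 1, -1, 1]               # no peak at all: values sit at two extremes
peaked = [0, 0, 0, 0, 0, 0, -3, 3]  # most values at the centre, two far away

print(kurtosis(flat), excess_kurtosis(flat))      # 1.0 -2.0
print(kurtosis(peaked), excess_kurtosis(peaked))  # 4.0 1.0
```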

23
Q

What are the 2 popular methods to detect outliers?

A

1. Based on z-score -> a value is an outlier if its z-score, (value - mean) / standard deviation, is more than 3 or less than -3
2. Based on IQR (inter-quartile range) -> the IQR is the width between the 1st and 3rd quartiles. A value is an outlier if it is more than 1.5 IQR above the 3rd quartile, or more than 1.5 IQR below the 1st quartile.
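
Both rules can be sketched with the standard `statistics` module. Some assumptions here: the z-score uses the population standard deviation (`pstdev`), quartiles come from `statistics.quantiles` with its default method (other quartile conventions can give slightly different cut-offs), the lower IQR fence is measured from the 1st quartile, and the data are invented.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Values whose z-score is above +threshold or below -threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [v for v in values if abs((v - mean) / sd) > threshold]


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up data: 19 values close to 50 and one extreme value.
data = [48, 49, 50, 50, 51, 52, 49, 50, 51, 50,
        48, 52, 50, 49, 51, 50, 50, 49, 51, 120]

print(zscore_outliers(data))  # [120]
print(iqr_outliers(data))     # [120]
```

The z-score rule depends on the mean and standard deviation, which the outlier itself inflates, so on small data sets it can miss extreme values that the IQR rule still flags.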
24
Q

What is a box plot?

A

-A box plot is a plot summarising the quartile-based statistics of a data set
-A box plot includes:
1. Location of the quartiles
2. Range of the data excluding outliers
3. Outliers detected by the quartile-based (IQR) method
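
The numbers a box plot displays can be computed without drawing anything. The sketch below is one possible way to do it, assuming the 1.5 IQR rule for outliers and the default quartile method of `statistics.quantiles`; the function name `box_plot_stats` and the data are made up.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends (range without outliers) and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers extend to the most extreme values that are not outliers.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

Here the box spans Q1 to Q3, the whiskers run from 1 to 9, and 30 is drawn as a separate outlier point.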