lecture 4 - data types, collection, management and exploratory data analysis Flashcards

1
Q

numerical discrete data

A

values are separated by regular gaps, e.g. counts such as 0, 1, 2, 3

2
Q

numerical continuous data

A

numbers can take any value in a continuum

3
Q

categorical nominal data

A

categories are arbitrary labels with no natural order, so they can be reordered

4
Q

categorical ordinal data

A

the order of the categories is important

5
Q

what is the goal of exploratory data analysis

A

describe the data numerically and visualise it graphically

6
Q

what three features are used to describe data

A
  1. centre - where the bulk of the data lies
  2. spread - how consistent (or variable) the data is
  3. shape - whether the data is symmetrical or skewed
7
Q

what is the goal of summary statistics

A

convey as much information about the data as possible in as few numbers as possible

8
Q

median

A

midpoint of the sorted data: half of the values lie at or below it
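
As a small illustration of the midpoint idea, here is a sketch using Python's standard `statistics` module. The sample values are made up for illustration and are not from the lecture.

```python
import statistics

# Hypothetical samples (not from the lecture).
odd_sample = [7, 1, 5, 3, 9]   # sorted: 1 3 5 7 9
even_sample = [7, 1, 5, 3]     # sorted: 1 3 5 7

# With an odd number of values the median is the middle value.
median_odd = statistics.median(odd_sample)

# With an even number of values it is the mean of the two middle values.
median_even = statistics.median(even_sample)

print(median_odd, median_even)  # 5 4.0
```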

9
Q

what are the measurements of spread?

A
  1. percentiles, range and interquartile range
  2. variance and standard deviation
10
Q

pth percentile

A

the value such that p% of the data is less than or equal to it

11
Q

Q1

A

25th percentile

12
Q

Q2

A

50th percentile

13
Q

Q3

A

75th percentile

14
Q

what does a 5 number summary contain

A

min, Q1, median, Q3, max
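
The five numbers can be computed with Python's standard library, as in the minimal sketch below. The function name `five_number_summary` and the sample data are hypothetical, and the `"inclusive"` quartile method is an assumption: several quartile conventions exist and the lecture may use a different one.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    # method="inclusive" is one of several quartile conventions (an assumption).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [9, 1, 8, 2, 7, 3, 6, 4, 5]  # hypothetical sample
low, q1, med, q3, high = five_number_summary(data)

iqr = q3 - q1             # interquartile range
value_range = high - low  # range

print(low, q1, med, q3, high)  # 1 3.0 5.0 7.0 9
```

The range and interquartile range fall out of the same five numbers, which is why the summary is a compact description of both centre and spread.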

15
Q

variance

A

measure of how far the data is spread around the mean, based on squared deviations from the mean

16
Q

sample standard deviation

A

positive square root of the sample variance
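
To make the relationship between the two concrete, here is a minimal sketch that computes the sample variance by hand (dividing the sum of squared deviations by n - 1) and takes its positive square root. The function names and the data are illustrative assumptions; Python's `statistics.variance` and `statistics.stdev` compute the same quantities.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

var = sample_variance(data)  # 32 / 7, roughly 4.571
std = sample_std(data)

# The hand-written versions agree with the standard library.
assert math.isclose(var, statistics.variance(data))
assert math.isclose(std, statistics.stdev(data))
```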

17
Q

what are the 2 measures of shape of the data

A
  1. skewness - symmetry (or lack of symmetry) of the data
  2. kurtosis - peakedness or tailedness of the data
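
Both measures can be sketched from central moments, as below. This uses the simple moment-based definitions (third and fourth central moments scaled by powers of the second); other definitions exist, such as bias-adjusted or "excess" kurtosis, and the cards do not say which one the lecture uses. Function names and data are hypothetical.

```python
def central_moment(values, k):
    """Average of (x - mean)**k over the data."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (not the 'excess' version)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_skewed = [1, 1, 1, 2, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive: the tail is on the right
print(kurtosis(symmetric))
```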
18
Q

probability

A

the likelihood of an event occurring

19
Q

discrete probability distribution

A

used to model discrete data

20
Q

continuous probability distribution

A

used to model continuous data

21
Q

examples of discrete probability distribution

A

Bernoulli distribution, binomial distribution, negative binomial distribution
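
As an illustration of a discrete distribution, the sketch below evaluates the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), and treats the Bernoulli distribution as the special case n = 1. The cards do not give these formulas, so the function names and parameter values here are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Bernoulli is the binomial with a single trial (k is 0 or 1)."""
    return binomial_pmf(k, 1, p)

# Hypothetical example: 4 fair coin flips, probability of exactly 2 heads.
two_heads = binomial_pmf(2, 4, 0.5)  # 6 * 0.25 * 0.25 = 0.375

# The probabilities over all possible outcomes sum to 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```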

22
Q

examples of continuous distribution

A

normal distribution, gamma distribution

23
Q

why are normal distribution curves important

A
  1. a lot of random variables do follow a normal distribution
  2. a lot of random variables can be approximated as following a normal distribution
  3. the normal distribution has a special property that makes it very useful