Statistics - Basics Flashcards

1
Q

What is meant by a population?

A

Every member of a defined group of interest

2
Q

What is meant by a sample?

A

A selection of a smaller number of people from a population

3
Q

What is the criterion for sample selection?

A

A sample must be representative of a population

Then it is possible to infer properties of the population from properties of the sample

4
Q

What is meant by inference?

A

Extrapolating a result obtained from a sample to the population that the sample represents

5
Q

What are the population parameter and sample statistic?

A

A property of the population that is estimated is a population parameter

A value calculated from a sample is a sample statistic

6
Q

What is meant by a variable?

A

A characteristic that varies between individuals (here, patients)

e.g. age and sex of patients with a particular disease

7
Q

How is the data in a variable initially classified?

A

It is either categorical or numerical

8
Q

What are categorical variables?

A

Variables whose values can only be assigned to one of a number of distinct categories

e.g. sex, blood type, severity of symptoms (absent, mild, severe)

9
Q

How can categorical variables be further divided?

A

Nominal or ordinal

10
Q

What are nominal variables?

A

They are assigned a category with no natural ordering

e.g. sex

11
Q

What are ordinal variables?

A

Ordinal variables have categories that are ordered

e.g. pain - absent, mild or severe

12
Q

What are numerical variables?

A

They are variables that take a numerical value

e.g. age, number of siblings

13
Q

How can numerical variables be further divided?

A

Discrete or continuous

14
Q

What is a discrete variable?

A

A variable that only takes whole-number values

e.g. -1, 0, 1, 2, etc.
e.g. the number of hospital episodes a patient has experienced

15
Q

What is a continuous variable?

A

A variable that can take any value within a range, limited only by the precision of measurement

e.g. weight, age

16
Q

Why is age a continuous variable?

A

Age can in principle be measured to any precision (years, months, days and even seconds)

It only appears discrete because the age at last birthday, in whole years, is what is usually recorded

17
Q

What does a frequency table record?

A

The frequency distribution

This is a description of the manner in which values of a variable are scattered

18
Q

What is a frequency?

A

The number of times each value, or range of values, occurs in the data

19
Q

What is a relative frequency?

A

The frequency of a value or category expressed as a percentage (or proportion) of the total number of observations
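
As a minimal sketch of the two definitions above, the Python below counts frequencies and converts them to relative frequencies. The blood-type data and the variable names are made up for illustration only.

```python
from collections import Counter

# Hypothetical blood types recorded for ten patients (illustrative data only).
blood_types = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B"]

# Frequency: how many times each category occurs.
frequency = Counter(blood_types)

# Relative frequency: each count as a percentage of all observations.
total = len(blood_types)
relative_frequency = {
    category: 100 * count / total for category, count in frequency.items()
}

# Print a simple frequency table, one row per category.
for category in sorted(frequency):
    print(f"{category:>2}  {frequency[category]:>2}  {relative_frequency[category]:5.1f}%")
```

The relative frequencies across all categories add up to 100%.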

20
Q

What is the difference in how categorical and numerical variables are displayed in a frequency table?

A

Categorical - the frequency of each category is counted (e.g. blood type A)

Numeric - a range of values is used for each row of the table (e.g. age 0-5)
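
One possible way to build the numerical rows is sketched below in Python. The ages, the class width of 5 and the convention that each class includes its lower bound but not its upper bound are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical ages in years (illustrative data only).
ages = [2, 4, 7, 11, 3, 9, 14, 5, 0, 8]

# Assumed class width; each class covers lower <= age < lower + width.
class_width = 5


def class_bounds(value, width):
    """Return the (lower, upper) bounds of the class that contains value."""
    lower = (value // width) * width
    return (lower, lower + width)


# Count how many ages fall into each class.
age_table = Counter(class_bounds(age, class_width) for age in ages)

for lower, upper in sorted(age_table):
    print(f"{lower:>2} to <{upper:<2}  {age_table[(lower, upper)]}")
```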

21
Q

What is a simple method for displaying frequency data?

When is this method appropriate?

A

Bar chart

Appropriate for categorical and discrete metric variables

22
Q

How are the bars drawn on a bar chart?

A

A bar is drawn for each category with length proportional to frequency in that category

The bars are separated by small gaps to indicate that the variable is categorical or discrete

23
Q

What do the x and y axes usually show on a bar chart?

A

x - category (e.g. blood type)

y - frequency

24
Q

When are pie charts used to display data?

How do they do this?

A

They are used for categorical variables

The pie is split into sections with the area of each section proportional to the frequency of the corresponding category

25
Q

What type of data are histograms used to present?

A

The frequency distribution of continuous variables

26
Q

How must the data be arranged before a histogram can be drawn?

A

The variable’s range is split into classes

The classes are used to group the values of the variable so that a frequency table can be composed

27
Q

How are the bars drawn in a histogram?

A

There are no gaps between the bars

The area of the bar is proportional to the frequency

28
Q

What is the height of a histogram bar proportional to?

A

The frequency divided by the class width
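
A short Python sketch of this relationship follows. The class boundaries and frequencies are invented for illustration; the point is that height times width gives back the frequency, so bar area is proportional to frequency even when class widths differ.

```python
# Hypothetical frequency table with unequal class widths (illustrative only).
# Each entry is (lower bound, upper bound, frequency).
classes = [(0, 5, 10), (5, 10, 20), (10, 20, 20)]

# Bar height = frequency / class width.
heights = [freq / (upper - lower) for lower, upper, freq in classes]

# Bar area = height * class width, which recovers the frequency.
areas = [
    height * (upper - lower)
    for height, (lower, upper, _freq) in zip(heights, classes)
]

for (lower, upper, freq), height in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {height}")
```

Here the last two classes have the same frequency, but the wider class gets a bar half as tall.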

29
Q

How is the mean of a sample calculated?

A

Summing all the values and dividing by the total number of values

30
Q

What is the median?

A

The value below which half the distribution lies

31
Q

How is the median calculated in a sample with an odd or even number of values?

A

Odd number of values - the median is the middle value of the sorted sample

Even number of values - the median is the value midway between the middle 2 values of the sorted sample
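
The mean and median rules described in these cards can be sketched in Python as follows. The function names are illustrative; the standard library's statistics module offers equivalent mean and median functions.

```python
def mean(values):
    """Sum all the values and divide by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted sample, or midway between the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean([1, 2, 3, 4]))
print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```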

32
Q

What is the range of a variable?

A

The range of a variable is defined by the smallest and largest value in the sample

33
Q

Rather than using the range, what is preferred in larger samples?

A

Other measures of spread, showing the spread for the majority of individuals

34
Q

What does the standard deviation measure and what type of data is it used with?

A

It is the most widely used measure of spread

It is only appropriate for metric data

35
Q

What can standard deviation be loosely interpreted as?

A

As a measure of the average distance of all the data values from the mean

36
Q

What does a large value for standard deviation suggest?

A

The larger the value, the further away the values are collectively from the mean

The data values are more spread out

37
Q

How is standard deviation calculated?

A
  1. Calculate the mean
  2. Subtract the mean from every value
  3. Square each of these differences and add the squares together to get the ‘sum of squares’
  4. Divide the ‘sum of squares’ by the ‘total number of values in the sample minus 1’; this is the variance
  5. Take the square root of the variance
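
The calculation above can be sketched in Python as follows. The function name and the example data are illustrative assumptions; the standard library's statistics.stdev computes the same sample standard deviation.

```python
import math


def sample_standard_deviation(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    # Calculate the mean.
    sample_mean = sum(values) / n
    # Subtract the mean from every value.
    deviations = [value - sample_mean for value in values]
    # Square each difference and add them up (sum of squares).
    sum_of_squares = sum(d * d for d in deviations)
    # Divide by n - 1 to get the variance.
    variance = sum_of_squares / (n - 1)
    # The standard deviation is the square root of the variance.
    return math.sqrt(variance)


# Hypothetical sample (illustrative data only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_standard_deviation(data))
```

The function needs at least two values, since the divisor n - 1 would otherwise be zero.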
38
Q

When is the interquartile range used?

A

It is a measure of spread that can be used with the median

39
Q

How are the lower quartile (Q1) and the upper quartile (Q3) defined?

A

Q1 - the value for which 25% of individuals have values that are less

Q3 - the value for which 75% of individuals have values that are less

40
Q

How are the quartile values determined?

A
  1. use median to divide ordered data into 2 halves
  2. when calculating Q1 and Q3, do NOT include the median value
  3. Q1 is the median of the lower half of the data, and Q3 is the median of the upper half of the data
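
A minimal Python sketch of this median-of-halves method is given below. The function names are illustrative. Other quartile conventions exist, so statistical software may return slightly different values for the same data.

```python
def median(values):
    """Middle value of the sorted data, or midway between the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) by splitting the ordered data at the median."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd number of values, skip the median itself.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


print(quartiles([1, 2, 3, 4, 5, 6, 7]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```

The sketch assumes at least two values, so that both halves are non-empty.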
41
Q

What is the interquartile range (IQR)?

A

The difference between Q3 and Q1 (Q3 - Q1)

The middle half of the distribution lies between these 2 values

42
Q

For what types of data can the IQR be used?

A

It is used as a measure of spread for continuous and discrete metric variables

It can also be used for ordered variables (ordinal)

43
Q

What data is shown on a box plot?

A

The central line of the box is the median

The upper limit of the box is Q3

The lower limit of the box is Q1

The lines extending out of the box (the whiskers) show the range

44
Q

How can an outlier be identified from a box plot?

A

It is a data point plotted outside the range shown by the whiskers