Statistics Flashcards

1
Q

What are the two main types of data

A

quantitative

qualitative

2
Q

What is ordinal data

A

the data can be given a meaningful order

3
Q

What is nominal data

A

the categories have no meaningful order, i.e. they are just names, e.g. Atkins diet and paleo diet

4
Q

What is binomial data

A

there are only two possible values, e.g. yes or no (also called binary or dichotomous data)

5
Q

What is a random sample

A

one in which each member of the population has an equal, non-zero chance of being included

6
Q

what is a stratified sample

A

one in which certain categories of the population must be represented in proportion to their size, e.g. if we know the library is 50 percent history books, 30 percent science and 20 percent other, then in a sample of 20 we must select 10 history books, 6 science and 4 other.
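
The library example can be checked with a short calculation. This is only a sketch: the function name `proportional_allocation` is made up for illustration, and it assumes each share multiplies out to a whole number of books, as it does here.

```python
def proportional_allocation(strata_shares, sample_size):
    """Return how many items to draw from each stratum.

    strata_shares maps a stratum name to its share of the population (0 to 1).
    Assumes share * sample_size is a whole number, as in the library example.
    """
    return {name: round(share * sample_size) for name, share in strata_shares.items()}


# Shares taken from the card: 50% history, 30% science, 20% other.
library = {"history": 0.5, "science": 0.3, "other": 0.2}
allocation = proportional_allocation(library, 20)
print(allocation)
```

Within each stratum the books would still be picked at random; the allocation only fixes how many come from each category.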

7
Q

what is a convenience sample

A

one that is not chosen randomly but is all that is available, e.g. all patients at an outpatient dermatology clinic

8
Q

when would you use a bar or pie chart

A

categorical data

9
Q

when would you use histograms, stem and leaf plots and box and whisker plots

A

to visualise continuous data

10
Q

what does a scatter plot show

A

the relationship between two variables and how one changes in relation to the other

11
Q

when would you use the mean and when would you use the median to describe the centrality of data

A

mean - normally distributed data that is not skewed
median - data that is skewed or has significant outliers
mode - qualitative data
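
A small example makes the difference concrete. The data below are hypothetical lengths of hospital stay, invented for illustration, with one extreme value at the end.

```python
import statistics

# Hypothetical lengths of stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 40]

mean_stay = statistics.mean(stays)      # pulled upwards by the outlier
median_stay = statistics.median(stays)  # depends only on the middle values
mode_stay = statistics.mode(stays)      # most frequent value

print(mean_stay, median_stay, mode_stay)
```

The mean (9.5 days) is larger than five of the six observations, while the median (3.5 days) still describes a typical stay.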

12
Q

what do you do differently when calculating the sample variance/sd as opposed to the population

A

use n-1 as the denominator instead of n
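
Python's standard `statistics` module exposes both denominators, which gives a quick way to see the difference. The five measurements are made up for illustration.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical measurements, mean 6

population_variance = statistics.pvariance(data)  # sum of squared deviations / n
sample_variance = statistics.variance(data)       # sum of squared deviations / (n - 1)

print(population_variance, sample_variance)
```

The squared deviations sum to 10, so the population formula gives 10 / 5 = 2 and the sample formula gives 10 / 4 = 2.5.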

13
Q

what does the standard deviation show

A

the spread of the data

14
Q

what does positively skewed mean

A

that more of the values are clustered towards the bottom of the scale, with a long tail towards the higher values - such as alcohol intake

15
Q

what is negatively skewed

A

most of the values are clustered at the higher range of the scale - rare in clinical data

16
Q

what is the coefficient of skewness

A

a value which shows how skewed the data is - the closer to 0, the more symmetrical the data
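
Several definitions of the coefficient exist and the card does not say which one is meant. The sketch below assumes the moment coefficient (mean cubed deviation divided by the cube of the SD); the function name and both data sets are invented for illustration.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / SD cubed."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (denominator n)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]  # most values low, one long upper tail

print(skewness(symmetric))     # zero for perfectly symmetric data
print(skewness(right_tailed))  # positive for a positively skewed sample
```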

17
Q

what does a value of 0 for the kurtosis mean

A

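indicates that the shape of the data is close to the normal distribution (this assumes excess kurtosis, where the normal distribution scores 0; on the raw scale the normal distribution scores 3)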
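
As a rough check, excess kurtosis can be computed as the fourth standardised moment minus 3. This is a sketch under that assumed definition; the helper name, the seed and both data sets are made up for illustration.

```python
import random


def excess_kurtosis(values):
    """Fourth standardised moment minus 3, so a normal curve scores about 0."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3


random.seed(1)  # fixed seed so the simulated sample is reproducible
normal_like = [random.gauss(0, 1) for _ in range(20000)]
flat = list(range(1, 101))  # evenly spread values, flatter than a normal curve

print(excess_kurtosis(normal_like))  # close to 0
print(excess_kurtosis(flat))         # clearly negative
```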

18
Q

what is inference

A

making predictions about a population based on the data collected from a smaller sample or series of smaller samples

19
Q

what are the characteristics of a normal distribution

A
continuous
symmetrical
bell shaped curve
mean, median and mode are equal
single central peak
values between -infinity and +infinity
20
Q

what is the binomial distribution

A

models the number of successes in a fixed number of independent trials with binary outcomes, e.g. dead/alive, male/female

21
Q

what is the poisson distribution

A

for events which occur at random intervals of time or space e.g. deaths per year.
rare events
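
For reference, the Poisson probability of seeing exactly k events when the average rate is lambda is exp(-lambda) * lambda^k / k!. The sketch below evaluates it; the rate of 2 events per year is a made-up figure, not one from the card.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the mean number of events is rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


rate = 2.0  # hypothetical average of 2 events per year
probabilities = [poisson_pmf(k, rate) for k in range(6)]

for k, prob in enumerate(probabilities):
    print(k, round(prob, 4))
```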

22
Q

what are the mean and SD of a standard normal distribution

A

mean = 0
sd = 1
we write Z ~ N(0, 1)

23
Q

where would you expect 95 percent of values to lie in normally distributed data

A

mean +/- 1.96 x SD
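
The 1.96 multiplier can be verified with `statistics.NormalDist` from the Python standard library. The mean of 120 and SD of 10 in the second half are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Proportion of a normal distribution lying within 1.96 SDs of the mean.
coverage = z.cdf(1.96) - z.cdf(-1.96)

# Hypothetical measurement with mean 120 and SD 10.
mean, sd = 120.0, 10.0
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

print(round(coverage, 4), lower, upper)
```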

24
Q

how can you assess the normality of data

A

Informal review of properties of normal distribution
Inspection of a normal plot
Shapiro-Wilk test

25
Name ways in which you can transform data to make it plausibly normal and when you would use each one
Logarithmic - fairly skewed data in which the variances are proportional to the mean
Square root - counts
Reciprocal - highly skewed data
Cube - volumes
Logit - proportions
26
what is the variance of the expected number of events
n x p x (1 - p)
27
when can the binomial distribution be approximated to normal
if np > 5 and n(1 - p) > 5
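
The binomial variance and the rule of thumb for the normal approximation from the two cards above can be checked together. This is a minimal sketch; the function name `binomial_summary` and the example values of n and p are made up for illustration.

```python
def binomial_summary(n, p):
    """Return the mean, the variance and whether the normal approximation rule holds."""
    mean = n * p
    variance = n * p * (1 - p)
    normal_ok = n * p > 5 and n * (1 - p) > 5  # rule of thumb from the card
    return mean, variance, normal_ok


print(binomial_summary(100, 0.5))  # large n, p near 0.5: approximation reasonable
print(binomial_summary(20, 0.1))   # np = 2, so the rule of thumb fails
```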
28
what is the standard error
the standard deviation of the sampling distribution of the mean, estimated as SD / sqrt(n)
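
In practice the standard error of the mean is estimated by dividing the sample SD by the square root of the sample size. The sample below is hypothetical.

```python
import math
import statistics

sample = [4, 8, 6, 5, 7]  # hypothetical measurements

sample_sd = statistics.stdev(sample)                # uses the n - 1 denominator
standard_error = sample_sd / math.sqrt(len(sample))

print(round(sample_sd, 3), round(standard_error, 3))
```

Quadrupling the sample size halves the standard error, because n sits under a square root.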
29
When can you make inferences about sample means based on the normal distribution
1. the sample is selected from a normal population with known SD, or the sample size is large
2. the observations in the sample are independent
30
when should the hypothesis be defined
before data is collected
31
what is a type 1 error
rejecting a true null hypothesis
32
what is a type 2 error
accepting (failing to reject) a false null hypothesis
33
What does the level of significance of a test mean
the probability of making a type one error
34
what is the generally accepted risk of making a type 2 error
20 percent
35
if your significance level is 5 percent what is your confidence level
95 percent
36
When is Student's t distribution used
When the population standard deviation is not known - for normally distributed data
37
What are the degrees of freedom in the t distribution
one less than the sample size
38
What is the difference between an independent and a dependent sample
independent = different people
dependent = same people
however, if samples from two different groups are matched, e.g. for age and gender, the samples could then be viewed as dependent
39
What steps can be taken to compare the means of two samples whose variances are not comparable
1. investigate the relationship between the means and variances
2. use Welch's modified t test
3. do non parametric tests
4. do not proceed with the test of the means
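
Welch's modified t test can be sketched with the standard library alone. The version below computes the t statistic and the Welch-Satterthwaite degrees of freedom; the function name and both samples are made up for illustration, and turning the result into a p-value would still need t tables or a statistics package.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, without pooling the variances.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


group_a = [1, 2, 3, 4, 5]  # hypothetical measurements
group_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

When the two variances and sample sizes are equal, as in this toy data, the degrees of freedom reduce to n_a + n_b - 2; with unequal variances they fall below that.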
40
what does it mean if the F statistic is not significant
the variances of the two samples are comparable
41
when can you use the normal approximation for a binomial trial
if both np and n(1-p) are greater than 5
42
what is regression
provides information about the nature of the relationship between variables, e.g. linear
43
what is correlation
assesses the extent of the association between two variables
44
when is a logistic regression used
when the outcome (dependent) variable is categorical, typically binary
45
how do we measure the linear relationship between two variables
correlation coefficient
46
what is the most commonly used measure of correlation
Pearson's product moment correlation coefficient (r)
47
What are the main points to remember about r
the statistical significance of a given r increases with sample size
at least one variable should be normally distributed
the sample should be random
the pairs of observations are independent
a correlation can be statistically significant but not clinically significant
48
what is r squared
measure of the proportion of the variation in the dependent variable which is attributable to its linear relationship with the independent variable
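
Pearson's r and r squared can be computed directly from the sums of squares. This is a minimal sketch; the function name and the five (x, y) pairs are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable

r = pearson_r(xs, ys)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

Here r squared is 0.6, i.e. 60 percent of the variation in y is attributable to its linear relationship with x.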
49
what assumptions are made when using regression methods
the correlation between x and y is significant
for each value of the x variable, the values of the y variable have a normal distribution
the variances of these normal distributions are equal
50
up to what sample size can the Shapiro-Wilk test provide a test for normality
up to 2000
51
what do the results of the Shapiro-Wilk test mean
the closer the test statistic is to 1, the more normal the data is
52
what is a cohort study
a group of disease free subjects are followed up over time
53
what is a case control study
retrospective study of people with a disease. compares factors they have been exposed to with those of controls
54
advantages and disadvantages of a cohort study
less likely to be biased | expensive, not suitable for rare diseases
55
what are the advantages/disadvantages of a case control study
cheap and easy to do | could be biased
56
what factors influence the sample size needed in a study
significance level
power of the test
size of effect to be identified
standard deviation of the measurements - the greater the SD, the greater the sample size needed
study design
practical issues
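
To show how these factors interact, the sketch below uses a common textbook approximation for comparing two means with equal group sizes: n per group = 2 x ((z for significance + z for power) x SD / effect)^2. The formula choice, the function name and the example figures are assumptions for illustration, not something stated on the card.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Hypothetical study: detect a difference of 5 units when the SD is 10.
n_small_sd = sample_size_per_group(sd=10, effect=5)
n_large_sd = sample_size_per_group(sd=20, effect=5)
print(n_small_sd, n_large_sd)
```

Doubling the SD roughly quadruples the required sample size, and asking for a smaller significance level or higher power also pushes it up.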
57
when are parametric tests used
for normally distributed data
58
when are non parametric tests used and name examples
for skewed i.e. not normally distributed data | e.g. chi squared, Wilcoxon, sign
59
what is another name for non parametric tests
distribution free
60
what is the disadvantage of non parametric techniques
they are less powerful than parametric techniques, so every effort should first be made to transform the data to approximate a normal distribution
61
what do non parametric techniques use as a representative of centre
the median
62
when should the large sample Wilcoxon statistic, which approximates the normal distribution, be used
when n > 25