Statistics Flashcards by Jennifer Steen

What are the two main types of data

quantitative

qualitative

What is ordinal data

the data can be given a meaningful order

What is nominal data

the categories have no meaningful order, i.e. they are just names, e.g. Atkins diet and paleo diet

What is binomial data

there are only two options e.g. yes or no

What is a random sample

one in which each member of the population has an equal, non-zero chance of being included

what is a stratified sample

one in which certain categories of the population must be represented, e.g. if we know the library is 50 percent history books, 30 percent science and 20 percent others, then in a sample of 20 we must select 10 history books, 6 science and 4 others.
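
The library example above can be reproduced with a short sketch. The function name `allocate` is made up for illustration, and simple rounding is an assumption: with other proportions the rounded counts may not add up to the sample size and would need adjusting.

```python
# Proportional allocation for a stratified sample (figures from the card above).
proportions = {"history": 0.50, "science": 0.30, "other": 0.20}
sample_size = 20


def allocate(proportions, sample_size):
    """Return how many items to draw from each stratum (hypothetical helper)."""
    return {name: round(share * sample_size) for name, share in proportions.items()}


allocation = allocate(proportions, sample_size)
print(allocation)
```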

what is a convenience sample

one that is not chosen randomly but is all that is available, e.g. all patients at an outpatient dermatology clinic

when would you use a bar or pie chart

categorical data

when would you use histograms, stem and leaf plots and box and whisker plots

to visualise continuous data

what does a scatter plot show

the relationship between two variables and how one changes in relation to the other

when would you use the mean and when would you use the median to describe the centrality of data

mean - normally distributed data that is not skewed
median - if the data is skewed or has significant outliers
mode - used for qualitative data
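
A small sketch of why the median is preferred when there is an outlier. The numbers are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

values = [2, 3, 3, 4, 5]          # roughly symmetric, made-up data
with_outlier = values + [60]      # same data plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly towards the outlier; the median barely moves.
print(mean_before, median_before)
print(mean_after, median_after)

# The mode works for qualitative (categorical) data.
diet_mode = statistics.mode(["atkins", "paleo", "atkins"])
print(diet_mode)
```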

what do you do differently when calculating the sample variance/sd as opposed to the population

use n-1 as the denominator instead of n
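
As a quick illustration with invented data, Python's standard `statistics` module exposes both versions: `pvariance` divides by n and `variance` divides by n-1.

```python
import statistics

data = [4, 8, 6, 2]  # made-up values; mean is 5, squared deviations sum to 20

population_variance = statistics.pvariance(data)  # divides by n = 4
sample_variance = statistics.variance(data)       # divides by n - 1 = 3

population_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

print(population_variance, sample_variance)
print(population_sd, sample_sd)
```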

what does the standard deviation show

the spread of the data

what does positively skewed mean

that more of the values are clustered towards the bottom of the scale, with a tail towards higher values - such as alcohol intake

what is negatively skewed

most of the values are clustered at the higher range of the scale - rare in clinical data

what is the coefficient of skewness

a value which shows how skewed the data is - the closer to 0, the more symmetrical the data
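
The card does not say which coefficient of skewness is meant. The sketch below assumes the moment coefficient (third central moment divided by the second central moment to the power 1.5), which is one common choice; the data are invented.

```python
def skewness(values):
    """Moment coefficient of skewness (an assumed definition, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
positively_skewed = [1, 1, 2, 2, 3, 10]   # long tail to the right
negatively_skewed = [-x for x in positively_skewed]

print(skewness(symmetric))
print(skewness(positively_skewed))
print(skewness(negatively_skewed))
```

A value near 0 suggests symmetry, a positive value a tail towards high values and a negative value a tail towards low values.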

what does a value of 0 for the kurtosis mean

indicates that the shape of the data is close to the normal distribution (this refers to excess kurtosis; the unadjusted kurtosis of a normal distribution is 3)

what is inference

making predictions about a population based on the data collected from a smaller sample or series of smaller samples

what are the characteristics of a normal distribution

continuous
symmetrical
bell shaped curve
mean, median and mode are equal
single central peak
values between -infinity and +infinity

what is the binomial distribution

for binary data e.g. dead/alive, male/female

what is the poisson distribution

for events which occur at random intervals of time or space e.g. deaths per year.
rare events

what are the mean and sd of a standard normal distribution

mean = 0
sd = 1
we write Z ~ N(0, 1)

where would you expect 95 percent of values to lie in normally distributed data

mean +/- 1.96 x SD
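
A sketch of where the 1.96 comes from, using `statistics.NormalDist` from the Python standard library. The mean of 120 and SD of 10 are invented values for illustration.

```python
from statistics import NormalDist

# 1.96 is (approximately) the 97.5th percentile of the standard normal Z ~ N(0, 1).
z = NormalDist().inv_cdf(0.975)

mean, sd = 120.0, 10.0  # hypothetical values
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd

# Proportion of a N(mean, sd) distribution lying between the two limits.
dist = NormalDist(mean, sd)
coverage = dist.cdf(upper) - dist.cdf(lower)

print(z)
print(lower, upper)
print(coverage)
```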

how can you assess the normality of data

Informal review of properties of normal distribution
Inspection of a normal plot
Shapiro-Wilk test

Name ways in which you can transform data to make it plausibly normal and when you would use each one

Logarithmic - fairly skewed data in which the variances are proportional to the mean
Square root - counts
Reciprocal - highly skewed data
Cube - volumes
Logit - proportions
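
Some of these transformations can be sketched in a few lines. The dictionary `transforms` and the sample data are invented for illustration, and only a subset of the list above is shown.

```python
import math


def logit(p):
    """Log-odds of a proportion p, where 0 < p < 1."""
    return math.log(p / (1 - p))


transforms = {
    "logarithmic": math.log,          # requires positive values
    "square root": math.sqrt,         # requires non-negative values
    "reciprocal": lambda x: 1 / x,    # requires non-zero values
    "logit": logit,                   # requires proportions strictly between 0 and 1
}

skewed = [1, 2, 4, 8, 16, 32]  # made-up, strongly right-skewed values
logged = [transforms["logarithmic"](x) for x in skewed]

# After the log transform the values are evenly spaced.
print(logged)
print(transforms["logit"](0.5))
```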

what is the variance of expected number of events

n x p x (1 - p), i.e. np(1 - p)
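
A worked example for a binomial variable, with n = 20 trials and p = 0.3 chosen purely for illustration:

```python
n = 20    # number of trials (hypothetical)
p = 0.3   # probability of the event on each trial (hypothetical)

expected_events = n * p            # mean of the binomial distribution
variance = n * p * (1 - p)         # variance of the number of events
standard_deviation = variance ** 0.5

print(expected_events, variance, standard_deviation)
```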

when can the binomial distribution be approximated to normal

if np > 5 and n(1-p) >5
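
The rule of thumb can be written as a small check. The function name `normal_approximation_ok` is made up, and the threshold of 5 is the one given on the card.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Return True if both np and n(1 - p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


print(normal_approximation_ok(20, 0.3))    # np = 6, n(1 - p) = 14
print(normal_approximation_ok(10, 0.3))    # np = 3, too small
print(normal_approximation_ok(100, 0.04))  # np = 4, too small despite large n
```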

what is the standard error

the standard deviation of the sample mean, calculated as SD / sqrt(n)
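
A minimal sketch of the calculation, dividing the sample SD by the square root of the sample size. The data are invented.

```python
import math
import statistics

sample = [4, 8, 6, 2]  # made-up observations

sample_sd = statistics.stdev(sample)                   # uses n - 1 in the denominator
standard_error = sample_sd / math.sqrt(len(sample))    # SD / sqrt(n)

print(sample_sd, standard_error)
```

The standard error shrinks as the sample size grows, whereas the SD of the data does not.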

When can you make inferences about sample means based on the normal distribution

1. the sample is selected from a normal population with known SD, or the sample size is large
2. the observations in the sample are independent

when should the hypothesis be defined

before data is collected

what is a type 1 error

rejecting a true null hypothesis

what is a type 2 error

accepting (failing to reject) a false null hypothesis

What does the level of significance of a test mean

the probability of making a type one error

what is the generally accepted risk of making a type 2 error

20 percent

if your significance level is 5 percent what is your confidence level

95 percent

When is Student's t distribution used

When the population standard deviation is not known - for normally distributed data

What are the degrees of freedom in the t distribution

one less than the sample size

What is the difference between an independent and a dependent sample

independent = different people
dependent = same people
however, if samples from two different groups are matched, e.g. for age and gender, the samples could then be viewed as dependent

What are the steps that can be done to compare the means of two samples with incomparable sample variances

1. investigate the relationship between the means and variances
2. use Welch's modified t test
3. do non parametric tests
4. do not proceed with the test of the means
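
Step 2 can be sketched as follows. This computes the Welch t statistic and the Welch-Satterthwaite approximate degrees of freedom from the standard formulas; the function name `welch_t` and the data are invented, and no p-value is computed because that needs the t distribution, which is not in the Python standard library.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    a = statistics.variance(sample_a) / n_a  # s_a^2 / n_a
    b = statistics.variance(sample_b) / n_b  # s_b^2 / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))
    return t, df


# Hypothetical groups with clearly different spreads.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [4.0, 6.5, 3.2, 7.1, 5.9, 2.8]

t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```

Unlike the pooled t test, the two variances are not combined into a single estimate, which is why the degrees of freedom are usually not a whole number.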

what does it mean if the F statistic is not significant

the variances of the two samples are comparable

when can you use the normal approximation for a binomial trial

if both np and n(1-p) are greater than 5

what is regression

provides information about the nature of the relationship, e.g. linear

what is correlation

assesses the extent of the association between two variables

when is a logistic regression used

when the outcome (dependent) variable is categorical, typically binary

how do we measure the linear relationship between two variables

correlation coefficient

what is the most commonly used measure of correlation

Pearson's product moment correlation coefficient (r)

What are the three main points to remember about r

the statistical significance of a given r value increases with sample size
at least one variable should be normally distributed, the sample should be random and the pairs of observations should be independent
correlation can be mathematically significant but not clinically significant

what is r squared

measure of the proportion of the variation in the dependent variable which is attributable to its linear relationship with the independent variable
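
A sketch of Pearson's r and r squared computed from their definitions, using invented data. The function name `pearson_r` is made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # independent variable (made-up values)
y = [2, 4, 5, 4, 5]   # dependent variable (made-up values)

r = pearson_r(x, y)
r_squared = r ** 2    # proportion of variation in y explained by the linear fit

print(r, r_squared)
```

Here Sxy = 6, Sxx = 10 and Syy = 6, so r squared is 36 / 60 = 0.6: about 60 percent of the variation in y is attributable to its linear relationship with x.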

what assumptions are made when using regression methods

the correlation between x and y is significant
for each value of the x variable, the values of the y variable have a normal distribution
the variances of these normal distributions are equal

up to what sample size can the Shapiro-Wilk test provide a test for normality

up to 2000

what do the results of the Shapiro-Wilk test mean

the closer the test statistic is to 1, the more normal the data is

what is a cohort study

a group of disease free subjects are followed up over time

what is a case control study

retrospective study of people with a disease. compares factors they have been exposed to with those of controls

advantages and disadvantages of a cohort study

less likely to be biased | expensive, not suitable for rare diseases

what are the advantages/disadvantages of a case control study

cheap and easy to do | could be biased

what factors influence the sample size needed in a study

significance level
power of the test
size of effect to be identified
standard deviation of the measurements - the greater the SD, the greater the sample size needed
study design
practical issues

when are parametric tests used

for normally distributed data

when are non parametric tests used and name examples

for skewed, i.e. not normally distributed, data | e.g. chi squared, Wilcoxon, sign test

what is another name for non parametric tests

distribution free

what is the disadvantage of non parametric techniques

they are less powerful than parametric techniques; as such, every effort should be made to transform the data to approximate a normal distribution first

what do non parametric techniques use as a representative of centre

the median

when should a large sample Wilcoxon statistic which approximates the normal be used

when n>25

(62 cards)