Statistics Flashcards
What is sampling?
randomly taking a small set of data from a large population to be investigated
What are the two aspects of sampling?
randomisation and size
What are the two types of samples?
Independent or dependent
How to decide a population size?
depends on the research aim
What information is needed to estimate sample size?
the clinically meaningful difference expected in practice
standard deviation from previous/pilot studies
How to decide sample size?
power of 80% to detect the difference in means (typical in medical research)
0.05 significance level
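A minimal sketch of how these inputs combine, assuming the common normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_power) * sd / difference)^2. The cards do not name a formula, so the formula, the function name and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)  # round up to a whole participant

# Hypothetical pilot data: standard deviation 10, clinical difference 5
n_per_group = sample_size_per_group(sd=10, difference=5)
print(n_per_group)  # 63
```

A smaller clinical difference or a larger standard deviation pushes the required sample size up.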
Ideal sample size
larger than 10 and ideal is 30
What is a double blind experiment?
neither the participants nor the researchers assessing them know who is receiving which treatment
How do we know if data is normal distribution?
skewness coefficient- if it is more than about twice its standard error, it's not a normal distribution
P-P plot- if the dots aren't around the line, it's not a normal distribution
Kolmogorov-Smirnov test- if the significance is below 0.05, it's not a normal distribution
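A rough sketch of the skewness check, assuming the rule of thumb is "skewness larger than twice its standard error". It uses the adjusted sample skewness and the commonly quoted standard error formula; the function name and data are made up for illustration.

```python
import math
import statistics

def skewness_check(data):
    """Return (skewness, standard error of skewness) for a sample."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # one large value pulls the tail right

for sample in (symmetric, right_skewed):
    skew, se = skewness_check(sample)
    looks_normal = abs(skew) <= 2 * se
    print(round(skew, 2), round(se, 2), looks_normal)
```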
What is the null hypothesis?
G1 and G2 aren't statistically different
If the p value from the test is <0.05 (or <0.01 for a stricter level) then reject the null hypothesis
When do you accept the null hypothesis?
p>0.05 (strictly speaking, you fail to reject it)
what is the t-test?
parametric test based on the t distribution, which is similar in shape to the normal distribution
used to compare two groups of data
what is an applied situation for a t-test
to compare the means of two groups of data, including with a small sample size; the data should be normally distributed
When is ANOVA used?
when more than two means are compared with one another; if there are just two, the t-test is used
what is degree of freedom?
n-1
what is variance?
a measure of how far the data points are spread from the mean
standard deviation squared
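A quick check of the "standard deviation squared" relationship, using Python's standard `statistics` module. The data are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
var = statistics.variance(data)  # sample variance (divides by n - 1)

print(sd ** 2, var)  # the two values agree up to rounding
```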
what is an independent sample t-test?
compares the means of two samples from different populations on the same variable
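A minimal sketch of the independent-samples t statistic, assuming equal variances in the two groups (the pooled-variance version). The function name and data are illustrative only; the p value would then be looked up from the t distribution with the returned degrees of freedom.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return (t, degrees of freedom) using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se, df

t_value, df = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t_value, df)  # 4.0 8
```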
what is paired t-test?
two sample means from the same population measuring the same variable at two different times e.g. pre and post test
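A minimal sketch of the paired t statistic: work on the difference within each pair, then test whether the mean difference is far from zero. The function name and the pre/post scores are made up.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1

pre = [10, 12, 9, 11]
post = [12, 15, 10, 14]
t_value, df = paired_t(pre, post)
print(round(t_value, 2), df)  # 4.7 3
```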
what are the two variances?
variance within groups shows differences amongst samples in the same group, and variance between groups shows differences between the groups
what is the mean square (mean of squared differences) between groups and within groups?
between groups= based on differences between each group mean and the total mean
within groups= based on differences between samples and their own group mean
what is the F value?
ANOVA uses the F value to get the p value and test whether the group means differ significantly
F value = MS between groups / MS within groups
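A minimal sketch of the F calculation for a one-way ANOVA, following the between/within definitions above. The function name and data are illustrative; the p value would be read from the F distribution with df1 and df2.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df1, df2) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between groups: group means against the total mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: each sample against its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, n - k
    ms_between = ss_between / df1
    ms_within = ss_within / df2
    return ms_between / ms_within, df1, df2

f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 2), df1, df2)  # 13.0 2 6
```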
What are the two tests before/after ANOVA?
before- contrast
after- post-hoc
how do we describe non-numeric data?
nominal and ordinal data
frequencies, percentages, distribution, charts- pie, bar, histogram, tables etc.
what is the chi-square test?
also called the Pearson chi-square test
compares the observed and expected frequencies in each category to test whether all categories contain similar proportions of values
is there a difference between measured and expected values?
Is Chi squared a parametric or non-parametric test?
Non-parametric
What is the expected frequency in a Chi-square test?
either all are equal or theoretical, and the theoretical frequencies depend on your expectation
What is the goodness of fit test?
compares the observed and expected frequencies in each category
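A minimal sketch of the chi-square statistic for a goodness of fit test: the sum of (observed - expected)^2 / expected over the categories. The function name and counts are made up; when no expected counts are given it assumes all categories are equally likely.

```python
def chi_square_statistic(observed, expected=None):
    """Return the chi-square value for observed vs expected counts."""
    if expected is None:
        # Default expectation: every category has the same frequency
        total = sum(observed)
        expected = [total / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 30, 50]
chi2 = chi_square_statistic(observed)
df = len(observed) - 1
print(round(chi2, 2), df)  # 14.0 2
```

The significance is then read from the chi-square distribution with the given degrees of freedom.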
What is the Asymp. Sig.?
the asymptotic significance, i.e. the p value that corresponds to the calculated chi-square value
if the Asymp. Sig. is <0.05 the result is significant
Which SPSS function is used for the Chi-square test?
Crosstabs function
what are the applications and limitations of the chi-square test?
deals with nominal/ordinal data between two or more variables
could be used to compare numeric data once grouped into categories, but quantitative information could be lost
applied condition: almost no limitation in application
can the t-test be applied to data that are not normally distributed?
no, it cannot
how to describe normal distribution vs non-normal distribution?
normal- mean and standard deviation
non-normal- median and quartiles
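A small illustration of why skewed data are described with the median and quartiles: one extreme value drags the mean but barely moves the median. The data are made up.

```python
import statistics

data = [1, 2, 2, 3, 3, 4, 20]  # 20 is an outlier

mean = statistics.mean(data)
sd = statistics.stdev(data)
median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles

print(mean, round(sd, 2))  # mean is pulled up to 5 by the outlier
print(median, q1, q3)      # 3 2.0 4.0
```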
what are the two conditions for a parametric test?
the data must be numeric and normally distributed
what are the criteria used in non-parametric tests?
the number of signs
the total of ranks in groups
what does the sign test show?
shows which sample is larger rather than the difference between the two
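A minimal sketch of the sign test for paired data: count the plus and minus signs, ignore ties, and get a two-sided p value from the binomial distribution with probability 0.5. The function name and data are illustrative.

```python
from math import comb

def sign_test(before, after):
    """Return (number of + signs, number of - signs, two-sided p value)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop ties
    plus = sum(d > 0 for d in diffs)
    minus = len(diffs) - plus
    n, k = len(diffs), min(plus, minus)
    # Probability of a split at least this uneven if + and - are equally likely
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return plus, minus, p

before = [5, 6, 7, 8, 5, 6, 7, 8]
after = [6, 7, 8, 9, 6, 7, 8, 7]
plus, minus, p_value = sign_test(before, after)
print(plus, minus, round(p_value, 4))  # 7 1 0.0703
```

Only the direction of each change is used, not its size.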
What is the Wilcoxon signed ranks test for dependent data?
similar to the paired t-test but for data that are not normally distributed; compares the medians of two related groups
a non-parametric test
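A rough sketch of the signed rank sums: rank the absolute differences (ties share the average rank), then add up the ranks of the positive and negative differences separately. The function name and data are made up.

```python
def signed_rank_sums(before, after):
    """Return (sum of ranks for + differences, sum of ranks for - differences)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        # Average rank for tied absolute differences (ranks start at 1)
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2

    w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus

w_plus, w_minus = signed_rank_sums([10, 12, 9, 11, 14], [12, 15, 10, 10, 18])
print(w_plus, w_minus)  # 13.5 1.5
```

The smaller of the two sums is usually the value compared against the critical table.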
what is the Mann-Whitney test?
a non-parametric test that compares two independent groups; it uses ranked data and does not require any particular data distribution
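A minimal sketch of the Mann-Whitney U calculation: pool both groups, rank every value (ties share the average rank), sum the ranks of the first group, then convert to U. Function names and data are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (rank sum of group1, U statistic)."""
    ranks = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return r1, min(u1, u2)

rank_sum, u_value = mann_whitney_u([1, 2, 3], [4, 5, 6, 7])
print(rank_sum, u_value)  # 6.0 0.0
```

A U of 0 means the two groups do not overlap at all.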
talk about the limitations and applications of non-parametric tests
chi-square, Mann-Whitney and Wilcoxon are the three non-parametric tests
used for nominal/ordinal data and to compare two/multiple variables
can compare numeric data with non-normal distribution
almost no limitations in applications
what is the difference between the Mann-Whitney and Wilcoxon tests?
both are non-parametric tests and involve the summation of ranks
Mann-Whitney- independent samples
Wilcoxon- matched/dependent samples
what plot can be used to measure whether there is a trend between two variables?
scatter/dot plot
what is a correlation coefficient and when is it accepted?
used to describe whether two variables have a linear relationship and how strong that relationship is
also called Pearson's coefficient
accepted if the significance level is <0.05
what is the range of values for the correlation coefficient?
-1 to 1
values closer to 1 indicate a positive correlation and values closer to -1 indicate a negative correlation
a correlation of 0 indicates that there is no linear correlation, and the best-fit line is horizontal
the closer the value is to 1 or -1, the more strongly the two variables are correlated
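A minimal sketch of Pearson's correlation coefficient computed from the sums of squares and cross-products. The function name and data are made up.

```python
import math

def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
print(r_up, r_down)  # 1.0 -1.0
```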
when is correlation coefficient significant?
when the p value is <0.05 (*)
when the p value is <0.01 (**)
describe what the R value <0 and >0 means
<0 - one value decreases while another increases
>0- both values increase
what is the intercept?
the point at which the line cuts through the y axis
what is linear regression?
linear regression assumes that there is a linear relationship between two variables
independent variable - x
dependent variable - y
b1 - intercept
b2 - slope
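A minimal sketch of fitting the line y = b1 + b2 * x by least squares, keeping the card's naming (b1 for the intercept, b2 for the slope). The function name and data are illustrative.

```python
def fit_line(x, y):
    """Return (b1, b2): intercept and slope of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b2 = sxy / sxx               # slope
    b1 = mean_y - b2 * mean_x    # intercept: the line passes through the means
    return b1, b2

b1, b2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b1, b2)  # 1.0 2.0

predicted = b1 + b2 * 5  # predicted y for a new x of 5
```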
what is standard error of the estimate?
a measure of how far the actual values typically fall from the values predicted by the regression line
what are residuals?
the actual values minus the predicted values, i.e. the errors produced by the model; the larger they are, the worse the model
the residual sum of squares is the sum of squared differences between the actual and predicted values
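A small worked example of residuals and their sum of squares. The line y = 0 + 2x is a hypothetical model chosen for easy arithmetic, not a fitted result, and the data are made up.

```python
x = [1, 2, 3, 4]
actual = [2, 4, 5, 8]

b1, b2 = 0, 2  # hypothetical intercept and slope
predicted = [b1 + b2 * xi for xi in x]

# Residual = actual value minus predicted value
residuals = [a - p for a, p in zip(actual, predicted)]
sum_of_squares = sum(r ** 2 for r in residuals)

print(residuals)       # [0, 0, -1, 0]
print(sum_of_squares)  # 1
```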
what are the signs for regression and correlation coefficients?
regression coefficient- b1 and b2
correlation coefficient- R
can linear regression be used for non-linear stuff?
if the variables aren't linearly related then they can be transformed to a suitable form and then linear regression can be used
what are censored cases?
cases whose outcome cannot be determined when collecting data, for reasons outside the factor being studied
what data is needed for survival analysis?
status- whether information stopped because of the event or because the case was censored
time analysed
factor studied
describe survival analysis based on data quality and time period
data quality- the more samples there are, the better the results; if the number of samples isn't enough then don't do the survival analysis
time period
what is meta-analysis?
a method to use multi source data to analyse what is favoured by most studies
provides the whole picture on an arguable issue from multiple sources
mainly looks at the mean and effect size?
considers sample size as the weight in the analysis?
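A rough sketch of what "sample size as weight" could look like: each study's effect size is weighted by its number of participants. This is an assumption about what the card means; weighting by inverse variance is also common in practice. The study values are invented.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Return the mean effect size weighted by each study's sample size."""
    total_weight = sum(sample_sizes)
    return sum(e * n for e, n in zip(effects, sample_sizes)) / total_weight

effects = [0.2, 0.5, 0.8]     # hypothetical effect sizes from three studies
sample_sizes = [100, 50, 50]  # hypothetical study sizes

pooled = weighted_mean_effect(effects, sample_sizes)
print(round(pooled, 3))  # 0.425
```

The largest study pulls the pooled estimate towards its own effect size, which is why the result sits below the simple average of 0.5.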
what is the odds ratio and how do you calculate it?
the odds ratio can be used as an estimate of risk when the occurrence of the factor is rare
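A minimal sketch of the calculation from a 2x2 table, assuming the usual layout: a = exposed with the outcome, b = exposed without, c = unexposed with, d = unexposed without. Function names and counts are made up.

```python
def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))

common = odds_ratio(20, 80, 10, 90)
print(round(common, 2))  # 2.25

# When the outcome is rare, the odds ratio is close to the relative risk
rare_or = odds_ratio(2, 998, 1, 999)
rare_rr = relative_risk(2, 998, 1, 999)
print(round(rare_or, 3), round(rare_rr, 3))  # 2.002 2.0
```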
in spss which function will give you odds ratio?
RISK function
when do you make a forest plot?
meta-analysis
define regression
construct an equation to describe the relationship between two or more variables
what does the residual tell us about the model?
the lower the residuals, the more accurate the predictions and the better the model
degree of freedom 1 vs 2?
df1 - number of groups - 1
df2- number of samples- number of groups
what is the homogeneity of variance test?
to test for the equality of group variances
not dependent on the assumption of normality
if the significance is >0.05 the variances are similar, otherwise they are not