Statistics Flashcards
What are the definitions of x̄ and SD? What does each index tell us?
the mean is the average: the sum of all sample values / number of samples
- it describes where the data is centred
SD gives the typical distance from the samples to the centre value
- it describes how spread out the data is
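A minimal sketch of both indexes in Python, assuming made-up sample values. The function names are illustrative only and the SD uses the n - 1 denominator.

```python
import math

def mean(values):
    """Sample mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

samples = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values
x_bar = mean(samples)
sd = sample_sd(samples)
print(x_bar, round(sd, 3))
```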
What is
a) data frequency?
b) data distribution?
c) what relationship exists between the two terms?
a) how often each data value occurs (no. of times a sample value occurs)
b) the shape formed by the frequencies across all the data values
c) frequency constructs the distribution eg in a histogram
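A small sketch of how frequencies build up a distribution, assuming made-up scores. It counts each value and prints a rough text histogram, which is only an illustration and not SPSS output.

```python
from collections import Counter

scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]  # made-up sample values
frequency = Counter(scores)  # value -> number of times it occurs

# One row per value, lowest to highest; the row lengths show the shape.
for value in sorted(frequency):
    print(value, "#" * frequency[value])
```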
If a huge no of samples are collected from nature
a) what is the distribution?
b) what shape is it?
c) what value is at the peak?
a) normal distribution
b) bell shaped
c) the mean
In ND, what % of the data lies within
a) x̄ ± 1SD
b) x̄ ± 2SD
c) x̄ ± 3SD?
a) 68.27%
b) 95.45%
c) 99.73%
roughly 68, 95, 99.7
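The percentages can be checked with the error function from Python's math module. This is a sketch; `share_within` is an illustrative name.

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {share_within(k) * 100:.2f}%")
```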
What types of file can be directly imported into SPSS?
- excel
- txt
- direct input data in data view
What are the 3 main types of data?
- numerical
- nominal: categories without rank eg gender
- ordinal: categories with rank eg satisfaction
In SPSS what data characteristics can be shown using the histogram?
- frequency
- distribution
In SPSS what can users do with the
a) variable view?
b) data view?
a) define variables: define name starting with a letter, define type of data eg string/numeric, define width (how many positions), define no. of decimals …
b) Edit, calculate and analyse data
How do we calculate the median?
median is the middle sample value
- reorder the values from smallest to biggest and pick the middle one
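A short sketch of the same steps, assuming the usual convention that an even number of values takes the average of the two middle ones (the card does not cover that case).

```python
def median(values):
    """Middle value after sorting; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # odd count
print(median([4, 1, 3, 2]))  # even count
```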
how do we work out the mean?
the total of the numbers divided by how many numbers there are
what are the maximum and minimum for sample data?
highest and lowest values
show us data range
In SPSS what can users do with the crosstab function?
show 2 variables in one table
run a chi-square test to test a hypothesis
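To picture what a crosstab holds, here is a rough sketch that counts pairs of categories. The survey rows are hypothetical and this is plain Python, not the SPSS procedure.

```python
from collections import Counter

# Hypothetical survey rows: (gender, satisfaction)
rows = [("F", "high"), ("M", "low"), ("F", "high"),
        ("M", "high"), ("F", "low"), ("M", "low")]

table = Counter(rows)  # (gender, satisfaction) -> count
genders = sorted({g for g, _ in rows})
levels = sorted({s for _, s in rows})

# One line per gender, one count per satisfaction level.
print("   ", levels)
for g in genders:
    print(g, " ", [table[(g, s)] for s in levels])
```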
In an error bar what do the
a) circles
b) dashes
represent?
a) mean
b) SD, SEM or 95% CI (whichever is chosen)
what characteristics from two variables can be shown using the scatter/dot?
tendency of the data or the relationship between variables: certain pattern or trend
using SPSS what file types can be exported as output?
- direct copy to word
- export as excel, word, powerpoint, txt, graph only
if two sets of sample data have different means are their global means significantly different? why?
depends on significance
- if not significant, the difference comes from your sampling, not from the population
- if significant then global means are different
in test of hypothesis, what are primary/null hypothesis, H0 and alternative hypothesis, H1 and H2?
- H0 means that there is no significant difference
- H1/H2 show significant difference
- depending on group means
- H1 = M1>M2 so G1>G2
- H2 = M1<M2 so G1<G2
in test of hypothesis what significance levels are normally used?
low p<0.05
high p>0.05
(0.01 would be used to emphasise a strongly significant difference)
when comparing two sets of sample data under what conditions are two groups of data considered to be significantly different?
p<0.05
What do a single asterisk and double asterisk represent in terms of statistics?
* = p<0.05
** = p<0.01
under what condition is a primary hypothesis accepted?
p>0.05: accept H0 (strictly, we fail to reject it)
what main indexes will influence the results in test of hypothesis?
4 values
- x̄ ,mean
- SD , standard deviation
- n , no of samples
- p , probability
in which situation can t test be applied?
numeric data
normal distribution data
two group
small sample size
to examine if two means are significantly different
if a group of subjects are measured twice in a time interval, eg pre and post treatments, are the measured variables independent or dependent? what statistical method can be used to compare the means?
- dependent data
- paired t-test
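A minimal sketch of the paired t statistic, worked from the differences within each subject. The pre/post values are made up and `paired_t` is an illustrative name; the p value would still come from a t table or software.

```python
import math

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    sem_diff = sd_diff / math.sqrt(n)
    return mean_diff / sem_diff, n - 1

pre = [10, 12, 9, 11, 13]    # made-up pre-treatment values
post = [12, 15, 10, 12, 17]  # same subjects after treatment
t_paired, df_paired = paired_t(pre, post)
print(round(t_paired, 3), df_paired)
```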
if a group of subjects are treated in different conditions ie. each patient gets a different type of hip replacement, are the measured variables independent or dependent? what statistical methods can be used to compare?
- independent
- independent sample t test
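A sketch of the independent sample t statistic, assuming equal variances in the two groups (the pooled form). The data and the name `independent_t` are made up for illustration.

```python
import math

def independent_t(group1, group2):
    """Return (t, degrees of freedom) using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

implant_a = [5, 6, 7, 8, 9]  # made-up scores, different patients per group
implant_b = [1, 2, 3, 4, 5]
t_ind, df_ind = independent_t(implant_a, implant_b)
print(round(t_ind, 3), df_ind)
```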
in what situations are the paired sample t test and the independent sample t test usually applied?
- paired sample if dependent data
- independent sample t test if independent data
can t test be applied if data is not continuous?
no
a t test cannot be applied to non-numerical data; even ranked data is not continuous
What is the standard error of mean?
a sample mean deviates from the actual mean of the population; the standard error of the mean measures the typical size of this deviation
How do we calculate standard error of mean?
SEM = SD/ √ n
What confidence interval does x̄ ± 2SEM give?
95% confidence interval
How do SD and the number of samples influence SEM?
SEM = SD/ √n so if SD inc the SEM increases, if n increases SEM decreases
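A quick numeric check of both effects, using made-up SD and n values. `sem` is just an illustrative helper.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

print(sem(10, 25))   # baseline
print(sem(20, 25))   # SD doubled, so SEM doubles
print(sem(10, 100))  # n four times bigger, so SEM halves
```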
What does ANOVA stand for? What situation is it suitable?
Analysis of variance
numeric data comparing multi-group
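ANOVA compares the variation between group means with the variation inside the groups, giving an F value. A rough sketch of that calculation with made-up groups follows. The function name is illustrative and the p value would still come from an F table or software.

```python
def one_way_anova_f(groups):
    """Return (F, df between groups, df within groups)."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up data, three groups
f_value, df_b, df_w = one_way_anova_f(groups)
print(round(f_value, 3), df_b, df_w)
```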
If p<0.05 is found from ANOVA result, does it mean that all groups of data are significantly different from each other?
no, it means there is a significant difference somewhere among the groups, but not that every pair of groups is significantly different
need a post-hoc test to know which pairs of groups have a significant difference
what does post hoc mean?
after ANOVA, if p<0.05 we run post-hoc tests on all groups to see which specific pairs are significantly different
if data is ordinal what method can test the hypothesis?
chi-square
non-parametric test (wilcoxon or mann whitney)
In chi square analysis what are the theoretical values used to compare with practical data?
all categories equal in percentage and sample size so that they all have same chance
eg 50% for two, 33.3% for 3
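A sketch of the chi-square value when the theoretical counts are equal across categories. The observed counts are made up and the function name is illustrative.

```python
def chi_square_equal_expected(observed):
    """Return (chi-square, df) against equal expected counts."""
    total = sum(observed)
    k = len(observed)
    expected = total / k  # every category gets the same share
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1

observed = [30, 20, 10]  # made-up counts for 3 categories (33.3% each expected)
chi2, df_chi = chi_square_equal_expected(observed)
print(chi2, df_chi)
```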
in using chi square analysis, assuming that multiple groups of data are significantly different, does this mean that any two of them will be significantly different?
not necessarily
one p value for multiple groups does not mean there is a significant difference between any two particular groups
we need to run post-hoc to check
What is the difference between nominal and ordinal data?
ordinal is a category with a rank eg satisfaction whereas nominal is just a category like gender
when comparing means for two+ groups of data what situations should users apply non-parametric methods?
non-numeric information so ordinal or nominal data
numeric but not ND
how do we use statistical equations and textbook tables, eg t-table or z-table, to assess if two groups are significantly different?
we have an equation and we use it to calculate a value
- eg t test: calculate the t value from our own data
- use the t table from the textbook to find the t critical value and compare the two values
- the t critical value defines a range
- our t value will fall in p<0.05 or p>0.05
- for p<0.05 our t value (ignoring its sign) needs to be more than t critical (which is usually about 2)
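The comparison step can be sketched with a few rows of a two-tailed t table at the 0.05 level. The dictionary below is only a short excerpt of standard table values and `is_significant` is an illustrative name.

```python
# Excerpt of a two-tailed t table at the 0.05 level: df -> t critical
T_CRITICAL_05 = {4: 2.776, 8: 2.306, 10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000}

def is_significant(t_value, df):
    """True if |t| exceeds the tabulated critical value, ie p<0.05."""
    return abs(t_value) > T_CRITICAL_05[df]

print(is_significant(4.0, 8))    # |t| above 2.306
print(is_significant(-1.5, 10))  # |t| below 2.228
```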
what are the main differences between parametric and non-parametric tests?
parametric tests use parameters to analyse numeric data and calculate probability
non-parametric tests directly use non-numeric info to compare groups
in non-parametric test methods, what kinds of information are used to assess differences between groups of data?
what types of data are used?
- the number of signs
- the total of ranks
data is ordinal and rank (scale if not ND)
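A rough sketch of the "total of ranks" idea for two independent groups, as used in rank-based tests such as Mann-Whitney. The data is made up, tied values share the average of their ranks, and the function only produces the rank totals, not the full test.

```python
def rank_sums(group1, group2):
    """Rank all values together, then total the ranks in each group."""
    combined = sorted(group1 + group2)
    rank_of = {}
    for value in set(combined):
        # Positions are 1-based; tied values share the mean position.
        positions = [i + 1 for i, v in enumerate(combined) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    total1 = sum(rank_of[v] for v in group1)
    total2 = sum(rank_of[v] for v in group2)
    return total1, total2

ward_a = [1, 2, 2, 3]  # made-up satisfaction ratings
ward_b = [3, 4, 5, 5]
r1, r2 = rank_sums(ward_a, ward_b)
print(r1, r2)
```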
when a scatter/dot graph shows that two variables have a certain association can we say that they are linearly associated?
no, this only shows the trend; we then need to work out the correlation coefficient and use the significance level to confirm the trend
if two variables have a linear correlation, what significance value (p) would be expected after the test of hypothesis?
p<0.05
what range should the correlation coefficient be kept within?
-1 to 1
- max 1
- min -1
is it possible that a correlation coefficient can be negative?
yes it is possible
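A minimal sketch of Pearson's correlation coefficient on made-up data, showing one positive and one negative result. `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numeric data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # y rises with x
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises
print(r_up, r_down)
```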
to obtain a linear regression equation, what coefficients will be calculated or estimated?
y = b1 + b2x
- b1 is the intercept
- b2 is the slope
what is the definition of residuals in linear regression?
the actual values of dependent item minus its predicted values
ie. the errors produced by the model
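A sketch of a least-squares fit of y = b1 + b2x and its residuals, assuming made-up data. `fit_line` is an illustrative name, not an SPSS function.

```python
def fit_line(xs, ys):
    """Least-squares estimates: returns (b1 intercept, b2 slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b2 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b1 = my - b2 * mx
    return b1, b2

xs = [1, 2, 3, 4]   # made-up independent variable
ys = [3, 5, 7, 10]  # made-up dependent variable
b1, b2 = fit_line(xs, ys)

# Residual = actual y minus the y predicted by the fitted line.
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]
print(b1, b2, [round(e, 2) for e in residuals])
```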
can the method of linear regression be extended to non-linear situations?
yes
transform the non-linear variable so that the relationship becomes linear (eg use x² or log x as the variable)
why does sampling procedure have to be randomised?
so that data is representative of the population
what is the censored case? what method is used to analyse the case?
when collecting data some of the cases cannot be collected for some reason not related to the factor studied
- analyse using survival analysis (big sample size, long time data)
what is the main idea in meta analysis?
uses multi-source data eg. local research centre, publication
to analyse what is favoured by most studies
What is statistics?
taking a population with huge amounts of data and sampling this to get important information from a smaller amount of data
what is the sample mean formula?
x̄ = ( Σ xi ) / n
What does xi mean?
xi is the i-th x value; Σ xi adds up all of the x values
What is the equation for standard deviation?
SD = √( Σ ( xi – x̄ )² / ( n – 1 ) )
In what type of sample is it hard to have ND?
small samples
In which type of distribution do we use SEM?
normal
When distribution isn’t normal what average do we use?
median
Which type of distribution is a boxplot used for?
not normal
What do we plot normal distribution on?
error bar
can we analyse the mean, median and SD for ordinal data?
no
so need non-parametric tests eg chi-square
If we have dependent, non-numerical data which hypothesis test should be used?
wilcoxon signed-ranks test
if we have independent, non-numeric data which test should we use?
Mann-Whitney
In linear regression equation what is
a) dependent variable?
b) independent variable?
c) intercept?
d) slope?
which is the constant and which is the coefficient
a) y
b) x
c) b1 (constant)
d) b2 (coefficient)
in a linear regression equation, what coefficients will be calculated or estimated?
the coefficient in front of the variable and a constant
When we have a 95% CI what are the upper and lower limits of the mean?
x̄ +/- 2SEM
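A sketch of the limits on made-up data. The multiplier 2 follows the card and is a rounding of 1.96; for small samples a t table value would be larger. The function name is illustrative.

```python
import math

def approx_95_ci(values):
    """Return (lower, upper) limits as mean -/+ 2 SEM."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    sem = sd / math.sqrt(n)
    return m - 2 * sem, m + 2 * sem

lower, upper = approx_95_ci([2, 4, 4, 4, 5, 5, 7, 9])  # made-up sample
print(round(lower, 3), round(upper, 3))
```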
What is the degree of freedom?
n-1
What type of distribution is needed for ANOVA?
normal
Which test can be used to hypothesis test percentage data?
chi square
What is R?
When it is used, what does the data need to be?
pearson’s correlation coefficient
continuous
Rewrite the linear regression equation in words: what does each value stand for?
y= b1 + b2x
dependent variable = constant + coefficient (independent variable)