Statistics Flashcards
definition of statistics?
the study of how to extract the most important, concise information about a huge set of data (the population) by analysing a small set of data (a sample)
definition of population/global data?
a huge set of data to be investigated, or an experimental set of data with a specific condition
definition of a sample(s)?
a small set of data from the population/global data
definition of sampling?
randomly taking a set of samples from population data
why should sampling always be randomised?
randomised sampling avoids selection bias, so the sample is likely to be representative of the bigger population
what key indexes are used to represent a set of data?
1. Mean
2. Standard deviation (SD)
what is the definition of mean?
the mean is an average:
the sum of the sample values / total number of samples
what is the meaning of mean in terms of data distribution?
the mean gives information concerning where the data is CONCENTRATED (the centre of the data)
what is the definition of standard deviation?
the SD indicates how much the values of a data set vary from the mean value on average
i.e. roughly the average distance from the samples to the centre value (mean); strictly, the square root of the average squared distance
what is the meaning of standard deviation in terms of data distribution?
the standard deviation gives information concerning how the data is SEPARATED
what is the equation for standard deviation?
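$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where x_i are the sample values, x̄ is the mean and n is the number of samples (dividing by n instead of n - 1 gives the SD of a whole population)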
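A minimal sketch of the mean and SD calculations in Python, using the n - 1 (sample) form that statistics packages usually report. The function names and the example numbers are made up for illustration.

```python
import math

def mean(values):
    """Mean: sum of the sample values / total number of samples."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared distances from the mean / (n - 1))."""
    m = mean(values)
    squared_distances = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / (len(values) - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
print(mean(scores))       # 5.0
print(sample_sd(scores))  # sqrt(32 / 7), about 2.14
```

Dividing by len(values) instead of len(values) - 1 would give the population SD (2.0 for this data).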
what does ‘frequency of data’ mean in stats?
the number of times the same (or similar) value occurs in the data
what does ‘distribution of data’ mean in stats?
the shape constructed by data frequencies
what relationship exists between the frequency of data and distribution of data?
data frequencies plotted together will establish a pattern - the distribution of data
In SPSS, what can users do with the Variable View interface?
define a variable: by name, type and labelling.
In SPSS, what can users do with the Data View interface?
edit data: e.g. copy, paste, delete.
when defining a variable in SPSS, what ‘type’ of data is preferred and why?
Numeric data - it can be used in calculations and can be changed into another form
if a huge number of samples are collected from variables with natural characteristics, the data may form a famous distribution.
What is that?
What shape does it look like?
At the peak position of this distribution, what key value is expected?
A normal distribution curve.
Bell-shaped.
Peak position - the mean.
In Normal Distribution, what proportion of the data falls within the approximate range of:
mean +/- 1 standard deviation?
mean +/- 2 standard deviations?
mean +/- 3 standard deviations?
mean +/- 1 SD: 68%
mean +/- 2 SD: 95%
mean +/- 3 SD: 99.7%
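The percentages above can be checked with a short Python sketch: for a normal distribution, the proportion within mean +/- k SD equals erf(k / sqrt(2)). The function name is hypothetical.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    # For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {proportion_within(k):.1%}")
```

This prints 68.3%, 95.4% and 99.7%. The "95%" for 2 SD is a rounding; exactly 95% of the data lies within about 1.96 SD.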
In stats there are 3 main data types. What are they?
- Numeric data
  - e.g. body mass, age, score
- Nominal data
  - categories without rank
  - e.g. gender, colour, weekday
- Ordinal data
  - categories with rank
  - e.g. feeling, satisfaction, visual analogue scale
What types of files can be imported into SPSS directly?
txt files
Excel files
(data can also be entered manually)
What are the most important indexes to report when describing normally distributed data?
- Mean
- Standard Deviation
- Standard error of mean
- 95% confidence interval: mean +/- 2SEM
- Max and Min values
What is the definition of Standard Error of Mean (SEM)?
shows how precisely a sample mean estimates the “true mean” of the population: different sample groups give several different means. If “mean” is used as a variable and the distribution of these means is plotted, it is still a normal distribution.
SEM is the standard deviation of these means.
SEM = SD/square root of number of samples
What is a confidence interval?
How do you use SEM to estimate the confidence interval range of mean?
the confidence interval is the range within which the global (population) mean is likely to fall; its width is set by the SEM.
because plotting means as a variable produces a normal distribution curve, a 95% confidence interval is roughly:
mean +/- 2SEM
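A minimal Python sketch of SEM and the approximate 95% confidence interval (mean +/- 2 SEM), using the standard library `statistics` module. The function names and data are made up for illustration; for small samples a t-value slightly larger than 2 gives a more exact interval.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: SD / square root of the number of samples."""
    return statistics.stdev(values) / math.sqrt(len(values))

def approx_95_ci(values):
    """Approximate 95% confidence interval for the global mean: mean +/- 2 SEM."""
    m = statistics.mean(values)
    margin = 2 * sem(values)
    return m - margin, m + margin

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
low, high = approx_95_ci(values)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```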
How do standard deviation and the number of samples influence SEM?
since SEM = SD/square root of number of samples,
if SD increases, SEM also increases
If number of samples increases, SEM decreases
What are the most important indexes to report when describing data whose distribution isn’t normal?
- Median
- Quartiles
- Frequency or percentage
What is the definition of median?
reorder the data from smallest to largest; the median is the value of the middle sample (with an even number of samples, the mean of the two middle values)
What is the difference between mean and median?
Median comes from sample values directly.
Mean isn’t a sample value. It is the central value, but may not be equal to any sample value.
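A short Python sketch contrasting median and mean on skewed data, with quartiles from `statistics.quantiles`. The data are made up; note that different packages (including SPSS) may use slightly different quartile methods, so quartile values can differ a little.

```python
import statistics

# Made-up, skewed length-of-stay data (days): one long stay pulls the mean up
stay_days = [3, 4, 4, 5, 5, 6, 7, 21]

print(statistics.median(stay_days))  # 5.0: average of the two middle values (5, 5)
print(statistics.mean(stay_days))    # 6.875: not equal to any sample value
q1, q2, q3 = statistics.quantiles(stay_days, n=4)  # 25%, 50%, 75% cut points
print(q1, q2, q3)
```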
In terms of sample data, what are the meanings of the maximum and minimum?
the highest and lowest values in the data.
In SPSS, what can users do with the Cross-table function?
- arrange 2 variables into a table
- calculate chi-square
In SPSS, there are many tools to plot graphs. What does a Simple-Bar graph show?
proportions and percentage of data
In SPSS, there are many tools to plot graphs. What does a Pie-Plot graph show?
proportions and percentage of data
In SPSS, there are many tools to plot graphs. What does a Boxplot show?
median (50%), quartiles (25-75%), and extreme values (max/min) within a category
In SPSS, there are many tools to plot graphs. In Error-Bar, what do the circles and dashes represent?
Circle - mean
Dashes - the error range, e.g. standard deviation (SPSS can also show SEM or a 95% confidence interval)
In SPSS, what characteristics from two variables can be shown using the Scatter/Dot graph?
- tendency of the data
- relationship between variables
Using SPSS, what file types can be exported as output?
many formats - txt, word file, excel, html
What methods ensure a sample is randomised?
- researchers apply no particular criterion when selecting samples
- no sample has any particular feature that makes it more likely to be selected by researchers
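One way to remove researcher choice from selection is to let a random number generator pick the samples. A minimal Python sketch, assuming a hypothetical list of 500 patient IDs as the population:

```python
import random

# Hypothetical sampling frame: patient IDs 1..500
population_ids = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the same draw can be reproduced
sample_ids = rng.sample(population_ids, k=20)  # simple random sample, no repeats
print(sorted(sample_ids))
```

Every ID has the same chance of selection, which is the property the card describes.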
samples taken from a data set can either be ‘dependent’ or ‘independent’. what does this mean?
independent: measurements have no effect on each other e.g. different subjects from different areas are tested using the same equipment - data is independent
dependent: measurements have an effect on each other e.g. same subjects tested at different times, pre- and post- op - data is dependent
what is a double blind experiment?
neither the researchers nor the patients know which samples (e.g. treatment or control) they are dealing with
what is the difference between subjective and objective data?
subjective data - the results produced from the feeling or psychological impression of the participants
objective data - the results produced by the measurement instruments or equipment
why is it necessary to test if data is normally distributed?
some statistical methods require the data to be normally distributed, or approximately so
What methods can be used to test whether the data is normally distributed or not?
- Skewness Coefficient (SC):
  - measure of asymmetry of distribution
  - SC close to 0: ND
  - SC > 0: long right tail
  - SC < 0: long left tail
- P-P plot:
  - points close to the line: ND, otherwise not ND
- Kolmogorov-Smirnov test with p-value (p > 0.05: data can be treated as ND)
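A minimal Python sketch of a skewness coefficient, using the moment form g1 = m3 / m2^1.5 (m2, m3 are the second and third central moments). This is one common definition; statistics packages often report a small-sample-adjusted version, so their values differ slightly. The example data are made up.

```python
def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]         # SC = 0
right_tailed = [1, 1, 2, 2, 3, 10]  # one large value: long right tail, SC > 0
print(skewness(symmetric), skewness(right_tailed))
```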
If two sets of sample data have different means, are their global means significantly different? Why?
Can’t be sure - the sampling could be the main reason for the difference. Need to use a statistical method to check
In test of hypothesis, what is a primary/null hypothesis, H0?
there is no difference between the 2 parent sets of data (populations)
In test of hypothesis, what are alternative hypotheses, H1 and H2?
H1:
Group 1 > Group 2 if Mean 1 > Mean 2
H2:
Group 1 < Group 2 if Mean 1 < Mean 2
In the test of hypothesis, what significance levels are normally used?
0.05 (and 0.01 for a stricter test)
When comparing two sets of sample data, under what conditions are two groups of data considered to be significantly different in terms of statistics?
probability (p) value < 0.05:
significant difference between 2 sets of data. the null hypothesis doesn’t stand (it is rejected)
Under what condition is a primary hypothesis accepted?
probability (p) value > 0.05:
no significant difference between 2 sets of data. the null hypothesis may stand (it cannot be rejected)
What do a single asterisk and double asterisk represent in terms of statistics?
* = p < 0.05 (low)
** = p < 0.01 (very low)
significant difference between data sets
what main indexes will influence the results in test of hypothesis?
- mean (x̄)
- standard deviation (SD)
- sample size (n)
- significance level (the threshold the p-value is compared with, e.g. 0.05)
What is the most common method used to compare 2 groups of data?
t-test
in what situations can t-test be applied?
numerical data, small sample size, normal distribution
if a group of subjects are measured twice in a time interval e.g. pre- and post- treatments, are the measured variables (e.g. knee scores) independent or dependent?
in this situation, what statistical method can be used to compare the means?
dependent
use paired t-test
if a group of subjects are treated in different conditions i.e. each patient getting a different type of artificial hip, are the measured variables (e.g. hip scores) independent or dependent?
in this situation, what statistical method can be used to compare the means?
independent
use independent t-test
what are the usually applied situations for Paired-Sample T-Test or Independent-Sample-T-Test?
paired t-test - dependent data
independent t-test - independent data
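A minimal Python sketch of the two t statistics: the paired version works on each subject's difference, and the independent version (Student's, pooled variance) compares two separate groups. The function names and data are made up. The p-value also needs the t distribution, which the standard library lacks; in Python, SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` return both statistic and p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic for dependent data (same subjects measured twice)."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # t statistic, degrees of freedom

def independent_t(group1, group2):
    """Independent t statistic (Student's, pooled variance) for independent data."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

# Made-up knee scores for 4 patients, pre- and post-treatment (dependent data)
pre = [50, 55, 60, 65]
post = [55, 58, 66, 70]
print(paired_t(pre, post))
```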
if data is not continuous, can t-test be applied?
no
What does ANOVA stand for? What situations are ANOVA suitable for?
ANOVA = analysis of variance
used to test whether there are differences amongst multiple groups of data (t-test can only test 2 groups)
What is variance? How is it calculated?
Variance is similar to SD: a descriptor that shows how far the samples are from the data centre.
It is calculated as SD squared (SD²).
if p < 0.05 is found from an ANOVA result, does it mean that all groups of data are significantly different from each other?
Can’t be sure - need to do post-hoc test to see which pair is significantly different.
in statistics, when estimating the expected mean in a range of 95% confidence interval, what are the upper and lower bounds for the mean?
upper: mean + 2SEM
lower: mean - 2SEM
what methods are used with ANOVA to see which 2 means are different if ANOVA gets p < 0.05?
- Contrast: a planned comparison, specified before running ANOVA, to test if 2 particular groups are significantly different.
- Post-Hoc test: a test done after ANOVA if p < 0.05 to find which pair of groups is significantly different. It checks all groups as pairs.
What are the two types of ANOVA?
One-way ANOVA:
an extended t-test which uses contrast and post-hoc
Univariate ANOVA:
a general linear model in SPSS
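A minimal Python sketch of the one-way ANOVA F statistic, the ratio of between-group to within-group variance (mean squares). The function name and data are made up; the p-value needs the F distribution (SciPy's `scipy.stats.f_oneway` returns both).

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # variation of samples around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # made-up groups
```

A large F (small p) says at least one group differs, but as the cards note, a post-hoc test is still needed to find which pair.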
If data is not ND or non numeric (ordinal or nominal), what main methods are suitable for their test of hypothesis?
Chi-square analysis or non-parametric testing
what is the principle of chi-square testing?
the similarity between an observed distribution and theoretical one can be used to test whether a hypothesis stands or not
in chi-square analysis, what are the theoretically expected values used to compare practical data?
equal averaged percentage for all groups or equal sample size for all
how is chi-square calculated?
sum of
(observed value - expected value)² / expected value
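A minimal Python sketch of the chi-square sum, with made-up counts and equal expected counts for all groups. The statistic is then compared with the chi-square distribution; for 2 degrees of freedom (3 groups - 1) the 0.05 critical value is about 5.99.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (observed - expected)**2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up example: 60 patients across 3 implant types; H0 expects equal counts
observed = [25, 20, 15]
expected = [20, 20, 20]
print(chi_square(observed, expected))  # 2.5, below 5.99, so not significant at 0.05
```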
in using chi-square, assuming that multiple groups of data are significantly different, does this mean that any two of them will be significantly different?
No
in what situations should users apply non-parametric test methods?
non-numerical data (ordinal or nominal), and
comparison of numeric data without normal distribution.
what are the main differences between parametric and non-parametric test methods?
parametric testing:
- uses a specific parameter (t-test) to analyse data and look at probability
- require data to be numeric and normal distribution
- compares means and SD to describe data
non-parametric testing:
- uses non-numeric information from data to compare sample groups without normal distribution
- compares medians and quartiles to describe data
in non-parametric test methods, what kinds of information/values are used to assess differences between groups of data?
1) the number of signs
2) the total of ranks in groups
if G1 number of signs > G2 number of signs, there may be significant difference between 2 groups
if G1 total ranks > G2 total ranks, there may be significant difference between 2 groups
how are number of signs in groups calculated in non-parametric testing?
if a sample value in G1 is greater than the paired sample value in G2, add a +ve sign
if a sample value in G1 is less than the paired sample value in G2, add a -ve sign
count the number of +ve and -ve signs
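A minimal Python sketch of counting signs for paired values. Dropping ties (equal pairs) is an assumption here, following the usual sign-test convention; the data are made up.

```python
def count_signs(g1, g2):
    """Count +ve signs (G1 > G2) and -ve signs (G1 < G2) over paired values; ties are dropped."""
    plus = sum(1 for a, b in zip(g1, g2) if a > b)
    minus = sum(1 for a, b in zip(g1, g2) if a < b)
    return plus, minus

g1 = [7, 5, 8, 6, 9]  # made-up paired scores
g2 = [5, 5, 6, 7, 4]
print(count_signs(g1, g2))  # (3, 1): the pair (5, 5) is a tie
```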
explain the rank test in non-parametric testing?
order all samples from both groups together and define the max/min as rank 1, the second max/min as 2, and so on (tied values share the average of their ranks)
then sum the ranks of each group
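A minimal Python sketch of the rank step: all values from both groups are ranked together (smallest = rank 1 here; ties share the average rank), then the ranks are summed per group. The function name and data are made up.

```python
def rank_sums(g1, g2):
    """Rank all values together (smallest = rank 1, ties get the average rank)
    and return the total rank of each group."""
    combined = sorted(g1 + g2)
    avg_rank = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # positions i..j-1 hold the same value, i.e. ranks i+1..j
        avg_rank[combined[i]] = (i + 1 + j) / 2
        i = j
    return sum(avg_rank[x] for x in g1), sum(avg_rank[x] for x in g2)

print(rank_sums([1, 4, 6], [2, 4, 9]))  # (9.5, 11.5): the two 4s share rank 3.5
```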
What are the two types of rank test?
Wilcoxon-signed-ranks test: for dependent non-normal distribution data
Mann-Whitney test: for independent non-normal distribution data
In statistics, what methods help us to analyse relations between variables?
1. correlation
2. regression
What type of graph is plotted to determine whether two variables are correlated?
Scatter/Dot graph
When a scatter/dot graph shows that two variables have a certain association, can we say that they are linearly correlated?
can’t directly say - need first to get a p-value to say if the values are correlated.
then calculate a correlation coefficient to show how strongly they are correlated
What is the use of a Pearson correlation coefficient in statistics?
- describes whether two variables have a linear relationship
- describes how strong the relationship is
If two variables have a linear correlation, what significant value would be expected after the test of hypothesis?
p < 0.05
in terms of the calculation of a correlation coefficient, what range should it be kept within?
-1 to 1
the closer to 1 or -1, the stronger the correlation between variables
if value is 0, the variables are not linearly correlated
Is it possible that a correlation coefficient is negative? If so, what is the meaning of the coefficient?
Yes it is possible.
If correlation coefficient is negative, it means that a variable increases while another variable decreases.
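A minimal Python sketch of the Pearson correlation coefficient: the co-variation of x and y divided by the product of their spreads, which keeps the result between -1 and 1. The data are made up; the p-value needs a separate test (e.g. SciPy's `scipy.stats.pearsonr`).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance / (SD of x * SD of y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: y increases as x increases
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: y decreases as x increases
```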
What is regression and linear regression?
Regression is to construct an equation to describe the relationship between 2 (or multiple) variables.
linear regression considers that there is a linear relationship between the two variables when constructing the equation (e.g. y = b1 + b2x),
where b1 is the intercept and b2 is the slope
To obtain a linear regression equation, what coefficients will be calculated or estimated?
What is the meaning of these coefficients in terms of a linear equation?
coefficients: b1 and b2
b1 = constant (intercept): the value of y when x = 0
b2 = slope: how much y changes for each unit increase in x; its sign shows the direction of the line relative to the x-axis
What is the definition of residuals in linear regression?
If the residuals are higher/lower, how much quality will be expected from the regressed linear equation?
Residuals are the errors of the model: the differences between the measured values and the values predicted by the model
A high residual value - poor accuracy of prediction of the model
A low residual value - accurate prediction by the model
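A minimal Python sketch of estimating b1 and b2 by ordinary least squares (the standard way these coefficients are estimated) and computing the residuals. The function names and data are made up.

```python
def fit_line(x, y):
    """Least-squares estimates of b1 (intercept) and b2 (slope) in y = b1 + b2*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b1 = my - b2 * mx  # the fitted line passes through (mean x, mean y)
    return b1, b2

def residuals(x, y, b1, b2):
    """Residual = measured value - value predicted by the model."""
    return [b - (b1 + b2 * a) for a, b in zip(x, y)]

x = [1, 2, 3, 4]
y = [3, 6, 7, 9]  # made-up measurements
b1, b2 = fit_line(x, y)
print(b1, b2, residuals(x, y, b1, b2))
```

Smaller residuals mean the line predicts the measured values more accurately.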
Can the method of linear regression be extended to non-linear situations? If so, how do we process the original data?
Yes.
Transform the non-linear variable into a linear one (e.g. by taking logarithms), and then linear regression can also be used.
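A minimal Python sketch of the transform idea, assuming an exponential relationship y = a * e^(b*x): taking ln(y) gives ln(y) = ln(a) + b*x, which is linear in x, so ordinary linear regression applies. The function name and data are illustrative.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by linear regression of ln(y) on x."""
    log_y = [math.log(v) for v in y]  # transformed variable: ln(y) = ln(a) + b*x
    n = len(x)
    mx, my = sum(x) / n, sum(log_y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, log_y)) / sum((p - mx) ** 2 for p in x)
    ln_a = my - b * mx
    return math.exp(ln_a), b

x = [0, 1, 2, 3]
y = [2 * math.exp(0.5 * v) for v in x]  # generated from a = 2, b = 0.5
print(fit_exponential(x, y))  # close to (2.0, 0.5)
```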
What is survival analysis? How is the principle of survival analysis applied to orthopaedics?
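Survival analysis analyses how the proportion of subjects who have "died" (experienced the event of interest) changes with time.
Can be applied to orthopaedics by assessing the survival of implants, for example treating implant failure or revision as the "death" event.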
What is a censored case?
Cases whose outcome cannot be fully observed (e.g. lost to follow-up, or the study ended before the event occurred), for reasons not related to the factor studied
What method is used to analyse censored cases?
survival analysis
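The notes don't name a specific survival method; the Kaplan-Meier (product-limit) estimator is a common one that uses censored cases. A minimal Python sketch, with made-up implant follow-up data: censored cases count as "at risk" up to their last follow-up but never as failures.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each case (e.g. years since implant)
    events : 1 if the event (e.g. revision) occurred, 0 if the case is censored
    Returns a list of (time, survival probability) at each event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(t for t, e in zip(times, events) if e == 1)):
        at_risk = sum(1 for ti in times if ti >= t)  # still followed up at time t
        failed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - failed / at_risk
        curve.append((t, survival))
    return curve

times = [1, 2, 2, 3, 4, 5]   # made-up follow-up years
events = [1, 0, 1, 1, 0, 1]  # 0 = censored
print(kaplan_meier(times, events))
```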
What main points should be noted when applying survival analysis?
Must have large sample size, and use the time period which has the most reliable data (i.e. the number of samples is large enough in the studied period).
What is the main idea in Meta Analysis?
Combines data from multiple sources (studies) to give a whole picture of an arguable issue, to see whether the data favour one direction
What main points should be noted when applying Meta analysis?
The data quality and format of all sources should be similar
What is a relative ratio?
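the risk in one group divided by the risk in another group, where risk = the number of cases with the event / the number of samples (also called relative risk)
What is an odds ratio?
the odds in one group divided by the odds in another group, where odds = the number of cases / the number of non-cases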
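A minimal Python sketch of the relative ratio (often called relative risk) and the odds ratio for two groups, with made-up revision counts. The function names are hypothetical.

```python
def relative_ratio(cases_a, total_a, cases_b, total_b):
    """Risk in group A / risk in group B, where risk = cases with event / samples."""
    return (cases_a / total_a) / (cases_b / total_b)

def odds_ratio(cases_a, total_a, cases_b, total_b):
    """Odds in group A / odds in group B, where odds = cases / non-cases."""
    odds_a = cases_a / (total_a - cases_a)
    odds_b = cases_b / (total_b - cases_b)
    return odds_a / odds_b

# Made-up example: 10 revisions in 100 implant-A patients, 5 in 100 implant-B patients
print(relative_ratio(10, 100, 5, 100))  # 2.0
print(odds_ratio(10, 100, 5, 100))      # (10/90) / (5/95), about 2.11
```

When the event is rare, the two values are close; they diverge as the event becomes common.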
What values are calculated in Meta-analysis?
Relative ratio
Odds ratio
Forest plot (to display the combined results)
Definition of repeatability?
assessment of the variation between repeated measurements by a user or an instrument on the same object under the same conditions
What is repeatability also sometimes called?
test-retest reliability
What is the opposite term of repeatability?
Variability
Why are reliability & repeatability relevant to the quality of medical instruments?
It is important to know if readings from an instrument are reliable. Or if different consultants who measure the same object will give similar outcomes.
List the usual stages to check the difference/repeatability between two measurements?
1) check normality of data
2) select statistical method (e.g. t-test if ND, or non-parametric testing if non-ND)
3) find p-value to conclude whether 2 measurements are significantly different or not
What is a one-sample t-test used for?
To assess whether the difference between 2 measurements is close to zero - a one-sample t-test against 0 is done.
Why should the 95% CI of the differences between measurements be obtained?
To assess the quality of the measurements.
To say that when we do repeated measurements, there is a 95% chance the measurement will be within a specific range (i.e. 95% CI)
What are the calculations for Repeatability Coefficient?
RC = 2 x (SD of differences)
RC% = 100 x 2(SD of differences) / ((mean1 + mean2)/2)
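A minimal Python sketch of the repeatability steps for two repeated measurements: the one-sample t statistic of the differences against 0, then RC and RC% as defined above. The function name and data are made up; the p-value for the t statistic needs the t distribution (e.g. SciPy's `scipy.stats.ttest_1samp`).

```python
import math
import statistics

def repeatability(measure1, measure2):
    """Return (t against 0, RC, RC%) for two repeated measurements of the same objects."""
    diffs = [a - b for a, b in zip(measure1, measure2)]
    sd_diff = statistics.stdev(diffs)
    # one-sample t statistic: is the mean difference close to zero?
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    rc = 2 * sd_diff  # RC = 2 x (SD of differences)
    grand_mean = (statistics.mean(measure1) + statistics.mean(measure2)) / 2
    rc_percent = 100 * rc / grand_mean  # RC% relative to the average measurement
    return t, rc, rc_percent

first = [10, 12, 11, 13]         # made-up first measurements
second = [10.5, 11.5, 11, 13.5]  # made-up repeat measurements
t, rc, rc_percent = repeatability(first, second)
print(f"t = {t:.2f}, RC = {rc:.2f}, RC% = {rc_percent:.1f}%")
```

Here RC% is about 8%, which would count as good repeatability under the < 10% rule.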
What value of RC% means the system has good repeatability?
< 10%
What type of data is used in repeatability coefficient?
Numerical
What are the 3 types of coefficient used to assess if data is reliable?
- Repeatability coefficient
- Intra-class correlation coefficient
- Kappa coefficient
What types of data can be used for the intra-class correlation coefficient?
Numerical - the data can be dichotomous, ordinal or interval, but should be coded numerically
What do the values of the ICC mean?
close to 0 = worst
close to 1 = best
What type of data is used in Kappa Coefficient?
Categorical (ordinal or nominal)
What does Kappa’s coefficient measure?
measures the agreement between the evaluations of 2 raters both rating the same object
What do the values of Kappa’s coefficient indicate?
close to 0 = agreement by chance
close to 1 = perfect agreement
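A minimal Python sketch of Cohen's kappa, the usual two-rater version (the notes don't say which variant; this unweighted form treats categories without order). It corrects the observed agreement for the agreement expected by chance. The data are made up.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(1 for a, b in zip(rater1, rater2) if a == b) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # chance agreement: probability both raters pick the same category independently
    chance = sum(counts1[c] * counts2[c] for c in counts1) / n ** 2
    return (observed - chance) / (1 - chance)

ratings = ["mild", "mild", "severe", "moderate"]
print(cohens_kappa(ratings, ratings))            # 1.0: perfect agreement
print(cohens_kappa([1, 1, 2, 2], [1, 2, 1, 2]))  # 0.0: agreement no better than chance
```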