Statistics Flashcards

1
Q

definition of statistics?

A

the use of a study to extract the most important/concise information about a huge set of data by using a small set of data

2
Q

definition of population/global data?

A

a huge set of data to be investigated, or an experimental set of data with a specific condition

3
Q

definition of a sample(s)?

A

a small set of data from the population/global data

4
Q

definition of sampling?

A

randomly taking a set of samples from population data

5
Q

why should sampling always be randomised?

A

randomised sampling avoids selection bias, so the data is much more likely to be representative of the bigger population (though this is never guaranteed)

6
Q

what key indexes are used to represent a set of data?

A
  1. Mean
  2. Standard deviation (SD)

7
Q

what is the definition of mean?

A

the mean is an average:

the sum of the sample values / total number of samples
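
A minimal sketch of the calculation above. The sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample values, chosen only for illustration.
samples = [2.0, 4.0, 4.0, 5.0, 10.0]

# Mean = sum of the sample values / total number of samples.
mean = sum(samples) / len(samples)
print(mean)
```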

8
Q

what is the meaning of mean in terms of data distribution?

A

the mean gives information concerning how the data is CONCENTRATED

9
Q

what is the definition of standard deviation?

A

the SD indicates how much the values of a data set vary from the mean value on average
i.e. gives the average distance from samples to a centre value (mean)

10
Q

what is the meaning of standard deviation in terms of data distribution?

A

the standard deviation gives information concerning how the data is SEPARATED

11
Q

what is the equation for standard deviation?

A

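SD = square root of ( sum of (sample value - mean)squared / (number of samples - 1) )

(this is the form for a sample; divide by the number of samples instead when the data is the whole population)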
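
A minimal sketch of the usual sample standard deviation (dividing by n - 1), checked against Python's `statistics.stdev`. The sample values are hypothetical, and the choice of the n - 1 form is an assumption about which version the card intends.

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
samples = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(samples)
mean = sum(samples) / n

# Squared distance of every sample from the mean.
squared_distances = [(x - mean) ** 2 for x in samples]

# Sample SD: divide by n - 1 (assumed here), then take the square root.
sd = math.sqrt(sum(squared_distances) / (n - 1))

print(sd, statistics.stdev(samples))
```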

12
Q

what does ‘frequency of data’ mean in stats?

A

the number of times that the same (or similar) data value occurs

13
Q

what does ‘distribution of data’ mean in stats?

A

the shape constructed by data frequencies

14
Q

what relationship exists between the frequency of data and distribution of data?

A

data frequencies plotted together will establish a pattern - the distribution of data

15
Q

In SPSS, what can users do with the Variable View interface?

A

define a variable: by name, type and labelling.

16
Q

In SPSS, what can users do with the Data View interface?

A

edit data: e.g. copy, paste, delete.

17
Q

when defining a variable in SPSS, what ‘type’ of data is preferred and why?

A

Numeric data - can be changed into another form

18
Q

if a huge number of samples are collected from variables with natural characteristics, the data may form a famous distribution.
What is that?
What shape does it look like?
At the peak position of this distribution, what key value is expected?

A

A normal distribution curve.
Bell-shaped.
Peak position - the mean.

19
Q

In Normal Distribution, how much data would be included in the approximate range of:
mean +/- 1 standard deviation?
mean +/- 2 standard deviations?
mean +/- 3 standard deviations?

A

mean +/- 1 SD: 68%

mean +/- 2 SD: 95%

mean +/- 3 SD: 99.7%
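
These percentages can be checked with a short sketch that uses `statistics.NormalDist` from the Python standard library. The helper name `coverage` is hypothetical. It should give roughly 68.3%, 95.4% and 99.7%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Fraction of a normal distribution lying within mean +/- k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {coverage(k) * 100:.1f}%")
```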

20
Q

In stats there are 3 main data types. What are they?

A
  1. Numeric data
    - e.g. body mass, age, score
  2. Nominal data
    - categories without rank
    - e.g. gender, colour, weekday
  3. Ordinal data
    - categories with rank
    - e.g. feeling, satisfaction, visual analogue scale
21
Q

What types of files can be imported into SPSS directly?

A

txt
excel file
(data can also be entered manually)

22
Q

What are the most important indexes to report on in normal data distribution description?

A
  1. Mean
  2. Standard Deviation
  3. Standard error of mean
  4. 95% confidence interval: mean +/- 2SEM
  5. Max and Min values
23
Q

What is the definition of Standard Error of Mean (SEM)?

A

shows how well the sample mean estimates the "true mean" of the population when multiple sample groups result in several means. if "mean" is used as a variable, and the distribution of means is plotted, it is still (approximately) a normal distribution.

SEM is the standard deviation of means.

SEM = SD/square root of number of samples

24
Q

What is a confidence interval?

How do you use SEM to estimate the confidence interval range of mean?

A

the confidence interval is the range within which the global (population) mean is expected to fall, estimated using the SEM.

because plotting means as a variable produces a normal distribution curve, a 95% confidence interval is roughly:
mean +/- 2 SEM
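
A minimal sketch of the rough 95% confidence interval, using SEM = SD / square root of number of samples and the mean +/- 2 SEM rule from the earlier card on normal-distribution indexes. The function name `approx_ci95` and the sample values are hypothetical.

```python
import math
import statistics

def approx_ci95(samples):
    """Return (lower, upper) as mean -/+ 2 SEM, the rough 95% CI rule."""
    mean = statistics.mean(samples)
    # SEM = SD / square root of the number of samples.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - 2 * sem, mean + 2 * sem

# Hypothetical sample values, chosen only for illustration.
lower, upper = approx_ci95([2.0, 4.0, 4.0, 5.0, 10.0])
print(lower, upper)
```

The factor 2 is an approximation of 1.96, and for small samples a t-based multiplier would be wider.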

25
Q

How do standard deviation and the number of samples influence SEM?

A

since SEM = SD/square root of number of samples,

if SD increases, SEM also increases
If number of samples increases, SEM decreases

26
Q

What are the most important indexes to report on in data distribution description which isn’t normal?

A
  1. Median
  2. Quartiles
  3. Frequency or percentage
27
Q

What is the definition of median?

A

after reordering the data from the smallest to largest, the median is the value of the middle sample (or the average of the two middle samples if the number of samples is even)
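
A minimal sketch of that procedure. Averaging the two middle values for an even number of samples is the usual convention and is assumed here.

```python
def median(values):
    """Middle value of the data after ordering from smallest to largest."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of samples: a single middle sample exists.
        return ordered[middle]
    # Even number of samples: average the two middle samples (assumed convention).
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 3]))
print(median([4, 1, 3, 2]))
```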

28
Q

What is the difference between mean and median?

A

Median comes from sample values directly.

Mean is calculated from all the sample values. It is the central value, but may not be equal to any sample value.

29
Q

In terms of sample data, what are the meanings of the maximum and minimum?

A

the highest and lowest values in the data.

30
Q

In SPSS, what can users do with the Cross-table function?

A
  • arrange 2 variables into a table
  • calculate chi-square

31
Q

In SPSS, there are many tools to plot graphs. What does a Simple-Bar graph show?

A

counts, proportions or percentages of data in each category

32
Q

In SPSS, there are many tools to plot graphs. What does a Pie-Plot graph show?

A

proportions and percentage of data

33
Q

In SPSS, there are many tools to plot graphs. What does a Boxplot show?

A

median (50%), quartiles (25-75%), and extreme values (max/min) within a category

34
Q

In SPSS, there are many tools to plot graphs. In Error-Bar, what do the circles and dashes represent?

A

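Circle - mean

Dashes - ends of the error bar, e.g. mean +/- standard deviation (the bar can also be set to show the standard error or a confidence interval)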

35
Q

In SPSS, what characteristics from two variables can be shown using the Scatter/Dot graph?

A
  • tendency of the data
  • relationship between variables

36
Q

Using SPSS, what file types can be exported as output?

A

many formats - txt, word file, excel, html

37
Q

What methods ensure a sample is randomised?

A
  1. researchers apply no particular criterion when selecting samples.
  2. no sample has any particular characteristic that makes it more likely to be selected by researchers.
38
Q

samples taken from a data set can either be ‘dependent’ or ‘independent’. what does this mean?

A

independent: measurements have no effect on each other e.g. different subjects from different areas are tested using the same equipment - data is independent
dependent: measurements have an effect on each other e.g. same subjects tested at different times, pre- and post- op - data is dependent

39
Q

what is a double blind experiment?

A

neither the researchers nor the patients know which samples (e.g. which treatment group) they are dealing with

40
Q

what is the difference between subjective and objective data?

A

subjective data - the results produced from the feeling or psychological impression of the participants

objective data - the results produced by the measurement instruments or equipment

41
Q

why is it necessary to test if data is normally distributed?

A

some statistical methods require the data to be normally distributed, or close to it

42
Q

What methods can be used to test whether the data is normally distributed or not?

A
  1. Skewness Coefficient (SC):
    - measure of asymmetry of distribution
    - SC close to 0: ND
    - SC > 0: long right tail
    - SC < 0: long left tail
  2. P-P plot:
    - points close to the line: ND, otherwise not ND.
  3. Kolmogorov-Smirnov test with p-value
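
A minimal sketch of a skewness coefficient, assuming the simple moment-based form (third central moment divided by the second central moment to the power 1.5). Statistical packages may apply a small-sample correction, so their values can differ slightly. The data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness coefficient (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # expected: close to 0
right_tail = [1, 1, 1, 1, 10]    # expected: greater than 0 (long right tail)
left_tail = [1, 10, 10, 10, 10]  # expected: less than 0 (long left tail)

for data in (symmetric, right_tail, left_tail):
    print(skewness(data))
```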
43
Q

If two sets of sample data have different means, are their global means significantly different? Why?

A

Can’t be sure - the sampling could be the main reason for the difference. Need to use a statistical method to check

44
Q

In test of hypothesis, what is a primary/null hypothesis, H0?

A

there is no significant difference between 2 parent sets of data

45
Q

In test of hypothesis, what are alternative hypotheses, H1 and H2?

A

H1:
Group 1 > Group 2 if Mean 1 > Mean 2

H2:
Group 1 < Group 2 if Mean 1 < Mean 2

46
Q

In the test of hypothesis, what significance level is normally used?

A

0.05

47
Q

When comparing two sets of sample data, under what conditions are two groups of data considered to be significantly different in terms of statistics?

A

probability (p) value < 0.05:

significant difference between 2 sets of data. the null hypothesis (H0) does not stand (it is rejected)

48
Q

Under what condition is a primary hypothesis accepted?

A

probability (p) value > 0.05:

no significant difference between 2 sets of data. the null hypothesis may stand (it is not rejected)

49
Q

What do a single asterisk and double asterisk represent in terms of statistics?

A
* = p < 0.05 (low)

** = p < 0.01 (very low)

significant difference between data sets

50
Q

what main indexes will influence the results in test of hypothesis?

A
mean (x)
standard deviation (SD)
sample size (n)
significance level (p)
51
Q

What is the most common method used to compare 2 groups of data?

A

t-test

52
Q

in what situations can t-test be applied?

A

numerical data, small sample size, normal distribution

53
Q

if a group of subjects are measured twice in a time interval e.g. pre- and post- treatments, are the measured variables (e.g. knee scores) independent or dependent?

in this situation, what statistical method can be used to compare the means?

A

dependent

use paired t-test
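
A minimal sketch of the paired t statistic: the mean of the paired differences divided by the SEM of those differences. The function name and the knee scores are hypothetical. The p-value would then be read from a t distribution with n - 1 degrees of freedom, which is not computed here.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # SEM of the differences = SD of differences / square root of n.
    sem_diff = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / sem_diff, n - 1

# Hypothetical pre- and post-treatment knee scores for the same 5 subjects.
pre = [60.0, 55.0, 70.0, 65.0, 50.0]
post = [68.0, 61.0, 75.0, 74.0, 57.0]

t, df = paired_t_statistic(pre, post)
print(t, df)
```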

54
Q

if a group of subjects are treated in different conditions i.e. each patient getting a different type of artificial hip, are the measured variables (e.g. hip scores) independent or dependent?

in this situation, what statistical method can be used to compare the means?

A

independent

use independent t-test
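
A minimal sketch of the independent t statistic, assuming the pooled-variance (equal variances) form. The function name and the hip scores are hypothetical, and the p-value lookup from a t distribution is not computed here.

```python
import math
import statistics

def independent_t_statistic(group1, group2):
    """Return (t, degrees of freedom) using a pooled variance (assumed equal variances)."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical hip scores for two different groups of patients.
hip_a = [1.0, 2.0, 3.0, 4.0, 5.0]
hip_b = [3.0, 4.0, 5.0, 6.0, 7.0]

t, df = independent_t_statistic(hip_a, hip_b)
print(t, df)
```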

55
Q

in what situations are the Paired-Samples T-Test and the Independent-Samples T-Test usually applied?

A

paired t-test - dependent data

independent t-test - independent data

56
Q

if data is not continuous, can t-test be applied?

A

no

57
Q

What does ANOVA stand for? What situations are ANOVA suitable for?

A

ANOVA = analysis of variance

used to test whether there are differences amongst multiple groups of data (t-test can only test 2 groups)

58
Q

What is variance? How is it calculated?

A

Variance is similar to SD: a descriptor showing how far samples are from the data centre.

Can be calculated as (SD)squared.

59
Q

if p < 0.05 is found from an ANOVA result, does it mean that all groups of data are significantly different from each other?

A

Can’t be sure - need to do post-hoc test to see which pair is significantly different.

60
Q

in statistics, when estimating the expected mean in a range of 95% confidence interval, what are the upper and lower bounds for the mean?

A

upper: mean + 2SEM
lower: mean - 2SEM

61
Q

what methods are used with ANOVA to see which 2 means are different if ANOVA gets p < 0.05?

A
  1. Contrast: a pre-test done before ANOVA to test if 2 groups are significantly different.
  2. Post-Hoc test: a post test done after ANOVA if p < 0.05 to test which pair of groups is significantly different. Checks all groups as pairs.
62
Q

What are the two types of ANOVA?

A

One-way ANOVA:
an extended t-test which uses contrast and post-hoc

Univariate ANOVA:
a general linear model in SPSS

63
Q

If data is not ND or non numeric (ordinal or nominal), what main methods are suitable for their test of hypothesis?

A

Chi-square analysis or non-parametric testing

64
Q

what is the principle of chi-square testing?

A

the similarity between an observed distribution and theoretical one can be used to test whether a hypothesis stands or not

65
Q

in chi-square analysis, what are the theoretically expected values used to compare practical data?

A

equal averaged percentage for all groups or equal sample size for all

66
Q

how is chi-square calculated?

A

sum of

(observed value - expected value)squared / expected value
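
A minimal sketch of this sum, assuming the "equal sample size for all groups" expectation from the previous card. The observed counts are hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected) squared / expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed counts in 3 groups.
observed = [30, 20, 10]

# Expected values: the same count in every group (assumed expectation).
expected_each = sum(observed) / len(observed)
expected = [expected_each] * len(observed)

statistic = chi_square(observed, expected)
print(statistic)
```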

67
Q

in using chi-square, assuming that multiple groups of data are significantly different, does this mean that any two of them will be significantly different?

A

No

68
Q

in what situations should users apply non-parametric test methods?

A

non-numerical data (ordinal or nominal), and

comparison of numeric data without normal distribution.

69
Q

what are the main differences between parametric and non-parametric test methods?

A

parametric testing:

  • relies on specific parameters of the data (e.g. the mean and SD used in a t-test) to analyse data and look at probability
  • requires data to be numeric and normally distributed
  • compares means and SD to describe data

non-parametric testing:

  • uses non-numeric information from data to compare sample groups without normal distribution
  • compares medians and quartiles to describe data
70
Q

in non-parametric test methods, what kinds of information/values are used to assess differences between groups of data?

A

1) the number of signs
2) the total of ranks in groups

if G1 number of signs > G2 number of signs, there may be significant difference between 2 groups

if G1 total ranks > G2 total ranks, there may be significant difference between 2 groups

71
Q

how are number of signs in groups calculated in non-parametric testing?

A

if a sample value in G1 > the paired sample value in G2, then add a +ve sign

if a sample value in G1 < the paired sample value in G2, then add a -ve sign

count the number of +ve and -ve signs
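
A minimal sketch of the sign count, assuming the two groups are paired element by element and that ties receive no sign. The function name and values are hypothetical.

```python
def count_signs(group1, group2):
    """Return (number of +ve signs, number of -ve signs) for paired values."""
    positives = sum(1 for a, b in zip(group1, group2) if a > b)
    negatives = sum(1 for a, b in zip(group1, group2) if a < b)
    # Pairs with equal values contribute no sign (assumed convention).
    return positives, negatives

# Hypothetical paired values.
g1 = [5, 3, 8, 6]
g2 = [4, 3, 9, 2]

signs = count_signs(g1, g2)
print(signs)
```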

72
Q

explain the rank test in non-parametric testing?

A

define the max/min as rank 1, the second max/min as 2, and so on.
then sum the ranks of each group
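
A minimal sketch of the rank sums, assuming rank 1 goes to the smallest value and that no two values are tied (tied values would normally share an averaged rank, which is not handled here). The function name and values are hypothetical.

```python
def rank_sums(group1, group2):
    """Rank all values together (1 = smallest), then sum the ranks per group."""
    pooled = sorted(group1 + group2)
    # Assumes no tied values, so every value maps to exactly one rank.
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    total1 = sum(rank_of[v] for v in group1)
    total2 = sum(rank_of[v] for v in group2)
    return total1, total2

# Hypothetical measurements in two groups.
totals = rank_sums([1.2, 3.4, 2.2], [5.1, 4.0, 6.3])
print(totals)
```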

73
Q

What are the two types of rank test?

A

Wilcoxon signed-rank test: for dependent non-normal distribution data

Mann-Whitney test: for independent non-normal distribution data

74
Q

In statistics, what methods help us to analyse relations between variables?

A
  1. correlation
  2. regression

75
Q

What type of graph is plotted to determine whether two variables are correlated?

A

Scatter/Dot graph

76
Q

When a scatter/dot graph shows that two variables have a certain association, can we say that they are linearly correlated?

A

can’t directly say - need first to get a p-value to say if the values are correlated.

then calculate a correlation coefficient to show how strongly they are correlated

77
Q

What is the use of a Pearson correlation coefficient in statistics?

A
  • describes whether two variables have a linear relationship
  • describes how strong the relationship is
78
Q

If two variables have a linear correlation, what significant value would be expected after the test of hypothesis?

A

p < 0.05

79
Q

in terms of the calculation of a correlation coefficient, what range should it be kept within?

A

-1 to 1

the closer to 1 or -1, the stronger the correlation between variables

if value is 0, the variables are not linearly correlated
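
A minimal sketch of the Pearson correlation coefficient, which always falls between -1 and 1. The function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical data: y rises with x, then y falls with x.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(rising, falling)
```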

80
Q

Is it possible that a correlation coefficient is negative? If so, what is the meaning of the coefficient?

A

Yes it is possible.

If the correlation coefficient is negative, it means that one variable increases while the other variable decreases.

81
Q

What is regression and linear regression?

A

Regression is to construct an equation to describe the relationship between 2 (or multiple) variables.

linear regression considers that there is a linear relationship between the two variables when constructing the equation (e.g. y = b1 + b2x),
where b1 is the intercept and b2 is the slope

82
Q

To obtain a linear regression equation, what coefficients will be calculated or estimated?
What is the meaning of these coefficients in terms of a linear equation?

A

coefficients: b1 and b2

b1 = constant coefficient (intercept), the value of y when x = 0
b2 = slope, shows the steepness and direction of the straight line relative to the x-axis
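
A minimal sketch of estimating b1 and b2 for y = b1 + b2x, assuming ordinary least squares as the fitting method. The function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b1, b2) for the line y = b1 + b2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    b2 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line passes through (mean_x, mean_y).
    b1 = mean_y - b2 * mean_x
    return b1, b2

# Hypothetical data lying exactly on y = 1 + 2x.
b1, b2 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b1, b2)
```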
83
Q

What is the definition of residuals in linear regression?

If the residuals are higher/lower, how much quality will be expected from the regressed linear equation?

A

Residuals are the errors produced by the model: the differences between the predicted values and the measured values

A high residual value - poor accuracy of prediction of the model
A low residual value - accurate prediction by the model
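
A minimal sketch of residuals for a line y = b1 + b2x, taken as measured value minus predicted value. The coefficients and data are hypothetical. The sum of squared residuals is one common way to summarise them in a single number.

```python
def residuals(xs, ys, b1, b2):
    """Measured value minus the value predicted by y = b1 + b2 * x."""
    return [y - (b1 + b2 * x) for x, y in zip(xs, ys)]

# Hypothetical measurements and a hypothetical fitted line y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
measured = [1.0, 3.5, 5.0]

errors = residuals(xs, measured, b1=1.0, b2=2.0)
sum_squared = sum(e ** 2 for e in errors)
print(errors, sum_squared)
```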

84
Q

Can the method of linear regression be extended to non-linear situations? If so, how do we process the original data?

A

Yes.

Transform non-linear variable to a linear variable, and then linear regression can also be used.
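
A minimal sketch of one such transformation. The card does not name a specific one, so an exponential relationship y = a * exp(k * x) is assumed here: taking the natural log of y gives ln y = ln a + kx, which is linear in x. All names and data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares estimates (b1, b2) for the line y = b1 + b2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b2 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b2 * mean_x, b2

def fit_exponential(xs, ys):
    """Fit y = a * exp(k * x) by linear regression on (x, ln y)."""
    log_ys = [math.log(y) for y in ys]  # requires every y > 0
    b1, b2 = fit_line(xs, log_ys)
    # Intercept is ln a, slope is k.
    return math.exp(b1), b2

# Hypothetical data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a, k = fit_exponential(xs, ys)
print(a, k)
```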

85
Q

What is survival analysis? How is the principle of survival analysis applied to orthopaedics?

A

Survival analysis analyses how the death (failure) ratio changes over time.

Can be applied to orthopaedics by assessing the effect of implants on surgery/patients, for example.

86
Q

What is a censored case?

A

Cases in which the outcome cannot be collected or determined, for reasons not related to the factor studied

87
Q

What method is used to analyse censored cases?

A

survival analysis

88
Q

What main points should be noted when applying survival analysis?

A

Must have large sample size, and use the time period which has the most reliable data (i.e. the number of samples is large enough in the studied period).

89
Q

What is the main idea in Meta Analysis?

A

Studies multi-sources of data to provide a whole picture from all sources on an arguable issue, to see whether data favours one direction

90
Q

What main points should be noted when applying Meta analysis?

A

The data quality and format of all sources should be similar

91
Q

What is a relative ratio?

A

the risk in one group (number of cases with the event / number of samples in that group) divided by the risk in a comparison group

92
Q

What is an odds ratio?

A

the odds in one group (number of cases / number of non-cases in that group) divided by the odds in a comparison group
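
A minimal sketch of both ratios for two groups, assuming each ratio compares one group against a comparison group. The group names and counts are hypothetical.

```python
def risk(events, total):
    """Number of cases with the event / number of samples."""
    return events / total

def odds(events, total):
    """Number of cases / number of non-cases."""
    return events / (total - events)

# Hypothetical counts: events and group sizes.
treated_events, treated_total = 10, 100
control_events, control_total = 20, 100

relative_ratio = risk(treated_events, treated_total) / risk(control_events, control_total)
odds_ratio = odds(treated_events, treated_total) / odds(control_events, control_total)
print(relative_ratio, odds_ratio)
```

A value of 1 for either ratio would mean the event is equally likely in both groups.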

93
Q

What values are calculated in Meta-analysis?

A

Relative ratio
Odds ratio
Forest plot

94
Q

Definition of repeatability?

A

assessment of the variation between repeated measurements by a user or an instrument on the same object under the same conditions

95
Q

What is repeatability also sometimes called?

A

test-retest reliability

96
Q

What is the opposite term of repeatability?

A

Variability

97
Q

Why are reliability & repeatability relevant to the quality of medical instruments?

A

It is important to know if readings from an instrument are reliable. Or if different consultants who measure the same object will give similar outcomes.

98
Q

List the usual stages to check the difference/repeatability between two measurements?

A

1) check normality of data
2) select statistical method (e.g. t-test if ND, or non-parametric testing if non-ND)
3) find p-value to conclude whether 2 measurements are significantly different or not

99
Q

What is a one-sample t-test used for?

A

To assess whether the difference between 2 measurements is close to zero - a one-sample t-test against 0 is done.

100
Q

Why should the 95% CI of the differences between measurements be obtained?

A

To assess the quality of the measurements.
To say that when we do repeated measurements, there is a 95% chance the measurement will be within a specific range (i.e. 95% CI)

101
Q

What are the calculations for Repeatability Coefficient?

A

RC = 2 x (SD of differences)

RC% = 100 x RC / ((mean1 + mean2) / 2)
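
A minimal sketch of both calculations, reading RC% as RC divided by the average of the two measurement means. The function name and the measurements are hypothetical.

```python
import statistics

def repeatability(first, second):
    """Return (RC, RC%) for two repeated sets of measurements."""
    differences = [a - b for a, b in zip(first, second)]
    # RC = 2 x (SD of differences).
    rc = 2 * statistics.stdev(differences)
    # Average of the two measurement means.
    overall_mean = (statistics.mean(first) + statistics.mean(second)) / 2
    return rc, 100 * rc / overall_mean

# Hypothetical repeated measurements of the same 4 objects.
first = [10.0, 12.0, 11.0, 13.0]
second = [10.5, 11.5, 11.5, 12.5]

rc, rc_percent = repeatability(first, second)
print(rc, rc_percent)
```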

102
Q

What value of RC% means the system has good repeatability?

A

< 10%

103
Q

What type of data is used in repeatability coefficient?

A

Numerical

104
Q

What are the 3 types of coefficient used to assess if data is reliable?

A
  1. Repeatability coefficient
  2. Intra-class correlation coefficient
  3. Kappa coefficient
105
Q

What types of data can be used for the intra-class correlation coefficient?

A

Numerical - the data can be dichotomous, ordinal or interval, but must be coded numerically

106
Q

What do the values of the ICC mean?

A

close to 0 = worst

close to 1 = best

107
Q

What type of data is used in Kappa Coefficient?

A

Categorical data, usually ordinal (ranked categories)

108
Q

What does Kappa’s coefficient measure?

A

measures the agreement between evaluations of 2 raters both rating the same object

109
Q

What do the values of Kappa’s coefficient indicate?

A

close to 0 = agreement by chance

close to 1 = perfect agreement
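
A minimal sketch of the calculation, assuming the unweighted Cohen's kappa for two raters: kappa = (observed agreement - agreement expected by chance) / (1 - agreement expected by chance). The function name and ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters rating the same objects."""
    n = len(rater1)
    # Observed agreement: fraction of objects given the same category by both.
    observed = sum(1 for a, b in zip(rater1, rater2) if a == b) / n
    counts1 = Counter(rater1)
    counts2 = Counter(rater2)
    # Chance agreement: product of the raters' category proportions, summed.
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    # Note: undefined (division by zero) if both raters use a single category.
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 4 objects.
perfect = cohen_kappa(["a", "b", "a", "b"], ["a", "b", "a", "b"])
chance = cohen_kappa(["a", "a", "b", "b"], ["a", "b", "a", "b"])
print(perfect, chance)
```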