Basic Statistics Flashcards
What is an observation ?
The units on which we measure data, such as persons, cars, animals… are called observations.
What is a population ?
A collection of all units
What is a sample ?
A selection of n observations. A sample is always a subset of the population
What is a qualitative variable ?
Variables whose values cannot be ordered in a logical or natural way.
What is a quantitative variable ?
Variables that represent measurable quantities. The values which these variables can take can be ordered in a logical and natural way.
What is a graphic ?
It represents the relationship between two or more variables
It is an alternative way to summarize a variable’s information
It provides clues that words and equations do not
It is a great tool to form hypotheses and draw conclusions
What is a disadvantage of graphs ?
They can be inaccurately interpreted, resulting in incorrect answers or conclusions
What is the pie chart used for ?
Used to visualize the absolute and relative frequencies of nominal (categorical) and ordinal variables
What is the bar chart used for ?
Used to visualize the absolute and relative frequencies of observed values of a variable. Can be used for nominal and ordinal variables.
What is the histogram used for ?
Used to visualize the distribution of values of continuous variables.
What are the differences between bar charts and histograms ?
Histograms show the distribution of variables whereas bar charts compare variables
Histograms show quantitative data whereas bar charts show categorical data
The bars in a histogram cannot be reordered
What is a line graph used for ?
Used to visualize quantitative data collected on a specific topic over a specific time interval.
Data points are connected by a line, and they represent the observations.
What are box plots used for ?
Used to visualize the distribution of data based on a five number summary : minimum, first quartile, median, third quartile, maximum.
What is Q2 ?
The middle value of the data = the median
What is Q1 ?
The lower quartile, the middle value between the smallest value and the median
What is Q3 ?
The upper quartile, the middle value between the median and the highest value
What is the interquartile range ?
The range from Q1 to Q3: IQR = Q3-Q1
How to determine the lower extreme in a box plot chart ?
Lower extreme = Q1-1.5*IQR
Where IQR = Q3-Q1
How to determine the upper extreme in a box plot chart ?
Upper extreme = Q3+1.5*IQR
Where IQR=Q3-Q1
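As a minimal sketch of the formulas above, this Python snippet computes the five-number summary and the lower and upper extremes. It assumes the median-of-halves rule for Q1 and Q3 (other quartile conventions give slightly different values). The function names and the sample data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values before the median position
    upper = data[half + n % 2:]     # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def extremes(values):
    """Return (lower extreme, upper extreme) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data
print(five_number_summary(sample))
print(extremes(sample))
```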
What are scatter plots used for ?
Used to visualize the relationship between two quantitative variables measured on the same individuals.
It is useful to visually detect outliers
It shows the type of relationship between two variables
What are tables useful for ?
Used to present results from research, e.g., within or between-group comparisons.
What is an outlier ?
An outlier represents a value distant from the rest, due to variability or error.
Outliers are values more than 1.5 times the IQR below Q1 or above Q3
How to detect an outlier ?
- visually inspect data using a scatter plot or box plot
- use Tukey's rule to detect outliers : a value is an outlier if it is
below Q1-1.5*IQR, or
above Q3+1.5*IQR
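Tukey's rule can be sketched in a few lines of Python. This is an illustration only: it relies on `statistics.quantiles` with its default "exclusive" method as the quartile convention, which is an assumption, and the function name and data are invented.

```python
from statistics import quantiles

def tukey_outliers(values):
    """Return the values lying below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)   # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

measurements = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data
print(tukey_outliers(measurements))
```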
What is a correlation ? What is it useful for ?
Correlation is used to test the relationship between variables (quantitative or categorical)
It is a measure of how things are related.
Some correlations are high
Some correlations are low
It is useful to make predictions about future events
Which graph is the most appropriate to read a correlation ?
The scatter plot
Adding a trend line will help to show the tendency behavior between variables
How to read a correlation between interval or ratio variables ?
Using the correlation coefficient or Pearson coefficient «r»
«r» describes the strength and direction of the linear association between two continuous (interval or ratio) variables.
«r» varies from -1 (strong negative correlation) to 1 (strong positive correlation)
0=no correlation
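To make «r» concrete, here is a small Python sketch that computes the Pearson coefficient from its usual definition (covariance divided by the product of the spreads). The function name and both data sets are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # made-up data
rising = [2, 4, 6, 8, 10]        # increases with hours
falling = [10, 8, 6, 4, 2]       # decreases with hours
print(pearson_r(hours, rising), pearson_r(hours, falling))
```

A perfectly increasing linear relationship gives a value at the top of the range and a perfectly decreasing one gives a value at the bottom.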
How to read a correlation between qualitative ordinal scale variables ?
Using the Spearman correlation coefficient
In the context of correlation of ratio/interval variables, what is «r square» ?
The coefficient of determination : the ratio of the amount of variance explained by the regression model to the total variation in the data
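The ratio described above can be sketched in Python for a simple linear regression with one predictor. The least-squares fit, the function name, and the data are assumptions made for the example.

```python
def r_squared(x, y):
    """Explained variation divided by total variation for a least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Least-squares slope and intercept of y on x
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)  # variation explained by the model
    total = sum((b - mean_y) ** 2 for b in y)              # total variation in the data
    return explained / total

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))
```

For a single predictor this value equals the square of the Pearson coefficient computed on the same data.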
What is reliability ?
It is the overall consistency of a measure.
There is high reliability if a measure produces similar results under consistent conditions
What is the reliability test for categorical variables ?
Percent agreement or k-statistics - Cohen’s K
What does the k-statistic determine ?
It determines how well an observation produces the same value for the same patient on repeated measurements (ideally two examiners)
It determines:
- intra and inter examiner reliability
- intra and inter session reliability
How to calculate the % of raw agreement ?
((Sum of normal observations)+(sum of abnormal observations))/total observations
We sum the agreed observations and divide by the total number of observations
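As an illustration, the raw agreement between two examiners can be computed as follows. The ratings and the function name are invented for the example, and the result is returned as a proportion (multiply by 100 for a percentage).

```python
def raw_agreement(rater_a, rater_b):
    """Proportion of observations on which the two raters give the same rating."""
    agreed = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreed / len(rater_a)

# Made-up ratings of 10 patients: "N" = normal, "A" = abnormal
rater_a = ["N", "N", "A", "A", "N", "A", "N", "N", "A", "N"]
rater_b = ["N", "N", "A", "N", "N", "A", "N", "A", "A", "N"]
print(f"{raw_agreement(rater_a, rater_b):.0%}")
```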
What is the purpose of crosstabulation ?
The purpose of crosstabulation is to show in tabular format the relationship between two or more categorical variables.
How to interpret K-statistics ?
0-0.59 : weak
0.60-0.79 : moderate
0.80-0.90 : strong
Above 0.90 : almost perfect
What does k=0 mean ?
Represents the amount of agreement that can be expected from random chance
What does k=1 mean ?
Represents perfect agreement between the raters
What does k=-1 mean ?
Represents complete disagreement between the raters
What is the k-statistic used for ?
It is used as a measure for quantifying agreement beyond chance for categorical variables
How to determine K in Kappa statistics ?
K=(Po-Pe)/(1-Pe)
Where :
Po is the percent agreement observed = raw % agreement
Pe is the percent agreement expected
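A minimal Python sketch of this formula, starting from a crosstabulation of two raters. It assumes Pe is obtained from the row and column totals of the table (the usual way for Cohen's kappa). The table values and the function name are made up.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table: rows = rater A, columns = rater B."""
    k = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Po: observed agreement, the diagonal cells over the total
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Pe: agreement expected by chance from the marginal totals
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up counts:   B normal  B abnormal
ratings = [[5, 1],   # A normal
           [1, 3]]   # A abnormal
print(round(cohen_kappa(ratings), 2))
```

Here Po is 0.80 and Pe is 0.52, so the agreement beyond chance is noticeably lower than the raw agreement suggests.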
What is the coefficient of variation ?
For continuous variables, the coefficient of variation (CV) provides a very simple way to determine the relationship between the standard deviation and the mean of two sets of observations
Values close to zero show minimal variation
How to determine the CV ?
CV=(standard deviation/mean)*100
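The formula translates directly into Python. This sketch assumes the sample standard deviation (n-1 in the denominator), as computed by `statistics.stdev`. The data are invented.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100

repeated_measures = [10, 12, 11, 9, 8]  # made-up data
print(round(coefficient_of_variation(repeated_measures), 1))
```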
What is the intraclass correlation coefficient (ICC) ?
It is another reliability measure, used for continuous variables
Ranges between 0 and 1, and is typically reported with a 95% confidence interval
What is the standard error of measurement ?
A test of reliability
An estimation of the expected random variation in scores when no real change has taken place
What is the detectable difference (or change) ?
A test of reliability
The minimum amount of change that needs to be observed at either the group or individual level for it to be considered a real change
What is inferential statistics ?
It refers to the generalization of results from a sample of participants to the whole population.
Why is it helpful to use inferential statistics ?
- making inferences about the population from the sample
- concluding whether a sample is significantly different from the population
- determining whether one model is significantly better than another
- hypothesis testing in general
What is the most used method of inferential testing ?
Hypothesis testing
What is hypothesis testing ?
It calculates a probability (the p-value) of observing a difference between groups at least as large as the one in the sample if there were truly no difference
In hypothesis testing, what is p-value ?
The p-value provides evidence against the null hypothesis H0
The smaller p-value is, the stronger the evidence against H0 and in favor of the alternative hypothesis Ha
If the p-value is less than or equal to 0.05, then H0 is rejected in favor of Ha
What is the null hypothesis ?
The one we want to disprove
What is the Chi-square test used for ?
It is used to determine if there is a significant association between two categorical variables
The test compares the observed values with the expected.
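The comparison of observed and expected values can be sketched in Python. This example computes the chi-square statistic and the degrees of freedom only, with no continuity correction. The counts and the function name are made up. For one degree of freedom, a statistic above 3.841 corresponds to p < 0.05.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Made-up counts:  outcome yes  outcome no
counts = [[30, 10],   # group 1
          [20, 40]]   # group 2
statistic, dof = chi_square(counts)
print(round(statistic, 2), dof)
```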
What does the T-test determine ?
The t test tells us how significant the difference between group means is
What are the requirements for the T-test?
-continuous variables
-normal distribution
-equal variance in the samples
What are the three types of T-tests ?
Paired sample T-test
Independent sample T-test
One-sample T-test
What is the paired samples T-test ?
Used to compare the means between two samples from the same group/individual
Comparing the means of two conditions, where the same people are in both groups
What is the independent samples T-test used for ?
Used to compare means between two samples from different groups/individuals
Comparing the means of two different groups
What is the one-sample T-test used for ?
Comparing the mean of a sample with a pre-specified mean
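The three variants differ only in how the difference and its standard error are formed, which a short Python sketch can show. It returns the t statistic and the degrees of freedom only, assumes equal variances (pooled variance) for the independent test, and uses invented data and function names. The p-value would then be read from a t table or obtained from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, expected_mean):
    """t and degrees of freedom for a sample mean against a pre-specified mean."""
    n = len(sample)
    return (mean(sample) - expected_mean) / (stdev(sample) / sqrt(n)), n - 1

def paired_t(before, after):
    """Paired test: a one-sample test on the within-person differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)

def independent_t(group_a, group_b):
    """Independent test with pooled variance (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2

first = [10, 12, 11, 9, 8]      # made-up data
second = [12, 13, 13, 10, 11]
print(one_sample_t(first, 8))
print(paired_t(first, second))       # same people measured twice
print(independent_t(first, second))  # two different groups
```

With the same numbers, the paired test gives a larger t than the independent test because it removes the variation between individuals.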
What are the 3 questions answered by each type of T-test ?
- can I be certain that the difference between groups is not due to random chance ?
- how big is the difference ?
- is this difference important ?
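In Student's t-test, what is the t-value ?
The degree to which the difference can be explained by the group: the difference between the means relative to the variability in the data
It is compared to a critical (threshold) value, the value of t at which p=0.05
p≤0.05 (|t| above the critical value) : H0 is rejected in favor of Ha, and the difference is significant
p>0.05 (|t| below the critical value) : H0 is not rejected, and the difference is NOT significant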