Intro To Statistics Flashcards
What are statistics useful for ?
- to make informed decisions
- to establish cause and effect
- to predict what is likely to happen in the future
- to ensure credibility
- to prepare for emergencies
- to test the effectiveness of interventions
- to improve the quality of assessment procedures, treatment procedures and services
What are the domains of statistics ?
- descriptive
- correlational
- inferential
What is the aim of descriptive statistics ?
Organizing, summarizing, describing data
What is the aim of correlational statistics ?
Describing relationships between variables
What is the aim of inferential statistics ?
Generalizing from a sample to the population
What is a datum ?
A datum is a value or symbol assigned to a quantitative or qualitative variable
What is a variable ?
It is a logical attribute or a set of attributes of a thing, a place or a person
What is measurement ?
Assigning numbers to objects, events, observations or abstract concepts according to a known set of rules which permits data to be categorized, quantified and/or analyzed so that meaningful conclusions can be drawn.
What is a dependent variable ?
Variable affected by the independent variable
It responds to changes in the independent variable
It’s the presumed effect
What is the independent variable ?
Variable presumed to influence the other variables
It is the presumed cause (the effect could be caused by something else) : the hypothesis has to be tested statistically through inferential statistics
What are the two types of data ?
Quantitative
Qualitative
What is quantitative data ?
The values of the variables are numerical
What is qualitative data ?
The values of the variables are not numerical
What are the two types of quantitative variables ?
- Discrete variables
- continuous variables
And further, by level of measurement :
- interval
- ratio
What are the characteristics of a discrete variable ?
It is collected through counting
It is non decimal
It may have derived units of measurement
The level of measurement is interval or ratio
What are the characteristics of continuous variable ?
Collected through measurement
Is decimal
It systematically has a unit of measurement
The level of measurement is interval or ratio
Is an average a continuous or discrete variable ?
Continuous
What are the types of qualitative data ?
- ordinal
- nominal
What are the different scales of measurement ?
Qualitative :
- nominal
- ordinal
Quantitative :
- discrete -> interval/ratio
- continuous -> interval/ratio
Which scale doesn’t have a hierarchy ?
Nominal scale
Which scales are in ranked order ?
Ordinal
Discrete
Continuous
Give an example of ordinal scale ?
Muscle power
Give an example of interval scale ?
Temperature in degrees Celsius (BMI has a meaningful zero, so it fits the ratio scale better)
Give an example of ratio scale ?
Speed of run
What are the criteria to classify variables ?
- can the variable be measured or counted ?
- are observations in ranked order ?
- are observations separated by equal distance or size ?
- does the variable have a meaningful zero value ?
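The criteria above can be read as a small decision procedure. The sketch below is only an illustration: the function name `classify_scale` and its boolean parameters are assumptions, it uses the last three criteria only, and it ignores the exceptions that some ordinal scales show.

```python
def classify_scale(ranked: bool, equal_intervals: bool, meaningful_zero: bool) -> str:
    """Return the scale of measurement suggested by the classification criteria."""
    if not ranked:
        return "nominal"   # names only, no order
    if not equal_intervals:
        return "ordinal"   # ordered, but spacing is unknown or uneven
    if not meaningful_zero:
        return "interval"  # equal spacing, arbitrary zero
    return "ratio"         # equal spacing and a meaningful zero


# Speed of run: ranked, equal intervals, meaningful zero.
print(classify_scale(ranked=True, equal_intervals=True, meaningful_zero=True))
# Muscle power grading: ranked, but intervals are not equal.
print(classify_scale(ranked=True, equal_intervals=False, meaningful_zero=True))
```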
What are the characteristics of the nominal scale ?
Names to identify or categorize
Useful for quantifying qualitative data
No order, no magnitude, no comparison
Observations must fall into only 1 category
There must be sufficient categories for all observations
Can be neither counted nor measured
There is no order or hierarchy
No interval
No meaningful zero
What are the characteristics of ordinal scale ?
Symbols/names of order and/or rank
Used to arrange data into series or order of occurrence
Intervals between points are either unknown or uneven
Observations must fall into 1 category
There must be sufficient categories for all observations
- not measured or counted (some exceptions)
- have rank order or size or chronology
- do not have equal distances or equal sizes between categories
- do not have meaningful zero value (some exceptions)
What are the exceptions in ordinal scales ?
- some ordinal data are discrete and quantitative
- some ordinal data have meaningful zero
Give examples of ordinal scales with a meaningful zero value
Medical research council scale (muscle power)
Modified Ashworth scale (spasticity)
Tendon jerk grading (reflex)
What are the characteristics of interval scale ?
Observations are classified into mutually exclusive and exhaustive categories
There is a clear relationship between observations
Observations have a common unit of measurement
Order and magnitude of observations are known
Equal magnitude/interval/size between contiguous points
No meaningful zero value
- can be counted or measured
- have rank order or size
- have equal distance between categories
- do not have meaningful zero
What are the characteristics of ratio scale ?
All characteristics of the interval scale, except that it also has a meaningful zero value
- either counted or measured
- have rank or size order
- have equal distances between categories
- have meaningful zero value
What is «meaningful zero value» ?
Zero represents the absence of the quantity, so it is not possible to have negative values
Is there an absolute zero ?
If values under zero are possible then zero is not meaningful = no meaningful zero value
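A short sketch of why the zero matters, using temperature as an assumed example (it is not taken from the cards). Celsius allows values under zero, so ratios of Celsius readings are misleading; Kelvin has an absolute zero, so ratios are meaningful. The helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin, which has an absolute zero."""
    return celsius + 273.15


# On the Celsius scale 20 looks like "twice" 10, but zero is arbitrary there.
ratio_celsius = 20 / 10

# On the Kelvin scale the same two temperatures differ by only a few percent.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(ratio_celsius, round(ratio_kelvin, 3))
```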
What are the different types of data collection techniques ?
Observations
Tests and assessments
Surveys
Document analysis (published articles)
Interviews
What is secondary data ?
Data someone else has collected
What is primary data ?
Data you collected
What are the disadvantages of secondary data ?
- may be out of date
- may not have been collected for long enough to detect trends
- there may be missing information
- it may be incomplete
- you have no control over data quality
What are the advantages of secondary data ?
- saves time
- saves money
- easily accessible
- increases the feasibility of multicentre/international collaboration (data can be shared by researchers)
What are the challenges of primary data collection ?
- can be expensive to collect
- selection of population or sample
- difficulty recruiting participants
- pretesting/piloting the instrument to determine the presence or absence of measurement bias
What is a sample ?
A smaller group with similar characteristics, drawn from within a population
A subset of the population. It is important that the subset represents the population to ensure external validity (generalizability)
What is a population ?
A group whose members have something in common
What are the probabilistic sampling methods ?
Simple random
Stratified random
Systematic random
Clustered random
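Three of the probabilistic methods above can be sketched with Python's standard `random` module. The population of 100 participant IDs, the two strata and the sample size of 10 are all hypothetical, and cluster sampling is left out for brevity.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is reproducible
population = list(range(1, 101))  # hypothetical participant IDs 1..100

# Simple random: every member has the same chance of being selected.
simple = rng.sample(population, 10)

# Systematic random: pick a random start, then take every k-th member.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random: draw a separate random sample inside each stratum.
strata = {"ids_1_to_50": population[:50], "ids_51_to_100": population[50:]}
stratified = [pid for members in strata.values() for pid in rng.sample(members, 5)]

print(simple, systematic, stratified, sep="\n")
```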
What are the non probabilistic sampling methods ?
Convenience
Purposive
Snowball
What are descriptive statistics used for ?
- to summarize data
- to describe data
- to present data
What are the different types of descriptive statistics ?
- measures of frequency : count, percent and frequency
- measures of central tendency : mean, median and mode
- measures of dispersion or variability : range, variance, standard deviation and interquartile range
- measures of position and rank : percentile ranks, quartile
What is the purpose of measures of frequency ?
Shows how often an observation occurs
What is the purpose of measures of central tendency ?
Locates the distribution by various points ; shows the average or the most common score
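A minimal sketch of count and percent for a small set of made-up observations (the `grades` list is hypothetical), using `collections.Counter`:

```python
from collections import Counter

grades = ["A", "B", "B", "C", "B", "A"]  # hypothetical observations

counts = Counter(grades)  # count: how often each value occurs
total = len(grades)
percent = {grade: 100 * count / total for grade, count in counts.items()}

for grade in sorted(counts):
    print(grade, counts[grade], round(percent[grade], 1))
```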
What is the purpose of measures of dispersion or variability ?
Identifies the spread of scores by stating intervals
What is the range ?
The difference between the highest value and the lowest value in a dataset
Range = (maximum value - minimum value)
What is the variance or standard deviation ?
Variance and standard deviation measure how far the observed scores are spread around the mean
What is the interquartile range ?
The difference between the third quartile and the first quartile
What is the purpose of measures of position and rank ?
Describe how scores fall in relation to one another
Compares scores to a normalized score
How to calculate the mean ?
Also called average
Add up the values for each case and divide by the total number of cases
Excel function = AVERAGE()
What can affect the mean ?
Outliers
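A small sketch of how a single outlier pulls the mean, using Python's `statistics.mean` as a stand-in for the Excel function. The scores are made up for illustration.

```python
from statistics import mean

scores = [10, 12, 11, 13, 12]  # hypothetical scores
with_outlier = scores + [60]   # the same scores plus one extreme value

print(mean(scores))        # sum of values divided by number of cases
print(mean(with_outlier))  # noticeably higher because of the single outlier
```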
What is the median ?
When all the values of a dataset are ranked in order, the value in the position that divides them into two equal halves is the median. It’s the same as the 50th percentile.
Excel function = MEDIAN()
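As a sketch, Python's `statistics.median` does the ranking itself. With an even number of values it averages the two middle ones. The data are hypothetical.

```python
from statistics import median

odd_count = [7, 3, 9, 1, 5]  # ranked: 1 3 5 7 9, middle value is 5
even_count = [7, 3, 9, 1]    # ranked: 1 3 7 9, middle pair is 3 and 7

median_odd = median(odd_count)
median_even = median(even_count)  # average of the two middle values

print(median_odd, median_even)
```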
What is the mode ?
The mode is the observation with the largest frequency
Excel function = MODE()
There can be several modes (ex: bimodal)
With class intervals, an interval can represent the mode (the modal class)
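A sketch of single and multiple modes using Python's `statistics` module, with made-up observations. `multimode` returns every value that shares the largest frequency.

```python
from statistics import mode, multimode

unimodal = [1, 2, 2, 3]
bimodal = [1, 2, 2, 3, 3, 4]

single_mode = mode(unimodal)    # the most frequent observation
all_modes = multimode(bimodal)  # both 2 and 3 occur twice

print(single_mode, all_modes)
```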
Can we find the mean in class intervals ?
No, only an estimate of the mean, using the midpoint of each class
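The midpoint estimate can be sketched as follows. The three class intervals and their frequencies are hypothetical; each class is treated as if all of its observations sat at the midpoint.

```python
# Each tuple is (lower limit, upper limit, frequency) for one class interval.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

total = sum(freq for _, _, freq in classes)

# Weight each class midpoint by its frequency, then divide by the total count.
estimated_mean = sum(((low + high) / 2) * freq for low, high, freq in classes) / total

print(estimated_mean)
```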
What is the formula to calculate a percentile ?
Percentile rank = (C + 0.5F) / N × 100
Where :
C is the number of observations lower than the observation of interest
F the frequency of observation of interest
N the total number of observations
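The formula can be sketched directly, with the result expressed as a percentage (the proportion multiplied by 100). The function name `percentile_rank` and the scores are assumptions for illustration.

```python
def percentile_rank(values, observation):
    """Percentile rank of `observation` within `values`, as a percentage."""
    c = sum(1 for v in values if v < observation)   # C: observations below it
    f = sum(1 for v in values if v == observation)  # F: frequency of the observation
    n = len(values)                                 # N: total number of observations
    return 100 * (c + 0.5 * f) / n


scores = [55, 60, 60, 70, 80]  # hypothetical scores
rank_of_60 = percentile_rank(scores, 60)
print(rank_of_60)  # C = 1, F = 2, N = 5
```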
What happens when using the Excel formula PERCENTILE.EXC ?
You cannot calculate the lowest percentile (0th percentile) or the highest percentile (100th percentile) ; only percentiles strictly between them can be calculated (for example the 1st to the 99th, given enough observations)
What happens when using the Excel formula PERCENTILE.INC ?
You can calculate the lowest percentile (0th percentile) and the highest percentile (100th percentile)
We prefer to use this formula
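Python's `statistics.quantiles` has an `exclusive` and an `inclusive` method. These appear to follow the same two conventions as the Excel functions, but that correspondence is an assumption here, not something stated in the cards. The sketch shows that the two methods can give different quartiles for the same hypothetical data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# Exclusive: the data are treated as a sample, so the extremes are not
# assumed to be the 0th and 100th percentiles.
exclusive = quantiles(data, n=4, method="exclusive")

# Inclusive: the minimum is the 0th percentile and the maximum the 100th.
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)
print(inclusive)
```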
What is the interquartile range ?
Q3-Q1
What is Q1 ?
The value occupying the 1/4 position of all values arranged in a ranked order (or the median of the 1st half of all observations)
What is Q3 ?
The value occupying the 3/4 position of all values arranged in a ranked order (or the median of the 2nd half of all observations)
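The "median of each half" definition can be sketched as below. The function name `quartiles_by_halves` is hypothetical. Leaving the middle value out of both halves when the count is odd is one convention among several, so other tools may return slightly different quartiles.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # 1st half of the ranked observations
    upper = ordered[-half:]  # 2nd half; the middle value is skipped if the count is odd
    return median(lower), median(upper)


q1, q3 = quartiles_by_halves([2, 4, 6, 8, 10, 12, 14, 16])
iqr = q3 - q1  # interquartile range = Q3 - Q1
print(q1, q3, iqr)
```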
What is variance ?
Variance is a measure of how close together or far apart the values in a dataset are
The larger the variance, the further the individual values are from the mean
The smaller the variance, the closer the individual values are to the mean
Excel function : VAR()
What is standard deviation ?
The square root of variance
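A sketch of range, variance and standard deviation on made-up data. Excel's VAR() gives the sample variance (dividing by n - 1), which matches `statistics.variance`. The population versions (`pvariance`, `pstdev`) divide by n and are shown for comparison.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)         # maximum value - minimum value
sample_variance = variance(data)           # divides by n - 1
population_variance = pvariance(data)      # divides by n
population_sd = pstdev(data)               # square root of the population variance

print(data_range, sample_variance, population_variance, population_sd)
print(sqrt(population_variance) == population_sd)
```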
What is a positively skewed frequency distribution ?
Right tailed
It is not normal, it is asymmetrical
What is a normal distribution ?
It is symmetrical
Mean value = mode = median
What is a negatively skewed frequency distribution ?
Left tailed
It is not normal, it is asymmetrical
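The direction of the tail can be checked numerically. The sketch below uses a moment-based skewness coefficient, which is an assumption (the cards do not name a formula), and the three datasets are hypothetical. A positive result suggests a right tail, a negative result a left tail, and zero a symmetrical distribution.

```python
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed SD."""
    m = mean(values)
    sd = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]       # a few very high values
left_tailed = [1, 17, 18, 18, 19, 19, 19, 20]  # a few very low values
symmetrical = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positively skewed
print(skewness(left_tailed) < 0)   # negatively skewed
print(skewness(symmetrical))       # zero for a symmetrical dataset
```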