RESS I: Data Analysis #1 Flashcards
What are the 5 As of practicing EBM?
Ask, Acquire, Appraise, Apply and Assess
What is a variable?
A particular characteristic being studied.
What is a dataset?
A collection of variables and observations.
What is categorical data?
Data that can only be assigned to one of a number of distinct categories, e.g. sex, blood type.
What is numerical data?
Data that take numerical values, e.g. age, weight.
How can categorical data be subdivided?
Nominal: No natural ordering e.g. sex or blood type
Ordinal: Data can be ordered e.g. severity or disease stage.
How can numerical data be subdivided?
Continuous: Data can take any value e.g. weight.
Discrete: Whole values only e.g. number of hospital visits.
What type of data is:
- Weight
- Sex
- Number of children
- Symptoms
- Disease Stage
- Weight
- BMI
- Pain (measured as ‘absent’, ‘mild’ or ‘severe’)
- Numerical continuous
- Categorical nominal
- Numerical discrete
- Categorical nominal
- Categorical ordinal
- Numerical continuous
- Numerical continuous
- Categorical ordinal
What is quantitative data?
Numerical data. It is measurable data.
What is qualitative data?
Non-numerical (descriptive) data.
How do you graphically present categorical data?
Pie chart, Bar chart, Frequency distribution table
How do you graphically represent numerical data?
Histogram, Box and Whisker Plot
What are scatterplots used for?
To display the relationship between two numerical (continuous) variables.
What does positively skewed data look like on a histogram?
The bulk of the values sits to the left and the longer tail stretches out to the right, towards higher values. The thinner ends of the distribution are called tails.
If one tail stretches out farther than the other, the histogram is skewed.
What does negatively skewed data look like on a histogram?
The bulk of the values sits to the right and the longer tail stretches out to the left, towards lower values. The thinner ends of the distribution are called tails.
If one tail stretches out farther than the other, the histogram is skewed.
What is the normal distribution?
A bell-shaped distribution that is symmetrical.
What is the explanatory variable?
The independent variable
What is the outcome variable?
The dependent variable
What type of descriptive statistics do you use on categorical data?
Frequency, Proportion and Percentages
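As a rough illustration of these three summaries, the sketch below counts each category and converts the counts to proportions and percentages. The blood-type values and the function name `frequency_table` are made up for the example.

```python
from collections import Counter

# Hypothetical blood-type observations, invented for illustration only.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]

def frequency_table(values):
    """Return {category: (frequency, proportion, percentage)}."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }

table = frequency_table(blood_types)
for category, (freq, prop, pct) in sorted(table.items()):
    print(f"{category}: frequency={freq}, proportion={prop:.3f}, percentage={pct:.1f}%")
```

The proportions always add up to 1 and the percentages to 100, which is a quick check on a hand-made table.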
What type of descriptive statistics do you use on numerical data?
Mean, Median, Mode, Range (and IQR), Standard Deviation
What is the mean?
The average value. This is calculated by adding up all the values and dividing the total by the number of values.
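A minimal sketch of that calculation, using made-up weights in kilograms (the data and the variable name `weights_kg` are assumptions for the example):

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical weights in kg, invented for illustration only.
weights_kg = [60, 72, 65, 80, 73]

# The total is 350 and there are 5 values, so the mean is 70.0.
print(mean(weights_kg))
```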
What is the median?
The mid-point of the ordered measurement values.
Defined as the value above and below which half (50%) of the measurements lie.
To calculate the median:
- Sort observations in numerical order
- Find the mid point
- If two values lie at the mid point, average them
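The three steps above can be sketched as follows. The example values are made up, and the function is an illustration rather than a replacement for a statistics package.

```python
def median(values):
    """Median via the three steps: sort, find the mid point, average if needed."""
    ordered = sorted(values)          # Step 1: sort in numerical order
    n = len(ordered)
    mid = n // 2                      # Step 2: find the mid point
    if n % 2 == 1:
        return ordered[mid]           # Odd count: a single middle value
    # Step 3: even count, so two values share the mid point; average them
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```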
What is the mode?
The most common value.
What is the range?
The difference between the highest and lowest data values. This indicates the extremes within which all measurements lie.
What is standard deviation?
Summarises the average spread of values around the mean. The larger the standard deviation, the further away the values are, on average, from the mean, i.e. the more spread out the values are.
How do you calculate the standard deviation?
- Calculate the mean
- Subtract the mean from every value
- Square these differences and add them up
- Divide this total by (n-1) = variance
- Take the square root = standard deviation
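The steps above can be followed one by one in code. This is a sketch of the sample standard deviation (dividing by n-1, as in the steps); the data values and the function name are assumptions for the example.

```python
import math

def sample_standard_deviation(values):
    """Sample standard deviation, following the flashcard steps."""
    n = len(values)
    mean = sum(values) / n                        # Calculate the mean
    differences = [x - mean for x in values]      # Subtract the mean from every value
    sum_of_squares = sum(d * d for d in differences)  # Square and add up
    variance = sum_of_squares / (n - 1)           # Divide by (n-1) = variance
    return math.sqrt(variance)                    # Square root = standard deviation

# Hypothetical values, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean is 5, the squared differences add up to 32, so the variance is 32/7.
print(sample_standard_deviation(values))
```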
How do you calculate interquartile range?
- Order the data
- Divide into two halves using the median (exclude median)
- Lower quartile = median of lower half
- Upper quartile = median of upper half
- Interquartile range = upper quartile - lower quartile
This shows the spread of values around the median.
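A sketch of the method described above, where the interquartile range is the upper quartile minus the lower quartile. It assumes the "exclude the median" convention from the steps; statistical software often uses other quartile conventions and may give slightly different answers. The function names and data are made up for the example.

```python
def median(values):
    """Median of a list: middle value, or average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Lower and upper quartiles as medians of the two halves (median excluded)."""
    ordered = sorted(values)                 # Order the data
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count the median itself is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(upper_half)

def interquartile_range(values):
    lower_quartile, upper_quartile = quartiles(values)
    return upper_quartile - lower_quartile

# Hypothetical values, invented for illustration only.
data = [1, 3, 5, 7, 9, 11, 13]
# Lower half is [1, 3, 5], upper half is [9, 11, 13], so the quartiles are 3 and 11.
print(quartiles(data), interquartile_range(data))
```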
What measures do you use for normally distributed data?
Mean and standard deviation
Can also use mode and range
What measures do you use for skewed data?
Median and interquartile range
Can also use mode and range
What are Box and Whisker Plots?
Box-plots are graphical representations of the average, spread and extreme values. They display the median, inter-quartile range and range. They can be used for data which are skewed (or not).
If data are normally distributed, median is approximately equal to the mean.
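The numbers a box plot displays can be collected as a five-number summary: minimum, lower quartile, median, upper quartile and maximum. The sketch below computes them with the "exclude the median" quartile convention; plotting libraries may use a different convention and often draw outliers separately from the whiskers. The function names and data are assumptions for the example.

```python
def _median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count the median itself is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],            # lowest value (end of lower whisker)
        _median(lower_half),   # lower quartile (bottom of box)
        _median(ordered),      # median (line inside box)
        _median(upper_half),   # upper quartile (top of box)
        ordered[-1],           # highest value (end of upper whisker)
    )

# Hypothetical values, invented for illustration only.
data = [1, 3, 5, 7, 9, 11, 13]
minimum, q1, med, q3, maximum = five_number_summary(data)
print("range:", maximum - minimum, "IQR:", q3 - q1, "median:", med)
```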