Understanding Data Flashcards
Why are statistical methods important?
- Used for evidence-based research
- Applied in fields such as the social sciences, epidemiology, and business and marketing
Define data analysis.
The process of inspecting, cleansing, transforming, and modelling data with the aim of gaining useful insight (or information) to help support decision making.
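The four steps in this definition can be sketched as a tiny pipeline. This is a minimal illustration only: the raw data and the helper names `inspect`, `cleanse`, `transform`, and `model` are hypothetical, and "model" is just a mean standing in for a real statistical model.

```python
from statistics import mean

# Hypothetical raw survey responses: ages recorded as text, some missing or invalid
raw_ages = ["25", "57", "", "forty", "33", "41", None]

def inspect(values):
    """Inspect: count how many entries there are and how many are missing."""
    missing = sum(1 for v in values if v in ("", None))
    return {"total": len(values), "missing": missing}

def cleanse(values):
    """Cleanse: keep only entries that are strings of digits (drops '', None, 'forty')."""
    return [v for v in values if isinstance(v, str) and v.isdigit()]

def transform(values):
    """Transform: convert the cleaned strings to integers."""
    return [int(v) for v in values]

def model(values):
    """Model: a simple summary (the mean) as a stand-in for a real model."""
    return mean(values)

print(inspect(raw_ages))  # {'total': 7, 'missing': 2}
ages = transform(cleanse(raw_ages))
print(ages)               # [25, 57, 33, 41]
print(model(ages))        # 39
```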
What is the DIKW pyramid?
DIKW is a useful framework for describing the relationship between data, information, knowledge, and wisdom, i.e., the structural ‘stages’ one must go through to gain knowledge and wisdom.
What does DIKW stand for?
Data - Information - Knowledge - Wisdom
Define data in terms of DIKW.
Raw facts.
Define information in terms of DIKW.
Contents of a database assembled from raw facts.
Define evidence in terms of DIKW.
Results of analysis of many datasets or scenarios.
Define knowledge in terms of DIKW.
Personal knowledge about places and issues.
Define wisdom in terms of DIKW.
Policies developed and accepted by stakeholders.
List the three main facets statistics is composed of.
- design
- description
- inference
Describe the design part of statistics.
How to collect the data (e.g., probabilistic sampling approaches).
Describe the description part of statistics.
- Describing the way the data looks
- Summarising the data that has been collected
Describe the inference part of statistics.
- Making predictions about the wider population or about the future
- Formally, this is called statistical inference
Define population.
The entire set of possible subjects we wish to study, e.g., states, individuals, businesses.
Define sample.
The subset of subjects chosen for study through data collection.
Define parameter.
A numerical summary about the OVERALL population.
Define statistic.
A numerical summary of the sample data.
Why do we tend to use statistics instead of parameters?
Because we rarely know the true population parameters; sample statistics are used to estimate them.
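A small simulation can make population, sample, parameter, and statistic concrete. This is a sketch under stated assumptions: the population of incomes is synthetic, and the sample is drawn by simple random sampling (one probabilistic sampling approach). In real research the population mean would be unknown; here we can compute it only because we generated the population ourselves.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic population: 10,000 hypothetical household incomes
population = [random.randint(10_000, 90_000) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population (usually unknown in practice)
population_mean = mean(population)

# Sample: a simple random sample of 500 subjects drawn without replacement
sample = random.sample(population, k=500)

# Statistic: the same summary computed on the sample, used to estimate the parameter
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.0f}")
print(f"statistic (sample mean):     {sample_mean:.0f}")
```

The sample mean will not equal the population mean exactly, but with a random sample of this size it should land close to it.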
Which two kinds of information do descriptive statistics typically provide?
- A measure of central tendency
- A measure of variability
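As a quick illustration, Python's standard `statistics` module can compute both kinds of measure. The class sizes below are hypothetical; the mean and median describe central tendency, and the sample standard deviation describes variability.

```python
from statistics import mean, median, stdev

# Hypothetical sample: number of students in five classes
class_sizes = [22, 25, 25, 28, 30]

# Central tendency: where the "middle" of the data lies
print(mean(class_sizes))    # 26
print(median(class_sizes))  # 25

# Variability: how spread out the data are (sample standard deviation, n - 1 denominator)
print(round(stdev(class_sizes), 2))  # 3.08
```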
Define variable.
Anything that we can measure about the subjects in our sample.
What falls under continuous variables?
- Interval
- Ratio
What falls under categorical variables?
- Nominal
- Ordinal
Describe the levels of measurement from lowest to highest.
(Lowest) Nominal < Ordinal < Interval < Ratio (Highest)
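Because the levels are ordered, each higher level supports everything the lower levels do plus more. The sketch below encodes that ordering with an `IntEnum`; the mapping of levels to summaries is a commonly taught simplification (after Stevens' classification), and `permissible_summaries` is a hypothetical helper, not a standard API.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def permissible_summaries(level: Level) -> list[str]:
    """Each level keeps the summaries of the levels below it and adds its own."""
    summaries = ["frequency counts", "mode"]  # valid at every level
    if level >= Level.ORDINAL:
        summaries.append("median")
    if level >= Level.INTERVAL:
        summaries += ["mean", "standard deviation", "differences"]
    if level >= Level.RATIO:
        summaries.append("ratios")
    return summaries

print(Level.NOMINAL < Level.ORDINAL < Level.INTERVAL < Level.RATIO)  # True
print(permissible_summaries(Level.ORDINAL))  # ['frequency counts', 'mode', 'median']
```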
Define discrete variables.
Contains data with countable values, e.g., the number of crimes in London in the last month or the number of students in a class.
Define continuous variables.
Contains data with measurable values on a continuous scale, e.g., age (in years: 25, 57, etc.) or height (in meters).
Define categorical variables.
Has categories or groups, e.g., gender, ethnicity, or employment status.
List the characteristics of nominal measures.
- Categorical measure
- Discrete set of categories with no natural order
- Used to distinguish groups with labels
- May be referred to as a qualitative or categorical variable
- It is the lowest level of measurement
Give examples of nominal measures.
e.g. Gender:
0 = Female
1 = Male
e.g. Race:
1 = Asian
2 = Black
3 = White
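The numeric codes in these examples are only labels. The sketch below uses the race coding from the example with some hypothetical responses: frequency counts and the mode are meaningful, while the mean of the codes is not.

```python
from collections import Counter
from statistics import mode

# Race coded as in the example: 1 = Asian, 2 = Black, 3 = White
labels = {1: "Asian", 2: "Black", 3: "White"}
responses = [3, 1, 3, 2, 3, 1]  # hypothetical survey responses

# Frequency counts and the mode are meaningful for nominal data
counts = Counter(labels[r] for r in responses)
print(counts)                   # Counter({'White': 3, 'Asian': 2, 'Black': 1})
print(labels[mode(responses)])  # White

# The mean of the codes (13 / 6, about 2.17) can be computed but has no meaning:
# the numbers are only labels, so 2.17 is not a race category.
```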
List the characteristics of ordinal measures.
- Categorical measure
- Discrete set of categories that have some natural order
- Categories have rankings, but the difference between rankings is not known
- Order matters!
- It is the 2nd lowest level of measurement
Give examples of ordinal measures.
- Likert scale (strongly disagree, disagree, neutral, agree, strongly agree)
- Socioeconomic status:
1 = Working class (Low)
2 = Middle class
3 = Upper class (High)
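For ordinal data the order of categories matters, so a median is meaningful even though the gaps between categories are unknown. A minimal sketch, assuming a hypothetical coding of the Likert scale as ranks 1 (strongly disagree) to 5 (strongly agree) and made-up responses:

```python
from statistics import median_low

# Likert categories listed in their natural order (rank = position + 1)
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
responses = ["agree", "neutral", "strongly agree", "disagree", "agree", "agree"]

# Convert each label to its rank; only the ORDER of ranks is assumed,
# not that the gap from 1 to 2 equals the gap from 4 to 5
ranks = [scale.index(r) + 1 for r in responses]

# median_low always returns an actual observed rank instead of averaging
# two middle ranks, so the result maps back to a real category
print(scale[median_low(ranks) - 1])  # agree
```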
List the characteristics of interval measures.
- Continuous measure
- Unlike an ordinal variable, the differences between values are known and equal (they must be known to calculate an interval)
- Zero is arbitrary (a measurement of zero does not indicate that the quantity is nonexistent)
- 2nd highest level of measurement
Give examples of interval measures.
e.g. Temperature in degrees Celsius: the difference between 78 and 79 degrees is the SAME as the difference between 45 and 46 degrees
- A measurement of zero degrees Celsius doesn’t indicate that there is no temperature; it only means the temperature is at the freezing point of water
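Because zero on the Celsius scale is arbitrary, differences are meaningful but ratios are not. The sketch below compares two hypothetical readings (10 °C and 20 °C) and converts them with the standard Celsius-to-Fahrenheit formula: the "twice as hot" ratio does not survive the change of units.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32

# Differences are meaningful on an interval scale: both pairs differ by 1 degree
print(79 - 78 == 46 - 45)  # True

# Ratios are not: "20 °C is twice 10 °C" changes when the unit changes,
# because each scale puts its zero point in a different, arbitrary place
a_c, b_c = 10.0, 20.0
print(b_c / a_c)                                                # 2.0
print(celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c))  # 1.36 (68 / 50)
```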
List the characteristics of ratio measures.
- Continuous measure
- Most precise (highest level of measurement)
- Exact value
- Unlike an interval measure, a zero value means that there’s “nothing” there (zero is not arbitrary)
Give examples of ratio measures.
- Weight
- Height
- Income
- House price
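On a ratio scale zero means "none", so ratio statements hold regardless of the unit. As a contrast to the temperature case, the sketch below converts two hypothetical weights from kilograms to pounds (using the exact definition 1 lb = 0.45359237 kg) and shows the ratio stays 2.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound

def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds."""
    return kg / KG_PER_LB

heavy_kg, light_kg = 80.0, 40.0

# 80 kg is twice 40 kg, and still twice as heavy when measured in pounds,
# because both units share the same true zero (no weight at all)
print(heavy_kg / light_kg)                    # 2.0
print(kg_to_lb(heavy_kg) / kg_to_lb(light_kg))  # 2.0
```

The same reasoning applies to the other examples: an income of 60,000 is twice an income of 30,000 in any currency unit.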
Define a dependent variable (outcome, event).
The variable to be explained, described or understood.
How is the dependent variable mathematically denoted?
As the variable Y.
List two characteristics of dependent variables.
- Should depend upon something else
- Should NOT affect the independent variable
Why should dependent variables vary?
If the dependent variable (DV) is constant, you will not be able to explain the effect of other variables on it.
Define an independent variable.
The variable presumed to be the determinant or cause of the dependent variable, or something that otherwise affects it.
What other terms can be used to describe the independent variable?
Explanatory or predictor variables and risk factors.
How is the independent variable mathematically denoted?
As the variable X.
List the 3 types of descriptive statistics.
- Univariable analysis
- Bivariable analysis
- Multivariable analysis
Define univariable analysis.
Analysis of only one variable (a single characteristic of the subjects).
Give examples of univariable analyses and describe them.
- Frequency distributions: a count or distribution of values of a single variable
- Other descriptive statistics: summary measures that describe the data in ways not obvious from looking at the frequency distribution
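A frequency distribution for one variable is just a count of each value. The sketch below uses `collections.Counter` on hypothetical employment-status responses and also prints each value's share of the total.

```python
from collections import Counter

# Hypothetical single variable: employment status of ten survey respondents
status = ["employed", "unemployed", "employed", "student", "employed",
          "retired", "student", "employed", "unemployed", "employed"]

# Frequency distribution: how often each value of the variable occurs
freq = Counter(status)
for value, count in freq.most_common():
    share = count / len(status)
    print(f"{value:<11} {count:>2}  ({share:.0%})")
```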
Define bivariable analysis.
Analysis of two variables.
Give examples of bivariable analyses.
- Simple scatter plots
- Cross-tabulations
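A cross-tabulation counts every combination of two categorical variables. The sketch below builds one from hypothetical (gender, employment status) pairs using only the standard library; missing combinations show up as 0 because `Counter` returns 0 for absent keys.

```python
from collections import Counter

# Hypothetical paired observations: (gender, employment status) per respondent
pairs = [("Female", "employed"), ("Male", "employed"), ("Female", "student"),
         ("Male", "unemployed"), ("Female", "employed"), ("Male", "employed")]

# Cross-tabulation: count each combination of the two variables
table = Counter(pairs)
rows = sorted({g for g, _ in pairs})
cols = sorted({s for _, s in pairs})

# Print rows = gender, columns = employment status
print("".ljust(8) + "".join(c.ljust(12) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[(r, c)]).ljust(12) for c in cols))
```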
Define multivariable analysis.
Analysis of three or more variables.
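A simple descriptive multivariable analysis is summarising one variable across every combination of two others. The sketch below computes mean income by gender and employment status; the records are hypothetical and this is only one illustrative form of multivariable analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records with three variables: (gender, employment status, income)
records = [
    ("Female", "employed", 32_000), ("Female", "employed", 36_000),
    ("Male", "employed", 34_000), ("Male", "unemployed", 8_000),
    ("Female", "student", 9_000), ("Male", "employed", 38_000),
]

# Group income by each combination of the other two variables
groups = defaultdict(list)
for gender, status, income in records:
    groups[(gender, status)].append(income)

for (gender, status), incomes in sorted(groups.items()):
    print(gender, status, mean(incomes))
```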