IRM describing data Flashcards
Statistics help us to:
describe complex information in a simple manner
Learning from data in 3 steps
Previous Steps
Form a hypothesis
Test: Does data support hypothesis?
Aggregated data is
Summarised data relative to categories or levels
Raw data
Data that has not been aggregated: all the information as it was collected, not yet organised or summarised
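A minimal sketch of the difference between raw and aggregated data, using only the Python standard library. The records, group names and scores below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one (group, score) record per participant.
raw_data = [
    ("control", 12),
    ("control", 15),
    ("treatment", 20),
    ("treatment", 22),
    ("treatment", 18),
]

# Collect the raw scores under each category.
scores_by_group = defaultdict(list)
for group, score in raw_data:
    scores_by_group[group].append(score)

# Aggregated data: one summary (count and mean) per category.
aggregated = {
    group: {"n": len(scores), "mean": mean(scores)}
    for group, scores in scores_by_group.items()
}
print(aggregated)
```

The raw list keeps every observation; the aggregated dictionary keeps only a summary per category, so individual scores can no longer be recovered from it.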
Variables can be
Continuous or discrete, and independent or dependent
An example of a discrete variable
A discrete category such as male or female: one or the other, not both
An example of a continuous variable
A score that can lie anywhere on a continuum, e.g. 1-100
4 levels of measurement properties in variables are
Nominal, Ordinal, Interval or Ratio
Nominal variable
An arbitrary numerical label (this variable gives the least information), e.g. 1 for male, 2 for female. It is arbitrary in that female is assigned a greater number than male, yet the ordering means nothing
Ordinal Variable
Rankings on a class test are an example of an ordinal variable. Values can be ordered from high to low, but the ranks do not say how large the differences are, such as the gap between the highest and lowest scores
Interval Variable
Interval variables have a consistent unit of measurement and the numerical difference between any two values is meaningful. In contrast to nominal and ordinal variables, interval variables allow for meaningful mathematical operations, such as addition and subtraction
Ratio Variable
A ratio variable is a type of quantitative variable in statistics that has equal intervals and a meaningful zero point. Because zero means that none of the measured attribute is present, the ratio between two values is itself meaningful. Examples of ratio variables include length, weight, age, height, income, and many others.
Unlike interval variables, ratio variables have a true zero point, which represents the absence of the measured attribute. This allows for meaningful comparisons between measurements using ratios and proportions. For example, if one person’s income is twice that of another person, it means they earn twice as much money, not just that they earn more.
epistemic
Things we don’t know because of a lack of data or experience
aleatoric
Things that are inherently unpredictable because of randomness, like what number a die will show on the next roll
Uncertainty defined
Uncertainty relates to how the estimate might differ from the “true value” and these measures help users of ONS statistics to understand the degree of confidence in the outputs
4 measures of uncertainty
standard error
confidence interval
coefficient of variation
statistical significance
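A rough sketch of how these four measures could be computed for a sample mean. The sample values and the hypothesised mean are made up, and the confidence interval uses a normal approximation, which is an assumption (a t-distribution would suit a sample this small better).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements (made up for illustration).
sample = [48, 52, 50, 47, 53, 51, 49, 50]

n = len(sample)
estimate = mean(sample)
spread = stdev(sample)  # sample standard deviation

# Standard error: how much the sample mean would vary across samples.
standard_error = spread / sqrt(n)

# 95% confidence interval under a normal approximation (an assumption).
z = NormalDist().inv_cdf(0.975)
ci_low = estimate - z * standard_error
ci_high = estimate + z * standard_error

# Coefficient of variation: spread relative to the size of the estimate.
coefficient_of_variation = spread / estimate

# Rough significance check: does the interval exclude a hypothesised value?
hypothesised_mean = 45  # hypothetical comparison value
significant_at_5_percent = not (ci_low <= hypothesised_mean <= ci_high)

print(standard_error, (ci_low, ci_high), coefficient_of_variation)
print(significant_at_5_percent)
```

Checking whether the interval excludes a value is only a shortcut for a two-sided test at the 5% level; a formal test would state its hypotheses and report a p-value.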
2 different types of samples
representative: proportionate
convenience sample: potential for bias
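A small sketch contrasting the two sample types. The population, its 60/40 split and the group labels are hypothetical, and taking "the first ten people" stands in for whoever is easiest to reach.

```python
import random

# Hypothetical population: 60 people from group "A", then 40 from group "B".
population = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
sample_size = 10
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Convenience sample: whoever is easiest to reach (here, the first ten).
convenience_sample = population[:sample_size]

# Proportionate sample: draw from each group in proportion to its size.
representative_sample = []
for group in ("A", "B"):
    members = [person for person in population if person[0] == group]
    quota = round(sample_size * len(members) / len(population))
    representative_sample.extend(rng.sample(members, quota))


def share_of_group_a(sample):
    """Fraction of a sample that belongs to group "A"."""
    return sum(1 for group, _ in sample if group == "A") / len(sample)


print(share_of_group_a(convenience_sample))
print(share_of_group_a(representative_sample))
```

The convenience sample contains only group "A", while the proportionate sample reproduces the population's 60/40 split, which is where the potential for bias comes from.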
What are some alternative words that are less likely to imply causation? (e.g. rather than “effect”, which implies causality, we can say:)
association, relationship
What makes a good measurement
Validity and Reliability
Which validity asks: does the measurement make sense on its face?
Face validity
Which validity verifies that the measurement is related to other measurements in an appropriate way?
Construct validity.
Different measurements of the same construct being closely related to one another is known as
Convergent validity
The idea that measurements thought to reflect different constructs should be unrelated is known as
divergent validity
( If my theory of personality says that extraversion and conscientiousness are two distinct constructs, then I should also see that my measurements of extraversion are unrelated to measurements of conscientiousness.)
If our measurements are truly valid, then they should also be predictive of other outcomes
Predictive validity.
All variables must take on at least two different possible values, otherwise they would be a ...
Constant (rather than a variable)
different values of the variable can relate to each other in different ways, which we refer to as:
scales of measurement.
There are four ways in which the different values (features) of a variable can differ.
Identity:
Magnitude:
Equal intervals:
Absolute zero:
Different values: Each value of the variable has a unique meaning.
Identity
Different values:
The values of the variable reflect different _____ and have an ordered relationship to one another: that is, some values are larger and some are smaller.
Magnitude(s)
Different Values:
Units along the scale of measurement are equal to one another. This means, for example, that the difference between 1 and 2 would be equal in its magnitude to the difference between 19 and 20.
Equal intervals
Different values:
The scale has a true meaningful zero point. For example, for many measurements of physical quantities such as height or weight, this is the complete absence of the thing being measured.
Absolute zero:
There are four different scales of measurement that go along with these different ways that values of a variable can differ.
(4 ways to measure the variable)
Nominal scale.
Ordinal scale.
Interval scale.
Ratio scale.
Values of a Variable: satisfies the criterion of identity, such that each value of the variable represents something different, but the numbers simply serve as qualitative labels as discussed above. For example, we might ask people for their political party affiliation, and then code those as numbers: 1 = “Republican”, 2 = “Democrat”, 3 = “Libertarian”, and so on. However, the different numbers do not have any ordered relationship with one another.
Nominal scale.
Values of a variable
satisfies the criteria of identity and magnitude, such that the values can be ordered in terms of their magnitude. For example, we might ask a person with chronic pain to complete a form every day assessing how bad their pain is, using a 1-7 numeric scale. Note that while the person is presumably feeling more pain on a day when they report a 6 versus a day when they report a 3, it wouldn’t make sense to say that their pain is twice as bad on the former versus the latter day; the ordering gives us information about relative magnitude, but the differences between values are not necessarily equal in magnitude.
Ordinal scale.
Values of a variable
has all of the features of an ordinal scale, but in addition the difference between units on the measurement scale can be treated as equal. A standard example is physical temperature measured in Celsius or Fahrenheit; the physical difference between 10 and 20 degrees is the same as the physical difference between 90 and 100 degrees, but each scale can also take on negative values.
Interval scale.
Values of a variable
has all four of the features outlined above: identity, magnitude, equal intervals, and absolute zero. The difference between a _____ scale variable and an interval scale variable is that the _____ scale variable has a true zero point. Examples of _____ scale variables include physical height and weight, along with temperature measured in Kelvin.
Ratio Scale
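A short worked sketch of why the true zero matters, using the temperature examples from these cards. The two temperatures are arbitrary illustrative values.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful where zero means "none of the attribute".
naive_ratio = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot"
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(diff_c, diff_k)
print(naive_ratio, true_ratio)
```

The difference is the same on both scales, but the ratio changes completely once the zero point is a true zero: in Kelvin, 20 °C is only about 3.5% hotter than 10 °C.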
What are the two primary reasons for paying attention to the scale of measurement of a variable in research and statistics?
The scale determines the types of mathematical operations applicable to the data.
It indicates the kinds of statistics that can be computed on each type of variable.
What mathematical operations can be applied to nominal variables?
Nominal variables can only be compared for equality, i.e. whether two observations have the same value, since the numbers function as labels rather than as numbers.
What mathematical operations can be performed on ordinal variables?
Ordinal variables allow comparison for greater or lesser values, but arithmetic operations are not possible.
What mathematical operations are permitted on interval and ratio variables?
Interval variables allow addition and subtraction, while ratio variables also permit multiplication and division.
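The three cards above can be summarised as a cumulative lookup, where each scale inherits the operations of the scales before it. This is only a sketch: the function name and the operation labels are my own, not standard terminology.

```python
# Scales ordered from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations each scale adds on top of the scales before it.
ADDED_OPERATIONS = {
    "nominal": ["equality"],
    "ordinal": ["greater/less than"],
    "interval": ["addition", "subtraction"],
    "ratio": ["multiplication", "division"],
}


def permitted_operations(scale):
    """Return every operation allowed on `scale`, including inherited ones."""
    position = SCALES.index(scale)
    operations = []
    for name in SCALES[: position + 1]:
        operations.extend(ADDED_OPERATIONS[name])
    return operations


for scale in SCALES:
    print(scale, permitted_operations(scale))
```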
Can you provide examples of variables for each scale of measurement?
Nominal examples include gender or ethnicity. Ordinal examples include rankings such as socioeconomic status. Interval examples include temperature in Celsius or Fahrenheit. Ratio examples include height or weight.
What types of statistics are appropriate for nominal variables?
Nominal variables can be summarised with statistics such as the mode.
What types of statistics are appropriate for ordinal variables?
Ordinal variables can have median-based statistics.
What types of statistics are appropriate for interval variables?
Interval (and ratio) variables can have mean-based statistics.
Descriptive representations of information are known as what type of data?
Qualitative (often interviews and focus groups)
What types of statistics are appropriate for ratio variables?
Ratio (and interval) variables can have mean-based statistics.
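A brief sketch of matching the summary statistic to the scale, using Python's built-in statistics module. All three datasets are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data on three different scales.
party = ["Democrat", "Republican", "Democrat", "Libertarian"]  # nominal
pain_rating = [3, 6, 2, 6, 4]  # ordinal, on a 1-7 scale
height_cm = [160.0, 172.5, 181.0, 168.5]  # ratio

most_common_party = mode(party)      # nominal: the most frequent label
typical_pain = median(pain_rating)   # ordinal: the middle ranked value
average_height = mean(height_cm)     # interval/ratio: the arithmetic mean

print(most_common_party, typical_pain, average_height)
```

The mode only needs values to be comparable for equality, the median also needs an ordering, and the mean needs equal intervals, which mirrors the operations each scale permits.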
Qualitative data is collated and analysed to identify
Key Themes
Numerical representations of information are known as what type of data?
Quantitative Data (often observational and experimental studies)
Binary data can also be called…
discrete
A variable may also be called a
factor
a fundamental aspect of a variable is that it…..
varies
if something is not a variable it is a
constant
3 types of data
Binary, integer and Continuous
What is a term to describe a limited or non-diverse sample that may not be representative of the population?
Highly selective (e.g. all marshmallow test participants went to Stanford daycare)
What is binary data?
Binary data consists of only two possible values, typically represented as 0s and 1s.
What are integer numbers?
Integer numbers are whole numbers without any decimal or fractional parts.
What are real numbers?
Real numbers include all rational and irrational numbers, encompassing integers, fractions, and decimals.
What are the key characteristics of binary data?
Binary data is discrete, with only two possible values, and is commonly used in digital computing and communication systems.
What are the key characteristics of real numbers?
Real numbers are continuous and infinite, covering all possible values on the number line, including integers and fractions.
Name some key characteristics of integers?
Whole Numbers
No Fractional Parts
Infinite Set
Ordered
Closure Under Addition and Subtraction
Closure Under Multiplication (but not Division)
No Decimal or Fractional Representation
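A quick illustration of closure using two arbitrary integers. One example cannot prove closure; it only shows the pattern that addition, subtraction and multiplication of integers give integers, while division may not.

```python
a, b = 7, -3  # arbitrary example integers

results = {
    "sum": a + b,
    "difference": a - b,
    "product": a * b,
    "quotient": a / b,  # true division, returns a float
}

# Check which results are still whole numbers.
stays_integer = {
    name: float(value).is_integer() for name, value in results.items()
}
print(stays_integer)
```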
_____ measures the spread or dispersion of a set of data points around their mean. It quantifies how much the values in a dataset differ from the mean value.
variance
How does variance contribute to our understanding of statistical data?
Variance provides insight into the variability or spread of data points within a dataset, helping to assess the consistency or volatility of the data around the mean.
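A minimal sketch of the calculation: variance is the average squared deviation from the mean, dividing by n for a whole population or by n - 1 for a sample. The values are made up, and the results are checked against Python's statistics module.

```python
from statistics import mean, pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

centre = mean(values)
squared_deviations = [(x - centre) ** 2 for x in values]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(values)
sample_variance = sum(squared_deviations) / (len(values) - 1)

# The hand calculation agrees with the library functions.
assert abs(population_variance - pvariance(values)) < 1e-9
assert abs(sample_variance - variance(values)) < 1e-9

print(population_variance, sample_variance)
```

Squaring the deviations stops positive and negative differences from cancelling out, and it means points far from the mean contribute disproportionately to the result.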