Defining the Data Flashcards
What is a population?
The collection of all the individuals of interest
What is a sample?
The subset of the population that is selected as the result of sampling
What is a biased sample?
Study participants are not representative of the target population
What is an unbiased sample?
Study participants are representative of the target population
What is validity?
the extent to which the instruments that are used in the study measure exactly what they should be measuring
What is reliability?
the extent to which the results of the study are consistent when the study is repeated under the same conditions
What is a variable?
something whose value can change or vary
What is data?
the values we obtain when we measure a variable
What are the two types of variables?
1. Categorical “attributes”
2. Quantitative “numbers”
What are the two types of categorical attributes? And their meanings?
Nominal: Values are “names” that are unordered categories
Ordinal: Values are “names” that are ordered categories
What are the two types of quantitative numbers? And their meanings?
Discrete: Values are integer values 0, 1, 2 … on a proper numeric scale
Continuous: Values are a measured number of units, including possible decimal values
What are the two types of “continuous” quantitative numbers? And their meanings?
Interval: Interval scale variable has no true zero on the scale
Ratio: Ratio scale variable has true zero on the scale (0 just means the absence of something)
What are derived variables?
variables that you create by calculating or categorising variables that already exist in your data set
What are the two different types of derived variables?
Calculated
Categorized
What are threshold variables?
variables obtained by splitting the values of another variable into categories based on the values of well-known thresholds
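A minimal sketch of a threshold variable. The cut-offs are the commonly used adult BMI categories (18.5, 25, 30), chosen here only as an illustration and not taken from these cards; the function name `bmi_category` is hypothetical.

```python
def bmi_category(bmi: float) -> str:
    """Map a continuous BMI value to an ordered category using fixed cut-offs."""
    # Assumed thresholds: 18.5, 25 and 30 (illustrative, not from the cards).
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


bmi_values = [17.9, 22.4, 27.0, 31.5]
categories = [bmi_category(b) for b in bmi_values]
print(categories)
```

The original variable is continuous, while the derived one is ordinal.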
What is a transformed variable?
a variable obtained by transforming another variable onto a different measurement scale (e.g. taking square roots, squaring…)
What is an exposure variable?
a variable thought to predict an outcome variable
What is an outcome variable?
a variable thought to change as a function of changes in an exposure variable
What is the Center?
A representative or average value that indicates where the middle of the data set is located
What is variation in data?
A measure of how much the values vary among themselves and around the average value
What is distribution in data?
The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)
What are outliers in data?
Sample values that lie very far away from the vast majority of other sample values
What is time in data?
Changing characteristics of the data over time
What are the measures of central tendency?
Means, medians & modes
What is the central tendency?
the tendency for values in a group to cluster around a central or ‘average’ value which is typical of the group
Do extreme values affect the median?
Nope
Do extreme values affect the mean?
Yep
Do extreme values affect the mode?
Nope
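A small sketch of the three answers above, using made-up numbers: one value in a data set is replaced by an extreme value, and the mean, median and mode are compared before and after.

```python
import statistics

values = [2, 3, 3, 4, 5]
with_extreme = [2, 3, 3, 4, 500]  # the largest value replaced by an extreme one

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_extreme)

median_before = statistics.median(values)
median_after = statistics.median(with_extreme)

mode_before = statistics.mode(values)
mode_after = statistics.mode(with_extreme)

print("mean:  ", mean_before, "->", mean_after)      # shifts a lot
print("median:", median_before, "->", median_after)  # unchanged
print("mode:  ", mode_before, "->", mode_after)      # unchanged
```

Only the mean moves here, because it is the only one of the three that uses the size of every value.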
What is dispersion? (variability, scatter, spread)
how stretched or squeezed a distribution of values within a sample or a dataset is
Which percentiles give a good summary of a sample?
the “Five Number Summary” (P0, P25, P50, P75, P100)
What are the measures of dispersion?
Range, interquartile range, and standard deviation
What does a small standard deviation mean?
most data points are close to the mean
What does a large standard deviation mean?
data points are widely spread from the mean
What is a percentile?
The value below which a given percentage of the observations in a group fall
How do you calculate the IQR (interquartile range)?
Q3 - Q1
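What is Q1?
The 25th percentile (25% of the observations fall below it)
What is Q3?
The 75th percentile (75% of the observations fall below it)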
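What is the formula for the median when the sample size n is odd?
The value at position (n + 1)/2 in the sorted data
What is the formula for the median when the sample size n is even?
The average of the values at positions n/2 and n/2 + 1 in the sorted data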
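The two position rules for the median can be sketched as follows. The function name `median_by_position` is hypothetical; positions are counted from 1 in the sorted data, so they are shifted by one for Python's 0-based lists.

```python
def median_by_position(values):
    """Median via the (n + 1)/2 rule (odd n) or the n/2 and n/2 + 1 rule (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle value
        return ordered[position - 1]
    lower = ordered[n // 2 - 1]          # value at 1-based position n/2
    upper = ordered[n // 2]              # value at 1-based position n/2 + 1
    return (lower + upper) / 2


odd_result = median_by_position([38, 51, 46, 79, 57])   # middle of 5 sorted values
even_result = median_by_position([38, 51, 46, 79])      # average of the 2 middle values
print(odd_result, even_result)
```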
A garden contains 39 plants. The following plants were chosen at random, and their heights were recorded in cm: 38, 51, 46, 79, and 57. Calculate their heights’ standard deviation.
https://byjus.com/maths/standard-deviation-questions/
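A worked sketch of this calculation. It assumes the five heights are a sample drawn from the 39 plants, so the sample standard deviation (dividing by n - 1) is used; dividing by n instead would give the population formula and a smaller value.

```python
import math

heights = [38, 51, 46, 79, 57]  # heights in cm, from the question
n = len(heights)

mean = sum(heights) / n
squared_deviations = [(h - mean) ** 2 for h in heights]

# Assumption: sample standard deviation, so divide by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(round(mean, 1), round(sample_sd, 2))
```

Under that assumption the mean is 54.2 cm and the standard deviation is about 15.51 cm.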
SD indicates the variation around which measure of central tendency?
Mean
IQR indicates the variation around which measure of central tendency?
Median
What is inferential statistics?
statistics used to infer, from relationships found in the sample, which relationships truly exist in the population
What is descriptive statistics?
statistics used to describe, show or summarize data in a meaningful way (take pictures of data)
What are the two types of statistics?
descriptive statistics and inferential statistics
What is a theory?
a generalization about a phenomenon (explanation of how or why something occurs)
What is a hypothesis?
a proposed explanation made on the basis of limited evidence as a starting point for further investigation (without any assumption of its truth)
What are the steps of the research process?
- Initial observation (Research question)
- Generate theory
- Generate hypothesis
- Collect data to test hypothesis
- Analyse data
Why is data important?
Identifying problems
Planning & making informed decisions
Monitoring/evaluating progress
Test hypotheses & make inferences about populations of interest
What is the formula to calculate percentiles?
L = sample size × (desired percentile / 100)
If L is a whole number, use the average of the Lth and (L+1)th values in the sorted data.
If L is not a whole number, round up to the next whole number and use the value at that position.
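The locator rule above can be sketched in code. The function name `percentile_value` and the example data are hypothetical; the sketch only accepts percentiles strictly between 0 and 100, since the rule needs positions L and L + 1 to exist. Other textbooks and libraries use slightly different percentile conventions, so results may not match them exactly.

```python
import math


def percentile_value(values, p):
    """Percentile via L = n * p / 100, with positions counted from 1."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample is empty")
    if not 0 < p < 100:
        raise ValueError("this sketch only handles 0 < p < 100")
    L = n * p / 100
    if L == int(L):
        L = int(L)
        # Whole number: average the Lth and (L + 1)th sorted values.
        return (ordered[L - 1] + ordered[L]) / 2
    # Not whole: round up and take the value at that position.
    position = math.ceil(L)
    return ordered[position - 1]


data = [2, 4, 6, 8, 10, 12, 14, 16]
q1 = percentile_value(data, 25)
q3 = percentile_value(data, 75)
iqr = q3 - q1
print(q1, q3, iqr)
```

With eight values, L is 2 for the 25th percentile and 6 for the 75th, so both quartiles are averages of two neighbouring values.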