Biostatistics 2 Flashcards
What is a variable in the context of research?
A variable is any characteristic, number, or quantity that can be measured or counted.
What variables are you going to measure on your sample?
The specific variables to measure depend on the research study but may include demographic information, clinical outcomes, lab results, and survey responses.
Where will the data for your variables come from?
The data can come from various sources such as clinical records, questionnaires, clinical measures, and biological specimens.
What are clinical records used for in research?
Clinical records provide detailed patient information including medical history, treatment plans, and outcomes, which can be used to measure clinical variables.
How are questionnaires used in research?
Questionnaires are used to collect data directly from participants about their experiences, behaviors, attitudes, and other subjective measures.
What are clinical measures?
Clinical measures are objective assessments obtained through physical exams, lab tests, imaging studies, and other medical evaluations.
What are biological specimens, and how are they used in research?
Biological specimens, such as blood, urine, or tissue samples, are used to obtain biochemical and genetic data.
What types of data can variables be classified into?
Variables can be classified into numerical data and categorical data.
What is numerical data?
Numerical data represent quantities and can be measured. They include continuous data (e.g., blood pressure, weight) and discrete data (e.g., number of hospital visits).
What is categorical data?
Categorical data represent characteristics and can be divided into groups. They include nominal data (e.g., blood type, gender) and ordinal data (e.g., pain scale ratings, stages of cancer).
How do you differentiate between continuous and discrete numerical data?
Continuous data can take any value within a range (e.g., height, weight), whereas discrete data can only take specific, separate values (e.g., number of children, number of hospital visits).
How do you differentiate between nominal and ordinal categorical data?
Nominal data have categories with no inherent order (e.g., blood type, eye color), while ordinal data have categories with a clear, ranked order (e.g., education level, pain severity).
Why is it important to classify variables into numerical and categorical types?
Classifying variables helps determine the appropriate statistical methods for analysis and how the data should be collected and interpreted.
What is an example of a numerical variable in clinical research?
An example of a numerical variable is the patient’s age or systolic blood pressure.
What is an example of a categorical variable in clinical research?
An example of a categorical variable is the patient’s blood type or gender.
What is a univariate statistical description?
A univariate statistical description involves analyzing a single variable to summarize and find patterns in its data. It includes measures like mean, median, mode, variance, and standard deviation.
What are the common measures of central tendency used in univariate analysis?
The common measures of central tendency in univariate analysis are the mean, median, and mode.
What measures of variability are used in univariate analysis?
Measures of variability in univariate analysis include range, variance, standard deviation, and interquartile range.
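As a minimal sketch, assuming Python 3.8+ and its standard `statistics` module, the central tendency and variability measures listed above can be computed for a single numerical variable. The function name `describe` is illustrative, and the sample reuses the hospital-stay data from later cards. Software packages use slightly different quartile rules, so the IQR may differ a little between tools.

```python
import statistics

def describe(values):
    """Univariate summary of one numerical variable (illustrative sketch)."""
    # Quartiles via the "inclusive" rule; other tools may use a different rule.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "sd": statistics.stdev(values),           # sample standard deviation
        "iqr": q3 - q1,
    }

days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]  # days in hospital, from the worked example
summary = describe(days)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```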
What is a bivariate statistical description?
A bivariate statistical description involves analyzing the relationship between two variables. It includes examining how one variable changes in relation to the other.
What graphical methods are used in bivariate analysis?
Common graphical methods for bivariate analysis include scatter plots, line graphs, and bar charts.
What is a scatter plot?
A scatter plot is a type of graph used in bivariate analysis to display the relationship between two quantitative variables, plotting each observation as a point with one variable on the horizontal axis and the other on the vertical axis.
What statistical methods are used to describe relationships between two variables in bivariate analysis?
Methods include correlation coefficients (like Pearson’s r), regression analysis, and cross-tabulation.
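A minimal sketch of Pearson's r computed directly from its definition: the sum of cross-products of deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the study-hours data below are hypothetical, invented only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from its definition (illustrative sketch)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # A constant variable gives sxx or syy == 0 and raises ZeroDivisionError
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of study and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

The result always lies between -1 and +1: the sign gives the direction of the association and the magnitude gives the strength of the linear relationship.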
What is the purpose of bivariate analysis?
The purpose of bivariate analysis is to explore the relationship between two variables, determine the strength and direction of their association, and make predictions.
Give an example of a research question that involves univariate analysis.
An example of a research question for univariate analysis is, “What is the average age of participants in a study?”
Give an example of a research question that involves bivariate analysis.
An example of a research question for bivariate analysis is, “Is there a relationship between hours of study and exam scores among students?”
What are the limitations of univariate analysis?
Univariate analysis cannot determine relationships or causation between variables and provides only a summary of the data for a single variable.
Why is it important to use both univariate and bivariate analyses?
Using both univariate and bivariate analyses provides a more comprehensive understanding of the data, allowing for summary statistics and exploration of relationships between variables.
What are examples of numerical data?
Examples of numerical data include age in years, height in cm, and length of stay in a hospital.
Why is the numerical value significant in numerical data?
The numerical value has meaning because it quantifies characteristics or attributes, allowing for precise measurement and analysis.
What is continuous data?
Continuous data can take an infinite number of possible values within a given range. For example, height can be 160 cm or 160.523 cm.
What is discrete data?
Discrete data can be counted and only take on whole number values. For example, the number of nights spent in a hospital or the number of doses of medication missed.
Why is it important to consider the distribution of numerical variables?
Considering the distribution of numerical variables is crucial to determine the best methods for summarizing and analyzing them.
What graphical methods are used to examine the distribution of numerical data?
Histograms and box & whisker plots are commonly used to examine the distribution of numerical data.
What are summary statistics?
Summary statistics are measures that describe the central tendency and variability of a data set.
Which summary statistics are used for symmetrically or normally distributed data?
For symmetrically or normally distributed data, the mean and standard deviation are used.
Which summary statistics are used for skewed data?
For skewed data, the median and interquartile range (IQR) are used.
What is a histogram?
A histogram is a graphical representation of the distribution of numerical data, showing the frequency of data points within specified intervals.
What is a box & whisker plot?
A box & whisker plot, or box plot, is a graphical representation that shows the distribution of a data set through its quartiles, highlighting the median, interquartile range, and potential outliers.
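A sketch of the numbers behind a box plot, assuming the widely used 1.5 × IQR rule (Tukey's fences) to flag potential outliers. That rule and the function name `box_plot_summary` are assumptions, not taken from these cards. Quartiles use Python's `statistics.quantiles` with `method="inclusive"`; other software may compute quartiles slightly differently.

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Quartiles, whisker ends and flagged outliers for a box plot (sketch)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: points beyond these are drawn individually as potential outliers
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # whiskers reach the most extreme non-outliers
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

s = box_plot_summary([4, 4, 5, 7, 7, 7, 8, 9, 9, 60])
print(s)
```

With this quartile rule, Q1 = 5.5 and Q3 = 8.75, so the upper fence is 8.75 + 1.5 × 3.25 = 13.625 and the value 60 is flagged as an outlier.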
What does the mean represent in a data set?
The mean represents the average value of a data set, calculated by summing all the values and dividing by the number of values.
What does the median represent in a data set?
The median represents the middle value in a data set when the values are arranged in ascending order.
What is the standard deviation?
The standard deviation measures the amount of variation or dispersion of a set of values from the mean.
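For a sample, the standard deviation is usually computed with n − 1 in the denominator (the population version divides by n instead). A worked example using the hospital-stay sample from these cards:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% Sample: 4, 4, 5, 7, 7, 7, 8, 9, 9, 10  (n = 10, \bar{x} = 7)
\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = 9 + 9 + 4 + 0 + 0 + 0 + 1 + 4 + 4 + 9 = 40

s = \sqrt{40/9} \approx 2.11 \text{ days}
```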
What is the interquartile range (IQR)?
The interquartile range (IQR) measures the spread of the middle 50% of the data and is calculated as the third quartile (Q3) minus the first quartile (Q1).
What is a histogram?
A histogram is a graph that displays the frequency distribution of numerical variables, showing how often each range of values occurs in a data set.
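A minimal text-only sketch of what a histogram counts, assuming equal-width, half-open intervals [lo, lo + width). The function name `histogram_counts`, the starting point and the bin width are illustrative choices; in practice a plotting library would draw the bars.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the interval containing v
        if 0 <= i < n_bins:
            counts[i] += 1
        # values outside the chosen intervals are simply ignored in this sketch
    return counts

days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
counts = histogram_counts(days, start=4, width=2, n_bins=4)
for i, c in enumerate(counts):
    lo = 4 + i * 2
    # One '#' per observation gives a sideways bar for each interval
    print(f"[{lo:>2}, {lo + 2:>2})  {'#' * c}")
```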
What do histograms help us to identify in data distributions?
Histograms help us to identify whether the data distribution is symmetrical, skewed to the left, or skewed to the right.
What characterizes a normal distribution in a histogram?
A normal distribution in a histogram is characterized by its symmetrical shape around the mean, forming a bell curve.
What indicates a left-skewed distribution in a histogram?
A left-skewed distribution, also known as negatively skewed, has a tail that extends to the left, indicating that most of the data points are concentrated on the higher end of the scale.
What indicates a right-skewed distribution in a histogram?
A right-skewed distribution, also known as positively skewed, has a tail that extends to the right, indicating that most of the data points are concentrated on the lower end of the scale.
What is a symmetrical or normal distribution?
A symmetrical or normal distribution is one where the left and right sides of the histogram are mirror images, with the data points evenly distributed around the mean.
What is a negatively or left-skewed distribution?
A negatively or left-skewed distribution has a longer tail on the left side, meaning that there are a few lower values that stretch out the distribution.
What is a positively or right-skewed distribution?
A positively or right-skewed distribution has a longer tail on the right side, meaning that there are a few higher values that stretch out the distribution.
Why is it important to identify the skewness of a distribution?
Identifying the skewness of a distribution is important because it informs which summary statistics (mean, median, standard deviation, interquartile range) and analytical methods should be used for accurate data interpretation.
How does skewness affect the mean and median?
In a left-skewed distribution, the mean is typically less than the median. In a right-skewed distribution, the mean is typically greater than the median. In a symmetrical distribution, the mean and median are usually equal or very close.
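A rough sketch of the rule of thumb above: compare the mean with the median to suggest the direction of skew. This is a heuristic that can fail for some distributions, so the histogram should still be checked. The function name `skew_direction` and the `tolerance` parameter are hypothetical.

```python
import statistics

def skew_direction(values, tolerance=0.0):
    """Suggest skew direction by comparing mean and median (heuristic only)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "right"      # mean pulled up by a long right tail
    if median - mean > tolerance:
        return "left"       # mean pulled down by a long left tail
    return "symmetric"      # mean and median equal or very close

print(skew_direction([4, 4, 5, 7, 7, 7, 8, 9, 9, 10]))  # symmetric
print(skew_direction([4, 4, 5, 7, 7, 7, 8, 9, 9, 60]))  # right
```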
Why is the mean not appropriate for skewed data?
The mean is not appropriate for skewed data because it is sensitive to extreme values (outliers), which can distort the representation of the central tendency.
How is the mean calculated?
The mean is calculated by summing all the observations and dividing by the number of observations.
What happens to the mean in the presence of outliers?
In the presence of outliers, the mean can be substantially increased or decreased, giving a misleading representation of the typical value in the data set.
Example: Calculate the mean for the number of days spent in hospital among a sample of 10 patients: 4, 4, 5, 7, 7, 7, 8, 9, 9, 10.
The mean is calculated as (4 + 4 + 5 + 7 + 7 + 7 + 8 + 9 + 9 + 10) / 10 = 7 days.
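The same calculation as a minimal Python sketch (the function name `mean` is illustrative):

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
print(mean(days))  # 7.0
```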
What is the median, and how is it calculated for the same sample of 10 patients?
The median is the middle value of the ordered observations. For the sample 4, 4, 5, 7, 7, 7, 8, 9, 9, 10 (an even number of values), the median is the average of the 5th and 6th values: (7 + 7) / 2 = 7 days.
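A sketch of the median calculation, covering both the odd case (take the single middle value) and the even case (average the two middle values). The function name `median` is illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd n: single middle value
    return (data[mid - 1] + data[mid]) / 2    # even n: average the two middle values

days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
print(median(days))  # 7.0
```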
What is the mode, and what is it for the same sample of 10 patients?
The mode is the value that occurs most often. For the sample 4, 4, 5, 7, 7, 7, 8, 9, 9, 10, the mode is 7 days.
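A sketch of finding the mode by counting how often each value occurs. It returns every value tied for the highest count, because a data set can have more than one mode. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """All values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)      # value -> number of occurrences
    top = max(counts.values())    # raises ValueError for empty input
    return sorted(v for v, c in counts.items() if c == top)

days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
print(modes(days))  # [7]
```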
What is the impact of an outlier on the mean? Example: Replace the largest value (10) with an extreme value (60).
Replacing 10 with the outlier 60 gives the sample 4, 4, 5, 7, 7, 7, 8, 9, 9, 60, so the mean becomes (4 + 4 + 5 + 7 + 7 + 7 + 8 + 9 + 9 + 60) / 10 = 120 / 10 = 12 days, up from 7 days.
What is the impact of an outlier on the median?
The median remains the same despite the outlier. For the sample with the outlier, the ordered observations are 4, 4, 5, 7, 7, 7, 8, 9, 9, 60, and the median is still 7 days.
What is the impact of an outlier on the mode?
The mode is not affected by the outlier. For the sample with the outlier, the mode remains 7 days.
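A sketch that reproduces the outlier example with Python's standard `statistics` module, replacing the largest value (10) with 60 as in the worked example. The helper name `center_summary` is illustrative.

```python
import statistics

def center_summary(data):
    """Mean, median and mode of a sample, as a tuple."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

original = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
with_outlier = original[:-1] + [60]  # replace the largest value (10) with 60

for label, data in [("original", original), ("with outlier", with_outlier)]:
    m, med, mo = center_summary(data)
    print(f"{label:>12}: mean={m:g}, median={med:g}, mode={mo:g}")
```

The mean moves from 7 to 12 days, while the median and mode stay at 7 days.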
Why is the median more appropriate for skewed data?
The median is more appropriate for skewed data because it is not affected by outliers and better represents the central tendency of the data.
Why is the mode useful in describing skewed data?
The mode is useful because it identifies the most frequently occurring value, providing insight into common outcomes within the data set.
Summary: How do the mean, median, and mode compare in the presence of skewed data?
In the presence of skewed data, the mean is distorted by outliers, the median remains stable and provides a better central value, and the mode shows the most frequent value, all contributing different perspectives on the data distribution.
What are measures of dispersion?
Measures of dispersion describe the amount of variability in a data set, indicating how close together or spread out the values are.