Biostatistics 2 Flashcards

1
Q

What is a variable in the context of research?

A

A variable is any characteristic, number, or quantity that can be measured or counted.

2
Q

What variables are you going to measure on your sample?

A

The specific variables to measure depend on the research study but may include demographic information, clinical outcomes, lab results, and survey responses.

3
Q

Where will the data for your variables come from?

A

The data can come from various sources such as clinical records, questionnaires, clinical measures, and biological specimens.

4
Q

What are clinical records used for in research?

A

Clinical records provide detailed patient information including medical history, treatment plans, and outcomes, which can be used to measure clinical variables.

5
Q

How are questionnaires used in research?

A

Questionnaires are used to collect data directly from participants about their experiences, behaviors, attitudes, and other subjective measures.

6
Q

What are clinical measures?

A

Clinical measures are objective assessments obtained through physical exams, lab tests, imaging studies, and other medical evaluations.

7
Q

What are biological specimens, and how are they used in research?

A

Biological specimens, such as blood, urine, or tissue samples, are used to obtain biochemical and genetic data.

8
Q

What types of data can variables be classified into?

A

Variables can be classified into numerical data and categorical data.

9
Q

What is numerical data?

A

Numerical data represent quantities and can be measured. They include continuous data (e.g., blood pressure, weight) and discrete data (e.g., number of hospital visits).

11
Q

What is categorical data?

A

Categorical data represent characteristics and can be divided into groups. They include nominal data (e.g., blood type, gender) and ordinal data (e.g., pain scale ratings, stages of cancer).

13
Q

How do you differentiate between continuous and discrete numerical data?

A

Continuous data can take any value within a range (e.g., height, weight), whereas discrete data can only take specific, separate values (e.g., number of children, number of hospital visits).

14
Q

How do you differentiate between nominal and ordinal categorical data?

A

Nominal data have categories with no inherent order (e.g., blood type, eye color), while ordinal data have categories with a clear, ranked order (e.g., education level, pain severity).

15
Q

Why is it important to classify variables into numerical and categorical types?

A

Classifying variables helps determine the appropriate statistical methods for analysis and how the data should be collected and interpreted.

16
Q

What is an example of a numerical variable in clinical research?

A

An example of a numerical variable is the patient’s age or systolic blood pressure.

17
Q

What is an example of a categorical variable in clinical research?

A

An example of a categorical variable is the patient’s blood type or gender.

18
Q

What is a univariate statistical description?

A

A univariate statistical description involves analyzing a single variable to summarize and find patterns in its data. It includes measures like mean, median, mode, variance, and standard deviation.

19
Q

What are the common measures of central tendency used in univariate analysis?

A

The common measures of central tendency in univariate analysis are the mean, median, and mode.

20
Q

What measures of variability are used in univariate analysis?

A

Measures of variability in univariate analysis include range, variance, standard deviation, and interquartile range.

21
Q

What is a bivariate statistical description?

A

A bivariate statistical description involves analyzing the relationship between two variables. It includes examining how one variable changes in relation to the other.

22
Q

What graphical methods are used in bivariate analysis?

A

Common graphical methods for bivariate analysis include scatter plots, line graphs, and bar charts.

23
Q

What is a scatter plot?

A

A scatter plot is a type of graph used in bivariate analysis to display the relationship between two quantitative variables by plotting data points on a two-dimensional axis.

24
Q

What statistical methods are used to describe relationships between two variables in bivariate analysis?

A

Methods include correlation coefficients (like Pearson’s r), regression analysis, and cross-tabulation.

25
Q

What is the purpose of bivariate analysis?

A

The purpose of bivariate analysis is to explore the relationship between two variables, determine the strength and direction of their association, and make predictions.

26
Q

Give an example of a research question that involves univariate analysis.

A

An example of a research question for univariate analysis is, “What is the average age of participants in a study?”

27
Q

Give an example of a research question that involves bivariate analysis.

A

An example of a research question for bivariate analysis is, “Is there a relationship between hours of study and exam scores among students?”

28
Q

What are the limitations of univariate analysis?

A

Univariate analysis cannot determine relationships or causation between variables and provides only a summary of the data for a single variable.

29
Q

Why is it important to use both univariate and bivariate analyses?

A

Using both univariate and bivariate analyses provides a more comprehensive understanding of the data, allowing for summary statistics and exploration of relationships between variables.

30
Q

What are examples of numerical data?

A

Examples of numerical data include age in years, height in cm, and length of stay in a hospital.

31
Q

Why is the numerical value significant in numerical data?

A

The numerical value has meaning because it quantifies characteristics or attributes, allowing for precise measurement and analysis.

32
Q

What is continuous data?

A

Continuous data can take an infinite number of possible values within a given range. For example, height can be 160 cm or 160.523 cm.

33
Q

What is discrete data?

A

Discrete data can be counted and take only separate, distinct values, usually whole numbers. For example, the number of nights spent in a hospital or the number of doses of medication missed.

34
Q

Why is it important to consider the distribution of numerical variables?

A

Considering the distribution of numerical variables is crucial to determine the best methods for summarizing and analyzing them.

35
Q

What graphical methods are used to examine the distribution of numerical data?

A

Histograms and box & whisker plots are commonly used to examine the distribution of numerical data.

36
Q

What are summary statistics?

A

Summary statistics are measures that describe the central tendency and variability of a data set.

37
Q

Which summary statistics are used for symmetrically or normally distributed data?

A

For symmetrically or normally distributed data, the mean and standard deviation are used.

38
Q

Which summary statistics are used for skewed data?

A

For skewed data, the median and interquartile range (IQR) are used.

39
Q

What is a histogram?

A

A histogram is a graphical representation of the distribution of numerical data, showing the frequency of data points within specified intervals.

40
Q

What is a box & whisker plot?

A

A box & whisker plot, or box plot, is a graphical representation that shows the distribution of a data set through its quartiles, highlighting the median, interquartile range, and potential outliers.

41
Q

What does the mean represent in a data set?

A

The mean represents the average value of a data set, calculated by summing all the values and dividing by the number of values.

42
Q

What does the median represent in a data set?

A

The median represents the middle value in a data set when the values are arranged in ascending order.

43
Q

What is the standard deviation?

A

The standard deviation measures the amount of variation or dispersion of a set of values from the mean.

44
Q

What is the interquartile range (IQR)?

A

The interquartile range (IQR) measures the spread of the middle 50% of the data, calculated as the difference between the first (Q1) and third quartiles (Q3).

45
Q

What is a histogram?

A

A histogram is a graph that displays the frequency distribution of numerical variables, showing how often each range of values occurs in a data set.
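To show what a histogram actually counts, here is a minimal sketch that bins the 10-patient hospital-stay sample from the worked example in this deck. The bin width (2 days) and starting point (4 days) are arbitrary choices for illustration; a different binning can change the apparent shape.

```python
# Days spent in hospital for the 10 patients in the worked example
stay_days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
bin_width = 2  # assumed bin width, in days
start = 4      # assumed lower edge of the first bin

# Count how many observations fall into each interval [lower, lower + bin_width)
counts = {}
for x in stay_days:
    lower = start + ((x - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Print a simple text histogram: one '#' per observation
for lower in sorted(counts):
    upper = lower + bin_width
    print(f"[{lower}, {upper}) {'#' * counts[lower]}")
```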

46
Q

What do histograms help us to identify in data distributions?

A

Histograms help us to identify whether the data distribution is symmetrical, skewed to the left, or skewed to the right.

47
Q

What characterizes a normal distribution in a histogram?

A

A normal distribution in a histogram is characterized by its symmetrical shape around the mean, forming a bell curve.

48
Q

What indicates a left-skewed distribution in a histogram?

A

A left-skewed distribution, also known as negatively skewed, has a tail that extends to the left, indicating that most of the data points are concentrated on the higher end of the scale.

49
Q

What indicates a right-skewed distribution in a histogram?

A

A right-skewed distribution, also known as positively skewed, has a tail that extends to the right, indicating that most of the data points are concentrated on the lower end of the scale.

50
Q

What is a symmetrical or normal distribution?

A

A symmetrical or normal distribution is one where the left and right sides of the histogram are mirror images, with the data points evenly distributed around the mean.

51
Q

What is a negatively or left-skewed distribution?

A

A negatively or left-skewed distribution has a longer tail on the left side, meaning that there are a few lower values that stretch out the distribution.

52
Q

What is a positively or right-skewed distribution?

A

A positively or right-skewed distribution has a longer tail on the right side, meaning that there are a few higher values that stretch out the distribution.

53
Q

Why is it important to identify the skewness of a distribution?

A

Identifying the skewness of a distribution is important because it informs which summary statistics (mean, median, standard deviation, interquartile range) and analytical methods should be used for accurate data interpretation.

54
Q

How does skewness affect the mean and median?

A

In a left-skewed distribution, the mean is typically less than the median. In a right-skewed distribution, the mean is typically greater than the median. In a symmetrical distribution, the mean and median are usually equal or very close.

55
Q

Why is the mean not appropriate for skewed data?

A

The mean is not appropriate for skewed data because it is sensitive to extreme values (outliers), which can distort the representation of the central tendency.

56
Q

How is the mean calculated?

A

The mean is calculated by summing all the observations and dividing by the number of observations.

57
Q

What happens to the mean in the presence of outliers?

A

In the presence of outliers, the mean can be significantly increased or decreased, giving a misleading representation of the typical value in the data set.

58
Q

Example: Calculate the mean for the number of days spent in hospital among a sample of 10 patients: 4, 4, 5, 7, 7, 7, 8, 9, 9, 10.

A

The mean is calculated as (4 + 4 + 5 + 7 + 7 + 7 + 8 + 9 + 9 + 10) / 10 = 7 days.

59
Q

What is the median, and how is it calculated for the same sample of 10 patients?

A

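The median is the middle value of the ordered observations; with an even number of observations, it is the average of the two middle values. For the sample 4, 4, 5, 7, 7, 7, 8, 9, 9, 10, the two middle values are 7 and 7, so the median is 7 days.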

60
Q

What is the mode, and what is it for the same sample of 10 patients?

A

The mode is the value that occurs most often. For the sample 4, 4, 5, 7, 7, 7, 8, 9, 9, 10, the mode is 7 days.
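The mean, median, and mode calculations above can be checked with Python's standard `statistics` module. A minimal sketch using the same 10-patient sample:

```python
import statistics

# Days spent in hospital for the 10 patients in the example
stay_days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]

mean = statistics.mean(stay_days)      # sum / n = 70 / 10
median = statistics.median(stay_days)  # n is even: average of the 5th and 6th ordered values (7 and 7)
mode = statistics.mode(stay_days)      # most frequently occurring value

print(mean, median, mode)
```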

61
Q

What is the impact of an outlier on the mean? Example: Replace the largest value (10) in the sample with an extreme value (60).

A

Replacing the largest value (10) with an outlier of 60 gives a mean of (4 + 4 + 5 + 7 + 7 + 7 + 8 + 9 + 9 + 60) / 10 = 120 / 10 = 12 days, compared with 7 days in the original sample.

62
Q

What is the impact of an outlier on the median?

A

The median remains the same despite the outlier. For the sample with the outlier, the ordered observations are 4, 4, 5, 7, 7, 7, 8, 9, 9, 60, and the median is still 7 days.

63
Q

What is the impact of an outlier on the mode?

A

The mode is not affected by the outlier. For the sample with the outlier, the mode remains 7 days.
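A minimal sketch of the outlier comparison, replacing the largest value (10) with 60 as in the example. Only the mean moves; the median and mode stay at 7 days.

```python
import statistics

original = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
# Replace the largest value (10) with an extreme value (60)
with_outlier = original[:-1] + [60]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),      # sensitive to the extreme value
          "median =", statistics.median(data),  # middle of the ordered values
          "mode =", statistics.mode(data))      # most frequent value
```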

64
Q

Why is the median more appropriate for skewed data?

A

The median is more appropriate for skewed data because it is resistant to outliers and extreme values, so it better represents the typical value in the data.

65
Q

Why is the mode useful in describing skewed data?

A

The mode is useful because it identifies the most frequently occurring value, providing insight into common outcomes within the data set.

66
Q

Summary: How do the mean, median, and mode compare in the presence of skewed data?

A

In the presence of skewed data, the mean is distorted by outliers, the median remains stable and provides a better central value, and the mode shows the most frequent value, all contributing different perspectives on the data distribution.

67
Q

What are measures of dispersion?

A

Measures of dispersion describe the amount of variability in a data set, indicating how close together or spread out the values are.

68
Q

What is the range?

A

The range is the difference between the minimum and maximum values in a data set.

69
Q

How is the range calculated?

A

The range is calculated by subtracting the minimum value from the maximum value.

70
Q

What is a significant limitation of the range?

A

The range is highly influenced by outliers, which can distort the measure of variability.
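A minimal sketch of the range and its sensitivity to a single extreme value, reusing the hospital-stay sample from the worked example (with 10 replaced by 60 for the outlier case):

```python
original = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
with_outlier = [4, 4, 5, 7, 7, 7, 8, 9, 9, 60]

def value_range(data):
    """Range = maximum value minus minimum value."""
    return max(data) - min(data)

print(value_range(original))      # 10 - 4
print(value_range(with_outlier))  # 60 - 4: one extreme value inflates the range
```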

71
Q

What are variance and standard deviation?

A

Variance and standard deviation are measures of dispersion that describe how far, on average, the data points lie from the mean; they are best suited to symmetrically distributed data.

72
Q

When should variance and standard deviation be used?

A

Variance and standard deviation should be used when the data is symmetrically distributed.

73
Q

What is the variance?

A

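Variance is the average of the squared differences between each data point and the mean. For a sample, the sum of squared differences is divided by n − 1 rather than n, and the result is in squared units (e.g., days²).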

74
Q

What is the standard deviation?

A

The standard deviation is the square root of the variance. It is expressed in the same units as the data and represents roughly the typical distance of a data point from the mean.
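A minimal sketch computing the sample variance and standard deviation by hand for the 10-patient hospital-stay sample, then checking against `statistics.variance` and `statistics.stdev` (both of which use the n − 1 sample formula):

```python
import math
import statistics

stay_days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]
n = len(stay_days)
mean = sum(stay_days) / n

# Sample variance: sum of squared deviations from the mean divided by n - 1
squared_deviations = [(x - mean) ** 2 for x in stay_days]
variance = sum(squared_deviations) / (n - 1)
sd = math.sqrt(variance)  # standard deviation: square root of the variance, in days

# The statistics module gives the same sample values
assert math.isclose(variance, statistics.variance(stay_days))
assert math.isclose(sd, statistics.stdev(stay_days))
print(f"variance = {variance:.2f} days^2, SD = {sd:.2f} days")
```

Here the squared deviations from the mean of 7 sum to 40, so the variance is 40 / 9.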

75
Q

What is the interquartile range (IQR)?

A

The interquartile range (IQR) is a measure of dispersion that describes the range between the 25th percentile (Q1) and the 75th percentile (Q3) of the data set.

76
Q

When should the interquartile range (IQR) be used?

A

The interquartile range (IQR) should be used when the data is skewed, as it is less affected by outliers and provides a better measure of central dispersion.

77
Q

How is the interquartile range (IQR) calculated?

A

The interquartile range (IQR) is calculated by subtracting the 25th percentile (Q1) value from the 75th percentile (Q3) value.
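A minimal sketch of the IQR calculation using `statistics.quantiles` on the hospital-stay sample. Software packages use different rules for locating quartiles; `method="inclusive"` is one choice, and another rule (for example the module's default `"exclusive"`) can give slightly different Q1 and Q3 for small samples.

```python
import statistics

stay_days = [4, 4, 5, 7, 7, 7, 8, 9, 9, 10]

# quantiles(n=4) returns the three cut points: Q1 (25th), Q2 (median), Q3 (75th percentile)
q1, q2, q3 = statistics.quantiles(stay_days, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```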

78
Q

Why are measures of dispersion important?

A

Measures of dispersion are important because they provide insights into the spread and variability of the data, helping to understand the distribution and reliability of the data set.

79
Q

How do outliers affect measures of dispersion?

A

Outliers can significantly affect measures like the range, variance, and standard deviation, making them appear larger and potentially misleading the interpretation of the data’s variability.

80
Q

Why is the IQR preferred over the range in skewed distributions?

A

The IQR is preferred over the range in skewed distributions because it is less affected by outliers and provides a more accurate representation of the central spread of the data.

81
Q

Summary: How do variance and standard deviation differ from the IQR?

A

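Variance and standard deviation measure spread around the mean, use every data point, and are strongly affected by outliers, so they suit symmetrically distributed data. The IQR measures the spread of the middle 50% of the data around the median, is resistant to outliers, and suits skewed data.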

82
Q

What is a box-and-whisker plot?

A

A box-and-whisker plot is a graphical display that summarizes the distribution of a data set using selected summary measures.

83
Q

What are the key components of a box-and-whisker plot?

A

The key components of a box-and-whisker plot are the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum.

84
Q

What does the box in a box-and-whisker plot represent?

A

The box represents the interquartile range (IQR), showing the range between the lower quartile (25th percentile) and the upper quartile (75th percentile).

85
Q

What does the line inside the box indicate?

A

The line inside the box indicates the median (50th percentile) of the data set.

86
Q

What do the “whiskers” in a box-and-whisker plot represent?

A

The whiskers extend from the box to the minimum and maximum values in the data set, excluding outliers.

87
Q

How are outliers represented in a box-and-whisker plot?

A

Outliers are typically represented as individual points beyond the whiskers.
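A minimal sketch of the values a box-and-whisker plot draws, applied to the hospital-stay sample with the outlier (60). It assumes the common 1.5 × IQR rule for deciding which points are outliers and uses `method="inclusive"` quartiles; plotting software may use other whisker or quartile rules.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "whisker_min": min(inside),  # smallest value that is not an outlier
        "q1": q1,                    # lower edge of the box
        "median": median,            # line inside the box
        "q3": q3,                    # upper edge of the box
        "whisker_max": max(inside),  # largest value that is not an outlier
        "outliers": outliers,        # drawn as individual points beyond the whiskers
    }

print(box_plot_summary([4, 4, 5, 7, 7, 7, 8, 9, 9, 60]))
```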

88
Q

What can a box-and-whisker plot tell us about the skewness of a distribution?

A

A box-and-whisker plot can show if the distribution is skewed by comparing the lengths of the whiskers and the position of the median within the box. If one whisker is longer than the other, the distribution is skewed in that direction.

89
Q

How can a box-and-whisker plot indicate the presence of outliers?

A

Outliers are indicated by points that fall outside the whiskers of the box-and-whisker plot.

90
Q

What is the importance of the interquartile range (IQR) in a box-and-whisker plot?

A

The IQR is important because it represents the range of the middle 50% of the data, providing a measure of the data’s central dispersion that is resistant to outliers.

91
Q

How do you interpret the position of the median in a box-and-whisker plot?

A

The position of the median within the box gives a visual indication of skewness. If the median lies closer to the lower quartile (the upper part of the box is longer), the data is right-skewed; if it lies closer to the upper quartile, the data is left-skewed.

92
Q

Why are box-and-whisker plots useful?

A

Box-and-whisker plots are useful because they provide a clear summary of the data’s distribution, central tendency, and variability, as well as identifying outliers and skewness.

93
Q

What is the significance of the upper quartile (75th percentile) in a box-and-whisker plot?

A

The upper quartile (75th percentile) marks the value below which 75% of the data points fall, indicating the upper boundary of the interquartile range.

94
Q

What is the significance of the lower quartile (25th percentile) in a box-and-whisker plot?

A

The lower quartile (25th percentile) marks the value below which 25% of the data points fall, indicating the lower boundary of the interquartile range.

95
Q

What do the minimum and maximum values represent in a box-and-whisker plot?

A

The minimum and maximum values represent the smallest and largest data points within the range, excluding outliers.

96
Q

Summary: What information does a box-and-whisker plot provide?

A

A box-and-whisker plot provides information on the minimum, lower quartile, median, upper quartile, and maximum values, as well as the presence of outliers and the skewness of the data distribution.

97
Q

What should you do first when collecting a numerical variable?

A

Check the distribution of the variable.

98
Q

Which graphical methods can be used to check the distribution of a numerical variable?

A

Use histograms and box-and-whisker plots.

99
Q

Are there statistical tests to check the normality of a distribution?

A

Yes, there are statistical tests like the Shapiro-Wilk test (to be discussed later).

100
Q

What should you decide after checking the distribution?

A

Decide whether the distribution is normal/symmetrical or skewed.

101
Q

What are the characteristics of a normal distribution?

A

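A normal distribution is bell-shaped and symmetrical about the mean, with fixed characteristics: the mean, median, and mode are equal, about 68% of values lie within 1 standard deviation of the mean, and about 95% lie within 1.96 standard deviations.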
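The fixed proportions of a normal distribution can be checked with `statistics.NormalDist` from the Python standard library. A minimal sketch using the standard normal (mean 0, SD 1), where k is the number of standard deviations from the mean:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within_k_sd(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 SD:    {within_k_sd(1):.3f}")     # about 68%
print(f"within 1.96 SD: {within_k_sd(1.96):.3f}")  # about 95%
```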

102
Q

How should you summarize a normally distributed variable?

A

Summarize with the mean and standard deviation.

103
Q

What should you consider if the distribution is skewed?

A

Consider the whole distribution.

104
Q

How should you summarize a skewed variable?

A

Summarize with the median and interquartile range (IQR).
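The decision described in these cards (mean and SD for normal data, median and IQR for skewed data) can be sketched as a small helper. The function name `summarize` and its `is_normal` flag are hypothetical; the flag is assumed to be decided beforehand from a histogram, box plot, or normality test.

```python
import statistics

def summarize(data, is_normal):
    """Hypothetical helper: mean (SD) if normal, otherwise median with Q1, Q3, and IQR."""
    if is_normal:
        return f"mean {statistics.mean(data):.2f} (SD {statistics.stdev(data):.2f})"
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return f"median {median:.2f} (Q1 {q1:.2f}, Q3 {q3:.2f}; IQR {q3 - q1:.2f})"

print(summarize([4, 4, 5, 7, 7, 7, 8, 9, 9, 10], is_normal=True))
print(summarize([4, 4, 5, 7, 7, 7, 8, 9, 9, 60], is_normal=False))
```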

105
Q

What is the purpose of using histograms and box-and-whisker plots?

A

The purpose is to visually assess the distribution of the numerical variable.

106
Q

What does the Shapiro-Wilk test check for?

A

The Shapiro-Wilk test checks for normality in a data distribution.

107
Q

Why is it important to know if the distribution is normal or skewed?

A

It is important because it informs the choice of summary statistics and analysis methods.

108
Q

What is the mean?

A

The mean is the average value of a data set, calculated by summing all values and dividing by the number of values.

109
Q

What is the standard deviation?

A

The standard deviation measures the average distance of each data point from the mean, indicating the spread of the data in a normal distribution.

110
Q

What is the median?

A

The median is the middle value of a data set when the values are arranged in ascending order.

111
Q

What is the interquartile range (IQR)?

A

The interquartile range (IQR) is the range between the 25th percentile (Q1) and the 75th percentile (Q3), representing the middle 50% of the data.

112
Q

What are examples of categorical data?

A

Examples of categorical data include employment status, province of birth, and eye color.

113
Q

Can numerical variables be converted to categorical data?

A

Yes, numerical variables can be converted to categorical data.

114
Q

How can age, a continuous numerical variable, be converted to a categorical variable?

A

Age can be converted to a categorical variable by creating age categories, such as [age < 25] and [age >= 25].

115
Q

Why might you convert a continuous variable like age into categories?

A

Converting a continuous variable like age into categories can help to focus on specific groups of interest, such as younger versus older men.

116
Q

What is an important consideration when creating categories from a numerical variable?

A

It is important to know how many categories you need and to define what each category represents in your protocol.

117
Q

How do you define categories in your research protocol?

A

Define categories by specifying the criteria for each category, such as age ranges or specific attributes.

118
Q

What is the importance of defining categories in your protocol?

A

Defining categories ensures consistency in data collection and analysis, allowing for clear interpretation and comparison of results.

119
Q

How do categorical variables differ from numerical variables?

A

Categorical variables represent distinct groups or categories, while numerical variables represent measurable quantities.

120
Q

Why is it important to know the number of categories for a variable?

A

Knowing the number of categories helps in planning the analysis and ensures that all relevant groups are represented.

121
Q

Give an example of creating a categorical variable from a continuous variable.

A

Example: Collect age in years (continuous) but categorize into “younger men” [age < 25] and “older men” [age >= 25].
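A minimal sketch of turning the continuous age variable into the two categories defined above. The cut-off (25 years) comes from the example; the function name and the sample ages are hypothetical. Note that age 25 falls in the "older" group because the rule is [age >= 25].

```python
def age_group(age_years):
    """Hypothetical protocol rule: 'younger' if age < 25, otherwise 'older'."""
    return "younger" if age_years < 25 else "older"

ages = [19, 24, 25, 31, 42]  # hypothetical ages in years
print([(age, age_group(age)) for age in ages])
```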

122
Q

What is a protocol in research?

A

A protocol in research is a detailed plan that outlines the methodology, including how variables will be measured and categorized.

123
Q

How can employment status be considered categorical data?

A

Employment status can be categorized into groups such as employed, unemployed, and retired.

124
Q

How can the province of birth be considered categorical data?

A

The province of birth can be categorized into different provinces, such as Ontario, Quebec, and British Columbia.

125
Q

What should be done before data collection regarding categorical variables?

A

Before data collection, clearly define each category to ensure accurate and consistent data recording.

126
Q

Summary: What steps should you take when handling categorical data?

A

When handling categorical data, identify the variables, define categories clearly in the protocol, ensure the number of categories is known, and convert numerical variables to categories if necessary for the research focus.

127
Q

How do we summarize categorical data?

A

We summarize categorical data by counting how frequently observations occur in each category, which is referred to as frequencies.

128
Q

What is relative frequency?

A

Relative frequency is the proportion of the total number of observations that fall into each category, usually expressed as a percentage.

129
Q

How is relative frequency calculated?

A

Relative frequency is calculated by dividing the frequency of a category by the total number of observations and then multiplying by 100 to get a percentage.

130
Q

What is the purpose of using frequencies and proportions in data analysis?

A

Frequencies and proportions help to understand the distribution and prevalence of categories within the data set.

131
Q

What are some common graphical displays for frequencies?

A

Common graphical displays for frequencies include pie charts and bar graphs.

132
Q

How does a pie chart display categorical data?

A

A pie chart displays categorical data as slices of a circle, where each slice represents the relative frequency of a category as a proportion of the whole.

133
Q

How does a bar graph display categorical data?

A

A bar graph displays categorical data with bars, where the height or length of each bar represents the frequency or relative frequency of a category.

134
Q

Why are pie charts useful?

A

Pie charts are useful for showing the proportion of each category relative to the whole data set, making it easy to compare parts to the whole.

135
Q

Why are bar graphs useful?

A

Bar graphs are useful for comparing the frequency or relative frequency of different categories side by side.

136
Q

What is an example of summarizing categorical data using frequencies?

A

Example: If you have data on eye color for a group of people, you can count how many people have blue, brown, green, and other eye colors to get the frequencies.

137
Q

What is an example of summarizing categorical data using relative frequencies?

A

Example: If 10 out of 50 people have blue eyes, the relative frequency of blue eyes is (10/50) * 100 = 20%.
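A minimal sketch of computing frequencies and relative frequencies with `collections.Counter`. Apart from the 10 blue-eyed people out of 50 taken from the example, the category counts are hypothetical.

```python
from collections import Counter

# Hypothetical eye-color data for 50 people (10 blue, as in the example)
eye_color = ["blue"] * 10 + ["brown"] * 28 + ["green"] * 7 + ["other"] * 5

frequencies = Counter(eye_color)  # frequency: count per category
total = sum(frequencies.values())

for color, count in frequencies.most_common():
    relative = count / total * 100  # relative frequency as a percentage
    print(f"{color:<6} n = {count:>2}  ({relative:.0f}%)")
```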

138
Q

When might you choose a pie chart over a bar graph?

A

You might choose a pie chart over a bar graph when you want to emphasize the proportion of each category relative to the whole data set.

139
Q

When might you choose a bar graph over a pie chart?

A

You might choose a bar graph over a pie chart when you want to compare the frequencies or relative frequencies of categories directly and clearly see the differences between them.

140
Q

How can you display the same data in both a pie chart and a bar graph?

A

You can display the same data in both a pie chart and a bar graph by first calculating the frequencies or relative frequencies of each category and then creating the corresponding graphical displays.

141
Q

Summary: What are the key points for summarizing and displaying categorical data?

A

Key points for summarizing and displaying categorical data include counting frequencies, calculating relative frequencies, and using pie charts and bar graphs to visually represent the data.

142
Q

What is the Shapiro-Wilk test used for in exploratory data analysis?

A

The Shapiro-Wilk test is used to assess whether data are consistent with a normal distribution. A p-value ≥ 0.05 means there is no evidence against normality, so the data can be treated as approximately normal (bell-shaped); a p-value < 0.05 indicates the data depart significantly from normality (e.g., skewed).

143
Q

Why is understanding the type of variable important in data analysis?

A

It guides the selection of appropriate statistical methods.

144
Q

What are the main types of variables in data analysis?

A

Categorical and numerical.

145
Q

What does bivariate descriptive statistics involve?

A

Examining relationships between two variables.

146
Q

What’s the role of a grouping variable in bivariate analysis?

A

It categorizes data for comparison across groups.

147
Q

How do we compare variables in bivariate analysis?

A

By assessing differences across groups defined by the grouping variable.

148
Q

Give an example of bivariate analysis with categorical variables.

A

Comparing prevalence of diabetes between men and women.
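
This comparison can be sketched in Python, assuming hypothetical records stored as `(sex, has_diabetes)` pairs. `prevalence_by_group` is an illustrative helper name, and prevalence here is the percentage of each group with the condition.

```python
# Hypothetical records: (sex, has_diabetes), illustrative values only
records = [
    ("male", True), ("male", False), ("male", False), ("male", True),
    ("female", False), ("female", True), ("female", False), ("female", False),
]

def prevalence_by_group(rows):
    """Return the percentage of rows with the condition, per group."""
    totals = {}
    cases = {}
    for group, has_condition in rows:
        totals[group] = totals.get(group, 0) + 1
        cases[group] = cases.get(group, 0) + int(has_condition)
    return {g: cases[g] * 100 / totals[g] for g in totals}

result = prevalence_by_group(records)
print(result)
```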

149
Q

Provide an example of bivariate analysis with numerical variables.

A

Comparing gestational age at antenatal care (a numerical variable) between women with planned and unplanned pregnancies.
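
The descriptive step can be sketched in Python using hypothetical gestational ages in weeks. The numerical variable is summarised separately within each group of the grouping variable. Whether the mean or the median is the better summary depends on the shape of the distribution.

```python
from statistics import mean, median

# Hypothetical gestational ages (weeks) at antenatal care, by planning status
gestational_age = {
    "planned": [10, 12, 11, 14, 9],
    "unplanned": [16, 18, 13, 20, 15],
}

# Summarise the numerical variable within each group
for group, weeks in gestational_age.items():
    print(f"{group}: mean={mean(weeks):.1f} weeks, median={median(weeks)} weeks")
```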

150
Q

Why is it important to create categorical variables?

A

To group data for meaningful comparisons in analysis.

151
Q

When might you create a categorical variable from a numerical one?

A

When comparing outcomes across distinct groups, like age brackets.
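
Turning a numerical age into a categorical age bracket can be sketched in Python. The cut-points in `BRACKETS` and the helper name `age_bracket` are assumptions chosen for illustration; a real study would choose brackets to suit its question.

```python
# Hypothetical age brackets in whole years; cut-points are illustrative only
BRACKETS = [(0, 17, "<18"), (18, 34, "18-34"), (35, 49, "35-49"), (50, 200, "50+")]

def age_bracket(age: int) -> str:
    """Map a numerical age (whole years) to a categorical bracket label."""
    for low, high, label in BRACKETS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")

ages = [16, 22, 35, 47, 63]
print([age_bracket(a) for a in ages])
```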

152
Q

What’s the benefit of using bivariate analysis?

A

It helps uncover relationships and differences between variables, enhancing data interpretation.

153
Q

What should you determine about grouping variables?

A

Determine how many distinct groups are present in your data.

154
Q

How do you classify groups based on their relationship?

A

Groups can be classified as independent or dependent.

155
Q

What are independent groups?

A

They are groups made up of different, unrelated people, so each person belongs to only one group.

156
Q

Give examples of independent groups.

A

Men and women; women with planned pregnancies and women with unplanned pregnancies.

157
Q

What are dependent (paired) groups?

A

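These groups are related: each observation in one group is linked to a specific observation in the other, such as the same person measured twice or members of the same household.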

158
Q

Provide examples of dependent (paired) groups.

A

Measuring knowledge in the same people before and after an intervention; measuring nutrition in people within the same household.

159
Q

Why is it important to distinguish between independent and dependent groups?

A

It determines the appropriate statistical methods for analysis.
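
The distinction can be illustrated in Python using hypothetical before/after knowledge scores for five people. A paired analysis works on each person's change, which removes between-person variation. Treating the same numbers as two unrelated groups leaves that variation in. The comparison is descriptive only; no formal test is run.

```python
from statistics import mean, stdev

# Hypothetical knowledge scores for the same five people, before and after an intervention
before = [55, 60, 48, 70, 62]
after = [65, 66, 50, 78, 70]

# Paired view: each person's change, so between-person differences cancel out
changes = [post - pre for pre, post in zip(before, after)]
print(f"mean change = {mean(changes):.1f}, SD of changes = {stdev(changes):.1f}")

# Independent view (inappropriate for this design): treats the two columns as
# unrelated groups, so the spread between people stays in the comparison
print(f"difference in means = {mean(after) - mean(before):.1f}, "
      f"SD before = {stdev(before):.1f}, SD after = {stdev(after):.1f}")
```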

160
Q

When might you use independent group comparisons?

A

When comparing characteristics or outcomes between distinct, unrelated groups.

161
Q

When is analysis of dependent/paired groups beneficial?

A

When examining changes within the same individuals or related groups over time or under similar conditions.

162
Q

What do grouping variables help determine in research or analysis?

A

They help clarify relationships and guide the selection of appropriate statistical tests for data analysis.

163
Q

What method is used to compare two numerical variables without groups?

A

Use a scatter plot to visualize the relationship between the variables.

164
Q

What patterns can you identify in a scatter plot?

A

You can identify trends, clusters, or correlations between the variables.

165
Q

What does a strong positive correlation in a scatter plot indicate?

A

It indicates that as one variable increases, the other variable tends to increase as well.

166
Q

What does a strong negative correlation in a scatter plot indicate?

A

It indicates that as one variable increases, the other variable tends to decrease.
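
Pearson's correlation coefficient, r, is a common numerical summary of the linear pattern a scatter plot shows, although these cards do not name it. Values near +1 match a strong positive correlation and values near -1 a strong negative one. This is a minimal sketch with hypothetical data; `pearson_r` is an illustrative helper written from the standard formula.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data (illustrative only)
hours_exercise = [1, 2, 3, 4, 5]
fitness_score = [52, 55, 61, 64, 70]  # rises with x: positive correlation
resting_hr = [80, 77, 74, 70, 66]     # falls as x rises: negative correlation

r_pos = pearson_r(hours_exercise, fitness_score)
r_neg = pearson_r(hours_exercise, resting_hr)
print(round(r_pos, 3), round(r_neg, 3))
```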

167
Q

When might you use a scatter plot in analysis?

A

Use it to explore relationships between variables before performing further statistical analysis.

168
Q

Why is a scatter plot useful for comparing two numerical variables?

A

It provides a visual representation of the relationship between variables, helping in understanding patterns and correlations in the data.

169
Q

What are the limitations of using a scatter plot?

A

A scatter plot shows association, not causation, and outliers can distort the interpretation of the relationship.
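
The outlier problem can be sketched in Python using a hand-written Pearson's r (an illustrative helper, `pearson_r`, not named in these cards) on hypothetical data. The example first uses six points with almost no linear relationship, then the same points plus one extreme value.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data with almost no linear relationship
x = [1, 2, 3, 4, 5, 6]
y = [5, 3, 6, 2, 5, 4]
r_without = pearson_r(x, y)

# The same data plus one extreme point far from the rest
r_with = pearson_r(x + [20], y + [30])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Here a single point turns a near-zero correlation into a strong positive one. This is why the plot should be inspected before relying on a summary number.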