Chapter 12: Data-Based and Statistical Reasoning Flashcards

1
Q

What are the measures of central tendency?

A

Measures of central tendency are those that describe the middle of a sample.

The three measures of central tendency are the mean, median, and mode.

2
Q

What is the mean? Can it be used for populations and samples? When is it useful? What causes a skewed mean?

A

The mean or average of a set of data (the arithmetic mean) is calculated by adding up all of the individual values within the data set and dividing the result by the number of values.

The mean may be a parameter or a statistic (a parameter for a population, a statistic for a sample).

Mean values are a good indicator of central tendency when all of the values tend to be fairly close to one another.

An outlier, an extremely large or extremely small value compared to the other data values, can shift the mean toward one end of the range.
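
A minimal Python sketch of the outlier effect on the mean. The data values are made up for illustration and are not from the textbook example:

```python
from statistics import mean

# Hypothetical data: five values that sit close together.
values = [4, 5, 5, 6, 5]
print(mean(values))          # 5

# One extreme value pulls the mean toward that end of the range.
with_outlier = values + [65]
print(mean(with_outlier))    # 15
```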

3
Q

Example mean page 437

A
4
Q

What is the median? How do we calculate the median? When is the use of a median appropriate or not? Are medians susceptible to outliers?

A

The median value for a set of data is its midpoint, where half of the data points are greater than the value and half are smaller. In data sets with an odd number of values, the median will actually be one of the data points. In data sets with an even number of values, the median will be the mean of the two central data points.

The data must first be listed in increasing order; the median can then be calculated as the image shows.

The median tends to be the least susceptible to outliers, but may not be useful for data sets with very large ranges or multiple modes.
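
A short Python sketch of the odd and even cases, using made-up values (an illustration, not the textbook's worked example):

```python
from statistics import median

odd_set = [7, 1, 3, 9, 5]        # n = 5 (odd)
even_set = [7, 1, 3, 9, 5, 11]   # n = 6 (even)

# median() sorts the data internally before picking the midpoint.
print(median(odd_set))    # 5   (the center data point)
print(median(even_set))   # 6.0 (mean of the two center points, 5 and 7)

# An extreme value barely moves the median.
resistant = median([1, 3, 5, 7, 900])
print(resistant)          # 5
```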

5
Q

Median example page 438

A

If n, the number of data points, is even, the median will be the average of the two center data points.

If n is odd, the median will be the center data point.

6
Q

What can be implied if the mean and the median are far from each other? Close together?

A

If the mean and the median are far from each other, this implies the presence of an outlier or a skewed distribution. If the mean and median are very close, this implies a symmetrical distribution.

7
Q

What is the mode? When are modes used?

A

The mode is the number that appears most often in a set of data.

There may be multiple modes in a data set, or if all numbers appear equally, there can even be no mode for a set of data.

The mode is not typically used as a measure of central tendency for a set of data, but the number of modes and their distance from one another is often informative. If the data set has two modes with a small number of values between them, it may be useful to analyze these portions separately, or to look for other variables that may be responsible for dividing the distribution in two parts.
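
A small Python sketch for finding modes, with made-up data. Note one assumption about the tool: `statistics.multimode` returns every value when all values tie, whereas the card above describes that case as having no mode.

```python
from statistics import multimode

one_mode = multimode([2, 3, 3, 4, 5])
two_modes = multimode([1, 1, 2, 6, 7, 7])
all_tied = multimode([1, 2, 3])

print(one_mode)    # [3]
print(two_modes)   # [1, 7]  two modes, far apart
print(all_tied)    # [1, 2, 3]  every value appears equally often
```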

8
Q

MCAT concept check central tendency 12.1 page 439 question 1

What types of data sets are best analyzed using the mean as a measure of central tendency?

A

The mean is the best measure of central tendency for a data set with a relatively normal distribution. The mean performs poorly in data sets with outliers.

9
Q

MCAT concept check central tendency 12.1 page 439 question 2

A
10
Q

What is normal distribution? What are the mean, median, and mode for normal distribution? What percentage of distribution is within one standard deviation of the mean? Within two? Three?

A
11
Q

What is standard distribution?

A

A normal distribution can be transformed into a standard distribution with a mean of zero and a standard deviation of one. The newly generated curve can then be used to get information about probabilities or percentages of populations.
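
A minimal Python sketch of this transformation, assuming a hypothetical normal distribution with mean 100 and standard deviation 15 (these numbers are illustrative only):

```python
from statistics import NormalDist

mu, sigma = 100, 15      # hypothetical population mean and SD
x = 130                  # value of interest

z = (x - mu) / sigma             # standardize: z = (x - mean) / SD
below = NormalDist().cdf(z)      # fraction of the standard curve below z

print(z)                 # 2.0
print(round(below, 3))   # 0.977
```

Here `NormalDist()` with no arguments is the standard curve (mean 0, SD 1), so `cdf(z)` gives the fraction of the population below `x`.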

12
Q

What is a skewed distribution? Where will the mean, median, and mode be on negative or positive skewed distributions?

A

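A skewed distribution is an asymmetric distribution that contains a tail on one side or the other of the data.

The tail points in the skew direction. Tail on the left equals skewed left (negatively skewed). Tail on the right equals skewed right (positively skewed).

The mean is pulled furthest toward the tail. In a negatively skewed distribution the mean is to the left of the median, which is to the left of the mode. In a positively skewed distribution the order is reversed.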

13
Q

What is bimodal distribution?

A

A distribution containing two peaks with the valley in between is called bimodal.

Bimodal distributions might only have one mode if one peak is slightly higher than the other. However, even when the peaks are of two different sizes, we still call the distribution bimodal.

If there is a sufficient separation of the two peaks, or a sufficiently small amount of data within the valley region, bimodal distributions can often be analyzed as two separate distributions. However, they do not HAVE to be analyzed as two separate distributions.

14
Q

MCAT concept check distributions 12.2 page 442 question 1

How do the mean, median, and mode compare for a right skewed distribution?

A

The mean of a right (positively) skewed distribution is to the right of the median, which is to the right of the mode.
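
A quick Python check of this ordering on a made-up right-skewed data set (the values are assumptions chosen so that the tail is on the right):

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

m_mode = mode(right_skewed)       # 2
m_median = median(right_skewed)   # 3.0
m_mean = mean(right_skewed)       # 5

# Mode < median < mean: the mean is dragged toward the long right tail.
print(m_mode, m_median, m_mean)
```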

15
Q

MCAT concept check distributions 12.2 page 442 question 2

Can data that do not follow a normal distribution be analyzed with measures of central tendency and measures of distribution? Why or why not?

A

Any distribution can be mathematically, or procedurally, transformed to follow a normal distribution by virtue of the central limit theorem. A distribution that is not normal may still be analyzed with these measures.

16
Q

MCAT concept check distributions 12.2 page 442 question 3

What is the difference between normal or skewed distributions, and bimodal distributions?

A

Bimodal distributions have two peaks, whereas normal or skewed distributions have only one.

17
Q

What is the range of a data set? How do we calculate the range of a data set? What do we do when we cannot calculate the standard deviation for a normal distribution?

A

The range of a data set is the difference between its largest and smallest values.

Range does not consider the number of items of the data set, nor does it consider the placement of any measures of central tendency. Range is therefore heavily affected by the presence of data outliers.

In the case where it is not possible to calculate the standard deviation for a normal distribution because the entire data set is not provided, it is possible to approximate the standard deviation as 1/4 of the range.
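
A minimal Python sketch of the range and the quarter-range estimate, using made-up values. The estimate is only a rough approximation and assumes roughly normal data:

```python
data = [12, 15, 11, 18, 14, 16]    # hypothetical measurements

data_range = max(data) - min(data)  # 18 - 11 = 7
sd_estimate = data_range / 4        # rough SD estimate: 1.75

print(data_range, sd_estimate)      # 7 1.75
```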

18
Q

What is the interquartile range?

A

Interquartile range is related to the median, first, and third quartiles. Quartiles, including the median (Q2), divide data (when placed in ascending order) into groups that comprise one fourth of the entire set.
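
A small Python sketch of quartiles and the IQR on made-up data. It uses `statistics.quantiles` with `method="inclusive"`, which is an assumption: several quartile conventions exist, and the textbook's hand method may give slightly different values on other data sets.

```python
from statistics import quantiles

data = [9, 1, 5, 3, 7, 2, 8, 4, 6]   # hypothetical; order does not matter

# n=4 splits the sorted data into four groups and returns Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                         # interquartile range = Q3 - Q1

print(q1, q2, q3, iqr)                # 3.0 5.0 7.0 4.0
```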

19
Q

How do we calculate quartiles? How do we calculate the interquartile range (IQR)?

A
20
Q

Interquartile range calculation example page 443. Including the first part of the question in the question card because the answer is split between two pages.

A
21
Q

What is standard deviation? How do we calculate it?

A

Standard deviation is an informative measure of distribution. It is calculated relative to the mean of the data.

Standard deviation is calculated by taking the difference between each data point and the mean, squaring the value, dividing the sum of all the squared values by the number of points in the data set minus one, and then taking the square root of the result.
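
The steps above can be sketched in Python as follows. The data values are made up, and `statistics.stdev` is used only as a cross-check on the hand calculation:

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical sample

m = mean(data)                              # 5
squared = [(x - m) ** 2 for x in data]      # squared differences, sum = 32
sd = sqrt(sum(squared) / (len(data) - 1))   # divide by n - 1, then square root

print(round(sd, 3))                         # 2.138
print(round(stdev(data), 3))                # 2.138 (library result agrees)
```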

22
Q

How do we determine an outlier using standard deviation?

A

Another definition of an outlier is any value that lies more than three standard deviations from the mean.
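
A minimal Python sketch of the three-standard-deviation rule on made-up data (twenty hypothetical values, one of them extreme):

```python
from statistics import mean, stdev

data = [9, 10, 11] * 6 + [10, 30]   # hypothetical sample, n = 20

m = mean(data)                      # 11
s = stdev(data)                     # about 4.54

# Flag any value more than three standard deviations from the mean.
outliers = [x for x in data if abs(x - m) > 3 * s]
print(outliers)                     # [30]
```

In very small samples a single extreme value inflates the standard deviation so much that it may never exceed the three-SD cutoff, which is one reason this sketch uses twenty values.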

23
Q

Standard deviation calculation example page 445

A
24
Q

What percentage of the data points fall within one standard deviation of the mean? Two standard deviations? Three?

A

For a normal distribution, approximately 68% fall within one standard deviation.

Approximately 95% fall within two.

Approximately 99.7% (often rounded to 99%) fall within three.
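
These percentages can be checked with a short Python sketch that uses the standard normal curve from the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Fraction of the curve lying within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(k, round(fraction * 100, 1))   # 1 68.3, 2 95.4, 3 99.7
```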

25
Q

What are the three typical causes of outliers?

A

A true statistical anomaly.

Measurement error (example: reading centimeters instead of inches)

A distribution that is not approximated by the normal distribution (example: a skewed distribution with a long tail)

26
Q

MCAT concept check measures of distribution 12.3 page 446 question 1

Compare the method of determining outliers from the interquartile range and from the standard deviation.

A
27
Q

MCAT concept check measures of distribution 12.3 page 446 question 2

A

When the data are not available, the range can be approximated as four times the standard deviation. For this data set, the relationship fails. The range is nine, which is only a little more than twice the standard deviation. This is because the data do not fall in a normal distribution.

28
Q

MCAT concept check measures of distribution 12.3 page 446 question 3

Why would the average difference from the mean be an inappropriate measure of distribution?

A

The average difference from the mean will always be zero, because positive and negative differences cancel. This is why, in calculations of standard deviation, we always square the distance from the mean and then take the square root at the end. This forces all of the values to be positive numbers, which will not cancel out to zero.
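
A short Python sketch of the cancellation, using made-up data:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample
m = sum(data) / len(data)                 # mean = 5.0

deviations = [x - m for x in data]        # signed differences from the mean
total_signed = sum(deviations)            # positives and negatives cancel
total_squared = sum(d ** 2 for d in deviations)   # squares cannot cancel

print(total_signed)    # 0.0
print(total_squared)   # 32.0
```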

29
Q

In probability problems, we must first determine the relationship between events and outcomes. We are interested in independence or dependence. What is an independent event? What is a dependent event?

A

Independent events have no effect on one another. Independent events can occur in any order without impacting one another. Dice are a good example. If you roll a die and get a three, then pick it up and roll it again, the probability of getting a three on the second roll is no different than it was before the first roll.

Dependent events do have an impact on one another, such that the order changes the probability. An example would be a container with five red balls and five blue balls. The probability that one will choose a red ball is 5/10. If a red ball is chosen, then the probability of drawing another red ball is 4/9. If a blue ball is chosen, then the probability of drawing a red ball is 5/9.
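
The ball example can be sketched in Python with exact fractions. The variable names are illustrative, and the draws are assumed to be made without replacement:

```python
from fractions import Fraction

red, blue = 5, 5
total = red + blue

p_first_red = Fraction(red, total)              # 5/10 = 1/2
p_red_after_red = Fraction(red - 1, total - 1)  # 4/9: one red already removed
p_red_after_blue = Fraction(red, total - 1)     # 5/9: all reds still present

# Probability of drawing two reds in a row.
p_two_reds = p_first_red * p_red_after_red      # 1/2 * 4/9 = 2/9
print(p_two_reds)                               # 2/9
```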

30
Q

What are mutually exclusive outcomes? What is the probability of two mutually exclusive events occurring together? Provide an example.

A

We are also concerned with whether events are mutually exclusive or not. This term applies to outcomes, rather than events.

Mutually exclusive outcomes cannot occur at the same time. The probability of two mutually exclusive outcomes occurring together is 0%.

One cannot flip both heads and tails in one throw, or be both 10 and 20 years old.

31
Q

We must consider if a set of outcomes is exhaustive or not. What is exhaustive?

A

A group of outcomes is said to be exhaustive if there are no other possible outcomes.

For example, flipping heads or tails are said to be exhaustive outcomes of a coin flip; these are the only two possibilities.

32
Q

For independent events, what is the probability of two or more events occurring at the same time?

A

For independent events, the probability of two or more events occurring at the same time is the product of their probabilities alone.

For example, the probability of getting heads on a coin flip twice in a row is the same as the probability of getting heads the first time times the probability of getting heads the second time. (0.5x0.5=0.25 or 1/4).
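
A minimal Python sketch of the coin example, cross-checking the multiplication rule by listing every outcome (a fair coin is assumed):

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads            # multiplication rule: 1/4

# Cross-check by counting equally likely outcomes of two flips.
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
favorable = [o for o in outcomes if o == ("H", "H")]
p_by_counting = Fraction(len(favorable), len(outcomes))

print(p_two_heads, p_by_counting)          # 1/4 1/4
```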

33
Q

What is the probability of at least one of two events occurring?

A

The probability of at least one of two events occurring is equal to the sum of their individual probabilities minus the probability that they will both occur.
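
A short Python sketch of this rule for two fair coin flips. The events are assumed to be independent, so the probability of both is taken as the product:

```python
from fractions import Fraction

p_a = Fraction(1, 2)      # heads on the first flip
p_b = Fraction(1, 2)      # heads on the second flip
p_both = p_a * p_b        # valid because the flips are independent

# P(A or B) = P(A) + P(B) - P(A and B)
p_at_least_one = p_a + p_b - p_both
print(p_at_least_one)     # 3/4
```

Subtracting `p_both` avoids counting the heads-heads outcome twice.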

34
Q

Probability example page 448

35
Q

MCAT concept check probability 12.4 page 449 question 1

Assume the likelihood of having a male child is equal to the likelihood of having a female child. In a series of 10 live births, the probability of having at least one male child is equal to:

A

Simplify this question by reworking it as the probability of not having all female children. Having at least one male child and having all female children are mutually exclusive events, and no other possibilities can occur. The probability of having all female children is (0.5)^10 = 0.000977, or about 0.1%.

Therefore, the probability of having at least one male child is 1 - (0.5)^10 = 0.999, or about 99.9%.
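
The complement calculation can be checked with a few lines of Python (variable names are illustrative):

```python
births = 10
p_female = 0.5

p_all_female = p_female ** births          # 0.0009765625
p_at_least_one_male = 1 - p_all_female     # 0.9990234375

print(round(p_all_female * 100, 2))        # 0.1  (percent)
print(round(p_at_least_one_male * 100, 2)) # 99.9 (percent)
```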

36
Q

MCAT concept check probability 12.4 page 449 question 2

Describe the following terms: independence, mutual exclusivity, exhaustiveness.

A

Independence is a condition of events wherein the outcome of one event has no effect on the outcome of the other.

Mutual exclusivity is a condition where two outcomes cannot occur simultaneously.

When a set of outcomes is exhaustive, there are no other possible outcomes.

37
Q

What do hypothesis testing and confidence intervals allow us to do?

A

Hypothesis testing and confidence intervals allow us to draw conclusions about populations based on our sample data.

38
Q

What is hypothesis testing? What is a null hypothesis? What is the alternative hypothesis?

A

Hypothesis testing begins with an idea about what may be different between two populations.

Null hypothesis says that the two populations are equal, or that a single population can be described by a parameter equal to a given value. A null hypothesis is always a hypothesis of equivalence.

Null hypothesis is the default position that states no relationship exists between two variables or groups, or that there’s no difference between certain population characteristics.

The alternative hypothesis may be nondirectional (that the populations are not equal) or directional (increased study time increases test scores, decreased food intake decreases weight, exposing plants to sunlight is hypothesized to promote growth and development).

39
Q

What are the two most common hypothesis tests? What is p value? Significance level? Type I and II errors? Power?

40
Q

What are confidence intervals? Page 450

A

Confidence intervals are essentially the reverse of hypothesis testing. With a confidence interval, we determine a range of values from the sample mean and standard deviation.

For example, consider a population for which we want to know the mean age. We draw a sample from the population and find that the mean of the sample is 30, with a standard deviation of 3. If we wish to have 95% confidence, the corresponding z-score (provided) is 1.96. Thus, the range is 30 - (3 x 1.96) to 30 + (3 x 1.96) = 24.12 to 35.88. We then report that we are 95% confident that the true mean age of the population from which the sample is drawn is between 24.12 and 35.88.
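
A minimal Python sketch of the arithmetic in this example. It follows the card's simplified form, which multiplies the z-score by the standard deviation directly; a fuller treatment would typically use the standard error (the standard deviation divided by the square root of the sample size), which is not given here.

```python
from statistics import NormalDist

sample_mean = 30
sample_sd = 3

# z-score leaving 2.5% in each tail for 95% confidence.
z = round(NormalDist().inv_cdf(0.975), 2)   # 1.96

margin = z * sample_sd                      # 5.88
lower = round(sample_mean - margin, 2)
upper = round(sample_mean + margin, 2)

print(z, lower, upper)                      # 1.96 24.12 35.88
```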

41
Q

MCAT concept check statistical testing 12.5 page 451 question 1

How do hypothesis tests and confidence intervals differ?

A

Hypothesis tests are used to validate or invalidate a claim that two populations are different, or that one population differs from a given parameter. In a hypothesis test, we calculate a p-value and compare it to a chosen significance level (alpha) to conclude if an observed difference between two populations (or between a population and a parameter) is significant or not.

Confidence intervals are used to determine a potential range of values for the true mean of the population.

42
Q

MCAT concept check statistical testing 12.5 page 451 question 2

If p-value is greater than alpha in a given statistical test, what is the outcome of the test?

A

Fail to reject the null hypothesis. The data do not provide sufficient evidence to support the alternative hypothesis.

43
Q

MCAT concept check statistical testing 12.5 page 451 question 3

How is the p-value calculated during a hypothesis test?

A

After the test statistic is calculated, a computer program or table is consulted to determine the p-value of the statistic.
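
A minimal Python sketch of this lookup, assuming a z-test with a two-sided alternative (the test statistics 2.5 and 1.0 and the function name are made up for illustration):

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """Probability of a result at least this extreme if the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

alpha = 0.05                       # chosen significance level

p_large = two_sided_p_value(2.5)   # about 0.0124
p_small = two_sided_p_value(1.0)   # about 0.3173

print(round(p_large, 4), p_large < alpha)   # 0.0124 True  -> reject the null
print(round(p_small, 4), p_small < alpha)   # 0.3173 False -> fail to reject
```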

44
Q

MCAT concept check statistical testing 12.5 page 451 question 4

True or false. Power is the probability of correctly rejecting the null hypothesis.

A

True. Power is the probability that the individual rejects the null hypothesis when the alternative hypothesis is true for the population.

45
Q

What is a pie chart? What are the strengths and weaknesses of a pie chart?

A

Pie or circle charts are used to represent relative amounts of entities and are especially popular in demographics.

The primary downside to pie charts is that as the number of represented categories increases, the visual representation loses impact and becomes confusing.

Pie charts are frequently used to present demographic information.

46
Q

What is a box plot? What is a box and whisker plot?

A

Box plots are used to show the range, median, quartiles, and outliers for a set of data. A labeled box plot is also called a box and whisker plot.

The box of a box and whisker plot is bounded by Q1 and Q3. Q2 (the median) is the line in the middle of the box. The ends of the whiskers correspond to the maximum and minimum values of the data set. Alternatively, outliers can be presented as individual points, with the ends of the whiskers corresponding to the largest and smallest values in the data set that are still within 1.5 x IQR of the quartiles (Q1 and Q3).
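
A small Python sketch of how whisker ends and outlier points could be found for made-up data. It assumes the common convention of measuring 1.5 x IQR outward from Q1 and Q3, and it uses `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]        # hypothetical data set

q1, _, q3 = quantiles(data, n=4, method="inclusive")   # 3.0 and 7.0
iqr = q3 - q1                               # 4.0

low_fence = q1 - 1.5 * iqr                  # -3.0
high_fence = q3 + 1.5 * iqr                 # 13.0

outliers = [x for x in data if x < low_fence or x > high_fence]
inside = [x for x in data if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))       # whisker ends

print(outliers, whiskers)                   # [30] (1, 8)
```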

Box and whisker plots are especially useful for comparing data because they contain a large amount of data in a small amount of space, and multiple plots can be oriented on a single axis.

47
Q

What is a bar chart, histogram?

A

Bar charts and histograms contain more information than a pie chart for the same space.

Bar charts are used for categorical data, which sort data points based on categories.

Histograms present numerical data rather than discrete categories. Histograms are useful for determining the mode of a data set because they are used to display the distribution of a data set.

48
Q

When presented with a graph, what are the first two things we should do?

A

First, look at the axes of the graph and identify their meaning and scale.

Second, attempt to draw a rough conclusion immediately, without spending a lot of time analyzing all the details of the graph unless asked to do so.

49
Q

What is a linear graph? Do they have to be a straight line? What are the five kinds of linear graphs we can expect to see? Briefly describe.

A

A linear graph shows the relationship between two variables. They involve two direct measurements and do not have to be a straight line.

Linear: straight line
Parabolic: U-shaped
Exponential: y=2^x
Logarithmic: y=log(x)
Sigmoidal: S-shaped (titration curve)

50
Q

What shape of graph is this:

What is the equation for the line?

A

This is a linear graph. Be careful, as this could be a logarithmic graph (always check the scaling).

Y=X

51
Q

What kind of graph is this:

What is the equation for this graph?

A

This is a parabolic graph. The equation is y=x^2.

52
Q

What kind of graph is this? What is the equation for the graph?

A

This is an exponential graph. The equation is y=2^x.

53
Q

What kind of graph is this? What is the equation for the graph?

A

This is a logarithmic graph. The equation is y=log(x).

54
Q

What is the slope of a line? What is the equation for the slope of a line?

A

Where both the shape of the graph and the graph type are linear, we should be able to calculate the slope of the line.

Slope is the change in the y direction divided by the change in the x direction for any two points.
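
A minimal Python sketch of the slope calculation. The function name and the two points are made up for illustration:

```python
def slope(point_1, point_2):
    """Rise over run between two (x, y) points on a line."""
    (x1, y1), (x2, y2) = point_1, point_2
    return (y2 - y1) / (x2 - x1)   # change in y divided by change in x

m = slope((1, 3), (4, 9))          # (9 - 3) / (4 - 1)
print(m)                           # 2.0
```

The two points must have different x values; a vertical line has an undefined slope and would raise `ZeroDivisionError` here.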

55
Q

Slope calculation of a linear graph example page 457

56
Q

What is a semi log graph?

A

By changing the axis ratio, we can create a specialized representation of a logarithmic data set called a semi log graph. It can be easier to interpret because it creates a linear association from an otherwise curved logarithmic data set.
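
A short Python sketch of why a log-scaled axis straightens a curve. It assumes exponential data of the form y=2^x (chosen for illustration): the raw y values grow by larger and larger steps, while their logarithms grow by a constant step, which is what a straight line looks like.

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [2 ** x for x in xs]                    # 1, 2, 4, 8, 16 (curved)

log_ys = [math.log10(y) for y in ys]         # values on a log-scaled y axis

raw_steps = [b - a for a, b in zip(ys, ys[1:])]          # 1, 2, 4, 8
log_steps = [b - a for a, b in zip(log_ys, log_ys[1:])]  # all about 0.301

print(raw_steps)
print([round(s, 3) for s in log_steps])
```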

57
Q

What is a log-log graph?

A

In some cases, both axes can be given a different axis ratio to create a linear plot. When both axes use a constant ratio from point to point on the axis, this is termed a log-log graph.

58
Q

What accounts for the difference between the three plot types (linear, semi log, and log log)?

A

The difference between these three plot types is based on the labeling of the axes. It is crucial to pay attention to the axes on test day to be able to interpret a graph correctly.

59
Q

Semi log graph example page 458.

A

Find one hour on the X axis. Find the corresponding point on the line and take note of its location on the Y axis. You will find that it is approximately 70% remaining.

Multiply the initial quantity by 0.70 to get your answer.

60
Q

When a test question asks for the interpretation of the slope without actually providing a graph, what should we do?

A

We should be able to convert it to a rough graph or to a linear equation to extrapolate the slope.

61
Q

MCAT concept check 12.6 charts, graphs, tables page 459 question 1

What type of data relationship is least likely to require transformation into a semi log or log log plot?

A

Linear relationships can be analyzed without any data or axis transformation into semi log or log log plots.

62
Q

MCAT concept check 12.6 charts, graphs, tables page 459 question 2

63
Q

MCAT concept check 12.6 charts, graphs, tables page 459 question 3

How do exponential and parabolic curves differ in shape?

A

Exponential and parabolic curves both have a steep component. However, exponential curves have horizontal asymptotes and become flat on one side, whereas parabolic curves are symmetrical and have steep components on both sides of a center point.

64
Q

What is correlation? Positive correlation? Negative? What is a correlation coefficient?

A

Correlation refers to the connection between data (direct, inverse, or otherwise).

Positive correlation: two variables trend together (one increases and so does the other; one decreases and the other also decreases)

Negative correlation: the two variables trend in opposite directions (one increases while the other decreases, and vice versa)

Correlation coefficient is a number between -1 and 1 that represents the strength of a relationship:

+1 is a strong positive relationship
-1 is a strong negative relationship
0 is no relationship
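
A minimal Python sketch of a correlation coefficient (Pearson's r, computed by hand). The function name and the data are made up for illustration, and results are rounded because floating-point arithmetic may land a hair away from exactly 1 or -1:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # trends with hours
falling = [10, 8, 6, 4, 2]    # trends against hours

r_up = round(pearson_r(hours, rising), 6)
r_down = round(pearson_r(hours, falling), 6)
print(r_up, r_down)           # 1.0 -1.0
```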

65
Q

Give example of strong positive and strong negative correlation.

A

A strong positive correlation means as one variable increases, so does the other (e.g., hours studied and grades), while a strong negative correlation means as one variable increases, the other decreases (e.g., hours of exercise and weight).

66
Q

Applying data example page 461

67
Q

MCAT concept check applying data 12.7 page 461 question 1

True or False. Statistical significance is sufficient criteria to enact policy change.

A

False. There must be practical (clinical) significance along with statistical significance for a conclusion to be useful.

68
Q

MCAT concept check applying data 12.7 page 461 question 2

True or false. Two variables that are causally related will also be correlated with each other.

69
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 1

70
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 2

71
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 3

72
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 4

73
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 5

74
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 6

75
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 7

76
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 430 question 8

77
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 431 question 9

78
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 431 question 10

79
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 431 question 11

80
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 431 question 12

A

We need to identify outliers using the 1.5 x IQR method to answer this question.

81
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 431 question 13

82
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 432 question 14

83
Q

MCAT Mastery data based and statistical reasoning chapter 12 page 432 question 15

84
Q

What is a histogram?

A

A histogram is a graph that uses rectangles to display the frequency of numerical data. It’s a common tool for initial data analysis and is used in fields like finance, healthcare, and marketing. How it works: The data is divided into ranges, or “bins”. The height of each bar in the histogram represents how many data points fall into that range.
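
A small Python sketch of the binning step, printing a text histogram. The data values and the bin width of 5 are made up for illustration:

```python
from collections import Counter

data = [1, 3, 4, 7, 8, 8, 12, 13, 15, 19]   # hypothetical measurements
bin_width = 5

# Map each value to the lower edge of its bin: 0-4, 5-9, 10-14, 15-19.
counts = Counter((x // bin_width) * bin_width for x in data)

for start in sorted(counts):
    label = f"{start}-{start + bin_width - 1}"
    print(label.ljust(6) + "#" * counts[start])   # bar height = frequency
```

The tallest bars mark where the data are concentrated, which is how a histogram helps locate the mode.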