Chapter 12: Data-Based and Statistical Reasoning Flashcards

Question

What are the three typical causes of outliers?

Answer 1

A true statistical anomaly; a measurement error (for example, reading centimeters instead of inches); or a distribution that is not approximated by the normal distribution (for example, a skewed distribution with a long tail).

Answer 2

When the data are not available, the range of a normally distributed data set can be approximated as four times the standard deviation. For this data set, the relationship fails: the range is nine, which is only a little more than twice the standard deviation. This is because the data do not follow a normal distribution.
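One way to check the rule of thumb is to compute the ratio of range to standard deviation directly. This is a minimal sketch: the data values are hypothetical (not the card's data set), the helper name `range_to_sd_ratio` is introduced here only for illustration, and it uses the sample standard deviation from Python's `statistics` module.

```python
import statistics

def range_to_sd_ratio(data):
    """Hypothetical helper: return (range, sample SD, range / SD)."""
    data_range = max(data) - min(data)
    sd = statistics.stdev(data)  # sample SD (divides by n - 1)
    return data_range, sd, data_range / sd

# Hypothetical data set, not taken from the flashcard
data_range, sd, ratio = range_to_sd_ratio([2, 4, 4, 4, 5, 5, 7, 9])
print(data_range, round(sd, 2), round(ratio, 2))
```

A ratio well below 4 hints that the "range is about 4 SD" approximation does not fit; the rule is rough and also shifts with sample size.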

Answer 3

The average signed distance from the mean is always zero, because the positive and negative distances cancel. This is why, when calculating standard deviation, we square each distance from the mean and then take the square root at the end: squaring makes every term non-negative, so the terms cannot cancel out to zero.
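The cancellation and the squaring step can be seen numerically. This sketch uses a hypothetical data set and computes the population standard deviation (dividing by n); the sample version divides by n - 1 instead.

```python
import math

def mean(values):
    return sum(values) / len(values)

def population_sd(values):
    """Square the deviations, average them, then take the square root."""
    m = mean(values)
    deviations = [x - m for x in values]
    squared = [d ** 2 for d in deviations]  # squaring makes every term non-negative
    return math.sqrt(sum(squared) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(sum(x - mean(data) for x in data))  # signed deviations cancel: 0.0
print(population_sd(data))                # 2.0
```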

Answer 4

Independent events have no effect on one another; they can occur in any order without changing each other's probabilities. Dice are a good example: if you roll a die and get a three, then pick it up and roll it again, the probability of getting a three on the second roll is no different than it was before the first roll. Dependent events do have an impact on one another, such that the order changes the probability. An example is a container with five red balls and five blue balls. The probability of choosing a red ball is 5/10. If a red ball is chosen (and not replaced), the probability of drawing another red ball is 4/9. If a blue ball was chosen instead, the probability of drawing a red ball is 5/9.
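The ball-drawing probabilities can be reproduced with exact fractions. This sketch assumes draws without replacement (which is what makes the events dependent); the helper `p_red` is a hypothetical name for illustration.

```python
from fractions import Fraction

def p_red(red, blue):
    """Hypothetical helper: probability of drawing a red ball from the container."""
    return Fraction(red, red + blue)

red, blue = 5, 5
first_red = p_red(red, blue)                  # 5/10 = 1/2
second_red_after_red = p_red(red - 1, blue)   # one red removed: 4/9
second_red_after_blue = p_red(red, blue - 1)  # one blue removed: 5/9
print(first_red, second_red_after_red, second_red_after_blue)
```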

Answer 5

We are also concerned with whether outcomes are mutually exclusive. Note that this term applies to outcomes, rather than events. Mutually exclusive outcomes cannot occur at the same time, so the probability of two mutually exclusive outcomes occurring together is 0%. One cannot flip both heads and tails in one throw, or be both 10 and 20 years old.

Answer 6

A group of outcomes is said to be exhaustive if there are no other possible outcomes. For example, heads and tails are exhaustive outcomes of a coin flip; these are the only two possibilities.

Answer 7

For independent events, the probability of two or more events occurring together is the product of their individual probabilities. For example, the probability of getting heads twice in a row on a coin is the probability of heads on the first flip times the probability of heads on the second flip (0.5 × 0.5 = 0.25, or 1/4).
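The multiplication rule is a one-liner once the events are known to be independent. This is a minimal sketch; `p_all_occur` is a hypothetical name, and the rule gives wrong answers if applied to dependent events.

```python
from fractions import Fraction
from math import prod

def p_all_occur(probabilities):
    """Hypothetical helper: probability that every one of several INDEPENDENT events occurs."""
    return prod(probabilities)

two_heads = p_all_occur([Fraction(1, 2), Fraction(1, 2)])   # coin: 1/4
two_threes = p_all_occur([Fraction(1, 6), Fraction(1, 6)])  # die: 1/36
print(two_heads, two_threes)
```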

Answer 8

The probability of at least one of two events occurring is equal to the sum of their individual probabilities, minus the probability that both occur: P(A or B) = P(A) + P(B) - P(A and B).
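The addition rule can be checked by counting outcomes on a fair six-sided die. The two events here ("even" and "greater than 3") are hypothetical examples chosen because they overlap, so subtracting the joint probability matters.

```python
from fractions import Fraction

SIDES = range(1, 7)  # outcomes of one fair six-sided die

def p(event):
    """Probability of an event (a set of faces) on a fair die."""
    return Fraction(len(event), len(SIDES))

even = {x for x in SIDES if x % 2 == 0}  # {2, 4, 6}
high = {x for x in SIDES if x > 3}       # {4, 5, 6}

by_rule = p(even) + p(high) - p(even & high)  # 1/2 + 1/2 - 1/3
by_counting = p(even | high)                  # faces {2, 4, 5, 6}
print(by_rule, by_counting)  # 2/3 2/3
```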

Answer 9

Simplify this question by reworking it as the probability of not having all female children. Having at least one male child and having all female children are mutually exclusive outcomes, and no other possibilities can occur. Thus, the probability of having all female children is (0.5)^10 ≈ 0.000977, or about 0.098%. Therefore, the probability of having at least one male child is 1 - (0.5)^10 ≈ 0.999, or about 99.9%.
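The complement calculation, done with exact fractions. As in the card, this assumes each birth is independent with a 0.5 chance of either sex.

```python
from fractions import Fraction

p_girl = Fraction(1, 2)                 # assumption: independent births, 50/50
p_all_female = p_girl ** 10             # 1/1024
p_at_least_one_male = 1 - p_all_female  # complement rule: 1023/1024
print(float(p_all_female), float(p_at_least_one_male))  # 0.0009765625 0.9990234375
```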

Answer 10

Independence is a condition of events wherein the outcome of one event has no effect on the outcome of the other. Mutual exclusivity is a condition where two outcomes cannot occur simultaneously. When a set of outcomes is exhaustive, there are no other possible outcomes.

Answer 11

Hypothesis testing and confidence intervals allow us to draw conclusions about populations based on our sample data.

Answer 12

Hypothesis testing begins with an idea about what may be different between two populations. The null hypothesis states that the two populations are equal, or that a single population can be described by a parameter equal to a given value; it is always a hypothesis of equivalence. It is the default position: no relationship exists between two variables or groups, or there is no difference in certain population characteristics. The alternative hypothesis may be nondirectional (the populations are not equal) or directional (for example, increased study time increases test scores, decreased food intake decreases weight, or exposing plants to sunlight promotes growth and development).

Answer 13

Confidence intervals are essentially the reverse of hypothesis testing. With a confidence interval, we determine a range of values from the sample mean and standard deviation. For example, consider a population for which we want to know the mean age. We draw a sample from the population and find that the mean of the sample is 30, with a standard deviation of 3. If we wish to have 95% confidence, the corresponding z-score (provided) is 1.96. Thus, the range is 30 - (3 × 1.96) to 30 + (3 × 1.96), or 24.12 to 35.88. We then report that we are 95% confident that the true mean age of the population from which the sample is drawn is between 24.12 and 35.88. (Strictly, a confidence interval for a mean is built from the standard error, s/√n, rather than the raw standard deviation; this simplified example uses 3 directly.)
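The interval arithmetic can be sketched with Python's `statistics.NormalDist`, which supplies the critical z value instead of a table. The first call reproduces the card's numbers; the second shows the standard-error form, where the sample size n = 36 is purely hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(center, spread, confidence=0.95):
    """Return (center - z*spread, center + z*spread) using the two-sided critical z."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * spread
    return center - margin, center + margin

# The card's numbers: mean 30, spread 3
low, high = confidence_interval(30, 3)
print(round(low, 2), round(high, 2))

# Standard-error form: spread = SD / sqrt(n); n = 36 is hypothetical
low_se, high_se = confidence_interval(30, 3 / math.sqrt(36))
print(round(low_se, 2), round(high_se, 2))
```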

Answer 14

Hypothesis tests are used to validate or invalidate a claim that two populations are different, or that one population differs from a given parameter. In a hypothesis test, we calculate a p-value and compare it to a chosen significance level (alpha) to conclude whether an observed difference between two populations (or between a population and a parameter) is statistically significant. Confidence intervals are used to determine a likely range of values for the true mean of the population.

Answer 15

Reject the null hypothesis, which supports the alternative hypothesis.

Answer 16

After the test statistic is calculated, a computer program or table is consulted to determine the p-value of the statistic.
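As an example of "a computer program," this sketch converts a z statistic into a two-sided p-value with `statistics.NormalDist` and compares it to alpha. It assumes a z-test (other tests, such as t-tests, use different distributions), and the z values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decision(p_value, alpha=0.05):
    return "reject H0" if p_value < alpha else "fail to reject H0"

for z in (1.0, 1.96, 2.5):  # hypothetical test statistics
    p = two_sided_p_value(z)
    print(z, round(p, 4), decision(p))
```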

Answer 17

True. Power is the probability that the investigator rejects the null hypothesis when the alternative hypothesis is true for the population.
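Power can be computed directly for the simplest case. This is a sketch assuming a two-sided z-test where the true mean sits a given number of standard errors (`shift_in_se`, a hypothetical parameter) away from the null value; under that alternative the test statistic is normal with mean `shift_in_se`.

```python
from statistics import NormalDist

def z_test_power(shift_in_se, alpha=0.05):
    """Power of a two-sided z-test when the true effect is shift_in_se standard errors."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability the statistic lands in either rejection region under the alternative
    upper = 1 - std_normal.cdf(z_crit - shift_in_se)
    lower = std_normal.cdf(-z_crit - shift_in_se)
    return upper + lower

for shift in (0, 1, 2, 2.8):
    print(shift, round(z_test_power(shift), 3))
```

With no true effect (shift 0), the "power" equals alpha, the false-positive rate; it climbs toward 1 as the effect grows.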

Answer 18

Pie or circle charts are used to represent relative amounts of entities and are especially popular for presenting demographic information. The primary downside to pie charts is that as the number of categories increases, the visual representation loses impact and becomes confusing.

Answer 19

Box plots are used to show the range, median, quartiles, and outliers of a data set. A labeled box plot is also called a box-and-whisker plot. The box is bounded by Q1 and Q3, and Q2 (the median) is the line in the middle of the box. When outliers are not shown separately, the ends of the whiskers correspond to the maximum and minimum values of the data set. Alternatively, outliers can be presented as individual points, with the ends of the whiskers corresponding to the largest and smallest values that still lie within 1.5 × IQR of Q3 and Q1, respectively. Box-and-whisker plots are especially useful for comparing data because they pack a large amount of information into a small space, and multiple plots can be oriented on a single axis.
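The values a box plot draws (the five-number summary) can be computed with `statistics.quantiles`. This sketch uses hypothetical data and the `"inclusive"` quartile method; textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little elsewhere.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3.0, 5.0, 7.0, 9)
```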

Answer 20

Bar charts and histograms contain more information than a pie chart in the same space. Bar charts are used for categorical data, which sort data points into categories. Histograms present numerical data rather than discrete categories. Because they display the distribution of a data set, histograms are useful for determining its mode.

Answer 21

First, look at the axes of the graph and identify their meaning and scale. Second, attempt to draw a rough conclusion immediately, without spending a lot of time analyzing all the details of the graph unless asked to do so.

Answer 22

A line graph shows the relationship between two variables. Line graphs involve two direct measurements and do not have to be straight lines. Common shapes include:
Linear: a straight line
Parabolic: U-shaped (y = x^2)
Exponential: y = 2^x
Logarithmic: y = log(x)
Sigmoidal: S-shaped (for example, a titration curve)
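Tabulating a few values of each shape makes their growth patterns easy to compare. This is a sketch: log base 10 is assumed for the logarithmic curve, and the logistic function is used as one example of an S-shaped curve (a titration curve is not literally logistic).

```python
import math

shapes = {
    "linear": lambda x: x,
    "parabolic": lambda x: x ** 2,
    "exponential": lambda x: 2 ** x,
    "logarithmic": lambda x: math.log10(x),               # assumed base 10
    "sigmoidal": lambda x: 1 / (1 + math.exp(-x)),         # logistic curve as an example S shape
}

for name, f in shapes.items():
    print(f"{name:>12}", [round(f(x), 3) for x in (1, 2, 4, 8)])
```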

Answer 23

This is a linear graph, with the equation y = x. Be careful: a straight line could also be a logarithmic plot, so always check the scaling of the axes.

Answer 24

This is a parabolic graph, with the equation y = x^2.

Answer 25

This is an exponential graph, with the equation y = 2^x.

Answer 26

This is a logarithmic graph, with the equation y = log(x).

Answer 27

Where both the shape of the graph and the graph type are linear, we should be able to calculate the slope of the line. Slope is the change in the y direction divided by the change in the x direction between any two points on the line (rise over run).
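Rise over run as a small function. The helper name `slope` and the sample points are hypothetical; a vertical line has no defined slope, so that case raises an error.

```python
def slope(p1, p2):
    """Rise over run between two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

print(slope((1, 2), (3, 8)))  # 3.0
```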

Answer 28

By changing the ratio of one axis, so that each step along it multiplies the value by a constant factor (for example, 1, 10, 100, 1000), we can create a specialized representation of a data set called a semi-log graph. Semi-log graphs can be easier to interpret because they turn an otherwise curved exponential or logarithmic data set into a straight line.
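Why a semi-log axis straightens exponential data: equal steps in x produce equal steps in log(y). The data below are hypothetical (y = 5 × 2^x), and log base 10 stands in for the log-scaled axis.

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [5 * 2 ** x for x in xs]         # hypothetical exponential data: 5, 10, 20, 40, 80
log_ys = [math.log10(y) for y in ys]  # what a log-scaled y axis effectively plots

# Equal steps in x give equal steps in log10(y), so the points fall on a straight line
steps = [b - a for a, b in zip(log_ys, log_ys[1:])]
print([round(s, 3) for s in steps])   # each step is log10(2), about 0.301
```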

Answer 29

In some cases, both axes can be given a constant ratio to create a linear plot. When both axes use a constant ratio from point to point, the graph is termed a log-log graph.
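On a log-log graph, a power law y = a·x^k becomes a straight line whose slope is the exponent k. This sketch uses hypothetical data with k = 2; `loglog_slope` is an illustrative helper name.

```python
import math

def loglog_slope(p1, p2):
    """Slope between two points after taking log10 of both coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    return (math.log10(y2) - math.log10(y1)) / (math.log10(x2) - math.log10(x1))

xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]  # hypothetical power-law data, y = 3x^2
print(loglog_slope((xs[0], ys[0]), (xs[-1], ys[-1])))  # about 2.0, the exponent
```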

Answer 30

The difference between these three plot types (linear, semi-log, and log-log) is based on the labeling of the axes. It is crucial to pay attention to the axes on test day to interpret a graph correctly.

Answer 31

Find one hour on the x-axis. Find the corresponding point on the line and note its location on the y-axis; it is approximately 70% remaining. Multiply the initial quantity by 0.70 to get your answer.

Answer 32

We should be able to convert it to a rough graph or to a linear equation to extrapolate the slope.

Answer 33

Linear relationships can be analyzed without any data or axis transformation into semi-log or log-log plots.

Answer 34

Exponential and parabolic curves both have a steep component; however, exponential curves have a horizontal asymptote and flatten out on one side, while parabolic curves are symmetrical and have steep components on both sides of the vertex.

Answer 35

Correlation refers to the connection between data (direct, inverse, or otherwise).
Positive correlation: the two variables trend together (when one increases, so does the other; when one decreases, so does the other).
Negative correlation: the two variables trend in opposite directions (when one increases, the other decreases, and vice versa).
The correlation coefficient is a number between -1 and +1 that represents the strength and direction of a relationship:
+1 is the strongest possible (perfect) positive relationship
-1 is the strongest possible (perfect) negative relationship
0 is no relationship
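A sketch of the Pearson correlation coefficient, written out by hand so each step is visible (Python 3.10+ also ships `statistics.correlation`). The three small data sets are hypothetical, chosen to hit +1, -1, and 0. Note that r measures linear association, so a strong curved relationship can still give r near 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3]
print(pearson_r(hours, [2, 4, 6]))  # about +1: perfect positive
print(pearson_r(hours, [6, 4, 2]))  # about -1: perfect negative
print(pearson_r(hours, [1, 3, 1]))  # 0: no linear relationship
```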

Answer 36

A strong positive correlation means as one variable increases, so does the other (e.g., hours studied and grades), while a strong negative correlation means as one variable increases, the other decreases (e.g., hours of exercise and weight).

Answer 37

False. There must be practical (clinical) significance as well as statistical significance for a conclusion to be useful.

Answer 38

We need to identify outliers using the 1.5 × IQR method to answer this question.
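The 1.5 × IQR method as a short function: compute the fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR and flag anything outside them. The data are hypothetical, and the `"inclusive"` quartile method is an assumption; other quartile conventions can move the fences slightly.

```python
import statistics

def iqr_outliers(data):
    """Flag values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))  # [50]
```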

Answer 39

A histogram is a graph that uses rectangles to display the frequency of numerical data. It's a common tool for initial data analysis and is used in fields like finance, healthcare, and marketing. How it works: The data is divided into ranges, or "bins". The height of each bar in the histogram represents how many data points fall into that range.
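The binning step can be sketched in a few lines: each value is assigned to a half-open bin [start + k·width, start + (k+1)·width) and counted. The helper name, bin settings, and data are hypothetical; values outside the chosen bins are simply ignored here.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < n_bins:  # values outside the chosen bins are ignored
            counts[k] += 1
    return counts

scores = [1, 2, 2, 3, 7, 8, 9]  # hypothetical data
for k, c in enumerate(histogram_counts(scores, start=0, width=5, n_bins=2)):
    low = k * 5
    print(f"[{low}, {low + 5}): {'#' * c}")
```

The tallest bar marks the modal bin, which is why histograms are handy for spotting the mode.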