Module 1-8 Flashcards

Question

Experimental vs. non-experimental

Answer 1

-Comparative studies may be experimental or non-experimental -In EXPERIMENTAL DESIGNS, the investigator assigns the subjects to groups according to the explanatory variable ~Exposed and unexposed groups -In NONEXPERIMENTAL DESIGNS, the investigator does not assign subjects to groups; individuals are merely classified as "exposed" or "non-exposed"

Answer 2

-Experimental and non-experimental study design

Answer 3

-The Women's Health Initiative study randomly assigned about half its subjects to a group that received hormone replacement therapy (HRT) -Subjects were followed for ~5 years to ascertain various health outcomes, including heart attacks, stroke, the occurrence of breast cancer, and so on

Answer 4

-The Nurses' Health Study classified individuals according to whether they received HRT -Subjects were followed for ~5 years to ascertain the occurrence of various health outcomes

Answer 5

-In both the experimental (WHI) study and the nonexperimental (Nurses' Health) study, the relationship between HRT (explanatory variable) and various health outcomes (response variables) was studied -In the experimental design, the investigators controlled who was and who was not exposed -In the nonexperimental design, the study subjects (or their physicians) decided whether or not subjects were exposed

Answer 6

-A subject = an individual participating in the experiment -A factor = an explanatory variable being studied; experiments may address the effect of multiple factors -A treatment = a specific set of factors

Answer 7

-Ages of people in group A ~21, 42, 5, 11, 30, 50, 28, 27, 24, 52

Answer 8

-"You can observe a lot by looking" - Yogi Berra -Start by exploring the data with Exploratory Data Analysis (EDA) -A popular univariate EDA technique is the stem-and-leaf plot (stemplot) -The stem of the stemplot is a number line (axis) -Each leaf represents a data point -Ex:
0 | 5
1 | 1
2 | 1478
3 | 0
4 | 2
5 | 02

Answer 9

-10 ages (data sequenced as an ordered array) ~5, 11, 21, 24, 27, 28, 30, 42, 50, 52 -Draw the stem to cover the range 5 to 52
0 |
1 |
2 | 1
3 |
4 |
5 |
x10 <- axis multiplier
-Divide each data point into a stem value (in this example, the tens place) and a leaf value (in this example, the ones place) -Place leaves next to the stem value -Example of a leaf: 21 (plotted above as stem 2, leaf 1) -Plot all the data points in rank order
0 | 5
1 | 1
2 | 1478
3 | 0
4 | 2
5 | 02
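The plotting steps can be sketched in a few lines of Python. This is only an illustration: the helper name `stemplot` and its `multiplier` argument are made up for this example and are not from the course slides.

```python
from collections import defaultdict

def stemplot(values, multiplier=10):
    """Return stem-and-leaf rows as text (hypothetical helper for illustration)."""
    leaves = defaultdict(list)
    for v in sorted(values):              # plot in rank order
        stem, leaf = divmod(v, multiplier)  # tens place -> stem, ones place -> leaf
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # Keep empty stems so gaps in the data stay visible.
    return [f"{stem} | {''.join(leaves[stem])}" for stem in range(lo, hi + 1)]

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
rows = stemplot(ages)
print("\n".join(rows))
print("x10")
```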

Answer 10

-Symmetry (mirror image of itself around its center) -Modality (number of peaks) - Kurtosis (width of tails or steepness of the mound) -Departures (outliers)

Answer 11

-Gravitational center -> mean -Middle value -> median

Answer 12

-Range and inter-quartile range -Standard deviation and variance (chapter 4)

Answer 13

-"Shape" refers to the pattern when plotted -Here's the "skyline silhouette" of our data
    x
    x
    x     x
x x x x x x
0 1 2 3 4 5
-Consider: symmetry, modality, kurtosis -Do NOT "over-interpret" plots when n is small

Answer 14

-"Eye-ball method" -> visualize where the plot would balance ~Around 25 to 35 -Arithmetic method = sum values and divide by n ~mean = 290/10 = 29

Answer 15

-Ordered array: 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 -The median has depth (n+1) / 2 -n = 10, so the median's depth = (10+1) / 2 = 5.5 -Falls between 27 and 28 -When n is even, average the adjacent values ~Median = 27.5
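As a quick check of the center calculations for the ten ages, here is a small sketch using Python's standard `statistics` module (the variable names are illustrative only):

```python
import statistics

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)
mean_age = sum(ages) / n                 # arithmetic method: sum the values, divide by n
median_depth = (n + 1) / 2               # depth of the median in the ordered array
median_age = statistics.median(ages)     # sorts internally; averages the two middle values when n is even

print(mean_age, median_depth, median_age)
```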

Answer 16

-For now, report the range (minimum and maximum values) -Current data range is "5 to 52" -The range is the easiest but not the best way to describe spread (better methods described later)

Answer 17

-An outlier is a striking deviation from the overall pattern or shape of the distribution
0 | 679
1 | 124557
2 |
3 |
4 |
5 | 0
x10

Answer 18

-Data: ~1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42 -Stem = ones-place -Leaves = tenths-place -Truncate extra digit (ex., 1.47 -> 1.4) ~DO NOT plot decimal -Center ~Between 3.4 and 3.7 -Spread ~ 1.4 to 4.4 -Shape ~Mound, no outliers

Answer 19

-Data 14, 17, 18, 19, 22, 22, 23, 24, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38 -Regular stemplot
1 | 4789
2 | 2234466789
3 | 000123445678
x10
-Too squished to see the shape

Answer 20

-Split stem values into two ranges ~First "1" holds leaves from 0 to 4, and second "1" holds leaves from 5 to 9 -Split-stem
1 | 4
1 | 789
2 | 22344
2 | 66789
3 | 00012344
3 | 5678
x10
-Negative skew now evident

Answer 21

-Start with between 4 and 12 stem-values -Trial and error ~Try different stem multiplier ~Try splitting stem ~Look for most informative plot

Answer 22

-x100 axis multiplier -> only two stem-values (1x100 and 2x100) -x100 axis-multiplier w/ split stem -> only 4 stem values -> might be okay -x10 axis-multiplier -> see next slide

Answer 23

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
x10
-Shape ~Positive skew, high outlier (260) -Location ~Median about 165 -Spread ~From 100 to 260

Answer 24

-Frequency = count -Relative frequency = proportion or % -Cumulative frequency = % less than or equal to level

Answer 25

-When data are sparse, group data into class intervals -Create 4 to 12 class intervals -Classes can be uniform or non-uniform -End-point convention ~First class interval of 0 to 10 will include 0 but exclude 10 (0 to 9.99) -Tally frequencies -Calculate relative frequency -Calculate cumulative frequency

Answer 26

-Uniform class intervals table (width 10) for data ~5, 11, 21, 24, 27, 28, 30, 42, 50, 52
Class   Freq   Relative Freq (%)   Cumulative Freq (%)
0-9     1      10                  10
10-19   1      10                  20
20-29   4      40                  60
30-39   1      10                  70
40-49   1      10                  80
50-59   2      20                  100
Total   10     100
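The tally, relative frequency, and cumulative frequency columns can be reproduced with a short sketch. The layout of the `table` tuples is an assumption made for this example.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
width = 10                      # uniform class-interval width
n = len(ages)

# Tally: each value falls in the class whose lower end-point is included.
counts = {}
for a in ages:
    lower = (a // width) * width
    counts[lower] = counts.get(lower, 0) + 1

table = []                      # rows of (class label, freq, relative %, cumulative %)
cumulative = 0.0
for lower in range(0, 60, width):
    freq = counts.get(lower, 0)
    rel = 100 * freq / n
    cumulative += rel
    table.append((f"{lower}-{lower + width - 1}", freq, rel, cumulative))

for row in table:
    print(row)
```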

Answer 27

-A histogram is a frequency chart for a quantitative measurement ~The bars will touch

Answer 28

-A bar chart with non-touching bars is reserved for categorical measurements and non-uniform class intervals

Answer 29

-Central location ~Mean ~Median ~Mode -Spread ~Range and interquartile range (IQR) ~Variance and standard deviation -Shape Summaries ~Seldom used in practice

Answer 30

-n = sample size -x = the variable (ex. ages of subjects) -xi = the value of individual i for variable X -Σ = sum all values (capital sigma) -Illustrative data (ages of participants) 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 ~n = 10 ~x = Age variable ~x1 = 5, x2 = 11, ..., x10 = 52 ~Σxi = x1 + x2 + ... + x10 = 5 + 11 + ... + 52 = 290

Answer 31

-"Arithmetic average" -Traditional measure of central location -Sum the values and divide by n -"xbar" refers to the sample mean: x-bar = (1/n) Σxi (pages 77-79 in the textbook have the equation to use)

Answer 32

-Ten individuals selected at random have the following ages 21, 42, 5, 11, 30, 50, 28, 27, 24, 52 *Note that n = 10, Σxi = 21 + 42 + ... + 52 = 290, and x-bar = (1/10)(290) = 29.0

Answer 33

-The sample mean can be used to predict: ~The value of an observation drawn at random from the sample ~The population mean

Answer 34

-Same operation as the sample mean except based on the entire population (N = population size) -Conceptually important -Usually not available in practice -Sometimes referred to as the expected value

Answer 35

-The median is the value with a depth of (n + 1) / 2 -When n is even, average the two values that straddle a depth of (n + 1) / 2 -For the 10 values listed below, the median has depth (10 + 1) / 2 = 5.5, placing it between 27 and 28 ~Average these two values to get the median = 27.5 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 M = 27.5

Answer 36

-Ex A: ~2, 4, 6 *M = 4 -Ex B: ~2, 4, 6, 8 *M = 5 -Ex C: ~6, 2, 4 *M does not = 2 **(Values MUST be ORDERED first)

Answer 37

-The median is more resistant to skews and outliers than the mean; it is more robust -This data set has a mean of 1600 ~1362, 1439, 1460, 1614, 1666, 1792, 1867 -Here's the same data set with a data entry error "outlier" ~This data set has a mean of 2743 ~1362, 1439, 1460, 1614, 1666, 1792, 9867 -The median is 1614 in both instances, demonstrating its robustness in the face of outliers
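A short sketch makes the robustness point concrete by computing both summaries directly from the two listed data sets (variable names are illustrative):

```python
import statistics

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = rates[:-1] + [9867]     # same data, last value mistyped

mean_clean = statistics.mean(rates)
mean_error = statistics.mean(with_error)
median_clean = statistics.median(rates)
median_error = statistics.median(with_error)

# The mean is pulled toward the outlier; the median does not move.
print(mean_clean, round(mean_error), median_clean, median_error)
```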

Answer 38

-The mode is the most commonly encountered value in the dataset -This data set has a mode of 7 {4, 7, 7, 7, 8, 8, 9} -This data set has no mode {4, 6, 7, 8} (each point appears only once) -The mode is useful only in large data sets with repeating values

Answer 39

-Most common descriptive measures of spread -Based on deviations around the mean -This figure demonstrates the deviations of two of its values

Answer 40

-Deviation = xi - x-bar ~Sum of squared deviations = SS = Σ(xi - x-bar)^2 ~Sample variance = s^2 = SS / (n-1) ~Sample standard deviation = s = sq root (s^2) = sq root (SS / (n-1))

Answer 41

-s = sq root (SS / (n-1)) -Sample standard deviation s is the estimator of the population standard deviation sigma ~See "Facts About the Standard Deviation" page 93

Answer 42

Observation   Deviation        Squared deviation
36            36 - 36 = 0      0^2 = 0
38            38 - 36 = 2      2^2 = 4
39            39 - 36 = 3      3^2 = 9
40            40 - 36 = 4      4^2 = 16
36            36 - 36 = 0      0^2 = 0
34            34 - 36 = -2     (-2)^2 = 4
33            33 - 36 = -3     (-3)^2 = 9
32            32 - 36 = -4     (-4)^2 = 16
SUMS ->       0*               SS = 58
*SUM of deviations always equals zero

Answer 43

Observation   Deviation        Squared deviation
36            36 - 36 = 0      0^2 = 0
38            38 - 36 = 2      2^2 = 4
39            39 - 36 = 3      3^2 = 9
40            40 - 36 = 4      4^2 = 16
36            36 - 36 = 0      0^2 = 0
34            34 - 36 = -2     (-2)^2 = 4
33            33 - 36 = -3     (-3)^2 = 9
32            32 - 36 = -4     (-4)^2 = 16
SUMS ->       0*               SS = 58
*SUM of deviations always equals zero
-Sample variance (s^2) ~s^2 = SS / (n-1) = 58 / (8-1) = 8.286 -Standard deviation (s) ~s = sq root (8.286) = 2.88
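The deviation table can be checked step by step with a small sketch. It follows the definitional route (deviations, then SS, then dividing by n - 1) and compares the result with the standard library's `statistics.stdev`.

```python
import math
import statistics

obs = [36, 38, 39, 40, 36, 34, 33, 32]
n = len(obs)

mean = sum(obs) / n                      # 36.0
deviations = [x - mean for x in obs]     # these always sum to zero
ss = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = ss / (n - 1)                  # sample variance s^2
sd = math.sqrt(variance)                 # sample standard deviation s

print(sum(deviations), ss, round(variance, 3), round(sd, 2))
```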

Answer 44

-Measures spread (ex. if group 1 has s1 = 15 and group 2 has s2 = 10, group 1 has more spread, i.e., variability)

Answer 45

-Two distributions can be quite different yet have the same mean -These data compare particulate matter in air samples (ug/m^3) at two sites ~Both sites have a mean of 36, but Site 1 exhibits much greater variability *We would miss the high pollution days if we relied solely on the mean
Site 1 |   | Site 2
    42 | 2 |
     8 | 2 |
     2 | 3 | 234
    86 | 3 | 6689
     2 | 4 | 0
       | 5 |
       | 5 |
       | 6 |
     8 | 6 |
        x10

Answer 46

-Range = maximum - minimum -Illustrative example ~Site 1 range = 68 - 22 = 46 ~Site 2 range = 40 - 32 = 8 -Beware: ~The sample range will tend to underestimate the population range -Always supplement the range with at least one additional measure of spread
Site 1 |   | Site 2
    42 | 2 |
     8 | 2 |
     2 | 3 | 234
    86 | 3 | 6689
     2 | 4 | 0
       | 5 |
       | 5 |
       | 6 |
     8 | 6 |
        x10

Answer 47

-Quartile 1 (Q1) ~Cuts off bottom quarter of data = median of the lower half of the data set -Quartile 3 (Q3) ~Cuts off top quarter of data = median of the upper half of the data set -Interquartile Range (IQR) = Q3 - Q1 covers the middle 50% of the distribution 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 Q1 = 21, Q3 = 42, and IQR = 42 - 21 = 21

Answer 48

-Quartile 1 (Q1) ~Cuts off bottom quarter of data = median of the lower half of the data set -Quartile 3 (Q3) ~Cuts off top quarter of data = median of the upper half of the data set -Interquartile Range (IQR) = Q3 - Q1 covers the middle 50% of the distribution 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 Q1 = 21, Q3 = 42, and IQR = 42 - 21 = 21

Answer 49

1362, 1439, 1460, 1614, 1666, 1792, 1867 Median = 1614 -When n is odd, include the median in both halves of the data set -Bottom half: ~ 1362, 1439, 1460, 1614 which has a median of 1449.5 (Q1) -Top half ~1614, 1666, 1792, 1867 which has a median of 1729 (Q3)
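The "median of each half" rule, including the convention of putting the median in both halves when n is odd, can be sketched as follows. The function name `quartiles` is hypothetical. Library routines such as `statistics.quantiles` use different interpolation rules and may give slightly different answers.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule (illustrative helper)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2            # when n is odd, the median lands in both halves
    lower, upper = data[:half], data[n - half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]

print(quartiles(ages))
print(quartiles(rates))
```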

Answer 50

-Q0 (the minimum) -Q1 (25th percentile) -Q2 (median) -Q3 (75th percentile) -Q4 (the maximum)

Answer 51

-Calculate 5-point summary ~Draw box from Q1 to Q3 with line at median -Calculate IQR and fences as follows ~Fence lower = Q1-1.5(IQR) ~Fence upper = Q3 + 1.5(IQR) *DO NOT DRAW FENCES -Determine if any values lie outside the fences (outside values) ~If so, plot these separately -Determine values inside the fences (inside values) ~Draw whisker from Q3 to upper inside value ~Draw whisker from Q1 to lower inside value

Answer 52

Data: 5, 11, 21, 24, 27, 28, 30, 42, 50, 52 -5 pt summary: {5, 21, 27.5, 42, 52}; box from 21 to 42 with line @ 27.5 -IQR = 42 - 21 = 21 ~Fu = Q3 + 1.5 (21) = 73.5 ~Fl = Q1 - 1.5 (21) = -10.5 -No values above the upper fence or below the lower fence -Upper inside value = 52 -Lower inside value = 5 -Draw whiskers

Answer 53

-5 pt summary ~3, 22, 25.5, 29, 51: draw a box -IQR = 29 - 22 = 7 ~Fu = Q3 + 1.5 (7) = 39.5 ~Fl = Q1 - 1.5 (7) = 11.5 -One above the top fence (51) and one below the bottom fence (3) -Upper inside value is 31 -Lower inside value is 21 -Draw whiskers

Answer 54

-Seven metabolic rates 1362, 1439, 1460, 1614, 1666, 1792, 1867 -5 pt summary ~1362, 1449.5, 1614, 1729, 1867 -IQR = 1729 - 1449.5 = 279.5 ~Fu = Q3 + 1.5 (279.5) = 2148.25 ~Fl = Q1 - 1.5 (279.5) = 1030.25 -None outside -Whiskers end @ 1867 and 1362
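The fence and whisker rules can be collected into one sketch. The helper `boxplot_stats` and the keys of the dictionary it returns are invented for this illustration, and the quartiles again use the median-of-halves rule.

```python
import statistics

def boxplot_stats(values):
    """Quartiles, fences, whisker ends, and outside values (illustrative helper)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                       # median goes in both halves when n is odd
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[n - half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr              # fences are computed, not drawn
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outside = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),   # lower and upper inside values
        "outside": outside,                       # plotted as separate points
    }

ages_box = boxplot_stats([5, 11, 21, 24, 27, 28, 30, 42, 50, 52])
rates_box = boxplot_stats([1362, 1439, 1460, 1614, 1666, 1792, 1867])
print(ages_box)
print(rates_box)
```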

Answer 55

-Location ~Position of median ~Position of box -Spread ~Hinge-spread (IQR) ~Whisker-to-whisker spread -Shape ~Symmetry or direction of skew ~Long whiskers (tails) indicate leptokurtosis

Answer 56

-Boxplots are especially useful when comparing groups

Answer 57

-Always report a measure of central location, a measure of spread, and the sample size -Symmetrical mound-shaped distributions -> report the mean and standard deviation -Odd-shaped distributions -> report 5-point summaries (or median and IQR)

Answer 58

-Random variable = a numerical quantity that takes on different values depending on chance -Ex: ~Number of smokers in a simple random sample of size n, the ages of subjects selected at random at UNR -Sample Space = the set of all possible values from a random variable -Ex: ~If the subject's age is a random variable of interest, the set of all possible values for this random variable is??? -Event = an outcome or set of outcomes from random variables -Probability = the proportion of times an event is expected to occur in the population -Ex: ~Roll a fair die: the probability that the die lands on "one" *Ideas about probability are founded on relative frequencies (proportions) in populations

Answer 59

-Random Variable ~The number on the face -Population (Sample Space): (not a population of people) {1, 2, 3, 4, 5, 6} -Event: 1 -Probability: 1/6 EX: ~Event: 5 or 6 ~Probability: 2/6 or 1/3

Answer 60

-In a given year, there were 42,636 traffic fatalities in a population of N = 293,655,000 -If a person is selected at random from this population, what is the probability that they will experience a traffic fatality by the end of that year? -ANS ~The relative frequency of that event in the population = 42,636 / 293,655,000 = 0.0001452 *Thus, Pr(traf. fatality) = 0.0001452 (about 1 in 6887)

Answer 61

-Random variable = a numerical quantity that takes on different values depending on chance -Two types of random variables -Discrete random variables ~A countable set of possible outcomes *X = number of smokers (cannot have half of a person) -Continuous random variable ~An unbroken continuum of possible outcomes *Weight in pounds (cannot have 0 due to it not existing)

Answer 62

-Discrete Random Variables ~A countable set of possible outcomes *The variable number of leukemia cases in a geographic region in a given period *The variable number of successes in n independent treatments *The variable number of smokers in a simple random sample of size n -Continuous random variable ~An unbroken continuum of possible outcomes *The variable amount of time it takes to complete a task *The variable height of an individual selected at random

Answer 63

-Probability mass function (pmf) = a mathematical relation that assigns probabilities to all possible outcomes for discrete random variables -Illustrative example: ~One rolls a die 2 times *Let X = the variable number of times one gets six *This is the pmf for the random variable
X          0        1        2
Pr(X=x)    0.6944   0.2778   0.0278
-Illustrative example 2: "Four Patients" ~Suppose one treats four patients with an intervention that is successful 75% of the time *Let X = the variable number of successes in this experiment *This is the pmf for this random variable
X          0        1        2        3        4
Pr(X=x)    0.0039   0.0469   0.2109   0.4219   0.3164

Answer 64

-Intersection ~For two events A and B, the intersection A ∩ B represents the event that both A and B occur -Union ~For two events A and B, the union A ∪ B represents the event that A or B occurs *A occurs without B, B occurs without A, or A and B both occur -Complement ~For an event A, the complement of A represents the event that occurs if A does not occur. It is typically denoted by A-bar

Answer 65

-Property 1 ~Probabilities are always between 0 and 1 -Property 2 ~A sample space is all possible outcomes *The probabilities in the sample space sum to 1 (exactly) -Property 3 ~The complement of an event is "the event not happening" *The probability of a complement is 1 minus the probability of the event **Pr(rain tomorrow) = 0.6 **Pr(not rain tomorrow) = 0.4 -Property 4 ~Probabilities of disjoint events can be added *Pr(X = 1 or X = 2) = Pr(X = 1) + Pr(X = 2) **X = number on die

Answer 66

-Property 1. 0 ≤ Pr(A) ≤ 1 -Property 2. Pr(S) = 1, where S represents the sample space (all possible outcomes) -Property 3. Pr(A-bar) = 1 - Pr(A), where A-bar represents the complement of A (NOT A) -Property 4. If A and B are disjoint, then Pr(A or B) = Pr(A) + Pr(B)

Answer 67

-Figure 5.2 -Property 1. 0 ≤ Pr(A) ≤ 1 ~Note that all individual probabilities are between 0 and 1 -Property 2. Pr(S) = 1 ~Note that the sum of all probabilities = .0039 + .0469 + .2109 + .4219 + .3164 = 1

Answer 68

-Property 3 Pr (A-bar) = 1- Pr(A) ~As an example, let A represent 4 successes Pr (A) = .3164 -Let A-bar represent the complement of A ("NOT A"), which is "3 or fewer" Pr(A-bar) = 1 - Pr(A) = 1 - 0.3164 = 0.6836

Answer 69

-Property 4 Pr(A or B) = Pr(A) + Pr(B) for disjoint events ~Let A represent 4 successes ~Let B represent 3 successes -Since A and B are disjoint, Pr(A or B) = Pr(A) + Pr(B) = 0.3164 + 0.4219 = 0.7383 -The probability of observing 3 or 4 successes is 0.7383 or about 74%

Answer 70

-The area under curves (AUC) on a pmf corresponds to the probability -Pr (X = 2) ~area of shaded region = height x base *.2109(1.0) = .2109

Answer 71

-"Cumulative probability" refers to the probability of that value or less -Notation Pr(X ≤ x) -Corresponds to AUC to the left of the point ("left tail") -Ex: Pr(X ≤ 2) ~Shaded "tail" = 0.0039 + 0.0469 + 0.2109 = 0.2617

Answer 72

-Definitional formula for mean or expectation (p.111): mu = Σ x * Pr(X = x) -Definitional formula for variance (p.111): sigma^2 = Σ (x - mu)^2 * Pr(X = x)

Answer 73

X          0        1        2        3        4
Pr(X=x)    0.0039   0.0469   0.2109   0.4219   0.3164
How to calculate the expected value (mean)? mu = 0*0.0039 + 1*0.0469 + 2*0.2109 + 3*0.4219 + 4*0.3164 = 3

Answer 74

X          0        1        2        3        4
Pr(X=x)    0.0039   0.0469   0.2109   0.4219   0.3164
How to calculate the variance? sigma^2 = (0-3)^2 * 0.0039 + (1-3)^2 * 0.0469 + (2-3)^2 * 0.2109 + (3-3)^2 * 0.4219 + (4-3)^2 * 0.3164 = 0.75
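The two weighted sums can be written as a brief sketch that applies the definitional formulas to the tabulated pmf. Because the table entries are rounded to four decimals, the results agree with 3 and 0.75 only approximately.

```python
# pmf for the "Four Patients" example, as tabulated (rounded to 4 decimals)
pmf = {0: 0.0039, 1: 0.0469, 2: 0.2109, 3: 0.4219, 4: 0.3164}

# Expected value: each outcome weighted by its probability.
mu = sum(x * p for x, p in pmf.items())

# Variance: squared distance from mu, weighted by probability.
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(round(mu, 2), round(sigma2, 2))
```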

Answer 75

-Continuous random variables form a continuum of possible values -As an illustration, consider the spinner -The spinner will generate a continuum of random numbers between 0 and 1 -A probability density function (pdf) is a mathematical relation that assigns probabilities to all possible outcomes for a continuous random variable -The pdf for our random spinner is shown here -The shaded area under the curve represents probability (area = base x height, with height 1) ~Pr(0 < X < 0.5) = (0.5 - 0) * 1 = 0.5 ~Pr(0.25 < X < 0.5) = (0.5 - 0.25) * 1 = 0.25 ~Pr(X > 0.7) = (1 - 0.7) * 1 = 0.3

Answer 76

-pdfs obey all the rules of probabilities -pdfs come in many forms (shapes) ~Uniform pdf ~Normal pdf ~Chi-square pdf ~Exercise 5.13 pdf *The most common pdf is the Normal (we study the Normal pdf in detail in the next chapter)

Answer 77

-As was the case with pmfs, pdfs display probability with the area under the curve (AUC) -This histogram shades bars corresponding to ages ≤ 9 (~40% of the histogram) -This shaded AUC on the Normal pdf curve also corresponds to ~40% of the total -X = age, X ~ Normal, Pr(X ≤ 9) = 0.4

Answer 78

-Binomial = a family of discrete random variables -Binomial Random Variable = the random number of successes in n independent Bernoulli trials (a Bernoulli trial has two possible outcomes: "success" or "failure") -Binomial random variables have two parameters ~n = number of trials ~p = probability of success of each trial

Answer 79

-Consider the random number of successful treatments when treating four patients -Suppose the probability of success in each instance is 75% -The random number of successes can vary from 0 to 4 -The random number of successes is a binomial with parameters n = 4 and p = 0.75 -Notation ~Let X ~b(n,p) represent a binomial random variable with parameters n and p *The illustration variable is X ~ b(4, 0.75)

Answer 80

-Formula for binomial probabilities Pr(X = x) = nCx * p^x * q^(n-x) -Where ~nCx = the binomial coefficient (next slide) ~p = probability of success for each trial ~q = probability of failure = 1 - p

Answer 81

-Formula for the binomial coefficient nCx = (n!) / (x! (n-x)!) -Where ! represents the factorial function, calculated as x! = x * (x-1) * (x-2) * ... * 1 -Ex: ~4! = 4*3*2*1 = 24 -By definition 1! = 1 and 0! = 1 -Ex: 4C2 = (4!) / ((2!)(4-2)!) = (4!) / ((2!)(2!)) = (4*3*2*1) / ((2*1)(2*1)) = 6

Answer 82

nCx = (n!) / (x! (n-x)!) -The binomial coefficient is called the "choose function" because it tells you the number of ways you could choose x items out of n -Ex: 4C2 = 6 means there are six ways to choose two items out of four

Answer 83

-Recall the "Four patients" example -Four patients; probability of success of each treatment = 0.75 -The number of successes is the binomial random variable X ~ b(4, 0.75) -Note q = 1 - 0.75 = 0.25 -What is the probability of observing 0 successes under these circumstances?
Pr(X = 0) = nCx * p^x * q^(n-x) = 4C0 * 0.75^0 * 0.25^(4-0) = (4!) / (0! * 4!) * 0.75^0 * 0.25^4 = 1 * 1 * 0.0039 = 0.0039
Pr(X = 1) = 4C1 * 0.75^1 * 0.25^(4-1) = 4 * 0.75 * 0.0156 = 0.0469
Pr(X = 2) = 4C2 * 0.75^2 * 0.25^(4-2) = 6 * 0.5625 * 0.0625 = 0.2109
Pr(X = 3) = 4C3 * 0.75^3 * 0.25^(4-3) = 4 * 0.4219 * 0.25 = 0.4219
Pr(X = 4) = 4C4 * 0.75^4 * 0.25^(4-4) = 1 * 0.3164 * 1 = 0.3164
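The binomial formula translates directly into a short sketch using `math.comb` for the choose function. The function name `binom_pmf` is made up for this example.

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p): nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# "Four patients" example: n = 4 trials, p = 0.75 success probability
probs = [binom_pmf(x, 4, 0.75) for x in range(5)]
for x, pr in enumerate(probs):
    print(x, round(pr, 4))
```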

Answer 84

-Recall the area under the curve (AUC) concept: AUC = probability

Answer 85

-Recall the cumulative probability concept -Cumulative probability = the probability of that value or less -Pr(X ≤ x) -Corresponds to the left tail of the pmf

Answer 86

-Cumulative probability function lists cumulative probabilities for all possible outcomes -Ex: ~The cumulative probability function for X ~ b(4, 0.75)
Pr(X ≤ 0) = 0.0039
Pr(X ≤ 1) = 0.0508
Pr(X ≤ 2) = 0.2617
Pr(X ≤ 3) = 0.6836
Pr(X ≤ 4) = 1.0000
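Since each cumulative probability is the probability of that value or less, the whole list is a running total of the pmf. A minimal sketch, assuming the b(4, 0.75) example:

```python
from itertools import accumulate
from math import comb

n, p = 4, 0.75
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Running total of the pmf: entry x is the probability of x successes or fewer.
cdf = list(accumulate(pmf))
for x, c in enumerate(cdf):
    print(x, round(c, 4))
```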

Answer 87

-The expected value (mean) mu of a binomial pmf is its "balance point" -The variance sigma^2 is its spread -Shortcut formulas ~mu = np ~sigma^2 = npq

Answer 88

-For the "Four patients" pmf of X ~ b(4, 0.75) ~mu = np = 4(0.75) = 3 ~sigma^2 = npq = 4(0.75)(0.25) = 0.75

Answer 89

-Suppose we observe 2 successes in the "Four patients" example -Note mu = 3, suggesting we should see 3 successes on average -Does the observation of 2 successes cast doubt on p = 0.75? -No, because Pr(X ≤ 2) = 0.2617 is not too unusual

Answer 90

-Normal random variables are the most common type of continuous random variable -More importantly, they describe the behavior of means

Answer 91

-Recall that continuous random variables are described with smooth probability density functions (pdfs) - see chapter 5 -Normal pdfs are recognized by their familiar bell shape

Answer 92

-Histogram with overlying Normal Curve ~The overlying curve represents its Normal pdf model

Answer 93

-The darker bars of the histogram in Figure 7.2 correspond to ages less than or equal to 9 (~40% of observations) -The darker area under the curve (see Figure 7.3) also corresponds to ages less than or equal to 9 (~40% of the total area)

Answer 94

-Proportion less than 9 shaded darker color

Answer 95

-Proportion less than 9 (area under the curve) ~This shaded area is the probability associated with the range 0-9 years old -Normal pdf: f(x) = (1 / (sigma * sq root (2 pi))) * e^((-1/2) * ((x - mu) / sigma)^2)

Answer 96

-Normal pdfs are a family of distributions -Family members are identified by parameters mu (mean) and sigma (standard deviation) -mu controls location (see Figure 7.4) -sigma controls spread (see Figure 7.5)

Answer 97

-Points of inflection (where the slope of the curve begins to level) occur one sigma below and above mu

Answer 98

-Normal distribution is often written as N(mu, sigma^2) to indicate that the density curve depends upon the parameters mu and sigma^2, which are the mean and variance of the random variable ~mu corresponds to the middle of the curve ~sigma^2 determines the spread of the curve -The standard Normal Distribution is a normal distribution with mu = 0, sigma = 1

Answer 99

-A standard normal random variable is generally denoted as Z ~The area between a and b under the standard normal density curve provides the probability that Z will assume a value over the interval (a,b): P(a < Z < b)

Answer 100

-Let Z be a standard normal random variable ~Find the following probabilities using Table B
a) P(Z < 1.96) = 0.9750
b) P(-2.00 < Z < 2.00) = P(Z < 2.00) - P(Z < -2.00) = 0.9772 - 0.0228 = 0.9544
c) P(Z > -1.28) = 1 - P(Z < -1.28) = 1 - 0.1003 = 0.8997
d) P(-5.13 < Z < 2.00) = P(Z < 2.00) - P(Z < -5.13) = 0.9772 - 0 = 0.9772
e) P(Z = 1.71) = 0
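For readers without Table B at hand, the same cumulative probabilities can be sketched with `statistics.NormalDist` from the Python standard library. Software values may differ from the table in the fourth decimal because table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)          # standard Normal variable

prob_a = Z.cdf(1.96)                   # P(Z < 1.96)
prob_b = Z.cdf(2.00) - Z.cdf(-2.00)    # P(-2.00 < Z < 2.00)
prob_c = 1 - Z.cdf(-1.28)              # P(Z > -1.28), via the complement
prob_d = Z.cdf(2.00) - Z.cdf(-5.13)    # the lower tail is essentially 0
prob_e = 0.0                           # a single point has zero area under a pdf

print(round(prob_a, 4), round(prob_b, 4), round(prob_c, 4), round(prob_d, 4), prob_e)
```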

Answer 101

-To determine a Normal Probability ~State the problem ~Standardize the value (z score) ~Sketch and shade the curve ~Use Table B to determine the probability

Answer 102

-Standard Normal Variable = a Normal random variable with mu = 0 and sigma = 1 -Called "z variables" -Notation Z ~N(0,1) -Use Table B to look up cumulative probabilities

Answer 103

-Portion of Table B highlighting P (Z < 1.96) = 0.9750

Answer 104

-We want to determine the percentage of human gestations that are less than 40 weeks in length -We know that uncomplicated human pregnancy from conception to birth is approximately Normally distributed with mu = 39 weeks and sigma = 2 weeks *Note: clinicians measure gestation from the last menstrual period to birth, which adds 2 weeks to mu -Let X represent human gestation in weeks: X ~ N(39, 2) -Statement of the problem: Pr(X < 40) = ?

Answer 105

-To standardize, subtract mu and divide by sigma: Z = (x - mu) / sigma -The z-score tells one the number of sigma-units the value falls above or below mu -Ex: ~The value 40 from X ~ N(39, 2) has Z = (40 - 39) / 2 = 0.5, so Pr(X < 40) = Pr(Z < 0.5) = 0.6915
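The standardize-then-look-up routine for the gestation example can be sketched as below. The second argument of N(39, 2) is treated as the standard deviation, as the card does, and `NormalDist.cdf` stands in for Table B.

```python
from statistics import NormalDist

mu, sigma = 39, 2          # gestation in weeks: mean and standard deviation
x = 40

z = (x - mu) / sigma                       # number of sigma-units above mu
prob_via_z = NormalDist().cdf(z)           # Pr(Z < 0.5) on the standard Normal
prob_direct = NormalDist(mu, sigma).cdf(x) # same probability without standardizing

print(z, round(prob_via_z, 4), round(prob_direct, 4))
```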

Answer 106

-Sketch and label axes -Use Table B to look up Pr(Z < 0.5) = 0.6915

Answer 107

-Let a represent the lower boundary and b represent the upper boundary of a range Pr(a < Z < b) = Pr (Z < b) - Pr (Z < a)

Answer 108

-Use Table B to look up the z-percentile value ~Ex: *The z-score for the probability in question -Look inside the table for the entry closest to the associated cumulative probability -Then trace the z-score to the row and column labels

Answer 109

-Suppose one wanted the 97.5th percentile z-score ~Look inside the table for 0.975 *Then trace the z-score to the margins -Notation ~Let Zp represent the z-score with cumulative probability p ~EX: Z.975 = 1.96
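Reading the table "inside out" corresponds to the inverse of the cumulative probability function. A minimal sketch with `NormalDist.inv_cdf` (the name `z_975` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()                 # standard Normal: mu = 0, sigma = 1

z_975 = Z.inv_cdf(0.975)         # z-score with cumulative probability 0.975
print(round(z_975, 2))

# Round trip: the cumulative probability at that z-score is 0.975 again.
print(round(Z.cdf(z_975), 4))
```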

Answer 110

-Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty ~We are curious about parameters in the population ~We calculate statistics in the sample

Answer 111

-It is essential to draw the distinction between parameters and statistics
                      Parameters      Statistics
Source                Population      Sample
Calculated?           No              Yes
Constant?             Yes             No
Notation (examples)   mu, sigma, p    x-bar, s, p̂

Answer 112

-How precisely does a given sample mean reflect the underlying population mean?

Answer 113

-Age ~Population 65 students in our CHS 280 class -Which sample mean reflects the underlying population mean more precisely? -If the sample size is 3 ~Sampling distribution of the sample mean N= 65 mu = age X-bar1 = (18 + 18 + 19) / 3 = X-bar2 = (19 + 20 + 21) / 3 = -If the sample size is 50 ~Sampling distribution of the sample mean N = 65 X-bar1 = (...+...+...+...) / 50 =

Answer 114

-Population (individual observations): standard deviation = sigma -Sampling distribution of x-bar: standard deviation = sigma / (sq root n)

Answer 115

-Standard error of the mean: sigma_x-bar = SE_x-bar = sigma / (sq root n) *The square root law says the SE of the mean is inversely proportional to the square root of the sample size

Answer 116

-For n = 1 -> SE_x-bar = sigma / (sq root n) = 15 / sq root 1 = 15 -For n = 4 -> SE_x-bar = sigma / (sq root n) = 15 / sq root 4 = 7.5 -For n = 16 -> SE_x-bar = sigma / (sq root n) = 15 / sq root 16 = 3.75 *Quadrupling the sample size cuts the SE in half (square root law)
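The square root law is easy to verify numerically. The sketch below assumes sigma = 15, as in the card, and uses a hypothetical helper named `standard_error`.

```python
import math

def standard_error(sigma, n):
    """SE of the mean: sigma divided by the square root of the sample size."""
    return sigma / math.sqrt(n)

sigma = 15
se_by_n = {n: standard_error(sigma, n) for n in (1, 4, 16)}
print(se_by_n)

# Quadrupling n halves the SE.
print(se_by_n[4] / se_by_n[1], se_by_n[16] / se_by_n[4])
```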

Answer 117

-Sampling distribution of the mean based on n = 10 compared with the distribution of population values, Wechsler Adult Intelligence Scale scores

Answer 118

-Sampling distribution of x-bar tends toward Normality even when the population distribution is not Normal ~This effect is strong in large samples

Answer 119

-As a sample size gets larger and larger, the sample mean tends to get closer and closer to the mu

Answer 120

-Wechsler Adult Intelligence Scale (WAIS) scores vary according to a Normal distribution with mu = 100 and sigma = 15 a) What can we say about the sampling distribution of a mean based on an SRS of 10 such scores? ~mu = 100, sigma = 15 ~SE_x-bar = sigma / (sq root n) = 15 / (sq root 10) = 4.7434 ~X-bar ~ N(100, 4.74) b) What is the probability of getting an x-bar less than 90? ~Standardize: Z = (X-bar - mu) / SE_x-bar = (90 - 100) / 4.74 = -2.11 ~Pr(X-bar < 90) = Pr(Z < -2.11) = 0.0174
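The two parts of the WAIS problem can be sketched together. The software answer uses the unrounded z-score, so it can differ from the Table B value in the fourth decimal.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 10

se = sigma / math.sqrt(n)            # standard error of the mean
z = (90 - mu) / se                   # standardize the sample mean of 90
prob = NormalDist().cdf(z)           # Pr(X-bar < 90)

print(round(se, 4), round(z, 2), round(prob, 4))
```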

Answer 121

-Binomial Random Variable ~Random number of successes (X) in n independent "success/failure" trials ~Probability of success for each trial is p -Notation X ~ b(n,p) ~When n is large (npq ≥ 5), we can use the Normal approximation to the binomial

Answer 122

-mu = np and sigma = sq root (npq) -When the Normal approximation applies, X ~ N(np, sq root (npq))

Answer 123

-mu = p and sigma = sq root ((pq) / n) -p̂ ~ N(p, sq root ((pq) / n))

Answer 124

Recent statistics claim the prevalence of maternal smoking is quite low, at only 5% ~Suppose another research group sampled 107 pregnant mothers in their third trimester n = 107 p = 0.05 q = 0.95

Answer 125

n = 107, p = 0.05, q = 0.95 A. Can we assume an approximation to the Normal distribution for this case? ~npq = 107 * 0.05 * 0.95 = 5.0825, which is ≥ 5, so the Normal approximation applies

Answer 126

n = 107, p = 0.05, q = 0.95 B. Calculate the probability of observing at least 12 smokers among the 107 mothers using the Normal approximation ~mu = np = 107 * 0.05 = 5.35 ~sigma = sq root (npq) = sq root (107 * 0.05 * 0.95) = 2.2544 ~X ~ N(5.35, 2.2544) ~Standardize: Z = (12 - 5.35) / 2.2544 = 2.95 ~Pr(X ≥ 12) = Pr(Z ≥ 2.95) = 1 - Pr(Z < 2.95) = 1 - 0.9984 = 0.0016
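The Normal-approximation steps for the maternal smoking example can be sketched as follows. Like the card, this sketch standardizes the value 12 directly and applies no continuity correction, which is an assumption about the intended method.

```python
import math
from statistics import NormalDist

n, p = 107, 0.05
q = 1 - p

npq = n * p * q                      # rule-of-thumb check: should be at least 5
mu = n * p                           # mean of the approximating Normal
sigma = math.sqrt(npq)               # standard deviation of the approximating Normal

z = (12 - mu) / sigma                # standardize the observed count
upper_tail = 1 - NormalDist().cdf(z) # probability of 12 or more smokers

print(round(npq, 4), round(mu, 2), round(sigma, 4), round(z, 2), round(upper_tail, 4))
```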

(150 cards)