Probability Basics Flashcards

Question

What is the chi-square test?

Answer 1

The chi-square test of independence gives a p-value: the probability of seeing differences at least as large as those observed if the two variables were truly independent. By convention, p < 0.05 is taken as evidence that the variables are dependent; p > 0.05 means there is not enough evidence to reject independence.
Important points before we get started:
• This test only works for discrete categorical data (data in categories), such as Gender {Men, Women} or Color {Red, Yellow, Green, Blue}, but not numerical data such as height or weight.
• The numbers must be large enough. Each entry must be 5 or more. In our example we have values such as 209, 282, etc., so we are good to go.
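
As a rough illustration of how the p-value comes out of a table of counts, here is a minimal sketch for a 2x2 table. The function name `chi_square_2x2` and the counts are hypothetical, no continuity correction is applied, and the closed-form p-value used here is only valid for 1 degree of freedom (a 2x2 table).

```python
import math

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell under independence.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

counts = [[30, 20], [20, 30]]  # hypothetical counts, every entry >= 5
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

For larger tables the degrees of freedom change, so a statistics library would be the safer choice there.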

Answer 2

Calculate the slope m:
m = (N * Σ(xy) − Σx * Σy) / (N * Σ(x^2) − (Σx)^2)
(N is the number of points.)
Calculate the intercept b:
b = (Σy − m * Σx) / N
Assemble the equation of the line: y = mx + b
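
The two formulas translate almost directly into code. This is a minimal sketch; the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(points):
    """Least-squares slope m and intercept b for a list of (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    # m = (N * Σ(xy) − Σx * Σy) / (N * Σ(x^2) − (Σx)^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = (Σy − m * Σx) / N
    b = (sum_y - m * sum_x) / n
    return m, b

points = [(1, 3), (2, 5), (3, 7), (4, 9)]  # hypothetical points on y = 2x + 1
m, b = fit_line(points)
print(f"y = {m}x + {b}")
```

The denominator is zero when every point has the same x value, so a vertical set of points cannot be fitted this way.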

Answer 3

Least squares works by making the total of the squared errors as small as possible (that is why it is called "least squares"). The errors are the vertical distances between each data point and the line. When we square each of those errors and add them all up, the fitted straight line is the one for which that total is as small as possible. You can imagine (but not accurately) each data point connected to a straight bar by springs. Be careful! Least squares is sensitive to outliers: a strange value will pull the line towards it.

Answer 4

It is also possible to group the values of collected data into classes (intervals). Grouping is very useful when the scores have many different values, or when we want to go from continuous to discrete data.

Answer 5

Now calculate an approximate group size, by dividing the range by how many groups you would like. Then round that group size up to some simple value (like 2 instead of 1.83 or 5 instead of 4.26).
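
One possible way to automate the rounding step is sketched below. The text does not define "simple value", so this sketch assumes it means 1, 2 or 5 times a power of ten; the function name `nice_group_size` is hypothetical.

```python
import math

def nice_group_size(data_range, n_groups):
    """Round range / n_groups up to 1, 2 or 5 times a power of ten (an assumed rule)."""
    raw = data_range / n_groups
    magnitude = 10 ** math.floor(math.log10(raw))
    for step in (1, 2, 5, 10):
        candidate = step * magnitude
        if candidate >= raw:
            return candidate
    return 10 * magnitude  # fallback, not expected to be reached

print(nice_group_size(11, 6))    # raw size is about 1.83
print(nice_group_size(21.3, 5))  # raw size is about 4.26
```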

Answer 6

We can estimate the Mean by using the midpoints.
E.g.: The groups (51-55, 56-60, etc.), also called class intervals, are of width 5. The midpoints are in the middle of each class: 53, 58, 63 and 68.
Midpoint of a class = (lowest value + highest value) / 2
Estimated Mean = Sum of (Midpoint × Frequency) / Sum of Frequency
Multiply each midpoint by the number of data points in its class and add the products up. Then divide that sum by N, the total number of data points.
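
A minimal sketch of the midpoint method follows. The class limits match the groups mentioned above, but the frequencies are assumed for illustration, and `estimated_mean` is a hypothetical helper name.

```python
def estimated_mean(classes):
    """classes: list of (lowest value, highest value, frequency) tuples."""
    total = sum(freq for _, _, freq in classes)
    # Midpoint × Frequency, summed over all classes.
    weighted = sum((low + high) / 2 * freq for low, high, freq in classes)
    return weighted / total

# Class limits from the text; the frequencies are assumed.
classes = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(round(estimated_mean(classes), 2))  # 1288 / 21, about 61.33
```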

Answer 7

The median is the value at position n/2.
Estimated Median = L + [(n/2 − B) / G] × w
where:
L is the lower class boundary of the group containing the median (this is tricky: it is the point where values start rounding up into the lowest number of that class interval, e.g. 60.5 for the class 61-65)
n is the total number of values
B is the cumulative frequency of the groups before the median group
G is the frequency of the median group
w is the group width (class interval)
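
The formula can be sketched as follows, assuming values are recorded to the nearest whole number so that class boundaries sit half a unit below the lowest value of each class (the `gap` parameter). The helper name and the frequencies are assumptions for illustration.

```python
def estimated_median(classes, gap=1):
    """classes: ordered list of (lowest value, highest value, frequency).

    gap is the spacing between the top of one class and the bottom of
    the next (1 for whole-number data), an assumption of this sketch.
    """
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    before = 0  # B: cumulative frequency of the groups before the median group
    for low, high, freq in classes:
        if before + freq >= half:
            lower_boundary = low - gap / 2  # L
            width = high - low + gap        # w
            return lower_boundary + (half - before) / freq * width
        before += freq
    raise ValueError("classes must contain at least one value")

# Class limits as in the grouped example; the frequencies are assumed.
classes = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(estimated_median(classes))  # 60.5 + (10.5 - 9) / 8 * 5 = 61.4375
```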

Answer 8

We can easily find the modal group (the group with the highest frequency).
Estimated Mode = L + (fm − fm-1) / ((fm − fm-1) + (fm − fm+1)) × w
or, in shorter notation: L + (B − A) / ((B − A) + (B − C)) × w
where:
L is the lower class boundary of the modal group (this is tricky: it is the point where values start rounding up into the lowest number of that class interval)
A (or fm-1) is the frequency of the group before the modal group
B (or fm) is the frequency of the modal group
C (or fm+1) is the frequency of the group after the modal group
w is the group width
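
Here is a minimal sketch of the estimated mode. It assumes whole-number data (class boundaries half a unit below the lowest value) and treats a missing neighbouring group as having frequency 0; both are assumptions of the sketch, as are the helper name and the frequencies.

```python
def estimated_mode(classes, gap=1):
    """classes: ordered list of (lowest value, highest value, frequency)."""
    freqs = [freq for _, _, freq in classes]
    i = freqs.index(max(freqs))  # first group with the highest frequency
    low, high, f_modal = classes[i]
    f_before = freqs[i - 1] if i > 0 else 0             # A, assumed 0 if absent
    f_after = freqs[i + 1] if i < len(freqs) - 1 else 0  # C, assumed 0 if absent

    lower_boundary = low - gap / 2  # L
    width = high - low + gap        # w
    return lower_boundary + (f_modal - f_before) / (
        (f_modal - f_before) + (f_modal - f_after)
    ) * width

# Class limits as in the grouped example; the frequencies are assumed.
classes = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(estimated_mode(classes))  # 60.5 + 1 / (1 + 4) * 5 = 61.5
```

If the modal group and both neighbours share the same frequency, the denominator is zero and the formula gives no estimate.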

Answer 9

A Random Variable is a value we give to observed events that we are uncertain of beforehand.
1. A Random Variable is a set of possible values from a random experiment.
2. The set of possible values is called the Sample Space.
3. A Random Variable is given a capital letter, such as X or Z.
4. Random Variables can be discrete or continuous.
E.g.: We have an experiment (such as tossing a coin). We give values to each event. The set of values is a Random Variable.

Answer 10

How likely a random variable is to take a given value.
Probability of an event happening = Number of ways it can happen / Total number of outcomes (when all outcomes are equally likely)
```
P(X = x) = (number of outcomes where X = x) / (total number of outcomes)
P(X = value) = probability of that value
```
(Notice the different uses of X and x: X is the Random Variable "The sum of the scores on the two dice". x is a value that X can take.)
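
For the two-dice random variable mentioned in the note, the counting can be sketched like this, assuming two fair six-sided dice. The names `outcomes`, `ways` and `prob` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every equally likely outcome of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Number of ways each sum x can happen.
ways = Counter(a + b for a, b in outcomes)

def prob(x):
    """P(X = x), where X is the sum of the scores on the two dice."""
    return Fraction(ways[x], len(outcomes))

print(prob(7))  # 6 of the 36 outcomes sum to 7, so 1/6
```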

Answer 11

Experiment: a repeatable procedure with a set of possible results.

Answer 12

Outcome: A possible result of an experiment.

Answer 13

Sample Space: all the possible outcomes of an experiment. The Sample Space is made up of Sample Points.
Sample Point: just one of the possible outcomes.

Answer 14

Event: one or more outcomes of an experiment

Answer 15

Random Variables can be either Discrete or Continuous: Continuous Data can take any value within a range (such as a person's height)

Answer 16

Random Variables can be either Discrete or Continuous: Discrete Data can only take certain values (such as 1,2,3,4,5)

Answer 17

When we know the probability p of every value x, we can calculate the Expected Value (Mean) of X:
μ = Σxp
Note: Σ is Sigma Notation, and means to sum up.
To calculate the Expected Value:
- multiply each value by its probability
- sum them up
If all n values are equally likely (p = 1/n), then Σxp = (1/n) Σx, which is the ordinary average.
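
A minimal sketch of μ = Σxp, using a fair six-sided die as an assumed example distribution. The function name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(dist):
    """dist maps each value x to its probability p; returns Σxp."""
    return sum(x * p for x, p in dist.items())

# Assumed example: a fair die, where every value has p = 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(fair_die))  # 7/2, the same as (1+2+3+4+5+6) / 6
```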

Answer 18

The Variance is:
Var(X) = Σx^2p − μ^2
To calculate the Variance:
- square each value and multiply by its probability
- sum them up to get Σx^2p
- then subtract the square of the Expected Value, μ^2

Answer 19

Standard Deviation is the square root of the Variance:
σ = √Var(X), where Var(X) = Σx^2p − μ^2
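
Both quantities can be sketched in a few lines. The fair die is an assumed example, and `variance` and `std_dev` are hypothetical helper names.

```python
import math
from fractions import Fraction

def variance(dist):
    """dist maps each value x to its probability p; returns Σx^2p − μ^2."""
    mu = sum(x * p for x, p in dist.items())
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

def std_dev(dist):
    """σ = √Var(X)."""
    return math.sqrt(variance(dist))

# Assumed example: a fair six-sided die.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance(fair_die))           # 91/6 − (7/2)^2 = 35/12
print(round(std_dev(fair_die), 4))  # about 1.7078
```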

Answer 20

But what if the coins are biased (land more on one side than the other), or the choices are not 50/50? Then the probability of each outcome is:
p^k (1-p)^(n-k)
where:
p is the probability of each choice we want
k is the number of choices we want
n is the total number of choices
Example: p = 0.7 (chance of chicken), k = 2 (chicken choices), n = 3 (total choices). So we get:
p^k (1-p)^(n-k) = 0.7^2 (1-0.7)^(3-2) = 0.49 × 0.3 = 0.147

Answer 21

Now we know how to calculate how many outcomes there are: n!/(k!(n-k)!)
And the probability of each: p^k (1-p)^(n-k)
When multiplied together we get the probability of k out of n:
P(k out of n) = n!/(k!(n-k)!) * p^k (1-p)^(n-k)
Important notes:
- The trials are independent.
- There are only two possible outcomes at each trial.
- The probability of "success" at each trial is constant.
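
The combined formula can be sketched with the standard library, where `math.comb(n, k)` computes n!/(k!(n-k)!). The function name `binomial_probability` is hypothetical; the numbers reuse the chicken example (p = 0.7, k = 2, n = 3).

```python
import math

def binomial_probability(k, n, p):
    """P(k out of n) = n!/(k!(n-k)!) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# 3 ways to pick which 2 of the 3 choices are chicken, each with probability 0.147.
print(round(binomial_probability(2, 3, 0.7), 3))  # 3 * 0.147 = 0.441
```

Summing the result over k = 0 to n gives 1, which is a quick sanity check for any chosen p.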

(45 cards)