Final Of Everything Flashcards

1
Q

Data Matrix

A

A convenient way to store data (e.g. a spreadsheet or table). Each row is a unique case (observational unit). Each column corresponds to a variable.

2
Q

The two types of variables

A

Numerical or Categorical

3
Q

Numerical Variables

A

Can be discrete or continuous

4
Q

Categorical Variables

A

Can be ordinal (ordered) or nominal (unordered)

5
Q

What type of variable is “Number of Siblings”?

A

Numerical (discrete)

6
Q

What type of variable is “Student Height”?

A

Numerical (continuous)

7
Q

What type of variable is “Previous Stats Courses Taken”?

A

Categorical (nominal)

8
Q

Explanatory variables might affect

A

Response variable

9
Q

Two types of data collection

A

Observational Studies and Experiments

10
Q

Researchers collect data passively; they merely observe

A

Observational studies

11
Q

Researchers actively control the data collection, trying to establish causation

A

Experiments

12
Q

Sampling principles and strategies

A

1st step: Identify the topics and questions to be investigated
2nd: Lay out the research questions clearly; this helps identify which subjects or cases to study and which variables are important
3rd: Consider how the data are collected

13
Q

Example: suppose we want to estimate household size, where a household is defined as people living together in the same dwelling and sharing living accommodation. If we selected students at random at an elementary school and asked them what their family size is, will this be a good measure of household size?

A
  • Average will be biased
  • Only measuring households with children, not single people or people without children.
  • Would likely estimate a higher number than the true number.
14
Q

Relationship between Sample and Population

A

Sample is a subset of population:
Population- people
Sample- a group of selected people

15
Q

Three sampling methods

A

1) simple random sample
2) stratified sample
3) cluster sample

16
Q

Simple random sample

A

Randomly selected from population

17
Q

What type of sample is cars passing through intersections in Kelowna

A

Simple random sample

18
Q

Stratified sample

A

Cases grouped into strata, then simple random sampling

19
Q

Cluster sample

A

Divide the population into clusters, randomly select some clusters, and sample all cases within the selected clusters

20
Q

Multistage sampling

A

Clusters are sampled randomly, then a random sample is taken within each selected cluster

21
Q

Scatterplot

A

A way to provide a case-by-case view of the data. Can visualize the relationship between two numerical variables.

22
Q

Dot plot

A

Visualize one numerical variable

23
Q

Sample mean (sample average formula)

A

x̄ = (x1 + x2 + x3 +… +xn)/n

24
Q

What is the unit of sample mean

A

The same units as the data in the sample

25
Q

Symbol for population mean

A

μ

26
Q

Histograms

A

Provides a view of the data density (ie the data distribution)

27
Q

Unimodal histogram distribution

A

A single prominent peak

28
Q

Bimodal/ multimodal histogram distribution

A

Several prominent peaks

29
Q

Uniform histogram distribution

A

No apparent peaks

30
Q

Types of skewness

A

Right skewed (tail on right), left skewed (tail on left) or symmetric

31
Q

Deviation

A

Distance from the mean

32
Q

Sample variance

A

s^2 = ((x1-x̄)^2 + (x2-x̄)^2 + … + (xn-x̄)^2)/(n-1)

33
Q

What are the units of sample variance?

A

The square of the units of the data

34
Q

Sample standard deviation formula

A

s = sqrt(s^2)
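
A minimal sketch of the sample variance and sample standard deviation formulas, assuming a small made-up data set (`data` and the helper name `sample_variance` are illustrative, not from the course). The result is cross-checked against Python's `statistics` module.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 6.0]  # illustrative values only
s2 = sample_variance(data)   # squared units of the data
s = math.sqrt(s2)            # same units as the data

# The standard library uses the same n - 1 divisor.
agrees_with_stdlib = (
    math.isclose(s2, statistics.variance(data))
    and math.isclose(s, statistics.stdev(data))
)
```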

35
Q

Population variance formula

A

σ^2 = ((x1-μ)^2 + … + (xN-μ)^2)/N, where μ is the population mean and N is the population size

36
Q

Population standard deviation

A

σ = sqrt (σ^2)

37
Q

Main components of a box plot

A
  • Median Q2
  • First quartile Q1 (median of the lower half)
  • Third quartile Q3 (median of the upper half)
  • Whiskers, which reach at most to Q3 + 1.5 IQR and Q1 - 1.5 IQR
  • IQR is Q3 - Q1
38
Q

IQR formula

A

Q3-Q1

39
Q

Steps to draw a box plot

A

1) Draw a thick line for the median (Q2)
2) Draw rectangle with bounds Q1 and Q3
3) Draw a dotted line for Q1-1.5IQR and Q3+1.5IQR
4) Label outliers and draw T-shaped upper/lower whiskers (they only go as far as the highest or lowest data points inside the dotted limits)
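
The steps above can be sketched in code. This is a rough illustration, assuming the "median of each half" quartile rule from the box plot cards, with the overall median left out of both halves when n is odd (an assumption; other quartile conventions exist). The function name `box_plot_summary` and the data are made up.

```python
import statistics

def box_plot_summary(xs):
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n, skip the middle value when forming the halves.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    q1 = statistics.median(lower)
    q2 = statistics.median(xs)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_limit <= x <= high_limit]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "limits": (low_limit, high_limit),
        # Whiskers stop at the most extreme data points inside the limits.
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in xs if x < low_limit or x > high_limit],
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

Here the upper limit is 15, so the upper whisker stops at 8 and the value 30 is labelled as an outlier.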

40
Q

Robust Statistics

A

Median and IQR are more robust than mean and standard deviation (less affected by outlier behavior)

41
Q

Common practices

A

-Symmetric distributions-> mean and SD
-Skewed distributions -> median and IQR

42
Q

What type of plot would be most useful for visualizing the data density

A

Histogram

43
Q

Suppose a data set only has two values. What can you say about the relationship between mean and median?

A

Mean= median

44
Q

Consider a population of [1,2,3,4,10]. What are the mean and variance (VAR)?

A

Mean =4 Var=10
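
A quick check of this answer, using Python's `statistics` module. Because the five values are treated as the whole population, the divisor is 5 (`pvariance`), not n - 1.

```python
import statistics

population = [1, 2, 3, 4, 10]

mu = statistics.mean(population)             # (1 + 2 + 3 + 4 + 10) / 5
var = statistics.pvariance(population)       # (9 + 4 + 1 + 0 + 36) / 5
sample_var = statistics.variance(population) # same squares divided by 4
```

If the same values were a sample rather than a population, the variance would be 12.5 instead of 10.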

46
Q

A company records the commute distances of all 42 of its employees. By mistake, the smallest commute was measured at 1 mile instead of 10. Compare the recorded median to the actual median.

A

The recorded median will be the same as the actual median

47
Q

Suppose we are interested in estimating the malaria rate in a dense tropical portion of a southeastern country. We learn there are 30 villages, each more or less similar to the next. Our goal is to test 150 individuals. What sampling method should be used?

A

Cluster sampling

48
Q

What is the probability of rolling a 1 with a fair die?

A

1/6

49
Q

Probability Definition

A

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times

50
Q

Mutually exclusive or disjoint

A

Have no outcomes in common

51
Q

Outcome

A

Random result from an experiment

52
Q

Event

A

A set of outcomes that has a probability assigned to it

53
Q

Sample space

A

All possible outcomes

54
Q

Complement

A

All outcomes in which the event does not occur; its probability is 1 - P(event)

55
Q

There are 18 balls in a box. Five are white, thirteen are black. Choose two balls at random, one after another. Find the probability that both chosen balls are white.

A

20/306
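
One way to confirm this answer is to multiply the two conditional probabilities and compare with a brute-force count of every ordered pair of draws. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 5 + ["B"] * 13

# P(first white) * P(second white | first white)
p_formula = Fraction(5, 18) * Fraction(4, 17)

# Enumerate all ordered draws of two distinct balls (no replacement).
draws = list(permutations(range(len(balls)), 2))
both_white = sum(1 for i, j in draws if balls[i] == "W" and balls[j] == "W")
p_enumerated = Fraction(both_white, len(draws))
```

Both routes give 20 favourable draws out of 306, which reduces to 10/153.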

56
Q

A fair coin is flipped twice. What is the probability that at least one flip is tails?

A

3/4

57
Q

Twenty students including Miriam and Rachel are to be placed in four classes of equal size at random. What is the probability they end up in different classes?

A

15/19
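
A short sketch of the reasoning behind this answer, assuming four classes of five: fix Miriam's seat, then Rachel is equally likely to take any of the other 19 seats, 4 of which are in Miriam's class.

```python
from fractions import Fraction

students, classes = 20, 4
class_size = students // classes  # 5 seats per class

# Seats left in Miriam's class, out of all seats left for Rachel.
p_same = Fraction(class_size - 1, students - 1)
p_different = 1 - p_same
```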

58
Q

If two events are independent then P(A|B) = P(B|A)

A

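Not in general. Independence gives P(A|B) = P(A) and P(B|A) = P(B), so the two are equal only when P(A) = P(B).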

59
Q

Random Variable

A

An assignment of numbers to outcomes in some sample space

60
Q

Dataset

A

Mean and variance

61
Q

Random variable

A

Expected value (similar to mean) and variance.

62
Q

Expected value equation

A

E(x) = x1 P(x=x1) + x2 P(x=x2) + … + xn P(x=xn)

63
Q

Expected value symbol

A

E(x) or μ

64
Q

Variance of Random Variables (RV)

A

Var(x) = (x1-μ)^2 P(x=x1) + … + (xn-μ)^2 P(x=xn)
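
As an illustration of the expected value and variance formulas for a random variable, here is a small sketch using a fair six-sided die (the die and the helper names are assumptions chosen for the example). Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(x) = sum of x_i * P(x = x_i)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(x) = sum of (x_i - mu)^2 * P(x = x_i)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expected_value(die)   # 7/2
sigma2 = variance(die)     # 35/12
```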

65
Q

Variance of X symbols

A

Var(x) or σ^2

66
Q

Standard deviation notation

A

SD(x) or σ

67
Q

E(ax)=

A

aE(x)

68
Q

E(ax+b)

A

aE(x) +b

69
Q

SD(ax) =

A

|a| SD(x)

70
Q

SD(ax+b) =

A

|a| SD(x)

71
Q

Var(ax) =

A

a^2 Var(x)
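
The rules for E(ax+b), SD(ax+b) and Var(ax) can be checked numerically. The sketch below assumes a fair die as the random variable and arbitrary constants a = -2, b = 3; none of these come from the cards.

```python
from fractions import Fraction
from math import isclose, sqrt

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}  # illustrative x
a, b = -2, 3                                           # illustrative constants

# Distribution of a*x + b: same probabilities, transformed values.
transformed = {a * x + b: p for x, p in die.items()}

mean_rule_holds = expectation(transformed) == a * expectation(die) + b
var_rule_holds = variance(transformed) == a ** 2 * variance(die)
sd_rule_holds = isclose(sqrt(variance(transformed)), abs(a) * sqrt(variance(die)))
```

Note that the shift b moves the mean but has no effect on the variance or the standard deviation.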

72
Q

Dependent Events probability notation

A

P(A ∩ B) = P(B) P(A|B)

73
Q

Independent events probability notation

A

P(A ∩ B) = P(B) * P(A)

74
Q

Area under the gaussian curve

A

Area = 1

75
Q

Normal distribution Parameter notation

A

N( μ, σ)
Mean- μ
Standard deviation- σ

76
Q

What is a Z score

A

A z score converts a value from N(μ, σ) to the standard normal N(0,1)
A z score describes how many standard deviations a value lies from the mean of a group of values

77
Q

Z score formula

A

z = (x- μ)/σ
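
A minimal sketch of the z score formula, with made-up numbers (x = 130, μ = 100, σ = 15). `statistics.NormalDist` from the Python standard library stands in for a z table here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_score(130, 100, 15)

# Area to the left of z under N(0, 1), as a z table would give.
area_left = NormalDist().cdf(z)
```

With these numbers z is 2, and roughly 97.7% of the distribution lies below that value.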

78
Q

Quantile

A

A quantile is a cut point that divides a probability distribution into groups of equal probability, e.g. quartiles (4 groups), percentiles (100 groups)

79
Q

Q-Q plot of symmetric distribution

A

Straight line following y=x

80
Q

Q-Q plot of a t-shaped (heavy-tailed) distribution

A

Starts lower than the line y=x then meets the line at the origin then slowly goes above the line.

81
Q

Q-Q plot of a right skew distribution

A

Concave up curve, curve points right

82
Q

Q-Q plot of left skew data

A

Concave down curve, pointing left

83
Q

Geometric distributions

A
  • goes until something happens (ie successful outcome)
  • a series of independent trials with two outcomes
84
Q

Binomial distribution:

A
  • # of successes in a set # of trials, where each trial has two outcomes: success or failure
85
Q

4 conditions of binomial distribution

A

1) trials are independent
2) # of trials, n, is fixed
3) each trial is success or failure
4) probability of success, p, is same for each trial.
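
When these four conditions hold, the probability of exactly k successes in n trials is C(n, k) p^k (1-p)^(n-k). A short sketch follows, with illustrative values n = 4 and p = 0.5; the helper name `binomial_pmf` is made up.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.5
p_two_successes = binomial_pmf(2, n, p)

# The probabilities over k = 0..n should add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```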

86
Q

Confidence Intervals

A

A confidence interval is a range of plausible values for a population parameter (e.g. the mean), built from a sample so that we are a certain percentage confident (e.g. 95%) that it contains the actual population value.

87
Q

Point estimate

A

A point estimate is a single value calculated from the sample that serves as the best guess for an unknown population parameter (e.g. the mean, or the proportion in support of a statement)

88
Q

Population proportion notation

A

p

89
Q

Sample proportion notation

A

p̂

90
Q

Central limit theorem

A

- When many sample means are taken, the distribution of these sample means looks like a normal distribution (particularly for larger sample sizes)
- The population's distribution (even when skewed) does not change this normal appearance of the sample means.

91
Q

How large is large enough when it comes to sample size?

A

Generally n ≥ 30

92
Q

Success failure condition

A

np>= 10 and n(1-p) >=10

93
Q

95% confidence interval for the mean

A

Point estimate +- 1.96 *SE

94
Q

Standard Error SE

A

SE = σ/sqrt(n)

σ- population standard deviation
n- sample size

95
Q

The 95% confidence interval means:

A

Roughly 95% of the time, the interval sample mean +- 1.96 σ/sqrt(n) will contain the population mean

96
Q

99% confidence interval

A

Point estimate +- 2.58 σ/sqrt(n)
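
The 95% and 99% intervals differ only in the multiplier. The sketch below looks the multiplier up from the standard normal distribution instead of hard-coding 1.96 or 2.58; the sample numbers (x̄ = 50, σ = 10, n = 100) and the function name are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """Point estimate +- z* times SE, with SE = sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = sigma / sqrt(n)
    return x_bar - z_star * se, x_bar + z_star * se

low95, high95 = z_interval(50, 10, 100, level=0.95)
low99, high99 = z_interval(50, 10, 100, level=0.99)
```

The 99% interval is wider than the 95% interval because a larger multiplier is needed to capture the population mean more often.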

97
Q

Consider the case for finding confidence intervals without population standard deviation

A
  1. Use sample SD instead of population SD
  2. Use t-tables instead of z table
98
Q

t formula

A

t = (x̄- μ)/(s/sqrt(n))
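
A small worked example of the t formula, with made-up numbers (x̄ = 52, μ = 50, s = 4, n = 16). The resulting value would be compared against a t table with n - 1 degrees of freedom.

```python
from math import sqrt

def t_statistic(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n)), using the sample SD s."""
    return (x_bar - mu) / (s / sqrt(n))

t = t_statistic(x_bar=52, mu=50, s=4, n=16)
degrees_of_freedom = 16 - 1
```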

99
Q

Proof by contradiction

A

If the probability is very small, we should reject the claim and accept our conjecture. Either we are observing a rare event or something is wrong with the original claim.

100
Q

Four steps of proof by contradiction

A

1) State the hypotheses:
- null hypothesis Ho : μ =
- alternative hypothesis Ha: μ …
2) Compute the z score from the sample mean
3) Find the p-value: the area to the right of the z score (for an upper-tailed alternative)
4) Make the decision:
- if the p-value is below alpha, reject the null hypothesis in favor of the alternative; otherwise fail to reject the null hypothesis
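
The four steps can be sketched for an upper-tailed test with known σ. Every number here is made up for illustration (Ho: μ = 100, Ha: μ > 100, x̄ = 103, σ = 15, n = 100, alpha = 0.05), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    # Step 2: z score of the sample mean under the null hypothesis.
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Step 3: p-value is the area to the right of z under N(0, 1).
    p_value = 1 - NormalDist().cdf(z)
    # Step 4: reject the null hypothesis when the p-value is below alpha.
    return z, p_value, p_value < alpha

z, p_value, reject_null = upper_tail_z_test(x_bar=103, mu0=100, sigma=15, n=100)
```

For a lower-tailed or two-sided alternative, the tail area in step 3 would change accordingly.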

101
Q

When do you use z tables

A

Population SD is given and you are trying to estimate population mean

102
Q

When do you use t tables

A

Population SD is not given and you are trying to estimate population mean

103
Q

When do you use chi squared tables (X^2)

A

Population SD is not given and you are trying to estimate population variance.

104
Q

Using chi squared tables

A
  • Examine the row for the distribution's degrees of freedom
  • Identify a range for the area (e.g. 0.025 to 0.05)
  • The chi squared table provides upper tail values, which is different from the z and t distribution tables
105
Q

Population variance confidence interval

A

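[ (n-1)s^2/χ2^2 , (n-1)s^2/χ1^2 ], where χ1^2 < χ2^2 are the lower and upper chi squared critical values with n-1 degrees of freedom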
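
A sketch of this interval, assuming n = 10 and s^2 = 4 (made-up values). The two critical values are passed in by hand from a chi squared table for 9 degrees of freedom at 95% confidence (2.700 and 19.023), since the Python standard library has no chi squared quantile function.

```python
def variance_interval(n, s2, chi2_lower, chi2_upper):
    """[(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower]."""
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Table values for df = 9: upper tail areas 0.975 and 0.025.
low, high = variance_interval(n=10, s2=4.0, chi2_lower=2.700, chi2_upper=19.023)
```

The larger critical value gives the lower end of the interval because it sits in the denominator.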

106
Q

What is instrumentation?

A

Term to describe the instruments used to measure physical quantities, e.g. pressure, temperature, voltage

107
Q

Active instruments

A

-Have external power
- expensive (complicated)
- resolution can be very small

108
Q

Passive instruments

A

-Do not have external power
- inexpensive (simple)
-resolution is limited

109
Q

Null type instruments

A

-No display
- null-type pressure gauges have weights added or removed to measure pressure (cumbersome); the weights are balanced until the reference mark is reached.

110
Q

Deflection type instruments

A

-Has a display
-e.g. a pressure gauge conveniently has a pointer moving against a scale

111
Q

Analog instruments

A
  • Output varies continuously. Resolution is determined by what your eye can distinguish
112
Q

Digital instruments

A

-Has discrete steps in resolution
- requires analog to digital converter (A/D)
-Expensive
- Slow, not good for fast processes

113
Q

Smart instruments

A

Has a microprocessor

114
Q

Non-smart instruments

A

Does not have a microprocessor

115
Q

Inaccuracy

A

The extent to which a reading might be wrong; it is often quoted as a percentage of the full-scale (f.s.) reading (max value) of an instrument.

116
Q

Tolerance

A

Describes the maximum deviation of a manufactured component from some specified value.

117
Q

Range or span

A

Defines the min and max values of quantity that instruments can measure

118
Q

Threshold

A

The input has to reach a certain level before the change in the instrument's output is large enough to be detectable

119
Q

Resolution

A

The lower limit on the magnitude of change in the input measured quantity that produces an observable change in the instrument output.

120
Q

Nonlinearity

A

Maximum deviation of any of the output readings marked x from the straight line.

121
Q

Linear-regression line

A

The estimate of y, in z-score units, equals R times the given x in z-score units

122
Q

Linear regression line:

A

(ŷ-ȳ)/sy = R (x-x̄)/sx

ȳ- average of the y data points

ŷ- the predicted y on the line of best fit
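
Rearranging the z-score form gives ŷ = ȳ + R (sy/sx)(x - x̄). The sketch below assumes made-up summary statistics (x̄ = 10, sx = 2, ȳ = 50, sy = 5, R = 0.8); the function name `predict` is hypothetical.

```python
def predict(x, x_bar, s_x, y_bar, s_y, r):
    """Solve (y_hat - y_bar)/s_y = r * (x - x_bar)/s_x for y_hat."""
    return y_bar + r * s_y * (x - x_bar) / s_x

stats = dict(x_bar=10, s_x=2, y_bar=50, s_y=5, r=0.8)  # illustrative only

y_at_mean = predict(10, **stats)       # the line passes through (x_bar, y_bar)
y_one_sd_above = predict(12, **stats)  # x one SD above its mean

slope = stats["r"] * stats["s_y"] / stats["s_x"]
```

An x value one standard deviation above its mean predicts a y value only R standard deviations above its mean, here 0.8 × 5 = 4 units above ȳ.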

123
Q

Sensitivity

A

-Slope
- the measure of change in instrument output that occurs when the quantity measured changes by a given amount
-scale deflection/ value of measurement producing deflection

124
Q

Zero drift

A

Bias: the zero reading of the instrument is modified by a change in ambient conditions

125
Q

Sensitivity drift

A
  • slope (i.e. sensitivity) drifts because of a change in ambient conditions
  • e.g. the modulus of elasticity of a spring changing as a function of temperature
126
Q

Sensitivity drift coefficient

A

= sens drift/ change in environment

127
Q

Zero drift coefficient

A

= Zero drift/ change in environment

128
Q

Reasons for incorrect or inaccurate measurements

A
  • Behavior will gradually diverge from the stated specifications
  • Effects of dust, dirt, fumes and chemicals in the environment
129
Q

Several factors impacting rate of divergence

A
  1. type of instrument
  2. the frequency of usage
  3. severity of the operating conditions
130
Q

Systematic errors

A

Mean is wrong (accuracy)

131
Q

Random errors

A

Standard deviation is large (precision)

132
Q

What are static characteristics

A

1) linearity
2) tolerance
3) sensitivity

133
Q

What is the strength of a linear fit model?

A

The R squared value

134
Q

How many outcomes in a Bernoulli trial.

A

There can only be 2 outcomes

135
Q

Expected number of trials until the first success, and its SD, given success probability p (geometric distribution)

A

1/p - expected value
Sqrt( (1-p)/p^2) - standard deviation
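
A quick numerical check of these two formulas, assuming p = 0.25 as an example. The expected value is also approximated by summing k · (1-p)^(k-1) · p over many values of k, which is the geometric distribution's probability of the first success landing on trial k.

```python
from math import sqrt

p = 0.25  # illustrative success probability

expected_trials = 1 / p
sd_trials = sqrt((1 - p) / p ** 2)

# Truncated series for E(number of trials); terms beyond k = 1000 are negligible.
series_mean = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 1001))
```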