Exam Revision Flashcards
Statistics
Statistics is the branch of mathematics that examines ways to process and analyse data. Statistics provides procedures to collect and transform data in ways that are useful to business decision makers. To understand anything about statistics, you first need to understand the meaning of a variable.
4 fundamental terms of statistics
Population
Sample
Parameter
Statistic
Population
A population consists of all the members of a group about which you want to draw a conclusion.
Sample
A sample is the portion of the population selected for analysis
Parameter
A parameter is a numerical measure that describes a characteristic of a population (measures used to describe a population). GREEK LETTERS REFER TO A PARAMETER
Statistic
A statistic is a numerical measure that describes a characteristic of a sample (measures calculated from sample data). ROMAN LETTERS REFER TO STATISTICS
2 types of statistics
Descriptive statistics
Inferential statistics
Descriptive statistics
Collecting, summarising and presenting data
Inferential statistics
Drawing conclusions about a population based on sample data/results (i.e. estimating a parameter based on a statistic, as in hypothesis testing).
2 types of data
Categorical (defined categories)
Numerical (quantitative)
2 types of numerical variables
Discrete (counted items)
Continuous (measured characteristics)
4 levels of Measurement and Measurement Scales from highest to lowest
Ratio data
Interval data
Ordinal data
Nominal data
Ratio data
Differences between measurements are meaningful and a true zero exists
Interval data
Differences between measurements are meaningful but no true zero exists (values can be negative)
Ordinal data
Ordered categories (rankings, order or scaling)
Nominal data
Categories (no ordering or direction)
4 measures used to describe data
Central tendency
Quartiles
Variation
Shape
4 measures of central tendency
Arithmetic mean
Median
Mode
Geometric mean
5 measures of variation
Range
Interquartile range
Variance
Standard deviation
Coefficient of variation
1 measure of shape
Skewness
Arithmetic mean
Arithmetic mean is summing up the observations and dividing by the number of observations.
Median and mode extreme values
The median is not sensitive to extreme values and the mean is sensitive to extreme values.
Sigma
Sigma (Σ) is the summation symbol: it means adding up the values.
Median
In an ordered array, the median is the middle number (50% above and 50% below). Its main advantage over the arithmetic mean is that it is not affected by extreme values.
Mode
A measure of central tendency. Value that occurs most often (the most frequent). Not affected by extreme values. Never use the mode by itself, always use in conjunction with median or mean. Unlike mean and median, there may be no unique (single) mode for a given data set. Used for either numerical or categorical (nominal) data.
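A minimal Python sketch of these three measures, using made-up numbers with one extreme value to show that the mean is sensitive to it while the median and mode are not:

```python
import statistics

# Illustrative data; 100 is an extreme value
data = [2, 3, 3, 4, 5, 100]

mean = statistics.mean(data)      # pulled upward by the extreme value
median = statistics.median(data)  # middle of the ordered array, resistant
mode = statistics.mode(data)      # most frequent value

print(mean)    # 19.5
print(median)  # 3.5
print(mode)    # 3
```

Without the extreme value the mean of the remaining five values would be 3.4, close to the median, which is the usual sign of roughly symmetric data.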
Quartiles
Quartiles split the ranked data into four segments, with an equal number of values per segment. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger). Only 25% of the observations are greater than the third quartile, Q3
Measures of variation
Measures of variation give information on the spread or variability of the data values
Interquartile range
Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values). It eliminates outlier problems because the highest- and lowest-valued observations are removed from the calculation. IQR = 3rd quartile - 1st quartile, i.e. IQR = Q3 - Q1
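A sketch of quartiles and the IQR in Python. Note that textbooks and libraries use slightly different quartile conventions; `method='inclusive'` here interpolates between sorted data points, so your course's hand-calculation rule may give slightly different values:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative, already ordered

# Q1, Q2 (median), Q3 split the ranked data into four segments
q1, q2, q3 = statistics.quantiles(data, n=4, method='inclusive')
iqr = q3 - q1  # resistant to extreme values

print(q1, q2, q3)  # 2.75 4.5 6.25
print(iqr)         # 3.5
```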
Sample variance
Measures the average squared deviation of the values around the mean, so its units are the square of the original units. The deviations are squared because some are negative and some are positive, so they would otherwise cancel out. The sample variance is the sum of squared differences from the mean divided by n - 1.
Sample standard deviation
Most commonly used measure of variation. Shows variation about the mean. Has the same units as the original data. It can be considered a measure of uncertainty.
Coefficient of variation
Measures relative variation i.e. shows variation relative to mean. Can be used to compare two or more sets of data measured in different units. Always expressed as percentage (%)
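A small sketch of the coefficient of variation; the data are hypothetical, chosen so the arithmetic is easy to check:

```python
import statistics

def coefficient_of_variation(data):
    """CV = (standard deviation / mean) * 100, a unit-free percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# mean = 20, sample standard deviation = 10, so CV = 50%
print(coefficient_of_variation([10, 20, 30]))  # 50.0
```

Because the units cancel, CVs can be compared across datasets measured in different units (e.g. dollars vs kilograms).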
The Z score
The difference between a given observation and the mean, divided by the standard deviation. A Z score of 2.0 means that a value is 2.0 standard deviations from the mean. A Z score above 3.0 or below -3.0 is considered an outlier
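The Z score card can be sketched directly from its definition; the data and the outlier rule (|Z| > 3) follow the card, and the numbers are illustrative:

```python
import statistics

def z_scores(data):
    """(observation - mean) / standard deviation, for each observation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

def outliers(data):
    """Values whose Z score is above 3.0 or below -3.0."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > 3.0]

scores = z_scores([1, 2, 3, 4, 5])
print([round(z, 4) for z in scores])  # [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
```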
The shape of a distribution
Describes how data are distributed. Measures of shape are symmetric or skewed
Left skewed and right skewed
When the data are left (negatively) skewed, the distance between Q1 and Q2 is greater than the distance between Q2 and Q3. The reverse applies for right (positively) skewed data. If the data are symmetric the distances are the same.
What does a box and whisker plot show
Box and whisker plots show location, spread and shape.
Population variance
the average of the squared deviations of values from the mean
Population standard deviation
shows variation about the mean. is the square root of the population variance. has the same units as the original data
Covariance
The sample covariance measures the direction of the linear relationship between two numerical variables. It indicates direction only, not relative strength, because it is affected by the units of measurement. No causal effect is implied.
Correlation
Measures the relative strength of the linear relationship between two variables
Features of correlation coefficient
Also called Standardised Covariance i.e. invariant to units of measure. Ranges between –1 and 1. The closer to –1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship
5 number summary
Numerical data summarised by quartiles. Xsmallest Q1 Median Q3 Xlargest
3 approaches to assessing probability
a priori
Empirical
Subjective
a priori
Classical probability. Based on prior knowledge
Empirical
Classical probability. Based on observed data
Subjective
Subjective probability. Based on individual judgment or opinion about the probability of occurrence
Probability
a numerical value that represents the chance, likelihood, possibility that an event will occur (always between 0 and 1)
Discrete probability
A discrete random variable can take only certain values; a discrete probability distribution lists each possible value together with its probability.
4 essential properties of the binomial distribution
A fixed number of observations
Two mutually exclusive and collectively exhaustive events
Constant probability for each observation
Observations are independent
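Under those four properties, the binomial probability of k successes follows directly from counting; a minimal sketch with an illustrative coin-flip example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials, constant success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 4 fair coin flips: C(4,2) * 0.5^2 * 0.5^2
print(binomial_pmf(2, 4, 0.5))  # 0.375
```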
Index numbers
Index numbers allow relative comparisons over time. Index numbers are reported relative to a base period index. Base period index = 100 by definition. Used for an individual item or measurement.
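A sketch of a simple price index for an individual item; the prices are made up:

```python
def price_index(price, base_price):
    """Index relative to the base period; the base period index is 100 by definition."""
    return price / base_price * 100

base = 2.00     # hypothetical base-period price
current = 2.50  # hypothetical current price

print(price_index(base, base))     # 100.0 (base period, by definition)
print(price_index(current, base))  # 125.0, i.e. a 25% rise since the base period
```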
Which price index to use
The Paasche index is more accurate but more difficult to compute, because it requires current-period quantities.
Characteristics of the normal distribution
Bell-shaped
Symmetrical
Mean, median and mode are equal
Central location is determined by the mean
Spread is determined by the standard deviation (IT IS THE POPULATION STANDARD DEVIATION)
The random variable x has an infinite theoretical range
What is the height of the curve a measure of
Probability
What must the area under the curve be
1
Calculate descriptive numerical measures to determine normality
Do the mean and median have similar values? (Remember there may be no unique mode or there may be multiple modes.)
Is the interquartile range approximately 1.33 times the standard deviation?
Is the range approximately 6 times the standard deviation?
Calculate standard deviation to determine normality
Do approximately 2/3 of the observations lie within the mean ± 1 standard deviation?
Do approximately 80% of the observations lie within the mean ± 1.28 standard deviations?
Do approximately 95% of the observations lie within the mean ± 2 standard deviations?
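These checks can be automated; a sketch that computes the proportion of observations within ± k standard deviations of the mean, applied to simulated normal data (the seed and sample size are arbitrary):

```python
import random
import statistics

def proportion_within(data, k):
    """Proportion of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

random.seed(42)
data = [random.gauss(0, 1) for _ in range(1000)]  # simulated normal sample

print(proportion_within(data, 1))     # close to 2/3 (about 68%)
print(proportion_within(data, 1.28))  # close to 80%
print(proportion_within(data, 2))     # close to 95%
```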
Continuous probability density function
Mathematical expression that defines the distribution of the values for a continuous random variable.
Sampling distribution
A sampling distribution is a distribution of all of the possible values of a statistic for a given size sample selected from a population.
Standard error of the mean
Different samples of the same size from the same population will yield different sample means.
A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean. Note that the standard error of the mean decreases as the sample size increases.
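The shrinking standard error can be shown with a two-line function (the σ = 10 figure is illustrative):

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0
```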
If the population is not normal
We can apply the Central Limit Theorem, which states that regardless of the shape of individual values in the population distribution, as long as the sample size is large enough (generally n ≥ 30) the sampling distribution of X̄ will be approximately normally distributed, with mean μ and standard error σ/√n.
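A small simulation of the Central Limit Theorem: sample means of size n = 30 drawn from a strongly right-skewed (exponential) population still cluster around the population mean with spread close to σ/√n. The population choice and counts are illustrative:

```python
import random
import statistics

random.seed(1)

# Skewed population: exponential with population mean 1.0 and sd 1.0
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(30))
    for _ in range(2000)
]

print(round(statistics.mean(sample_means), 2))   # near 1.0 (population mean)
print(round(statistics.stdev(sample_means), 2))  # near 1/sqrt(30) ≈ 0.18
```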
Sampling Distribution of the Proportion
Selecting all possible samples of a certain size, the distribution of all possible sample proportions is the sampling distribution of the proportion.
Simple random sampling
Every individual or item from the frame (N) has an equal chance of being selected (1/N).
Selection may be with replacement or without replacement.
Samples can be obtained from a table of random numbers or computer random number generators.
Simple to use but may not be a good representation of the population’s underlying characteristics.
Systematic sampling
Divide frame of N individuals into n groups of k individuals: k = N/n.
Randomly select one individual from the 1st group.
Select every kth individual thereafter.
Like simple random sampling, simple to use but may not be a good representation of the population’s underlying characteristics.
Stratified sampling
Divide population into two or more subgroups (called strata) according to some common characteristic.
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes – called proportionate stratified sampling.
Samples from subgroups are combined into one.
Stratified sampling pros
More efficient than simple random sampling or systematic sampling because of assured representation of items across entire population.
Homogeneity of items within each stratum provides greater precision in the estimates of underlying population parameters.
Cluster samples
Population is divided into several ‘clusters’, each representative of the population e.g. postcode areas, electorates etc.
A simple random sample of clusters is selected:
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
Cluster sampling pros
More cost effective than random sampling, especially if population is geographically widespread.
Often requires a larger sample size compared to simple random sampling or stratified sampling for same level of precision.
Survey errors
Coverage error – appropriate or adequate frame?
Non-response error – results in non-response bias.
Measurement error – ambiguous wording, halo effect or respondent error.
Sampling error – always exists and is the difference between sample statistic and population parameter.
Point estimate
A point estimate is the value of a single sample statistic.
Confidence interval
A confidence interval provides a range of values constructed around the point estimate.
Confidence interval estimation
An interval gives a range of values: Takes into consideration variation in sample statistics from sample to sample. Based on observations from 1 sample.
Gives information about closeness to unknown population parameters.
Stated in terms of level of confidence. Can never be 100% confident.
A relative frequency interpretation
In the long run, 90%, 95% or 99% of all the confidence intervals that can be constructed (in repeated samples) will contain the unknown true parameter.
Confidence Interval for μ (σ Known) assumptions
Assumptions:
Population standard deviation σ is known
Population is normally distributed
If population is not normal, use Central Limit Theorem.
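Under those assumptions the interval is x̄ ± z · σ/√n; a sketch with hypothetical numbers (x̄ = 100, σ = 15, n = 36):

```python
from statistics import NormalDist

def ci_mean_sigma_known(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu with sigma known: xbar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # e.g. ~1.96 for 95%
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

lo, hi = ci_mean_sigma_known(xbar=100, sigma=15, n=36)
print(round(lo, 1), round(hi, 1))  # 95.1 104.9
```

Raising the confidence level widens the interval; raising n narrows it, matching the cards below.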
Will the true average always be in the middle of the confidence interval
Not necessarily. It is a good, but not perfect, measure.
Confidence interval for μ (σ Unknown)
If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S.
This introduces extra uncertainty, since S is variable from sample to sample.
So we use the Student t distribution instead of the normal distribution:
The t value depends on the degrees of freedom, given by the sample size minus 1 (d.f. = n - 1).
d.f. are the number of observations that are free to vary after the sample mean has been calculated.
Degrees of freedom
Number of observations that are free to vary after the sample mean has been calculated
Confidence interval example interpretation
We are 95% confident that the true percentage of left-handers in the population is between 0.1651 and 0.3349 i.e.:
Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from repeated samples of size 100 in this manner will contain the true proportion.
Sampling error
The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - alpha).
The margin of error is also called a sampling error:
The amount of imprecision in the estimate of the population parameter.
The amount added and subtracted to the point estimate to form the confidence interval.
Rule for rounding confidence intervals
Always round outward (lower limit down, upper limit up)
Hypothesis
A hypothesis is a statement (assumption) about a population parameter
The Null Hypothesis, H0
States the belief or assumption in the current situation (status quo)
Begin with the assumption that the null hypothesis is true
(similar to the notion of innocent until proven guilty)
Refers to the status quo
Always contains the ‘=’, ‘≤’ or ‘≥’ sign
May or may not be rejected
Is always about a population parameter; e.g. μ, not about a sample statistic
The Alternative Hypothesis, H1
Is the opposite of the null hypothesis
e.g. the average number of TV sets in Australian homes is not equal to 3 (H1: μ ≠ 3)
Challenges the status quo
Can only contain the ‘<’, ‘>’ or ‘≠’ sign
May or may not be proven
Is generally the claim or hypothesis that the researcher is trying to prove
Errors in making decisions (Hypothesis testing)
Type I error
Reject a true null hypothesis
Considered a serious type of error
Type II error
Fail to reject a false null hypothesis
The probability of errors
The probability of Type I error is alpha
Called level of significance of the test; i.e. 0.01, 0.05, 0.10
Set by the researcher in advance
The probability of Type II error is β
p-value approach to testing
p-value: probability of obtaining a test statistic more extreme (≤ or ≥) than the observed sample value, given H0 is true
Also called observed level of significance
Smallest value of alpha for which H0 can be rejected
Obtain the p-value from Table E.2 or computer
If p-value < alpha , reject H0
If p-value >= alpha , do not reject H0
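A sketch of the p-value approach for a z test with σ known. The delivery-time numbers (x̄ = 24, μ0 = 25, σ = 3, n = 36) are hypothetical, chosen to echo the interpretation cards later in these notes:

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, tail="two"):
    """p-value for a z test of H0: mu = mu0, with sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if tail == "two":
        return 2 * (1 - NormalDist().cdf(abs(z)))
    if tail == "lower":
        return NormalDist().cdf(z)
    return 1 - NormalDist().cdf(z)

# H0: mu = 25 vs H1: mu < 25 (lower-tail test)
p = z_test_p_value(xbar=24, mu0=25, sigma=3, n=36, tail="lower")
print(round(p, 4))          # 0.0228
print(p < 0.05)             # True, so reject H0 at alpha = 0.05
```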
Regression analysis
Regression analysis is used to:
predict the value of a dependent variable (Y) based on the value of at least one independent variable (X)
explain the impact of changes in an independent variable on the dependent variable
Dependent variable (y)
Dependent variable (Y): the variable we wish to predict or explain (response variable)
Independent variable (x)
Independent variable (X): the variable used to explain the dependent variable (explanatory variable)
Simple linear regression
Only one independent variable, X
Relationship between X and Y is described by a linear function
Changes in Y are assumed to be caused by changes in X
b0 and b1
b0 and b1 are obtained by finding the values of b0 and b1 that minimise the sum of the squared differences between actual values (Y) and predicted values (Ŷ)
b0
b0 is the estimated average value of Y when the value of X is zero
b1
b1 is the estimated change in the average value of Y as a result of a one-unit change in X
Coefficient of Determination, r2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted as r2
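The least-squares coefficients and r² can be computed from first principles; a minimal sketch with illustrative data lying exactly on a line (so b0 = 0, b1 = 2, r² = 1):

```python
def simple_regression(x, y):
    """Least squares b0, b1 (minimising sum of squared Y - Yhat) and r-squared."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx              # estimated change in Y per one-unit change in X
    b0 = ybar - b1 * xbar       # estimated average Y when X = 0
    ss_tot = sum((yi - ybar) ** 2 for yi in y)              # total variation
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot    # portion of variation in Y explained by X
    return b0, b1, r2

b0, b1, r2 = simple_regression([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1, r2)  # 0.0 2.0 1.0
```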
ASSUMPTIONS OF REGRESSION
Linearity of the relationship
Independence of error values
Normality of error values
constant variance of the errors of the probability distribution
Check these assumptions by examining residuals
residual for observation
The residual for observation i, ei, is the difference between its observed and predicted value
Idea of the multiple regression model
Examine the linear relationship between
1 dependent (Y) & 2 or more independent variables (Xi).
Why we need Adjusted r^2
r2 never decreases when a new X variable is added to the model.
This can be a disadvantage when comparing models.
What is the net effect of adding a new variable?
We lose a degree of freedom when a new X variable is added.
Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted r^2
Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used.
Penalises excessive use of unimportant independent variables.
Smaller than r2
Useful in comparing among models.
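The adjustment is a direct formula; a sketch with hypothetical figures (r² = 0.8, n = 25 observations, k = 3 independent variables):

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1), k = number of X variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.8, n=25, k=3), 4))  # 0.7714, slightly below r^2 = 0.8
```

Adding a fourth, useless X variable would leave r² at 0.8 or above but push the adjusted value down, which is the penalty the card describes.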
F Test for Overall Significance of the Model:
Shows if there is a linear relationship between all of the X variables considered together and Y.
multiple regression assumptions
The errors are normally distributed.
Errors have a constant variance.
The model errors are independent.
Using dummy variables
A dummy variable is a categorical explanatory variable with two levels:
yes or no, on or off, male or female
coded as 0 or 1
Regression intercepts are different if the variable is significant.
Assumes equal slopes for other variables.
If more than two levels, the number of dummy variables needed is number of levels minus 1.
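A sketch of the levels-minus-1 coding rule; the category names and the choice of baseline level are illustrative:

```python
def dummies(values, levels):
    """For a categorical variable with m levels, build m - 1 dummy (0/1) columns.
    The first level acts as the baseline and gets no column."""
    return {level: [1 if v == level else 0 for v in values]
            for level in levels[1:]}

obs = ["red", "blue", "green", "red"]
cols = dummies(obs, levels=["red", "blue", "green"])
print(cols)  # {'blue': [0, 1, 0, 0], 'green': [0, 0, 1, 0]}
```

A two-level variable (e.g. yes/no) therefore needs just one 0/1 column, as the card states.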
Time-series data and plot
Numerical data obtained at regular time intervals.
The time intervals can be annually, quarterly, daily, hourly etc.
A time-series plot is a two-dimensional plot of time series data.
The vertical axis measures the variable of interest.
The horizontal axis corresponds to the time periods.
Classical Multiplicative Time-series Model Components
Trend component
Seasonal component
Cyclical component
Irregular component
Trend component
Long-run increase or decrease over time (overall upward or downward movement).
Data taken over a long period of time.
Trend can be upward or downward.
Trend can be linear or non-linear.
Seasonal component
Short-term regular wave-like patterns.
Observed within 1 year.
Often monthly or quarterly.
Cyclical component
Long-term wave-like patterns.
Usually occur every 2-10 years.
Often measured peak to peak or trough to trough.
Irregular component
Unpredictable, random, ‘residual’ fluctuations.
Due to random variations of:
Nature.
Accidents or unusual events.
‘Noise’ in the time series.
Usually short duration and non-repeating.
Smoothing the Annual Time Series – Moving Averages
A series of arithmetic means over time.
Calculate moving averages to get an overall impression of the pattern of movement over time.
Moving averages can be used for smoothing: averages of consecutive time-series values for a chosen period of length (L).
Result dependent upon choice of L (length of period for computing means).
Examples:
For a 5 year moving average, L = 5.
For a 7 year moving average, L = 7 etc.
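The smoothing above can be sketched in a few lines; the series values are illustrative:

```python
def moving_average(series, L):
    """Arithmetic means of consecutive time-series values over windows of length L."""
    return [sum(series[i:i + L]) / L for i in range(len(series) - L + 1)]

series = [2, 4, 6, 8, 10]           # illustrative annual values
print(moving_average(series, 3))    # [4.0, 6.0, 8.0]
```

Note the smoothed series is shorter than the original (L - 1 values are lost), and a larger L gives a smoother but shorter result.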
PHOTOS 1-8
Frequency distribution, histogram and graphing
PHOTO 9
CV
PHOTO 10
SKEWNESS
PHOTOS 11-12
EMPIRICAL RULE
PHOTOS 13-14
BOX AND WHISKER
PHOTOS 15-18
BAYES THEOREM
PHOTOS 19-22
INVESTMENT RETURNS
PHOTOS 23-24
PORTFOLIO RETURN AND RISK
PHOTO 25
INDEX NUMBERS INTERPRETATION
GO OVER DECISION MAKING FLASHCARDS AND PHOTOS
PHOTOS 26-27
NORMAL PROBABILITY PLOT
PHOTOS 28-29
TUTORS NORMAL DISTRIBUTION EXAMPLE
PHOTO 30
STANDARD ERROR OF THE MEAN
PHOTO 31
SAMPLING DISTRIBUTION PROPERTIES
PHOTOS 32-33
CENTRAL LIMIT THEOREM
PHOTO 34
CONFIDENCE INTERVAL ESTIMATION PROCESS
PHOTOS 35-36
CONFIDENCE INTERVAL EXAMPLE
PHOTOS 37-41
DETERMINING SAMPLE SIZE
PHOTO 42
OUTCOMES AND PROBABILITIES OF HYPOTHESIS TESTING
PHOTO 43
2 TAIL TESTS
PHOTO 44-45
P VALUE 2 TAIL TESTS
PHOTOS 46-47
1 TAIL TESTS
PHOTO 48
P VALUE 1 TAIL
PHOTOS 49-50
HYPOTHESIS TESTING FOR THE PROPORTION
PHOTOS 51-52
SIMPLE REGRESSION MODEL AND EQUATION
PHOTOS 53-58
SIMPLE REGRESSION EXAMPLE
PHOTO 59
INTERPOLATION V EXTRAPOLATION
PHOTO 60
EXAMPLES OF R2
PHOTOS 61-62
COMPARING STANDARD ERRORS
PHOTOS 63-65
F TEST FOR SIGNIFICANCE
PHOTO 66
CONFIDENCE INTERVAL ESTIMATE FOR THE SLOPE
PHOTOS 67-68
MULTIPLE REGRESSION MODEL AND EQUATION
PHOTOS 69-73
MULTIPLE REGRESSION EXAMPLE
PHOTO 74-75
ADJUSTED R2
PHOTOS 76-79
SIGNIFICANCE F TEST MULTIPLE
PHOTOS 80-83
ARE INDIVIDUAL VARIABLES SIGNIFICANT
PHOTOS 84-85
CONFIDENCE INTERVAL ESTIMATE FOR THE SLOPE MULTIPLE
PHOTOS 86-91
DUMMY VARIABLES
PHOTOS 92-94
INTERACTION BETWEEN VARIABLES
PHOTOS 95-96
TREND AND SEASONAL COMPONENT
PHOTOS 97-98
MULTIPLICATIVE TIME SERIES MODEL
PHOTOS 99-102
MOVING AVERAGES
PHOTO 103
LEAST SQUARES TREND FITTING
PHOTO 104
QUADRATIC FORM TREND FORECASTING
PHOTOS 105-106
EXPONENTIAL TREND FORECASTING
PHOTOS 107-108
MODEL SELECTION
PHOTO 109
RESIDUAL ANALYSIS FORECASTING
PHOTO 110
FORECASTING WITH SEASONAL DATA
PHOTOS 111-114
QUARTERLY MODEL
As an aid to the establishment of personnel requirements, the director of a hospital wishes to estimate the mean number of people who are admitted to the emergency room during a 24-hour period. The director randomly selects 64 different 24-hour periods and determines the number of admissions for each. For this sample, x̄ = 19.8 and s² = 25. Which of the following assumptions is necessary in order for a confidence interval to be valid?
No assumptions are necessary (Central limit theorem)
It is desired to estimate the average total compensation of CEOs in the Service industry. Data were randomly collected from 18 CEOs and the 95% confidence interval was calculated to be ($2,181,260, $5,836,180). Which of the following interpretations is correct?
We are 95% confident that the average total compensation of all CEOs in the Service industry falls in the interval $2,181,260 to $5,836,180.
The power of a statistical test is
the probability of rejecting H0 when it is false.
Statistical independence determination
P(A ∩ B) = P(A) × P(B).
Implications of increasing the sample size (sampling distributions - normal distribution)
With the sample size increasing from n = 25 to n = 100, more sample means will be closer to the distribution mean. The standard error of the sampling distribution of size 100 is much smaller than that of size 25, so the likelihood that the sample mean will fall within ± 0.2 minutes of the mean is much higher for samples of size 100 (probability = 0.8413) than for samples of size 25 (probability = 0.3830).
A market researcher states that she has 95% confidence that the mean monthly sales of a product are between $170,000 and $200,000. Explain the meaning of this statement.
If all possible samples of the same size n are taken, 95% of them include the true population average monthly sales of the product within the interval developed. Thus you are 95% confident that this sample is one that does correctly estimate the true average amount.
When can you assume that the sampling distribution is approx normal
Since the population standard deviation is known and n = 50, from the Central Limit Theorem we may assume that the sampling distribution of X̄ is approximately normal.
What does reducing the confidence level do to the confidence interval
The reduced confidence level narrows the width of the confidence interval.
A stationery store wants to estimate the mean retail value of greeting cards that it has in its inventory. A random sample of 20 greeting cards indicates a mean value of $4.95 and a standard deviation of $0.82.
Interpret the confidence interval and how is this helpful in estimating the value of total inventory
The store owner can be 95% confident that the population mean retail value
of greeting cards that the store has in its inventory is somewhere between
$4.56 and $5.34. The store owner could multiply the ends of the confidence
interval by the number of cards to estimate the total value of his inventory.
Interpret a proportion confidence interval
You are 95% confident that the population proportion of employers who
have used a recruitment service within the past two months to find new
staff is between 0.17 and 0.24.
You are 99% confident that the population proportion of employers who
have used a recruitment service within the past two months to find new
staff is between 0.17 and 0.25.
What happens to the confidence interval when you increase the level of confidence
When the level of confidence is increased, the confidence interval becomes
wider. The loss in precision reflected as a wider confidence interval is the
price you have to pay to achieve a higher level of confidence.
When do you reject the null hypothesis
Decision rule: reject H0 if the test statistic is smaller than the lower critical value or greater than the upper critical value.
p value interpretation
photo in favourites on phone 31/5/2018
Interpretation of hypothesis testing answer
There is enough evidence to conclude the population mean delivery time
has been reduced below the previous value of 25 minutes, at the 5% level
of significance.
p-value = 0.0047 interpretation
Since the p-value = 0.0047 is less than alpha, there is enough evidence to conclude the population mean delivery time has been reduced below the previous value of 25 minutes.
What does increasing the sample size do in regards to hypothesis testing and proportions
A larger sample size implies that there is more information about the population and reduces the standard error (variation) of the sample proportion
Conditions of hypothesis testing when the distribution isn't exactly normal
The samples used need to be random. As the sample size is large, the conditions np > 5 and n(1 - p) > 5 need to be met.
What do you need to know to perform the t test on the population mean
You must assume that the observed sequence in which the data were collected is random and that the data are approximately normally distributed.
forecasting questions
Photos on phone album