Mid-Term Exam Flashcards

Question 1

Q

population

Answer

A

the group of all items (data) of interest.

- frequently very large; sometimes infinite.

Question 2

Q

sample

Answer

A

a set of items (data) drawn from the population of interest.

potentially large, but much smaller than the population.
the sample is a subset of the population.

Question 3

Q

parameter

Answer

A

a descriptive measure of a population.

- Ex. population mean

Question 4

Q

statistic

Answer

A

a descriptive measure of a sample.

- Ex. sample mean

Question 5

Q

statistical inference

Answer

A

sample statistics are used to make inferences about population parameters, meaning an estimate, prediction or decision can be produced about a population based on sample data. therefore what is known about a sample can be applied to the larger population.

Question 6

Q

numerical data

Answer

A

values are real numbers
all calculations are valid
data may be treated as ordinal or nominal

Question 7

Q

nominal data

Answer

A

values are arbitrary numbers that represent categories
only calculations based on the frequencies of occurrence, such as proportions, are valid
data may not be treated as ordinal or numerical

Question 8

Q

ordinal data

Answer

A

values must represent the ranked order of the data
calculations based on an ordering process are valid
data may be treated as nominal but not as numerical

Question 9

Q

bar chart

Answer

A

a bar chart is mainly used for nominal data and graphically represents the frequency of each category as a bar rising vertically from the horizontal axis.
- bar height is proportional to frequency of the corresponding category

Question 10

Q

pie chart

Answer

A

a circle that is subdivided into slices whose areas are proportional to the frequencies, therefore displaying the proportion of occurrences of each category.
- popular tool to represent proportions of appearance for nominal data

Question 11

Q

steps to building a histogram (3)

Answer

A

1) collect the data
2) create a frequency distribution for the data
- determine number of classes
- determine class width
3) draw a histogram of rectangle bars using the class intervals and the corresponding frequencies
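
The three steps can be sketched in Python. This is a minimal illustration only: the data are made up, the choice of three classes is arbitrary, and the variable names are not from the course material.

```python
# Step 1: collect the data (hypothetical values, made up for illustration).
data = [2, 3, 5, 7, 8, 9, 11, 12, 14, 15, 18, 20]

# Step 2: create a frequency distribution.
num_classes = 3                             # chosen number of classes
low, high = min(data), max(data)
class_width = (high - low) / num_classes    # equal class width

frequencies = [0] * num_classes
for x in data:
    # The largest value would index one past the end, so it is
    # placed in the last class.
    index = min(int((x - low) / class_width), num_classes - 1)
    frequencies[index] += 1

# Step 3: draw a crude text histogram, one row per class.
# The last class also includes its upper limit (the largest value).
for i, freq in enumerate(frequencies):
    lower = low + i * class_width
    upper = lower + class_width
    print(f"{lower:5.1f} to {upper:5.1f}: {'#' * freq}")
```

With real data a plotting library would normally draw the rectangles; the text rows here only show how class intervals and frequencies pair up.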

Question 12

Q

class width

Answer

A

generally best to use equal class widths. 
unequal class widths are used when the frequency associated with some classes is too low, then: 
- several classes are combined together to form a wider and more populated class
- it is possible to form an open-ended class at the higher or lower end of the histogram

Question 13

Q

relative frequency

Answer

A

proportion of observations falling into each class; should be used when comparing two or more histograms that have different numbers of observations.
- often preferable to the frequency itself

Question 14

Q

class relative frequency (formula)

Answer

A

(class frequency) divided by (total number of observations)
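
A short sketch of the formula, assuming three made-up classes labelled A, B and C (the labels and counts are hypothetical):

```python
# Hypothetical class frequencies.
frequencies = {"A": 4, "B": 10, "C": 6}

total = sum(frequencies.values())   # total number of observations

# Class relative frequency = class frequency / total number of observations.
relative = {label: f / total for label, f in frequencies.items()}

for label, share in relative.items():
    print(label, share)
```

The relative frequencies always add up to 1, which is what makes histograms with different numbers of observations comparable.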

Question 15

Q

equal class width (formula)

Answer

A

(largest value - smallest value) divided by (number of classes)

Question 16

Q

cumulative frequency of a class

Answer

A

the number of measurements less than the upper limit of that class.

Question 17

Q

to obtain the cumulative frequency of a class

Answer

A

add the frequency of that class to the frequencies of all previous classes.

Question 18

Q

cumulative relative frequency of a particular class

Answer

A

the proportion of measurements that are less than the upper limit of that class.
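
Cumulative and cumulative relative frequencies can be sketched as running totals. The class frequencies below are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
frequencies = [4, 10, 6]

# Cumulative frequency: this class plus all previous classes.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: cumulative frequency / total observations.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

print(cumulative)
print(cumulative_relative)
```

The last class always has a cumulative frequency equal to the total and a cumulative relative frequency of 1.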

Question 19

Q

arithmetic mean

Answer

A

most popular and useful measure of central location.

all values are used
it is unique
the sum of the deviations from the mean is 0
calculated by summing the values and dividing by the number of values
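
A quick check of the calculation and of the zero-sum property, using made-up values:

```python
# Hypothetical measurements.
values = [4, 8, 15, 16, 23, 42]

# Mean: sum of the values divided by the number of values.
mean = sum(values) / len(values)

# Deviations from the mean; these sum to 0 (up to rounding).
deviations = [x - mean for x in values]

print(mean)
print(sum(deviations))
```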

Question 20

Q

median of a set of measurements

Answer

A

the value that falls in the middle when the measurements are arranged in order of magnitude.

unique median for each data set
commonly used measure of central location

Question 21

Q

mode of a set of observations

Answer

A

the value that occurs most frequently.

data set may have one, two or more modes (modal classes)
useful for all data, mainly used for nominal
for large data sets, modal class is more relevant than a single-value mode

Question 22

Q

which measure of central location?

Answer

A

mean is generally first selection unless outliers are present in the dataset, then the median should be used.
mode is seldom the best measure of central location.
median is not as sensitive to extreme values as the mean is.
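
The effect of an outlier can be seen in a small sketch. The income figures are made up, and Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical incomes (in thousands); the second list adds one extreme value.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean moves a long way; the median barely changes.
print(mean_before, mean_after)
print(median_before, median_after)
```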

Question 23

Q

variance

Answer

A

this measure of dispersion reflects the values of all the measurements.

Question 24

Q

standard deviation

Answer

A

the square root of the variance of the measurements.
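
A minimal sketch of both measures on made-up data. The cards do not distinguish population from sample versions; the sketch assumes the usual convention of dividing by N for a population and by n - 1 for a sample.

```python
import math
import statistics

# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Every measurement contributes a squared deviation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / n        # divide by N
sample_variance = sum(squared_deviations) / (n - 1)      # divide by n - 1

# Standard deviation: square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library.
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```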

Question 25

Q

empirical rules

Answer

A

for a bell-shaped (approximately normal) distribution:
approximately 68% of all observations fall within 1 standard deviation of the mean
approximately 95% of all observations fall within 2 standard deviations of the mean
approximately 99.7% of all observations fall within 3 standard deviations of the mean
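
These percentages can be checked by simulation. The sketch below assumes normally distributed data with an arbitrary mean of 50 and standard deviation of 10; the seed and sample size are also arbitrary.

```python
import random

random.seed(0)            # arbitrary seed so the run is repeatable
mu, sigma = 50.0, 10.0    # assumed mean and standard deviation
observations = [random.gauss(mu, sigma) for _ in range(100_000)]

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    inside = sum(1 for x in observations if abs(x - mu) <= k * sigma)
    return inside / len(observations)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The simulated shares land close to 68%, 95% and 99.7%; for strongly skewed data they need not.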

Question 26

Q

probability of an event

Answer

A

the probability P(A) of event A is the sum of the probabilities assigned to the simple events contained in A.

Question 27

Q

intersection of event A and B

Answer

A

the event that occurs when both A and B occur.

Question 28

Q

joint probability of A and B

Answer

A

the probability of the intersection of A and B.

Question 29

Q

conditional probability

Answer

A

conditional probability is used to determine how two events are related; that is, it gives the probability of one event given the occurrence of another related event.
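
The last few definitions can be tied together with one roll of a fair die. The events A and B are made up for illustration, and the sketch uses the standard formula P(A | B) = P(A and B) / P(B), which the cards do not state.

```python
from fractions import Fraction

# One roll of a fair die: six simple events, each with probability 1/6.
simple_events = {face: Fraction(1, 6) for face in range(1, 7)}

def probability(event):
    """P(event): sum of the probabilities of the simple events it contains."""
    return sum(simple_events[face] for face in event)

A = {2, 4, 6}    # the roll is even
B = {4, 5, 6}    # the roll is greater than 3

p_a = probability(A)
p_joint = probability(A & B)               # joint probability of A and B
p_a_given_b = p_joint / probability(B)     # conditional probability of A given B

print(p_a, p_joint, p_a_given_b)
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, so the two events are related.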

Question 30

Q

discrete random variable

Answer

A

one that takes on a countable number of values (integers).

Question 31

Q

continuous random variable

Answer

A

one whose values are not discrete, not countable (real numbers).

Question 32

Q

discrete probability distribution

Answer

A

a table, formula or graph that lists all possible values a discrete random variable can assume, together with their associated probabilities.

Question 33

Q

expected value

Answer

A

the weighted average of the possible values it can assume, where the weights are the corresponding probabilities of each xi.

Question 34

Q

population variance

Answer

A

the weighted average of the squared deviations of the values of x from their mean, where the weights are the corresponding probabilities of each xi.
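
Both weighted averages can be sketched for a small discrete distribution. The example assumes X is the number of heads in two tosses of a fair coin.

```python
from fractions import Fraction

# Assumed distribution: number of heads in two fair coin tosses.
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Expected value: each value x weighted by its probability p.
expected_value = sum(x * p for x, p in distribution.items())

# Variance: each squared deviation from the mean weighted by its probability.
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

print(expected_value, variance)
```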

Question 35

Q

Statistical inference

Answer

A

The process of drawing conclusions about the properties of a population based on information obtained from a sample.

Question 36

Q

Sampling distribution

Answer

A

The probability distribution of a statistic across all possible samples of a given size. It tells us how close the statistic is likely to be to the parameter.

Question 37

Q

Standard error

Answer

A

The standard deviation of the sampling distribution of the sample mean.

Question 38

Q

Central limit theorem

Answer

A

Random sample from normal population = sampling distribution of the sample mean is normally distributed

Random sample from any population = sampling distribution of sample mean is approximately normal for a large sample size (n>=30)
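
A simulation sketch of the second case. It assumes a skewed exponential population with mean 1 and standard deviation 1; the seed, n = 30 and the number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)          # arbitrary seed so the run is repeatable
n = 30                  # sample size
repetitions = 5_000     # number of samples drawn

# Draw many samples from the exponential population and keep each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repetitions)
]

center = statistics.fmean(sample_means)    # should be near the population mean, 1
spread = statistics.stdev(sample_means)    # should be near the standard error
standard_error = 1.0 / math.sqrt(n)        # sigma / sqrt(n) with sigma = 1

print(round(center, 3), round(spread, 3), round(standard_error, 3))
```

Although the population is far from normal, the sample means cluster symmetrically around the population mean with a spread close to sigma divided by the square root of n.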

Question 39

Q

What causes a closer resemblance of the sampling distribution of the sample mean to a normal distribution?

Answer

A

A larger sample size (n)

Question 40

Q

What does capital N mean?

Answer

A

Population size

Question 41

Q

When the population size is large relative to the sample size, the correction factor is …

Answer

A

Close to 1 and can be ignored

Question 42

Q

How large does a population have to be, to be considered “large”?

Answer

A

At least 20 times larger than the sample size
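
The cards do not give the correction factor itself. The sketch below assumes the usual finite population correction factor, the square root of (N - n) / (N - 1), and uses made-up population and sample sizes.

```python
import math

def correction_factor(N, n):
    """Assumed finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

n = 100
large = correction_factor(20 * n, n)   # population 20 times the sample size
small = correction_factor(2 * n, n)    # population only twice the sample size

print(round(large, 3), round(small, 3))
```

At 20 times the sample size the factor is already about 0.97, which is why it is commonly ignored for large populations.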

Question 43

Q

Method for making statistical inferences:

Answer

A

identify the parameter to be estimated
specify the parameter's estimator and its sampling distribution
construct an interval estimator

Question 44

Q

Types of estimation (2)

Answer

A

point estimator
interval estimator

Question 45

Q

Point estimator

Answer

A

Estimates the value of an unknown parameter using a single value calculated from the sample data.

Question 46

Q

Interval estimator

Answer

A

Draws inferences about a population by estimating the value of an unknown population parameter by using an interval.

Question 47

Q

Estimator characteristics (3)

Answer

A

unbiasedness
consistency
relative efficiency

Question 48

Q

Unbiasedness

Answer

A

An unbiased estimator is one whose expected value is equal to the parameter it estimates.

Question 49

Q

Consistency

Answer

A

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size increases.

Question 50

Q

Relative efficiency

Answer

A

If there are two unbiased estimators available, the one with a smaller variance is said to be relatively efficient.

Question 51

Q

Examples of unbiased estimators

Answer

A

sample mean
sample median
sample variance
sample proportion

Question 52

Q

Examples of consistent estimators

Answer

A

sample mean
sample median

Question 53

Q

Examples of efficient estimators

Answer

A

Both the sample mean and median are unbiased estimators of the population mean. However the median has a greater variance than the sample mean, so the sample mean is relatively efficient when compared to the sample median.

Question 54

Q

Which is the “best” estimator?

Answer

A

The sample mean as it is unbiased, consistent and relatively efficient.

Question 55

Q

The expected value (E(X)) of the sampling distribution of the sample mean equals the population mean…

Answer

A

…for all populations.

Question 56

Q

As the level of confidence increases…

Answer

A

…the width also increases.

Question 57

Q

If the standard deviation is doubled…

Answer

A

…2B is doubled, and vice versa

Question 58

Q

when n increases…

Answer

A

…the width of the confidence interval decreases.

Question 59

Q

The width of the confidence interval (2B) is affected by:

Answer

A

level of confidence
population standard deviation
sample size
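
How the three factors move the width can be sketched for an interval for the mean with known population standard deviation. The formula 2B = 2 * z * sigma / sqrt(n) is an assumption (the cards do not state it), and the numbers are made up.

```python
import math
from statistics import NormalDist

def interval_width(confidence, sigma, n):
    """Width 2B of a z-interval for the mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = interval_width(0.95, sigma=10, n=100)
higher_confidence = interval_width(0.99, sigma=10, n=100)   # wider
doubled_sigma = interval_width(0.95, sigma=20, n=100)       # twice as wide
larger_sample = interval_width(0.95, sigma=10, n=400)       # half as wide

print(round(base, 2), round(higher_confidence, 2),
      round(doubled_sigma, 2), round(larger_sample, 2))
```

Raising the confidence level or the standard deviation widens the interval, while a larger sample narrows it.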

Question 60

Q

Wide confidence intervals provide:

Answer

A

Little information

Question 61

Q

t-distribution

Answer

A

Mound-shaped and symmetrical around zero.

Question 62

Q

Degrees of freedom (n-1)

Answer

A

A function of the sample size, which determines how spread out the distribution is compared to the normal distribution.

Question 63

Q

Purpose of hypothesis testing

Answer

A

To determine whether there is enough statistical evidence in favour of a certain belief about a population parameter.

Question 64

Q

Rejection region

Answer

A

Consists of all values of the test statistic for which Ho is rejected.

Answer 65

A

Consists of all values of the test statistic for which Ho is not rejected.

Answer 66

A

Value that separates the acceptance and rejection region.

Answer 67

A

Defines the range of values of the test statistic for which Ho is rejected in favour of HA.

Answer 68

A

If we repeatedly draw samples of the same size from the same population, 90% of the values of the sample means will result in a confidence interval that includes the population mean.

Answer 69

A

The minimum level of significance that is required to reject the null hypothesis.

Answer 70

A

…not be rejected at the 0.05 level.

Answer 71

A

Good measure of amount of statistical evidence supporting HA
Only employed with statistical computer software
Yields same conclusions as rejection region method

Answer 72

A

…always correct.

Answer 73

A

covariance
correlation coefficient

Answer 74

A

Use correlation and regression analysis

Answer 75

A

Used to predict the value of one variable on the basis of other variables.

Answer 76

A

An equation or set of equations that allow us to fully determine the value of the dependent variable from the values of the independent variables.

Answer 77

A

A model used to capture the randomness that is part of a real-life process.

Answer 78

A

Start with deterministic model that approximates the relationship we want to model and add a random term that measures the error of the deterministic model.

Answer 79

A

Difference between actual selling price and estimated price based on the size of the house.

Answer 80

A

The least squares method produces a straight line that minimises the sum of the squared differences between the points and the line.
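
A minimal sketch of fitting such a line by hand. The house sizes and prices are made up, and the slope and intercept use the standard closed-form least squares formulas, which the cards do not state.

```python
# Hypothetical data: house size (x) and selling price (y), in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Closed-form least squares estimates for a straight line.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
s_xx = sum((x - mean_x) ** 2 for x in sizes)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def sum_squared_errors(b0, b1):
    """Sum of squared differences between the points and the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sizes, prices))

best = sum_squared_errors(intercept, slope)
nudged = sum_squared_errors(intercept, slope + 0.1)   # any other line does worse

print(slope, intercept, best, nudged)
```

Each difference between an actual price and the price on the line is an error term; nudging the slope or intercept away from the fitted values can only increase the sum of their squares.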

Answer 81

A

… the better the fit.