Mid-Term Exam Flashcards

1
Q

population

A

the group of all items (data) of interest.

- frequently very large; sometimes infinite.

2
Q

sample

A

a set of items (data) drawn from the population of interest.

  • potentially large, but much smaller than the population.
  • the sample is a subset of the population.
3
Q

parameter

A

a descriptive measure of a population.

- Ex. population mean

4
Q

statistic

A

a descriptive measure of a sample.

- Ex. sample mean

5
Q

statistical inference

A

sample statistics are used to make inferences about population parameters: an estimate, prediction or decision about a population is produced from sample data, so what is learned from a sample can be applied to the larger population.

6
Q

numerical data

A
  • values are real numbers
  • all calculations are valid
  • data may be treated as ordinal or nominal
7
Q

nominal data

A
  • values are arbitrary numbers that represent categories
  • only calculations based on the frequencies of occurrence, such as proportions, are valid
  • data may not be treated as ordinal or numerical
8
Q

ordinal data

A
  • values must represent the ranked order of the data
  • calculations based on an ordering process are valid
  • data may be treated as nominal but not as numerical
9
Q

bar chart

A

a bar chart is mainly used for nominal data and graphically represents the frequency of each category as a bar rising vertically from the horizontal axis.
- bar height is proportional to frequency of the corresponding category

10
Q

pie chart

A

a circle that is subdivided into slices whose areas are proportional to the frequencies, thereby displaying the proportion of occurrences of each category.
- popular tool to represent proportions of appearance for nominal data

11
Q

steps to building a histogram (3)

A

1) collect the data
2) create a frequency distribution for the data
- determine number of classes
- determine class width
3) draw a histogram of rectangle bars using the class intervals and the corresponding frequencies

12
Q

class width

A
generally best to use equal class widths.
unequal class widths are used when the frequency associated with some classes is too low; in that case:
- several classes are combined together to form a wider and more populated class
- it is possible to form an open-ended class at the upper or lower end of the histogram
13
Q

relative frequency

A

proportion of observations falling into each class; it should be used when comparing two or more histograms that are based on different numbers of observations.
- often preferable to the frequency itself

14
Q

class relative frequency (formula)

A

(class frequency) divided by (total number of observations)

15
Q

equal class width (formula)

A

(largest value - smallest value) divided by (number of classes)
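
A minimal sketch of the class width and class relative frequency formulas, assuming a small made-up data set and three classes. The data values and all variable names are illustrative assumptions, not part of the original cards.

```python
# Hypothetical data set and class count, made up for illustration.
data = [2, 4, 5, 7, 8, 9, 11, 12, 15, 17, 18, 20]
num_classes = 3

# equal class width = (largest value - smallest value) / (number of classes)
class_width = (max(data) - min(data)) / num_classes

# Count the observations in each class; the largest value is placed
# in the last class so that it is not left out.
frequencies = [0] * num_classes
for x in data:
    index = min(int((x - min(data)) / class_width), num_classes - 1)
    frequencies[index] += 1

# class relative frequency = class frequency / total number of observations
relative_frequencies = [f / len(data) for f in frequencies]

print(class_width, frequencies, relative_frequencies)
```

The relative frequencies always sum to 1, which is what makes histograms built from different numbers of observations comparable.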

16
Q

cumulative frequency of a class

A

the number of measurements less than the upper limit of that class.

17
Q

to obtain the cumulative frequency of a class

A

add the frequency of that class to the frequencies of all previous classes.

18
Q

cumulative relative frequency of a particular class

A

the proportion of measurements that are less than the upper limit of that class.
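
A short sketch of cumulative and cumulative relative frequencies, assuming made-up class frequencies listed from the lowest class to the highest. The numbers and names are illustrative only.

```python
from itertools import accumulate

# Hypothetical class frequencies, lowest class first.
frequencies = [3, 5, 8, 4]
total = sum(frequencies)

# cumulative frequency: frequency of the class plus all previous classes
cumulative = list(accumulate(frequencies))

# cumulative relative frequency: cumulative frequency / total observations
cumulative_relative = [c / total for c in cumulative]

print(cumulative)
print(cumulative_relative)
```

The last cumulative frequency equals the total number of observations, so the last cumulative relative frequency is 1.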

19
Q

arithmetic mean

A

most popular and useful measure of central location.

  • all values are used
  • it is unique
  • the sum of the deviations from the mean is 0
  • calculated by summing the values and dividing by the number of values
20
Q

median of a set of measurements

A

the value that falls in the middle when the measurements are arranged in order of magnitude.

  • unique median for each data set
  • commonly used measure of central location
21
Q

mode of a set of observations

A

the value that occurs most frequently.

  • data set may have one, two or more modes (modal classes)
  • can be used for all data types, but mainly used for nominal data
  • for large data sets, modal class is more relevant than a single-value mode
22
Q

which measure of central location?

A
  • the mean is generally the first choice; if outliers are present in the data set, the median should be used instead.
  • the mode is seldom the best measure of central location.
  • the median is not as sensitive to extreme values as the mean is.
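
A small sketch of how one outlier affects the three measures of central location, using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

# Hypothetical values; the last one is an outlier.
values = [30, 32, 35, 35, 38, 40, 400]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle value of the ordered data
mode = statistics.mode(values)      # value that occurs most frequently

# The deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(v - mean for v in values)

print(mean, median, mode, deviation_sum)
```

Here the outlier pulls the mean well above most of the data, while the median stays in the middle of the typical values.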
23
Q

variance

A

a measure of dispersion that reflects the values of all the measurements; it is computed from the squared deviations of the measurements from their mean.

24
Q

standard deviation

A

the square root of the variance of the measurements.
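
A minimal sketch of variance and standard deviation with the standard `statistics` module, assuming made-up measurements. The cards do not say which divisor is meant, so both the population version (divide by N) and the sample version (divide by n - 1) are shown.

```python
import statistics

# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# population variance: average squared deviation from the mean (divide by N)
population_variance = statistics.pvariance(data)

# sample variance: same squared deviations, divided by n - 1
sample_variance = statistics.variance(data)

# standard deviation: square root of the variance
population_sd = statistics.pstdev(data)

print(population_variance, sample_variance, population_sd)
```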

25
Q

empirical rule (for bell-shaped distributions)

A
  • approximately 68% of all observations fall within 1 standard deviation of the mean
  • approximately 95% of all observations fall within 2 standard deviations of the mean
  • approximately 99.7% of all observations fall within 3 standard deviations of the mean
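
A quick check of the three percentages above, assuming the data follow a normal distribution (the bell-shaped case the rule describes). `NormalDist` comes from Python's standard library; the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed proportions are close to 0.68, 0.95 and 0.997; for data that are far from bell-shaped the rule need not hold.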
26
Q

probability of an event

A

the probability P(A) of event A is the sum of the probabilities assigned to the simple events contained in A.

27
Q

intersection of event A and B

A

the event that occurs when both A and B occur.

28
Q

joint probability of A and B

A

the probability of the intersection of A and B.

29
Q

conditional probability

A

conditional probability is used to determine how two events are related; it gives the probability of one event given that another, related event has occurred: P(A | B) = P(A and B) / P(B).
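
A minimal sketch of the standard definition P(A | B) = P(A and B) / P(B), assuming made-up probabilities for two events A and B. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical probabilities, made up for illustration.
p_a_and_b = Fraction(3, 20)  # joint probability P(A and B)
p_b = Fraction(2, 5)         # probability of the conditioning event B

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)
```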

30
Q

discrete random variable

A

one that takes on a countable number of values (for example, integers).

31
Q

continuous random variable

A

one whose values are uncountable (for example, all real numbers in an interval).

32
Q

discrete probability distribution

A

a table, formula or graph that lists all possible values a discrete random variable can assume, together with their associated probabilities.

33
Q

expected value

A

the weighted average of the possible values a random variable can assume, where the weights are the corresponding probabilities of each xi.

34
Q

population variance

A

the weighted average of the squared deviations of the values of x from their mean, where the weights are the corresponding probabilities of each xi.
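
A small sketch of the expected value and population variance of a discrete random variable, assuming a made-up distribution (the number of heads in two fair coin tosses). The dictionary layout is an illustrative choice.

```python
# Hypothetical distribution: value xi -> probability p(xi).
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# expected value: weighted average of the values, weights = probabilities
expected_value = sum(x * p for x, p in distribution.items())

# population variance: weighted average of the squared deviations from the mean
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

print(expected_value, variance)
```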

35
Q

Statistical inference

A

The process of drawing conclusions about the properties of a population based on information obtained from a sample.

36
Q

Sampling distribution

A

The probability distribution of a statistic over all possible samples of a given size; it tells us how close the statistic is likely to be to the parameter.

37
Q

Standard error

A

The standard deviation of the sampling distribution of the sample mean.

38
Q

Central limit theorem

A

Random sample from a normal population: the sampling distribution of the sample mean is normally distributed.

Random sample from any population: the sampling distribution of the sample mean is approximately normal for a large sample size (rule of thumb: n >= 30).
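
A rough simulation sketch of the second statement, assuming a skewed (exponential) population with mean 1 and standard deviation 1. The sample size, number of repetitions and seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # sample size
num_samples = 2000  # number of repeated samples

# Draw repeated samples from the skewed population and record each sample mean.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

# Standard error of the sample mean: sigma / sqrt(n), with sigma = 1 here.
standard_error = 1.0 / math.sqrt(n)
observed_sd = statistics.stdev(sample_means)

print(statistics.mean(sample_means), observed_sd, standard_error)
```

The sample means centre on the population mean, and their spread is close to the standard error, even though the population itself is not normal.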

39
Q

What makes the sampling distribution of the sample mean resemble a normal distribution more closely?

A

A larger sample size (n)

40
Q

What does capital N mean?

A

Population size

41
Q

When the population size is large relative to the sample size, the finite population correction factor is …

A

Close to 1 and can be ignored
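
A minimal sketch, assuming the correction factor meant here is the usual finite population correction sqrt((N - n) / (N - 1)) that multiplies the standard error of the sample mean. The function name and the population sizes are illustrative assumptions.

```python
import math

def finite_population_correction(population_size, sample_size):
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Hypothetical cases with the same sample size n = 50.
small_population = finite_population_correction(200, 50)
large_population = finite_population_correction(100_000, 50)

print(round(small_population, 3), round(large_population, 5))
```

With the small population the factor is noticeably below 1; with the large one it is so close to 1 that dropping it changes almost nothing.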

42
Q

How large does a population have to be, to be considered “large”?

A

At least 20 times larger than the sample size

43
Q

Method for making statistical inferences:

A
  • identify the parameter to be estimated
  • specify the parameter's estimator and its sampling distribution
  • construct an interval estimator
44
Q

Types of estimation (2)

A
  • point estimator
  • interval estimator

45
Q

Point estimator

A

Estimates the value of an unknown parameter using a single value calculated from the sample data.

46
Q

Interval estimator

A

Draws inferences about a population by estimating the value of an unknown parameter with an interval (a range of values).

47
Q

Estimator characteristics (3)

A
  • unbiasedness
  • consistency
  • relative efficiency
48
Q

Unbiasedness

A

An unbiased estimator is one whose expected value is equal to the parameter it estimates.

49
Q

Consistency

A

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size increases.

50
Q

Relative efficiency

A

If there are two unbiased estimators available, the one with a smaller variance is said to be relatively efficient.

51
Q

Examples of unbiased estimators

A
  • sample mean
  • sample median (as an estimator of the population mean, when the population is symmetric)
  • sample variance (with divisor n - 1)
  • sample proportion
52
Q

Examples of consistent estimators

A
  • sample mean
  • sample median

53
Q

Examples of efficient estimators

A

For a normal population, both the sample mean and the sample median are unbiased estimators of the population mean. However, the sample median has a greater variance than the sample mean, so the sample mean is relatively efficient when compared to the sample median.

54
Q

Which is the “best” estimator?

A

The sample mean, as it is unbiased, consistent and relatively efficient.

55
Q

The expected value E(X̄) of the sampling distribution of the sample mean equals the population mean…

A

…for all populations.

56
Q

As the level of confidence increases…

A

…the width of the confidence interval also increases.

57
Q

If the standard deviation is doubled…

A

…the interval width 2B is doubled, and vice versa.

58
Q

When the sample size n increases…

A

…the width of the confidence interval decreases.

59
Q

The width of the confidence interval (2B) is affected by:

A
  • level of confidence
  • population standard deviation
  • sample size
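
A minimal sketch of how the three factors above change the width 2B, assuming the z-interval for a mean with known population standard deviation, where B = z * sigma / sqrt(n). The function name and the input numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def interval_width(confidence, sigma, n):
    """Width 2B of the z-interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = interval_width(0.95, sigma=10, n=100)
higher_confidence = interval_width(0.99, sigma=10, n=100)  # wider interval
doubled_sigma = interval_width(0.95, sigma=20, n=100)      # twice as wide
larger_sample = interval_width(0.95, sigma=10, n=400)      # half as wide

print(base, higher_confidence, doubled_sigma, larger_sample)
```

Raising the confidence level or the standard deviation widens the interval, while a larger sample narrows it.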
60
Q

Wide confidence intervals provide:

A

Little information

61
Q

t-distribution

A

Mound-shaped and symmetrical around zero.

62
Q

Degrees of freedom (n-1)

A

A function of the sample size that determines how spread out the t-distribution is compared to the normal distribution.

63
Q

Purpose of hypothesis testing

A

To determine whether there is enough statistical evidence in favour of a certain belief about a population parameter.

64
Q

Rejection region

A

Consists of all values of the test statistic for which Ho is rejected.

65
Q

Acceptance region

A

Consists of all values of the test statistic for which Ho is not rejected.

66
Q

Critical value

A

Value that separates the acceptance and rejection regions.

67
Q

Decision rule

A

Defines the range of values of the test statistic for which Ho is rejected in favour of HA.

68
Q

A 90% confidence interval estimate of the population mean can be interpreted to mean…

A

If we repeatedly draw samples of the same size from the same population, 90% of the values of the sample mean will result in a confidence interval that includes the population mean.

69
Q

P-value

A

The minimum level of significance that is required to reject the null hypothesis.
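
A minimal sketch of a p-value, assuming a one-tail z-test of Ho: mu = mu0 against HA: mu > mu0 with known population standard deviation. The function name and all input numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def one_tail_p_value(sample_mean, mu0, sigma, n):
    """P-value for Ho: mu = mu0 vs HA: mu > mu0 when sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 1 - NormalDist().cdf(z)

# Hypothetical sample results.
p_value = one_tail_p_value(sample_mean=52.0, mu0=50.0, sigma=10.0, n=64)

# Ho is rejected at a given significance level only if the p-value is below it.
reject_at_10 = p_value < 0.10
reject_at_05 = p_value < 0.05

print(round(p_value, 4), reject_at_10, reject_at_05)
```

In this example the p-value falls between 0.05 and 0.10, so Ho is rejected at the 0.10 level but not at the 0.05 level.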

70
Q

If a hypothesis is not rejected at the 0.10 level of significance it will…

A

…not be rejected at the 0.05 level either.

71
Q

P-value method:

A
  • Good measure of amount of statistical evidence supporting HA
  • Usually carried out with statistical computer software
  • Yields same conclusions as rejection region method
72
Q

The statement that the expected value of the difference of two sample means equals the difference of the corresponding population means is…

A

…always correct.

73
Q

Measures that describe the linear relationship between two variables:

A
  • covariance
  • correlation coefficient
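
A short sketch of both measures, assuming made-up paired observations and the sample versions of the formulas (divisor n - 1). All names are illustrative.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# sample covariance: sum of the cross-deviations divided by n - 1
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# sample standard deviations of x and y
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# correlation coefficient: covariance scaled to lie between -1 and +1
correlation = covariance / (sd_x * sd_y)

print(covariance, round(correlation, 4))
```

The covariance depends on the units of x and y, while the correlation coefficient does not, which makes the strength of the relationship easier to judge.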

74
Q

If the problem objective is to analyse the relationship…

A

Use correlation and regression analysis

75
Q

Regression analysis

A

Used to predict the value of one variable on the basis of other variables.

76
Q

Deterministic model

A

An equation or set of equations that allows us to fully determine the value of the dependent variable from the values of the independent variables.

77
Q

Probabilistic model

A

A model used to capture the randomness that is part of a real-life process.

78
Q

To create a probabilistic model:

A

Start with a deterministic model that approximates the relationship we want to model and add a random term that measures the error of the deterministic model.

79
Q

Random term (error variable)

A

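The difference between the actual value of the dependent variable and the value estimated by the deterministic model; for example, the difference between a house's actual selling price and the price estimated from its size.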

80
Q

Estimated least squares regression line

A

The least squares method produces a straight line that minimises the sum of the squared differences between the points and the line.
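
A minimal sketch of the least squares line for one independent variable, assuming made-up (x, y) pairs and the standard closed-form slope and intercept. The helper `sum_squared_errors` is a hypothetical name used only for this illustration.

```python
# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# slope = sum of cross-deviations / sum of squared deviations of x
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
# the fitted line passes through the point of means
intercept = mean_y - slope * mean_x

def sum_squared_errors(b0, b1):
    """Sum of squared vertical differences between the points and the line b0 + b1 * x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

best = sum_squared_errors(intercept, slope)
shifted = sum_squared_errors(intercept, slope + 0.1)  # any other line does worse

print(slope, intercept, best, shifted)
```

Changing the slope or the intercept away from the least squares values raises the sum of squared differences.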

81
Q

The smaller the sum of the squared differences…

A

… the better the fit.