Statistics As Flashcards

1
Q

Population

A

The whole set of items that are of interest

2
Q

Census

A

Observes/Measures every member of a population

3
Q

Sample

A

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole

4
Q

Ad, Disad: Census

A

Ad: It should give a completely accurate result
Dis: Time consuming and expensive
Cannot be used when the testing process destroys the item
Hard to process large quantity of data

5
Q

Ad, Disad: Sample

A

Ad: less time consuming and expensive than census
Fewer people have to respond
Less data to process than census
Dis: The data may not be as accurate
The sample may not be large enough to give information about small subgroups of the population

6
Q

Sampling units

A

Individual units of a population

7
Q

Sampling frame

A

Sampling units of a population that are named/numbered to form a list

8
Q

What are the 3 methods of random sampling?

A

1) Simple random
2) Systematic
3) Stratified

9
Q

What does random sampling do?

A

It gives every member of the population an equal chance of being selected, so the sample should be representative of the population. It also helps to remove bias from the sample.

10
Q

How do you carry out simple random sampling?

A

Need a sampling frame. Each unit is allocated a unique number and a selection of these numbers is chosen at random

11
Q

What are the ways of picking a random unit in simple random sampling?

A

1) Generating random numbers (using calculator, computer etc.)
2) Lottery sampling - The members of the sampling frame could be written on tickets and placed into a ‘hat’. The required number of tickets is then drawn out
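
The random-number method can be sketched in Python. This is only an illustration: the sampling frame of 100 numbered units and the helper name `simple_random_sample` are assumptions, not part of the card.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct units from the frame, each equally likely."""
    rng = random.Random(seed)  # the seed only makes a run repeatable
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: units numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```

`random.sample` draws without replacement, which matches drawing tickets from a hat without putting them back.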

12
Q

Systematic sampling

A

The required elements are chosen at regular intervals from an ordered list. Eg.
Sample size: 20
Population: 100
100/20 = 5. Every 5th person is picked.
The first person to be chosen is picked at random
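
A minimal sketch of the example above, assuming the ordered list is simply the numbers 1 to 100 and that the population size is an exact multiple of the sample size. The function name is hypothetical.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th element, starting from a random point in the first k."""
    k = len(ordered_list) // sample_size   # sampling interval, e.g. 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randrange(k)               # random start among the first k elements
    return ordered_list[start::k][:sample_size]

population = list(range(1, 101))
sample = systematic_sample(population, 20, seed=0)
print(sample)
```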

13
Q

Stratified sampling

A

The population is divided into mutually exclusive strata and a random sample is taken from each
The proportion of each stratum in the sample should be the same as its proportion in the population.
No. sampled in stratum = (No. in stratum / No. in population) * Overall sample size
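
The formula can be tried out with a short sketch. The strata ("Year 12" with 120 students, "Year 13" with 80) and the function names are made up for illustration. Rounding is an assumption too: when the formula does not give whole numbers, the rounded sizes may not add up exactly to the overall sample size.

```python
import random

def stratum_sample_sizes(strata_sizes, overall_sample_size):
    """Apply (no. in stratum / no. in population) * overall sample size."""
    population = sum(strata_sizes.values())
    return {name: round(size * overall_sample_size / population)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, overall_sample_size, seed=None):
    """Take a simple random sample of the right size from each stratum."""
    rng = random.Random(seed)
    sizes = stratum_sample_sizes({name: len(units) for name, units in strata.items()},
                                 overall_sample_size)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of 200 students in two strata
strata = {
    "Year 12": [f"Y12-{i}" for i in range(120)],
    "Year 13": [f"Y13-{i}" for i in range(80)],
}
sizes = stratum_sample_sizes({name: len(units) for name, units in strata.items()}, 50)
sample = stratified_sample(strata, 50, seed=2)
print(sizes)
```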

14
Q

Ad, Disad: Simple random sampling

A

Ad: Free of bias
Easy and cheap to implement for small populations and small samples
Each sampling unit has a known and equal chance of selection
Dis: Not suitable when the pop. size or sample size is large
A sampling frame is needed

15
Q

Ad, Disad: Systematic sampling

A

Ad: Simple and quick to use
Suitable for large samples and populations
Dis: A sampling frame is needed
It can introduce bias if the sampling frame is not random

16
Q

Ad, Disad: Stratified sampling

A

Ad: Sample accurately reflects the population structure
Guarantees proportional representation of groups within a population
Dis: Population must be clearly classified into distinct strata
Selection within each stratum suffers from the same disad. as simple random sampling

17
Q

Two types of non-random sampling:

A

1) Quota sampling

2) Opportunity sampling

18
Q

Quota sampling

A

A researcher or interviewer selects a sample that reflects the characteristics of the whole population
The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic
As an interviewer, you would meet people, assess their group and then allocate them to the appropriate quota
This continues until all quotas have been filled. If a person refuses to be interviewed or their quota is already full, you ignore them
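
The filling of quotas can be sketched as a simple loop. Everything named here (the people, the "adult"/"child" groups, the quota sizes and the function `quota_sample`) is hypothetical and only meant to show the procedure.

```python
def quota_sample(people, quotas, group_of):
    """Fill each quota in the order people are met.

    people:   iterable of people in the order the interviewer meets them
    quotas:   dict mapping group name -> number of people required
    group_of: function that assesses which group a person belongs to
    """
    remaining = dict(quotas)
    sample = []
    for person in people:
        group = group_of(person)
        if remaining.get(group, 0) > 0:   # quota not yet full: include them
            sample.append(person)
            remaining[group] -= 1
        # otherwise the person is ignored, as on the card
        if not any(remaining.values()):   # stop once every quota is filled
            break
    return sample

# Hypothetical stream of (name, group) pairs
people = [("Ann", "adult"), ("Bob", "child"), ("Cy", "adult"),
          ("Di", "adult"), ("Ed", "child"), ("Flo", "child")]
sample = quota_sample(people, {"adult": 2, "child": 1}, group_of=lambda p: p[1])
print(sample)
```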

19
Q

Opportunity sampling

A

Taking the sample from people who are available at the time of the study and who fit the criteria you are looking for

20
Q

Ad, Disad: Quota sampling

A

Ad: Allows a small sample to still be representative of the population
No sampling frame required
Quick, easy and inexpensive
Allows for easy comparison between different groups within a population
Dis: Non-random samples can introduce bias
Population must be divided into groups, which can be costly or inaccurate
Increasing the scope of the study increases the number of groups, which adds time and expense
Non-responses are not recorded as such

21
Q

Ad, Disad: Opportunity sampling

A

Ad: Easy to carry out
Inexpensive
Dis: Unlikely to provide a representative sample
Highly dependent on the individual researcher

22
Q

Quantitative data

A

Numerical data

23
Q

Qualitative data

A

Non-numerical

24
Q

Continuous

A

Can take any value in a given range

25
Q

Discrete

A

Can take only specific values in a given range

26
Q

Explain briefly what you understand by

(i) a statistical experiment [1]

A

A test/investigation adopted for collecting data to provide evidence for or against a hypothesis

27
Q

Explain briefly what you understand by

(ii) an event. [1]

A

A subset of the possible outcomes of an experiment

28
Q

State one advantage and one disadvantage of a statistical model. [2]

A

Ad: Quick and cheap to use; parameters can be varied easily to make predictions
Dis: Does not replicate real-world situation in every detail

29
Q

Define Hypothesis Test

A

A statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population

30
Q

What are the advantages and disadvantages of using the median over the mean?

A

Ad: The median is not affected by extreme values, so it is used when the data contains them
Dis: The median does not use every piece of data. The mean uses all the pieces of data, so it reflects the whole dataset, but it is affected by extreme values

31
Q

Describe how to find the lower quartile for discrete data. n = number of data points

A

Divide n by 4. If this is a whole number, the lower quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point

32
Q

Describe how to find the upper quartile for discrete data
n = number of data points

A

Find 3/4 of n. If this is a whole number, the upper quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point
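
This rule, and the matching lower-quartile rule that uses n/4, can be sketched as one function. The function name and the test data are made up, and the data is assumed to be sorted already.

```python
def quartile(sorted_data, k):
    """Return Q1 (k=1) or Q3 (k=3) of already-sorted discrete data."""
    n = len(sorted_data)
    whole, remainder = divmod(k * n, 4)   # k*n/4 split into whole part and remainder
    if remainder == 0:
        # Whole number: halfway between this data point and the one above
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Not a whole number: round up, i.e. take data point number whole + 1,
    # which is index `whole` when counting from zero
    return sorted_data[whole]

eight_values = [1, 2, 3, 4, 5, 6, 7, 8]   # n/4 = 2 and 3n/4 = 6 are whole numbers
seven_values = [1, 2, 3, 4, 5, 6, 7]      # n/4 = 1.75 and 3n/4 = 5.25 are not
print(quartile(eight_values, 1), quartile(eight_values, 3))
print(quartile(seven_values, 1), quartile(seven_values, 3))
```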

33
Q

Finding quartiles in data:

What do you assume when you use interpolation?

A

That the data values are evenly distributed within each class

34
Q

How do you find the quartiles for grouped continuous data, or data presented in a cumulative frequency table?

A
Q1 = (n/4)th data point
Q2 = (n/2)th data point
Q3 = (3n/4)th data point
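
One way to locate these data points inside a class is linear interpolation, which assumes the values are evenly spread within each class. The sketch below uses a made-up frequency table and a hypothetical function name.

```python
def interpolated_quartile(classes, fraction):
    """Estimate a quartile from grouped data by linear interpolation.

    classes:  list of (lower_boundary, upper_boundary, frequency), in order
    fraction: 0.25 for Q1, 0.5 for Q2, 0.75 for Q3
    """
    n = sum(freq for _, _, freq in classes)
    target = fraction * n                      # e.g. the (n/4)th data point for Q1
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # How far into this class the target data point lies
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("the table contains no data")

# Made-up table: 5 values in 0-10, 10 values in 10-20, 5 values in 20-30
table = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
q1 = interpolated_quartile(table, 0.25)
q2 = interpolated_quartile(table, 0.5)
q3 = interpolated_quartile(table, 0.75)
print(q1, q2, q3)
```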
35
Q

What are alternative phrases for measures of spread?

A

Measures of dispersion

Measures of variation

36
Q

What is range?

A

The difference between the largest and smallest values in the dataset

37
Q

What is interquartile range?

A

The difference between the upper quartile and the lower quartile

38
Q

What is interpercentile range?

A

The difference between the values of two given percentiles

39
Q

What is coding?

A

Coding is a way of simplifying statistical calculations. Each data value is coded to make a new set of data values which are easier to work with

40
Q

If data is coded using the formula y = (x - a)/b, what is the mean of the coded data?

A

y̅ = (x̅ - a)/b

41
Q

If data is coded using the formula y = (x - a)/b, what is the standard deviation of the coded data?

A

σᵧ = σₓ/b
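
A quick numerical check of this result, and of the coded mean y̅ = (x̅ - a)/b, on made-up data. The values of x, a and b are assumptions chosen for illustration, with b positive.

```python
import statistics

x = [110, 120, 130, 140, 150]          # made-up data
a, b = 100, 10                         # coding y = (x - a) / b, with b > 0
y = [(value - a) / b for value in x]   # coded data: 1, 2, 3, 4, 5

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)   # population standard deviations

print(mean_y, (mean_x - a) / b)   # the two means should agree
print(sd_y, sd_x / b)             # the two standard deviations should agree
```

Subtracting a shifts every value by the same amount, so it changes the mean but not the spread; only the division by b rescales the standard deviation.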

42
Q

What is the general definition of an outlier, in terms of quartiles?

A

Greater than Q3 + k(Q3 - Q1)

Or Less than Q1 - k(Q3 - Q1)
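
A minimal sketch of this test. The default k = 1.5 is a common choice but is an assumption here (use whatever k the question gives), and the data and quartiles are made up.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return the values beyond Q1 - k(Q3 - Q1) or Q3 + k(Q3 - Q1).

    k = 1.5 is an assumed default, not something stated on the card.
    """
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [value for value in data if value < lower_limit or value > upper_limit]

data = [2, 10, 11, 12, 13, 14, 30]     # made-up data with Q1 = 10 and Q3 = 14
outliers = find_outliers(data, q1=10, q3=14)
print(outliers)   # limits are 10 - 6 = 4 and 14 + 6 = 20
```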

43
Q

What are outliers that should be removed from the data called?

A

Anomalies

44
Q

What does cleaning the data mean?

A

The process of removing anomalies from a dataset

45
Q

What can cause anomalies?

A

They can be the result of experimental or recording error, or could be data values which are not relevant to the investigation

46
Q

How do you represent an outlier on a box plot?

A

As a cross

47
Q

How do you plot points on a cumulative frequency diagram?

A
Plot each cumulative frequency against the upper class boundary of its class
For a cumulative frequency of zero, plot a point at the lower class boundary of the first class
48
Q

Do you connect points with straight lines, or draw a smooth curve on a cumulative frequency diagram?

A

Smooth curve

49
Q

What type of data can be represented in a histogram?

A

Grouped continuous data

50
Q

Why is a histogram a good picture of how some data is distributed?

A

It enables you to see a rough location, the general shape, and how spread out the data is

51
Q

How do you draw a frequency polygon, given a histogram?

A

Join the middle of the top of each bar in the histogram

52
Q

What two things can you comment on, when comparing data sets?

A

A measure of location
A measure of spread
Remember you can only compare using the mean and standard deviation or the median and interquartile range. Eg, not the mean and interquartile range

53
Q

What is bivariate data?

A

Data which has pairs of values for two variables

54
Q

What is the explanatory variable?

A

The control variable, the independent variable

Usually plotted on the horizontal axis

55
Q

What is the response variable?

A

The variable that is measured, the dependent variable. It is usually plotted on the vertical axis

56
Q

What does correlation describe?

A

The nature of the linear relationship between two variables

57
Q

When do two variables have a causal relationship?

A

If a change in one variable causes a change in the other.

58
Q

Given a bivariate data set, you can use the regression line to make a prediction or estimate about the corresponding value of which variable?

A

The dependent variable. If you know a value of the independent variable you can predict the corresponding value of the dependent variable, but only if the value you use lies within the range of the given data

59
Q

What is extrapolation?

A

Making a prediction based on a value outside the range of the given data
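
The difference between predicting inside and outside the data range can be sketched as below. The least-squares fit, the function names and the data are all assumptions made for illustration; the card does not say how the regression line is obtained.

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b*x (assumed method of fitting)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

def predict(xs, ys, x_new):
    """Return the predicted y and whether the prediction is an extrapolation."""
    a, b = regression_line(xs, ys)
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    return a + b * x_new, is_extrapolation

xs = [1, 2, 3, 4, 5]        # made-up independent variable
ys = [3, 5, 7, 9, 11]       # made-up dependent variable (exactly y = 1 + 2x)
print(predict(xs, ys, 3.5))   # inside the data range
print(predict(xs, ys, 10))    # outside the data range, so less reliable
```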

60
Q

What is an experiment?

A

A repeatable process that gives rise to a number of outcomes

61
Q

What is an event?

A

A collection of one or more outcomes

62
Q

What is the sample space?

A

The set of all possible outcomes

63
Q

What is the Venn diagram used for?

A

To represent events graphically. Frequencies or probabilities can be placed in the regions of the diagram

64
Q

What is the equation for mutually exclusive events?

A

P(A or B) = P(A) + P(B)
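
A small check of the addition rule, assuming one roll of a fair six-sided die as the sample space. The events A and B are made up.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (illustrative)

def probability(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}
B = {5, 6}
mutually_exclusive = A.isdisjoint(B)   # no outcome belongs to both events
p_a_or_b = probability(A | B)

print(mutually_exclusive, p_a_or_b, probability(A) + probability(B))
```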

65
Q

What do mutually exclusive events look like on a Venn diagram?

A

Two circles which do not overlap

66
Q

What are independent events?

A

Events that have no effect on each other

67
Q

What is the equation for independent events?

A

P(A and B) = P(A) * P(B)
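
A small check of the multiplication rule, again assuming one roll of a fair six-sided die. The events A ("even") and B ("at most 4") are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (illustrative)

def probability(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}        # the roll is even
B = {1, 2, 3, 4}     # the roll is at most 4
p_a_and_b = probability(A & B)
independent = p_a_and_b == probability(A) * probability(B)

print(p_a_and_b, probability(A) * probability(B), independent)
```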

68
Q

What does a tree diagram show?

A

Shows the outcomes of two or more events happening in succession

69
Q

What is a random variable?

A

A variable whose value depends on the outcome of a random event
The outcome is not known until the experiment is carried out

70
Q

What is a random variable's sample space?

A

The range of values that a random variable can take

71
Q

A variable can take ( ) values

A

any of a range of specific

72
Q

What is a discrete variable?

A

A variable that can only take certain numerical values

73
Q

Explain what the notation of capital letters and lowercase letters means for probability distributions

A

Random variables are written using uppercase letters (X)
The particular values the random variable can take are written using the equivalent lowercase letters (x)
The probability that the random variable X takes a particular value x is written as P(X = x)

74
Q

What does a probability distribution describe?

A

It fully describes the probability of any outcome in the sample space

75
Q

What does a discrete uniform distribution mean about the probabilities?

A

They’re all the same

76
Q

When can you model something with a binomial distribution?

A

There is a fixed number of trials
There are two possible outcomes, success and failure
There is a fixed probability of success
The trials are independent of each other

77
Q

When do you use binomial PD or binomial CD?

A

PD: P(X = x)
CD: P(X > x), P(X < x), P(X ≥ x), P(X ≤ x)
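
The two calculator functions can be mimicked in a short sketch. The function names `binomial_pd` and `binomial_cd` and the example X ~ B(10, 0.5) are assumptions. The cumulative function gives P(X ≤ x), so the other inequalities are rewritten in terms of it.

```python
from math import comb

def binomial_pd(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cd(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(binomial_pd(r, n, p) for r in range(x + 1))

n, p = 10, 0.5                                 # made-up example, X ~ B(10, 0.5)
p_equal_3 = binomial_pd(3, n, p)               # P(X = 3)
p_at_most_3 = binomial_cd(3, n, p)             # P(X <= 3)
p_less_than_3 = binomial_cd(2, n, p)           # P(X < 3)  = P(X <= 2)
p_more_than_3 = 1 - binomial_cd(3, n, p)       # P(X > 3)  = 1 - P(X <= 3)
p_at_least_3 = 1 - binomial_cd(2, n, p)        # P(X >= 3) = 1 - P(X <= 2)
print(p_equal_3, p_at_most_3)
```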

78
Q

What is a hypothesis?

A

A statement made about the value of a population parameter.

79
Q

How can you test a hypothesis about a population?

A

By carrying out an experiment or taking a sample from the population

80
Q

What is the test statistic?

A

The result of the experiment or the statistic that is calculated from the sample when testing the hypothesis

81
Q

What is the null hypothesis?

A

The hypothesis that you assume to be correct

82
Q

What is the alternative hypothesis?

A

Tells you about the parameter if your assumption is shown to be wrong

83
Q

What is a critical region?

A

A region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis

84
Q

What is the critical value?

A

The first value to fall inside of the critical region

85
Q

What is the actual significance level of a hypothesis test?

A

The probability of incorrectly rejecting the null hypothesis
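
For a binomial test statistic, the critical region, critical value and actual significance level can be found together. The sketch assumes a one-tailed test in the lower tail, a made-up null distribution X ~ B(10, 0.5), and the convention that the critical region is the largest one whose probability does not exceed the stated significance level.

```python
from math import comb

def binomial_cd(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def lower_tail_critical_region(n, p, significance=0.05):
    """Return (critical value c, actual significance level) for the region X <= c.

    Returns (None, 0.0) when even X = 0 is too likely to form a critical region.
    """
    critical_value = None
    for c in range(n + 1):
        if binomial_cd(c, n, p) <= significance:
            critical_value = c      # X <= c still fits within the significance level
        else:
            break
    if critical_value is None:
        return None, 0.0
    return critical_value, binomial_cd(critical_value, n, p)

critical_value, actual_level = lower_tail_critical_region(10, 0.5)
print(critical_value, actual_level)   # critical region is X <= critical_value
```

Because the distribution is discrete, the actual significance level is usually below the stated one: here P(X ≤ 2) is already above 5%, so the region stops at X ≤ 1.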

86
Q

What does p-value mean?

A

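The probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. You reject the null hypothesis if it is less than the significance level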

87
Q

What do you always need to find, for two tailed hypothesis tests?

A

The critical region

88
Q

What do you always need to find for a two tailed hypothesis test?

A

The critical region

89
Q

What do you need to define in a hypothesis test?

A

Any letter you use. eg, μ = population mean

It doesn’t matter which letters you use as long as you define them

90
Q

What significance level should you use if it is not stated in the question?

A

5%