Statistics As Flashcards

1
Q

Population

A

The whole set of items that are of interest

2
Q

Census

A

Observes/Measures every member of a population

3
Q

Sample

A

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole

4
Q

Ad, Disad: Census

A

Ad: It should give a completely accurate result
Dis: Time-consuming and expensive
Cannot be used when the testing process destroys the item
Hard to process a large quantity of data

5
Q

Ad, Disad: Sample

A

Ad: Less time-consuming and expensive than a census
Fewer people have to respond
Less data to process than a census
Dis: The data may not be as accurate
The sample may not be large enough to give information about small subgroups of the population

6
Q

Sampling units

A

Individual units of a population

7
Q

Sampling frame

A

Sampling units of a population that are named/numbered to form a list

8
Q

What are the 3 methods of random sampling?

A

1) Simple random
2) Systematic
3) Stratified

9
Q

What does random sampling do?

A

Every member of the population has an equal chance of being selected, so the sample should be representative of the population. It also helps to remove bias from a sample.

10
Q

How do you carry out simple random sampling?

A

A sampling frame is needed. Each unit is allocated a unique number, and a selection of these numbers is chosen at random.
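A minimal Python sketch of these steps, assuming the sampling frame is a plain list and using Python's `random` module in place of a calculator or random number table. The frame of 30 names and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct units from a sampling frame, each equally likely."""
    rng = random.Random(seed)
    # Allocate each unit a unique number 1..N (its position in the frame).
    numbered = {i + 1: unit for i, unit in enumerate(frame)}
    # Choose n of these numbers at random without replacement,
    # like drawing tickets out of a hat.
    chosen = rng.sample(sorted(numbered), n)
    return [numbered[k] for k in chosen]

frame = [f"student_{i}" for i in range(1, 31)]  # hypothetical frame of 30 names
print(simple_random_sample(frame, 5, seed=1))
```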

11
Q

What are the ways of picking a random unit in simple random sampling?

A

1) Generating random numbers (using calculator, computer etc.)
2) Lottery sampling - The members of the sampling frame could be written on tickets and placed into a ‘hat’. The required number of tickets is then drawn out

12
Q

Systematic sampling

A

The required elements are chosen at regular intervals from an ordered list. E.g.
Sample size: 20
Population: 100
100/20 = 5, so every 5th person is picked.
The first person is chosen at random from the first 5 on the list
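A sketch of the systematic method in Python, assuming the ordered list is a Python list and the population is at least as large as the sample. The function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit from an ordered frame, starting at a random point."""
    k = len(frame) // n           # sampling interval, e.g. 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start among the first k units (index 0..k-1)
    return frame[start::k][:n]    # every k-th unit from the start, n of them

population = list(range(1, 101))  # hypothetical ordered list of 100 people
print(systematic_sample(population, 20, seed=3))
```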

13
Q

Stratified sampling

A

The population is divided into mutually exclusive strata and a random sample is taken from each
The proportion of each stratum in the sample should be the same as in the population
No. sampled in stratum = (No. in stratum / No. in population) * Overall sample size
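The stratum formula as a small Python function. The strata sizes are hypothetical. Because each figure is rounded (Python's `round` rounds exact halves to the nearest even number), the rounded numbers may occasionally need adjusting so they add up to the overall sample size.

```python
def stratum_allocation(strata, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    population = sum(strata.values())
    return {name: round(size / population * sample_size)
            for name, size in strata.items()}

# Hypothetical school: 60 boys and 40 girls, overall sample of 20
print(stratum_allocation({"boys": 60, "girls": 40}, 20))
# → {'boys': 12, 'girls': 8}
```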

14
Q

Ad, Disad: Simple random sampling

A

Ad: Free of bias
Easy and cheap to implement for small populations and small samples
Each sampling unit has a known and equal chance of selection
Dis: Not suitable when the population size or sample size is large
A sampling frame is needed

15
Q

Ad, Disad: Systematic sampling

A

Ad: Simple and quick to use
Suitable for large samples and populations
Dis: A sampling frame is needed
It can introduce bias if the sampling frame is not random

16
Q

Ad, Disad: Stratified sampling

A

Ad: Sample accurately reflects the population structure
Guarantees proportional representation of groups within a population
Dis: Population must be clearly classified into distinct strata
Selection within each stratum suffers from the same disadvantages as simple random sampling

17
Q

Two types of non-random sampling:

A

1) Quota sampling

2) Opportunity sampling

18
Q

Quota sampling

A

A researcher selects a sample that reflects the characteristics of the whole population
The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic
As an interviewer, you would meet people, assess their group and then allocate them to the appropriate quota
This continues until all quotas have been filled. If a person refuses to be interviewed or their quota is already full, you ignore them
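A hypothetical sketch of how quotas get filled, assuming each person met is described by a name, a group and whether they agree to be interviewed. Refusals simply disappear from the record, which matches the disadvantage that non-responses are not recorded.

```python
def fill_quotas(people, quotas):
    """Accept people in the order they are met until every group's quota is full.

    people: iterable of (name, group, agrees) tuples (hypothetical format)
    quotas: dict mapping group -> number required
    """
    sample = {group: [] for group in quotas}
    for name, group, agrees in people:
        if all(len(sample[g]) == quotas[g] for g in quotas):
            break                    # every quota filled: stop interviewing
        if group not in quotas or not agrees:
            continue                 # refusals are ignored, not recorded
        if len(sample[group]) < quotas[group]:
            sample[group].append(name)
    return sample

met = [("A", "M", True), ("B", "F", False), ("C", "F", True),
       ("D", "M", True), ("E", "M", True), ("F", "F", True)]
print(fill_quotas(met, {"M": 2, "F": 1}))
# → {'M': ['A', 'D'], 'F': ['C']}
```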

19
Q

Opportunity sampling

A

Taking the sample from people who are available at the time of the study and who fit the criteria you are looking for

20
Q

Ad, Disad: Quota sampling

A

Ad: Allows a small sample to still be representative of the population
No sampling frame required
Quick, easy and inexpensive
Allows for easy comparison between different groups within a population
Dis: Non-random samples can introduce bias
Population must be divided into groups, which can be costly or inaccurate
Increasing the scope of the study increases the number of groups, which adds time and expense
Non-responses are not recorded as such

21
Q

Ad, Disad: Opportunity sampling

A

Ad: Easy to carry out
Inexpensive
Dis: Unlikely to provide a representative sample
Highly dependent on the individual researcher

22
Q

Quantitative data

A

Numerical data

23
Q

Qualitative data

A

Non-numerical

24
Q

Continuous

A

Take any value in a given range

25
Discrete
Only specific values
26
Explain briefly what you understand by (i) a statistical experiment. [1]
A test/investigation adopted for collecting data to provide evidence for or against a hypothesis
27
Explain briefly what you understand by (ii) an event. [1]
A subset of the possible outcomes of an experiment
28
State one advantage and one disadvantage of a statistical model. [2]
Ad: Quick and cheap; parameters can easily be varied, and it can be used to make predictions
Dis: Does not replicate the real-world situation in every detail
29
Define Hypothesis Test
A statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population
30
What are the advantages and disadvantages of using the median over the mean?
The median is not affected by extreme values, so it is used when there are extreme values. However, the mean uses all the pieces of data, so it gives a truer measure of the data, but it is affected by extreme values
31
Describe how to find the lower quartile for discrete data. n = number of data points
Divide n by 4. If this is a whole number, the lower quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point
32
Describe how to find the upper quartile for discrete data. n = number of data points
Find 3/4 of n. If this is a whole number, the upper quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point
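The n/4 and 3n/4 rules from the two cards above, written as one Python function. `k` selects the quartile (1, 2 or 3); k = 2 gives the median by the same rule. `Fraction` is used so the "is it a whole number?" check is exact. The function name is illustrative.

```python
import math
from fractions import Fraction

def quartile(data, k):
    """k-th quartile (k = 1, 2 or 3) of discrete data using the kn/4 rule."""
    values = sorted(data)
    n = len(values)
    pos = Fraction(k * n, 4)           # n/4, n/2 or 3n/4
    if pos.denominator == 1:           # whole number: halfway to the next data point
        i = int(pos)
        return (values[i - 1] + values[i]) / 2   # data points are counted from 1
    return values[math.ceil(pos) - 1]  # not whole: round up and take that data point

print(quartile(range(1, 9), 1), quartile(range(1, 9), 3))   # n = 8  → 2.5 6.5
print(quartile(range(1, 8), 1), quartile(range(1, 8), 3))   # n = 7  → 2 6
```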
33
Finding quartiles in data: what do you assume when you use interpolation?
That the data values are evenly distributed within each class
34
How do you find the quartiles for grouped continuous data, or data presented in a cumulative frequency table?
```
Q1 = (n/4)th data point
Q2 = (n/2)th data point
Q3 = (3n/4)th data point
```
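A sketch of linear interpolation for grouped continuous data, relying on the assumption that values are evenly spread within each class. The positions n/4, n/2 and 3n/4 are used without rounding. The class boundaries and frequencies are a hypothetical table with n = 20.

```python
def interpolate(boundaries, freqs, position):
    """Estimate the value at a cumulative-frequency position by linear interpolation.

    boundaries: class boundaries, e.g. [0, 10, 20, 30] for classes 0-10, 10-20, 20-30
    freqs: frequency of each class
    """
    cumulative = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cumulative + f >= position:
            # fraction of the way through this class, scaled by the class width
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position is beyond the total frequency")

bounds, freqs = [0, 10, 20, 30], [5, 10, 5]   # hypothetical table, n = 20
n = sum(freqs)
print(interpolate(bounds, freqs, n / 4),
      interpolate(bounds, freqs, n / 2),
      interpolate(bounds, freqs, 3 * n / 4))
# → 10.0 15.0 20.0
```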
35
What are alternative phrases for measures of spread?
Measures of dispersion, measures of variation
36
What is range?
The difference between the largest and smallest values in the dataset
37
What is interquartile range?
The difference between the upper quartile and the lower quartile
38
What is interpercentile range?
The difference between the values of two given percentiles
39
What is coding?
Coding is a way of simplifying statistical calculations. Each data value is coded to make a new set of data values which are easier to work with
40
If data is coded using the formula y = (x - a)/b, what is the mean of the coded data?
y̅ = (x̅ - a)/b
41
If data is coded using the formula y = (x - a)/b, what is the standard deviation of the coded data?
σᵧ = σₓ/b
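A quick numerical check of both coding results on hypothetical data. It assumes b > 0 (for a negative b the standard deviation would be divided by |b|) and uses `statistics.pstdev`, the population standard deviation, to match σ.

```python
import math
import statistics

x = [10, 20, 30, 40]               # hypothetical raw data
a, b = 10, 5                       # coding y = (x - a) / b, assuming b > 0
y = [(xi - a) / b for xi in x]     # coded data: [0.0, 2.0, 4.0, 6.0]

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)

print(mean_y, (mean_x - a) / b)        # → 3.0 3.0
print(math.isclose(sd_y, sd_x / b))    # → True (equal up to float rounding)
```

Subtracting a shifts every value equally, so it changes the mean but not the spread; dividing by b scales both.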
42
What is the general definition of an outlier? In terms of quartiles
Greater than Q3 + k(Q3 - Q1), or less than Q1 - k(Q3 - Q1)
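The outlier rule as a Python function. The card leaves k general; k = 1.5 is used here only as an assumed default because it is a common choice, and a question will normally state the value to use. The data and quartiles are hypothetical.

```python
def outliers(data, q1, q3, k=1.5):
    """Values more than k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]

# Hypothetical data with Q1 = 12 and Q3 = 20: fences at 0 and 32
print(outliers([3, 12, 15, 18, 20, 35], q1=12, q3=20))   # → [35]
```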
43
What are outliers that should be removed from data called?
Anomalies
44
What does cleaning the data mean?
The process of removing anomalies from a dataset
45
What can cause anomalies?
They can be the result of experimental or recording error, or could be data values which are not relevant to the investigation
46
How do you represent an outlier on a box plot?
As a cross
47
How do you plot points on a cumulative frequency diagram?
```
Plot points using the upper class boundary for x
For a cumulative frequency of zero, plot the point at the lowest lower class boundary
```
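A sketch that turns a grouped frequency table into the points to plot, following the two rules above. The table is hypothetical.

```python
from itertools import accumulate

def cumulative_points(boundaries, freqs):
    """(x, y) points for a cumulative frequency diagram."""
    points = [(boundaries[0], 0)]         # zero at the lowest lower class boundary
    upper = boundaries[1:]                # upper class boundaries
    points += list(zip(upper, accumulate(freqs)))
    return points

print(cumulative_points([0, 10, 20, 30], [5, 10, 5]))
# → [(0, 0), (10, 5), (20, 15), (30, 20)]
```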
48
Do you connect points with straight lines, or draw a smooth curve on a cumulative frequency diagram?
Smooth curve
49
What type of data can be represented in a histogram?
Grouped continuous data
50
Why is a histogram a good picture of how some data is distributed?
It enables you to see a rough location, the general shape, and how spread out the data is
51
How do you draw a frequency polygon, given a histogram?
Join the midpoints of the tops of the bars in the histogram
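In a histogram the height of each bar is the frequency density (frequency ÷ class width), so the points of the frequency polygon are (class midpoint, frequency density). A sketch with a hypothetical table that has unequal class widths:

```python
def polygon_points(boundaries, freqs):
    """Midpoints of the tops of histogram bars, as (midpoint, frequency density)."""
    points = []
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        width = upper - lower
        points.append(((lower + upper) / 2, f / width))
    return points

print(polygon_points([0, 10, 20, 40], [5, 10, 10]))
# → [(5.0, 0.5), (15.0, 1.0), (30.0, 0.5)]
```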
52
What two things can you comment on, when comparing data sets?
A measure of location
A measure of spread
Remember you can only compare using the mean and standard deviation, or the median and interquartile range; e.g. not the mean and interquartile range
53
What is bivariate data?
Data which has pairs of values for two variables
54
What is the explanatory variable?
The control variable, or independent variable. It is usually plotted on the horizontal axis
55
What is the response variable?
The variable that is measured, the dependent variable. It is usually plotted on the vertical axis
56
What does correlation describe?
The nature of the linear relationship between two variables
57
When do two variables have a causal relationship?
If a change in one variable causes a change in the other.
58
Given a bivariate data set, you can use the regression line to make a prediction or estimate of the value of which variable?
The dependent (response) variable, given a value of the independent (explanatory) variable, and only when that value is within the range of the given data
59
What is extrapolation?
Making a prediction based on a value outside the range of the given data
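A sketch of the least-squares regression line y = a + bx, with a guard that refuses predictions outside the observed x range (extrapolation). The helper names `fit_line` and `predict` are illustrative and the data is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares regression line y = a + b x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(xs, ys, x):
    """Predict y from x, refusing to extrapolate beyond the observed x values."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the data range: this would be extrapolation")
    a, b = fit_line(xs, ys)
    return a + b * x

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]   # hypothetical data lying on y = 1 + 2x
print(predict(xs, ys, 2.5))           # → 6.0
```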
60
What is an experiment?
A repeatable process that gives rise to a number of outcomes
61
What is an event?
A collection of one or more outcomes
62
What is the sample space?
The set of all possible outcomes
63
What is the Venn diagram used for?
To represent events graphically. Frequencies or probabilities can be placed in the regions of the diagram
64
What is the equation for mutually exclusive events?
P(A or B) = P(A) + P(B)
65
What do mutually exclusive events look like on a Venn diagram?
Two circles which do not overlap
66
What are independent events?
Events that have no effect on each other
67
What is the equation for independent events?
P(A and B) = P(A) * P(B)
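Both equations can be checked by counting outcomes on one roll of a fair die, assuming all six outcomes are equally likely. The events chosen are hypothetical examples; `|` is union ("or") and `&` is intersection ("and") on Python sets.

```python
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die

def P(event):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))

even, low, odd_small = {2, 4, 6}, {1, 2}, {1, 3}

# Mutually exclusive: even and {1, 3} share no outcomes
print(P(even | odd_small) == P(even) + P(odd_small))   # → True
# Independent: P(even and low) equals P(even) * P(low)
print(P(even & low) == P(even) * P(low))               # → True
```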
68
What does a tree diagram show?
Shows the outcomes of two or more events happening in succession
69
What is a random variable?
A variable whose value depends on the outcome of a random event, and is not known until the experiment is carried out
70
What is a variable's sample space?
The range of values that a random variable can take
71
A variable can take ( ) values
any of a range of specific
72
What is a discrete variable?
A variable that can only take certain numerical values
73
Explain what the notation of capital letters and lowercase letters means for probability distributions.
Random variables are written using uppercase letters (X)
The particular values the random variable can take are written using equivalent lowercase letters (x)
The probability that the random variable X takes a particular value x is written as P(X = x)
74
What does a probability distribution describe?
It fully describes the probability of any outcome in the sample space
75
What does a discrete uniform distribution mean about the probabilities?
They’re all the same
76
When can you model something with a binomial distribution?
There are a fixed number of trials
There are two possible outcomes, success and failure
There is a fixed probability of success
The trials are independent of each other
77
When do you use the binomial PD or the binomial CD?
PD: P(X = x)
CD: P(X ≤ x) directly; P(X < x), P(X > x) and P(X ≥ x) by rewriting them in terms of P(X ≤ ...), e.g. P(X ≥ x) = 1 - P(X ≤ x - 1)
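A sketch of the binomial probability and cumulative distribution functions using `math.comb` (Python 3.8+). Only P(X ≤ x) comes from the cumulative function directly; the other inequalities are rewritten in terms of it. X ~ B(10, 0.5) is a hypothetical example.

```python
import math

def binom_pd(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cd(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(binom_pd(n, p, r) for r in range(x + 1))

n, p = 10, 0.5                     # hypothetical X ~ B(10, 0.5)
p_eq_5 = binom_pd(n, p, 5)         # P(X = 5): use the PD
p_lt_5 = binom_cd(n, p, 4)         # P(X < 5) = P(X <= 4)
p_ge_8 = 1 - binom_cd(n, p, 7)     # P(X >= 8) = P(X > 7) = 1 - P(X <= 7)
print(p_eq_5, p_lt_5, p_ge_8)
```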
78
What is a hypothesis?
A statement made about the value of a population parameter.
79
How can you test a hypothesis about a population?
By carrying out an experiment or taking a sample from the population
80
What is the test statistic?
The result of the experiment or the statistic that is calculated from the sample when testing the hypothesis
81
What is the null hypothesis?
The hypothesis that you assume to be correct
82
What is the alternative hypothesis?
Tells you about the parameter if your assumption is shown to be wrong
83
What is a critical region?
A region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
84
What is the critical value?
The first value to fall inside of the critical region
85
What is the actual significance level of a hypothesis test?
The probability of incorrectly rejecting the null hypothesis
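A sketch of finding one-tailed critical regions for a binomial test by stepping through cumulative probabilities. It assumes X ~ B(n, p) under the null hypothesis; the example uses a hypothetical B(10, 0.5) at the 5% level. Each function returns the critical value and the actual significance level, i.e. the probability of landing in the critical region when H0 is true. The function names are illustrative.

```python
import math

def binom_cd(n, p, x):
    """P(X <= x) for X ~ B(n, p); the empty sum gives 0 when x < 0."""
    return sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def lower_critical_region(n, p, alpha):
    """Largest c with P(X <= c) <= alpha, so the critical region is X <= c."""
    c = -1
    while c < n and binom_cd(n, p, c + 1) <= alpha:
        c += 1
    return None if c < 0 else (c, binom_cd(n, p, c))

def upper_critical_region(n, p, alpha):
    """Smallest c with P(X >= c) <= alpha, so the critical region is X >= c."""
    c = n + 1
    while c > 0 and 1 - binom_cd(n, p, c - 2) <= alpha:   # P(X >= c - 1) <= alpha?
        c -= 1
    return None if c > n else (c, 1 - binom_cd(n, p, c - 1))

print(lower_critical_region(10, 0.5, 0.05))   # critical region X <= 1
print(upper_critical_region(10, 0.5, 0.05))   # critical region X >= 9
```

Here both tails have actual significance level 11/1024 (about 0.0107). For a two-tailed test, a common approach is to call each function with half the significance level, one per tail.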
86
What does p-value mean?
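The probability, assuming the null hypothesis is true, of getting a value of the test statistic at least as extreme as the one observed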
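A sketch of a one-tailed binomial test using a p-value rather than a critical region. The hypotheses (H0: p = 0.5, H1: p < 0.5) and the observed value x = 2 from X ~ B(10, 0.5) are hypothetical; H0 is rejected only if the p-value is less than the 5% significance level.

```python
import math

def binom_cd(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

p_value = binom_cd(10, 0.5, 2)     # P(X <= 2) under H0 = 56/1024
reject = p_value < 0.05            # compare with the significance level
print(round(p_value, 4), reject)   # → 0.0547 False
```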
87
What do you always need to find for two-tailed hypothesis tests?
The critical region
89
What do you need to define in a hypothesis test?
Any letter you use, e.g. μ = population mean. It doesn't matter which letters you use as long as you define them
90
What significance level should you use if it is not stated in the question?
5%