Probability chapter 9 Flashcards

1
Q

Simple random sampling

A

Every member of the population is equally likely to be chosen. E.g. allocate each member of the population a number, then use random numbers to choose a sample of the desired size.
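A minimal sketch of the numbering approach above, assuming the population is simply numbered 1 to N. The function name and the sizes used are illustrative only.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw sample_size of them
    without replacement, so each member is equally likely to be chosen."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    return rng.sample(range(1, population_size + 1), sample_size)

# Illustrative values: a population of 50 and a sample of 5.
sample = simple_random_sample(population_size=50, sample_size=5, seed=1)
print(sample)
```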

2
Q

Systematic sampling

A

To take a sample of size n from a population of size N, let k = N/n. Pick one member at random from the first k members of the population, then pick every kth member after that.

3
Q

Example of systematic sampling

A

Suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15.
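The house example can be sketched in a few lines, assuming the houses are numbered 1 to 120 and that N divides exactly by n. The function name is hypothetical.

```python
import random

def systematic_sample(N, n, seed=None):
    """Pick a random start among the first k members, then every kth member."""
    k = N // n                   # sampling interval, assumes N is a multiple of n
    rng = random.Random(seed)
    start = rng.randint(1, k)    # random starting point between 1 and k inclusive
    return [start + i * k for i in range(n)]

# 8 houses from a street of 120, so the interval is 15.
houses = systematic_sample(120, 8, seed=0)
print(houses)
```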

4
Q

Stratified sampling

A

Used when you want distinct groups to be represented in the sample. Split the population into distinct groups (strata), then sample within each group in proportion to its size.

5
Q

Example of stratified sampling

A

One might divide a population of adults into subgroups by age, such as 18-29, 30-39, 40-49, 50-59, and 60 and above, then sample from each subgroup in proportion to its size.
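A rough sketch of proportional allocation for age groups like these. The stratum sizes and the total sample size are made up for illustration, and were chosen so that the allocations come out as whole numbers.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share the sample between strata in proportion to each stratum's size.
    Rounding can make the total differ from sample_size for awkward numbers,
    in which case the allocation needs adjusting by hand."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical population of 1000 adults split by age group.
strata = {"18-29": 200, "30-39": 150, "40-49": 250, "50-59": 100, "60+": 300}
allocation = proportional_allocation(strata, sample_size=100)
print(allocation)
```

Each stratum would then be sampled separately, for example by simple random sampling, to fill its allocation.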

6
Q

Opportunity sampling

A

Sample from the members of the population you have access to until you have a sample of the desired size.
E.g. selecting a sample of students from those coming out of the library.

7
Q

Quota sampling

A

Used when you want distinct groups to be represented in the sample. Decide in advance how many members of each group you wish to sample, then use opportunity sampling until you have enough members from each group.

8
Q

An example of quota sampling

A

A researcher might ask for a sample of 100 females, or 100 individuals between the ages of 20 and 30.

9
Q

Cluster sampling

A

Split the population into clusters that you expect to be similar to each other, then randomly select some of these clusters and take the sample from within them.

10
Q

Example of cluster sampling

A

For example, a researcher wants to survey the academic performance of high school students in Spain. The researcher can divide the entire population (high school students in Spain) into different clusters (cities), then select a number of clusters, depending on the research, through simple or systematic random sampling.
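A small sketch of the cluster-selection step, assuming simple random sampling is used to choose the clusters. The list of cities and the number chosen are illustrative only.

```python
import random

# Hypothetical cluster list: a few cities standing in for the clusters.
cities = ["Madrid", "Barcelona", "Valencia", "Seville",
          "Zaragoza", "Málaga", "Bilbao", "Murcia"]

def choose_clusters(clusters, how_many, seed=None):
    """Select whole clusters at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(clusters, how_many)

chosen = choose_clusters(cities, 3, seed=42)
print(chosen)  # students would then be surveyed only in these cities
```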

11
Q

When deciding on a sampling method

A

1) Consider whether or not you can list every member of the population.
2) Identify any sources of bias + any difficulties you might face in taking certain samples.
3) Compare the different sampling methods you have available + choose the one that best suits your needs and limitations.

12
Q

When is a sampling method biased?

A

If it creates a sample that does not represent the whole population.

13
Q

To solve a problem about summary statistics

A

1) Identify the summary statistics appropriate to the problem.
2) Calculate the values of the required statistics, using a calculator where appropriate.
3) Use the statistics to describe key features of the data set and make comparisons.
4) If not already done, identify any outliers and remove them, then see how this affects calculations.
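The steps above can be sketched with Python's standard `statistics` module. The data set is made up, and the value 30 is treated as the outlier by inspection rather than by any formal rule.

```python
import statistics

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 5, 6, 7, 8, 9, 30]

mean_all = statistics.mean(data)
median_all = statistics.median(data)

# Step 4: remove the suspected outlier and recalculate.
trimmed = [x for x in data if x != 30]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print("with outlier:   ", mean_all, median_all)
print("without outlier:", mean_trimmed, median_trimmed)
```

In this example the mean moves much more than the median when the outlier is removed, which matches the pros and cons listed on the mean and median cards.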

14
Q

Define outliers.

A

Values that lie significantly outside the typical set of values of the variable.
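This card does not say how far is "significantly outside". One common convention, assumed here and not taken from the card, flags values more than 1.5 × IQR below Q1 or above Q3. Quartile conventions also vary, so the choice of `method="inclusive"` is an assumption too.

```python
import statistics

def iqr_outliers(data, factor=1.5):
    """Flag values beyond factor * IQR from the quartiles (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical data set.
data = [2, 4, 5, 5, 6, 7, 8, 9, 30]
outliers = iqr_outliers(data)
print(outliers)
```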

15
Q

Mode

A

PROS:
- Useful for non-numerical data
- Not usually affected by outliers
- Not usually affected by errors or omissions
- Is always an observed data point
CONS:
- Doesn’t use all of the data
- May not be representative if it has a low frequency
- There may be other values with similar frequency

16
Q

Median

A
PROS:
- Not affected by outliers
- Not significantly affected by errors
CONS:
- Doesn't make use of all the data
17
Q

Mean

A

PROS:
- When the data set is large, a few extreme values have negligible impact.
CONS:
- When the data set is small, a few extreme values have a big impact.

18
Q

Range

A

PROS:
Reflects the full data set
CONS:
Distorted (misrepresented) by outliers

19
Q

IQR

A

PROS:
Not distorted by outliers
CONS:
Doesn’t reflect all of the data

20
Q

Standard deviation

A

PROS:
When the data set is very large, a few outliers have negligible impact.
CONS:
When the data set is small, a few outliers have a big impact.

21
Q

Distribution

A

Describes how often each outcome occurs: each outcome is paired with its frequency.

22
Q

Continuity correction

A

Involves altering end points of an interval of rounded data to include values which would fall in the interval when rounded.
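As a small illustration, assuming data rounded to the nearest whole unit: a recorded interval of 10 to 15 covers true values from 9.5 up to 15.5. The helper name and its `precision` parameter are hypothetical.

```python
def continuity_corrected(lower, upper, precision=1):
    """Widen a rounded interval by half the rounding precision at each end."""
    half = precision / 2
    return lower - half, upper + half

# Values recorded as 10 to 15 (nearest integer) actually lie in [9.5, 15.5).
bounds = continuity_corrected(10, 15)
print(bounds)
```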

23
Q

Frequency density

A

= frequency / class width
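A short worked sketch of the formula for classes of unequal width. The class boundaries and frequencies are made up for illustration.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 9)]

def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, density {d}")
```

On a histogram the bar heights are these densities, so the area of each bar is proportional to the class frequency.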

24
Q

Histogram

A

You can use a histogram to display continuous data

25
Q

When deciding on the appropriate diagram to represent data

A

1) Consider whether you need to be able to display all of the values, including outliers.
2) Consider whether you need to be able to display relative or absolute frequencies.
3) Consider whether you are more interested in displaying the distribution or the summary statistics.

26
Q

Box plot

A
PROS:
Highlights outliers.
Makes it easy to compare data sets.
CONS:
Data is grouped into 4 categories so detailed analysis is not possible
27
Q

Histogram

A
PRO:
Clearly shows shape of distribution
CONS:
Doesn't always highlight outliers
It is possible but not easy to estimate Q1, Q2 and Q3
28
Q

Cumulative frequency curve

A

PRO:
Makes it easy to find the 5 number summary
CONS:
Doesn’t always highlight outliers
If interval boundaries are not shown the degree of detail is not clear.

29
Q

When interpreting a diagram displaying data

A

1) Consider what is being represented + whether your data is discrete or continuous.
2) If necessary, identify any outliers or missing/incorrect data, and consider the effects of removing them.
3) Read what is being asked for in the question and use the diagram to answer.

30
Q

What are variables that are statistically related?

A

They are described as correlated. There are 3 types of correlation: positive (r > 0, with r = 1 for perfect positive), negative (r < 0, with r = -1 for perfect negative) + zero (r = 0).
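A minimal sketch of how r could be calculated, assuming r here is the product-moment (Pearson) correlation coefficient. The data sets are made up to show the three cases.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # points on a falling line
r_zero = pearson_r(xs, [1, 2, 3, 2, 1])        # no linear trend
print(r_positive, r_negative, r_zero)
```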

31
Q

To solve a question about bivariate data + correlation

A

1) Draw a scatter diagram to identify any correlation between the 2 variables.
2) Identify data points that don’t fit the general pattern shown by the data.
3) Use the correlation shown in the scatter diagram to estimate the value of any missing data points.
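One way step 3 might be carried out is with a least-squares line of best fit. That technique is an assumption here, since the card only refers to using the correlation in the diagram. The data points are made up.

```python
def best_fit_line(xs, ys):
    """Least-squares gradient and intercept for y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Hypothetical bivariate data with the y value at x = 6 missing.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
gradient, intercept = best_fit_line(xs, ys)
estimate_at_6 = gradient * 6 + intercept
print(gradient, intercept, estimate_at_6)
```

An estimate like this is only reasonable when the correlation is strong and the x value is close to the range of the observed data.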