Probability chapter 9 Flashcards
Simple random sampling
Every member of the population is equally likely to be chosen. E.g. allocate each member of the population a number, then use random numbers to choose a sample of the desired size.
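The numbering idea above can be sketched in Python. This is only an illustration: the function name, the population size of 50 and the seed are assumptions, not part of the flashcards.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw sample_size of them
    without replacement, so every member is equally likely to be chosen."""
    rng = random.Random(seed)  # seed is optional; it only makes the run repeatable
    members = range(1, population_size + 1)
    return rng.sample(members, sample_size)

# Hypothetical population of 50 numbered members, sample of 5
sample = simple_random_sample(population_size=50, sample_size=5, seed=1)
print(sample)
```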
Systematic sampling
To take a sample of size n from a population of size N, choose one member at random from the first k members of the population, then pick every kth member after that, where k = N/n.
Example of systematic sampling
Suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15.
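A minimal Python sketch of the house example, assuming the houses are numbered 1 to 120 and that N divides exactly by n. The function name and seed are illustrative only.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start among the first k members, then every kth member."""
    rng = random.Random(seed)
    k = population_size // sample_size  # sampling interval, k = N/n
    start = rng.randint(1, k)           # random starting point between 1 and k
    return [start + i * k for i in range(sample_size)]

# 8 houses from a street of 120, so k = 15
houses = systematic_sample(120, 8, seed=0)
print(houses)
```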
Stratified sampling
(When you want distinct groups to be represented in the sample.) Split the population into distinct groups, then sample within each group in proportion to its size.
Example of stratified sampling
One might divide a population of adults into subgroups by age, such as 18-29, 30-39, 40-49, 50-59, and 60 and above, then sample from each subgroup in proportion to its size.
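The "in proportion to its size" step can be sketched as follows. The stratum sizes and the sample size of 50 are made-up numbers chosen so the arithmetic is exact; with other numbers the rounded counts may not add up to the sample size and would need adjusting by hand.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Number to sample from each stratum = sample_size * stratum size / population size."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

# Hypothetical population of 1000 adults split by age group
age_groups = {"18-29": 240, "30-39": 260, "40-49": 200, "50-59": 140, "60+": 160}
allocation = proportional_allocation(age_groups, sample_size=50)
print(allocation)  # {'18-29': 12, '30-39': 13, '40-49': 10, '50-59': 7, '60+': 8}
```

Each group would then be sampled separately, for example by simple random sampling within the group.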
Opportunity sampling
Take samples from the members of the population you have access to until you have a sample of the desired size.
E.g. selecting a sample of students from those coming out of the library.
Quota sampling
(When you want distinct groups to be represented in the sample.) Decide in advance how many members of each group you wish to sample, then use opportunity sampling until you have enough members from each group.
An example of quota sampling
A researcher might ask for a sample of 100 females, or 100 individuals between the ages of 20-30.
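A rough Python sketch of quota sampling, assuming people are taken in the order they are encountered (opportunity sampling) and skipped once their group's quota is full. The names, groups and quotas are hypothetical.

```python
def quota_sample(people, quotas):
    """people: (name, group) pairs in the order encountered.
    quotas: how many members of each group are wanted."""
    remaining = dict(quotas)
    chosen = []
    for name, group in people:
        if remaining.get(group, 0) > 0:   # this group's quota is not yet full
            chosen.append((name, group))
            remaining[group] -= 1
        if not any(remaining.values()):   # every quota is filled, so stop
            break
    return chosen

passers_by = [("A", "female"), ("B", "male"), ("C", "female"),
              ("D", "female"), ("E", "male"), ("F", "male")]
chosen = quota_sample(passers_by, {"female": 2, "male": 2})
print(chosen)  # [('A', 'female'), ('B', 'male'), ('C', 'female'), ('E', 'male')]
```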
Cluster sampling
Split the population into clusters that you expect to be similar to each other, then take a sample from a random selection of these clusters.
Example of cluster sampling
A researcher wants to survey the academic performance of high school students in Spain. He can divide the entire population (high school students in Spain) into different clusters (cities). The researcher then selects a number of clusters, depending on his research, through simple random or systematic sampling.
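The cluster-selection step in this example could look like the sketch below, assuming the clusters are chosen by simple random sampling. The list of cities and the choice of 2 clusters are illustrative only.

```python
import random

def choose_clusters(clusters, number_of_clusters, seed=None):
    """Select whole clusters at random; sampling then happens inside them."""
    rng = random.Random(seed)
    return rng.sample(clusters, number_of_clusters)

# Hypothetical clusters (cities)
cities = ["Madrid", "Barcelona", "Valencia", "Seville", "Bilbao", "Zaragoza"]
chosen_cities = choose_clusters(cities, 2, seed=3)
print(chosen_cities)
```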
When deciding on a sampling method
1) Consider whether or not you can list every member of the population.
2) Identify any sources of bias + any difficulties you might face in taking certain samples.
3) Compare the different sampling methods you have available + choose the one that best suits your needs and limitations.
When is a sampling method biased?
If it creates a sample that does not represent the whole population.
To solve a problem about summary statistics
1) Identify the summary statistics appropriate to the problem.
2) Calculate the values of the required statistics, using a calculator where appropriate.
3) Use the statistics to describe key features of the data set and make comparisons.
4) If not already done, identify any outliers and remove them, then see how this affects calculations.
Define outliers.
Values that lie significantly outside the typical range of values of the variable.
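The flashcard does not say how far "significantly outside" is. One common convention, used here purely as an assumption, is to flag values more than 1.5 × IQR below Q1 or above Q3. Quartile methods differ between textbooks and calculators, so the fences may vary slightly.

```python
import statistics

def find_outliers(data):
    """Flag values beyond 1.5 * IQR from the quartiles (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

values = [2, 4, 5, 6, 7, 8, 9, 30]   # made-up data
outliers = find_outliers(values)
print(outliers)  # [30]
```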
Mode
PROS:
- Useful for non-numerical data
- Not usually affected by outliers
- Not usually affected by errors or omissions
- Is always an observed data point
CONS:
- Doesn’t use all of the data
- May not be representative if it has a low frequency
- There may be other values with similar frequency
Median
PROS:
- Not affected by outliers
- Not significantly affected by errors
CONS:
- Doesn't make use of all the data
Mean
PROS:
- When the data set is large, a few extreme values have negligible impact.
CONS:
- When the data set is small, a few extreme values have a big impact.
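A quick check of the median and mean claims on a small made-up data set: adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

small = [10, 12, 13, 14, 15]     # made-up data
with_extreme = small + [100]     # same data plus one extreme value

mean_before = statistics.mean(small)            # 12.8
mean_after = statistics.mean(with_extreme)      # about 27.3
median_before = statistics.median(small)        # 13
median_after = statistics.median(with_extreme)  # 13.5

print(mean_before, round(mean_after, 1))
print(median_before, median_after)
```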
Range
PROS:
Reflects the full data set
CONS:
Distorted (misrepresented) by outliers
IQR
PROS:
Not distorted by outliers
CONS:
Doesn’t reflect all of the data
Standard deviation
PROS:
When the data set is very large, a few outliers have negligible impact.
CONS:
When the data set is small, a few outliers have a big impact.
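The three measures of spread can be compared on a small made-up data set with and without one outlier. This sketch assumes the population standard deviation (dividing by n) and Python's "inclusive" quartile method; your course or calculator may use different conventions.

```python
import statistics

def spread_summary(data):
    """Return (range, IQR, standard deviation) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    data_range = max(data) - min(data)
    sd = statistics.pstdev(data)   # population standard deviation (assumption)
    return data_range, q3 - q1, sd

without_outlier = spread_summary([2, 4, 5, 6, 7, 8, 9])
with_outlier = spread_summary([2, 4, 5, 6, 7, 8, 9, 30])

print(without_outlier)  # range 7, IQR 3.0
print(with_outlier)     # range 28, IQR 3.5
```

The range and standard deviation change a lot when the outlier is added, while the IQR hardly moves.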
Distribution
How often each outcome occurs: each possible outcome paired with its frequency.
Continuity correction
Altering the end points of an interval of rounded data so that it includes every value that would fall in the interval when rounded.
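As an illustration, suppose lengths are rounded to the nearest whole number and a class is recorded as 10 to 14. Any true value from 9.5 up to 14.5 would round into that class, so the end points move out by half the rounding unit. The function name and the example class are assumptions.

```python
def continuity_correct(lower, upper, rounding_unit=1):
    """Widen a rounded-data interval by half the rounding unit at each end."""
    half = rounding_unit / 2
    return lower - half, upper + half

corrected = continuity_correct(10, 14)   # data rounded to the nearest 1
print(corrected)  # (9.5, 14.5)
```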
Frequency density
= frequency / class width
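A short worked example of the formula, using made-up classes of unequal width:

```python
def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

# (lower boundary, upper boundary, frequency), hypothetical grouped data
classes = [(0, 10, 5), (10, 30, 12), (30, 40, 8)]
densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
print(densities)  # [0.5, 0.6, 0.8]
```

The middle class has the largest frequency but not the tallest bar, because it is twice as wide as the others.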
Histogram
You can use a histogram to display continuous data
When deciding on the appropriate diagram to represent data
1) Consider whether you need to be able to display all of the values, including outliers.
2) Consider whether you need to be able to display relative or absolute frequencies.
3) Consider whether you are more interested in displaying the distribution or the summary statistics.
Box plot
PROS:
- Highlights outliers
- Makes it easy to compare data sets
CONS:
- Data is grouped into 4 categories, so detailed analysis is not possible
Histogram
PROS:
- Clearly shows the shape of the distribution
CONS:
- Doesn't always highlight outliers
- It is possible, but not easy, to estimate Q1, Q2 and Q3
Cumulative frequency curve
PRO:
Makes it easy to find the 5 number summary
CON:
Doesn’t always highlight outliers
If interval boundaries are not shown the degree of detail is not clear.
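Reading Q1, the median and Q3 off a cumulative frequency curve can be imitated numerically. This sketch assumes straight lines between the plotted points (linear interpolation) and that every class has a non-zero frequency. The class boundaries and frequencies are made up.

```python
from itertools import accumulate

def estimate_quantile(boundaries, frequencies, fraction):
    """Class i runs from boundaries[i] to boundaries[i + 1].
    fraction is 0.25 for Q1, 0.5 for the median, 0.75 for Q3."""
    cumulative = list(accumulate(frequencies))   # running totals
    target = fraction * cumulative[-1]           # position to read off
    previous = 0
    for i, running_total in enumerate(cumulative):
        if target <= running_total:
            width = boundaries[i + 1] - boundaries[i]
            return boundaries[i] + (target - previous) / frequencies[i] * width
        previous = running_total

boundaries = [0, 10, 20, 30, 40]
frequencies = [4, 10, 12, 4]     # cumulative: 4, 14, 26, 30

q1 = estimate_quantile(boundaries, frequencies, 0.25)
median = estimate_quantile(boundaries, frequencies, 0.5)
q3 = estimate_quantile(boundaries, frequencies, 0.75)
print(round(q1, 1), round(median, 1), round(q3, 1))  # 13.5 20.8 27.1
```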
When interpreting a diagram displaying data
1) Consider what is being represented + whether your data is discrete or continuous.
2) If necessary, identify any outliers or missing/incorrect data, and consider the effects of removing them.
3) Read what is being asked for in the question and use the diagram to answer.
What are variables that are statistically related?
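They are described as correlated. There are 3 types of correlation: positive (r > 0, with r = 1 a perfect positive correlation), negative (r < 0, with r = -1 a perfect negative correlation) + zero (r = 0).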
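The flashcard does not define r. Assuming it means Pearson's product-moment correlation coefficient, a minimal sketch with made-up data shows the three cases:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient (assumed to be the r on the card)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # points on a falling line
r_zero = pearson_r(xs, [1, 4, 5, 4, 1])        # no linear trend
print(r_positive, r_negative, r_zero)
```

Real data rarely gives exactly 1, -1 or 0; values in between indicate weaker positive or negative correlation.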
To solve a question about bivariate data + correlation
1) Draw a scatter diagram to identify any correlation between 2 variables.
2) Identify data points that don’t fit the general pattern shown by the data
3) Use the correlation shown in the scatter diagram to estimate the values of any missing data points.