Chpt 1 - Introduction Flashcards
Statistics ___________ numerical or non-numerical data
Collect
Statistics __________ data for the purpose of making generalizations and decisions
Analyze
What is the core of statistics?
Data
What is data?
Any information that has been collected
Is data always numerical?
No
For example, which political party someone supports is not numerical
Another example is a yes/no answer to a question
What is statistics?
The science of organizing and summarizing data, either numerical or non-numerical, to provide useful and accessible information about a particular subject
What are the 4 steps to statistics?
- collect data
- summarize data
- analyze and interpret data
- draw conclusions from data
What are the 2 different ways to classify statistical studies?
Method 1 - descriptive statistics or inferential statistics
Method 2 - observational studies or designed experiments
What is the purpose of the descriptive statistics study?
Summarize data
Examine and explore information for INTRINSIC interest only
For example, if we wanted to know about 100 students but only had enough time to ask 5 students, if we only talk about the 5 students we asked, that’s descriptive statistics
Basically, we are only describing the data we already have
What is the purpose of the inferential statistics study?
To use information from a sample to draw a conclusion about the population
For example, if we wanted to know about 100 students but only had enough time to ask 5 students, and we use this sample information to make a conclusion about the whole 100 students, that’s inferential statistics
Basically, we are making inferences about a population based on the sample information we have
What does inferential statistics consist of?
Methods of drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
What does descriptive statistics consist of?
Methods for organizing and summarizing information and may include:
- Constructing graphs and tables
- Calculating various numerical measures such as averages, variations, and percentiles
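The numerical measures listed above can be sketched in a few lines of Python. This is only an illustration: the `scores` list is made-up data, not something from the course, and the standard-library `statistics` module is just one convenient way to compute these values.

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [55, 62, 68, 71, 74, 80, 85, 91]

mean_score = statistics.mean(scores)            # average
variance = statistics.variance(scores)          # sample variance
std_dev = statistics.stdev(scores)              # sample standard deviation
quartiles = statistics.quantiles(scores, n=4)   # 25th, 50th, 75th percentiles

print("mean:", mean_score)
print("variance:", variance)
print("standard deviation:", std_dev)
print("quartiles:", quartiles)
```

Everything computed here only describes the eight scores in hand, which is what makes it descriptive rather than inferential.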
What is a population?
A collection of ALL individuals or items under consideration in a statistical study
What is a sample?
Part of a population from which information is obtained
What is the notation for the population size?
N
(Make sure it’s a capital letter)
What is the notation for the sample size?
n
(make sure it’s lower case)
What is the number of individuals in a population called?
Population size
What is the number of individuals in a sample called?
Sample size
How are descriptive statistics and inferential statistics interrelated?
Before carrying out an inferential analysis, descriptive statistics should be applied to organize and summarize information from a sample. This step helps us to choose appropriate inferential methods.
In a STAT 151 class of 60 students, the average score of 20 randomly selected students is 71/100. Among these 20 students, 5 scored between 80 and 100, 12 scored between 50 and 79, and 3 scored below 50. Is this study descriptive or inferential?
Descriptive
In a national poll, 1000 adults were asked the following question: “If you won 10 million dollars in a lottery, would you continue to work, or would you stop working?” Based on the results of this poll, researchers concluded, “At least 60% of Canadians would still work even if they won millions.” Is this study descriptive or inferential?
Inferential
What is an observational study?
Researchers simply observe characteristics and take measurements. The data are collected without any plan
What is a designed experiment?
Researchers first impose treatments and controls, and then observe characteristics and take measurements. The data are collected with a “plan”
When the instructor asks 5 students whether they know what statistics is before the class even starts, what type of study is this? What type of study would it be if this question was asked at the end of the stats class?
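Before the class: observational
At the end of the class: designed experiment (the class is a treatment we imposed before measuring, so we have changed something that can change the effect)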
One hundred 30-year-old people participated in a project studying the relationship between exercise and a person’s fitness. These participants were randomly assigned to two groups.
In Group 1, 50 participants were asked to exercise more than 5 hours per week.
The 50 participants in Group 2 were asked to exercise less than 2 hours per week.
Their body mass index (BMI) was measured after 6 months, analyzed, and interpreted.
Is this an observational study or a designed experiment?
Suppose that participants in Group 1 had a significantly greater decrease in BMI than those in Group 2. Can we conclude that a person will become fit from doing a lot of exercise?
Designed experiment
Yes, because it’s a designed experiment that can help us determine causation
A scientist was interested to know if smoking is a risk factor for lung cancer. She randomly selected 500 people and summarized the data obtained:
smoker w/ lung CA: 200
smoker w/o lung CA: 60
nonsmoker w/ lung CA: 100
nonsmoker w/o lung CA: 140
Totals:
smokers: 260
nonsmokers: 240
w/ lung CA: 300
w/o lung CA: 200
Is this an observational study or a designed experiment?
Can we conclude that smoking is a risk factor for lung CA?
Observational study
While we know as common knowledge that smoking can cause cancer, this is not a designed experiment, just an observational study, so causation cannot be determined
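To see what the table does show, here is a small sketch that computes the lung cancer rate within each group from the counts on the card. The dictionary layout and the `cancer_rate` helper are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Counts copied from the card above.
counts = {
    ("smoker", "cancer"): 200,
    ("smoker", "no cancer"): 60,
    ("nonsmoker", "cancer"): 100,
    ("nonsmoker", "no cancer"): 140,
}

def cancer_rate(group):
    """Proportion of a group (smoker or nonsmoker) with lung cancer."""
    with_ca = counts[(group, "cancer")]
    total = with_ca + counts[(group, "no cancer")]
    return Fraction(with_ca, total)

smoker_rate = cancer_rate("smoker")        # 200 out of 260
nonsmoker_rate = cancer_rate("nonsmoker")  # 100 out of 240

print("smokers:", float(smoker_rate))
print("nonsmokers:", float(nonsmoker_rate))
```

The rate is higher among smokers (about 77% versus about 42%), which shows an association in this sample. Because nobody was assigned to smoke or not smoke, the difference alone does not establish causation.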
What are the two main methods for collecting data used in STAT 151?
Census
Sample
What is census data collection? Why would this be a less attractive option to use?
Collect data from all individuals in the population
It is time consuming and expensive
What is sampling data collection?
Collecting data from a sample of the population
We need to make sure the sample is representative of the population
How often is the census collected in Canada?
Every 5 years
What is the main type of sampling done in STAT 151?
Simple random sampling
What is simple random sampling?
A sample taken in such a way that every possible sample of the same size has an equal chance of being selected
Whether a sample is a simple random sample depends on how it was selected, not on the sample itself
If a population has 5 letters, A, B, C, D, and E, what are the possible samples with sample size n=2?
A, B
A, C
A, D
A, E
B, C
B, D
B, E
C, D
C, E
D, E
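The ten samples above can be listed by computer as well. A minimal sketch using `itertools.combinations` from the standard library, which produces unordered selections without replacement:

```python
from itertools import combinations

population = ["A", "B", "C", "D", "E"]

# Every unordered sample of size 2, with no individual repeated.
samples = list(combinations(population, 2))

for first, second in samples:
    print(f"{first}, {second}")
print("number of samples:", len(samples))
```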
Consider a population containing 3 students, Anna, Bruce, and Cindy (ABC lol).
If every sample has size n=2, what are all the possible samples?
If all of these samples have an equal chance to be selected, what is the chance of selecting any one of the samples?
If the chances are equal, what type of sampling is this?
Possible samples:
A, B
A, C
B, C
There are three samples above, so equal chance for each would be 1/3
Simple random sample
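The 1/3 above generalizes. The cards do not state the formula, but the number of possible samples of size n from a population of size N (without replacement, ignoring order) is the binomial coefficient "N choose n", so each sample's chance under simple random sampling is 1 divided by that count. A short sketch, with `srs_chance` as a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

def srs_chance(population_size, sample_size):
    """Chance that one particular sample is drawn under simple random
    sampling without replacement."""
    return Fraction(1, comb(population_size, sample_size))

three_students = srs_chance(3, 2)  # Anna, Bruce, Cindy with n=2
five_letters = srs_chance(5, 2)    # letters A to E with n=2

print(three_students)
print(five_letters)
```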
What is a simple random sample with replacement?
An individual can appear in the sample more than once
What is a simple random sample without replacement?
An individual can appear in the sample AT MOST once
Unless otherwise indicated in this course, does a simple random sample (SRS) occur with replacement or without?
Without replacement
Consider a population containing 3 students, Anna, Bruce, and Cindy (ABC lol).
If every sample has size n=2 and we use simple random sampling with replacement, what are all the possible samples (ignoring order)?
A, A
A, B
A, C
B, B
B, C
C, C
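The list above ignores the order in which the two students are drawn. As a hedged illustration, `itertools.combinations_with_replacement` reproduces those six unordered samples, while `itertools.product` shows what the count would be if order were tracked (that second count is not something the card asks for).

```python
from itertools import combinations_with_replacement, product

students = ["A", "B", "C"]

# Unordered samples of size 2 where the same student may appear twice.
unordered = list(combinations_with_replacement(students, 2))

# Ordered draws of size 2, so (A, B) and (B, A) count separately.
ordered = list(product(students, repeat=2))

print(unordered)
print(len(unordered), "unordered samples;", len(ordered), "ordered draws")
```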
What are some ways to obtain a simple random sample (SRS)?
By computer
Random-number tables
Assume that there are 500 residents in a community and we need to select a SRS with sample size 10.
How do you use a random number table to randomly select the sample?
- Number all residents from 1 to 500 (written with 3 digits, 001 to 500)
- Randomly pick a starting point in the random-number table. We look at only the first 3 digits of each entry, because our biggest number (500) has 3 digits.
- Going down the list from our starting point, we select the first 10 numbers that are between 1 and 500, skipping any number that has already been selected
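The steps above can be mimicked in code. This is a sketch under assumptions: a string of random digits stands in for the printed table, it is read in consecutive 3-digit chunks, and `srs_from_digit_stream` is a hypothetical helper. The last line shows the "by computer" route using `random.sample`, which draws without replacement.

```python
import random

def srs_from_digit_stream(digits, population_size, sample_size):
    """Read fixed-width chunks from a digit string, keeping the first
    `sample_size` distinct labels between 1 and `population_size`."""
    width = len(str(population_size))  # 3 digits when the population is 500
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Skip labels outside 1..population_size and labels already picked.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen

rng = random.Random(151)  # arbitrary seed so the run is repeatable

# Stand-in for a random-number table: 600 random digits.
digits = "".join(rng.choice("0123456789") for _ in range(600))
table_sample = srs_from_digit_stream(digits, 500, 10)

# The computer route: 10 distinct residents drawn directly.
computer_sample = rng.sample(range(1, 501), 10)

print(table_sample)
print(computer_sample)
```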
Students were asked how much they liked their English class and options were:
- Dislike very much
- Dislike
- Neutral
- Like
- Like very much
What type of data is this?
Categorical ordinal
Which is better for categorical ordinal data, a bar chart or a pie chart?
The bar chart works nicely for ordinal data, as most people readily think from least to most, or lowest to highest (for example, from 1 = dislike very much to 5 = like very much). With a pie chart, it takes a bit more cognitive work to move your mind among the outcomes (choices) of the categorical variable.
When looking at intervals for a histogram of continuous data, how do we denote cutpoints of the intervals and what does this mean?
For example:
Interval 1: 20 to under 30, written [20, 30)
Interval 2: 30 to under 40, written [30, 40)
[ means the left end of the interval is closed (the exact value is included)
) means the right end of the interval is open (values go up to, but do not include, that value)
You are building a histogram with the following intervals:
[10,20), [20, 30), [30, 40), [40, 50), [50, 60)
Which interval would 30 go into?
[30, 40)
[20, 30) means from 20 up to, but not including, 30, so 29.99999 would go in that interval, but 30 would not
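The closed-left, open-right rule can be checked with a short sketch. The `interval_for` helper is a hypothetical name; it uses `bisect_right` from the standard library so that a value equal to a cutpoint lands in the interval that starts at that cutpoint.

```python
from bisect import bisect_right

# Cutpoints for [10,20), [20,30), [30,40), [40,50), [50,60).
cutpoints = [10, 20, 30, 40, 50, 60]

def interval_for(value):
    """Return (left, right) for the interval [left, right) containing value."""
    if not cutpoints[0] <= value < cutpoints[-1]:
        raise ValueError("value falls outside every interval")
    index = bisect_right(cutpoints, value) - 1
    return (cutpoints[index], cutpoints[index + 1])

print(interval_for(30))        # lands in the interval starting at 30
print(interval_for(29.99999))  # still in the interval ending at 30
```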
What are the cutpoints for the interval [15, 20)?
15 and 20
Cutpoints are the values at the left (closed) and right (open) edges of each interval
What is the midpoint for the interval [15, 20)?
17.5
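The midpoint is the average of the two cutpoints. A tiny sketch of that arithmetic, with `midpoint` as a hypothetical helper name:

```python
def midpoint(left, right):
    """Midpoint of the interval [left, right): the average of its cutpoints."""
    return (left + right) / 2

mid = midpoint(15, 20)  # (15 + 20) / 2
print(mid)
```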