Stat - Exam #1 Flashcards
What is Statistics?
- The science of COLLECTING, ORGANIZING, SUMMARIZING, and ANALYZING information to draw conclusions;
- Science of data
What kind of data is used in Statistics?
Probabilistic data = data whose value is unknown for any one observation, but whose pattern over many observations is known;
— well characterized in the long run, but unknown individually
What are the techniques of gathering statistical data?
- Sampling;
- Descriptive Stats;
- Inferential Stats
What is Sampling?
-Techniques used to collect info
— Major technique = Simple Random Sampling;
— COLLECTING techniques
What are Descriptive Statistics?
-Techniques used to condense and describe sets of data;
— Major techniques = Frequency Tables, histograms, and summary numbers;
— ORGANIZING and SUMMARIZING techniques
What are Inferential Statistics?
-Techniques used to systematically draw conclusions about a population from a set of sample data;
-Gather population information from a sample;
— Interpretation of data by generalizing info from a sample to apply to a population
— Major tools = Hypothesis testing and confidence intervals;
— ANALYZING techniques
What are Statistical Methods?
-Combo of descriptive and inferential techniques (collect, organize, summarize, analyze)
What is a Population?
- The totality of elements in a well-defined group to be studied;
- MUST be WELL-DEFINED by clearly stating what exact and specific elements (people, animals, etc) DO and DO NOT belong in the population
What is a Sample?
A SUBSET of the population;
— The larger the sample size, the better, but the METHOD is more important than the size
What is an Individual?
ONE object from the POPULATION
What is the goal of Sampling?
To collect…
- a measurable number of individuals that…
- represent the population
* Measuring the sample gives info about the population
What are the 4 sampling techniques?
- Simple random = best;
- Stratified;
- Systematic;
- Cluster
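The flashcards only name the other three techniques, so the sketch below relies on their usual textbook definitions (an assumption, not something stated in these notes): systematic takes every k-th individual after a random start, stratified samples randomly within each subgroup, and cluster keeps everyone in a few randomly chosen groups. The population, group labels, and sizes are made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 12 individuals, each tagged with a group.
population = [(i, "A" if i <= 6 else "B") for i in range(1, 13)]

# Systematic: random start, then every k-th individual.
k = 3
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: split into strata by group, then sample randomly within each.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, k=2)]

# Cluster: randomly pick whole clusters and keep everyone in them.
clusters = [population[i:i + 4] for i in range(0, 12, 4)]
cluster_sample = [p for c in rng.sample(clusters, k=1) for p in c]

print(systematic)
print(stratified)
print(cluster_sample)
```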
How can sampling be done?
- WITH replacement;
- WITHOUT replacement
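A minimal sketch of the difference, using Python's standard `random` module. The population of ten numbered individuals and the sample size of 5 are assumptions for illustration.

```python
import random

# Hypothetical population: individuals numbered 1..10.
population = list(range(1, 11))
rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# WITHOUT replacement: an individual can appear at most once.
without_replacement = rng.sample(population, k=5)

# WITH replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```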
What are the non sampling errors?
- Coverage errors = incomplete population;
- Nonresponse errors = cannot measure selected element;
- Inaccurate response errors = poor records, lying;
- Measurement errors = ambiguous questions, crude tools
What is Simple Random Sampling?
A method of choosing a sample such that each sample of the same size has the same chance of being chosen;
- Each individual has equal chance of being chosen
What does Random sampling do?
- DOES remove SELECTION bias from the sample;
- Does NOT affect the natural variability of data;
- Does NOT guarantee a representative sample;
* ONLY way to allow inferences (informed guesses) about the population
What is the method of Random Sampling?
- Assign every individual in a population a number;
- Select individuals to be in the sample by:
— Random number table or
— Random number generator
(Data > Random Variates > Distribution)
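The two steps above can be sketched in Python, with the standard `random` module standing in for the random number generator. The function name `simple_random_sample` and the roster of names are hypothetical.

```python
import random

def simple_random_sample(individuals, n, seed=None):
    """Draw a simple random sample of size n without replacement."""
    rng = random.Random(seed)
    # Step 1: assign every individual in the population a number.
    numbered = dict(enumerate(individuals, start=1))
    # Step 2: let a random number generator pick n distinct numbers.
    chosen_numbers = rng.sample(sorted(numbered), k=n)
    return [numbered[i] for i in chosen_numbers]

roster = ["Ana", "Ben", "Cho", "Dev", "Eli", "Fay"]  # made-up names
sample = simple_random_sample(roster, n=3, seed=1)
print(sample)
```

Because every set of 3 numbers is equally likely to be drawn, every possible sample of size 3 has the same chance of being chosen.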
What are the 3 classes of data?
- Constant = measurement gives only one possible value;
- Variable = repeated measure yields many possible values;
- Random Variable = randomly varying values (value determined by chance);
* Stats is concerned with data from random variables
What are the types of data?
- Qualitative;
- Quantitative = Discrete or Continuous
What is Qualitative Data?
Data that can be classified by some mutually exclusive and exhaustive quality of individuals;
EX: Color, religion, gender
What is Quantitative Data?
Data that are numerical and allow the use of arithmetic
What is Discrete Data?
*Quantitative;
-Easily countable number of possible values;
EX: 1-10 (number of items)
What is Continuous Data?
*Quantitative;
- Infinite number of possible values;
EX: Time, weight, length
What is the method to identify TYPES OF DATA?
- Pick any TWO data points;
- Can they be ordered?
—NO = QUALITATIVE;
—YES… - Countable number of values between?
—NO = Continuous;
—YES = Discrete
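The two questions above can be written as a small decision function. This is only a sketch: the function name `classify_data` and its two yes/no arguments are hypothetical, and answering the questions for real data is still a judgment call.

```python
def classify_data(can_be_ordered, countable_between=False):
    """Apply the two-question rule to a pair of data points.

    countable_between is ignored when the points cannot be ordered.
    """
    if not can_be_ordered:
        return "qualitative"
    if countable_between:
        return "discrete"
    return "continuous"

print(classify_data(False))        # colors: no order -> qualitative
print(classify_data(True, True))   # number of items -> discrete
print(classify_data(True, False))  # weights -> continuous
```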
What is a Census?
-Study that measures a characteristic of EVERY individual in a population;
— Observational;
— DOES NOT involve a sample;
— Measures the whole population
EX: US Census
What is an Observational Study?
-Study that measures a characteristic in a sample WITHOUT controlling the units or treatment;
— Called an “Ex Post Study” (after the fact) because values have already been established;
— Determines ASSOCIATION, not cause;
EX: students' heights in a class
What is Experimental Design?
-Study that measures a characteristic in a sample WHILE CONTROLLING the units or treatment;
EX: measure weight gain of 3 subjects on a week-long high-protein diet
1. Independent or
2. Dependent
What is Independent Design?
- Experimental;
- Where all experimental units are randomly chosen and assigned to treatments randomly
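One way to sketch the random assignment step: shuffle the units, then deal them out to the treatments. The unit labels, the two treatment names, and the equal group sizes are assumptions for illustration.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
units = [f"unit{i}" for i in range(1, 9)]  # hypothetical experimental units
treatments = ["high protein", "control"]   # hypothetical treatments

# Shuffle the units, then deal them out to the treatments in turn,
# so chance alone decides which unit gets which treatment.
shuffled = units[:]
rng.shuffle(shuffled)
assignment = {
    t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)
}

for treatment, group in assignment.items():
    print(treatment, group)
```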
What is Dependent Design?
- Experimental;
- One half of the experimental units are chosen randomly and the second half are chosen by matching characteristics;
- “Matched-Pairs Design”
When do you use an Experimental Design?
When you can BOTH:
- Control the individuals' characteristics and
- Control the treatment
* If you CAN’T do both, use observational
Why can’t observational studies determine causation?
- Because of possible lurking variables;
- Lurking = NOT measured, but DO affect the results (EX: Snoring and risk of heart attack)
What does an experiment mean in statistics?
- High level of control;
- Often takes more than one study to eliminate or control lurking variables
What is an Experimental Unit?
An individual in a sample
What is Treatment?
A condition of interest that is applied to the experimental unit
What is the Response Variable?
A quantitative or qualitative variable that reflects the characteristic of interest
What is a Double Blind study?
Neither the researcher nor the experimental unit knows whether, or what, treatment is being applied
What is a Placebo?
- A false treatment that has NO effect;
- Used to prevent experimental units from knowing whether they are being treated
How do I describe a column of data?
Distribution of data gives SHAPE, LOCATION, and SPREAD;
-These are very useful in abstracting the info from the data
What is the process of statistics?
- Ask question;
- Collect data = census, observational study, experimental design, or existing data;
- Organize and analyze = overview with tables/graphs; detailed using methods depending on type;
- Make a conclusion
What is Raw Data?
Data NOT organized
How is a variable (column of data) described?
-Condensed and described by distribution;
-Distribution described by shape, location, and spread =
— Graphical methods determine shape;
— Numerical methods find location and spread
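The notes do not say which numerical methods to use. The sketch below assumes the common choices: mean and median for location, range and sample standard deviation for spread. The heights are made-up data.

```python
import statistics

heights_cm = [160, 165, 165, 170, 172, 175, 180]  # made-up data

# Location: where the center of the data sits.
location = {
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
}

# Spread: how far the data stretch around that center.
spread = {
    "range": max(heights_cm) - min(heights_cm),
    "sample_sd": statistics.stdev(heights_cm),
}

print(location)
print(spread)
```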
How can Qualitative data be graphically summarized?
- Frequency table and in a Graph (bar chart, pareto chart, and pie chart);
- Frequency table organizes = shows what possible VALUES a variable takes and HOW OFTEN each value occurs;
- Picture gives a better overview
What is a Frequency Table?
A table that lists all categories of data, with number of occurrences for each category;
- 5 Columns =
1. Category
2. Frequency
3. Relative Frequency
4. Cumulative Frequency
5. Cumulative Relative Frequency
What is Category?
Lists the names of all categories in a column of data
What is Frequency?
The number of observations in each category
What is Relative Frequency?
The percent, or proportion, of data in each category;
Relative Frequency = (Frequency/Sum of all Frequencies)
What is Cumulative Frequency?
The sum of frequencies up through, and including, the category of interest;
-the number of observations less than or equal to the category value
What is Cumulative Relative Frequency?
The sum of relative frequency up through and including the category of interest
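The five columns can be built in one pass over the counts. This is a sketch with made-up data; listing the categories alphabetically is an assumption made only so the cumulative columns have a fixed order.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red"]  # made-up data

counts = Counter(colors)
total = sum(counts.values())  # sum of all frequencies

table = []
cum_freq = 0
for category in sorted(counts):
    freq = counts[category]
    cum_freq += freq  # running total up through this category
    table.append({
        "category": category,
        "frequency": freq,
        "relative_frequency": freq / total,
        "cumulative_frequency": cum_freq,
        "cumulative_relative_frequency": cum_freq / total,
    })

for row in table:
    print(row)
```

The last row's cumulative frequency equals the number of observations, and its cumulative relative frequency equals 1, which makes a quick check on the arithmetic.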
What is a Bar Chart?
A graph of a set of data made with:
- Categories on horizontal axis
- Frequencies on vertical axis;
- Rectangle of equal width (bar) drawn for each category with the height equal to the category’s frequency (or relative frequency);
- Bars do NOT touch;
- Values are in the middle of the bars
What is a Pareto Chart?
A bar chart whose bars are drawn in descending order of height
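A small sketch of the ordering step: count each category, sort by frequency from highest to lowest, then draw the bars in that order. The defect data are made up, and the text bars stand in for a real chart.

```python
from collections import Counter

defects = ["scratch", "dent", "scratch", "chip", "scratch", "dent"]  # made-up data

# most_common() returns (category, frequency) pairs, highest frequency first.
pareto_order = Counter(defects).most_common()

for category, count in pareto_order:
    print(f"{category:<8}{'#' * count}")
```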
What is a Pie Chart?
A circle divided into wedges, where each wedge represents a category and the size of the wedge represents the relative frequency of a category;
-Summarizes qualitative data
How is Discrete Data summarized?
- HISTOGRAM;
- Values are used to create categories;
- Histogram bars DO touch;
- Values marked in the middle of the bars
How do you graphically represent Continuous Data?
- Too many categories since each number is its own category;
- To condense, need to GROUP data into intervals and create new, smaller categories;
1. Group into classes; make a frequency table; make a histogram;
2. Or make a stem-and-leaf plot
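Both options can be sketched in a few lines. The weights are made-up data, the class width of 10 is an assumed choice, and the stem-and-leaf version drops the decimals so that the stem is the tens digit and the leaf is the ones digit.

```python
from collections import Counter, defaultdict

weights = [51.2, 53.8, 57.1, 60.4, 62.9, 64.0, 68.5, 71.3]  # made-up data
class_width = 10  # assumed class width

# 1. Group into classes [50, 60), [60, 70), ... and count each class.
class_counts = Counter(int(w // class_width) * class_width for w in weights)
for lower in sorted(class_counts):
    print(f"[{lower}, {lower + class_width}): {'#' * class_counts[lower]}")

# 2. Stem-and-leaf: stem = tens digit, leaf = ones digit (decimals dropped).
stems = defaultdict(list)
for w in sorted(weights):
    stems[int(w) // 10].append(int(w) % 10)
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

The row of `#` marks for each class is a sideways text version of the histogram bars.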