Week 4: Sampling, Surveys, and Experiments Flashcards
Population
Entire group of individuals about which we want information. A set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects or a hypothetical and potentially infinite group of objects conceived as a generalization from experience.
Sample
Part of the population from which we collect information; we use information about the sample to draw conclusions about the population. A smaller subset of data taken from a larger group called the “population,” which is used to represent the characteristics of the whole population when it’s impractical to study every member of that population directly; essentially, it’s a smaller, manageable group that is meant to reflect the larger group as a whole.
Convenience Sample
Choosing to sample the individuals who are easiest to reach. A group of participants selected for a study based on their easy accessibility to the researcher, meaning they are readily available and convenient to reach, rather than being chosen randomly from a larger population, often leading to potential bias in the results; it is considered a type of non-probability sampling method.
Bias
The study favors certain outcomes. A systematic error that occurs when a data collection method or analysis technique consistently underestimates or overestimates the true value of a population parameter, leading to inaccurate results that don’t accurately represent the overall population being studied; essentially, it’s a flaw in the research design that skews the data towards a particular outcome, whether intentional or unintentional.
Voluntary Response Sample
People choose themselves to respond; such samples generally show bias because those who choose to respond often have strong opinions, frequently in the same direction.
A type of sample where participants self-select to be included in a study, meaning they actively choose to respond to a survey or participate in research, rather than being randomly chosen by the researcher; this often leads to biased results as people with strong opinions are more likely to volunteer their feedback.
Simple Random Sample (SRS)
An SRS of size n consists of n individuals chosen from the population in such a way that every set of n individuals has an equal chance to be the actual sample chosen.
A type of probability sampling in which the researcher randomly selects a subset of participants from a population. Each member of the population has an equal chance of being selected.
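As a minimal sketch of drawing an SRS in software, the snippet below uses Python's standard `random.sample`, which picks n distinct individuals so that every set of n is equally likely. The roster of 30 students and the helper name `simple_random_sample` are assumptions made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: a class roster of 30 students.
students = [f"student_{i:02d}" for i in range(1, 31)]
srs = simple_random_sample(students, n=5, seed=4)
print(srs)
```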
Table of Random Digits
(Table D in your textbook) A long string of digits with the following properties: 1) Each entry is equally likely to be any one of the digits 0 through 9; and 2) The entries are independent of each other, so knowledge of one part of the table provides no information on any other part of the table.
A series of digits (0 to 9) arranged randomly in rows and columns. The table usually groups the digits into 5-digit blocks, arranged in rows and columns, for ease of reading. A full table may extend over four or more pages.
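One common way to use such a table is to label the individuals 01, 02, ... and read two-digit groups from left to right, skipping labels that are out of range or already chosen. The sketch below assumes that procedure; the digit line is a made-up example, not a line copied from any particular table, and `pick_labels` is a hypothetical helper.

```python
def pick_labels(digit_line, n, max_label):
    """Read two-digit labels left to right from a line of random digits.

    Labels outside 1..max_label and repeated labels are skipped.
    """
    stream = digit_line.replace(" ", "")
    chosen = []
    for i in range(0, len(stream) - 1, 2):
        label = int(stream[i:i + 2])
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical line of digits, printed in groups of five for readability.
line = "19223 95034 05756 28713 96409 12531"
chosen = pick_labels(line, n=5, max_label=30)
print(chosen)
```

Reading pairs rather than single digits is what lets the table reach labels up to 99; a population with more than 99 individuals would need three-digit labels.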
Strata
Groups of individuals with similar characteristics. A stratum (plural strata) refers to a subset (part) of the population (entire collection of items under consideration) which is being sampled. Stratification thus consists of dividing the population into strata within each of which an independent sample can be chosen
Stratified Random Sample
Choosing a separate SRS from each stratum and combining these SRS’s into a full sample.
A method of sampling that involves the division of a population into smaller subgroups known as strata. In stratified random sampling, or stratification, the strata are formed based on members’ shared attributes or characteristics, such as income or educational attainment.
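A minimal sketch of stratified random sampling: take a separate SRS inside each stratum, then combine them. The strata (class years), their sizes, and the per-stratum sample sizes are all assumptions chosen for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate SRS from each stratum and combine the results."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "senior": [f"S{i}" for i in range(20)],
}
stratified = stratified_sample(strata, {"freshman": 4, "senior": 2}, seed=1)
print(stratified)
```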
Cluster
Small group which mirrors the population.
A group of data points within a dataset that are considered similar to each other based on specific characteristics, essentially forming a distinct subgroup within the larger data set; “cluster analysis” is the statistical method used to identify and group these clusters, allowing researchers to identify patterns and relationships within the data without pre-defined categories.
Cluster Sample
An SRS of clusters, where all individuals in the selected clusters are sampled.
Researchers divide a population into smaller groups known as clusters. They then randomly select among these clusters to form a sample. Cluster sampling is a method of probability sampling that is often used to study large populations, particularly those that are widely geographically dispersed.
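The contrast with stratified sampling may be easier to see in code: here the SRS is taken over the clusters themselves, and everyone in a chosen cluster is included. The dorm-floor clusters below are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of cluster names, then keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Hypothetical clusters: residents grouped by dorm floor.
clusters = {
    "floor_1": ["Ana", "Ben", "Cal"],
    "floor_2": ["Dee", "Eli"],
    "floor_3": ["Fay", "Gus", "Hal", "Ivy"],
    "floor_4": ["Jon", "Kim", "Lee"],
}
chosen_clusters, cluster_members = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen_clusters, cluster_members)
```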
Undercoverage
Some groups of the population are left out of the process of choosing the sample.
A situation where a specific segment of the population is not adequately represented in a sample, meaning certain groups are excluded or significantly under-sampled, leading to a biased result that doesn’t accurately reflect the whole population being studied; essentially, it’s when a part of the target population is left out of the sampling process, creating an unrepresentative sample.
Nonresponse
When individual(s) chosen in the sample can’t be contacted or refuse to participate.
A selected individual does not respond to a survey or census. It can occur when someone is unable, unavailable, or unwilling to participate.
Observational Study
Observes individuals and measures variables of interest, but does not attempt to influence the responses.
Research design where researchers collect data by observing participants or existing data without actively manipulating any variables, meaning they do not intervene or assign treatments, and simply observe and record information to identify potential relationships between factors; this is in contrast to an experimental study where researchers actively control variables to establish causation.
Experiment
Deliberately imposes a treatment on individuals to measure their response.
A controlled procedure designed to test a hypothesis by manipulating one or more variables and observing the effects on a response variable, allowing researchers to establish causality between variables and validate theories; essentially, it’s a structured method to gather data to answer a research question by actively influencing conditions to see how they impact the outcome.
Lurking Variable
A variable, which is not one of the explanatory or response variables in a study, that may influence the response variable.
A variable that is not included in a study but significantly influences the relationship between the measured variables, potentially creating a misleading interpretation of the data; essentially, it’s a hidden factor that affects the observed relationship between two variables, even though it wasn’t considered in the analysis.
A variable that is hidden or not included in an analysis, but impacts the relationship being analyzed. Some lurking variables hide real relationships, while others can make a false relationship appear to exist.
Confounding
When two variables are associated in such a way that their effect on a response variable cannot be distinguished from one another.
A variable that influences both the dependent variable and independent variable, causing a spurious association.
Closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause, while the dependent variable is the supposed effect. A confounding variable is a third variable that influences both the independent and dependent variables.
Factor
Another name for explanatory variable.
A variable that is manipulated or controlled in an experiment to observe its effect on another variable, called the “response variable”; essentially, it’s a categorical variable with a limited set of possible values that researchers actively change to study its influence on the outcome of an experiment.
Treatment
A specific condition applied to individuals in an experiment (if there are multiple explanatory variables, the treatment is the combination of their values).
A specific condition or intervention applied to experimental units in a study, allowing researchers to compare outcomes across different groups and determine the effect of a particular factor on a response variable; essentially, it’s the manipulated variable in an experiment that you want to test the impact of.
Experimental Unit
The smallest collection of individuals to which treatments are applied.
The individual entity (person, animal, object, or group) that receives a specific treatment or condition in an experiment, essentially the primary unit of observation upon which data is collected and inferences are made about the study population; it’s the element that is independently manipulated and measured within an experiment.
Subject
If the experimental unit is a person, we refer to the experimental unit as a “subject”.
The individual or object being studied. Subjects can be people, animals, organizations, events, or physical objects.
Random Assignment
Experimental units are assigned to treatments at random (using a chance process).
The process of placing participants in different groups within an experiment using a random method, ensuring that each individual has an equal chance of being assigned to any group, which helps to control for confounding variables and allows researchers to confidently attribute any observed differences to the experimental treatment rather than pre-existing group characteristics.
A procedure used in experiments to create multiple study groups that include participants with similar characteristics so that the groups are equivalent at the beginning of the study.
Completely Randomized Design
Treatments are assigned to all experimental units completely by chance.
An experimental design where experimental units (participants, samples, etc.) are randomly assigned to different treatment groups, ensuring that each unit has an equal chance of receiving any treatment, with no systematic bias in the assignment process; it’s considered the most basic type of experimental design where the primary focus is on studying the effect of one factor by randomly assigning its levels to the units.
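A minimal sketch of a completely randomized design: shuffle all the units, then deal them out to the treatments in turn so the groups are as equal in size as possible. The ten units and the two treatment names are assumptions for illustration; equal group sizes are a design choice here, not a requirement of the definition.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 10 subjects, two treatments.
units = [f"subject_{i}" for i in range(1, 11)]
groups = completely_randomized(units, ["drug", "placebo"], seed=7)
print(groups)
```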
Control Group
A group of experimental units to which no treatment or a treatment with known effects is applied, to serve as a baseline when comparing the effects of other treatments.
A group of participants in an experiment that is not exposed to the variable being tested, serving as a baseline for comparison against the experimental group that does receive the treatment, allowing researchers to isolate the effect of the tested variable and establish causality in their study.
A group in the experiment on which the variable is not being tested, such as test subjects that do not receive any treatment. Control groups serve as important benchmarks against which to compare the results of the experimental group, the group that is being experimented on.
Placebo Effect
A response to a “dummy” treatment (a non-treatment disguised to appear as a treatment, used to account for the psychological impact of believing one is being treated).
The difference in response between a treatment and a fake treatment, or placebo. The placebo effect is a psychological phenomenon that occurs when people believe that a treatment will help them feel better.
Illustrates how the mind can trigger changes in the body. Example (the power of suggestion): in a study, participants are given a placebo but are told it’s a stimulant. While talking about the “medication,” researchers are convincing and positive about the expected results.
Double-Blind
Neither the subjects nor those who interact with them and measure the response variable know which treatment(s) a subject received.
A research design where neither the participants nor the researchers know which group (experimental or control) each participant is assigned to, minimizing the potential for bias in the results by preventing them from unconsciously influencing the study outcome based on their knowledge of the treatment groups.
The participants do not know what treatment groups they are in and neither do the researchers who are interacting with them directly. Double-blind studies are used to prevent researcher bias.
Statistically Significant
An observed effect so large that it would rarely occur by chance.
The claim that a result from data generated by testing or experimentation is likely to be attributable to a specific cause. A high degree of statistical significance indicates that an observed relationship is unlikely to be due to chance.
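One way to make "would rarely occur by chance" concrete is a permutation test: if the treatment had no effect, every way of splitting the pooled responses into two groups would be equally likely, so we can count how often a split looks at least as extreme as the one observed. The sketch below is one such approach, not the only definition of significance; the response values are made up, and the test is one-sided.

```python
from itertools import combinations

def permutation_p_value(treatment, control):
    """Fraction of all regroupings whose treatment total is at least as
    large as the observed one (one-sided exact permutation test).

    With fixed group sizes, a larger treatment total is equivalent to a
    larger difference in group means.
    """
    pooled = treatment + control
    k = len(treatment)
    observed = sum(treatment)
    total = 0
    as_extreme = 0
    for idx in combinations(range(len(pooled)), k):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical responses from a two-group experiment.
treatment = [12, 14, 15, 16, 18, 19]
control = [8, 9, 10, 11, 11, 13]
p_value = permutation_p_value(treatment, control)
print(p_value)
```

A small value means that chance regrouping alone rarely produces a difference this large; many courses use 0.05 as the conventional cutoff.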
Block
A group of experimental units that are known before the experiment to be similar in some way that could be expected to affect the response to the treatments.
A group of similar experimental units within a study design, where units are grouped together based on shared characteristics to control for potential confounding variables, essentially minimizing the impact of extraneous factors on the results of an experiment; this technique is called “blocking” and is used to create more homogenous groups for comparison within the study.
Randomized Block Design (RBD)
The random assignment of experimental units to treatments is carried out separately within each block.
An experimental design where experimental units are first grouped into similar blocks based on a specific characteristic, and then treatments are randomly assigned to the units within each block, allowing researchers to control for the effects of that characteristic while comparing treatment groups; essentially, it’s a way to minimize the impact of extraneous variables by grouping similar subjects together before randomly assigning treatments.
An experimental design where subjects or experimental units are grouped into blocks, with the different treatments to be tested randomly assigned to the units in each block. It is a way to set up an experiment to make data analysis simple and easy to understand.
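A minimal sketch of a randomized block design: the shuffle-and-deal step is carried out separately inside each block, so each treatment appears about equally often within every block. The blocks and treatment names below are assumptions made up for illustration.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, block_units in blocks.items():
        shuffled = list(block_units)
        rng.shuffle(shuffled)  # randomization happens inside the block
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks: subjects grouped by age band before the experiment.
blocks = {
    "under_40": ["U1", "U2", "U3", "U4", "U5", "U6"],
    "over_40": ["O1", "O2", "O3", "O4"],
}
block_assignment = randomized_block(blocks, ["new_drug", "placebo"], seed=11)
print(block_assignment)
```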
Matched Pairs
A randomized block design for comparing two treatments in which blocks are created by matching pairs of similar experimental units; within each pair, one unit receives the first treatment and the other receives the second.
A research design where participants are grouped into pairs based on similar characteristics (like age, gender, or other relevant variables), and then within each pair, one individual is randomly assigned to one experimental condition while the other is assigned to a different condition, allowing for a more controlled comparison between the two groups by minimizing the influence of confounding variables.
A special case of randomized block design, where an experiment only has two treatment conditions. The participants are grouped together into pairs based on an equivalent variable, such as age or gender. Within each pair, subjects are randomly assigned to one of two treatments.
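A minimal sketch of the within-pair randomization: for each matched pair, a random shuffle (the software version of a coin flip) decides which member gets which treatment. The pairs and the treatment labels "A" and "B" are hypothetical, and the matching itself is assumed to have been done beforehand.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide who gets which treatment."""
    rng = random.Random(seed)
    assignments = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # equivalent to a coin flip for two treatments
        assignments.append({first: order[0], second: order[1]})
    return assignments

# Hypothetical pairs, already matched on age and gender.
pairs = [("Ann", "Bea"), ("Carl", "Dev"), ("Eve", "Flo")]
pair_assignments = matched_pairs(pairs, seed=5)
print(pair_assignments)
```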