General Definitions Flashcards
DCOVA Framework
Define, Collect, Organize, Visualize, Analyze data.
What are data?
In statistics, data are “the values associated with a trait or property that help distinguish the occurrences of something”.
Variable
A characteristic of an item or individual.
Descriptive Statistics
Refer to methods that primarily help summarize and present data.
Inferential Statistics
Refer to methods that use data collected from a small group to reach conclusions about a larger group.
Big Data
Collections of data that cannot be easily browsed or analyzed using traditional methods.
Categorical Variables
Take categories as their values (also known as qualitative variables).
Numerical Variables
Have values that represent a counted or measured quantity (also known as quantitative variables).
Discrete Variables
(Numerical variables) Are numerical values that arise from a counting process (e.g. total amount paid).
Continuous Variables
(Numerical variables) Are numerical values that arise from a measuring process; those values depend on the precision of the measuring instrument used (e.g. distance from home to store).
Primary Data Source
Data collected on your own.
Secondary Data Source
Data collected by someone else.
Population
Consists of all the items or individuals about which you want to reach conclusions.
Parameter
When you analyze data from a population, you compute a parameter.
Sample
A portion of a population selected for analysis.
Statistics
When you analyze data from a sample, you compute statistics.
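The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of order totals, its size, and the sample size of 50 are made-up assumptions, not values from these cards.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: order totals for 1,000 customers (invented data).
population = [round(random.uniform(5, 100), 2) for _ in range(1000)]

# Parameter: computed from every item in the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample selected from that population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.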
Structured Data
Data that follow an organizing principle or plan, typically a repeating pattern (for example, the rows and columns of a table).
Unstructured Data
Data having very little or no repeating structure or organization.
Frame
A complete or partial listing of the items that make up the population from which the sample will be selected.
Nonprobability Sample
Selecting items or individuals without knowing their probabilities of selection.
ADVANTAGES: convenience, speed, and low cost.
CANNOT be used for statistical inference.
Probability Sample
Selecting items or individuals based on known probabilities.
Convenience and Judgement Samples
Subcategories of nonprobability sample.
In a convenience sample you select items that are easy, inexpensive, or convenient to sample.
In a judgement sample you collect the opinions of preselected experts in the subject matter.
Simple Random Sample
Subcategory of probability sample. It is the most elementary sampling technique. In a simple random sample, every item from a frame has the same chance of selection as every other item, and every sample of a fixed size has the same chance of selection as every other sample of that size.
Sampling with replacement means that after you select an item, you return it to the frame, where it has the same probability of being selected again.
Sampling without replacement means that once you select an item, you cannot select it again.
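A minimal sketch of both selection modes, using Python's standard `random` module. The frame of ten item IDs and the sample size of five are hypothetical values chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

frame = list(range(1, 11))  # hypothetical frame of 10 item IDs

# Without replacement: once an item is selected it cannot be selected again,
# so the result never contains duplicates.
without_replacement = random.sample(frame, k=5)

# With replacement: each draw is made from the full frame,
# so the same item may appear more than once.
with_replacement = random.choices(frame, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

`random.sample` gives every subset of size 5 the same chance of selection, which matches the definition of a simple random sample without replacement.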
Systematic Sample
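Subcategory of probability sample. In a systematic sample, you partition the N items in the frame into n groups of k items, where k = N/n (rounded to the nearest integer). You then select the first item at random from the first k items in the frame and select every kth item after it.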
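A rough sketch of systematic selection, assuming the usual procedure of a random start within the first k items followed by every k-th item after it. The function name `systematic_sample` and the frame of 100 IDs are hypothetical, and the sketch uses integer division for k, which is an assumption about how to handle an N that is not a multiple of n.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Return n items: a random start in the first k items, then every k-th item."""
    N = len(frame)
    k = N // n                # group size; assumes N is at least n
    start = rng.randrange(k)  # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))   # hypothetical frame: item IDs 1..100
sample = systematic_sample(frame, n=10)
print(sample)                 # ten IDs spaced k = 10 apart
```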
Stratified Sample
Subcategory of probability sample. In a stratified sample, you first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results from the separate simple random samples. Stratified sampling is more efficient than either simple random sampling or systematic sampling because you are ensured of the representation of items across the entire population.
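One way to sketch stratified selection in Python is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), which the card does not specify. The function `stratified_sample`, the frame of students, and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, rng=random):
    """Take a simple random sample of the given fraction within each stratum."""
    strata = defaultdict(list)
    for item in frame:
        strata[stratum_of(item)].append(item)  # group items by stratum

    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))       # SRS within the stratum
    return sample

# Hypothetical frame: (student_id, year_in_school) pairs.
years = ["first"] * 40 + ["second"] * 30 + ["third"] * 20 + ["fourth"] * 10
frame = list(enumerate(years))

sample = stratified_sample(frame, stratum_of=lambda s: s[1], fraction=0.1)
print(sample)
```

Because each stratum contributes its own random sample, every year in school is represented, which a single simple random sample of the same size would not guarantee.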
Cluster Sample
Subcategory of probability sample. In a cluster sample, you divide the N items in the frame into clusters that contain several items. Clusters are often naturally occurring groups, such as counties, election districts, city blocks, households, or sales territories. You then take a random sample of one or more clusters and study all items in each selected cluster.
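A small sketch of cluster selection: pick whole clusters at random, then keep every item in the chosen clusters. The function `cluster_sample` and the city-block data are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose n_clusters cluster names and return all of their items."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of clusters
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

# Hypothetical clusters: households grouped by city block.
clusters = {
    "block_a": ["a1", "a2", "a3"],
    "block_b": ["b1", "b2"],
    "block_c": ["c1", "c2", "c3", "c4"],
    "block_d": ["d1"],
}

chosen, items = cluster_sample(clusters, n_clusters=2)
print("selected clusters:", chosen)
print("items studied:    ", items)
```

Note the contrast with stratified sampling: here only some groups are selected and every item inside them is studied, while a stratified sample takes some items from every group.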
Coverage Error
Coverage error occurs if certain groups of items are excluded from the frame so that they have no chance of being selected in the sample or if items are included from outside the frame. Coverage error results in a selection bias.
Nonresponse Error
Nonresponse error arises from failure to collect data on all items in the sample and results in a nonresponse bias.
Sampling Error
Sampling error reflects the variation, or “chance differences,” from sample to sample, based on the probability of particular individuals or items being selected in the particular samples. The margin of error reported with survey or poll results is a statement of the sampling error.
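The "chance differences" from sample to sample can be seen by drawing several samples from one population and comparing their means. This is a sketch on simulated data: the normal population with mean 50 and standard deviation 10, the sample size of 100, and the five repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 simulated measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five separate simple random samples of 100 items each.
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(5)
]

# Each difference below is due only to which items happened to be selected.
errors = [m - population_mean for m in sample_means]

for m, e in zip(sample_means, errors):
    print(f"sample mean = {m:.2f}, difference from population mean = {e:+.2f}")
```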
Measurement Error
Certain information is impossible or impractical to obtain directly. When surveys rely on self-reported information, the mode of data collection, the respondent to the survey, and/or the survey itself can be possible sources of measurement error.