Chapter 7 Flashcards
Big Data
Any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software.
Central Limit Theorem
A theorem that enables one to use the normal probability distribution to approximate the sampling distribution of $\bar{x}$ whenever the sample size is large.
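The theorem can be checked informally by simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, and the 2,000 repetitions are all assumed values, not part of the definition. It draws repeated samples from a clearly non-normal population and compares the center and spread of the sample means with the population mean and with the population standard deviation divided by the square root of n.

```python
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Assumed population: strongly right-skewed (exponential with mean 1), so not normal.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

n = 50
means = sample_means(population, n=n, reps=2_000)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# The sample means should center on mu, with spread close to sigma / sqrt(n).
print("population mean vs. mean of sample means:", round(mu, 3), round(statistics.fmean(means), 3))
print("sigma / sqrt(n) vs. std. dev. of sample means:", round(sigma / n ** 0.5, 3), round(statistics.stdev(means), 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is heavily skewed.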
Cluster Sampling
A probability sampling method in which the population is first divided into clusters and then a simple random sample of the clusters is taken.
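A minimal sketch of the idea, assuming a hypothetical population of city blocks (the clusters) that each contain a few households. The function name `cluster_sample` and the data are made up for illustration. Whole clusters are chosen at random, and every element in a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Pick a simple random sample of clusters; return their names and all their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    elements = [element for name in chosen for element in clusters[name]]
    return chosen, elements

# Hypothetical population: 6 blocks (clusters) of 4 households each.
blocks = {f"block_{b}": [f"b{b}_h{h}" for h in range(4)] for b in range(6)}

chosen, sample = cluster_sample(blocks, num_clusters=2, seed=42)
print(chosen)
print(sample)
```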
Convenience Sampling
A nonprobability method of sampling whereby elements are selected for the sample on the basis of convenience.
Coverage Error
Nonsampling error that results when the research objective and the population from which the sample is to be drawn are not aligned.
Finite Population Correction Factor
The term $\sqrt{(N - n)/(N - 1)}$ that is used in the formulas for $\sigma_{\bar{x}}$ and $\sigma_{\bar{p}}$ whenever a finite population, rather than an infinite population, is being sampled. The generally accepted rule of thumb is to ignore the finite population correction factor whenever n/N < 0.05.
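As a worked illustration, the sketch below computes the standard error of the sample mean as the population standard deviation divided by the square root of n, multiplied by the correction factor sqrt((N - n)/(N - 1)) only when n/N is at least 0.05. The function names and the numeric inputs are hypothetical.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    The correction factor is applied only for a finite population (N given)
    with n/N >= 0.05, following the rule of thumb in the definition.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= fpc(N, n)
    return se

# Hypothetical values: sigma = 10, n = 100.
se_small_pop = standard_error_mean(10, 100, N=500)     # n/N = 0.20, correction applied
se_large_pop = standard_error_mean(10, 100, N=10_000)  # n/N = 0.01, correction ignored
print(round(se_small_pop, 4), round(se_large_pop, 4))
```

With the smaller population the correction shrinks the standard error below 1.0; with the larger one the factor is close to 1 and is skipped.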
Frame
A listing of the elements the sample will be selected from.
Judgment Sampling
A nonprobability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.
Measurement Error
Nonsampling error that results from the incorrect or imprecise measurement of the population characteristic of interest.
Nonresponse Error
Nonsampling error that results when potential respondents that belong to some segment(s) of the population are less likely to respond to the survey mechanism than potential respondents that belong to other segments of the population.
Nonsampling Error
All types of errors other than sampling error, such as coverage error, nonresponse error, measurement error, interviewer error, and processing error.
Parameter
A numerical characteristic of a population, such as a population mean $\mu$, a population standard deviation $\sigma$, or a population proportion p.
Point Estimate
The value of a point estimator used in a particular instance as an estimate of a population parameter.
Point Estimator
The sample statistic, such as $\bar{x}$, s, or $\bar{p}$, that provides the point estimate of the population parameter.
Random Sample
A random sample from an infinite population is a sample selected such that the following conditions are satisfied: (1) Each element selected comes from the same population; (2) each element is selected independently.
Sample Statistic
A sample characteristic, such as a sample mean $\bar{x}$, a sample standard deviation s, or a sample proportion $\bar{p}$. The value of the sample statistic is used to estimate the value of the corresponding population parameter.
Sampled Population
The population from which the sample is taken.
Sampling Distribution
A probability distribution consisting of all possible values of a sample statistic.
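For a very small finite population the sampling distribution of the sample mean can be listed exactly. The sketch below assumes a made-up population of five values and simple random samples of size 2 drawn without replacement. It enumerates every possible sample, tabulates the probability of each sample mean, and computes the expected value of that distribution, which here equals the population mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical population, mean = 6
n = 2

# Every possible sample of size n; each is equally likely under simple random sampling.
samples = list(combinations(population, n))

# Count how often each sample mean occurs, then convert counts to probabilities.
mean_counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(mean_counts.items())
}

# Expected value of the sample mean over its sampling distribution.
expected_mean = sum(mean * prob for mean, prob in sampling_distribution.items())

for mean, prob in sampling_distribution.items():
    print(f"mean = {mean}: probability = {prob}")
print("expected value of the sample mean:", expected_mean)
```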
Sampling Error
The error that occurs because a sample, and not the entire population, is used to estimate a population parameter.
Simple Random Sample
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
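In Python, `random.sample` draws without replacement from a list. The sketch below uses a hypothetical population of four elements with n = 2 and repeats the draw many times to check, approximately, that each of the six possible samples turns up about equally often. The repetition count and seed are arbitrary choices.

```python
import random
from collections import Counter

population = ["A", "B", "C", "D"]   # hypothetical finite population, N = 4
n = 2
reps = 60_000

rng = random.Random(0)
# Sort each draw so that ("A", "B") and ("B", "A") count as the same sample.
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(reps))

# With N = 4 and n = 2 there are 6 possible samples, each with probability 1/6.
for sample, count in sorted(counts.items()):
    print(sample, round(count / reps, 3))
```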
Standard Error
The standard deviation of a point estimator.
Stratified Random Sampling
A probability sampling method in which the population is first divided into strata and a simple random sample is then taken from each stratum.
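A minimal sketch, assuming a hypothetical frame of 100 employees split into two strata. The function name `stratified_sample` and the proportional allocation of the sample sizes are illustrative assumptions, and other allocations are possible. A separate simple random sample is drawn inside each stratum.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical strata: 20 managers and 80 staff members.
strata = {
    "managers": [f"m{i}" for i in range(20)],
    "staff": [f"s{i}" for i in range(80)],
}
# Assumed proportional allocation of a total sample of 10.
sizes = {"managers": 2, "staff": 8}

stratified = stratified_sample(strata, sizes, seed=7)
print(stratified)
```

Unlike cluster sampling, every stratum contributes elements to the sample.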
Systematic Sampling
A probability sampling method in which we randomly select one of the first k elements and then select every kth element thereafter.
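The procedure can be sketched in a few lines, assuming the frame is an ordered list and that k is chosen as the frame size divided by the desired sample size. The function name `systematic_sample` and the frame of 100 numbered elements are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Randomly pick one of the first k elements, then take every kth element after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random index among the first k positions
    return frame[start::k]

frame = list(range(1, 101))    # hypothetical frame of 100 numbered elements
k = 10                         # 100 / 10 gives a sample of 10 elements

systematic = systematic_sample(frame, k, seed=3)
print(systematic)
```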
Tall Data
A data set that has so many observations that traditional statistical inference has little meaning.
Target Population
The population for which statistical inferences such as point estimates are made. It is important for the target population to correspond as closely as possible to the sampled population.
Unbiased
A property of a point estimator that is present when the expected value of the point estimator is equal to the population parameter it estimates.
Variety
The diversity in types and structures of the data generated.
Velocity
The speed at which the data are generated.
Veracity
The reliability of the data generated.
Volume
The amount of data generated.
Wide Data
A data set that has so many variables that simultaneous consideration of all variables is infeasible.