Chapter 2: Data Collection Flashcards
Observation
A single member of a collection of items that we want to study, such as a person, firm, or region
Variable
A characteristic of the subject or individual, such as an employee’s income or an invoice amount
Data set
All the values of all the variables for all the observations, taken as a whole
Data
The word is used as a plural. Data usually are entered into a spreadsheet or database as an n × m matrix
Specifically, each column is a variable (m columns) and each row is an observation (n rows)
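A minimal Python sketch of the n × m layout, assuming a plain list of rows stands in for the spreadsheet; the variable names and values below are hypothetical, for illustration only.

```python
# Hypothetical data set: each row is an observation, each column a variable.
variables = ["employee", "pay_type", "income"]
data_set = [
    ["A", "salary", 52000],
    ["B", "hourly", 31000],
    ["C", "salary", 67000],
    ["D", "hourly", 28500],
]

n = len(data_set)    # number of observations (rows)
m = len(variables)   # number of variables (columns)

# Every observation must supply exactly one value per variable.
assert all(len(row) == m for row in data_set)
print(f"{n} x {m} data set")

# One column holds the values of a single variable across all observations.
income = [row[variables.index("income")] for row in data_set]
print(income)
```

With three variables this is a multivariate data set; keeping only the income column would make it univariate.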
Univariate data sets
Data sets with one variable
Bivariate data sets
Data sets with two variables
Multivariate data sets
Data sets with more than two variables
Types of data - Categorical
Qualitative.
Values that are described by words rather than by numbers.
Either a verbal label, such as vehicle type (car, truck) or pay type (salary, hourly), or a numeric code (1, 2, 3)
Types of data - Numerical
Quantitative.
Values that are described by numbers rather than words, arising from counting or measuring something.
Either discrete (e.g., broken eggs in a carton, annual dental visits) or continuous (e.g., patient waiting time, customer satisfaction percentages)
Coding
Representing the values of a categorical variable with numbers.
For example, 1 = cash, 2 = check, 3 = credit, etc.
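A minimal Python sketch of coding, assuming a dictionary holds the coding scheme; the names `PAYMENT_CODES` and `PAYMENT_LABELS` are hypothetical. The codes are only labels, so arithmetic on them (such as averaging) has no meaning.

```python
# Hypothetical coding scheme for a categorical "payment method" variable.
PAYMENT_CODES = {1: "cash", 2: "check", 3: "credit"}
# Reverse mapping, label -> code, used when entering the data.
PAYMENT_LABELS = {label: code for code, label in PAYMENT_CODES.items()}

responses = ["credit", "cash", "credit", "check"]
coded = [PAYMENT_LABELS[r] for r in responses]
print(coded)  # [3, 1, 3, 2]

# Decoding recovers the original verbal labels exactly.
decoded = [PAYMENT_CODES[c] for c in coded]
assert decoded == responses
```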
Binary variables
Categorical variables that have only two values
Discrete
A variable with a countable number of distinct values
Continuous
A numerical variable that can have any value within an interval
Time series data
Data in which each observation in the sample represents a different, equally spaced point in time (years, months, days)
Periodicity
The time between observations
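A small Python sketch that checks whether a series is equally spaced and, if so, reports its periodicity. It assumes dated observations and measures spacing in days, so it suits daily or weekly data; months and years have unequal day counts, so this hypothetical helper `periodicity_in_days` would not fit them.

```python
from datetime import date

def periodicity_in_days(dates):
    """Return the gap in days shared by every pair of consecutive observations,
    or None if the series is not equally spaced (or has fewer than two dates)."""
    gaps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}
    return gaps.pop() if len(gaps) == 1 else None

weekly = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)]
irregular = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 10)]

print(periodicity_in_days(weekly))     # 7
print(periodicity_in_days(irregular))  # None
```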
Cross-sectional data
Data in which each observation represents a different individual unit (a person, firm, geographic area) at the same point in time
Sample
A subset of the population that we will actually analyze
Population
All of the items that we are interested in
Census
An examination of all items in a defined population.
A sample involves looking only at some items selected from the population while the census is an examination of all the items.
Parameter
A measurement or characteristic of the population (e.g., a mean or a proportion). It is usually unknown because we can rarely observe the entire population.
Statistic
A numerical value calculated from a sample (e.g., a mean or a proportion)
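A minimal Python sketch contrasting the two, assuming a simulated population of invoice amounts (the data are generated, not real). The population mean `mu` is a parameter; the sample mean `x_bar` is a statistic that estimates it. Both names are illustrative.

```python
import random
from statistics import mean

# Hypothetical population of N = 1000 invoice amounts, generated for illustration.
random.seed(42)
population = [round(random.uniform(50, 500), 2) for _ in range(1000)]

mu = mean(population)                   # parameter: uses every item in the population
sample = random.sample(population, 50)  # the subset we would actually analyze
x_bar = mean(sample)                    # statistic: uses the sample only

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

In practice only `x_bar` would be available, since the whole population is rarely observed.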
Target population
Contains all the individuals in whom we are interested
Sampling frame
The list or group from which we draw the sample (e.g., phone directories, voter registration lists, alumni association mailing lists, or marketing databases)
Random sampling
Items are chosen by randomization or a chance procedure
Non-random sampling
Less scientific, but sometimes used for expediency
Simple random sample
Every item in the population of N items has the same chance of being chosen in the sample of n items
Random number
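A number generated by a chance procedure, used to choose items at random
Excel's function =RANDBETWEEN(a,b) returns a random integer between a and b inclusive, e.g., =RANDBETWEEN(1,4); any range of numbers can be used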
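A rough Python analogue of =RANDBETWEEN, assuming Python's standard `random` module; `randbetween` is a hypothetical wrapper name. `random.randint(a, b)` includes both endpoints, as RANDBETWEEN does.

```python
import random

def randbetween(a, b):
    """Hypothetical analogue of Excel's =RANDBETWEEN(a, b):
    a random integer N with a <= N <= b (both endpoints included)."""
    return random.randint(a, b)

draws = [randbetween(1, 4) for _ in range(10)]
print(draws)
```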
Sampling without replacement
Once an item has been selected to be included in the sample, it cannot be considered for the sample again
Sampling with replacement
Once an item has been selected, it can be selected again
The =RANDBETWEEN(a,b) function samples with replacement: repeated calls can return the same number
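A minimal Python sketch of the two schemes, assuming the standard `random` module: `random.sample` draws without replacement, while `random.choices` draws with replacement. The population of ten numbered items is illustrative.

```python
import random

population = list(range(1, 11))  # ten illustrative items numbered 1..10
random.seed(1)

# Without replacement: each item can appear at most once in the sample.
without_repl = random.sample(population, 5)

# With replacement: every draw is from the full population, so repeats are possible.
with_repl = random.choices(population, k=5)

# Only sampling with replacement can draw more items than the population holds;
# random.sample(population, 15) would raise ValueError.
big_with_repl = random.choices(population, k=15)

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)
```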
Infinite population
When the sample is at most 5 percent of the population (i.e., when n/N is less than or equal to 0.05), the population is effectively infinite.
An equivalent statement is that a population is effectively infinite when it is at least 20 times as large as the sample (i.e., when N/n is greater than or equal to 20)
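A small Python check of this rule of thumb; `effectively_infinite` is a hypothetical helper name. Because n/N ≤ 0.05 is equivalent to 20n ≤ N for positive sizes, the comparison is written in integers to avoid floating-point rounding at the boundary.

```python
def effectively_infinite(n, N):
    """Rule of thumb: the population is effectively infinite when the sample is
    at most 5% of it (n/N <= 0.05), equivalently when N/n >= 20."""
    if n <= 0 or N <= 0:
        raise ValueError("sample and population sizes must be positive")
    return 20 * n <= N  # same test as n/N <= 0.05, but exact in integers

print(effectively_infinite(50, 1000))    # True  (n/N = 0.05)
print(effectively_infinite(60, 1000))    # False (n/N = 0.06)
print(effectively_infinite(100, 10000))  # True  (n/N = 0.01)
```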
Systematic sampling
Choosing every kth item from a sequence or list, starting from a randomly chosen entry among the first k items on the list
A systematic sample of n items from a population of N items requires that the periodicity k be approximately N/n
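A minimal Python sketch of systematic sampling, assuming the population is an ordered list and taking k = N // n (integer division) as the "approximately N/n" periodicity; `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(items, n):
    """Choose every kth item, with k = N // n, starting from a randomly
    chosen position among the first k items."""
    N = len(items)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    k = N // n                   # periodicity, approximately N/n
    start = random.randrange(k)  # random start index in 0, 1, ..., k-1
    # Because start <= k - 1, the slice always holds at least n items.
    return items[start::k][:n]

invoices = list(range(1, 101))   # 100 illustrative invoice numbers
s = systematic_sample(invoices, 10)
print(s)  # ten invoice numbers spaced 10 apart
```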
Strata
Homogeneous subgroups of known size
Stratified sampling
Within each stratum, a simple random sample of the desired size could be taken.
Alternatively, a random sample of the whole population could be taken, and then individual strata estimates could be combined using appropriate weights
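A minimal Python sketch of the first approach (a simple random sample within each stratum), followed by combining the stratum means. It assumes weights of N_h / N, each stratum's share of the population, as one common choice of "appropriate weights"; the strata, sizes, and function names are hypothetical.

```python
import random
from statistics import mean

def stratified_sample(strata, sizes):
    """Take a simple random sample (without replacement) of the desired size
    from each stratum. strata: name -> items; sizes: name -> sample size."""
    return {h: random.sample(items, sizes[h]) for h, items in strata.items()}

def combine_strata_means(strata, samples):
    """Weight each stratum's sample mean by its population share N_h / N."""
    N = sum(len(items) for items in strata.values())
    return sum(len(strata[h]) / N * mean(samples[h]) for h in strata)

# Hypothetical strata: annual incomes of salaried vs hourly employees.
random.seed(7)
strata = {
    "salary": [random.randint(40_000, 90_000) for _ in range(300)],
    "hourly": [random.randint(20_000, 45_000) for _ in range(700)],
}
samples = stratified_sample(strata, {"salary": 15, "hourly": 35})
estimate = combine_strata_means(strata, samples)
print(f"estimated mean income: {estimate:,.0f}")
```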
Cluster samples
Taken from strata consisting of geographical regions
We divide a region (a city) into subregions (blocks, subdivisions, school districts)
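A minimal Python sketch of one common form, a one-stage cluster sample: whole subregions are chosen at random and every item inside each chosen subregion is kept. The city blocks, household IDs, and the name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters):
    """One-stage cluster sample: randomly choose whole clusters and keep
    every item inside each chosen cluster."""
    chosen = random.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical city divided into blocks, each listing its household IDs.
city_blocks = {
    "block_A": ["h01", "h02", "h03"],
    "block_B": ["h04", "h05"],
    "block_C": ["h06", "h07", "h08", "h09"],
    "block_D": ["h10", "h11"],
}
sample = cluster_sample(city_blocks, 2)
households = [h for block in sample.values() for h in block]
print(sorted(sample), households)
```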
Judgement sampling
Non-random sampling method that relies on the expertise of the sampler to choose items that are representative of the population
Convenience sampling
Quick: taking whatever sample is available and handy.
Focus group
A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering about an issue
Non-response bias
Occurs when those who respond have characteristics different from those who don’t respond
Selection bias
Bias from self-selected samples, e.g., respondents who volunteer for a survey
Response error
Occurs when respondents deliberately give false information to mimic socially acceptable answers, to avoid embarrassment, or to protect personal information
Coverage error
Occurs when some important segment of the target population is systematically missed
Measurement error
Results when the survey questions do not accurately reveal the construct being assessed
Interviewer error
When the interviewer’s facial expressions, tone of voice, or appearance influences the responses
Sampling error
Uncontrollable random error that is inherent in any random sample
Valid survey
A survey that measures what the researcher wants to measure
Reliable survey
A survey that is consistent. In other words, do the responses from similar respondents stay the same over time?