CHAPTER 1 Flashcards

Question

sample without replacement

Answer 1

do not place the element chosen on a particular selection back into the population, so it cannot be chosen again; it is best to sample without replacement
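A minimal sketch of the idea, assuming a small hypothetical population of ten numbered elements (the population, seed, and sample size are illustrative, not from the deck). It contrasts sampling without replacement with sampling with replacement using Python's standard `random` module.

```python
import random

# Hypothetical population of 10 labelled elements (illustrative only).
population = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Without replacement: a chosen element is not put back,
# so the same element can never appear twice in the sample.
without_replacement = rng.sample(population, k=5)

# With replacement (for contrast): each draw is made from the full
# population, so an element may be chosen more than once.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```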

Answer 2

a list of all of the population elements

Answer 3

a table containing random digits that is often used to select a random sample

Answer 4

a process is a sequence of operations that takes inputs (labor, materials, methods, machines, and so on) and turns them into outputs (products, services, and the like)

Answer 5

a population that contains a finite number of elements

Answer 6

a population that is defined so that there is no limit to the number of elements that could potentially belong to the population

Answer 7

sampling where we know the chance (prob.) that each population element will be included in the sample

Answer 8

sampling where we select elements because they are **easy or convenient to sample**

Answer 9

sampling in which the sample participants self-select

Answer 10

sampling where an **expert selects** population elements that he/she **feels** are **representative of the population** ## Footnote dangerous to use the sample to make statistical inferences about the population because it depends upon the judgment of the person selecting the sample

Answer 11

purposely selecting a biased sample ## Footnote e.g., using a nonrandom sampling procedure that overrepresents population elements supporting a desired conclusion or that underrepresents the population elements not supporting the conclusion

Answer 12

unethical stat practice

Answer 13

selecting many different samples and running many different tests | produces a result that seems to be true but is not
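Why running many tests can produce a result that only seems true can be sketched with a small simulation. The setup below is an assumption made for illustration: a fair coin is flipped 100 times per "study", and a study is flagged as "significant" when the number of heads falls more than 1.96 standard deviations from 50. The function name and parameters are hypothetical.

```python
import random


def count_false_positives(num_tests=1000, flips=100, seed=0):
    """Run many tests on a fair coin and count how many look 'significant'."""
    rng = random.Random(seed)
    # Standard deviation of the number of heads for a fair coin.
    sd = (flips * 0.5 * 0.5) ** 0.5
    cutoff = 1.96 * sd  # usual two-sided 5% cutoff (normal approximation)
    hits = 0
    for _ in range(num_tests):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        # The coin is fair, so any flagged result is a false positive.
        if abs(heads - flips / 2) > cutoff:
            hits += 1
    return hits


false_positives = count_false_positives()
print(false_positives, "of 1000 tests on a fair coin looked significant")
```

Even though there is nothing to find, a small share of the tests is flagged. Reporting only those flagged tests would present chance results as real effects.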

Answer 14

The use of traditional and more recently developed statistical graphics to present to executives (and sometimes customers) easy-to-understand visual summaries of up-to-the-minute information concerning the operational status of a business.

Answer 15

use the **traditional and/or newer graphics** to **present** to **executives** (and sometimes customers) easy-to-understand **visual summaries** of **up-to-the-minute** info concerning the **operational status of a business**.

Answer 16

association learning, text mining, cluster analysis, and factor analysis.

Answer 17

identifying items that tend to co-occur and finding the rules that describe their co-occurrence.

Answer 18

The science of **discovering knowledge, insights and patterns** from a collection of **textual documents or databases** ## Footnote using latent semantic analysis

Answer 19

analyze the **relationship** between a collection of **documents and the words they contain** to produce a set of **key** **concepts or factors related to the documents and words**

Answer 20

Finding **natural groupings or clusters** within data **without having to prespecify a set of categories**

Answer 21

Starting with a large number of **correlated variables and finding fewer underlying, uncorrelated factors** that describe the essential aspects of the large number of correlated variables ## Footnote reducing a large number of variables to fewer underlying factors helps a business focus its activities and strategies

Answer 22

methods used to find anomalies, patterns, and associations in data sets, with the purpose of predicting future outcomes. The applications of predictive analytics include anomaly (outlier) detection, association learning, classification, cluster detection, prediction and factor analysis | supervised learning technique ## Footnote methods used to predict values of a response variable on the basis of one or more predictor variables.

Answer 23

assign items to specified categories or classes

Answer 24

- nonparametric predictive analytics
- parametric

Answer 25

find a **math equation** that **relates** the **response variable** to the **predictor variable(s)** and **involves unknown parameters** that must be **estimated and evaluated by using sample data**

Answer 26

- classical linear regression
- logistic regression
- discriminant analysis
- neural networks
- time series forecasting

Answer 27

combine **external and internal constraints** with **results from descriptive or predictive analytics** to **recommend an optimal course of action**

Answer 28

- decision theory methods
- linear optimization
- nonlinear optimization
- simulation

Answer 29

uses a training set to teach models to yield the desired output

Answer 30

ratio and interval

Answer 31

- quantitative variable
- measured on a scale such that ratios of its values are meaningful
- there is an inherently defined zero value | e.g., a distance of 0 miles = no distance at all; 30 miles is twice as far as 15 miles

Answer 32

- quantitative variable
- ratios are not meaningful
- no inherently defined zero value | e.g., temperature: 0 degrees does not mean no heat at all, only that it is cold

Answer 33

ordinal and nominative

Answer 34

- qualitative
- meaningful ordering/ranking of the categories | good-average-poor/1->5

Answer 35

- qualitative variable
- no meaningful ordering/ranking

Answer 36

methods for obtaining a sample

Answer 37

divide the pop. into nonoverlapping groups of similar elements (**strata**) - random sample is selected from each stratum - these samples are combined to form the full sample ## Footnote wise to stratify when the pop. consists of 2 or more groups that differ with respect to the variable of interest. (age, gender, ethnic group, income)
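The three steps above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name, the made-up population of 100 people, and the choice of proportional allocation (taking the same fraction from every stratum) are all assumptions.

```python
import random


def stratified_sample(elements, stratum_of, fraction, seed=0):
    """Draw the same fraction, at random, from every stratum."""
    rng = random.Random(seed)

    # Step 1: divide the population into nonoverlapping strata.
    strata = {}
    for element in elements:
        strata.setdefault(stratum_of(element), []).append(element)

    # Steps 2 and 3: take a random sample (without replacement) from
    # each stratum and combine the pieces into the full sample.
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 people under 40 and 40 people aged 40 and over.
people = [("p%d" % i, "under 40" if i < 60 else "40 and over") for i in range(100)]

sample = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.1)
print(len(sample), "people sampled")
```

With proportional allocation each group appears in the sample in the same proportion as in the population, which is one common way to combine the per-stratum samples.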

Answer 38

1. Stage 1: Randomly select a sample of counties from all of the counties in the US
2. Stage 2: Randomly select a sample of townships from each county selected in Stage 1
3. Stage 3: Randomly select a sample of voting precincts from each township selected in Stage 2
4. Stage 4: Randomly select a sample of registered voters from each voting precinct selected in Stage 3 | take a sample of registered voters from all registered voters in the US ## Footnote advantageous when selecting a sample from a very large geographical region (a frame doesn't exist)

Answer 39

a sample taken by moving systematically through the population.
- To select a sample of *n* elements without replacement from a frame of *N* elements: divide *N* by *n* and round **down** to the nearest **whole** number to get *l*
- Randomly select one element from the first *l* elements in the frame
- The remaining elements in the sample are obtained by selecting every *l*th element following the first element
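The procedure can be sketched in a few lines. The function name, the frame of 1,000 numbered elements, and the sample size of 30 are hypothetical and chosen only for illustration; `step` plays the role of the interval obtained by dividing N by n and rounding down.

```python
import random


def systematic_sample(frame, n, seed=0):
    """Select n elements by taking every step-th element after a random start."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")

    step = N // n  # divide N by n and round down to a whole number

    # Randomly pick the starting element among the first `step` elements.
    rng = random.Random(seed)
    start = rng.randrange(step)

    # Take every step-th element following the starting element.
    return [frame[start + i * step] for i in range(n)]


frame = list(range(1, 1001))  # hypothetical frame of N = 1000 elements
sample = systematic_sample(frame, n=30)  # step = 1000 // 30 = 33
print(sample[:5])
```

Because the interval is rounded down, the last selected index never runs past the end of the frame.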

Answer 40

- dichotomous (yes/no)
- MCQ
- open-ended questions

Answer 41

- clearly stated
- can be answered quickly
- yield data that are easily analyzed
- cons: info may be limited by the two-option format

Answer 42

- several different forms
- either categorical or numerical

Answer 43

- most honest and complete information
- no suggested answers to divert or bias a person's response

Answer 44

- inexpensive
- conducted by callers who have very little training
- impersonal nature -> respondents may misunderstand some of the questions
- some people cannot be reached and others may refuse to answer some or all of the questions => low response rate

Answer 45

the **proportion** of **all** **people** whom we **attempt** to **contact** that **actually** **respond** to a survey.

Answer 46

- inexpensive
- recipients often won't reply unless they receive some kind of financial incentive or other reward
- the process can take significantly longer than a phone survey

Answer 47

- same problems as mail surveys
- respondents may record their true reactions incorrectly because they have misunderstood some of the questions posed

Answer 48

- more control
- more likely to respond (because of face-to-face)
- questions are less likely to be misunderstood because the people conducting the interviews are typically trained employees who can clear up any confusion
- cons: **interviewers** can potentially "**lead**" a respondent by **body language** + more costly ## Footnote mall survey, 50% response rate

Answer 49

the entire population of interest to us in a particular study

Answer 50

a **list of sampling elements** (people or things) from which the sample will be **selected** (should **closely agree with the target population**)

Answer 51

The difference between a numerical descriptor of the population and the corresponding descriptor of the sample
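As a hedged illustration, the sketch below uses the mean as the numerical descriptor. The population of 10,000 normally distributed values, the sample size of 100, and the seed are all made up for the example.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values centred near 50 (illustrative only).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A random sample of 100 elements, selected without replacement.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)  # descriptor of the population
sample_mean = statistics.mean(sample)          # corresponding sample descriptor

# Sampling error: population descriptor minus sample descriptor.
sampling_error = population_mean - sample_mean
print("sampling error:", round(sampling_error, 3))
```

A different random sample would give a different sampling error, which is why the sample mean is treated as an estimate rather than the true population value.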

Answer 52

- errors of **nonobservation**: related to **population elements** that are **not observed**
- errors of **observation**: occur when the **data** collected in a survey **differ from the truth**

Answer 53

sample frame is different from the target population - undercoverage: some pop. elements are excluded from the process of selecting the sample

Answer 54

occurs whenever some of the individuals who were supposed to be included in the sample are not

Answer 55

- bias in the results
- related to how survey participants are selected

Answer 56

- bias in the results
- related to how survey participants answer the survey questions

(80 cards)