CHAPTER 1 Flashcards

1
Q

Data set

A

the data (facts and figures) collected to study information about one or more elements

2
Q

variable

A

characteristic of an element

3
Q

measurement

A

the process of assigning a value of a variable to an element

4
Q

Quantitative/numerical

A

a variable whose values are numbers that answer "how much" or "how many"

5
Q

qualitative/categorical

A

a variable that records which of several categories an element falls into

6
Q

cross-sectional data

A

data collected at the same point in time, or essentially the same point in time (e.g., during a single month)

7
Q

time-series data

A

data collected over different time periods (e.g., yearly values recorded over a span of years)

8
Q

primary data

A
  • collected by an individual or business
  • directly through planned experimentation or observation
9
Q

secondary data

A

data taken from existing sources (compiled by public- or private-sector organizations)

10
Q

Steps to start a study

A
  • identify the variable of interest (the response variable)
  • identify other variables that may be related to it (factors)
    + if we can manipulate the values of these factors -> experimental study
    + if we cannot manipulate the values of these factors -> observational study
11
Q

Performing a survey or observational study

A
  • a survey asks about people's behaviors, opinions, beliefs, and other characteristics
  • an observational study observes behaviors
12
Q

data warehousing

A

the process of centralized data management and retrieval, aimed at creating and maintaining a central repository for all of an organization's data

13
Q

big data

A

massive amounts of data, often collected at fast rates in real time and in different forms

14
Q

population

A

the set of all elements that we wish to study

15
Q

population of measurements

A

the values obtained by carrying out a measurement that assigns a value of the variable to each and every element of the population

16
Q

Census

A

a study that examines all of the population measurements

17
Q

sample

A

a subset of the elements of the population

18
Q

sample of measurements

A

the values obtained by measuring a characteristic of each element in a sample

19
Q

descriptive statistics

A

science of describing the important aspects of a set of measurements

20
Q

statistical inference

A

the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements

21
Q

random sample

A

sample selected so that every set of n elements in the population has the same chance of being selected
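A minimal Python sketch of drawing a random sample, assuming the population is available as a list (a frame). The customer IDs and the helper name `simple_random_sample` are hypothetical; Python's `random.sample` picks n distinct elements so that every set of n elements is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Hypothetical helper: draw n distinct elements from the frame so that
    every set of n elements has the same chance of being selected."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return rng.sample(frame, n)

# Hypothetical frame of 10 customer IDs
frame = [f"C{i:02d}" for i in range(1, 11)]
sample = simple_random_sample(frame, 3, seed=1)
print(sample)  # three different IDs; which ones depends on the seed
```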

22
Q

business analytics

A

the use of traditional and newly developed statistical methods, advances in information systems, and techniques from management science to continuously and iteratively explore and investigate past business performance, with the purpose of gaining insight and improving business planning and operations.

23
Q

data mining

A

the process of discovering useful knowledge in extremely large data sets.

24
Q

sampling with replacement

A

the element chosen on any particular selection is placed back into the population => it has a chance of being chosen again on any succeeding selection

25
sampling without replacement
the element chosen on a particular selection is not placed back into the population => it cannot be chosen again => it is usually best to sample without replacement
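The two sampling schemes above can be compared in a short Python sketch. `random.choices` draws with replacement and `random.sample` draws without replacement; the five-letter population is hypothetical.

```python
import random

rng = random.Random(42)  # seeded so the run is reproducible
population = ["A", "B", "C", "D", "E"]  # hypothetical 5-element population

# With replacement: each chosen element goes back into the population,
# so it can be chosen again. 8 draws from 5 elements must repeat someone.
with_replacement = rng.choices(population, k=8)

# Without replacement: a chosen element is set aside and cannot be chosen
# again, so each element appears at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)

# Asking for more elements than the population holds is impossible
# without replacement; random.sample raises ValueError.
try:
    rng.sample(population, k=6)
except ValueError as err:
    print("without replacement:", err)
```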
26
frame
a list of all of the population elements
27
random number table
a table containing random digits that is often used to select a random sample
28
Process
a process is a sequence of operations that takes inputs (labor, materials, methods, machines, and so on) and turns them into outputs (products, services, and the like)
29
finite population
a population that contains a finite number of elements
30
infinite population
a population that is defined so that there is no limit to the number of elements that could potentially belong to the population
31
probability sampling
sampling where we know the chance (prob.) that each population element will be included in the sample
32
convenience sampling (not probability sampling)
sampling where we select elements because they are **easy or convenient to sample**
33
voluntary response sample (a type of convenience sampling)
sampling in which the sample participants self-select; tends to overrepresent people with strong (usually negative) opinions
34
judgment sampling (not probability sampling)
sampling where an **expert** **selects** population elements that he/she **feels** are **representative** **of the population**. Note: it is dangerous to use such a sample to make statistical inferences about the population, because the sample depends on the judgment of the person selecting it
35
improper sampling (unethical)
purposely selecting a biased sample. Example: using a nonrandom sampling procedure that overrepresents population elements supporting a desired conclusion, or that underrepresents population elements not supporting it
36
misleading charts, graphs, and descriptive measures
an unethical statistical practice
37
inappropriate statistical analysis or inappropriate interpretation of statistical results
e.g., selecting many different samples and running many different tests until one produces a result that seems to be true but actually is not
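A small Python calculation shows why running many tests can produce a result that "seems to be true but is not". It assumes k independent tests on data where there is truly no effect, each of which falsely flags an effect with probability alpha; the function name is hypothetical. With alpha = 0.05, the chance of at least one false alarm grows to roughly 64% after 20 tests.

```python
def chance_of_false_alarm(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests flags an effect
    that is not really there, when each test does so with probability alpha."""
    # P(no false alarm in one test) = 1 - alpha, so for k independent tests
    # P(at least one false alarm) = 1 - (1 - alpha) ** k
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(k, round(chance_of_false_alarm(0.05, k), 3))
```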
38
descriptive analytics
The use of traditional and more recently developed statistical graphics to present to executives (and sometimes customers) easy-to-understand visual summaries of up-to-the-minute information concerning the operational status of a business.
39
graphical descriptive analytics
use **traditional** **and/or newer graphics** to **present** to **executives** (and sometimes customers) easy-to-understand **visual summaries** of **up-to-the-minute** info concerning the **operational status of a business**.
40
numerical descriptive analytics
includes association learning, text mining, cluster analysis, and factor analysis.
41
association learning
identifying items that tend to co-occur and finding the rules that describe their co-occurrence.
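A small Python sketch of association learning on hypothetical market-basket data. It counts how often pairs of items co-occur, then scores a rule such as "bread -> butter" with support (share of all baskets containing both items) and confidence (share of baskets with the first item that also contain the second). Support and confidence are common ways to describe a rule, assumed here rather than taken from the card; the data and `rule_stats` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer's purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# How often each pair of items is bought together
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
print(pair_counts.most_common(1))  # [(('bread', 'butter'), 3)]

def rule_stats(baskets, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    has_ante = [b for b in baskets if antecedent <= b]
    has_both = [b for b in has_ante if consequent <= b]
    support = len(has_both) / len(baskets)
    confidence = len(has_both) / len(has_ante) if has_ante else 0.0
    return support, confidence

print(rule_stats(baskets, {"bread"}, {"butter"}))  # (0.6, 0.75)
```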
42
text mining
the science of **discovering knowledge, insights, and patterns** from a collection of **textual documents or databases** (e.g., by using latent semantic analysis)
43
Latent semantic analysis
analyze the **relationship** between a collection of **documents and the words they contain** to produce a set of **key** **concepts or factors related to the documents and words**
44
cluster analysis
finding **natural groupings or clusters** within data **without having to prespecify a set of categories**
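One common clustering method is k-means; below is a minimal one-dimensional version in Python, offered only as an illustration. It needs the number of clusters (here 3) and starting centers, but no category labels: the groups emerge from the data. The spending figures, starting centers, and `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal 1-D k-means: assign each value to its nearest center, then move
    each center to the mean of the values assigned to it; repeat."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # an empty cluster keeps its previous center
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spending of 10 customers; k = 3 starting centers
spend = [12, 15, 14, 10, 95, 100, 90, 52, 48, 55]
centers, clusters = kmeans_1d(spend, centers=[12, 95, 52])
print(clusters)  # [[12, 15, 14, 10], [95, 100, 90], [52, 48, 55]]
```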
45
Factor analysis
starting with a large number of **correlated variables and finding fewer underlying, uncorrelated factors** that describe the essential aspects of the large number of correlated variables. Note: reducing a large number of variables to fewer underlying factors helps a business focus its activities and strategies
46
predictive analytics
methods used to find anomalies, patterns, and associations in data sets, with the purpose of predicting future outcomes. Applications include anomaly (outlier) detection, association learning, classification, cluster detection, prediction, and factor analysis. Note: many of these are supervised learning techniques, i.e., methods used to predict values of a response variable on the basis of one or more predictor variables.
47
classification
assigning items to specified categories or classes
48
2 classes of predictive analytics
- nonparametric predictive analytics
- parametric predictive analytics
49
parametric predictive analytics
find a **math equation** that **relates** the **response variable** to the **predictor variable(s)** and **involves unknown parameters** that must be **estimated and evaluated by using sample data**
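Classical linear regression is the simplest parametric example: the equation y = b0 + b1·x relates the response to one predictor, and the unknown parameters b0 and b1 are estimated from sample data. A minimal Python sketch using the least squares formulas; the advertising/sales numbers and `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates of b0 and b1 in y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Hypothetical sample: advertising spend (x) and sales (y)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_line(x, y)
print(f"y = {b0:.2f} + {b1:.2f}x")                      # y = 1.09 + 1.97x
print("predicted sales at x = 6:", round(b0 + b1 * 6, 2))
```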
50
parametric predictive analytics include
- classical linear regression
- logistic regression
- discriminant analysis
- neural networks
- time series forecasting
51
prescriptive analytics
combine **external and internal constraints** with **results from descriptive or predictive analytics** to **recommend an optimal course of action**
52
Prescriptive analytics include
- decision theory methods
- linear optimization
- nonlinear optimization
- simulation
53
supervised learning
uses a training set to teach models to yield the desired output
54
2 types of quantitative variables
ratio and interval
55
ratio variable
- quantitative variable
- measured on a scale such that ratios of its values are meaningful
- there is an inherently defined zero value
Example: a distance of 0 miles means no distance at all, and 30 miles is twice as far as 15 miles
56
Interval variable
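- quantitative variable
- ratios of its values are not meaningful
- no inherently defined zero value
Example: temperature in degrees Fahrenheit; 0 degrees does not mean "no temperature", it is simply cold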
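A quick Python check of why ratios are meaningful for a ratio variable but not for an interval variable: converting units keeps the ratio of two distances the same, but converting temperatures (which moves the arbitrary zero point) changes the ratio. The conversion formulas are standard (1 mile = 1.609344 km exactly); the values are just examples.

```python
import math

def miles_to_km(miles):
    return miles * 1.609344        # same true zero on both scales

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9        # the zero point moves between scales

# Ratio variable (distance): "30 is twice 15" holds in any unit.
distance_ratio_miles = 30 / 15
distance_ratio_km = miles_to_km(30) / miles_to_km(15)
print(distance_ratio_miles, math.isclose(distance_ratio_km, 2.0))  # 2.0 True

# Interval variable (temperature): 60°F / 30°F = 2, but the same two
# temperatures in Celsius give a ratio of about -14.
temp_ratio_f = 60 / 30
temp_ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(temp_ratio_f, round(temp_ratio_c, 6))  # 2.0 -14.0
```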
57
2 types of qualitative variable
ordinal and nominal
58
ordinal variable
- qualitative variable
- the categories have a meaningful ordering/ranking
Example: good/average/poor ratings, or a 1-to-5 scale
59
nominal variable (e.g., gender, color)
- qualitative variable
- no meaningful ordering/ranking of the categories
60
sampling design
methods for obtaining a sample
61
stratified random sample
- divide the population into nonoverlapping groups of similar elements (**strata**)
- select a random sample from each stratum
- combine these samples to form the full sample
Note: it is wise to stratify when the population consists of two or more groups that differ with respect to the variable of interest (e.g., age, gender, ethnic group, income)
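A minimal Python sketch of the three steps above. The customer population, the age-group strata, the per-stratum sample sizes, and the helper `stratified_sample` are all hypothetical; here the sizes are chosen in proportion to each stratum's share of the population.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Hypothetical helper: split the population into nonoverlapping strata,
    draw a random sample (without replacement) from each stratum, and
    combine them into one sample."""
    rng = random.Random(seed)
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

# Hypothetical population of 30 customers tagged with an age group:
# 10 are "under 30" and 20 are "30 and over".
population = [(i, "under 30" if i % 3 == 0 else "30 and over") for i in range(30)]

# Sample sizes proportional to stratum size (1/5 of each stratum).
sample = stratified_sample(population, lambda e: e[1],
                           {"under 30": 2, "30 and over": 4}, seed=7)
print(sample)
```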
62
multistage cluster sampling
Example: take a sample of registered voters from all registered voters in the US
1. Stage 1: randomly select a sample of counties from all of the counties in the US
2. Stage 2: randomly select a sample of townships from each county selected in Stage 1
3. Stage 3: randomly select a sample of voting precincts from each township selected in Stage 2
4. Stage 4: randomly select a sample of registered voters from each voting precinct selected in Stage 3
Note: advantageous when selecting a sample from a very large geographical region for which a frame doesn't exist
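The four stages of the voter example can be sketched in Python. The nested county/township/precinct structure, its sizes, and `multistage_sample` are hypothetical; each stage samples only within the units chosen at the previous stage.

```python
import random

rng = random.Random(3)  # seeded for a reproducible run

# Hypothetical hierarchy: county -> township -> precinct -> list of voters.
# It is built in full here only so the example runs; in a real survey we
# would need voter lists only for the precincts actually selected.
country = {
    f"c{c}": {
        f"c{c}-t{t}": {
            f"c{c}-t{t}-p{p}": [f"c{c}-t{t}-p{p}-v{v}" for v in range(20)]
            for p in range(4)
        }
        for t in range(3)
    }
    for c in range(5)
}

def multistage_sample(country, n_counties, n_towns, n_precincts, n_voters):
    sample = []
    for county in rng.sample(sorted(country), n_counties):                # Stage 1
        towns = country[county]
        for town in rng.sample(sorted(towns), n_towns):                   # Stage 2
            precincts = towns[town]
            for precinct in rng.sample(sorted(precincts), n_precincts):   # Stage 3
                sample.extend(rng.sample(precincts[precinct], n_voters))  # Stage 4
    return sample

voters = multistage_sample(country, n_counties=2, n_towns=2, n_precincts=2, n_voters=3)
print(len(voters), voters[:3])  # 2 * 2 * 2 * 3 = 24 voters in total
```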
63
systematic sampling
a sample taken by moving systematically through the population. To select a sample of n elements without replacement from a frame of N elements:
1. divide N by n and round **down** to the nearest **whole** number; call the result *l*
2. randomly select one element from the first *l* elements in the frame
3. obtain the remaining elements of the sample by selecting every *l*th element following the first element
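A minimal Python sketch of systematic sampling as described above. The invoice frame and the helper `systematic_sample` are hypothetical; `step` plays the role of *l*.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Hypothetical helper: systematic sample of n elements from a frame of N."""
    N = len(frame)
    step = N // n                       # l in the card: N / n rounded down
    rng = random.Random(seed)
    start = rng.randrange(step)         # random position among the first l elements
    # every l-th element after the first one; the last index is at most
    # (l - 1) + (n - 1) * l = n * l - 1 <= N - 1, so it stays inside the frame
    return [frame[start + i * step] for i in range(n)]

# Hypothetical frame: 100 invoices numbered 1..100, sample n = 8  ->  l = 12
invoices = list(range(1, 101))
chosen = systematic_sample(invoices, 8, seed=5)
print(chosen)
```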
64
types of survey questions
- dichotomous (yes/no)
- multiple-choice questions (MCQ)
- open-ended questions
65
Dichotomous Questions
- clearly stated
- can be answered quickly
- yield data that are easily analyzed
- cons: the information may be limited by the two-option format
66
MCQ
- can take several different forms
- can yield either categorical or numerical data
67
open-ended questions
- yield the most honest and complete information
- no suggested answers to divert or bias a person's response
68
phone survey
- inexpensive
- conducted by callers who have very little training
- impersonal nature -> respondents may misunderstand some of the questions
- some people cannot be reached, and others may refuse to answer some or all of the questions => low response rate
69
response rate
the **proportion** of **all** **people** whom we **attempt** to **contact** that **actually** **respond** to a survey.
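As a worked example of the definition, a tiny Python function; the call counts are hypothetical.

```python
def response_rate(responded: int, attempted: int) -> float:
    """Proportion of all people we attempted to contact who actually responded."""
    if attempted <= 0:
        raise ValueError("attempted must be a positive count")
    return responded / attempted

# Hypothetical phone survey: 800 people called, 120 completed the survey
rate = response_rate(responded=120, attempted=800)
print(f"{rate:.0%}")  # 15%
```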
70
mail surveys (self-administered surveys)
- inexpensive
- recipients often won't reply unless they receive some kind of financial incentive or other reward
- the process can take significantly longer than a phone survey
71
web-based surveys
- same problems as mail surveys
- respondents may record their true reactions incorrectly because they have misunderstood some of the questions posed
72
personal interview
- more control
- people are more likely to respond (because of the face-to-face contact)
- questions are less likely to be misunderstood, because the interviewers are typically trained employees who can clear up any confusion
- cons: **interviewers** can potentially "**lead**" a respondent through **body language**; more costly
Note: e.g., mall surveys (response rate around 50%)
73
target population
the entire population of interest to us in a particular study
74
Sample frame
a **list of sampling elements** (people or things) from which the sample will be **selected**; it should **closely agree with the target population**
75
Sampling error
The difference between a numerical descriptor of the population and the corresponding descriptor of the sample
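A small Python illustration of the definition, using the mean as the numerical descriptor. The population is hypothetical and tiny so that its mean is known; in a real study the population mean is unknown, so the sampling error cannot actually be computed.

```python
import statistics

# Hypothetical tiny population, small enough that its mean is known exactly.
population = [4, 8, 6, 5, 7, 9, 3, 6]
sample = [8, 9, 6]  # one possible sample drawn from the population

population_mean = statistics.mean(population)   # 6
sample_mean = statistics.mean(sample)           # 23/3, about 7.67
sampling_error = population_mean - sample_mean  # population descriptor minus sample descriptor
print(round(sampling_error, 2))                 # -1.67
```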
76
Two types of sample errors
- errors of **nonobservation**: related to **population** **elements** that are **not** **observed**
- errors of **observation**: occur when the **data** collected in a survey **differ from the truth**
77
Error of coverage
occurs when the sample frame is different from the target population
- undercoverage: some population elements are excluded from the process of selecting the sample
78
Nonresponse (problem)
occurs whenever some of the individuals who were supposed to be included in the sample are not
79
selection bias
- bias in the results
- related to how survey participants are selected
80
response bias
- bias in the results
- related to how survey participants answer the survey questions