Lecture 1 Flashcards
Tall Data
many observations, few variables
Wide data
few observations, many variables
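A minimal sketch of the tall/wide distinction, assuming a data set stored as a list of rows (one row per observation, one column per variable). The helper names `shape` and `classify` are hypothetical, and comparing the two counts directly is only an illustrative rule of thumb, not a formal definition from the lecture.

```python
def shape(rows):
    """Return (number of observations, number of variables) for a list of rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    return n_obs, n_vars


def classify(n_obs, n_vars):
    """Illustrative heuristic: more rows than columns is tall, the reverse is wide."""
    if n_obs > n_vars:
        return "tall"
    if n_vars > n_obs:
        return "wide"
    return "square"


# Tall: 1000 observations, 2 variables each (made-up values).
tall = [[i, i * 2] for i in range(1000)]

# Wide: 3 observations, 500 variables each (made-up values).
wide = [list(range(500)) for _ in range(3)]

print(shape(tall), classify(*shape(tall)))
print(shape(wide), classify(*shape(wide)))
```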
Advantages of Big Data
1. Big
2. Always on
3. Nonreactive (people are not aware they are being recorded)
Incomplete data
Some info is missing
Inaccessible data
Ex. PPD
Outside of the organization:
business and ethical barriers to accessing the data
Inside of the organization:
databases are not integrated within the system
Unrepresentative data
Invalid data
Dirty data
Loaded with junk or spam
Ex. Twitter bots; Fake reviews
Sensitive data
data whose release would expose private or confidential details
Big data
a collection of complex data sets that requires tools and models to extract insights from them
primary data
data collected to answer a research question
secondary data
data collected for non-research purposes
Uses of Big data
1. Personalization (recommendation algorithms)
2. Boosting engagement (Facebook likes)
3. New product development
4. Reducing churn (when a customer quits)
5. Public for economy
Customer churn
when a customer stops using a service
Is Big Data biased or unbiased?
Biased
Insights of Big Data
Big is relative
Data quality
Calculation of the relative size
f = n/N
(sample size/ population size)
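A small worked sketch of the relative-size formula f = n/N. The function name `sampling_fraction` and the numbers used are made up for illustration; only the formula itself comes from the card above.

```python
def sampling_fraction(n: int, N: int) -> float:
    """Relative size f = n / N, where n is the sample size and N the population size."""
    if N <= 0:
        raise ValueError("population size N must be positive")
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and N")
    return n / N


# Hypothetical numbers: 1,000 records drawn from a population of 50,000.
f = sampling_fraction(1_000, 50_000)
print(f"f = {f:.2%}")
```

The same 1,000 records would be a much larger share of a population of 2,000 (f = 0.5), which is one way to read "Big is relative".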
Disadvantages of Big Data
1. Inaccessible
2. Incomplete
3. Non-representative
4. Drifting
5. Algorithmically confounded
6. Dirty
7. Sensitive
Algorithmically confounded
the design of the platform can influence users' behavior, introducing biases
Ex. Recommended searches increase the volume of certain searches.
Drifting
“If you want to measure change, don’t change the measure”
the measure, the method, or the system itself changes over time
Ex. Google changes its data-generating process to improve its customer service.