Topic 1: Introduction to Data Science & Alternative Data Flashcards
What gave rise to the realm of data science?
- Investments in business infrastructure
- Volume and variety of data
- Powerful computers
Why is data mining used for customer relationship management?
To analyze customer behavior in order to manage attrition and maximize expected customer value
Type I and Type II Data-Driven Decision Making (DDD) Problems
Type I: Decisions for which discoveries need to be made within the data
Type II: Decisions that repeat, often at massive scale, so that even small increases in decision-making accuracy from data analysis pay off
Why view data and data science capability as a strategic asset?
Viewing them as assets lets us think explicitly about how much we should invest in them
How do you transition a business problem into a data mining problem?
Convert the business problem into subtasks and match the subtasks to known tasks for which tools are available.
Difference between regression and classification
Classification predicts whether something will happen; regression predicts how much of something will happen.
Define Classification
attempts to predict, for each individual in a population, which of a set of classes that individual belongs to
What does a regression attempt to do?
attempts to estimate or predict, for each individual, the numeric value of some variable for that individual
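To make the classification/regression contrast concrete, here is a minimal sketch using a nearest-neighbor rule. The customer data, the field meanings, and the function names (`nearest`, `classify_churn`, `estimate_spend`) are all hypothetical, chosen only for illustration; nearest neighbors is just one of many possible methods.

```python
from collections import Counter
from statistics import mean

# Hypothetical customers: (monthly_usage_hours, churned, monthly_spend)
customers = [
    (2.0, "yes", 10.0),
    (3.0, "yes", 12.0),
    (4.0, "yes", 15.0),
    (10.0, "no", 40.0),
    (12.0, "no", 45.0),
    (15.0, "no", 60.0),
]

def nearest(usage, k=3):
    """Return the k customers whose usage is closest to the given value."""
    return sorted(customers, key=lambda c: abs(c[0] - usage))[:k]

def classify_churn(usage, k=3):
    """Classification: predict a class label by majority vote of neighbors."""
    votes = Counter(c[1] for c in nearest(usage, k))
    return votes.most_common(1)[0][0]

def estimate_spend(usage, k=3):
    """Regression: predict a numeric value as the mean over neighbors."""
    return mean(c[2] for c in nearest(usage, k))

print(classify_churn(3.5))   # a class: "yes" or "no"
print(estimate_spend(11.0))  # a number
```

The same neighbors feed both functions; only the kind of target (a class versus a number) changes.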
Similarity matching
attempts to identify similar individuals based on data known about them
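One simple way to sketch similarity matching is to treat each individual as a point and pick the closest other point. The profiles below (age, purchases per month) and the function name `most_similar` are hypothetical, and Euclidean distance is only one possible similarity measure.

```python
import math

# Hypothetical profiles: name -> (age, purchases_per_month)
profiles = {
    "A": (25, 3),
    "B": (27, 4),
    "C": (60, 1),
}

def most_similar(name):
    """Return the other individual closest to `name` by Euclidean distance."""
    target = profiles[name]
    others = (n for n in profiles if n != name)
    return min(others, key=lambda n: math.dist(profiles[n], target))

print(most_similar("A"))  # B
```

In practice the features would usually be rescaled first so that no single attribute dominates the distance.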
Clustering
attempts to group individuals in a population together by their similarity, but not driven by any specific purpose.
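As a rough illustration of clustering, the sketch below runs a tiny two-cluster k-means on one-dimensional values. The function name `kmeans_1d`, the choice of two clusters, and the min/max initialization are assumptions made for the example, not a standard recipe.

```python
def kmeans_1d(values, iterations=10):
    """Split values into two groups by repeatedly assigning each value
    to its nearest center and recomputing the centers."""
    centers = [min(values), max(values)]  # assumed initialization
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

print(kmeans_1d([1, 2, 3, 20, 21, 22]))  # [[1, 2, 3], [20, 21, 22]]
```

No target label is supplied; the groups emerge only from how close the values are to each other.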
Co-occurrence grouping
attempts to find associations between entities based on transactions involving them
Example question: “What items are often purchased together?”
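A minimal sketch of co-occurrence grouping is to count how often each pair of items shows up in the same transaction. The baskets and the function name `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

print(pair_counts(baskets).most_common(1))  # [(('bread', 'butter'), 3)]
```

Real association-mining methods go further, for example by comparing pair counts with how often each item appears on its own.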
Profiling (behavior description)
attempts to characterize the typical behavior of an individual, group, or population.
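One very simple profile is the mean and spread of past behavior, which can then be used to check whether new behavior looks typical. The purchase history, the function names `build_profile` and `is_unusual`, and the three-standard-deviation threshold are assumptions for illustration.

```python
from statistics import mean, stdev

def build_profile(amounts):
    """Summarize typical behavior as the mean and standard deviation."""
    return {"mean": mean(amounts), "stdev": stdev(amounts)}

def is_unusual(profile, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations
    from the profile's mean (the threshold is an assumed cutoff)."""
    return abs(amount - profile["mean"]) > threshold * profile["stdev"]

# Hypothetical past purchase amounts for one customer.
history = [20, 22, 19, 21, 18]
profile = build_profile(history)
print(is_unusual(profile, 21))   # False
print(is_unusual(profile, 200))  # True
```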
Link prediction
attempts to predict connections between data items
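A common heuristic for link prediction in a social graph is to score unconnected pairs by how many neighbors they share. The sketch below assumes a small hypothetical friendship graph; the names `friends`, `common_neighbor_score`, and `suggest_link` are illustrative only.

```python
# Hypothetical undirected friendship graph: person -> set of friends.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
    "eve": set(),
}

def common_neighbor_score(a, b):
    """Number of friends that a and b share."""
    return len(friends[a] & friends[b])

def suggest_link(person):
    """Suggest the non-friend who shares the most friends with `person`."""
    candidates = [p for p in friends if p != person and p not in friends[person]]
    return max(candidates, key=lambda p: common_neighbor_score(person, p), default=None)

print(suggest_link("ann"))  # dan
```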
Data reduction
attempts to take a large dataset and replace it with a smaller set of data that retains much of the important information in the larger set
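As one simple, assumed form of data reduction, the sketch below collapses a transaction log into one summary row per customer. The transactions and the function name `summarize` are hypothetical; other reductions (for example, dimensionality reduction) follow the same idea of trading detail for a smaller, more manageable dataset.

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, amount).
transactions = [
    ("c1", 10.0), ("c2", 5.0), ("c1", 7.5), ("c2", 2.5), ("c1", 2.5),
]

def summarize(transactions):
    """Reduce many transaction rows to one count/total row per customer."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for customer, amount in transactions:
        totals[customer]["count"] += 1
        totals[customer]["total"] += amount
    return dict(totals)

print(summarize(transactions))
```

Five rows become two; the individual amounts are lost, but the per-customer activity is preserved.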
Causal modeling
helps us understand what events or actions actually influence others
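A basic sketch of causal estimation is to compare average outcomes between a treated group and a control group. This assumes individuals were assigned to the groups at random, as in a controlled experiment; without that assumption the difference may reflect other factors. The outcome values and the function name `average_effect` are hypothetical.

```python
from statistics import mean

def average_effect(treated, control):
    """Difference in mean outcome between treated and control groups.
    Interpretable as a causal effect only under random assignment."""
    return mean(treated) - mean(control)

# Hypothetical spend for customers who did and did not see an ad.
saw_ad = [12, 14, 16]
no_ad = [9, 10, 11]
print(average_effect(saw_ad, no_ad))  # 4
```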