0 - Introduction Flashcards
What is the primary purpose of this book?
To prepare readers to work effectively with data science teams and maximize the value from their expertise.
What is necessary to thrive in a modern corporate environment?
Some understanding of data science and its applications.
What does ‘good questions’ refer to in the context of data science?
Questions that increase the chances that proposed solutions will solve problems and avoid unnecessary expenses.
What are the key elements for successful collaboration between customers and data science teams?
Understanding goals, communicating solutions, and collaborating to deliver value.
What basic knowledge should readers have before using this book?
Basic understanding of descriptive statistics, ability to read simple graphs, and experience with spreadsheet programs.
What is this book NOT intended to be?
A textbook for becoming a data scientist or a programming book.
What will the book discuss in relation to tools of the trade?
Basic information needed to become a good data science customer, including software and data storage.
Why is it important to use the right tools in data science projects?
To avoid wasting time and money on incorrect software and tools.
What is the composition of a data science team similar to?
A baseball team, with different players having different skills.
What are some job titles found on a data science team?
- Data scientist
- Data engineer
- Data analyst
- Machine learning engineer
- Statistician
What common failure do companies face when starting data science projects?
Trying to hunt mosquitoes with a machine gun: getting distracted by advanced methods when the problem does not call for them.
What is the focus of unsupervised machine learning?
Grouping people based on data rather than predicting an outcome.
Give an example of customer clusters in the restaurant industry.
- Weekly night-outers
- Anniversary diners
- Family mealers
- One-and-doners
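A minimal sketch of how such clusters could be found, assuming k-means as the grouping method (the cards do not say which algorithm the book uses). The diner data, the two features, and the starting centroids are all invented for illustration.

```python
# Hypothetical diners: (visits per year, average party size). Invented data.
diners = [(50, 2), (48, 2), (1, 2), (2, 2), (12, 5), (10, 4)]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans(points, centroids, rounds=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    labels = []
    for _ in range(rounds):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        centroids = updated
    return labels, centroids


# Starting centroids are picked by hand here; real tools choose them for you.
labels, centroids = kmeans(diners, centroids=[diners[0], diners[2], diners[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

No outcome is predicted here: the algorithm only groups similar diners, and naming the groups (for example "weekly night-outers") is left to people.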
What does supervised machine learning aim to predict?
An outcome of interest, such as response to an ad or hospital stay duration.
List some methods introduced in supervised machine learning.
- Linear regression
- Logistic regression
- Classification and regression trees
- Random forests
- Gradient-boosted machine learning
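As an illustration of the first method in this list, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The numbers are invented, and the variable names (`severity`, `stay_days`) are hypothetical stand-ins for a predictor and an outcome of interest.

```python
# Invented data: a severity score and the observed hospital stay in days.
severity = [1, 2, 3, 4]
stay_days = [3, 5, 7, 9]


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(severity, stay_days)
predicted = intercept + slope * 5  # predicted stay for a new severity of 5
print(slope, intercept, predicted)  # 2.0 1.0 11.0
```

Unlike clustering, the model is trained on a known outcome and is then used to predict that outcome for new cases.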
What specialized topics will the book touch on?
- Network analysis
- Spatial analysis
- Deep learning
- AI
What key metrics are used in network analysis?
- Density
- Centrality
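A minimal sketch of both metrics on a small undirected network, assuming the standard definitions: density is the share of possible connections that actually exist, and degree centrality (one of several centrality measures) is the share of other nodes a node is connected to. The names and connections are invented.

```python
# Invented undirected network: each pair is one connection.
edges = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Di"), ("Ben", "Cy")]

nodes = sorted({name for edge in edges for name in edge})
n = len(nodes)

# Density: actual connections divided by the n * (n - 1) / 2 possible ones.
density = 2 * len(edges) / (n * (n - 1))

# Degree centrality: a node's connection count divided by (n - 1).
degree = {name: 0 for name in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
centrality = {name: count / (n - 1) for name, count in degree.items()}

print(round(density, 3))   # 0.667
print(centrality["Ana"])   # 1.0
```

Ana is connected to everyone else, so she has the highest centrality; the network as a whole has four of its six possible connections.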
What is one application of AI discussed in the book?
Computer vision, including image tagging and information extraction.
What question will senior managers likely ask regarding data science investments?
Are the millions of dollars spent providing a good return on the investment?
What methods will be discussed for measuring impact?
- A/B testing
- Difference-in-differences
- Interrupted time series
- Regression discontinuity
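The arithmetic behind difference-in-differences can be sketched as follows. The function name and the sales figures are hypothetical, and the estimate is only meaningful under the usual assumption that both groups would have followed the same trend without the intervention.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Invented average weekly sales for stores with and without a new model.
effect = diff_in_diff(
    treated_before=100, treated_after=130,
    control_before=90, control_after=100,
)
print(effect)  # 20
```

Sales rose by 30 in the treated stores, but 10 of that also appeared in the comparison stores, so only 20 is attributed to the intervention.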
What are some methods to measure impact in data science?
A/B testing, causal inference, difference-in-differences, interrupted time series, and regression discontinuity.
What is an important ethical issue in data science?
Reinforcing racial, sexual, or other biases through algorithms.
What should a data science team be aware of regarding modeling?
The ramifications of its modeling, so that no explicit or implicit bias is built into the models.