Exam 1 Prep Flashcards
Q: What are the 5 common methods for collecting raw data?
A: Public data, data from an existing product, human-in-the-loop, brute force, buying data.
Q: Why is data labeling important?
A: It allows the machine learning model to understand the data and make accurate predictions, like identifying objects in images.
Q: What is the difference between simple and advanced data labeling?
A: Simple labeling tags objects (e.g., drawing boxes around cars), while advanced labeling tags every pixel in an image (e.g., marking pixels as “road” or “pedestrian”).
Q: Name three ways to speed up data labeling.
A: External annotation services, internal annotation teams, and using tools like supervised prediction or active learning.
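One common form of active learning is uncertainty sampling: the model scores the unlabeled items, and humans label only the ones it is least sure about. The snippet below is a minimal sketch of that idea, not a full labeling pipeline. The item ids, the probabilities, and the helper name `most_uncertain` are all made up for illustration.

```python
def most_uncertain(probabilities, k):
    """Return the k item ids whose predicted probability is closest to 0.5.

    For a binary classifier, a probability near 0.5 means the model is
    unsure, so a human label on that item is likely to be most useful.
    """
    ranked = sorted(
        probabilities,
        key=lambda item_id: abs(probabilities[item_id] - 0.5),
    )
    return ranked[:k]


# Hypothetical model outputs: probability that each image contains a car.
predicted = {
    "img_a": 0.98,
    "img_b": 0.52,
    "img_c": 0.10,
    "img_d": 0.45,
    "img_e": 0.71,
}

# Send only the two most uncertain images to human annotators.
to_label = most_uncertain(predicted, 2)
print(to_label)
```

The confident predictions (such as `img_a`) are skipped, which is how this approach can reduce the amount of manual labeling.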
Q: What is “human-in-the-loop” data collection?
A: It’s when humans help a system gather data and guide its learning, like a person controlling an autonomous robot until it’s more capable.
Q: What is the main message Hans Rosling is trying to convey in the TED Talk?
A: The world is more developed than people realize, and many countries have made significant progress in health, income, and life expectancy. The global divide is not as sharp as often portrayed.
Q: What does Hans Rosling use to present data in his TED Talk?
A: He uses dynamic, animated data visualizations (interactive graphs) that show the progress of countries over time. This helps make complex data more accessible and engaging.
Q: What misconception does Rosling address about the world’s countries?
A: He challenges the idea that countries are either “developed” or “underdeveloped” (rich vs. poor). Instead, he shows how many countries have advanced and now fall into a middle-income bracket.
Q: How does Rosling emphasize global interconnectedness in his talk?
A: He shows that the progress in one country (e.g., improvements in health or education) can influence other countries, demonstrating how the world is interconnected and how global development is possible.
Q: What role does data visualization play in Rosling’s presentation?
A: Data visualization makes it easier for the audience to understand complex trends and see the changes over time. It helps turn raw numbers into an engaging story and brings clarity to the data.
Q: What does Rosling believe data can help do?
A: Data can help correct misconceptions, inspire optimism, and promote a more accurate understanding of global progress and development.
Q: What question does Rosling ask the audience during the talk?
A: He asks the audience to guess the life expectancy and income of countries at different points in history to challenge their assumptions and engage them in the topic.
Q: What is one key takeaway about the future of global development?
A: Challenges still exist, but global development has made significant progress, and with continued effort that progress can continue.
Q: How does data shape people’s worldview?
A: Properly presented data can change how people perceive the world. It can break stereotypes, correct false assumptions, and give a more hopeful, nuanced view of global development.
Q: What is the significance of data science in the context of Rosling’s talk?
A: Data science allows us to analyze and visualize data to uncover patterns, trends, and insights that can help people make informed decisions and understand global issues better.
Q: What is the main difference between data science, hacking, and statistics?
A: Data science combines practical tool knowledge (like hacking) with theoretical understanding (like statistics). Hacking on its own focuses on quick coding, and statistics on its own focuses on mathematical modeling without the coding.
Q: What is Drew Conway’s Venn diagram used to explain?
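A: It explains the hybrid nature of data science as the overlap of three core areas: hacking skills, math and statistics knowledge, and substantive (domain) expertise. Data science sits at the intersection of all three; machine learning is the overlap of hacking skills and math/statistics alone.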
Q: What key skills are often associated with data scientists?
A: Statistics, data munging (parsing, scraping, formatting data), machine learning, and data visualization.
Q: Why is there debate over whether data science is just a rebranding of statistics?
A: Some argue that traditional statistics already covers much of what data science does, making the term “data science” a rebranding of established fields, while others see it as a necessary evolution due to new technological tools and needs.
Q: What is the role of a social scientist in data science?
A: Social scientists are valuable in data science, especially when analyzing human behavior or solving problems related to social phenomena. Their skills in asking questions and understanding context complement data analysis.
Q: What makes data science a team effort?
A: Data science requires a wide range of skills (programming, statistics, communication, etc.), making it impractical for one person to master everything. Teams with varied expertise work best.
Q: Why is “data scientist” a job title mostly found in industry, not academia?
A: The role of a data scientist emerged in tech companies (like LinkedIn and Facebook) to tackle complex data problems, but it hasn’t yet become an official academic title, though it may evolve in the future.
Q: What is the primary role of a data scientist in industry?
A: Data scientists in industry extract meaning from data, clean and transform it, build models, perform exploratory data analysis, and communicate insights to decision-makers. They bridge the gap between technical analysis and practical application.
Q: What are the essential components of a data science team?
A: A data science team should include members with expertise in statistics, machine learning, computer science, data visualization, and domain knowledge to tackle diverse aspects of data problems.
Q: What is the difference between a “data scientist” and a “data researcher”?
A: A data scientist typically works with real-time, operational data and focuses on practical solutions, while a data researcher may focus more on theoretical aspects and data exploration in an academic setting.
Q: What are some of the challenges data scientists face when working with data?
A: Data scientists often deal with messy, incomplete, or inconsistent data. Cleaning and transforming data, as well as debugging code, require patience and technical skills.
Q: How can you define data science, based on its usage?
A: Data science can be defined by its application: extracting insights from data using a mix of statistical, computational, and domain-specific knowledge to solve real-world problems.
Q: What is the importance of exploratory data analysis (EDA)?
A: EDA is crucial for understanding the data, finding patterns, and identifying anomalies or trends, which informs decision-making and model building.
Q: How does the media influence the perception of data science?
A: Media often romanticizes data science, labeling it as “the sexiest job” or overhyping the field, which leads to misconceptions about the role and skillset of a data scientist.
A type of study where scientists make conclusions based on data they observed, but had no hand in generating.
Observational study
This type of trial lets us isolate one variable in order to establish causality and avoid interference from other variables.
Randomized controlled trial (RCT)
A factor other than the one being studied which may be associated with the outcome.
Confounding variable (confounder)
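The two cards above fit together: random assignment is what keeps a confounder from lining up with the treatment. The snippet below is a rough sketch of that idea using simulated participants. The ages, the group sizes, and the fixed seed are assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical participants; age stands in for a possible confounder.
participants = [{"id": i, "age": rng.randint(20, 70)} for i in range(1000)]

# Random assignment: shuffle the list, then split it in half.
shuffled = participants[:]
rng.shuffle(shuffled)
treatment = shuffled[:500]
control = shuffled[500:]

# Because chance alone decided the groups, the average age should be
# close in both, so age is unlikely to explain a difference in outcomes.
treatment_age = mean(p["age"] for p in treatment)
control_age = mean(p["age"] for p in control)
print(f"treatment mean age: {treatment_age:.1f}")
print(f"control mean age:   {control_age:.1f}")
```

In an observational study nobody controls who ends up in which group, so the groups can differ in age (or anything else) and that difference can be mistaken for a treatment effect.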
Known as the father of epidemiology for his brilliant cholera experiment.
John Snow
In an RCT, when neither the experiment designers nor the participants know who receives the treatment.
Double-blind RCT
This type of plot best looks at the distribution of one quantitative variable.
Histogram
What is plotted on the vertical axis of a histogram?
Frequency (the count of observations in each bin)
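To make the two histogram cards concrete, here is a small sketch that bins one quantitative variable and counts how many values land in each bin. Those counts are what the vertical axis shows. The ages and the bin width of 10 are made-up values, and `histogram_counts` is a hypothetical helper, not a library function.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Each bin is identified by its lower edge, e.g. 30 for the bin 30-39.
    """
    counts = Counter()
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up sample of one quantitative variable (ages in years).
ages = [21, 23, 25, 31, 34, 38, 39, 42, 47, 58]
counts = histogram_counts(ages, 10)

# Text version of the histogram: bins on one axis, counts on the other.
for bin_start, n in counts.items():
    print(f"{bin_start}-{bin_start + 9}: {'#' * n} ({n})")
```

The bar lengths show the shape of the distribution: most of these ages fall in the 30-39 bin.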
Data science is a combination of three disciplines: statistics, domain expertise, and ______.
Computer Science
A type of data collection where the sole purpose of an activity is data acquisition.
Brute Force