Lecture 1 (Notes) Flashcards
Define Data Science.
Data Science is an emerging field that draws on computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
What is a Knowledge Base?
A Knowledge Base is a collection of entities, facts, and relationships that conforms to a certain data model, helping machines understand humans, languages, and the world.
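One common way to picture such a data model is as a set of (subject, predicate, object) triples. The sketch below is only an illustration under that assumption: the facts, the predicate names, and the helpers `build_index` and `query` are all hypothetical and do not come from the lecture.

```python
from collections import defaultdict

# Hypothetical facts, stored as (subject, predicate, object) triples.
FACTS = [
    ("Ada Lovelace", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
    ("Ada Lovelace", "occupation", "Mathematician"),
]


def build_index(facts):
    """Index triples by subject so lookups do not scan the whole list."""
    index = defaultdict(list)
    for subject, predicate, obj in facts:
        index[subject].append((predicate, obj))
    return index


def query(index, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]


index = build_index(FACTS)
birthplace = query(index, "Ada Lovelace", "born_in")
```

Here the entities are the subjects and objects, the relationships are the predicates, and each triple is one fact. A real knowledge base would add a schema and far more data, but the lookup idea is the same.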
What are the primary sources of Big Data?
Big Data sources include online activities (clicks, impressions), the Internet of Things (machine-to-machine interactions, such as smart homes), scientific computing (genomic data), and user-generated content (social networks, reviews).
What are the 5 V’s of Big Data?
Volume (sheer size), Velocity (rate of change), Variety (types of data), Veracity (data quality), and Value (usefulness for decision-making).
How does Data Science contrast with Business Intelligence?
Business Intelligence queries the past, focusing on what has already happened. Data Science queries the past, present, and future, making predictions and suggesting actions.
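The contrast can be made concrete with a toy example. In this sketch, which assumes made-up monthly sales figures and uses a simple least-squares line as a stand-in for a predictive model, `total_sales` answers a BI-style question about the past and `forecast_next` answers a Data Science-style question about the future. Both function names are hypothetical.

```python
# Hypothetical monthly sales figures (units sold), oldest first.
sales = [100.0, 110.0, 120.0, 130.0]


def total_sales(history):
    """BI-style question: what happened? Summarize the recorded past."""
    return sum(history)


def forecast_next(history):
    """DS-style question: what will happen next?

    Fit a least-squares line through (month index, sales) and
    extrapolate one step past the last observed month.
    """
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


past_total = total_sales(sales)
next_month = forecast_next(sales)
```

With this perfectly linear series the forecast simply continues the trend. Real data would call for a model chosen and validated against held-out observations.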
Describe the Data Science Pipeline Process Model.
The pipeline consists of the stages Discover, Wrangle, Profile, Model, Evaluate, Visualize, and Report, followed by Iterate: repeating the stages to improve the results based on feedback.
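The stages can be sketched as a chain of small functions. This is a minimal illustration, not the lecture's implementation: the readings are made up, the "model" is a trivial mean baseline, the Visualize stage is folded into the text report, and Iterate is left as a comment.

```python
from statistics import mean


def discover():
    """Discover: obtain raw data (hard-coded, hypothetical readings here)."""
    return [3.0, None, 5.0, 4.0, None, 6.0]


def wrangle(raw):
    """Wrangle: clean the data; here, drop missing values."""
    return [x for x in raw if x is not None]


def profile(data):
    """Profile: summarize the cleaned data."""
    return {"count": len(data), "min": min(data), "max": max(data)}


def model(data):
    """Model: a trivial baseline that always predicts the mean."""
    return mean(data)


def evaluate(prediction, data):
    """Evaluate: mean absolute error of the baseline prediction."""
    return mean(abs(x - prediction) for x in data)


def report(summary, prediction, error):
    """Report: format the results for a reader (stands in for Visualize)."""
    return f"n={summary['count']} prediction={prediction:.2f} error={error:.2f}"


def run_pipeline():
    raw = discover()
    data = wrangle(raw)
    summary = profile(data)
    prediction = model(data)
    error = evaluate(prediction, data)
    # Iterate: in practice, feedback on the report would lead back to
    # an earlier stage (more data, better cleaning, a different model).
    return report(summary, prediction, error)


result = run_pipeline()
```

Keeping each stage as its own function makes it easy to replace one stage, for example a better model, without touching the others.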
What are the challenges in Data Science?
Challenges include preparing data (dealing with noise, diversity, incompleteness), analyzing data (ensuring scalability and accuracy), representing analysis results effectively, and workflow management.
What skills are essential for a Data Scientist?
Skills include Data Management (collection, storage, cleaning), Large-scale Parallel Data Processing, Statistics and Machine Learning, and Interface and Data Visualization.