Introduction to Data Science Part 1 Flashcards
What is Data Science?
The process of applying scientific tools such as programming and statistics to data in order to understand it and uncover insights.
What are the key objectives of Data Science?
Extract knowledge from data, uncover insights, and make informed decisions.
What are common terms associated with Data Science?
Big Data, Machine Learning, Artificial Intelligence, Data Mining, Predictive Analytics.
What are the types of data in Data Science?
Qualitative (descriptive data) and Quantitative (measurable values).
What are the three main data formats?
Structured, Unstructured, and Semi-structured data.
What are examples of Structured Data?
Relational databases, spreadsheets, and data tables.
What are examples of Unstructured Data?
Images, videos, social media posts, and PDFs.
What are examples of Semi-structured Data?
JSON, XML, and HTML documents.
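To illustrate what "semi-structured" means in practice, here is a minimal sketch that parses a small JSON array with Python's standard `json` module. The records and field names (`name`, `email`, `tags`) are invented for this example. The point is that each record labels its own fields, so fields can be missing or nested without a fixed table schema.

```python
import json

# Hypothetical records: each object carries its own keys, and the
# keys differ from record to record (no fixed table schema).
raw = """
[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Ben", "tags": ["admin", "beta"]}
]
"""

records = json.loads(raw)

# Collect every key that appears in at least one record.
all_keys = sorted({key for record in records for key in record})

# Missing fields have to be handled explicitly, here with a default value.
emails = [record.get("email", "unknown") for record in records]
```

In a relational table every row would have the same columns; here the second record simply has no `email` key, which is why the lookup uses `.get()` with a default.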
What are the major sources of data?
Web data, financial transactions, online trading, social networks, business records.
What is Big Data?
Data so large, fast-moving, or varied that it is expensive to manage and difficult to extract value from.
What are the 5 V’s of Big Data?
Volume, Velocity, Variety, Veracity, and Value.
What does ‘Volume’ refer to in Big Data?
The amount of data being generated.
What does ‘Velocity’ refer to in Big Data?
The speed at which data is generated and must be processed and analyzed.
What does ‘Variety’ refer to in Big Data?
Different types of data sources and formats.
What does ‘Veracity’ refer to in Big Data?
Data quality and reliability.
What does ‘Value’ refer to in Big Data?
The potential business benefits derived from analyzing data.
What is Machine Learning?
A field of AI that enables systems to learn and improve from experience without being explicitly programmed.
What are the three main types of Machine Learning?
Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
What is the goal of Supervised Learning?
To learn a mapping from inputs to outputs using labeled data.
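As a rough illustration of learning a mapping from labeled data, the sketch below fits a straight line by ordinary least squares in plain Python. The data (`hours_studied`, `exam_score`) and the helper names `fit_line` and `predict` are made up for this example; linear regression is just one of many supervised methods.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled training data (hypothetical): inputs paired with known outputs.
hours_studied = [1.0, 2.0, 3.0, 4.0]
exam_score = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours_studied, exam_score)


def predict(x):
    """Apply the learned mapping to a new, unseen input."""
    return slope * x + intercept
```

For this toy data the fitted line is `y = 2x + 50`, so `predict(5.0)` gives a score of about 60 for an input that was not in the training set.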
What is the goal of Unsupervised Learning?
To find patterns or structure in data without labeled responses.
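One way to picture finding structure without labels is clustering. The sketch below runs a basic k-means loop with two clusters on one-dimensional data. The function name `two_means`, the sample values, and the choice of starting centers are assumptions made for this example, and it expects at least two distinct values in the input.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop.

    Assumes `values` contains at least two distinct numbers.
    """
    # Start the two centers at the smallest and largest value.
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its nearest center (ties go to cluster 0).
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters


# Unlabeled data (hypothetical): no "correct answer" is attached to any value.
data = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = two_means(data)
```

No labels are supplied anywhere; the two groups (values near 1.5 and values near 9.5) emerge from the data alone.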
What is Reinforcement Learning?
A type of learning where an agent interacts with an environment to maximize cumulative reward.
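The agent-environment loop can be sketched with a two-armed bandit, about the simplest reinforcement learning setting. This is an illustrative epsilon-greedy agent, not a full RL algorithm: the function `run_bandit`, its parameters, and the fixed (non-random) rewards per arm are all simplifying assumptions for this example.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms pay fixed rewards."""
    rng = random.Random(seed)
    n_arms = len(rewards)
    estimates = [0.0] * n_arms  # agent's current estimate of each arm's reward
    counts = [0] * n_arms       # how often each arm has been pulled
    total_reward = 0.0

    for step in range(steps):
        if step < n_arms:
            arm = step                   # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rewards[arm]            # the environment's response
        total_reward += reward
        counts[arm] += 1
        # Incremental mean update of the estimate for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts, total_reward


# Hypothetical environment: arm 0 always pays 0.0, arm 1 always pays 1.0.
estimates, counts, total_reward = run_bandit(rewards=[0.0, 1.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which arm is better. It learns that from the rewards it receives, then mostly exploits the better arm while still exploring occasionally.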
What is AI (Artificial Intelligence)?
The simulation of human intelligence processes by machines, including learning, reasoning, and self-correction.
What is the difference between Data Science and Machine Learning?
Data Science is the broader discipline and aims to produce insights from data; Machine Learning is one of its tools and aims to produce predictions.
What are the main application areas of Data Science?
Industrial processes, business, text data, image data, and medical data applications.
What are some industrial applications of Data Science?
Fault prediction, preventive maintenance, demand forecasting, inventory management, price optimization.
What are some business applications of Data Science?
Market trend analysis, churn analysis, credit risk modeling.
What are some text data applications of Data Science?
Sentiment Analysis, Topic Modeling, Conversational AI.
What are some image data applications of Data Science?
Computer Vision, Machine Vision.
What are some medical applications of Data Science?
Disease diagnosis, patient data analysis, medical imaging analysis.
What is the CRISP-DM process?
The Cross-Industry Standard Process for Data Mining: a standard process model that organizes data mining projects into six phases, from Business Understanding through Deployment.
What are the six phases of CRISP-DM?
Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
What is the TDSP (Team Data Science Process)?
A methodology developed by Microsoft for structuring data science projects.
What are the key steps in a Data Science project?
Problem Definition, Data Collection, Data Processing, Model Building, Model Evaluation, and Deployment.
What key questions does Data Science aim to answer?
What is the problem? What data is needed? Where does data come from? How should data be processed? How should models be evaluated?
What are common ways to visualize data for insights?
Charts, graphs, heatmaps, scatter plots, histograms.
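To show what a histogram conveys, here is a minimal sketch that bins values and draws one text bar per bin using only the standard library. The function `text_histogram` and the sample `ages` are invented for this example, and it assumes integer values; in practice a plotting library would draw the chart.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text bar per non-empty bin for integer values."""
    # Map each value to the lower edge of its bin and count values per bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for low in sorted(bins):
        high = low + bin_width - 1
        lines.append(f"{low}-{high} | {'#' * bins[low]}")
    return lines


ages = [21, 25, 27, 34, 38, 41, 29, 33]  # hypothetical integer sample
histogram = text_histogram(ages, bin_width=10)
print("\n".join(histogram))
```

The bar lengths show at a glance that most of these values fall in the 20-29 range, which is the kind of distributional insight a histogram is meant to give.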
What are key qualities of a good Data Scientist?
Inquisitive and knowledgeable; proficient in machine learning, statistics, and probability; skilled in coding; and strong in domain knowledge.
What coding skills should a Data Scientist have?
Python, R, and SQL, plus Python libraries such as pandas and NumPy for data processing and scikit-learn for machine learning.
Why is domain knowledge important for a Data Scientist?
It helps in interpreting data correctly and drawing meaningful insights that are relevant to the industry.
What are some emerging research topics in Data Science?
Big Data Modeling, AI Ethics, Fairness in Machine Learning, Explainable AI, Edge Computing.
What are some challenges in Data Science?
Data privacy concerns, data bias, computational complexity, data storage and management.
What is the impact of Data Science in healthcare?
Predicting diseases, personalizing treatments, improving patient care, optimizing hospital operations.