Data Science Methodology Flashcards
What is the PPDAC model
The PPDAC model is a structured approach used to carry out investigative research in general. It works well for data science due to its explorative and inquisitive nature
What does PPDAC stand for
Problem, Plan, Data, Analysis, Conclusion
What is the first step in PPDAC model
The first step is identifying the Problem or question that needs to be answered
What is the second step in PPDAC model
The second step is to create a Plan for how to approach the problem and gather data
What is the third step in PPDAC model
The third step is gathering and cleaning Data that is relevant to the problem being investigated
What is the fourth step in PPDAC model
The fourth step is conducting an Analysis of the data to draw insights and identify patterns
What is the final step in PPDAC model
The final step is drawing a Conclusion based on the analysis and using it to solve the original problem or answer the original question
What kinds of questions are asked to understand the problem
How much/many…, Which category does this belong to…, Are there similar kinds…, Is this strange…, What action should I take when…, …
What are some academic planning considerations in the PPDAC model
Identifying what data is needed, determining how much data will be needed and evaluating the solution
What are some practical considerations in the PPDAC model
Some practical considerations in the PPDAC model include determining where the data will come from and whether it needs to be collected, deciding where and how the data will be stored and considering any legal or ethical implications
What is the importance of identifying what data is needed in the PPDAC model
Identifying what data is needed is important in the PPDAC model because it helps ensure that the analysis is relevant and meaningful to the problem being investigated
Why is it important to consider legal and ethical implications in the PPDAC model
It is important to consider legal and ethical implications in the PPDAC model to ensure that the investigation is conducted in a responsible and lawful manner and that any potential harm or negative impact is minimized
What is the purpose of evaluating the solution in the PPDAC model
The purpose of evaluating the solution in the PPDAC model is to determine whether the solution effectively addresses the original problem or question and to identify any areas for improvement
What is the role of data storage in the PPDAC model
The role of data storage in the PPDAC model is to ensure that the data is easily accessible, organised, and secure throughout the investigation and analysis process
Why is it important to determine whether data needs to be collected in the PPDAC model
It is important to determine whether data needs to be collected in the PPDAC model to ensure that the investigation is conducted efficiently and effectively and that the analysis is based on relevant and accurate data
What are the steps involved in data processing in the PPDAC model
Obtaining the data, conducting quality checks, cleaning the data, and addressing any missing values
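The four processing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV contents, the column names (site, temp_c), the plausible range of -50 to 60, and the choice of mean imputation are all hypothetical and not prescribed by the PPDAC model.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: one row per observation; names and values are made up.
RAW_CSV = """site,temp_c
A,21.5
B,
C,19.0
D,999
A,21.5
"""

def obtain(text):
    """Obtain: read the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def quality_check(rows, low=-50.0, high=60.0):
    """Quality check: count missing, out-of-range, and duplicate rows."""
    seen = set()
    report = {"missing": 0, "out_of_range": 0, "duplicates": 0}
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        if row["temp_c"] == "":
            report["missing"] += 1
        elif not low <= float(row["temp_c"]) <= high:
            report["out_of_range"] += 1
    return report

def clean(rows, low=-50.0, high=60.0):
    """Clean: drop exact duplicates, convert types, blank out implausible values."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue
        seen.add(key)
        value = float(row["temp_c"]) if row["temp_c"] else None
        if value is not None and not low <= value <= high:
            value = None  # treat an implausible reading as missing
        cleaned.append({"site": row["site"], "temp_c": value})
    return cleaned

def fill_missing(rows):
    """Missing values: impute with the mean of observed values (one option of many)."""
    observed = [r["temp_c"] for r in rows if r["temp_c"] is not None]
    fallback = mean(observed)
    return [{**r, "temp_c": fallback if r["temp_c"] is None else r["temp_c"]} for r in rows]

rows = obtain(RAW_CSV)
report = quality_check(rows)
processed = fill_missing(clean(rows))
print(report)
print(processed)
```

Running the quality check before cleaning keeps a record of what was wrong with the raw data, which is useful when reporting how the data was prepared. Mean imputation is just one way to address missing values; dropping the affected rows is another.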
What is the role of data management in the PPDAC model
Determining how the data will be represented or stored, ensuring proper storage and maintenance of the data, and managing access rights to the data
What are some common ways that data is represented or stored in the PPDAC model
Tabular formats (2D table with each row as an observation and each column as a measurement), structured formats (each observation is represented by a dictionary of keys and values), and semi-structured formats (not all records are represented by the same keys)
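As a rough illustration, the three representations can be written out in Python using only built-in types. The field names and values here (name, height_cm, weight_kg, notes) are invented for the example.

```python
import json

# Tabular: each row is an observation, each column a measurement.
header = ["name", "height_cm", "weight_kg"]
table = [
    ["Ana", 170, 65],
    ["Ben", 182, 80],
]

# Structured: each observation is a dictionary with the same keys.
structured = [dict(zip(header, row)) for row in table]

# Semi-structured: records do not all share the same keys.
semi_structured = [
    {"name": "Ana", "height_cm": 170},
    {"name": "Ben", "weight_kg": 80, "notes": "self-reported"},
]

# Turning semi-structured records into a table means choosing the columns
# and deciding how to mark absent values (None is used here).
columns = sorted({key for record in semi_structured for key in record})
flattened = [[record.get(col) for col in columns] for record in semi_structured]

print(json.dumps(structured))
print(columns)
print(flattened)
```

The flattening step shows why semi-structured data needs extra care: any key missing from a record becomes a missing value once the data is forced into a table.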