Section 2 Flashcards
_____ focus on the benefits and implications of findings, while _____ focus on the business impact, risks, and return on investment
Business users, project sponsors
A situation in which the inputs to the model are outside the range it was trained on, potentially causing inaccurate or invalid outputs
Out-of-bounds operation
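One way to picture an out-of-bounds operation is a check that compares each incoming value against the range seen during training. The sketch below is only an illustration: the function name `find_out_of_bounds`, the feature names, and the sample values are all made up for this card and are not from the course material.

```python
def find_out_of_bounds(training_rows, new_row):
    """Return the features of new_row whose values fall outside the training range."""
    flagged = {}
    for feature, value in new_row.items():
        # Range of values the model saw for this feature during training.
        seen = [row[feature] for row in training_rows]
        low, high = min(seen), max(seen)
        if not low <= value <= high:
            flagged[feature] = (value, low, high)
    return flagged


# Hypothetical training data and a new input (illustrative values only).
training_rows = [
    {"age": 25, "income": 40_000},
    {"age": 40, "income": 85_000},
    {"age": 61, "income": 120_000},
]
new_row = {"age": 19, "income": 90_000}

out_of_bounds = find_out_of_bounds(training_rows, new_row)
print(out_of_bounds)  # {'age': (19, 25, 61)}
```

Here the age of 19 is below the smallest age in the training data (25), so a prediction for this input may be inaccurate or invalid.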
The system where the model is deployed and integrated with existing business processes, as opposed to a sandbox or testing environment
Production environment
A small-scale deployment of the model in a live setting, allowing the data science team to manage risk, evaluate performance, and make adjustments before a full-scale deployment
Pilot project
What is data? What is information?
Data is the raw material used by analysts, while information refers to processed or organized data
What order does the data analytics lifecycle follow?
Discovery phase, Data preparation phase, Model planning phase, Model execution phase, Communicate results phase, Operationalize phase
The data analytics team familiarizes themselves with the business domain, examines relevant historical data, and assesses available resources. It also involves framing the business problem as an analytics challenge and formulating initial hypotheses to test and explore the data
Discovery phase
Requires the establishment of an analytic sandbox where the team can work with data and perform analytics throughout the project
Data preparation
The team determines the methods, techniques, and workflow to be used during the subsequent model building phase
Model planning
The team develops datasets for testing, training, and production purposes, builds and executes models based on the planning phase, and evaluates the need for more robust tools or environments for executing models and workflows
Model execution
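As a rough illustration of developing separate training and testing datasets, the sketch below shuffles a list of records and splits it. The function name `split_records`, the 80/20 split, and the fixed seed are assumptions chosen for this example, not something the course prescribes.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = list(records)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder records standing in for real rows of data.
records = list(range(10))
train, test = split_records(records)
print(len(train), len(test))  # 8 2
```

The model is built on the training set and evaluated on the testing set, which it has not seen before.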
Involves determining the project’s success or failure based on the criteria developed in the discovery phase. The team identifies key findings, quantifies the business value, and develops a narrative to summarize and communicate the results to stakeholders
Communicate results
The team delivers reports, briefings, code, and technical documents. A pilot project may be implemented to test the models in a production environment, ensuring that the results are framed effectively and demonstrate clear value to stakeholders
Operationalization
Refers to the vast amount of information collected, stored and analyzed by businesses and organizations; its unique aspects can differ between organizations and include up to 7 characteristics; however, for this course, we will focus on the main 4: variety, velocity, veracity, and volume
Big data
The diverse types of data, including structured, semi-structured, and unstructured formats; big data comes from numerous sources
Variety
The speed at which data is produced, collected and processed; in the context of big data, velocity refers to the need for quick analysis and decision-making based on the data gathered
Velocity
The accuracy, reliability and quality of the data collected and analyzed; ensuring data _____ is essential for gaining valuable insights and making informed decisions
Veracity
The sheer amount of data generated and handled by businesses; big data involves dealing with enormous quantities of data ranging from terabytes to petabytes and beyond, which can be challenging in terms of storage and processing
Volume