5. AI System Development Life Cycle Flashcards
What are the stages of the AI system development life cycle?
- Planning
- Design
- Development
- Implementation
What should be considered in the planning phase of the AI system development life cycle?
Business objectives and requirements (successfully implementing an AI system will be difficult without first identifying the business problem).
What are the main business problems that may exist in the AI system development planning phase?
- Classification: A problem that requires using an AI system to classify data into different types
- Regression: A problem that requires using an AI system to predict what an organization should do in the future based on past data
- Recommendation: A problem that requires using an AI system to make a recommendation; e.g., viewer recommendations and product recommendations
What should be considered in the AI system development planning phase?
Focus on organizational mission and gap identification.
What questions should be asked about data in the AI system development lifecycle?
- Do you have the right data to make your AI system usable?
* AI systems are all about data
* If you do not have the right data, enough data, or accurate data, the system will not be usable or will not perform well
- What type of data is accessible to you and usable?
* Do you readily have access to data that is usable?
- Do you need to look for new data?
How do you determine the scope of an AI project?
Prioritize the business problems to determine which use cases to do first.
Focus on three qualities:
- Impact of use of an AI system for the particular problem
* How big of an impact will it have?
* Will it solve a bigger problem or a smaller problem?
* What is it going to take to do that?
- Effort
* What types of resources do you need available to implement the AI system?
* How long is it going to take?
- Fit to prioritize the use case and business case
* How well does the use of an AI system fit with the goals of the organization and the identified business problem?
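The three qualities above can be turned into a rough ranking. The following is a minimal sketch, not a prescribed method: the 1-to-5 rating scale, the `UseCase` class, the `priority_score` formula (impact times fit, divided by effort), and the sample ratings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int  # 1 (low) to 5 (high): how big a problem it solves
    effort: int  # 1 (low) to 5 (high): resources and time required
    fit: int     # 1 (low) to 5 (high): alignment with organizational goals


def priority_score(use_case: UseCase) -> float:
    """Higher impact and fit raise the score; higher effort lowers it."""
    return (use_case.impact * use_case.fit) / use_case.effort


def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Return use cases ordered from highest to lowest priority score."""
    return sorted(use_cases, key=priority_score, reverse=True)


# Hypothetical candidates with made-up ratings, for illustration only.
candidates = [
    UseCase("support ticket classification", impact=3, effort=3, fit=3),
    UseCase("sales forecasting", impact=5, effort=4, fit=4),
    UseCase("product recommendations", impact=4, effort=2, fit=5),
]

ranked = prioritize(candidates)
for use_case in ranked:
    print(f"{use_case.name}: {priority_score(use_case):.2f}")
```

Any scoring formula like this only supports the discussion; the ratings themselves still come from the people who understand the business problem.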
What is the best way to determine the governance structure for an AI project?
- Identify who has responsibilities for maintaining and implementing the AI governance structure
* Who writes the AI policies and procedures?
* Who oversees development and testing or selecting the AI system product?
* These decisions should be documented
- Identify an executive within the organization to be the champion for development and implementation of the AI system
* Increases the impact
* Helps get other stakeholders to support the total effort
Describe the design phase of the AI system development life cycle:
The design phase includes implementing a data strategy, including data gathering and data collection
* Data is critical for an AI system
* Right data is required for the AI system to work well
What are the data gathering considerations for the design phase of the AI system development life cycle?
- Information systems development, in general, is concerned with data quality (“Garbage in, garbage out”: If you have bad data going into a system, you will end up with bad results coming out)
- Examine the quality of the data going into the AI design and the overall system and model
Describe data formats used for AI development:
- Structured and unstructured
* Structured or labeled data is usually data that can go into a spreadsheet with rows and categories
* Unstructured or unlabeled/uncategorized data may need to be structured before it can be put into a model (e.g., a large data set that is just a collection of images)
- Static and streaming
* Static data does not change (e.g., historical data such as records of past sales)
* Streaming data will change (e.g., data about customers visiting a website that changes every time they visit)
What is data wrangling?
It involves taking raw data and converting it into usable information. Most raw data is not usable as is; it must be formatted a certain way before the system can use it.
It is an important step to ensure good output.
It is time consuming (about 80% of the AI system development life cycle).
What are the 5 V’s of data preparation?
- Volume
* How much data do you have?
* How large is the data set or data sets that you’re going to be using? This is necessary to understand how much preparation you’re going to need to do
- Velocity
* How often does it get updated?
* Does it regularly change?
- Variety
* What type of data is it?
* Is it structured, unstructured or another type of data?
- Veracity
* How accurate is it?
* How trustworthy is it?
* Did you get it from a source that you know is reliable, so you don’t have to worry that the data might not be correct?
- Value
* What is the outcome that you want from the use of the AI system?
* Will the data get you there?
* Is it the right data to use?
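Some of these questions can be partly answered by profiling a snapshot of the data. The sketch below is only an illustration under assumptions: the `records` sample and its field names are hypothetical, and the share of missing values is used as a rough stand-in for veracity. Velocity and value cannot be read from a single snapshot and are left out.

```python
from collections import defaultdict

# Hypothetical records; None marks a missing value.
records = [
    {"customer_id": 1, "region": "east", "monthly_spend": 120.5},
    {"customer_id": 2, "region": None, "monthly_spend": 80.0},
    {"customer_id": 3, "region": "west", "monthly_spend": None},
    {"customer_id": 4, "region": "west", "monthly_spend": 45.25},
]


def profile(rows):
    """Summarize volume, variety, and a rough veracity signal per field."""
    types_seen = defaultdict(set)
    missing = defaultdict(int)
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
            else:
                types_seen[field].add(type(value).__name__)
    fields = sorted({field for row in rows for field in row})
    count = len(rows)
    return {
        # Volume: how many records are there?
        "volume": count,
        # Variety: which value types appear in each field?
        "variety": {f: sorted(types_seen[f]) for f in fields},
        # Veracity (proxy): what share of each field is missing?
        "missing_share": {f: (missing[f] / count if count else 0.0) for f in fields},
    }


summary = profile(records)
print(summary)
```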
What are the steps of the data wrangling (data preparation) phase?
- Cleansing
* Remove erroneous or irrelevant data from the data sets
* Some of the data may not be needed for the AI system and should be eliminated
* Also remove inaccurate data
* If personal data is in the data sets and is not needed for the AI model, remove it so it will not cause privacy issues later
- Labeling includes tagging or annotating the data to identify what kind of data it is
- Anonymization
* One method for protecting privacy that involves removing identifiers from the data: name, SIN, phone number, address, or other PI that can identify an individual
* Completely anonymizing data is difficult because individuals can be identified in many ways, and combining data sets can potentially reidentify them
- Data minimization
* The concept that if you do not need the data for your specific application, you should not use it to train your model or use it as input
* Once again, for privacy, not including personal data will make the system more protective of the individual’s privacy
- Privacy-enhancing technologies (PETs)
* Differential privacy
* Federated learning
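The cleansing, anonymization, and data minimization steps above can be sketched as a small pipeline. This is a minimal illustration under assumptions: the field names, the validity rules in `is_valid`, and the `DIRECT_IDENTIFIERS` and `FIELDS_NEEDED_BY_MODEL` sets are hypothetical and would depend on the actual data and model.

```python
# Hypothetical raw records; field names are assumptions for illustration.
raw_records = [
    {"name": "A. Example", "phone": "555-0100", "age": 34, "region": "east", "purchases": 12},
    {"name": "B. Example", "phone": "555-0101", "age": -5, "region": "west", "purchases": 3},
    {"name": "C. Example", "phone": "555-0102", "age": 51, "region": "west", "purchases": None},
    {"name": "D. Example", "phone": "555-0103", "age": 28, "region": "east", "purchases": 7},
]

DIRECT_IDENTIFIERS = {"name", "phone"}          # removed during anonymization
FIELDS_NEEDED_BY_MODEL = {"age", "purchases"}   # data minimization allow-list


def is_valid(record):
    """Cleansing rule: reject missing or clearly erroneous values."""
    age, purchases = record.get("age"), record.get("purchases")
    return (
        isinstance(age, int) and 0 <= age <= 120
        and isinstance(purchases, int) and purchases >= 0
    )


def remove_identifiers(record):
    """Anonymization step: drop fields that directly identify a person."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def minimize(record):
    """Data minimization step: keep only the fields the model needs."""
    return {k: v for k, v in record.items() if k in FIELDS_NEEDED_BY_MODEL}


def prepare(records):
    cleansed = [r for r in records if is_valid(r)]
    return [minimize(remove_identifiers(r)) for r in cleansed]


prepared = prepare(raw_records)
print(prepared)
```

Removing direct identifiers alone does not guarantee anonymity. Remaining fields such as age can still help reidentify someone when combined with other data sets, which is why minimization is applied as well.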
Describe privacy-enhancing technologies (PETs):
- Differential privacy
* Blurs the data by using an algorithm that keeps the data meaningful but makes it nonspecific
* Individuals are unidentifiable, but the data is still usable
- Federated learning
* A relatively new machine learning method for training models that does not require sharing sensitive data among different locations
* The global model is in a central location; e.g., the cloud
* Different locations download the global model and train it on their own local data
* Only the updates of the local model, not the training data itself, are sent to the central location, where they are aggregated into the global model
* The process is iterated until the global model is fully trained
* A potential way to solve problems such as diagnosing a new illness by using data from different locations that may have seen symptoms of the illness
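Both techniques can be illustrated with a short sketch. The choices here are assumptions rather than anything the flashcards prescribe: differential privacy is shown with the Laplace mechanism applied to a counting query, and federated learning is shown as weighted averaging of locally trained weights for a one-parameter model `y = weight * x`. The data, learning rate, and number of rounds are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng):
    """Laplace mechanism for a counting query, whose sensitivity is 1."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


def local_update(weight, data, learning_rate=0.01, steps=20):
    """Train y = weight * x on one site's local data with gradient descent."""
    for _ in range(steps):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


def federated_round(global_weight, sites):
    """One round: each site trains locally; only weights are sent back."""
    updates = [(local_update(global_weight, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    # Aggregate by averaging, weighted by each site's number of examples.
    return sum(w * n for w, n in updates) / total


# Hypothetical data: three sites whose local data follow y = 3x.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

global_weight = 0.0
for _ in range(30):
    global_weight = federated_round(global_weight, sites)
print(round(global_weight, 3))

# Hypothetical ages; the noisy count hides any one individual's contribution.
rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38]
print(private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng))
```

In the federated part, the raw `(x, y)` pairs never leave `local_update`; the central step sees only weights and example counts. In the differential privacy part, a smaller `epsilon` adds more noise, trading accuracy for stronger privacy.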
How is the AI system architecture determined?
When selecting the model, choose an algorithm according to the desired level of accuracy and interpretability.
Questions:
* What do you want to learn from the data?
* How is it going to help you solve your problem?
* What are the other requirements and constraints?
Examples:
* Do you have a time constraint for completing the model? How does that impact the available training time?
* Are additional efforts needed to ensure the data is completely accurate?
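One way to make these trade-offs explicit is to filter candidate algorithms by the stated requirements and constraints. The sketch below is illustrative only: the `Candidate` class, the `select_model` helper, and every number in `candidates` are placeholder assumptions, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expected_accuracy: float  # estimated on a validation set, 0 to 1
    interpretable: bool       # can its decisions be explained to stakeholders?
    training_hours: float     # estimated training time


def select_model(candidates, min_accuracy, need_interpretable, max_training_hours):
    """Return the most accurate candidate that meets every constraint, or None."""
    eligible = [
        c for c in candidates
        if c.expected_accuracy >= min_accuracy
        and (c.interpretable or not need_interpretable)
        and c.training_hours <= max_training_hours
    ]
    return max(eligible, key=lambda c: c.expected_accuracy, default=None)


# Placeholder estimates, not benchmark results.
candidates = [
    Candidate("decision tree", 0.84, True, 0.5),
    Candidate("gradient-boosted trees", 0.91, False, 3.0),
    Candidate("deep neural network", 0.93, False, 40.0),
]

choice = select_model(
    candidates, min_accuracy=0.80, need_interpretable=True, max_training_hours=8
)
print(choice.name if choice else "no candidate meets the constraints")
```

Relaxing the interpretability requirement or extending the time budget changes which candidates remain eligible, which mirrors the questions listed above.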