5. AI System Development Life Cycle Flashcards

1
Q

What are the stages of the AI system development life cycle?

A
  1. Planning
  2. Design
  3. Development
  4. Implementation
2
Q

What should be considered in the planning phase of the AI system development life cycle?

A

Business objectives and requirements (successfully implementing an AI system will be difficult without first identifying the business problem).

3
Q

What are the main types of business problems that may be identified in the planning phase of the AI system development life cycle?

A
  1. Classification: A problem that requires using an AI system to classify data into different categories
  2. Regression: A problem that requires using an AI system to predict a numeric value (e.g., future sales) based on past data
  3. Recommendation: A problem that requires using an AI system to make a recommendation (e.g., viewer recommendations and product recommendations)
4
Q

What should be considered in the AI system development planning phase?

A

Focus on organizational mission and gap identification.

5
Q

What questions should be asked about data in the AI system development life cycle?

A
  1. Do you have the right data to make your AI system usable?
    * AI systems are all about data
    * If you don’t have the right data, enough data, or accurate data, the system will not be usable or will not perform well
  2. What type of data is accessible to you and usable?
    * Do you have ready access to usable data?
  3. Do you need to look for new data?
6
Q

How do you determine the scope of an AI project?

A

Prioritize the business problems to determine which use cases to do first.

Focus on three qualities:

  1. Impact of using an AI system for the particular problem
    * How big an impact will it have?
    * Will it solve a bigger problem or a smaller problem?
    * What is it going to take to do that?
  2. Effort
    * What types of resources do you need to implement the AI system?
    * How long is it going to take?
  3. Fit with the organization’s goals and the business case
    * How well does the use of an AI system fit with the goals of the organization and the identified business problem?
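One way to turn the impact, effort and fit questions into a ranked list of use cases is a simple scoring sheet. The Python sketch below assumes a hypothetical 1 to 5 rating scale and a hypothetical scoring rule (impact + fit - effort); an organization would choose its own scales and weights.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int  # 1 (low) to 5 (high): how big a problem it solves
    effort: int  # 1 (low) to 5 (high): resources and time required
    fit: int     # 1 (poor) to 5 (strong): alignment with organizational goals

    def priority(self) -> int:
        # Hypothetical rule: reward impact and fit, penalize effort.
        return self.impact + self.fit - self.effort


def prioritize(use_cases):
    """Return use cases ordered from highest to lowest priority score."""
    return sorted(use_cases, key=lambda uc: uc.priority(), reverse=True)


backlog = [
    UseCase("support chatbot", impact=3, effort=4, fit=3),
    UseCase("fraud detection", impact=5, effort=3, fit=5),
    UseCase("demand forecast", impact=4, effort=2, fit=4),
]
print([uc.name for uc in prioritize(backlog)])
# ['fraud detection', 'demand forecast', 'support chatbot']
```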
7
Q

What is the best way to determine the governance structure for an AI project?

A
  1. Identify who is responsible for maintaining and implementing the AI governance structure
    * Who writes the AI policies and procedures?
    * Who oversees developing and testing the AI system, or selecting an AI product?
    * These decisions should be documented
  2. Identify an executive within the organization to champion the development and implementation of the AI system
    * Increases the impact of the initiative
    * Helps get other stakeholders to support the overall effort
8
Q

Describe the design phase of the AI system development life cycle:

A

The design phase includes implementing a data strategy, including data gathering and collection.
* Data is critical for an AI system
* The right data is required for the AI system to work well

9
Q

What are the data gathering considerations for the design phase of the AI system development life cycle?

A
  • Information systems development, in general, is concerned with data quality (“Garbage in, garbage out”: If you have bad data going into a system, you will end up with bad results coming out)
  • Examine the quality of the data going into the AI design and the overall system and model
10
Q

Describe data formats used for AI development:

A
  1. Structured and unstructured
    * Structured data is organized in a predefined format, such as a spreadsheet with rows and columns
    * Unstructured data has no predefined format (e.g., a large data set that is just a collection of images) and may need to be structured or labeled before it can be put into a model
  2. Static and streaming
    * Static data does not change (e.g., historical data such as records of past sales)
    * Streaming data arrives continuously and changes over time (e.g., data about customers visiting a website, which is updated every time they visit)
11
Q

What is data wrangling?

A

It involves taking raw data and converting it into valuable information. Most raw data is not usable as is; it needs to be formatted a certain way before the system can use it.

It is an important step to ensure good output.

It is time-consuming (it can take up about 80% of the AI system development life cycle).

12
Q

What are the 5 V’s of data preparation?

A
  1. Volume
    * How much data do you have?
    * How large are the data sets you’re going to use? Knowing this tells you how much preparation you will need to do
  2. Velocity
    * How often does it get updated?
    * Does it regularly change?
  3. Variety
    * What type of data is it?
    * Is it structured, unstructured or another type of data?
  4. Veracity
    * How accurate is it?
    * How trustworthy is it?
    * Did you get it from a source you know is reliable, so you don’t have to worry that the data might be incorrect?
  5. Value
    * What is the outcome that you want from the use of the AI system?
    * Will the data get you there?
    * Is it the right data to use?
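Some of the V questions can be answered partly by profiling the data. The Python sketch below is a hypothetical `profile` helper over a list of dictionaries: it reports volume as a row count, variety as the value types seen per field, and a completeness ratio as one rough veracity signal. Velocity and value depend on how the data is sourced and what the business wants, so they are not computed here.

```python
from collections import defaultdict


def profile(records):
    """Summarize list-of-dict records along some of the 5 V's (hypothetical helper)."""
    types_per_field = defaultdict(set)
    missing = total = 0
    for record in records:
        for field, value in record.items():
            total += 1
            if value is None:
                missing += 1  # missing values lower trust in the data
            else:
                types_per_field[field].add(type(value).__name__)
    return {
        "volume": len(records),                                          # how much data
        "variety": {f: sorted(t) for f, t in types_per_field.items()},  # what types
        "completeness": 1 - missing / total if total else 0.0,          # rough veracity signal
    }


sample = [
    {"age": 30, "city": "Paris"},
    {"age": None, "city": "Lyon"},
    {"age": "41", "city": "Nice"},  # a number stored as text: a mixed-type field
]
print(profile(sample))
```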
13
Q

What are the steps of the data wrangling (data preparation) phase?

A
  1. Cleansing
    * Remove erroneous or irrelevant data from the data sets
    * Some of the data may not be needed for the AI system and should be eliminated
    * Also remove inaccurate data
    * If personal data is in the data sets and is not needed for the AI model, remove it so it will not cause privacy issues later
  2. Labeling
    * Tagging or annotating the data to identify what kind of data it is
  3. Anonymization
    * One method for protecting privacy; it involves removing identifiers from the data, such as name, SIN, phone number, address, or other personal information (PI) that can identify an individual
    * Completely anonymizing data is difficult because individuals can be identified in many ways, and combining data sets can potentially re-identify them
  4. Data Minimization
    * The concept that if you do not need the data for your specific application, you should not use it to train your model or use it as input
    * Once again, for privacy, not including personal data will make the system more protective of the individual’s privacy
  5. Privacy-enhancing technologies (PETs)
    * Differential privacy
    * Federated learning
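The cleansing, labeling, anonymization and minimization steps can be sketched in a few lines of Python. The field names (`age`, `income`, `defaulted`) and the plausibility rule are hypothetical, and `defaulted` plays the role of the label. Keeping only the fields the model needs also drops direct identifiers, but as noted above this alone does not guarantee anonymity, because combining data sets can still re-identify people.

```python
# Hypothetical schema: the only fields this model needs, with "defaulted" as the label.
REQUIRED_FIELDS = ("age", "income", "defaulted")


def prepare(records):
    """Minimize, cleanse and strip direct identifiers from raw records (illustrative only)."""
    prepared = []
    for record in records:
        # Data minimization: keep only required fields. Direct identifiers such as
        # name or phone number are dropped because they are not in REQUIRED_FIELDS.
        row = {field: record.get(field) for field in REQUIRED_FIELDS}
        # Cleansing: skip rows with missing values ...
        if any(value is None for value in row.values()):
            continue
        # ... or implausible values (hypothetical rule).
        if not 0 < row["age"] < 120 or row["income"] < 0:
            continue
        prepared.append(row)
    return prepared


raw = [
    {"name": "A. Smith", "phone": "555-0100", "age": 34, "income": 52000, "defaulted": False},
    {"name": "B. Jones", "age": -5, "income": 1000, "defaulted": True},   # erroneous age
    {"name": "C. Lee", "age": 50, "income": None, "defaulted": False},    # missing income
]
print(prepare(raw))
# [{'age': 34, 'income': 52000, 'defaulted': False}]
```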
14
Q

Describe privacy enhancing technologies (PETs):

A
  1. Differential privacy
    * Blurs the data by adding carefully calibrated random noise, so results remain statistically meaningful but do not reveal specifics about any one individual
    * Individuals become very hard to identify, but the data is still usable
  2. Federated learning
    * A machine learning method for training models that does not require sharing sensitive data among different locations
    * The global model is in a central location (e.g., the cloud)
    * Different locations download the global model and train it on their own local data
    * Only the updates to the local model, not the training data itself, are sent to the central location, where they are aggregated into the global model
    * The process is iterated until the global model is fully trained
    * A promising way to solve problems such as diagnosing a new illness, using data from different locations where symptoms of the illness may have been seen
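Both PETs can be illustrated with small standard-library Python sketches. For differential privacy, the example uses the Laplace mechanism on a counting query (one common mechanism among several); `epsilon` is the privacy budget, and smaller values add more noise. For federated learning, it uses federated averaging on a hypothetical one-parameter model y ≈ w * x: each site trains locally and sends back only its updated weight, which the server averages, weighted by each site's number of examples. Site names and data are hypothetical; these are teaching sketches, not hardened implementations.

```python
import random


# --- Differential privacy: Laplace mechanism for a counting query ---
def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng):
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the noise scale is 1 / epsilon.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# --- Federated learning: federated averaging for a model y ≈ w * x ---
def local_update(w, data, lr=0.01, epochs=5):
    # Gradient descent on mean squared error using only this site's data.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(sites, rounds=20, w=0.0):
    for _ in range(rounds):
        # Each site downloads the global weight and trains on its local data;
        # only the updated weight (never the raw data) is sent back.
        updates = [(local_update(w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        # The server aggregates the updates into the new global model.
        w = sum(w_site * n for w_site, n in updates) / total
    return w


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 71, 29]
print(private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng))  # true count 2, plus noise

hospital_a = [(1.0, 3.0), (2.0, 6.0)]               # hypothetical local data, y = 3x
hospital_b = [(3.0, 9.0), (4.0, 12.0), (1.0, 3.0)]
print(round(federated_averaging([hospital_a, hospital_b]), 3))  # approaches 3.0
```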
15
Q

How is the AI system architecture determined?

A

When selecting the model, choose an algorithm according to the desired level of accuracy and the interpretability of the model.

Questions:
* What do you want to learn from the data?
* How is it going to help you solve your problem?
* What are the other requirements and constraints?

Examples:
* Do you have a time constraint for completing the model? How does that impact the available training time?
* Are additional efforts needed to ensure the data is completely accurate?

16
Q

What does the AI system development phase consist of?

A

Building the model.

17
Q

How are the features of an AI model defined?

A
  1. Work with subject matter experts to select features
    * A feature is a specific measurable aspect or characteristic, such as height, color or substance
    * Feature engineering involves identifying the set of features most important for the analysis being done (e.g., in calculating a credit score, a person’s height is not important, but their age may be)
  2. Use the same features for training and testing the model to avoid inconsistencies between the two
  3. Avoid unnecessary features (they make testing more difficult and waste money and resources)
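Using the same features for training, testing and later serving is easiest when the feature logic lives in one function that every stage calls. The Python sketch below assumes hypothetical raw fields (`birth_year`, `monthly_income`, `monthly_debt`, `height_cm`), a hypothetical version tag and a fixed reference year; it derives two features and deliberately ignores height as an unnecessary feature.

```python
from dataclasses import dataclass

FEATURE_VERSION = "v1"  # hypothetical version tag, so training and serving can be matched
REFERENCE_YEAR = 2024   # hypothetical fixed year, keeps features reproducible


@dataclass(frozen=True)
class Features:
    age: int
    debt_to_income: float
    version: str = FEATURE_VERSION


def make_features(raw: dict) -> Features:
    """Single source of truth for feature logic, used for training, testing and serving."""
    # raw["height_cm"] is intentionally never read: it is an unnecessary feature here.
    income = raw["monthly_income"]
    return Features(
        age=REFERENCE_YEAR - raw["birth_year"],
        # Avoid division by zero for applicants reporting no income.
        debt_to_income=raw["monthly_debt"] / income if income > 0 else float("inf"),
    )


training_rows = [{"birth_year": 1990, "monthly_income": 4000, "monthly_debt": 1000, "height_cm": 180}]
serving_request = {"birth_year": 1990, "monthly_income": 4000, "monthly_debt": 1000, "height_cm": 165}

train_features = [make_features(r) for r in training_rows]
serve_features = make_features(serving_request)
print(train_features[0] == serve_features)  # True: same logic, same features
```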
18
Q

Describe the purposes of effective feature engineering:

A
  1. Improving model performance: Improving AI model or pipeline performance is the most important purpose
    * Data scientists attempt to derive and structure datasets so a model can optimally learn the relationships of a feature to targets
    * Goal: curating and creating a subset of features providing the greatest predictive power for an AI model
  2. Reducing computational costs
    * Decreasing the computational and storage costs of models and improving latency for training models and making predictions; costs fall because fewer computations are required
    * Computational effectiveness is improved through:
    - Reducing the number of features, and thus the amount of data, to process and store for training
    - Reducing the number of features and data in an API call
    - Ensuring the data is valuable and provides predictive power for a model, which increases its usefulness to users and value for the business
    - Write once, serve twice: well-written feature definitions that are versioned and tested can be mirrored for both training and serving
    - Snapshotting a model’s business logic and definitions for future users and developers
  3. Boosting model explainability
    * Model explainability/interpretability: the degree to which someone can consistently predict a model’s result; highly valuable and required in many AI use cases
    * Essential to help ensure fairness, privacy, reliability, robustness, causality and trust, especially where models can significantly affect users and society at large, directly or indirectly
19
Q

What is feature engineering?

A

Transforming data into useful representations (features).

20
Q

What is required for model training, testing and validation?

A

Representative subsets of your original dataset.

  1. Training data - Used to train the machine learning model
  2. Test data - Used to test the performance of the machine learning model

Both should include all types of data used in the original dataset or to be used in the final product
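One common way to keep both subsets representative is a stratified split: shuffle each class separately and send the same fraction of every class to the test set, so each type of record appears in both. A minimal Python sketch, assuming records are dictionaries with a hypothetical `label` key and a 20% test fraction:

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="label", test_fraction=0.2, seed=42):
    """Split records into train/test sets, preserving each label's share."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # Put at least one example of every label in the test set when possible.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


data = (
    [{"id": i, "label": "approved"} for i in range(10)]
    + [{"id": i, "label": "denied"} for i in range(10, 15)]
)
train, test = stratified_split(data)
print(len(train), len(test))  # 12 3
```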

21
Q

Describe model training, testing and validation:

A

Training:
* Train, test, evaluate and retrain different models to determine the best model to use
* Determine the best settings to achieve the desired outcome for your AI system (iterative: fine-tuning different models helps ensure the best possible outcome)

Testing and validation:
* Test models on relevant evaluation metrics to confirm consistent, expected performance
* Base the metrics on previously determined system requirements
* Develop metrics to evaluate whether those requirements were met
* Test on new data
* Helps to ensure your models generalize well and meet your overall business goals
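The train, evaluate and compare loop can be sketched as scoring every candidate on the same held-out test data and accepting the best one only if it meets a threshold derived from the requirements. In this Python sketch the candidate "models" are hypothetical stand-in functions and accuracy is the assumed metric; a real project would use whatever metrics its requirements call for.

```python
def accuracy(model, test_set):
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


def select_model(candidates, test_set, required_accuracy):
    """Evaluate all candidates on the same test data; return the best one that meets the bar."""
    scores = {name: accuracy(model, test_set) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    if scores[best] < required_accuracy:
        return None, scores  # no candidate meets the requirement: keep iterating
    return best, scores


# Hypothetical held-out data: (transaction amount in thousands, is_fraud)
test_set = [(1, False), (5, True), (7, True), (2, False)]
candidates = {
    "threshold_3": lambda amount: amount > 3,
    "always_fraud": lambda amount: True,
}
print(select_model(candidates, test_set, required_accuracy=0.9))
# ('threshold_3', {'threshold_3': 1.0, 'always_fraud': 0.5})
```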

22
Q

What are the requirements of the implementation phase?

A

Continuous monitoring for:

  1. Deviations in accuracy
  2. Irregular decisions
  3. Data drift that might affect the performance of the model
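Data drift is often monitored by comparing the distribution of an input feature in production against a baseline captured at training time. The Python sketch below uses the Population Stability Index (PSI), one common drift measure among several; the bin edges and data are hypothetical, and the frequently quoted rule of thumb (below 0.1 stable, above 0.25 significant drift) is a convention rather than a standard.

```python
import math


def proportions(sample, edges):
    """Share of the sample falling into each [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for value in sample:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break  # values outside all bins are ignored
    # A small floor avoids log(0) and division by zero for empty bins.
    return [max(c / len(sample), 1e-4) for c in counts]


def psi(baseline, production, edges):
    """Population Stability Index between a baseline sample and a production sample."""
    expected = proportions(baseline, edges)
    actual = proportions(production, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


edges = [0, 2, 4]
baseline = [1] * 50 + [3] * 50    # hypothetical training-time feature values
production = [1] * 10 + [3] * 90  # hypothetical live values: shifted upward
print(round(psi(baseline, baseline, edges), 3))    # 0.0, no drift
print(round(psi(baseline, production, edges), 3))  # about 0.879, well above 0.25
```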
23
Q

What does model deployment involve?

A

Transitioning the model from a development and testing environment to a real-world, operational setting where it is used for its intended purpose (e.g., making business predictions based on customer data).

24
Q

What are model deployment requirements?

A

Deployment requirements vary based on many factors, including model type and proposed use case. Key considerations include:

  1. Choosing a deployment environment (i.e., the infrastructure or platform that hosts the model)
  2. Packaging the model
  3. Making the model accessible for real-world use (also called exposing the model)

The best option for your organization depends on many factors, including budget, IT expertise and resources, the model’s purpose and computational needs, and the type of data the model processes.

25
Q

What are the three most popular deployment environments?

A
  1. Cloud-based: a third-party cloud provider hosts the model and handles infrastructure. This option is easy to scale up or down and reduces the need to invest in hardware; however, it may add latency and introduce security risks because a third party handles the data.
  2. On-premise: hosting the model on servers and hardware owned and managed by your organization. Offers greater control over deployment infrastructure (especially important if you handle sensitive data or are in a regulated sector); however, it may require a greater up-front investment in hardware compared to cloud-based deployment.
  3. Edge: hosting the model on “edge” devices like smartphones. This option may have decreased latency and greater data privacy; however, the model may be limited by the edge device’s hardware, thereby limiting the model’s computational power.
26
Q

What is a common option for packaging the model?

A

“Containerization,” which involves packaging the model and its dependencies (i.e., everything the model needs to run) into a self-contained unit. Containers can help reduce compatibility issues and make it easier to deploy the model in different environments (e.g., development, testing or production).

27
Q

What are the options for making the model accessible for real-world use?

A

Many options exist, including REST APIs and embedding into an application.
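As an illustration of exposing a model through a REST-style API, here is a minimal sketch using only Python's standard WSGI interface. The `/predict` route, the JSON payload shape and the `predict` function (a stand-in rule, not a trained model) are all hypothetical; a production service would also need authentication, stricter input validation and logging.

```python
import json


def predict(features):
    # Hypothetical stand-in for a trained model: flags large transactions.
    return {"flagged": features.get("amount", 0) > 1000}


def _json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI application exposing the model at POST /predict."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    except ValueError:  # covers a bad Content-Length and malformed JSON
        return _json_response(start_response, "400 Bad Request", {"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return _json_response(start_response, "400 Bad Request", {"error": "expected a JSON object"})
    return _json_response(start_response, "200 OK", predict(payload))
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves the app, after which a client can POST `{"amount": 2500}` to `http://127.0.0.1:8000/predict`.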

28
Q

What can occur after AI model implementation?

A

Changes to the model:
* Over time, the model’s performance could change as the data changes
* Because of the complexity of the environment in which the model is implemented and the potential for data to change as the model is used, monitor and maintain the model to avoid model drift
* Continue to iterate on the model to improve performance as the data changes
* Define a baseline against which to measure future iterations of the model
* AI systems potentially require more attention than other types of systems
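Defining a baseline and monitoring against it can be sketched as a rolling window of recent outcomes compared with the accuracy measured at deployment time. In this Python sketch the baseline value, tolerance and window size are hypothetical parameters, and it assumes true outcomes eventually become available to compare with the model's predictions.

```python
from collections import deque


class PerformanceMonitor:
    """Flags when recent accuracy drops below a baseline by more than a tolerance."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the most recent results

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def current_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def degraded(self):
        accuracy = self.current_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.90, window=10)
for _ in range(10):
    monitor.record("approve", "approve")  # model agrees with real outcomes
print(monitor.degraded())  # False
for _ in range(5):
    monitor.record("approve", "deny")     # outcomes start to diverge
print(monitor.current_accuracy(), monitor.degraded())  # 0.5 True
```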