Artificial Intelligence Flashcards
Humans vs computers
- Humans can solve problems quickly, but understanding a complex problem domain takes us a long time: we get there by studying and building on an existing foundation of understanding.
- Computers lack this foundation, but they can quickly learn a narrow problem domain and then solve very specific problems much faster than a human can.
What defines AI?
- A computer-generated solution may be considered intelligent if it solves a complex problem that, if a human were to carry it out, would be considered to require intelligence.
- Intelligence exhibited by machines: a device uses information about its surroundings or a problem to maximise its chance of succeeding at a specific task.
Binary classification
Predict one of two classes (each can be expressed as 1 or 0):
• Yes/No
• Buy/Sell
• Healthy/Unhealthy
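As a minimal sketch (the labels, feature, and threshold are all hypothetical), a binary classification problem encodes the two classes as 1/0 and predicts with some decision rule:

```python
# Minimal sketch: encode a binary Healthy/Unhealthy label as 1/0 and
# predict it with a single (hypothetical) threshold on resting heart rate.
def encode_label(label):
    return 1 if label == "Unhealthy" else 0

def predict(resting_heart_rate, threshold=90):
    # Predict class 1 (Unhealthy) when the feature exceeds the threshold.
    return 1 if resting_heart_rate > threshold else 0

print([encode_label(l) for l in ["Healthy", "Unhealthy", "Healthy"]])  # [0, 1, 0]
print(predict(72), predict(110))  # 0 1
```

A real model learns the decision rule from data rather than using a hand-picked threshold.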
Multi-class classification
- Predicting the presence of more than one class in a dataset.
- The example below shows MRI training data (left) and brain tumour targets (right).
- The system would learn to predict anywhere from 0 to 3 classes per scan, depending on the presence of a tumour and the sub-tissues it contains.
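The 0-to-3-classes idea can be sketched as independent per-class presence scores, each thresholded separately (the class names and scores here are hypothetical, not from the MRI example):

```python
# Minimal sketch: multi-class presence prediction.
# Each class gets an independent score in [0, 1]; a class is predicted
# present when its score exceeds a threshold.
def present_classes(scores, threshold=0.5):
    return [name for name, s in scores.items() if s > threshold]

scores = {"oedema": 0.81, "enhancing_tumour": 0.12, "necrotic_core": 0.66}
print(present_classes(scores))  # ['oedema', 'necrotic_core']
```

With this rule a sample can be assigned zero, one, or several classes, matching the "0 to 3 classes" behaviour described above.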
Types of AI algorithm
SUPERVISED
• The two previous examples of tabular data and brain tumour data with expert masks would be considered supervised datasets.
• This is because we have both the input data and the target data. i.e. we are helping the AI algorithm by providing the intuition to map directly between the input and the target.
• Supervised AI datasets take a long time to prepare and often require multiple experts to rate each sample.
• This expense typically pays off in higher accuracy.
Unsupervised AI
- Unsupervised AI is the idea that we have some input dataset but do not provide a target.
- The intention here is to allow the AI to create its own separation and learn to classify between the classes it defines.
- These are often much more sophisticated algorithms, but they are more prone to training failures and classification errors.
- Unsupervised AI can be very useful if we have a huge amount of information but aren’t sure what the exact answer is that we are looking for.
What is machine learning?
- Machine learning is a branch of artificial intelligence which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy (IBM).
- Enabling a computer to learn from data and improve upon itself without being explicitly told how to do so (Arthur Samuel).
- Where artificial intelligence is the field, machine learning is a smaller area within artificial intelligence which focuses on applying specific algorithms that allow the computer to learn information without being told how to do so.
How do we train a machine learning algorithm
Decision process = given some input data and target data, the algorithm looks at the information and identifies some pattern within the input data, from which it makes a target data prediction.
Error calculation = the machine learning algorithm's prediction is compared against the target and we calculate some error.
Optimisation = the error value is used to tweak the settings (parameters) of the algorithm.
How do we train a machine learning algorithm?
• We repeat these steps (decision process, error calculation, optimisation) until the machine learning algorithm is making decisions that result in a low error.
• This low error then results in a reduction in the amount that the optimiser updates the machine learning algorithm.
• The reduction in changes to the machine learning algorithm then results in a steady state of decision making. CALLED CONVERGENCE
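The three steps above can be sketched as a tiny training loop. The data, the one-parameter model y = w·x, and the learning rate are all hypothetical, chosen only to keep the sketch self-contained:

```python
# Minimal sketch of the training loop: decision process -> error
# calculation -> optimisation, repeated until the error (and hence the
# size of the updates) becomes small, i.e. the model converges.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs; true w is 2

w = 0.0    # the algorithm's single tunable "setting"
lr = 0.05  # learning rate: how much the optimiser tweaks w each step
for epoch in range(200):
    for x, target in data:
        pred = w * x           # decision process
        error = pred - target  # error calculation
        w -= lr * error * x    # optimisation (gradient step)

print(round(w, 3))  # converges towards 2.0
```

As the error shrinks, `lr * error * x` shrinks with it, so the updates to `w` die away and the model settles into a steady state: convergence.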
CONVERGENCE
• The reduction in changes to the machine learning algorithm then results in a steady state of decision making.
K-means
machine learning model
- Uses training data to work out how the data can be clustered, based on feature similarity rather than a target class (it is an unsupervised algorithm).
- K = the number of clusters that we want to generate, not necessarily the number of classes.
- This technique is useful after performing data visualisation techniques (plotting, PCA) to see how the information clusters.
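A minimal k-means sketch, in pure Python on hypothetical 1-D data with K = 2, showing the two alternating steps (assign points to the nearest centroid, then move each centroid to its cluster mean):

```python
# Minimal k-means sketch (1-D data, naive "first k points" initialisation).
def kmeans_1d(data, k=2, iters=10):
    centroids = data[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(sorted(kmeans_1d(data)))  # approximately [1.0, 9.0]
```

Note that no target labels appear anywhere: the two groups emerge purely from the structure of the data.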
Support vector machine
machine learning model
- Linear separation is where you draw a line (more generally, a hyper-plane) between the two groups.
- Learns to fit a separating hyper-plane between data features, where the predicted class depends on which side of the plane the data falls.
- A support vector machine can operate in dimensions we can't visualise.
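The "which side of the plane" rule can be sketched as follows. The weights and bias here are hypothetical hand-picked values standing in for what a trained SVM would learn:

```python
# Minimal sketch of a separating hyper-plane decision rule.
# A trained SVM provides weights w and bias b defining the plane
# w.x + b = 0; the predicted class is the side of the plane x falls on.
def classify(x, w=(1.0, -1.0), b=0.0):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

print(classify((3.0, 1.0)))  # 1: one side of the plane
print(classify((1.0, 4.0)))  # 0: the other side
```

Because the dot product works for any number of features, the same rule applies unchanged in dimensions we cannot visualise.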
Decision tree
- A tree structure in which each internal node makes a yes/no decision about a feature of the input.
- Prediction is made when a leaf node is reached.
- Multiple decision trees can be created and their predictions averaged in a popular machine learning algorithm called Random Forest.
- The idea here is that combining votes from multiple trees is better than a single opinion.
- The basic idea of a decision tree is that we make yes/no decisions about each feature until we reach a target prediction.
- In the example to the right, the questions in orange relate to features of our input data. During the training stage, the question asked at each decision node is tuned to an optimal threshold that gives us a good result.
- The intuition behind this can be compared to human decision making. For example, if we are determining whether to go for a walk we might ask ourselves questions on the weather, whether it’s cold, whether we’re tired, if we need to prioritise something over walking…
- Decision trees are a really good starting point in machine learning as we can visualize their structures and understand how they have solved a problem, as opposed to being a black box like a lot of artificial intelligence algorithms.
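The "go for a walk" example can be written out as a hand-built tree of yes/no decisions. The questions and thresholds are hypothetical; a real decision tree would learn them during training:

```python
# Minimal hand-built decision tree for the "go for a walk?" example.
def go_for_a_walk(raining, temperature_c, tired):
    if raining:               # decision node: weather
        return False          # leaf node: prediction made here
    if temperature_c < 5:     # decision node: is it cold?
        return False          # leaf node
    if tired:                 # decision node: energy level
        return False          # leaf node
    return True               # leaf node

print(go_for_a_walk(raining=False, temperature_c=15, tired=False))  # True
print(go_for_a_walk(raining=True, temperature_c=15, tired=False))   # False
```

This also illustrates why trees are interpretable: the whole decision path can be read off directly, unlike a black-box model.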
leaf node
- Prediction is made when a leaf node is reached
Random Forest
- Multiple decision trees can be created and their predictions averaged in a popular machine learning algorithm called Random Forest.
PRE-PROCESSING – how to treat a dataset before the machine learning model sees it – 5 steps
- Remove redundant information (e.g. height and weight when we also have BMI)
- Remove correlated information (e.g. if we keep height in metres we could delete height in feet, as it is the same information on a different scale)
- Handle missing data (e.g. sample 4 is missing feature 9, resting_heart_rate. The sample could be removed entirely if the missing variable is critical, or we could substitute another value, e.g. the average of this column across all samples)
- Handle categorical data (we often want numerical information for our machine learning models, so we convert binary categoricals, e.g. smoker, to 0 and 1)
- Standardisation (machine learning models like data to be normalised so that no one feature dominates the error calculation. Below we have implemented z-score normalisation, but we could also scale to [-1, 1], [0, 1], or any other range)
Pre-processing is not limited to the steps above, nor do we have to implement any of them if we have reason to believe they will harm the performance of our algorithm.
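Three of the steps above can be sketched on a tiny hypothetical dataset (the feature names and values are invented for illustration):

```python
# Minimal sketch: missing-value handling, categorical encoding, and
# z-score standardisation on a tiny hypothetical dataset.
rows = [
    {"resting_heart_rate": 60.0, "smoker": "no"},
    {"resting_heart_rate": None, "smoker": "yes"},  # missing value
    {"resting_heart_rate": 80.0, "smoker": "no"},
]

# 1. Handle missing data: replace None with the column average.
known = [r["resting_heart_rate"] for r in rows if r["resting_heart_rate"] is not None]
fill = sum(known) / len(known)
for r in rows:
    if r["resting_heart_rate"] is None:
        r["resting_heart_rate"] = fill

# 2. Handle categorical data: encode the binary categorical as 0/1.
for r in rows:
    r["smoker"] = 1 if r["smoker"] == "yes" else 0

# 3. Standardisation: z-score the numeric feature, (x - mean) / std.
vals = [r["resting_heart_rate"] for r in rows]
mu = sum(vals) / len(vals)
std = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5
for r in rows:
    r["resting_heart_rate"] = (r["resting_heart_rate"] - mu) / std

print(rows)
```

After these steps every feature is numeric and on a comparable scale, so no single feature dominates the error calculation.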
Validation – checking whether or not our machine learning model is performing to a standard at which we can use it in the real world
• If we have a total of 1000 samples available, we don’t want to use all of these to train the model.
• Instead, we should hold some of this data out of training so that we can evaluate how our model is performing throughout training.
• To do this we would first train the model on the 750 training samples, calculate the error, and update the algorithm parameters.
• We would then take the 250 validation samples, apply these to the model, and calculate the error.
§ This validation error represents how our algorithm performs on samples whose targets it hasn’t been trained on.
§ This better represents how our trained algorithm might work on unseen data from different sources.
• We then unlock the model, re-apply the training data, and continue repeating these steps until we finish training.
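The 750/250 split described above can be sketched as follows. The "model" here is deliberately trivial (a single constant fitted by gradient descent) and the data are hypothetical, just to show the alternation between a training step that updates the model and a validation step that only measures error:

```python
# Minimal sketch of the train/validation loop: 1000 hypothetical
# samples, 750 used for training and 250 held out for validation.
samples = [float(i % 10) for i in range(1000)]  # hypothetical targets
train, val = samples[:750], samples[750:]

c = 0.0   # the model: predict a single constant
lr = 0.1  # learning rate
for _ in range(100):
    # Training step: compute error on the training set, update the model.
    grad = sum(2 * (c - t) for t in train) / len(train)
    c -= lr * grad
    # Validation step: the model is "locked"; we only measure the error.
    val_err = sum((c - t) ** 2 for t in val) / len(val)

print(round(c, 2))  # 4.5, the training-set mean
```

Because the validation samples never influence the update to `c`, the validation error is an honest estimate of performance on data the model was not trained on.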