10: Sources of Data, Big Data and Machine Learning Flashcards

Question 1

Q

What is structured data?

Answer

A

Structured data refers to digital information that is organized and easily accessible. It is presented in a tidy format, making it convenient to find the necessary information.

Question 2

Q

What is unstructured data?

Answer

A

Unstructured data refers to information that is not organized, such as paper records. Unstructured data is often messy, and the required information may be scattered throughout the records. It may contain irrelevant or unnecessary information, also known as “fluff.”

Question 3

Q

Give an example of structured data.

Answer

A

Laboratory results, such as blood test results, are examples of structured data. They are presented in a format that is easily interpretable and can be analyzed.

Question 4

Q

Give an example of unstructured data.

Answer

A

Imaging scans, such as MRI or X-ray images, are examples of unstructured data. The information in these images needs to be processed and analyzed to extract the relevant information. Unlike structured data, which can be understood by directly looking at it, imaging scans require interpretation by medical professionals.

Question 5

Q

What is genetic data?

Answer

A

Genetic data refers to data related to an individual’s genes, such as DNA data. It is usually structured and follows specific formats and standards, allowing for easy storage and analysis.

Question 6

Q

What is routine data?

Answer

A

Routine data refers to data that is collected regularly for record-keeping purposes but may not be used immediately for analysis or decision-making. It serves as a historical record and may become useful for research or analysis in the future.

Question 7

Q

What is data linkage?

Answer

A

Data linkage involves combining different sets of data from various sources to create a comprehensive view or understanding. It can involve linking data from different healthcare settings, such as general practitioner (GP) consultations with surgery records.
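
As a rough illustration of the idea, the sketch below links two small datasets on a shared identifier. The records, the field names (`patient_id`, `gp_reason`, `procedure`) and the helper `link_records` are all hypothetical, and it assumes both sources carry an exact, reliable identifier; real linkage often has to cope with missing or inconsistent identifiers.

```python
# Hypothetical records from two sources; all names and values are made up.
gp_consultations = [
    {"patient_id": "P001", "gp_reason": "chest pain"},
    {"patient_id": "P002", "gp_reason": "knee pain"},
    {"patient_id": "P003", "gp_reason": "cough"},
]
surgery_records = [
    {"patient_id": "P002", "procedure": "knee arthroscopy"},
    {"patient_id": "P001", "procedure": "coronary angioplasty"},
]


def link_records(left, right, key):
    """Join rows from two sources that share the same value for `key`."""
    # Index the right-hand source by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    linked = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two rows into one combined record.
            linked.append({**row, **match})
    return linked


linked = link_records(gp_consultations, surgery_records, "patient_id")
for row in linked:
    print(row)
```

Patient P003 has no surgery record, so this sketch leaves them out of the linked result; whether to keep unmatched records is a design choice that depends on the analysis.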

Question 8

Q

What are clinical coding systems?

Answer

A

Clinical coding systems are used to classify medical conditions, procedures, and other healthcare-related information into standardized codes. They help in organizing and categorizing data consistently across different healthcare settings.
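
To make the idea of standardized codes concrete, here is a minimal sketch that maps diagnosis text to ICD-10 category codes. The lookup table is a tiny illustrative subset and the helper `code_diagnosis` is hypothetical; real clinical coding relies on the full classification and trained coders, not exact string matching.

```python
# Tiny illustrative subset of ICD-10 category codes; not a complete table.
ICD10_LOOKUP = {
    "essential hypertension": "I10",
    "type 2 diabetes mellitus": "E11",
    "asthma": "J45",
}


def code_diagnosis(text):
    """Return the code for a diagnosis, or None if it is not in the lookup."""
    # Normalize case and surrounding whitespace before matching.
    return ICD10_LOOKUP.get(text.strip().lower())


diagnoses = ["Asthma", "  Type 2 diabetes mellitus", "asthma", "sore elbow"]
coded = [code_diagnosis(d) for d in diagnoses]
print(coded)
```

Differently written entries for the same condition ("Asthma", "asthma") end up with the same code, which is what makes coded data consistent across settings.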

Question 9

Q

What are the primary coding systems used in the UK?

Answer

A

The primary coding systems used in the UK are Read codes, International Classification of Diseases (ICD) codes, and SNOMED CT. Read codes have traditionally been used in primary care, ICD codes are mainly used in secondary care, and SNOMED CT is a comprehensive clinical terminology that has replaced Read codes in primary care.

Question 10

Q

What is machine learning?

Answer

A

Machine learning refers to the use of artificial intelligence (AI) techniques that enable computers to learn from data without being explicitly programmed. It involves training models on large datasets and allowing them to learn patterns and make predictions or decisions based on the learned knowledge.

Question 11

Q

What is deep learning?

Answer

A

Deep learning is a subset of machine learning that focuses on training multi-layer neural networks. It can model complex patterns by learning hierarchical representations of data, with each layer building on the output of the one before it.

Question 12

Q

What is artificial intelligence (AI)?

Answer

A

Artificial intelligence refers to any program or system that enables computers to imitate human behavior, such as problem-solving, learning, and decision-making.

Question 13

Q

What is supervised learning?

Answer

A

Supervised learning is a machine learning approach where the model is provided with labeled training data that includes both input features and corresponding output labels. The goal is to learn a mapping function that can predict the output labels for new, unseen data.

Question 14

Q

What is regression in supervised learning?

Answer

A

Regression is a type of supervised learning where the machine learning model uses algorithms and data to predict a continuous numerical value. For example, predicting the temperature for tomorrow based on historical weather data.
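
The temperature example can be sketched with simple linear regression fitted by ordinary least squares. This is only one of many regression methods; the data below is made up, and the helper `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the shared 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: today's temperature vs. the next day's (in °C).
today = [10.0, 12.0, 14.0, 16.0]
tomorrow = [11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(today, tomorrow)

# Predict a continuous value for a new, unseen input.
prediction = slope * 18.0 + intercept
print(prediction)
```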

Question 15

Q

What is classification in supervised learning?

Answer

A

Classification is a type of supervised learning where the outcome is a label or category. The machine learning model learns to assign input data to specific classes. For example, determining whether an email is spam or not.
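
As one possible illustration of the spam example, the sketch below uses a nearest-centroid classifier: it averages the labeled examples of each class and assigns a new email to the class with the closest average. The two features (number of links, number of exclamation marks), the data, and the helpers `fit` and `predict` are assumptions made for this example.

```python
def centroid(points):
    """Average each feature across a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(features, labels):
    """Learn one centroid per class from labeled training data."""
    by_label = {}
    for point, label in zip(features, labels):
        by_label.setdefault(label, []).append(point)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], point))


# Made-up training data: (number of links, number of exclamation marks).
features = [(8, 5), (6, 7), (0, 1), (1, 0)]
labels = ["spam", "spam", "not spam", "not spam"]

model = fit(features, labels)
print(predict(model, (7, 6)))
print(predict(model, (0, 0)))
```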

Question 16

Q

What is unsupervised learning?

Answer

A

Unsupervised learning is a machine learning approach where the model learns patterns and structures in unlabeled data without explicit output labels. The goal is to discover hidden relationships or clusters in the data.

Question 17

Q

What are some examples of unsupervised learning tasks?

Answer

A

Clustering, dimensionality reduction, and anomaly detection are common examples of unsupervised learning tasks. Clustering aims to group similar data points together, dimensionality reduction reduces the number of input features while preserving essential information, and anomaly detection identifies unusual or abnormal data points.
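
Of the three tasks, anomaly detection is the simplest to sketch. The example below flags values that lie far from the mean, measured in standard deviations (a z-score rule). The readings, the threshold of 2.0, and the helper `find_anomalies` are assumptions for illustration; no labels are used at any point, which is what makes the task unsupervised.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    # Population standard deviation of the observed values.
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / spread > threshold]


# Made-up unlabeled readings; one value is clearly out of line.
readings = [5.1, 4.9, 5.0, 5.2, 4.8, 9.5]
anomalies = find_anomalies(readings)
print(anomalies)
```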

Question 18

Q

What is model evaluation in machine learning?

Answer

A

Model evaluation involves assessing the performance and quality of a trained machine learning model on unseen data. It helps determine how well the model generalizes to new data and whether it achieves the desired accuracy or other performance metrics.
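
Accuracy, the metric named above, is the fraction of predictions that match the true labels. A minimal sketch, with made-up labels and a hypothetical helper `accuracy`:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the corresponding true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up true labels and model predictions for five unseen emails.
y_true = ["spam", "not spam", "spam", "not spam", "spam"]
y_pred = ["spam", "not spam", "not spam", "not spam", "spam"]

score = accuracy(y_true, y_pred)
print(score)
```

Four of the five predictions are correct here, so the score is 0.8. Accuracy alone can mislead when one class is much rarer than the other, which is why other metrics are often reported alongside it.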

Question 19

Q

What is training data in model evaluation?

Answer

A

Training data refers to the dataset used to train the machine learning model (labeled, in the case of supervised learning). The model learns patterns and relationships from this data during the training process.

Question 20

Q

What is testing data in model evaluation?

Answer

A

Testing data, also known as a holdout sample, is a separate set of data used to assess the performance of a trained machine learning model. It consists of unseen data that the model did not encounter during training, allowing evaluation of its ability to generalize and make accurate predictions.
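
A holdout sample is typically created by shuffling the available data and setting a portion aside before training. The sketch below shows one simple way to do this; the helper `holdout_split`, the 25% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Twenty made-up record identifiers.
rows = list(range(20))
train, test = holdout_split(rows)

print(len(train), len(test))
```

Because the two lists never overlap, performance measured on `test` reflects data the model did not see during training.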

(20 cards)