10: Sources of Data, Big Data and Machine Learning Flashcards

1
Q

What is structured data?

A

Structured data is digital information organized in a predefined, tidy format, such as named fields in rows and columns. This makes it easy to access and to find the specific information needed.

2
Q

What is unstructured data?

A

Unstructured data is information with no predefined organization, such as paper records. It is often messy: the required information may be scattered throughout the records and mixed with irrelevant or unnecessary material, also known as “fluff.”

3
Q

Give an example of structured data.

A

Laboratory results, such as blood test results, are examples of structured data. They are presented in a format that is easily interpretable and can be analyzed.

4
Q

Give an example of unstructured data.

A

Imaging scans, such as MRI or X-ray images, are examples of unstructured data. The information in these images needs to be processed and analyzed to extract the relevant information. Unlike structured data, which can be understood by directly looking at it, imaging scans require interpretation by medical professionals.

5
Q

What is genetic data?

A

Genetic data refers to data related to an individual’s genes, such as DNA data. It is usually structured and follows specific formats and standards, allowing for easy storage and analysis.

6
Q

What is routine data?

A

Routine data is data collected regularly for record-keeping purposes that may not be used immediately for analysis or decision-making. It serves as a historical record and may become useful for research or analysis in the future.

7
Q

What is data linkage?

A

Data linkage involves combining different sets of data from various sources to create a comprehensive view or understanding. It can involve linking data from different healthcare settings, such as general practitioner (GP) consultations with surgery records.
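
As a rough illustration of the idea, the sketch below links GP consultations to surgery records through a shared patient identifier. The records, the `patient_id` field, and the `link_records` helper are all invented for this example; real linkage usually also has to handle missing or mismatched identifiers.

```python
# Hypothetical records: IDs, fields, and values are invented for illustration.
gp_consultations = [
    {"patient_id": "P001", "gp_reason": "knee pain"},
    {"patient_id": "P002", "gp_reason": "persistent cough"},
]
surgery_records = [
    {"patient_id": "P001", "procedure": "knee arthroscopy"},
]


def link_records(gp_rows, surgery_rows):
    """Attach each patient's surgical procedures to their GP consultation."""
    # Index the surgery records by patient so each lookup is a dictionary access.
    procedures_by_patient = {}
    for row in surgery_rows:
        procedures_by_patient.setdefault(row["patient_id"], []).append(row["procedure"])

    linked = []
    for row in gp_rows:
        procedures = procedures_by_patient.get(row["patient_id"], [])
        linked.append({**row, "procedures": procedures})
    return linked


linked = link_records(gp_consultations, surgery_records)
for record in linked:
    print(record)
```

Patients with no matching surgery record are kept with an empty list, so the linked view does not silently drop anyone.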

8
Q

What are clinical coding systems?

A

Clinical coding systems are used to classify medical conditions, procedures, and other healthcare-related information into standardized codes. They help in organizing and categorizing data consistently across different healthcare settings.

9
Q

What are the primary coding systems used in the UK?

A

The primary coding systems used in the UK are Read codes, International Classification of Diseases (ICD) codes, and SNOMED CT. Read codes were historically used in primary care, ICD codes are mainly used in secondary care, and SNOMED CT is a comprehensive clinical terminology system that has replaced Read codes in primary care.
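
To show why standardized codes help, the sketch below counts care episodes by condition using three ICD-10 category codes. The episode list and the `count_by_condition` helper are made up for illustration, and the table is a tiny excerpt rather than a usable code set.

```python
from collections import Counter

# A tiny excerpt of ICD-10 category codes, for illustration only.
ICD10_SAMPLE = {
    "E11": "Type 2 diabetes mellitus",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}

# Hypothetical coded episodes, as they might arrive from different settings.
coded_episodes = ["I10", "E11", "I10", "J45", "I10"]


def count_by_condition(codes, code_table):
    """Count episodes per condition, translating each code to its description."""
    counts = Counter(codes)
    return {code_table[code]: n for code, n in counts.items()}


condition_counts = count_by_condition(coded_episodes, ICD10_SAMPLE)
print(condition_counts)
```

Because every setting records the same code for the same condition, the counts can be combined without having to reconcile differently worded free-text diagnoses.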

10
Q

What is machine learning?

A

Machine learning is a branch of artificial intelligence (AI) in which computers learn from data rather than being explicitly programmed with rules. Models are trained on datasets, often large ones, so that they learn patterns and can then make predictions or decisions based on what they have learned.

11
Q

What is deep learning?

A

Deep learning is a subset of machine learning that focuses on training multi-layer neural networks. Stacking layers lets the network build hierarchical representations of the data, which allows it to model complex patterns.

12
Q

What is artificial intelligence (AI)?

A

Artificial intelligence refers to any program or system that enables computers to imitate human behavior, such as problem-solving, learning, and decision-making.

13
Q

What is supervised learning?

A

Supervised learning is a machine learning approach where the model is provided with labeled training data that includes both input features and corresponding output labels. The goal is to learn a mapping function that can predict the output labels for new, unseen data.

14
Q

What is regression in supervised learning?

A

Regression is a type of supervised learning where the machine learning model uses algorithms and data to predict a continuous numerical value. An example is predicting tomorrow’s temperature from historical weather data.
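
A minimal sketch of regression, assuming a single input feature and ordinary least squares as the fitting method (the card does not name a specific algorithm). The temperature values and the `fit_line` helper are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented training data: today's temperature (input) and tomorrow's (label).
today = [10.0, 12.0, 14.0, 16.0]
tomorrow = [11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(today, tomorrow)
prediction = slope * 18.0 + intercept  # continuous output for an unseen input
print(slope, intercept, prediction)
```

The output is a number on a continuous scale rather than a category, which is what distinguishes regression from classification.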

15
Q

What is classification in supervised learning?

A

Classification is a type of supervised learning where the outcome is a label or category. The machine learning model learns to assign input data to specific classes. An example is determining whether or not an email is spam.
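
One simple way to sketch classification is a nearest-neighbour rule: a new email receives the label of the most similar labelled example. The two features (number of links, number of exclamation marks), the training examples, and the `nearest_neighbor_label` helper are assumptions made up for this illustration.

```python
def nearest_neighbor_label(training, query):
    """Return the label of the training example closest to the query."""

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(training, key=lambda item: squared_distance(item[0], query))
    return label


# Invented labelled emails: (links, exclamation marks) -> class label.
training_emails = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((5, 4), "spam"),
    ((7, 6), "spam"),
]

print(nearest_neighbor_label(training_emails, (6, 5)))
print(nearest_neighbor_label(training_emails, (0, 1)))
```

Here the output is one of a fixed set of categories, in contrast to the continuous value produced by regression.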

16
Q

What is unsupervised learning?

A

Unsupervised learning is a machine learning approach where the model learns patterns and structures from unlabeled data, that is, data with no output labels. The goal is to discover hidden relationships or clusters in the data.

17
Q

What are some examples of unsupervised learning tasks?

A

Clustering, dimensionality reduction, and anomaly detection are common examples of unsupervised learning tasks. Clustering aims to group similar data points together, dimensionality reduction reduces the number of input features while preserving essential information, and anomaly detection identifies unusual or abnormal data points.
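
Clustering can be sketched with a small k-means loop on one-dimensional data. The values, the starting centroids, and the `kmeans_1d` helper are invented for illustration; k-means is only one of several clustering methods.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group 1-D values around centroids by repeated assign-and-update steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled, invented measurements that fall into two visible groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(values, centroids=[1.0, 10.0])
print(centroids)
print(clusters)
```

No labels are supplied at any point: the two groups emerge purely from how close the values are to one another.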

18
Q

What is model evaluation in machine learning?

A

Model evaluation involves assessing the performance and quality of a trained machine learning model on unseen data. It helps determine how well the model generalizes to new data and whether it achieves the desired accuracy or other performance metrics.

19
Q

What is training data in model evaluation?

A

Training data refers to the dataset used to train the machine learning model; in supervised learning it is labeled. It is the initial input provided to the model, and the model learns patterns and relationships from this data during the training process.

20
Q

What is testing data in model evaluation?

A

Testing data, also known as a holdout sample, is a separate set of data used to assess the performance of a trained machine learning model. It consists of unseen data that the model did not encounter during training, allowing evaluation of its ability to generalize and make accurate predictions.
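
The training/testing split can be sketched as follows, assuming a simple random holdout and accuracy as the performance metric. The data, the threshold "model", and the helper names (`holdout_split`, `train_threshold`, `accuracy`) are invented for illustration.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and set aside a fraction as unseen testing data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_threshold(train_rows):
    """Toy model: a cut-off midway between the mean of each class."""
    lows = [x for x, label in train_rows if label == "low"]
    highs = [x for x, label in train_rows if label == "high"]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


def accuracy(predict, labeled_rows):
    """Fraction of rows for which the prediction matches the true label."""
    correct = sum(1 for x, label in labeled_rows if predict(x) == label)
    return correct / len(labeled_rows)


# Invented labelled data: a reading and whether it counts as "high" or "low".
rows = [(x, "high" if x >= 50 else "low") for x in range(0, 100, 5)]

train_rows, test_rows = holdout_split(rows)
threshold = train_threshold(train_rows)  # learned from training data only
test_accuracy = accuracy(lambda x: "high" if x >= threshold else "low", test_rows)
print(len(train_rows), len(test_rows), test_accuracy)
```

The threshold is computed from the training rows alone, so the accuracy measured on the held-out rows estimates how the model would perform on data it has not seen.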