10: Sources of Data, Big Data and Machine Learning Flashcards
What is structured data?
Structured data is digital information organized into a predefined format, such as named fields in a table or database. Because every value has a known place, the necessary information is easy to find and analyze.
What is unstructured data?
Unstructured data is information with no predefined format, such as free-text notes or paper records. It is often messy: the required information may be scattered throughout the records and mixed with irrelevant or unnecessary material, also known as “fluff.”
Give an example of structured data.
Laboratory results, such as blood test results, are examples of structured data. They are presented in a format that is easily interpretable and can be analyzed.
Give an example of unstructured data.
Imaging scans, such as MRI or X-ray images, are examples of unstructured data. The information in these images needs to be processed and analyzed to extract the relevant information. Unlike structured data, which can be understood by directly looking at it, imaging scans require interpretation by medical professionals.
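The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a free-text note rather than an image as the unstructured example, and the record, field names, and note wording are all hypothetical.

```python
import re
from typing import Optional

# Structured: each value sits in a named field, so lookup is direct.
# (Hypothetical record and field names.)
structured_result = {
    "patient_id": "P001",
    "test": "haemoglobin",
    "value": 13.5,
    "unit": "g/dL",
}

# Unstructured: the same fact is buried in free text (hypothetical note).
free_text_note = (
    "Pt seen today, feeling tired. Bloods back: haemoglobin 13.5 g/dL. "
    "Review in 2 weeks."
)

def extract_haemoglobin(note: str) -> Optional[float]:
    """Pull a haemoglobin value out of free text with a simple pattern."""
    match = re.search(
        r"haemoglobin\s+(\d+(?:\.\d+)?)\s*g/dL", note, flags=re.IGNORECASE
    )
    return float(match.group(1)) if match else None

print(structured_result["value"])            # direct field access
print(extract_haemoglobin(free_text_note))   # needs a processing step first
```

The structured record answers the question with a single lookup, while the note has to be processed before the value can be used.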
What is genetic data?
Genetic data refers to data related to an individual’s genes, such as DNA data. It is usually structured and follows specific formats and standards, allowing for easy storage and analysis.
What is routine data?
Routine data refers to data collected regularly for record-keeping purposes but may not be immediately utilized for analysis or decision-making. It serves as a historical record and may become useful for research or analysis in the future.
What is data linkage?
Data linkage involves combining different sets of data from various sources to create a comprehensive view or understanding. It can involve linking data from different healthcare settings, such as general practitioner (GP) consultations with surgery records.
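A minimal sketch of linkage, assuming both sources share an exact patient identifier (real linkage often has to cope with missing or mismatched identifiers). The records, field names, and the `link_records` helper are hypothetical.

```python
# Hypothetical GP consultation records, keyed by a shared patient identifier.
gp_consultations = [
    {"patient_id": "P001", "gp_reason": "knee pain"},
    {"patient_id": "P002", "gp_reason": "chest pain"},
    {"patient_id": "P003", "gp_reason": "cough"},
]

# Hypothetical surgery records from a second source.
surgery_records = [
    {"patient_id": "P001", "procedure": "knee arthroscopy"},
    {"patient_id": "P002", "procedure": "coronary angioplasty"},
]

def link_records(left, right, key):
    """Join two lists of records wherever the key matches exactly."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the fields from both sources into one combined record.
            linked.append({**row, **match})
    return linked

linked = link_records(gp_consultations, surgery_records, "patient_id")
for record in linked:
    print(record)
```

Patient P003 has no surgery record, so only two combined records are produced.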
What are clinical coding systems?
Clinical coding systems are used to classify medical conditions, procedures, and other healthcare-related information into standardized codes. They help in organizing and categorizing data consistently across different healthcare settings.
What are the primary coding systems used in the UK?
The primary coding systems used in the UK are Read codes, International Classification of Diseases (ICD) codes, and SNOMED CT. Read codes were historically used in primary care, ICD codes are mainly used in secondary care, and SNOMED CT is a comprehensive clinical terminology that has been replacing Read codes in primary care.
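To illustrate how coding standardizes varied wording, here is a small sketch using three ICD-10 category codes. The synonym table and the `code_diagnosis` helper are hypothetical; real coding relies on the full classification and trained coders or terminology services.

```python
# Three ICD-10 category codes and their titles.
ICD10_TITLES = {
    "I10": "Essential (primary) hypertension",
    "E11": "Type 2 diabetes mellitus",
    "J45": "Asthma",
}

# Hypothetical synonym table: different wordings mapped to one code.
LOCAL_TERM_TO_ICD10 = {
    "high blood pressure": "I10",
    "hypertension": "I10",
    "type 2 diabetes": "E11",
    "t2dm": "E11",
    "asthma": "J45",
}

def code_diagnosis(term):
    """Return the ICD-10 code for a term, or None if it is not in the table."""
    return LOCAL_TERM_TO_ICD10.get(term.strip().lower())

for term in ["High blood pressure", "hypertension", " T2DM "]:
    code = code_diagnosis(term)
    print(term.strip(), "->", code, ICD10_TITLES[code])
```

Two different wordings for the same condition end up with the same code, which is what makes records comparable across settings.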
What is machine learning?
Machine learning is a branch of artificial intelligence (AI) in which computers learn from data rather than following explicitly programmed rules. Models are trained on datasets, often large ones, to learn patterns, which they then use to make predictions or decisions.
What is deep learning?
Deep learning is a subset of machine learning that focuses on training multi-layer neural networks. Each layer builds on the output of the previous one, so the network learns hierarchical representations of the data and can model complex relationships.
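The idea of stacked layers can be sketched as a forward pass through a tiny two-layer network. This is an assumption-laden toy: the weights are hand-picked rather than learned, and the helper names are hypothetical. Training, which is the part deep learning is really about, is not shown.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias for each unit."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]

def relu(values):
    """Non-linearity applied between layers: negative values become zero."""
    return [max(0.0, v) for v in values]

def sigmoid(v):
    """Squash a number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical, hand-picked weights; a real network would learn these.
hidden_weights = [[1.0, -1.0], [-1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 1.0]]
output_biases = [-0.5]

def forward(inputs):
    """Layer 1 transforms the inputs; layer 2 works on layer 1's output."""
    hidden = relu(dense(inputs, hidden_weights, hidden_biases))
    (logit,) = dense(hidden, output_weights, output_biases)
    return sigmoid(logit)

print(forward([1.0, 0.0]))  # inputs differ: output above 0.5
print(forward([1.0, 1.0]))  # inputs equal: output below 0.5
```

The output layer never sees the raw inputs, only the hidden layer's representation of them.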
What is artificial intelligence (AI)?
Artificial intelligence refers to any program or system that enables computers to imitate human behavior, such as problem-solving, learning, and decision-making.
What is supervised learning?
Supervised learning is a machine learning approach where the model is provided with labeled training data that includes both input features and corresponding output labels. The goal is to learn a mapping function that can predict the output labels for new, unseen data.
What is regression in supervised learning?
Regression is a type of supervised learning where the model predicts a continuous numerical value, for example tomorrow’s temperature based on historical weather data.
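As a rough sketch of regression, the code below fits a straight line by ordinary least squares, one simple regression method among many. The temperature figures are made up for illustration and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history: today's temperature and the temperature the day after.
today = [10.0, 12.0, 14.0, 16.0]
next_day = [11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(today, next_day)

# Predict a continuous value for an input the model has not seen.
predicted = slope * 18.0 + intercept
print(predicted)
```

The output is a number on a continuous scale, not a category.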
What is classification in supervised learning?
Classification is a type of supervised learning where the outcome is a label or category. The model learns to assign input data to specific classes, for example deciding whether an email is spam or not.
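A minimal classification sketch, assuming a one-nearest-neighbour rule (chosen here only because it is short). The two features (number of links, number of exclamation marks) and the labelled emails are hypothetical.

```python
def nearest_neighbour(training_examples, query):
    """Return the label of the training example closest to the query."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(
        training_examples,
        key=lambda example: squared_distance(example[0], query),
    )
    return label

# Hypothetical labelled emails: (links, exclamation marks) -> label.
training_emails = [
    ((8, 5), "spam"),
    ((6, 7), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
]

print(nearest_neighbour(training_emails, (7, 6)))
print(nearest_neighbour(training_emails, (0, 0)))
```

The output is one of a fixed set of labels, which is what separates classification from regression.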
What is unsupervised learning?
Unsupervised learning is a machine learning approach where the model is given unlabeled data, with no output labels to learn from. The goal is to discover hidden patterns, relationships, or clusters in the data.
What are some examples of unsupervised learning tasks?
Clustering, dimensionality reduction, and anomaly detection are common examples of unsupervised learning tasks. Clustering aims to group similar data points together, dimensionality reduction reduces the number of input features while preserving essential information, and anomaly detection identifies unusual or abnormal data points.
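Clustering can be sketched with a bare-bones k-means loop. This is a simplified version under stated assumptions: the starting centroids are just the first k points (real implementations choose them more carefully), the number of iterations is fixed, and the points are made up.

```python
def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    # Simplification: start from the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(
                range(k),
                key=lambda i: sum(
                    (a - b) ** 2 for a, b in zip(point, centroids[i])
                ),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(
                    sum(coords) / len(cluster) for coords in zip(*cluster)
                )
    return centroids, clusters

# Made-up, unlabeled points that form two visible groups.
points = [
    (1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
    (2.0, 1.0), (8.0, 9.0), (9.0, 8.0),
]

centroids, clusters = k_means(points, k=2)
for cluster in clusters:
    print(cluster)
```

No labels are supplied at any point; the grouping comes only from how close the points are to one another.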
What is model evaluation in machine learning?
Model evaluation involves assessing the performance and quality of a trained machine learning model on unseen data. It helps determine how well the model generalizes to new data and whether it achieves the desired accuracy or other performance metrics.
What is training data in model evaluation?
Training data is the dataset used to fit the machine learning model; in supervised learning it includes the output labels. The model learns patterns and relationships from this data during the training process.
What is testing data in model evaluation?
Testing data, also known as a holdout sample, is a separate set of data used to assess the performance of a trained machine learning model. It consists of unseen data that the model did not encounter during training, allowing evaluation of its ability to generalize and make accurate predictions.
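The split between training and testing data can be sketched as follows. Everything here is illustrative: the rows are synthetic, the "model" is a toy threshold rule, and `train_test_split`, `fit_threshold`, and `accuracy` are hypothetical helpers written for this example, not functions from a library.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train_rows):
    """Toy 'training': place a cut-off midway between the two classes."""
    highest_negative = max(x for x, label in train_rows if label == 0)
    lowest_positive = min(x for x, label in train_rows if label == 1)
    return (highest_negative + lowest_positive) / 2

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Synthetic labelled rows: (feature, label), label is 1 when feature >= 6.
rows = [(x, int(x >= 6)) for x in range(12)]

train_rows, test_rows = train_test_split(rows)

# The model only ever sees the training rows.
threshold = fit_threshold(train_rows)

# Evaluation uses the held-out rows the model did not see.
predictions = [int(x >= threshold) for x, _ in test_rows]
test_accuracy = accuracy([label for _, label in test_rows], predictions)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Accuracy measured on the training rows would tend to look better than it should, which is why the score is taken from the holdout sample instead.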