10: Sources of Data, Big Data and Machine Learning Flashcards
What is structured data?
Structured data is digital information organized in a predefined format, such as a table with named fields. This tidy format makes individual values easy to find, query, and analyze.
What is unstructured data?
Unstructured data is information that does not follow a predefined format, such as paper records or free-text notes. It is often messy: the required information may be scattered throughout the records and mixed with irrelevant or unnecessary material, also known as “fluff.”
Give an example of structured data.
Laboratory results, such as blood test results, are examples of structured data. They are presented in a format that is easily interpretable and can be analyzed.
Give an example of unstructured data.
Imaging scans, such as MRI or X-ray images, are examples of unstructured data. The information in these images needs to be processed and analyzed to extract the relevant information. Unlike structured data, which can be understood by directly looking at it, imaging scans require interpretation by medical professionals.
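The contrast between the two kinds of data can be sketched in a few lines. This is a minimal illustration, not a clinical tool: the record, the free-text note, the field names, and the values are all hypothetical, and a free-text note stands in for unstructured data because an imaging scan would need far more processing.

```python
import re

# Structured: a blood test result with named fields (hypothetical values).
lab_result = {"patient_id": "P001", "test": "hemoglobin", "value": 13.5, "unit": "g/dL"}

# Unstructured: the same fact buried in a free-text note (hypothetical text).
note = "Pt seen in clinic today, feeling tired. Bloods back: hemoglobin 13.5 g/dL, otherwise well."

# Structured data: the value is read directly by field name.
structured_value = lab_result["value"]

# Unstructured data: the value has to be extracted first. This pattern only
# works if the note happens to use this exact wording and unit.
match = re.search(r"hemoglobin\s+(\d+(?:\.\d+)?)\s*g/dL", note)
extracted_value = float(match.group(1)) if match else None

print(structured_value, extracted_value)
```

The extraction step is fragile by design: a different spelling, unit, or word order in the note would make the pattern miss, which is the practical cost of unstructured data.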
What is genetic data?
Genetic data is data about an individual’s genes, such as DNA sequence data. It is usually structured and follows specific formats and standards, which makes it straightforward to store and analyze.
What is routine data?
Routine data is data collected regularly for record-keeping purposes rather than for a specific analysis. It may not be used immediately for analysis or decision-making, but it serves as a historical record and may become useful for research in the future.
What is data linkage?
Data linkage is the combining of datasets from different sources to create a more comprehensive view of the same individuals or events. For example, records of general practitioner (GP) consultations can be linked with surgery records for the same patients.
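A minimal sketch of linkage, assuming both sources share a clean patient identifier. The records, field names, and the `link_records` helper are hypothetical; real linkage often has to cope with missing or inconsistent identifiers, which this sketch does not attempt.

```python
# Hypothetical GP consultation records, keyed by a shared patient identifier.
gp_consultations = [
    {"patient_id": "P001", "date": "2023-01-10", "reason": "knee pain"},
    {"patient_id": "P002", "date": "2023-02-03", "reason": "chest pain"},
]

# Hypothetical records from a second source.
hospital_records = [
    {"patient_id": "P001", "date": "2023-03-15", "procedure": "knee arthroscopy"},
]

def link_records(left, right, key="patient_id"):
    """Group records from two sources under each shared identifier."""
    linked = {}
    for record in left:
        linked.setdefault(record[key], {"gp": [], "hospital": []})["gp"].append(record)
    for record in right:
        linked.setdefault(record[key], {"gp": [], "hospital": []})["hospital"].append(record)
    return linked

linked = link_records(gp_consultations, hospital_records)
for patient_id, sources in linked.items():
    print(patient_id, len(sources["gp"]), len(sources["hospital"]))
```

Grouping by identifier keeps patients who appear in only one source, so gaps in the linked view stay visible instead of being silently dropped.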
What are clinical coding systems?
Clinical coding systems are used to classify medical conditions, procedures, and other healthcare-related information into standardized codes. They help in organizing and categorizing data consistently across different healthcare settings.
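The idea of mapping varied wording onto one standardized code can be sketched as a lookup table. The three ICD-10 categories used here (E11, I10, J45) are real, but the term list and the `code_diagnosis` helper are hypothetical simplifications; actual clinical coding follows detailed rules and is not a plain dictionary lookup.

```python
# A tiny illustrative lookup from diagnosis terms to ICD-10 categories.
TERM_TO_ICD10 = {
    "type 2 diabetes": "E11",
    "t2dm": "E11",
    "essential hypertension": "I10",
    "high blood pressure": "I10",
    "asthma": "J45",
}

def code_diagnosis(term):
    """Return the ICD-10 category for a term, or None if it is not in the lookup."""
    return TERM_TO_ICD10.get(term.strip().lower())

entries = ["Type 2 diabetes", "T2DM", "High blood pressure", "asthma"]
codes = [code_diagnosis(entry) for entry in entries]
print(codes)  # ['E11', 'E11', 'I10', 'J45']
```

Two differently worded entries end up with the same code, which is what makes records from different settings comparable.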
What are the primary coding systems used in the UK?
The primary coding systems used in the UK are Read codes, International Classification of Diseases (ICD) codes, and SNOMED CT. Read codes were developed for primary care, ICD codes are mainly used in secondary care, and SNOMED CT is a comprehensive clinical terminology that has replaced Read codes in primary care in England.
What is machine learning?
Machine learning is a branch of artificial intelligence (AI) in which computers learn from data rather than being explicitly programmed with rules. Models are trained on datasets, often large ones, to learn patterns and then make predictions or decisions on new data.
What is deep learning?
Deep learning is a subset of machine learning that trains neural networks with multiple layers. Each layer builds on the output of the previous one, so the network learns hierarchical representations of the data and can model complex relationships.
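The layered structure can be illustrated with the forward pass of a very small two-layer network. This is only a sketch: the weights `W1`, `b1`, `W2`, `b2` are hand-picked hypothetical values, whereas a real network would learn them from data during training, which is not shown here.

```python
import math

def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def sigmoid(v):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical, hand-picked weights; a trained network would learn these.
W1 = [[0.5, -0.2], [0.3, 0.8]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.0]

def forward(x):
    hidden = relu(dense(x, W1, b1))   # first layer: intermediate features
    output = dense(hidden, W2, b2)    # second layer: built on those features
    return sigmoid(output[0])

p = forward([1.0, 2.0])
print(p)
```

The second layer never sees the raw inputs, only the features produced by the first, which is the sense in which the representation is hierarchical.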
What is artificial intelligence (AI)?
Artificial intelligence refers to any program or system that enables computers to imitate human behavior, such as problem-solving, learning, and decision-making.
What is supervised learning?
Supervised learning is a machine learning approach where the model is provided with labeled training data that includes both input features and corresponding output labels. The goal is to learn a mapping function that can predict the output labels for new, unseen data.
What is regression in supervised learning?
Regression is a type of supervised learning in which the model predicts a continuous numerical value. An example is predicting tomorrow’s temperature from historical weather data.
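A minimal sketch of regression, using ordinary least squares to fit a straight line. The temperature values are made up for illustration, and `fit_line` is a hypothetical helper; a real forecast would use many more inputs than the previous day's temperature.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: today's temperature (input) and the
# following day's temperature (label), both in degrees Celsius.
today = [10.0, 12.0, 14.0, 16.0]
tomorrow = [11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(today, tomorrow)
prediction = slope * 18.0 + intercept  # predict for an unseen input
print(round(prediction, 2))  # 19.0
```

The output is a number on a continuous scale, not a category, which is what distinguishes regression from classification.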
What is classification in supervised learning?
Classification is a type of supervised learning in which the outcome is a label or category. The model learns to assign input data to specific classes, for example deciding whether an email is spam or not.
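A minimal sketch of classification on the spam example. The training emails are hypothetical, and the word-overlap score is a deliberately crude stand-in for a real classifier; it is used here only to show labeled examples going in and a category coming out.

```python
from collections import Counter

# Hypothetical labeled training emails: (text, label).
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lecture notes for monday", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(text, counts):
    """Assign the label whose training words overlap most with the text."""
    scores = {
        label: sum(word_counts[word] for word in text.split())
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

model = train(training)
print(classify("claim your free prize", model))  # spam
print(classify("notes for the meeting", model))  # not spam
```

Unlike the regression case, the prediction is one of a fixed set of labels that appeared in the training data.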