Module 1 Vocab Flashcards
Technical Vocabulary
Definition
Egeria
An open-source metadata and governance project for integrating and managing metadata across tools. It supports automated metadata exchange, governance policy enforcement, and lineage tracking to improve data quality and discovery.
Kylo
An open-source data lake management platform that simplifies the development of data ingestion pipelines. Built on Apache NiFi and Apache Spark, it offers features like data quality monitoring and integrated security.
Atlas
Apache Atlas is an open-source metadata management and data governance tool. It integrates with the Hadoop ecosystem to support metadata classification, auditing, and lineage tracking.
Git
A distributed version control system widely used to manage changes in source code. It allows multiple developers to collaborate on a single project while maintaining version history.
GitLab
A DevOps platform built around Git repository hosting that adds CI/CD pipelines, issue tracking, and access control. It supports both cloud-hosted and self-hosted deployments.
AI Fairness 360
An open-source toolkit designed to measure and mitigate bias in datasets and machine learning models. It provides fairness metrics and bias mitigation algorithms to help make predictions more equitable.
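To make "fairness metrics" concrete, here is a minimal sketch of two common group fairness metrics, statistical parity difference and disparate impact, written in plain Python. It does not use the AI Fairness 360 API. The function names and the prediction lists are hypothetical and chosen only for illustration.

```python
def selection_rate(outcomes):
    """Fraction of favorable (1) outcomes in a list of 0/1 predictions."""
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(unprivileged, privileged):
    """Selection rate of the unprivileged group minus that of the privileged group.

    0 means both groups receive favorable outcomes at the same rate.
    """
    return selection_rate(unprivileged) - selection_rate(privileged)


def disparate_impact(unprivileged, privileged):
    """Ratio of the unprivileged selection rate to the privileged one.

    1 means parity; values well below 1 suggest the unprivileged group is disadvantaged.
    """
    return selection_rate(unprivileged) / selection_rate(privileged)


# Hypothetical model predictions (1 = favorable outcome) for two groups.
privileged = [1, 1, 1, 0, 1, 0, 1, 1]    # selection rate 6/8 = 0.75
unprivileged = [1, 0, 0, 1, 0, 0, 1, 0]  # selection rate 3/8 = 0.375

print(statistical_parity_difference(unprivileged, privileged))  # -0.375
print(disparate_impact(unprivileged, privileged))               # 0.5
```

A mitigation algorithm would then try to move these values toward 0 and 1 respectively, for example by reweighting training examples.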
AI Explainability 360
An open-source library offering tools to interpret and explain machine learning models. It includes methods for feature importance, rule-based explanations, and counterfactual reasoning.
Adversarial Robustness 360
A toolkit for evaluating and hardening machine learning models against adversarial attacks. It includes defenses such as adversarial training and input preprocessing.
Prometheus
An open-source monitoring system that collects and stores metrics as time series, provides a query language (PromQL), and supports alerting. It is widely used in cloud-native environments.
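As a rough illustration of what "collecting metrics" involves, the sketch below builds a counter and renders it in the plain-text exposition format that a Prometheus server scrapes over HTTP. The `ToyCounter` class is hypothetical and written for this card. It is not the official Python client library, and the metric and label names are made up.

```python
class ToyCounter:
    """A hypothetical, minimal counter metric with optional labels."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = {}  # maps a sorted tuple of (label, value) pairs to a float

    def inc(self, amount=1.0, **labels):
        """Increase the counter for one label combination; counters never decrease."""
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0.0) + amount

    def render(self):
        """Return the metric in the text exposition format, one sample per line."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self.values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"


requests_total = ToyCounter("http_requests_total", "Total HTTP requests.")
requests_total.inc(method="GET")
requests_total.inc(method="GET")
requests_total.inc(method="POST")
print(requests_total.render())
```

In a real deployment the application would serve this text at an HTTP endpoint, and the server would scrape it on a schedule and store each sample with a timestamp.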
ModelDB
A tool for tracking, versioning, and visualizing machine learning experiments. It integrates with frameworks like TensorFlow and PyTorch to support reproducibility.
Apache Spark
An open-source distributed computing engine for large-scale data processing. It is best known for batch workloads but also supports SQL, streaming, and machine learning, and it scales across clusters to handle large datasets.
Apache Flink
An open-source platform for stateful stream processing and real-time analytics. It is optimized for low-latency processing of data streams.
Node-RED
A visual programming tool for connecting APIs, hardware, and services. It enables event-driven workflows through an intuitive drag-and-drop interface.
TensorFlow Lite
A lightweight version of TensorFlow designed for deploying machine learning models on mobile and embedded devices.
Apache Kafka
A distributed event-streaming platform for building real-time data pipelines and applications. It achieves fault tolerance and scalability by partitioning topics and replicating them across brokers.
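The core idea behind an event-streaming platform can be sketched as an append-only log that several consumer groups read at their own pace. The toy below is a single-process, in-memory illustration only. `ToyTopic` and its methods are hypothetical names, not a Kafka client API, and it omits partitions, replication, and persistence.

```python
class ToyTopic:
    """A hypothetical in-memory topic: one append-only log plus per-group read offsets."""

    def __init__(self):
        self.log = []      # records are only ever appended, never modified
        self.offsets = {}  # consumer group name -> index of the next record to read

    def produce(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Return the next unread records for a group and advance that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["signup", "login", "purchase"]:
    topic.produce(event)

print(topic.poll("billing", max_records=2))  # ['signup', 'login']
print(topic.poll("billing"))                 # ['purchase']
print(topic.poll("analytics"))               # ['signup', 'login', 'purchase']
```

Because reading does not remove records, each group sees the full stream independently, which is what lets one stream feed several downstream applications.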