Onboarding Term Glossary Flashcards by Aiden Kahn

Alert

An alert is the report of a potential problem that an integrated monitoring system has sent to BigPanda.

Monitoring tools generate events when potential problems are detected in your infrastructure. Over time, status updates and repeat events may occur from the same system issue. In BigPanda, raw event data is merged into a single alert so that you can visualize the life cycle of a detected issue over time.

For example, a CPU load alert may start with a warning event, then increase in severity with a critical event, and finally get resolved with a resolution event. All three of these events will be merged into a single alert. Common events that are sent to BigPanda include “CPU > 95% for more than 5 minutes” and “Port X on Router ABC down.”
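
The CPU load example can be sketched in a few lines of Python. This is only an illustration of the merging idea, not BigPanda's implementation: the Alert class, the merge_events function, and the choice to key alerts on (host, check) are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """One alert tracking the life cycle of a detected issue (hypothetical model)."""
    key: tuple                                   # assumed identity: (host, check)
    status: str                                  # latest status seen
    history: list = field(default_factory=list)  # every status, in arrival order


def merge_events(events):
    """Fold raw events into alerts, one alert per (host, check) pair."""
    alerts = {}
    for event in events:
        key = (event["host"], event["check"])
        alert = alerts.setdefault(key, Alert(key=key, status=event["status"]))
        alert.status = event["status"]
        alert.history.append(event["status"])
    return alerts


# Warning, critical, and resolution events for the same CPU check.
events = [
    {"host": "web-1", "check": "cpu_load", "status": "warning"},
    {"host": "web-1", "check": "cpu_load", "status": "critical"},
    {"host": "web-1", "check": "cpu_load", "status": "ok"},
]
alerts = merge_events(events)
```

All three events end up in the history of a single alert whose current status is the last one received.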

BigPanda correlates related alerts into incidents for visibility into high-level, actionable problems.

NOTE: Some monitoring tools refer to ‘events’ as ‘alarms’ or ‘alerts.’ In BigPanda documentation, ‘alert’ is always used to refer to the complete lifecycle of an event.

Alert Correlation (Pt 1)

Alert correlation is a process of grouping related alerts into a single, high-level incident. BigPanda uses pattern recognition to automatically process the data generated by your monitoring systems and to dynamically cluster alerts into meaningful, actionable incidents. BigPanda provides default correlation patterns as well as the option to tailor patterns to your organization.

BigPanda ingests the raw event data from monitoring systems such as Nagios, CloudWatch, and systems integrated via the Alerts API. The data is normalized into standard tags and enriched with configuration information, operational categories and other custom tags. Then, the BigPanda alert correlation engine merges the events into alerts and clusters the alerts into high-level, actionable incidents by evaluating the properties against patterns in:

Topology - The host, host group, service, application, cloud, or other infrastructure element that emits the alerts. Alerts are more likely to be related when they come from the same area in your infrastructure.

Time - The rate at which related alerts occur. Alerts occurring around the same time are more likely to be related than alerts occurring far apart.

Context - The type of alerts. Some alert types imply a relationship between them, while others don’t.
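
As a rough illustration of the topology and time dimensions, the sketch below groups alerts that share a host and start within a fixed window of each other. The correlate function, the "start" field (seconds), and the 300-second window are hypothetical choices for this example; BigPanda's actual correlation patterns are configurable and are not reproduced here.

```python
def correlate(alerts, window_seconds=300):
    """Group alerts from the same host when each starts within the window of the previous one."""
    incidents = []
    latest_for_host = {}  # host -> most recent incident (list of alerts) for that host
    for alert in sorted(alerts, key=lambda a: a["start"]):
        incident = latest_for_host.get(alert["host"])
        if incident and alert["start"] - incident[-1]["start"] <= window_seconds:
            incident.append(alert)        # same topology, close in time
        else:
            incident = [alert]            # start a new incident
            incidents.append(incident)
            latest_for_host[alert["host"]] = incident
    return incidents


alerts = [
    {"host": "db-1", "check": "disk_io", "start": 0},
    {"host": "db-1", "check": "cpu", "start": 60},
    {"host": "web-7", "check": "memory", "start": 90},
    {"host": "db-1", "check": "cpu", "start": 4000},
]
incidents = correlate(alerts)
```

Here the two early db-1 alerts land in one incident, while the web-7 alert and the much later db-1 alert each start their own.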

Alert Correlation (Pt 2)

As new alerts are received, BigPanda evaluates all matching patterns and determines whether to update an existing incident or create a new incident. With this algorithm, BigPanda can correlate alerts to reduce your monitoring noise by as much as 90–99% in some environments. Correlations occur in under 100 ms, so you see updates in real time for maximum visibility into critical problems.

You can customize correlation patterns to tailor alert correlation to the specifics of your infrastructure. Learn more about customizing alert correlation in the Managing Correlation Patterns documentation.

Understanding how BigPanda determines which events are correlated into an alert and which alerts are grouped together into incidents can help you configure and use BigPanda more effectively, particularly if you are using the Alerts REST API to develop a custom integration or the correlation editor to modify a correlation pattern. Learn more about the way BigPanda correlates alerts together in the Alert Correlation Logic documentation.

Agile

Agile is a software development philosophy built around iterative development. There are many agile methods, but most of them entail short engineering cycles that include all main stages: planning, development, testing, and deployment. Each cycle typically takes one or two weeks. The idea behind Agile is to ship the product as quickly as possible and update it incrementally based on customer feedback. Agile methods remain the mainstream in modern software development because they help a product adapt to constantly changing market and customer needs.

API

Application Programming Interfaces (APIs) are software intermediaries that allow applications to talk to each other. BigPanda has several APIs available that allow you to integrate with external tools and manage incidents and BigPanda elements in bulk. They are core tools for self-service-driven customers, and they enable custom solutions and deep two-way integrations.

BigPanda API specifications can be found in the API Reference hub.

With each request to the BigPanda API, you must include an HTTP header with the authentication token for your organization. BigPanda APIs use two different types of authentication tokens: an organization-wide bearer token or a user-specific API key.

The Alerts API lets you build a custom integration between BigPanda and your monitoring system. Monitoring systems generally send out events when problems are detected and when problems have been resolved.
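
To make the token header concrete, here is a minimal sketch that builds (but does not send) an HTTP request carrying a bearer token. The URL, the token value, the payload field names, and the use of the standard Authorization: Bearer header form are assumptions for illustration; check the API Reference hub for the real endpoints and header requirements.

```python
import json
import urllib.request


def build_alert_request(url, token, event):
    """Build a POST request with the token in an HTTP header (nothing is sent)."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_alert_request(
    "https://bigpanda.example/alerts",  # hypothetical placeholder URL
    "YOUR_ORG_TOKEN",                   # placeholder, not a real token
    {"host": "router-abc", "check": "port_x", "status": "critical"},
)
```

Passing the request to urllib.request.urlopen would send it; the sketch stops short of that so it can run without network access or credentials.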

Artificial Intelligence (AI)

Also known as machine intelligence, artificial intelligence (AI) is the ability of machine systems to mimic human cognitive functions such as learning and problem solving. The goal of artificial intelligence is to create machines or programs that can work, react, and respond to complex situations.

For most business initiatives, the focus of artificial intelligence is to design programs that can develop and progress in a specific task without using explicit instructions, allowing the program to rely on patterns and inference instead. Machine learning allows a machine or program to develop and create a solution on its own, once limitations and standards are set, rather than simply following programming.

BigPanda’s Open Box Machine Learning combines the power of AI with transparency and customization through explainable AI. With BigPanda Open Box Machine Learning, the logic is explained to IT Operations teams in plain English. Teams can then edit this logic to add situational and tribal knowledge to strengthen it on their own, without requiring expert data scientists. From there, teams can test and run what-if experiments on real live production data to make sure their changes work as intended, before deploying them, promoting higher trust and adoption of machine learning throughout the organization.

The BigPanda Machine Learning Engine runs during alert correlation to suggest patterns that may improve correlation and during root cause analysis to highlight potential root causes of incidents.

BigPanda Dashboards

BigPanda Dashboards provide easy-to-read operational health metrics in a consolidated view. Ideal for NOC displays and status monitoring, each Dashboard is made up of a series of widgets showing color-coded key information on incident severity and status.

Each widget shows information for a single environment, making it easy to track incident metrics by region, team, or infrastructure type. For example, you might have environments for each business service so that you can track metrics on each separately.

Description

Each monitoring tool is configured to send specific data in the description field of the event payload. This description data will be included with alerts and appear in incident details. For many integrations, the default description can be configured to include additional information. See the specific integration instructions on the documentation site or in BigPanda for information about configuring the description field.

NOTE: description is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending description fields to BigPanda, ensure that description is lowercase only.

Environment

Environments group related incidents together for improved automation and visibility.

Environments filter incidents on properties such as source and priority and group them together for easy visibility and action. Environments make it easy for your team to focus on the incidents relevant to their role and responsibilities.

BigPanda’s default environment is the All Incidents Environment. This environment includes every incident in BigPanda with no filters or limitations. Environments can be used to filter the Incident Feed, define AutoShare rules, create Dashboards, and view specific Analytics. Learn more about how environments enable BigPanda’s automation and advanced tools in the Environments documentation.

Your BigPanda environment groups can be customized to better fit the structure and processes of your organization. Create, edit, or delete environment groups to help your teams stay focused on the information most relevant to them. Environment groups are managed from the Environments pane. Learn more in the Managing Environments documentation.
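
The filtering role of an environment can be pictured as a set of conditions on incident properties. The sketch below is a hypothetical model only: the in_environment function, the dictionary-based conditions, and the example tag values are assumptions, and real environments are defined in the BigPanda UI rather than in Python.

```python
def in_environment(incident, conditions):
    """Return True when the incident satisfies every condition (tag -> allowed values)."""
    return all(incident.get(tag) in allowed for tag, allowed in conditions.items())


# Hypothetical environment: incidents from Nagios with a high priority.
network_ops = {"source": {"nagios"}, "priority": {"P1", "P2"}}

incidents = [
    {"id": 1, "source": "nagios", "priority": "P1"},
    {"id": 2, "source": "cloudwatch", "priority": "P1"},
    {"id": 3, "source": "nagios", "priority": "P4"},
]
visible = [i["id"] for i in incidents if in_environment(i, network_ops)]
```

An environment with no conditions matches every incident, which mirrors the behavior described for the All Incidents Environment.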

Event

Monitoring tools generate events when potential problems are detected in your infrastructure. Over time, status updates and repeat events may occur from the same system issue. In BigPanda, raw event data is merged into a single alert so that you can visualize the life cycle of a detected issue over time.

For example, a CPU load alert may start with a warning event, then increase in severity with a critical event, and finally get resolved with a resolution event. All three of these events will be merged into a single alert. Common events that are sent to BigPanda include “CPU > 95% for more than 5 minutes” and “Port X on Router ABC down.”

BigPanda correlates related alerts into incidents for visibility into high-level, actionable problems.

NOTE: Some monitoring tools refer to events as ‘alarms’ or ‘alerts.’ In BigPanda documentation ‘alert’ is always used to refer to the complete lifecycle of an event.

Flapping

Flapping occurs when a monitored object (e.g., a service or host) changes state too frequently, making the cause and severity of the incident unclear. Flapping can be indicative of configuration problems (e.g., thresholds set too low), troublesome services, or real network problems.

When an alert changes states frequently, it may generate numerous events that are not immediately actionable.

In BigPanda, an incident enters the flapping state when one or more of the related alerts are flapping. By default, an alert is considered to be flapping when it has changed states more than 4 times in one hour. Contact BigPanda support if you need to configure custom logic (number of state changes within a period of time) for your organization or for a specific integration.

When an incident enters the flapping state, all subscribed users are notified and no additional state change notifications are sent. Subscribed users still receive a daily email reminding them about the incident. An incident exits the flapping state when all related alerts stop flapping (that is, they no longer meet the criteria for number of state changes in a period of time). BigPanda checks the flapping criteria every 15 minutes.
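
The default rule described above (more than 4 state changes in one hour, with an incident flapping when any of its alerts is) can be sketched as follows. The function names and the use of epoch-second timestamps are assumptions for this example, and the exact window boundaries BigPanda uses are not specified in this card.

```python
def is_flapping(change_times, now, max_changes=4, period_seconds=3600):
    """True when more than max_changes state changes fall within the trailing period."""
    recent = [t for t in change_times if now - period_seconds < t <= now]
    return len(recent) > max_changes


def incident_is_flapping(alert_change_times, now):
    """An incident is flapping when one or more of its alerts is flapping."""
    return any(is_flapping(times, now) for times in alert_change_times)


# State-change timestamps (seconds) for two alerts in the same incident.
noisy = [0, 600, 1200, 1800, 2400]   # 5 changes within the hour
quiet = [0, 1800]                    # 2 changes within the hour
flapping_now = incident_is_flapping([noisy, quiet], now=3000)
flapping_later = incident_is_flapping([noisy, quiet], now=4000)
```

At now=3000 the noisy alert has five changes in the trailing hour, so the incident is flapping; by now=4000 the earliest change has aged out and the incident no longer meets the criteria.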

Incident

An incident is essentially an unplanned interruption to an IT service or reduction in the quality of an IT service. It represents a high-level issue in your infrastructure. In BigPanda, incidents are created automatically by grouping together related alerts from your monitoring tools.

A single production issue often manifests itself in multiple alerts. For example, a disk issue can trigger a disk IO alert that, in turn, triggers a series of CPU, memory, database, and application alerts. Additionally, each alert may change as an issue progresses. An alert may start as a warning, and then increase in severity to a critical status. In these cases, diagnosing and fixing the issue requires up-to-date information from multiple sources, which is very difficult to gather and maintain manually.

BigPanda digests all of the raw data from your integrated monitoring systems and automatically correlates this complex data into single issue incidents, which gives you the visibility you need to investigate and resolve issues quickly.

All active and recently resolved incidents appear on the Incidents tab, where you can manage incidents through the operations workflow with BigPanda as your unified console. You can also escalate incidents through external ticketing and/or collaboration systems—manually as needed, or automatically as a smart ticketing solution—and BigPanda will keep the external systems up to date with the latest information.

The life cycle of an incident is defined by the life cycle of the alerts it contains. The incident feed provides a consolidated view of all active incidents from any integrated monitoring systems. After you’ve configured your integrations, you can use the incident feed to manage your incidents.

The Incidents API allows you to manage BigPanda incidents externally, and can be configured with external ticketing and monitoring tools. It provides the Incidents object, which represents a BigPanda incident containing correlated alerts from your integrated monitoring systems.

Incident_identifier

During alert correlation, BigPanda assigns correlated events an incident identifier. This ID is used throughout the BigPanda system to recognize whether two events are related to each other. Incident identifiers are created from the tags and event data sent to BigPanda for each event. By default, the incident identifier is a combination of the event’s host and check, but it could be other fields depending on the properties of the correlating alerts. The incident_identifier may also be called the incident_key.

NOTE: incident_identifier is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending incident_identifier fields to BigPanda, ensure that incident_identifier is lowercase only.
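
One way to picture an identifier built from the host and check fields is to join the values and hash them. This is a sketch under stated assumptions: the incident_identifier function below, the "|" separator, and the use of SHA-1 are illustrative choices, not a description of how BigPanda actually computes the value.

```python
import hashlib


def incident_identifier(event, fields=("host", "check")):
    """Derive a stable identifier from selected event fields (hypothetical scheme)."""
    joined = "|".join(str(event[name]) for name in fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


warning = {"host": "web-1", "check": "cpu_load", "status": "warning"}
critical = {"host": "web-1", "check": "cpu_load", "status": "critical"}
other = {"host": "web-1", "check": "disk_io", "status": "warning"}

same = incident_identifier(warning) == incident_identifier(critical)
different = incident_identifier(warning) != incident_identifier(other)
```

Because the status is not part of the identifier, the warning and critical events for the same host and check resolve to the same value, while a different check produces a different one.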

Machine Learning

Machine learning is an important element of artificial intelligence. Machine learning focuses on the ability of a program to develop and progress in a specific task without using explicit instructions, allowing the program to rely on patterns and inference instead. Machine learning allows a machine or program to develop and create a solution on its own once limitations and standards are set, rather than simply following programming.

BigPanda’s Open Box Machine Learning combines the power of AI with transparency and customization through “explainable AI”. With BigPanda Open Box Machine Learning, the logic is explained to IT Operations teams in plain English. Teams can then edit this logic to add situational and tribal knowledge to strengthen it on their own, without requiring expert data scientists. From there, teams can test and run what-if experiments on real live production data to make sure their changes work as intended, before deploying them, promoting higher trust and adoption of machine learning throughout the organization.

The BigPanda Machine Learning Engine runs during alert correlation to suggest patterns that may improve correlation and during root cause analysis to highlight potential root causes of incidents.

MTTR

Mean time to repair/resolve (MTTR) is a maintenance metric that measures the average time required to troubleshoot and repair failed systems and equipment.
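
As a worked example, MTTR is the total resolution time divided by the number of incidents. The sketch below assumes each incident is given as a (start, end) pair in seconds; the function name and the data are made up for illustration.

```python
def mean_time_to_resolve(incidents):
    """Average of (end - start) over a list of (start, end) pairs, in the same unit."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    durations = [end - start for start, end in incidents]
    return sum(durations) / len(durations)


# Three incidents resolved in 30, 60, and 15 minutes (values in seconds).
resolved = [(0, 1800), (1000, 4600), (5000, 5900)]
mttr_seconds = mean_time_to_resolve(resolved)
mttr_minutes = mttr_seconds / 60
```

For these three incidents the durations sum to 6300 seconds, giving an MTTR of 2100 seconds, or 35 minutes.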

BigPanda’s AIOps combines your best-of-breed monitoring tools with automation, a single pane view, and collaborative streamlining to shorten your incident management lifecycle and dramatically improve your MTTR.

OOTB

“Out of the Box” or “Off the Shelf” features are ready-made, plug-and-play options that are available without the need for extensive customization or complex configuration. Instead of making the brownies from scratch, you’re using the mix (either way, it’s still tasty).

BigPanda offers nearly 50 OOTB integrations ready to collect and normalize monitoring, system, and change data from your existing tools. OOTB integrations make it easy for administrators to connect BigPanda to all of your organization’s change feeds.

Primary_property

The primary property is one of the key data fields for incoming alerts. Primary property is used extensively for correlation within BigPanda and is the field which will be displayed as the main title of alerts and incidents in the UI. Alerts must include a primary property when sent to BigPanda in order to be received.

By default, the primary property is defined as one of the following: host, service, application, or device. Some integrations may allow you to customize which field a tool uses as the primary property. See the integration-specific instructions for details on primary property field defaults and customization.

NOTE: primary_property is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending primary_property fields to BigPanda, ensure that primary_property is lowercase only.

Report

BigPanda Analytics are centered on 8 reports designed to provide insights that you can use to make better decisions about your infrastructure. BigPanda reports draw on alert and incident metadata to create on-demand snapshots of your data for a specific period of time. By default, users can retrieve data on alert types, incident actions, and key statistics like MTTA and MTTR. Each report helps you visualize trends in your monitoring data and root cause changes to help you identify hot spots in your environment and see how BigPanda is correlating alerts into actionable incidents.

BigPanda’s reports can each be customized to reflect the data important to your organization’s success. In addition, custom reports can be created to help your team answer specific questions about your system or team data.

Reports enable you to understand and proactively monitor your infrastructure for increased uptime and reduced mean time to resolution (MTTR).

Root Cause Analysis

Root cause analysis is the process of identifying the root causes of system errors or problems. Identifying the root cause of a poorly performing application is one of the biggest challenges for enterprise IT Ops, NOC, DevOps and SRE teams. Rapid Root Cause Analysis dramatically condenses the time it takes to resolve incidents/outages.

BigPanda includes several key features to help your root cause analysis efforts:

- Aggregate and correlate alerts from every monitoring tool in your environment.
- Enrich alerts with change and topology data from every change and topology tool in your environment.
- Use AI/ML to correlate all of this data together to identify the probable root cause of a problem, incident, or outage.

In BigPanda, root cause analysis is done in real time and can go a long way in helping IT Ops teams resolve incidents and outages.

Root Cause Changes

BigPanda’s Root Cause Changes (RCC) feature integrates a customer’s change information into BigPanda to highlight changes that might be related to incoming incidents. BigPanda integrates with your change feeds to collect change data such as managed changes, code deployments, software updates, configuration changes, and upgrades, and organizes them in the Related Changes table within the Incidents tab. Once integrated with all your change feeds and tools, BigPanda’s OBML (Open Box Machine Learning) algorithms detect connections between changes made to the system and incidents in real time, identifying changes that may have caused the outage.

Key Features:

- Integration: Funnel all your change integrations into BigPanda’s Open Integrations Hub to see all your changes organized and correlated in one place.
- Visualization: See a consolidated list of all the system changes related to each incident.
- Correlation: Use BigPanda’s OBML or manually correlate changes to incidents to enable Root Cause Analysis.
- Collaboration: Collaborate with other users to investigate which change is the Root Cause of the incident.

Secondary_property

The secondary property is one of the key data fields for incoming alerts. Secondary property is used extensively for correlation within BigPanda and is the field which will be displayed as the secondary title of alerts and incidents in the UI.

By default, the secondary property is defined as one of the following: check or sensor.

Some integrations may allow you to customize which field a tool uses as the secondary property. See the integration-specific instructions for details on secondary property field defaults and customization.

NOTE: secondary_property is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending secondary_property fields to BigPanda ensure that secondary_property is lowercase only.

Severity

Incident severity determines the seriousness and urgency of a BigPanda incident. Severity determines incident priority within BigPanda, and helps your team triage and focus on the most important outages first. Incident severity is determined by the highest severity status of any of the active alerts within an incident. As each alert enters BigPanda, it will include a status for the event, from: critical, warning, ok, or acknowledged. The highest status in the incident will set the severity.

Severity is a useful tool and can be configured with the priority tag to help your team work on the most important incidents first.

NOTE: severity is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending severity fields to BigPanda ensure that severity is lowercase only.
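
The rule that the highest status among active alerts sets the incident severity can be sketched as below. The numeric ranking (in particular, where "acknowledged" sits relative to "ok" and "warning"), the "active" flag, and the fallback to "ok" when no alert is active are assumptions made for this example rather than documented BigPanda behavior.

```python
# Assumed ordering from least to most severe; the source does not define it.
SEVERITY_RANK = {"ok": 0, "acknowledged": 1, "warning": 2, "critical": 3}


def incident_severity(alerts):
    """Return the highest-ranked status among the incident's active alerts."""
    active = [a["status"] for a in alerts if a["active"]]
    if not active:
        return "ok"  # assumed fallback when nothing is active
    return max(active, key=SEVERITY_RANK.__getitem__)


alerts = [
    {"check": "cpu", "status": "warning", "active": True},
    {"check": "disk_io", "status": "critical", "active": True},
    {"check": "memory", "status": "ok", "active": False},
]
severity = incident_severity(alerts)
```

If the critical disk alert were no longer active, the same incident would drop to warning.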

Source

For each incoming alert, BigPanda records the name of the integrated tool as part of the alert data. Source is a particularly useful tag for creating environments, searching incidents, and creating reports.

NOTE: source is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending source fields to BigPanda, ensure that source is lowercase only.

SSO

Single Sign-On (SSO) is an authentication process that allows users to log in to multiple systems via a single service. You can configure an SSO integration to manage your organization’s entire membership via a third-party identity provider. When SSO is configured for your BigPanda account, all authentication requests are routed through the third-party identity provider, and users cannot log in directly to BigPanda.

After an administrator successfully authenticates on the BigPanda website via basic authentication, they can configure their organization to use a SAML 2.0-compliant, third-party identity provider for delegated authentication. When SSO is configured for your organization, users will be prompted simply to enter their username when logging in to BigPanda. If a user does not have an active, valid session with the identity provider, they will be redirected to a login page for the third-party provider.

Status

As each alert enters BigPanda, it will include a status for the event: critical, warning, ok, or acknowledged. This status is based on specific requirements set within your monitoring tools and the normalization between your tool and BigPanda. The highest-status alert will set the severity for an incident. Status is a useful tool that can be configured with the priority tag to help your team work on the most important incidents first.

Some integrations may allow you to customize which fields and values a tool uses to determine status. See the integration-specific instructions for details on status field defaults and customization.

NOTE: status is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending status fields to BigPanda, ensure that status is lowercase only.

Tag

BigPanda normalizes alert data from integrated monitoring systems into standard key-value pairs, called tags. Each tag has two parts: the tag name and the tag value. Tags are the fundamental data model for your alerts and incidents and provide vital incident enrichment. The values of these tags are used to correlate highly related alerts into incidents and allow BigPanda’s Open Box Machine Learning engines to find connections between alerts and changes throughout your system.

Alert tags enable your team to:

- Search the incident feed
- Define filter conditions for Environments
- Search with BigPanda Query Language (BPQL)
- View incident information in the UI
- Collect analytics

Incident tags are key-value pairs that can be added to incidents for additional incident enrichment, giving your teams insight into the priority and business impact of incidents. To learn more about how incident tags work, please see the Incident Tags documentation.

Timestamp

A timestamp is the time at which a monitoring tool triggered an event or an action happened in BigPanda. All timestamps are stored in Unix epoch format within BigPanda. When displayed in the UI, timestamps are automatically converted to date-time format in the timezone of the user.

NOTE: timestamp is a reserved system word within BigPanda and cannot be changed or redefined for use in custom enrichment. When sending timestamp fields to BigPanda, ensure that timestamp is lowercase only.
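
The conversion from a stored epoch value to a displayed date-time can be illustrated with Python's standard library. This sketch assumes the epoch value is in seconds and represents the user's timezone as a fixed UTC offset; the to_display_time function is hypothetical and is not how the BigPanda UI is implemented.

```python
from datetime import datetime, timedelta, timezone


def to_display_time(epoch_seconds, utc_offset_hours=0):
    """Convert Unix epoch seconds to an ISO 8601 string at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz).isoformat()


stored = 1700000000                     # example epoch value, in seconds
utc_view = to_display_time(stored)      # viewer in UTC
local_view = to_display_time(stored, utc_offset_hours=-5)
```

The stored value never changes; only the rendering differs per viewer, so the same event shows as 22:13:20 in UTC and 17:13:20 at UTC-5 on 2023-11-14.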

Virtualization

Virtualization is the creation of a virtual version of an IT resource, such as a server, storage, a device, or even an operating system. It simulates the hardware and software that a program needs in order to run. Virtualization gives rise to virtual machines, where you can run programs much as you would on a physical machine. A virtual machine runs another operating system on a machine using virtualization software, and the virtual system is segregated from the main system. Reasons to run a virtual machine include trying a new operating system before installing it, running old or incompatible software, and testing suspicious files.

Cloud virtualization enables companies to unlock scalability, business continuity, and cost-saving measures, but dramatically increases the difficulty of monitoring and management for IT Ops teams. The added complexity, layers, and additional tooling needed to manage cloud and hybrid systems can rapidly overwhelm Ops teams. BigPanda is designed with the complexities of modern virtualization tools in mind. Learn more about how BigPanda can help your teams make sense of the complexity of modern IT Ops in our Getting Started documentation.

Widget

BigPanda Dashboards provide easy-to-read operational health metrics in a consolidated view. Ideal for NOC displays and status monitoring, each Dashboard is made up of a series of widgets showing color-coded key information on incident severity and status for incidents in a specific environment. Each widget shows information for a single environment, making it easy to track incident metrics by region, team, or infrastructure type. For example, you might have environments for each business service so that you can track metrics on each separately.

In addition, BigPanda Analytics Reports are made up of individual widgets that display visualizations of your monitoring data. Reporting widgets may include charts, graphs, and tables. Each report is configured to show a specific group of widgets to make visualizing business impact easy. Report widgets may be configured to home in on tags, resources, and KPIs of special interest to your team.

Onboarding Term Glossary Flashcards

(29 cards)