Cribl Admin CCOE Flashcards
Which of the following is a valid JavaScript method?
.startswith
.endswith
.match
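A quick illustration in Cribl-style JavaScript (note that the built-in string methods are camelCase, so .startsWith/.endsWith must be capitalized that way to be valid; the field names below are just sample event fields):
  sourcetype.startsWith('cisco')   // true if sourcetype begins with 'cisco'
  source.endsWith('.log')          // true if source ends with '.log'
  _raw.match(/error/i)             // array of matches, or null if no match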
Which of the following logical operators is used as an “and” operator?
&&
Value Expressions can be used in the following locations
Capture Screen and Routes Filtering Screen
Routes Filtering Screen and Pipeline Filtering
Pipeline Filtering and Capture Screen
None of the above! (correct answer)
Value Expressions are used to evaluate true or false.
False
Which of the following logical operators is used as a “not” operator?
!
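A minimal filter-expression sketch combining these operators (field names and values are illustrative):
  sourcetype=='syslog' && !host.startsWith('test')   // "and" combined with "not"
  level=='error' || level=='warn'                    // "or"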
Git
What command shows you the files that have changed, been added, or are tracked?
Status
What order must you use to add a new file to a remote repository?
add, commit, push
Which command allows you to see a history of commits?
git log
Which command allows you to add a file to the repository?
add
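A typical end-to-end sequence using these commands (the file name, commit message, and branch name are placeholders, not from the course):
  git status                        # show changed, staged, and untracked files
  git add lookups/users.csv         # stage the new file
  git commit -m "Add users lookup"  # record the change locally
  git push origin master            # push the commit to the remote repository
  git log                           # review the commit history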
Worker Process
A process within a Single Instance, or within Worker Nodes, that handles data inputs, processing, and output. Worker Processes operate in parallel. Each Worker Process will maintain and manage its own outputs.
Worker Node
An instance running as a managed worker, whose configuration is fully managed by the Leader Node
Worker Group
A collection of Worker Nodes that share the same configuration
Leader Node
An instance running in Leader mode, used to centrally author configurations and monitor a distributed deployment
Mapping Ruleset
An ordered list of Filters, used to map Workers to Worker Groups
Which of the following is not a Worker responsibility?
Back up to Git (local only)
Which of the following is not an advantage of a Distributed deployment over a single instance?
Advanced data processing capabilities
Advantages include - Higher reliability, unlimited scalability
Load Balancing among the Worker Processes is done the following way:
The first connection will go to a random Worker Process, and the remaining connections will go in increasing order to the following Worker Processes.
All Cribl Stream deployments are based on a shared-nothing architecture pattern, where instances/Nodes and their Worker Processes operate separately
True!
The Single Stream instance is valid for dev, QA or testing environments
True
In Distributed Mode, the Worker Node…
is Stateless
Can continue running even without communication to the Leader, with limitations
Can be accessed from inside the Leader
Is the main path between Sources and Destinations
Which of the following is true regarding Worker and Leader communication?
Worker initiates the communication between Leader and Workers
Worker processes within a Node are distributed using a round robin process based on connections
True
Which of the following are valid Stream deployment options?
Single Instance (software loaded on single host)
Distributed Deployment (Leader and Workers)
Stream deployed in Cribl’s cloud (SaaS)
Stream deployed in the customer's own cloud instance
Worker Group to Worker Group communication is best done by using…
Stream TCP
and
Stream HTTP
Cribl.Cloud advantages
Simplified administration
Simplified distributed architecture
Git preconfigured
Automatic restarts and upgrades
Simplified access management and security
Transparent licensing
Cribl.Cloud does not provide TLS encryption on any Sources
False
Cribl.Cloud allows for Stream to Stream communication from Cloud Worker Groups to on-prem Worker Groups
True
Cribl.Cloud allows for restricted access to certain IP addresses
True
When using Stream in Cribl.Cloud, how do you get data into the cloud?
Using common data sources that are pre-configured (TCP, Splunk, Elastic, etc)
Using ports 20000-20010 that are available to receive data
Cribl.Cloud has preconfigured ports you can use to bring in data
True
Which of the following is not valid for a Cribl.Cloud deployment?
Single Stream instance
Distributed Stream instance with Leader on-prem & workers in the Cribl.Cloud
Which of the following are benefits when using Cribl.Cloud?
Simplified administration
Git preconfigured
Automatic upgrades
Cribl.Cloud cannot integrate with an on-prem Cribl Worker Group
False
Cribl.Cloud allowed ports include
20000-20010
Cribl.Cloud does not provide any predefined sources
False
What affects performance/sizing?
Event Breaker Rulesets
Number of Routes
Number of Pipelines
Number of Clones
Health of Destinations
Persistent Queueing
Estimating Deployment Requirements
Allocate 1 physical core for each 400GB/day of IN & OUT throughput
100GB in -> 100GB out to 3 destinations=400GB total. 400GB/400GB=1 physical core
Which of the following will impact your choice for amount of RAM?
Persistent Queueing requirements
Cribl Worker Process default memory is
2GB RAM
How many Worker Nodes, each with 16 vCPUs, are needed to ingest 10TB and send out 20TB?
11 Worker Nodes
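One hedged way to arrive at 11, using the 400GB/day-per-physical-core rule (about 200GB/day per vCPU) and assuming roughly 2 vCPUs per node are reserved for the OS and system processes:
  10TB in + 20TB out = 30TB/day = 30,000GB/day
  30,000GB / 200GB per vCPU = 150 worker vCPUs
  16 vCPU - 2 reserved = 14 usable per node; 150 / 14 ≈ 10.7, round up to 11 Worker Nodes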
Cribl recommends you use the following specifications?
16vCPU per Worker Node
How can a Stream deployment be scaled to support high data and processing loads?
Scale up with higher system performance (CPU, RAM, Disk) on a single platform
Scale out with additional platforms
Add more worker groups
With a very large # of sources (UFs), it is possible to exhaust the available TCP ports on a single platform
True
Leaders require higher system requirements than workers
False
Persistent Queueing (Source & Destination) might impact performance
True
Cribl scales best using…
Many medium-sized Worker Nodes
Remote Repository Recovery - Overview
- System Down
- Install Git on Backup Node
- Recover configuration from remote repository
- Restart Leader Node
- Back Operational :)
Setting up and Connecting to GitHub
- Set up GitHub
- Create an empty repository
- Generate keys to connect Stream to GitHub (Public key > GitHub, Private key > Stream)
- Configure Stream UI to connect to Remote Git
- Once connected, commit and push each local change to sync with the remote repository
When using this command to generate SSH public and private keys: ssh-keygen -t ed25519 -C "your_email@example.com", which file contains the public key?
id_ed25519.pub
A remote repository on GitHub is a mandatory requirement when installing Cribl Stream
False
A remote Git instance is
Optional for all Stream Deployments
What are the methods to backup Cribl Leader Node?
Rsync
Tar / untar
Copy configuration files to S3, rehydrate configuration files from S3
Git and GitHub provide backup and rollback of Cribl Stream configurations
True
Cribl Stream fault tolerance requires the use of a remote Git repository
True
What is a true statement about GitHub accounts?
Requires manual configuration outside of Cribl Stream configuration
Stream disaster recovery requires a dedicated standby backup Leader
False
Which Git commands are part of the recovery steps?
Git init
Git fetch origin
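A hedged sketch of how these commands fit into the recovery flow (the install path, remote URL, branch, and the final reset step are assumptions; follow Cribl's documented recovery procedure for the exact sequence):
  cd /opt/cribl                                                   # $CRIBL_HOME on the rebuilt Leader
  git init
  git remote add origin git@github.com:example/cribl-config.git   # placeholder remote URL
  git fetch origin
  git reset --hard origin/master                                  # restore the configuration
  ./bin/cribl restart                                             # restart the Leader Node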
What is the purpose of using Git?
To provide a backup of configuration files
To provide a history of changes within Stream
./cribl help -a
Displays a list of all the available commands
Common Cribl Stream commands
./cribl start
./cribl stop
./cribl restart
./cribl status (shows Stream status)
./cribl diag (manages diagnostic bundles)
Cribl Stream CLI
CLI gives you the ability to run commands without needing access to the GUI
Helps in creating automated scripts if needed
Gives you the ability to run diagnostics and send them to Cribl Support
What command is used to configure Cribl Stream to start at boot time?
boot-start
What format are the diag files in?
.tar.gz
What does the ‘cribl diag create’ command do?
Creates a gzip file with configuration information and system state
What command is used to configure Cribl Stream as a leader?
./cribl mode-master
Once you run ‘cribl boot-start enable -m systemd’, you will need to use what command to start/stop Stream?
systemctl start cribl
The configuration files created with the diag command are in .js format?
False
You cannot export packs using the command line
False
What types of files are in the diagnostic file?
Files in the local directory
Log files
State of the system
Details about the system running Stream
You can use the ‘mode’ command to configure a Cribl Stream instance into a Cribl Edge Node?
True
You cannot install Packs using the CLI
False
Troubleshooting Source Issues
What is the status of the source?
Sources will have a red status on Leader until they are deployed to a worker group. Status can still be red if there are binding issues
Troubleshooting Source Issues
If you do a live capture on the Source, are there any events?
Make sure JavaScript filter set for the live capture is correct. If no data is returned, the problem is likely with the network or further upstream
Troubleshooting Source Issues
Is the Source operational/reachable?
Ping the server
Use the nc or telnet command to test the connection to the Source
Troubleshooting Source Issues
Is the Destination triggering backpressure?
Check by going to the Destination in Monitoring>Destinations and clicking on Status.
If the Source is connected via a Route to a Destination that is triggering backpressure with its behavior set to Block, data flow from that Source will stop.
Troubleshooting Source Issues
Check Source config
Typos? Proper authentication?
Stream Sources
Summary
Stream can accept data pushed to it, or pull data via API calls
Open protocols, as well as select proprietary products, are supported
Pulling data falls into two categories
* Scheduled pulls for recurring data (think tailing a file)
* Collector jobs intended for ad hoc runs as in Replay scenario
Push Sources push data to Stream, such as Splunk and TCP
Internal Sources are internal to Stream, such as Datagens or internal logs/metrics
Low-code interface eases management
Capture sample data at any stage to validate and test
Stream Syslog Sources
Stream Syslog Sources Summary
Stream can process a syslog stream directly
Moving to Cribl Stream from existing syslog-ng or rsyslog servers fully replaces those solutions with one that is fully supported and easily managed
Optimize syslog events
Syslog data is best collected closest to the source
Use a load balancer to distribute load across multiple worker nodes
Reduce management complexity while ensuring reliable and secure delivery of syslog data to chosen systems
Configuring Elastic Beats
Beats are open-source data shippers that act as agents. Most popular with Cribl customers:
Filebeat - filebeat.yml
Winlogbeat - Winlogbeat.yml
Change control is built into the system via Git
True
Users are independent Cribl Stream objects that you can configure even without RBAC enabled
True
URL of the Elastic server that will proxy non-bulk requests
Proxy URL
While the Splunk Search Collector is a powerful way to discover new data in real time, you should update the Request Timeout parameter to stop the search after a certain period of time to avoid…
Having the collector stuck in a forever running state
Senders with load balancers built in include:
Elastic Beats
Splunk Forwarder
When considering Filebeat, to ensure data is received at Stream, change the filebeat.yml to
‘setup.ilm.enabled: false’
If Stream receives an event from Elastic Beats, we can deliver the event to
Any destination
Roles are a set of permissions
False
Cribl Stream ships with a Syslog Source in_syslog, which is preconfigured to listen for
Both UDP and TCP traffic on Port 9514
All syslog senders have built-in load balancing
False
Review of Collectors
Stream Collectors are a special group of inputs that are designed to ingest data intermittently rather than continuously.
Collectors can be scheduled or run ad-hoc
Cribl Stream Collectors support the following data types:
Azure Blob
Google Cloud Storage
REST
S3
Splunk Search
Health Check
Database
File System
Script
Collectors in Single Deployments
When a Worker node receives the job:
-Prepares the infrastructure to execute a collection job
-Discovers the data to be fetched
-Fetches the data that match the run filter
-Passes the results either through the Routes or into a specific Pipeline
Collectors in Distributed Deployments
In a distributed deployment, collectors are configured per Worker Group (within the Leader)
-The Worker Node executes the tasks in their entirety
-The Leader Node oversees the task distribution and tries to maintain a fair balance across jobs
-Cribl Stream uses “Least-In-Flight Scheduling”
-Because the Leader manages Collectors’ state, if the Leader instance fails, the Collection jobs will fail as well.
Worker Processes
A Worker Node can have multiple worker processes running to collect data.
Since the data is spread across multiple worker processes, an alternative like Redis is required to perform stateful suppression and stateful aggregation
Discovery Phase
Discovers what data is available based on the collection settings
Collection Phase
Collects the data based on the settings of the discovery phase
Workers will continue to process in flight jobs if the Leader goes down.
True
If skippable is set to yes, jobs can be delayed up to their next run time if the system is hitting concurrency limits.
True
Worker Nodes have
Multiple processes that process data independently
Worker Nodes keep track of state when processing data?
False
What happens after the Worker Node asks the Leader what to run?
The Leader Node sends work to Workers based on previous distributions of work.
Workers will stop processing collector jobs that are currently running if the Leader goes down
False
Filesystem Collectors and Script Collectors can only run in an on-prem Stream environment
True
What are the ways you can run a collection job?
Scheduled or ad-hoc
The following collectors are available in Cribl Cloud
S3 Collector
and
REST Collector
You can run a scheduled collection job in preview mode
False
Streaming Destinations
Accept events in real time
Non-streaming Destinations
Accept events in groups or batches
Configuring Destinations
For each Destination type, you can create multiple definitions, depending on your requirements. Backpressure behavior options include Block, Drop, and Queue.
Value of Destinations
Support for many destinations
Not all data is of equal value. High-volume, low-value data can be sent to less expensive destinations
Value of Destinations
Send data from the same source to multiple destinations
- Simplify data analytics tools migration
- Store everything you may need in the future, analyze only what you need now
Value of Destinations
No extra agents required
Data collected once can be sent to multiple destinations without extra operations cost to run new agents
Value of Destinations
Integrations with common destinations
- Quick time to value
- Operations cost reduction
Value of Destinations
Live data capture shows what’s sent to destinations
Reduce troubleshooting effort
Value of Destinations
Persistent Queue
- Minimize data loss
- Eliminate/minimize the need to introduce separate buffering/queueing tools
Multiple Splunk Streaming Destinations
Splunk Single Instance - Stream data to a single Splunk instance
Splunk Load Balanced - Load balance the data it streams to multiple Splunk receivers (indexers)
Splunk HEC - Can stream data to a Splunk HEC (HTTP Event Collector) receiver through an event endpoint
Splunk Destinations Tips
Enabling Multi-Metrics
Multi-metrics is data sent in JSON format which allows for each JSON object to contain measurements for multiple metrics.
Takes up less space and improves search performance
Splunk Destinations Tips
Adjusting timeouts and Max connections
Adjust timeout settings for slow connections. Increase request concurrency based on the number of HEC receivers
Splunk Destinations Tips
_raw fields and index-time fields in Splunk
-Everything that is in _raw is viewable as event content
-Fields outside of _raw are metadata, which can be searched with tstats or by using :: instead of =
-Fields outside of _raw are viewed when the event is expanded
-If events do not have a _raw field, they’ll be serialized to JSON prior to sending to Splunk
Splunk Destinations
Summary
-Cribl Stream can send data to Splunk using a variety of different options
-Data can be sent securely over TLS
-Enabling multi-metrics can save space and perform better
Elastic Destinations
Bulk API - Performs multiple indexing or delete operations in a single API call
Elastic Destinations
Data Structure Best Practice
Put all fields outside of _raw; use JSON
Elastic Data Stream
- Create a policy > an index template
- Each data stream’s index template must include a name or wildcard pattern, the data stream’s timestamp field, and the mappings and settings applied to each
- Source for data stream
- Destination for data stream
- Support for ILM
Elastic Destinations
Key Use Cases
-Route data from multiple existing data sources or agents
-Migrate data from older versions
-Optimize data streams and send data in the right form to Elastic
Splunk > Elasticsearch
Step 1: Configure Splunk Forwarder
Step 2: Configure Splunk Source in Stream
Step 3: Configure Elasticsearch Destination
Step 4: Configure Pipeline (regex extract function, lookup function, GeoIP function)
Step 5: Results
Destination: Amazon S3
Stream does NOT have to run on AWS to deliver data to S3
Destination S3
Partitioning Expression
Defines how files are partitioned and organized - Default is date-based
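A hedged example of a Partitioning Expression that extends the date-based default with a host component (C.Time.strftime is a Cribl native method; the host field is illustrative):
  `${C.Time.strftime(_time, '%Y/%m/%d/%H')}/${host}`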
Destination S3
File Name Prefix Expression
The output filename prefix - Defaults to CriblOut
Use only with low cardinality partitions and understand impact to open files & AWS API
Destination S3
Cardinality
=Max Unique Values
Number of Staging Sub-directories or S3 Bucket prefixes
Cardinality too high?
When writing to S3 - too many open files and directories on worker nodes
When reading from S3 - Less chance of hitting S3 read API limits
Destination S3
Cardinality too Low?
When writing to S3 - bigger files written to fewer directories in S3
When reading from S3 - Less filtering ability during replays, more data downloaded so larger data access charges, larger chance of hitting S3 read API limits
Cardinality General Guidance
Plan for cardinality of no more than 2000 per partition expression
Stream to Stream
Sending data from Stream Worker to Stream Worker, not Worker to Leader
Internal Cribl Sources
Receive data from Worker Groups or Edge Nodes
Common pattern: a customer-managed (on-prem) Worker sends data to a Worker in Cribl.Cloud
Internal Cribl Sources treat internal fields differently than other Sources
Internal Cribl Destinations
Enables Edge nodes, and/or Cribl Stream instances, to send data to one or multiple Cribl Stream instances
Internal fields loopback to Sources
Stream Best Practices
-For maximum compression, it is best to change the data to JSON format
-Internal Cribl Destinations must be on a Worker Node that is connected to the same leader as the internal Cribl Source
-For minimum data transfer, process data on source workers instead of destination workers
-For heavy processing, process data on destination workers
When setting up an S3 destination the file name prefix expression:
Can negatively impact both read and write API count
Can dramatically increase number of open files
Generally avoid unless you’ve done your due diligence and have low cardinality partition expressions
All of the above
It is not recommended to enable Round-Robin DNS to balance distribution of events between Elasticsearch cluster nodes
False
What are two benefits of a worker group to worker group architecture?
Compressing data and reducing bandwidth
Reducing Cloud provider egress costs
For heavy processing, a recommendation best practice is to process data on
Destination workers
When tuning settings for an S3 destination, a good way to avoid any “too many open files” errors is to decrease the number of max open files.
False
Which of the following allows you to configure rules that route data to multiple configured Destinations?
Output router
Parquet Formation
Which is an ideal scenario for worker group to worker group architecture?
Capturing data from overseas sources that is destined to local destinations
Reducing the number of TCP connections to a destination
Capturing data from a cloud provider and shipping it to an on-prem destination to avoid egress costs
all of the above
With Exabeam, it is important to figure out what syslog format/content needs to be in place
true
What are the two main considerations for S3 Destinations?
Cardinality of partition and file name expressions
Max open files on system
Stream S3 destination setting raw means
Less processing, smaller events, no metadata
Routes
-Allow you to use filters to send data through different pipelines.
-Filtering capabilities via JavaScript expression and more control
-Data Cloning allows events to go to subsequent route(s)
-Data Cloning can be disabled with a switch toggle
Routes
Dynamic Output Destinations
-Enable expression > Toggle Yes
-Enter a JavaScript expression that Stream will evaluate as the name of the Destination
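A hedged sketch of such an expression; the Destination IDs ('s3_archive', 'splunk_lb') and the sourcetype value are made up for illustration:
  sourcetype.startsWith('fw_') ? 's3_archive' : 'splunk_lb'   // evaluates to the Destination name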
Routes
Final Toggle
Allows you to stop processing the data depending on the outcome. If an event matches the filter and the toggle is set to Yes, those events will not continue down to the next Route. Events that do not match that filter will continue down to the next Route.
Routes
Final Flag and Cloning
-Follow “Most Specific First” when using cloning
-Follow “Most General First” when not using cloning
-At the end of the route, you will see the “endRoute” bumper reminder
Routes
Unreachable Routes
Route unreachable warning indicator: “This route might be unreachable (blocked by a prior route), and might not receive data.”
Occurs when matching all three conditions:
-Previous Route is enabled
-Previous Route is final
-Previous Route’s filter expression evaluates to true
Routes
Best Practices
Filter Early and Filter fast!
-You want to quickly filter out any data you do not want to process
Routes
Best Practices continued
-Certain JavaScript string operators run faster than others
-Each of these functions operates similarly to the others, but slightly differently:
-indexOf, includes, and startsWith use strings as their function parameter
-match, search, and test use regular expressions
Routes
Best Practices: Most Specific/Most General
Most General: If cloning is not needed at all (all Final toggles stay at default), then it makes sense to start with the broadest expression at the top, so as to consume as many events as early as possible
Most Specific: If cloning is needed on a narrow set of events, then it might make sense to do that upfront, and follow it with a Route that consumes those clones immediately after
Object Storage (S3 buckets): Since most data going to object storage is data being cloned, it is best to put routes going to object storage at the top.
Filter on common fields. Filter on fields like __inputId and metadata fields, rather than _raw.includes
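For example, a hedged sketch of the preferred style (the input ID and sourcetype values are illustrative):
  __inputId.startsWith('syslog:in_syslog') && sourcetype=='pan_traffic'   // cheap checks on metadata fields
  // rather than: _raw.includes('PAN-OS')                                 // scans the entire raw event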
You created a QuickConnect against a source and now you want to create a route against a subset of that source’s events - to a different destination. What are the steps you need to take?
Navigate to the Source. Go to ‘Connected Destinations’. Click on ‘Routes’ to revert to using them instead of QuickConnect. Create 2 routes: one to replace the old QuickConnect that was deleted, and a new route with a filter to map to the events of interest.
Both QuickConnect and Routes can be used against the same source.
False
What’s the general rule for having a performant system?
Filter early and filter fast!
Which is true?
-Routes have drag and drop capabilities to connect to a source to a destination; QuickConnect doesn’t (FALSE)
-QuickConnect has advanced capabilities for assigning pre-processing pipelines to a source and post-processing pipelines to a destination (FALSE)
-QuickConnect does not allow mapping a Pack between sources and destinations (FALSE)
-Routes map to a filter; QuickConnect maps a source to a destination (TRUE!!!!)
Which is the most performant JavaScript function?
indexOf
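A hedged comparison (the search string is illustrative):
  _raw.indexOf('Failed password') > -1   // plain substring scan; returns a position or -1
  _raw.match(/Failed password/)          // invokes the regex engine; typically slower for a fixed substring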
Which is a good use case for QuickConnect?
-Stream Syslog Source receiving events from hundreds of device types and applications (NOOOOOOOO)
-Stream Splunk Source receiving events from Windows and Linux hosts with Splunk Universal Forwarders (NOOOOOO)
-REST API Collector polling Google APIs with JWT authentication (NOOOOOO)
-Palo Alto devices sending to a dedicated Stream Syslog Source mapping to a different port than other syslog events (YESSSSS)
Filter Expressions
Filter Expressions are used to decide what events to act upon in a Route or Function. Uses JavaScript language
Value Expressions
typically used in Functions to assign a value. Uses JavaScript language
There are 3 types of expressions
-Assigning a Value
-Evaluating to a Value
-Evaluating to true/false
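Hedged one-line sketches of each type (field names and values are illustrative):
  Assigning a value (e.g. in an Eval Function):   index = 'web_prod'
  Evaluating to a value:                          Math.round(bytes / 1024)
  Evaluating to true/false (e.g. a Route filter): status >= 500 && host.endsWith('.prod')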
Filter Expressions Usage
Filter Expressions can be used in multiple places:
-Capture
-Routing
-Functions within Pipelines
-Monitoring Page
Special Use Expressions
Rename Function - Renaming Expression
name.toLowerCase(): any uppercase characters in the field name get changed to lowercase
name.replace(“geoip_src_country”, “country”): This is useful when JSON objects have been flattened (as in this case)
Filter Expression Methods
Expression methods can help you determine true or false. Here is a list of commonly used methods:
.startsWith: Returns true if a string starts with the specified string
.endsWith: Returns true if a string ends with the specified string
.includes: Returns true if a string contains the specified string
.match: Returns an array containing the results if the string matches with a regular expression
.indexOf: Returns the position of the first occurrence of the substring
Cribl Expressions Methods
Cribl Expressions are native methods that can be invoked from any filter expression. All methods start with C.
Examples: C.Crypto or C.Decode
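A few hedged examples of native methods in an expression (the payload and src_ip fields are illustrative):
  C.Decode.base64(payload)               // decode a Base64-encoded field
  C.Time.strftime(_time, '%Y-%m-%d')     // format the event timestamp
  C.Net.cidrMatch('10.0.0.0/8', src_ip)  // true if src_ip falls within the CIDR range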
What operators are available to be used in Filter Expressions?
&&
||
()
The Filter Expression Editor allows you to
Test your expression against sample data
Test your expression against data you have collected
Test your expression against data to see if it returns true or false
Ensure your expression is written correctly
Filter Expressions are only used in Routes
False
Select all the Filter Expression operators you can use
”>”
“<”
“==”
“!==”
Filter Expressions can be used in the following places
Functions within Pipelines
Routes
Monitoring Page
Capture Page
You can combine two Filter Expressions
True
What is the difference between using “==” or “===”
”==” checks that the value is equal but “===” checks that the value and type are equal
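A quick sketch:
  '200' == 200    // true - the values match after type coercion
  '200' === 200   // false - the value matches but the type (string vs. number) does not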
You can use .startsWith and .beginWith in filter expressions
False
Pipelines
Pipelines are a set of functions that perform transformations, reduction, enrichment, etc.
Benefits of pipelines
-Can improve SIEMs or analytics platforms by ingesting better data
-Reduce costs by reducing the amount of data going into a SIEM
-Simplifies getting data in (GDI)
Pipelines are similar to
Elastic LogStash
Splunk props/transforms
Vector Programming
Types of Pipelines
Pre-Processing - Normalize events from a Source
Processing - Primary pipeline for processing events
Post-Processing - Normalize events to a Destination
Type of Pipelines
Pre-Processing
This type is applied at the source
Used when you want to normalize and correct all the data coming in
Examples:
-Syslog Pack pre-processing all syslog events coming from different vendors; specific product packs/pipelines can then be mapped to a route
-Microservices pack pre-shapes all k8s, docker, container processed logs
-Specific application pipeline/packs can then be mapped to routes
Types of Pipelines
Processing Pipelines
Most common use of pipelines
You can associate Pipelines with Routes using filters
Types of Pipelines
Post-Processing
Maps to Destinations
Universally post-shape data before it is routed
Examples:
-Convert all fields to JSON key value pairs prior to sending to Elastic
-Convert all logs to metrics prior to sending to Prometheus
-Ensure all Splunk destined events have the required index-time fields (index, source, sourcetype, host)
Pipelines
Best Practices!
Name your pipeline and the route that attaches to it similarly
-Create different pipelines for different data sets. Creating one big pipeline can use substantially more resources, become unmanageable, and look confusing and complicated.
-Filter early and filter fast!
-Do not reuse pipelines. Do not use the same pipeline for both pre-processing and post-processing. Can make it hard to identify a problem and where it stems from
-Capture sample events to test. Allows you to visualize the operation of the functions within a pipeline.
-Test! Use data set to test and validate your pipeline
-Use statistics. Use Basic Statistics to see how well your pipelines are working
-Pipeline Profiling - determine performance of a pipeline BEFORE it is in production
You should create different pipelines for different data sets
True
Pipelines contain Functions, Routes and Destinations
False
Stream Functions Overview
-Functions act on received events and transform the received data to a desired output.
-Stream ships with several functions that allow you to perform transformations, log to metrics, reduction, enrichment, etc.
-Some expressions use JavaScript
-For some functions, knowing Regex will be required
5 Key Functions
Eval
Sampling
Parser
Aggregations
Lookup