Indexing APIs Flashcards
What is the use case for pipelines?
A pipeline is a saved sequence of processors that is stored once and reused across different API calls to transform documents before they are indexed.
How do you create a pipeline?
PUT _ingest/pipeline/< pipeline name > { description: ".....", processors: [ ] }
What are some processors available?
TODO: Check on elastic website and make a list of important ones.
- remove { field: "…." }
- set { field: "_source.< field >", value: "{{ _source….. }}" } # mustache notation
- convert { field: …, type: …. } # type cast
- script { ….. } # more generic processor
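A minimal sketch of a pipeline body that combines the four processors listed here, built as a Python dict and rendered as a Console request. The pipeline name and all field names (tmp_field, first_name, last_name, full_name, age, name_length) are made up for illustration.

```python
import json

# Hypothetical pipeline name, for illustration only.
PIPELINE_NAME = "my-pipeline"

# All field names below are assumptions, not taken from the notes.
pipeline_body = {
    "description": "Example pipeline with remove, set, convert and script",
    "processors": [
        {"remove": {"field": "tmp_field"}},
        # Mustache template: value is built from other fields of the document.
        {"set": {"field": "full_name", "value": "{{first_name}} {{last_name}}"}},
        {"convert": {"field": "age", "type": "integer"}},
        # Inside a pipeline, scripts address fields directly on ctx.
        {"script": {"lang": "painless",
                    "source": "ctx.name_length = ctx.full_name.length()"}},
    ],
}


def console_request(method, path, body):
    """Render a request the way it would be typed into the Kibana Console."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


print(console_request("PUT", f"_ingest/pipeline/{PIPELINE_NAME}", pipeline_body))
```

Processors run in the order they appear in the list, so the script processor here can rely on full_name having been set already.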
What’s the gotcha between using scripts in the ingest pipeline and outside of it?
Ingest pipeline scripts use "ctx.< field >", while scripts outside a pipeline (for example in _reindex or _update_by_query) need "ctx._source.< field >".
TODO: is this still the case in latest version?
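To make the difference in this card concrete, here is the same logical change written for both contexts. The "name" field is a hypothetical example; the two dicts are only request fragments, not complete calls.

```python
import json

# Script processor inside an ingest pipeline: fields live directly on ctx.
ingest_script_processor = {
    "script": {
        "lang": "painless",
        "source": "ctx.name = ctx.name.toUpperCase()",
    }
}

# Script in an _update_by_query (or _reindex) body: fields live under ctx._source.
update_by_query_body = {
    "script": {
        "lang": "painless",
        "source": "ctx._source.name = ctx._source.name.toUpperCase()",
    }
}

print(json.dumps(ingest_script_processor, indent=2))
print(json.dumps(update_by_query_body, indent=2))
```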
What’s the requirement to use a pipeline?
At least one node in the cluster must have the ingest role (an ingest node) to run pipelines.
What are the use cases for the re-index API?
- Copy data across clusters
- Re-process and modify data into a new index
What’s the API structure for the re-index API?
POST _reindex { "source": { index: ... }, "dest": { index: < new index > }, "script": { .... } }
How do you enable remote re-index across clusters?
Whitelist the SRC on the DEST config file (elasticsearch.yml):
reindex.remote.whitelist: "< ip >:< port >, …." (comma-separated list)
reindex.ssl.verification_mode: certificate
reindex.ssl.truststore.type: PKCS12
reindex.ssl.keystore.type: PKCS12
reindex.ssl.truststore.path: certs/node-1
reindex.ssl.keystore.path: certs/node-1
$ bin/elasticsearch-keystore add reindex.ssl.truststore.secure_password
$ bin/elasticsearch-keystore add reindex.ssl.keystore.secure_password
How do you do remote re-index (across clusters)?
POST _reindex { source: { remote: { host: "https://< ip >:< port >", username: < user >, password: < password > }, index: .... }, dest: { index: .... } }
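A small sketch of the remote re-index body with the nesting spelled out: "remote" and "index" both sit inside "source". The host, credentials and index names passed in below are placeholders, not real values.

```python
import json


def build_remote_reindex(host, username, password, source_index, dest_index):
    """Build a _reindex body that pulls documents from a remote cluster."""
    return {
        "source": {
            "remote": {
                "host": host,  # scheme, host and port of the remote cluster
                "username": username,
                "password": password,
            },
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


# Placeholder values for illustration only.
body = build_remote_reindex(
    "https://10.0.0.5:9200", "reindex_user", "changeme", "logs-old", "logs-new"
)
print("POST _reindex")
print(json.dumps(body, indent=2))
```

The request is sent to the destination cluster, which is also where the whitelist setting has to be configured.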
How to re-index only a subset of the data?
Add a query section to the source section.
POST _reindex { source: { query: { .... } } }
How do you mutate the data while copying it?
Add a “script” section:
POST _reindex
{
source: { … },
dest: { … },
script: { …. }
}
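The two options above (a "query" inside "source" to copy a subset, a top-level "script" to mutate documents) can be combined in one body. A sketch, with hypothetical index names, a hypothetical "state" field and a hypothetical "migrated" flag:

```python
import json


def build_reindex(source_index, dest_index, query=None, script_source=None):
    """Build a _reindex body with an optional filter query and Painless script."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if query is not None:
        # The query goes inside "source": only matching documents are copied.
        body["source"]["query"] = query
    if script_source is not None:
        # The script is a top-level sibling of "source" and "dest".
        body["script"] = {"lang": "painless", "source": script_source}
    return body


# Index and field names are assumptions for illustration.
body = build_reindex(
    "accounts",
    "accounts-v2",
    query={"term": {"state": "CA"}},
    script_source="ctx._source.migrated = true",
)
print("POST _reindex")
print(json.dumps(body, indent=2))
```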
What’s the update by query api structure?
POST < index name >/_update_by_query { script: { .... }, query: { ... } }
In what instances would you want to simply increment the version of all the objects in an index?
TODO: this was mentioned in the video, but why?
How do you add multi line scripts?
In the Kibana Console you can wrap the script in triple quotes (""") to write it across multiple lines.
This doesn’t seem to be a standard JSON feature. (TODO: confirm)
How would you increase a value by X percent with the _update by query api?
script: {
lang: "painless",
source: """
ctx._source.balance += ctx._source.balance * X / 100;
if (ctx._source.transactions == null) { }
"""
}
TODO: move this to a painless deck of cards.
TODO: what about concurrent updates?
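A sketch of the percent increase, assuming the percentage is passed through the script's "params" instead of being hard-coded, and assuming an "exists" query to skip documents that lack the field. The Python function only mirrors the arithmetic the Painless line performs; the index and field names are hypothetical.

```python
import json


def increase_by_percent(value, percent):
    """Mirror of the script's arithmetic: value plus percent of value."""
    return value + value * percent / 100.0


def build_percent_update(field, percent):
    """Build an _update_by_query body that raises `field` by `percent`."""
    return {
        "script": {
            "lang": "painless",
            "source": f"ctx._source.{field} += ctx._source.{field} * params.percent / 100.0",
            "params": {"percent": percent},
        },
        # Assumption: only touch documents that actually have the field.
        "query": {"exists": {"field": field}},
    }


body = build_percent_update("balance", 5)
print("POST accounts/_update_by_query")  # "accounts" is a hypothetical index
print(json.dumps(body, indent=2))
print(increase_by_percent(200, 5))
```

Passing the number via params keeps the script source identical between calls, which is the usual reason to prefer it over string interpolation of the value.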
How do you use reindex and update by query with a pipeline?
_update_by_query:
Add the "?pipeline=< pipeline >" param
re-index:
{
dest: {
pipeline: "< pipeline >"
}
}
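A sketch contrasting the two places the pipeline name goes: a URL parameter for _update_by_query, and a key inside "dest" for _reindex. Index and pipeline names are placeholders.

```python
import json
from urllib.parse import urlencode


def update_by_query_path(index, pipeline):
    """_update_by_query takes the pipeline as a query-string parameter."""
    return f"{index}/_update_by_query?{urlencode({'pipeline': pipeline})}"


def reindex_with_pipeline(source_index, dest_index, pipeline):
    """_reindex takes the pipeline inside the "dest" section of the body."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline},
    }


# Placeholder names for illustration only.
print("POST " + update_by_query_path("accounts", "my-pipeline"))
print("POST _reindex")
print(json.dumps(reindex_with_pipeline("accounts", "accounts-v2", "my-pipeline"), indent=2))
```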
What are the use cases for dynamic templates?
- Allows you to specify how new fields are to be mapped into an index.
- Create patterns so you don't need to specify every new field. For example: field names matching "text" are mapped as text, field names matching "is" are mapped as boolean, etc. This allows you to use convention instead of explicitly specifying everything.
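A sketch of what such a convention could look like as a mapping body, plus a rough local check of which template a field name would hit. The template names and the patterns "is_*" and "*_text" are assumptions chosen for illustration, and the matcher uses Python's fnmatch as an approximation of Elasticsearch's wildcard matching, not its actual implementation.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical templates: name-based conventions for new fields.
index_body = {
    "mappings": {
        "dynamic_templates": [
            {"is_prefix_as_boolean": {"match": "is_*", "mapping": {"type": "boolean"}}},
            {"text_suffix_as_text": {"match": "*_text", "mapping": {"type": "text"}}},
        ]
    }
}


def first_matching_type(field_name, templates):
    """Return the mapped type of the first template whose pattern matches."""
    for template in templates:
        (_name, rule), = template.items()  # each template is a one-key dict
        if fnmatchcase(field_name, rule["match"]):
            return rule["mapping"]["type"]
    return None  # no template matched; default dynamic mapping would apply


templates = index_body["mappings"]["dynamic_templates"]
print("PUT my-index")  # hypothetical index name
print(json.dumps(index_body, indent=2))
for name in ("is_active", "body_text", "age"):
    print(name, "->", first_matching_type(name, templates))
```

Templates are evaluated in order and the first match wins, which is why the helper returns on the first hit.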