Indexing APIs Flashcards
What is the use case for pipelines?
It’s a saved sequence of processors that can be stored and reused in different API calls.
How do you create a pipeline?
PUT _ingest/pipeline/< pipeline name > { "description": ".....", "processors": [ ] }
What are some processors available?
TODO: Check on elastic website and make a list of important ones.
- remove { field: "…." }
- set { field: "_source.< field >", value: "{{ _source….. }}" } # mustache notation
- convert { field: …, type: …. } # type cast
- script { ….. } # more generic processor
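A minimal sketch of a pipeline body that chains the processors listed above, written as a Python dict so it can be serialized and checked. The description text and the field names (`temp_field`, `status`, `status_copy`, `price`, `price_cents`) are hypothetical; the resulting JSON is what would go in the body of PUT _ingest/pipeline/< pipeline name >.

```python
import json

# Hypothetical field names; processors run in the order they are listed.
pipeline_body = {
    "description": "example pipeline (illustrative only)",
    "processors": [
        {"remove": {"field": "temp_field"}},
        # Mustache notation copies the value of another field.
        {"set": {"field": "status_copy", "value": "{{status}}"}},
        {"convert": {"field": "price", "type": "float"}},
        # Inside an ingest pipeline the script addresses fields as ctx.<field>.
        {"script": {"lang": "painless", "source": "ctx.price_cents = ctx.price * 100"}},
    ],
}

print(json.dumps(pipeline_body, indent=2))
```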
What’s the gotcha between using scripts in the ingest pipeline and outside of it?
Ingest pipeline scripts use "ctx.< field >", while outside a pipeline you need to use "ctx._source.< field >".
TODO: is this still the case in latest version?
What’s the requirement to use a pipeline?
At least one node with the ingest role must be running.
What are the use cases for the re-index API?
- Copy data across clusters
- Re-process and modify data into a new index
What’s the API structure for the re-index API?
POST _reindex { "source": { "index": ... }, "dest": { "index": < new index > }, "script": { .... } }
How do you enable remote re-index from another cluster?
Whitelist the source host in the destination cluster's config file (elasticsearch.yml):
reindex.remote.whitelist: "< ip >:< port >, …." # comma-separated list
reindex.ssl.verification_mode: certificate
reindex.ssl.truststore.type: PKCS12
reindex.ssl.keystore.type: PKCS12
reindex.ssl.truststore.path: certs/node-1
reindex.ssl.keystore.path: certs/node-1
$ bin/elasticsearch-keystore add reindex.ssl.truststore.secure_password
$ bin/elasticsearch-keystore add reindex.ssl.keystore.secure_password
How do you do remote re-index (across clusters)?
POST _reindex { "source": { "remote": { "host": "https://< ip >:< port >", "username": < user >, "password": < password > }, "index": .... }, "dest": { "index": .... } }
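A small sketch of the remote re-index body, built with a helper so the nesting of "remote" inside "source" is easy to see. The helper name `remote_reindex_body`, the host, the credentials and the index names are all made up for illustration.

```python
import json

def remote_reindex_body(host, username, password, source_index, dest_index):
    """Build a _reindex body that pulls documents from a remote cluster."""
    return {
        "source": {
            # Connection details for the remote (source) cluster.
            "remote": {"host": host, "username": username, "password": password},
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }

# Hypothetical values, for illustration only.
body = remote_reindex_body("https://10.0.0.5:9200", "user", "secret", "logs-old", "logs-new")
print(json.dumps(body, indent=2))
```

The host given here has to appear in reindex.remote.whitelist on the destination cluster.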
How to re-index only a subset of the data?
Add a query section to the source section.
POST _reindex { "source": { "index": ...., "query": { .... } }, "dest": { .... } }
How do you mutate the data while copying it?
Add a “script” section:
POST _reindex
{
source: { … },
dest: { … },
script: { …. }
}
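Putting the last two cards together, a sketch of a re-index body that copies only a subset of the documents and modifies each one on the way. The index names, the `state` field and the `migrated` flag are hypothetical.

```python
import json

reindex_body = {
    "source": {
        "index": "accounts",
        # Only documents matching this query are copied.
        "query": {"term": {"state": "CA"}},
    },
    "dest": {"index": "accounts-ca"},
    "script": {
        "lang": "painless",
        # Outside an ingest pipeline, fields live under ctx._source.
        "source": "ctx._source.migrated = true",
    },
}

print(json.dumps(reindex_body, indent=2))
```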
What’s the update by query api structure?
POST < index name >/_update_by_query { script: { .... }, query: { ... } }
In what instances would you want to simply increment the version of all the objects in an index?
TODO: this was mentioned in the video, but why?
How do you add multi line scripts?
You can wrap the script in triple quotes (""") to write multi-line scripts in the Kibana console.
This is Kibana console syntax, not standard JSON. Outside the console, the script has to be a regular JSON string with escaped newlines.
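To illustrate the point about triple quotes, a short sketch of how a standard JSON encoder represents a multi-line script: the newlines become `\n` escapes inside an ordinary string. The script text and field names are hypothetical.

```python
import json

# Python's own triple quotes are used here only to hold the multi-line text.
script_source = """
ctx._source.balance += 10;
ctx._source.updated = true;
"""

body = {"script": {"lang": "painless", "source": script_source}}
encoded = json.dumps(body)

# The encoded body is a single line with escaped newlines and no triple quotes.
print(encoded)
```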
How would you increase a value by X percent with the _update_by_query API?
script: {
lang: "painless",
source: """
// X as a fraction, e.g. 0.1 for 10 percent
ctx._source.balance += ctx._source.balance * X;
if (ctx._source.transactions == null) { }
"""
}
TODO: move this to a painless deck of cards.
TODO: what about concurrent updates?
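A sketch of the percent increase with the percentage passed as a script parameter instead of being written into the source. The `percent` parameter name and the `match_all` query are assumptions; `increase_by_percent` is a plain Python mirror of the arithmetic, not something Elasticsearch provides.

```python
import json

def increase_by_percent(value, percent):
    """Python mirror of the script arithmetic: value += value * percent / 100."""
    return value + value * percent / 100

update_body = {
    "script": {
        "lang": "painless",
        "source": "ctx._source.balance += ctx._source.balance * params.percent / 100",
        "params": {"percent": 10},  # hypothetical parameter name
    },
    "query": {"match_all": {}},
}

print(json.dumps(update_body, indent=2))
print(increase_by_percent(200, 10))
```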
How do you use reindex and update by query with a pipeline?
_update_by_query:
Add the "?pipeline=< pipeline >" param
re-index:
{
dest: {
pipeline: "< pipeline >"
}
}
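A sketch of the two places the pipeline name goes: a query parameter for _update_by_query and a key under "dest" for _reindex. The helper names, index names and the pipeline name `my-pipeline` are hypothetical.

```python
from urllib.parse import urlencode

def update_by_query_path(index, pipeline):
    """Request path for _update_by_query with the pipeline as a query parameter."""
    query_string = urlencode({"pipeline": pipeline})
    return f"/{index}/_update_by_query?{query_string}"

def reindex_body_with_pipeline(source_index, dest_index, pipeline):
    """_reindex body with the pipeline set on the destination."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline},
    }

print(update_by_query_path("accounts", "my-pipeline"))
print(reindex_body_with_pipeline("accounts", "accounts-v2", "my-pipeline"))
```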
What are the use cases for dynamic templates?
- Allows you to specify how new fields are to be mapped into an index.
- Create patterns so you don’t need to specify every new field. For example: “text” is mapped as text, “is” is mapped as boolean, etc. This allows you to use convention instead of explicitly specifying everything.
How do you create a dynamic template mapping?
- Set it on the index:
PUT < index name >
{
"mappings": {
"dynamic_templates": [
{
"< template name >": {
"match_mapping_type": "< type to match >",
"match": "< filter on field name >",
"unmatch": "< filter on what NOT to match >",
"mapping": {
… < mapping definition > …
"type": "…. type …"
}
}
}
]
}
}
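A concrete instance of the structure above, as a sketch. Each entry in "dynamic_templates" is an object with a single key, the template name. The template name `ids_as_keywords` and the patterns are made up for illustration.

```python
import json

index_body = {
    "mappings": {
        "dynamic_templates": [
            {
                # Hypothetical template: new string fields ending in _id become keywords.
                "ids_as_keywords": {
                    "match_mapping_type": "string",
                    "match": "*_id",
                    "unmatch": "tmp_*",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(index_body, indent=2))
```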
How to filter on field names on dynamic template mappings?
Use the “match” or “unmatch” fields with a wildcard.
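A rough Python approximation of how “match” and “unmatch” interact, using shell-style wildcards from the standard library. This is not Elasticsearch's own matcher (which, by default, treats only `*` as a wildcard); the function name and field names are hypothetical.

```python
from fnmatch import fnmatchcase

def template_applies(field_name, match=None, unmatch=None):
    """Return True if a field name passes the match filter and is not excluded by unmatch."""
    if match is not None and not fnmatchcase(field_name, match):
        return False
    if unmatch is not None and fnmatchcase(field_name, unmatch):
        return False
    return True

print(template_applies("is_active", match="is_*"))               # True
print(template_applies("is_tmp", match="is_*", unmatch="*_tmp"))  # False
print(template_applies("name", match="is_*"))                     # False
```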
What are the use cases for index templates?
- Time series data: we routinely create new indexes to store data such as logs. We might create a new index every day, and we want each new index to follow a template.
How do you create an index pattern / template?
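PUT _template/< template name > { "aliases": ...., "mappings": ...., "settings": ...., "index_patterns": ["< wildcard >"] }
(Legacy template API. Newer versions also offer PUT _index_template/< template name >, where aliases, mappings and settings are nested under "template".)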
Explain how index templates work
- Whenever a new index is created, it is matched against the index pattern of the templates
- If it matches, the index is created using that template.
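The two steps above can be simulated in a few lines. This is only a sketch of the name-matching idea using shell-style wildcards, not how Elasticsearch implements it; the template name, patterns and index names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical template definitions keyed by template name.
templates = {
    "logs-template": {
        "index_patterns": ["logs-*"],
        "settings": {"number_of_shards": 1},
    },
}

def matching_templates(index_name, templates):
    """Names of the templates whose index_patterns match the new index name."""
    return [
        name
        for name, template in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in template["index_patterns"])
    ]

print(matching_templates("logs-2024.01.01", templates))  # ['logs-template']
print(matching_templates("metrics-1", templates))        # []
```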
What are some use cases for aliases?
- Create a filter that returns only a subset of the data (like a saved query).
- Aggregate data (same alias, multiple indexes). TODO: test this and learn more about it.
How do you create an alias?
POST _aliases { "actions": [ { "add": { "index": "< index name >", "alias": "< alias name >" } } ] }
How to access an alias?
It behaves the same way as a regular index.
How to remove an alias?
POST _aliases { "actions": [ { "remove": { "index": "< index name >", "alias": "< alias name >" } } ] }
How do you create a filtered alias?
POST _aliases { "actions": [ { "add": { "index": "< index name >", "alias": "< alias name >", "filter": { .... < filter definition > .... } } } ] }
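A sketch of building the _aliases body in code, with the filter as an optional part of the "add" action. The helper name, index names, alias names and the `state` field are hypothetical.

```python
import json

def add_alias_action(index, alias, filter_query=None):
    """Build one 'add' action; the filter is included only when given."""
    action = {"index": index, "alias": alias}
    if filter_query is not None:
        action["filter"] = filter_query
    return {"add": action}

aliases_body = {
    "actions": [
        # Plain alias.
        add_alias_action("accounts", "accounts-all"),
        # Filtered alias: behaves like a saved query over the index.
        add_alias_action("accounts", "accounts-ca", {"term": {"state": "CA"}}),
    ]
}

print(json.dumps(aliases_body, indent=2))
```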
What are the 3 main components of indexes?
- Aliases
- Mappings
- Settings
How do you list indexes?
GET _cat/indices?v
What happens to fields (mappings) you don’t define when you post new data?
The mappings are auto filled.
TODO: Can this be disabled? How to control this?
How do you create an index?
# empty index
PUT [INDEX-NAME]
# with options
PUT [INDEX-NAME] { }
Explain dynamic vs explicit mapping
TODO
Where do the default settings for an index come from?
TODO
How to index an object?
# auto-generated id (must be POST, not PUT)
POST [INDEX-NAME]/_doc
# with a given id
PUT [INDEX-NAME]/_doc/[ID]
How to fetch an object?
With metadata
GET [INDEX-NAME]/_doc/[ID]
Without metadata (source only)
GET [INDEX-NAME]/_source/[ID]
How are ids auto-generated for elastic objects?
A UUID is generated.
What are the 2 types of updates you can perform to an object?
- Doc update
POST [INDEX-NAME]/_update/[ID] { "doc": { "lastname": "new last name" } }
- Script update
POST [INDEX-NAME]/_update/[ID] { "script": { "lang": "painless", "source": "ctx._source.remove('field')" } }
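A rough local simulation of what the two update styles do to a stored document, assuming the documented behavior that a doc update is merged recursively into the existing source while a script can change it arbitrarily. The function names and the sample document are hypothetical.

```python
def merge_doc(existing, partial):
    """Approximate a doc update: nested objects are merged, other values replaced."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = value
    return merged

def remove_field(existing, field):
    """Approximate the script update ctx._source.remove('field')."""
    return {key: value for key, value in existing.items() if key != field}

stored = {"firstname": "Ann", "lastname": "old", "address": {"city": "X", "zip": "1"}}

print(merge_doc(stored, {"lastname": "new last name", "address": {"zip": "2"}}))
print(remove_field(stored, "lastname"))
```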
What scripting languages are supported by elastic?
- painless
TODO
How to remove fields from an object?
Use the update api with a script.
For example:
ctx._source.remove('fieldname')
How do you delete an object from the index?
DELETE [INDEX-NAME]/_doc/[ID]
What’s the file format for bulk indexing?
NDJSON (newline delimited json) (http://ndjson.org/)
First line: metadata
{ "index": { "_id": "…." } }
Second line: source
{ "field": …., "field2": …. }
and so on
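A minimal sketch that produces this format from a list of documents: one metadata line, then one source line, each a complete JSON object on its own line. The helper name and the sample documents are hypothetical.

```python
import json

def to_bulk_ndjson(docs):
    """Turn (id, source) pairs into an NDJSON payload of index actions."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # metadata line
        lines.append(json.dumps(source))                      # source line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"

payload = to_bulk_ndjson([("1", {"name": "a"}), ("2", {"name": "b"})])
print(payload, end="")
```

Written to a file, this payload is the kind of input the curl command on the next card sends with its data-binary option.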
How to bulk index object?
curl -u < user > -k -H 'Content-Type: application/x-ndjson' -X POST 'https://localhost:9200/[index-name]/_bulk?pretty' --data-binary @file.json > output.json
What formats does the bulk api support?
- ndjson
- Apparently that’s the only one (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)