Indexing APIs Flashcards

1
Q

What is the use case for pipelines?

A

It’s a saved, named sequence of processors that transforms documents before they are indexed. Once stored, it can be reused in different API calls.

2
Q

How do you create a pipeline?

A
PUT _ingest/pipeline/< pipeline name >
{
   "description": ".....",
   "processors": [
   ]
}
3
Q

What are some processors available?

A

TODO: Check on elastic website and make a list of important ones.

  • remove { "field": "…." }
  • set { "field": "_source.< field >", "value": "{{ _source….. }}" } # mustache notation
  • convert { "field": …, "type": …. } # type cast
  • script { ….. } # more generic processor
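As a rough illustration of the processors listed above, the sketch below builds a pipeline body in Python and serializes it to JSON. The pipeline name `my-pipeline` and all field names (`debug_info`, `full_name`, `first_name`, `last_name`, `age`, `age_next_year`) are hypothetical; only the processor names come from the card.

```python
import json

# Hypothetical pipeline body; every field name here is made up for illustration.
pipeline = {
    "description": "Clean up incoming documents",
    "processors": [
        # remove: drop a field from the document
        {"remove": {"field": "debug_info"}},
        # set: assign a value, here built with a mustache template
        {"set": {"field": "full_name", "value": "{{first_name}} {{last_name}}"}},
        # convert: cast a field to another type
        {"convert": {"field": "age", "type": "integer"}},
        # script: generic processor running a Painless script
        {"script": {"lang": "painless", "source": "ctx.age_next_year = ctx.age + 1"}},
    ],
}

# Body that would be sent with: PUT _ingest/pipeline/my-pipeline
body = json.dumps(pipeline, indent=2)
print(body)
```

The script only prints the body; sending it to a cluster is left out on purpose.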
4
Q

What’s the gotcha between using scripts in the ingest pipeline and outside of it?

A

Ingest pipeline scripts use "ctx.< field >", while scripts outside a pipeline (for example in update or re-index requests) need "ctx._source.< field >".

TODO: is this still the case in latest version?
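A small side-by-side sketch of the difference described on this card, written as Python dicts. The field name `views` is hypothetical, and the snippet makes no claim about which Elasticsearch versions behave this way.

```python
import json

# Script processor inside an ingest pipeline: fields hang directly off ctx.
ingest_script = {"script": {"lang": "painless", "source": "ctx.views = 0"}}

# Script in an _update, _update_by_query or _reindex request:
# document fields live under ctx._source.
update_script = {"script": {"lang": "painless", "source": "ctx._source.views = 0"}}

print(json.dumps(ingest_script))
print(json.dumps(update_script))
```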

5
Q

What’s the requirement to use a pipeline?

A

At least one node in the cluster must have the ingest role.

6
Q

What are the use cases for the re-index API?

A
  • Copy data across clusters
  • Re-process and modify data into a new index

7
Q

What’s the API structure for the re-index API?

A
POST _reindex
{
   "source": { "index": ... },
   "dest": { "index": < new index > },
   "script": { .... }
}
8
Q

How do you enable remote re-index across clusters?

A

Whitelist the source (SRC) host in the destination (DEST) cluster's config file:

reindex.remote.whitelist: "< ip >:< port >, …." (comma-separated list)
reindex.ssl.verification_mode: certificate
reindex.ssl.truststore.type: PKCS12
reindex.ssl.keystore.type: PKCS12
reindex.ssl.truststore.path: certs/node-1
reindex.ssl.keystore.path: certs/node-1

$ bin/elasticsearch-keystore add reindex.ssl.truststore.secure_password

$ bin/elasticsearch-keystore add reindex.ssl.keystore.secure_password

9
Q

How do you do remote re-index (across clusters)?

A
POST _reindex
{
    "source": {
        "remote": {
            "host": "https://< ip >:< port >",
            "username": "< user >",
            "password": "< password >"
        },
        "index": ....
    },
    "dest": {
        "index": ....
    }
}
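The request above can be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper `remote_reindex_body` and placeholder host, credentials and index names; it only builds and prints the body.

```python
import json

def remote_reindex_body(host, username, password, source_index, dest_index):
    """Build a _reindex body that pulls from a remote cluster (hypothetical helper)."""
    return {
        "source": {
            "remote": {"host": host, "username": username, "password": password},
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }

# Placeholder values; the host must be whitelisted on the destination cluster.
body = remote_reindex_body(
    "https://10.0.0.5:9200", "elastic", "changeme", "logs-old", "logs-new"
)
print(json.dumps(body, indent=2))
```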
10
Q

How to re-index only a subset of the data?

A

Add a query section to the source section.

POST _reindex
{
    "source": {
        "query": { .... }
    }
}
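To make the placement of the query concrete, here is a sketch of a full body with a `range` query inside `source`. The index names `orders` and `orders-2023` and the field `created_at` are hypothetical.

```python
import json

# Hypothetical indices and field; only documents matching the query are copied.
reindex_subset = {
    "source": {
        "index": "orders",
        "query": {"range": {"created_at": {"gte": "2023-01-01"}}},
    },
    "dest": {"index": "orders-2023"},
}

# Body for: POST _reindex
print(json.dumps(reindex_subset, indent=2))
```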
11
Q

How do you mutate the data while copying it?

A

Add a "script" section:

POST _reindex
{
    "source": { … },
    "dest": { … },
    "script": { …. }
}
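One possible filled-in version of that script section, sketched in Python. It renames a hypothetical field `fname` to `first_name` while copying between the hypothetical indices `people-v1` and `people-v2`.

```python
import json

# Painless script run against each document while it is copied.
# remove() returns the removed value, so this effectively renames the field.
script_source = "ctx._source.first_name = ctx._source.remove('fname')"

reindex_with_script = {
    "source": {"index": "people-v1"},
    "dest": {"index": "people-v2"},
    "script": {"lang": "painless", "source": script_source},
}

# Body for: POST _reindex
print(json.dumps(reindex_with_script, indent=2))
```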

12
Q

What’s the update by query api structure?

A
POST < index name >/_update_by_query
{
   "script": { .... },
   "query": { ... }
}
13
Q

In what instances would you want to simply increment the version of all the objects in an index?

A

TODO: this was mentioned in the video, but why?

14
Q

How do you add multi line scripts?

A

You can use triple quotes (""") to write multi-line scripts in the Kibana console.
This doesn’t seem to be a standard JSON feature. (TODO: confirm)

15
Q

How would you increase a value by X percent with the _update_by_query API?

A

"script": {
  "lang": "painless",
  "source": """
    // X must be the fraction to add (X percent / 100), e.g. 0.1 for 10 percent
    ctx._source.balance += ctx._source.balance * X;

    if (ctx._source.transactions == null) {
    }
  """
}

TODO: move this to a painless deck of cards.
TODO: what about concurrent updates?
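
A hedged sketch of the same idea with the percentage passed through script `params` instead of being written into the script text. The helper `percent_increase_body` and the field name `balance` are assumptions for illustration; the script only builds the request body.

```python
import json

def percent_increase_body(field, percent, query=None):
    """Build an _update_by_query body that raises `field` by `percent` percent.

    Hypothetical helper: the percentage travels in script params, so the
    same script source can be reused with different values.
    """
    body = {
        "script": {
            "lang": "painless",
            "source": f"ctx._source.{field} += ctx._source.{field} * params.pct / 100.0",
            "params": {"pct": percent},
        }
    }
    if query is not None:
        body["query"] = query
    return body

# Body for: POST < index name >/_update_by_query
body = percent_increase_body("balance", 10)
print(json.dumps(body, indent=2))
```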

16
Q

How do you use reindex and update by query with a pipeline?

A

_update_by_query:

Add the "?pipeline=< pipeline >" query parameter.

re-index:

{
  "dest": {
    "pipeline": "< pipeline >"
  }
}
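
The two placements can be compared side by side in a short sketch. The pipeline name `my-pipeline` and the index names are hypothetical.

```python
import json
from urllib.parse import urlencode

pipeline_name = "my-pipeline"  # hypothetical pipeline name

# _update_by_query takes the pipeline as a query-string parameter.
update_path = "/my-index/_update_by_query?" + urlencode({"pipeline": pipeline_name})

# _reindex takes it inside the "dest" section of the request body.
reindex_body = {
    "source": {"index": "my-index"},
    "dest": {"index": "my-index-v2", "pipeline": pipeline_name},
}

print(update_path)
print(json.dumps(reindex_body, indent=2))
```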

17
Q

What are the use cases for dynamic templates?

A
  • Specify how new fields are mapped into an index.
  • Create patterns so you don’t need to specify every new field. For example: “text” is mapped as text, “is” is mapped as boolean, etc. This allows you to use convention instead of explicitly specifying everything.
18
Q

How do you create a dynamic template mapping?

A
  • Set it on the index:

PUT < index name >
{
  "mappings": {
    "dynamic_templates": [
      {
        "< template name >": {
          "match_mapping_type": "< type to match >",
          "match": "< filter on field name >",
          "unmatch": "< filter on what NOT to match >",
          "mapping": {
            … < mapping definition > …
            "type": "…. type …"
          }
        }
      }
    ]
  }
}

19
Q

How to filter on field names on dynamic template mappings?

A

Use the “match” or “unmatch” fields with a wildcard.
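
To get a feel for how a match/unmatch pair selects field names, here is a local approximation in Python. `fnmatchcase` is a stand-in for Elasticsearch's own wildcard matching (it also accepts `?` and `[]`, so it is not an exact model), and the `is_*` / `*_text` convention is a hypothetical example.

```python
from fnmatch import fnmatchcase

def template_applies(field_name, match=None, unmatch=None):
    """Approximate the match/unmatch name filter of a dynamic template.

    Illustration only: fnmatchcase is not Elasticsearch's matcher.
    """
    if match is not None and not fnmatchcase(field_name, match):
        return False
    if unmatch is not None and fnmatchcase(field_name, unmatch):
        return False
    return True

# Hypothetical convention: fields starting with "is_" are selected,
# except those ending in "_text".
for name in ["is_active", "is_note_text", "title"]:
    print(name, template_applies(name, match="is_*", unmatch="*_text"))
```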

20
Q

What are the use cases for index templates?

A
  • Time series data: we routinely create new indexes to store data such as logs, for example one new index every day, and we want each new index to follow a template.
21
Q

How do you create an index pattern / template?

A
PUT _template/< template name >
{
  "aliases": ....,
  "mappings": ....,
  "settings": ....,
  "index_patterns": ["< wildcard >"]
}
22
Q

Explain how index templates work

A
  • Whenever a new index is created, it is matched against the index pattern of the templates
  • If it matches, the index is created using that template.
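The matching step can be mimicked locally. This sketch assumes two hypothetical templates and uses `fnmatchcase` as a stand-in for Elasticsearch's wildcard matching; it deliberately ignores how Elasticsearch resolves several matching templates.

```python
from fnmatch import fnmatchcase

# Hypothetical templates: template name -> index_patterns
templates = {
    "logs-template": ["logs-*"],
    "metrics-template": ["metrics-*", "stats-*"],
}

def matching_templates(index_name):
    """Return the names of templates whose index_patterns match index_name."""
    return [
        name
        for name, patterns in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in patterns)
    ]

print(matching_templates("logs-2024.01.15"))
print(matching_templates("users"))
```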
23
Q

What are some use cases for aliases?

A
  • Create a filter that returns only a subset of the data (like a saved query).
  • Aggregate data (same alias, multiple indexes). TODO: test this and learn more about it.
24
Q

How do you create an alias?

A
POST _aliases
{
 "actions": [
   {
      "add": {
         "index": "< index name >",
         "alias": "< alias name >"
     }
   }
]
}
25
Q

How to access an alias?

A

It behaves the same way as a regular index.

26
Q

How to remove an alias?

A
POST _aliases
{
 "actions": [
   {
      "remove": {
         "index": "< index name >",
         "alias": "< alias name >"
     }
   }
]
}
27
Q

How do you create a filtered alias?

A
POST _aliases
{
 "actions": [
   {
      "add": {
         "index": "< index name >",
         "alias": "< alias name >",
         "filter": {
              .... < filter definition > ....
          }
     }
   }
]
}
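The add, remove and filtered variants from the last few cards share one shape, which a couple of small builders can capture. The helper names, index names, alias names and the `level` field below are all hypothetical.

```python
import json

def add_alias(index, alias, filter_query=None):
    """Build an "add" action; passing filter_query makes it a filtered alias."""
    action = {"index": index, "alias": alias}
    if filter_query is not None:
        action["filter"] = filter_query
    return {"add": action}

def remove_alias(index, alias):
    """Build a "remove" action."""
    return {"remove": {"index": index, "alias": alias}}

# Hypothetical names: repoint "current-logs" and add a filtered view in one request.
body = {
    "actions": [
        remove_alias("logs-2023", "current-logs"),
        add_alias("logs-2024", "current-logs"),
        add_alias("logs-2024", "error-logs", {"term": {"level": "error"}}),
    ]
}

# Body for: POST _aliases
print(json.dumps(body, indent=2))
```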
28
Q

What are the 3 main components of indexes?

A
  • Aliases
  • Mappings
  • Settings
29
Q

How do you list indexes?

A

GET _cat/indices?v

30
Q

What happens to fields (mappings) you don’t define when you post new data?

A

Elasticsearch adds the missing mappings automatically (dynamic mapping).

TODO: Can this be disabled? How to control this?

31
Q

How do you create an index?

A
# empty index
PUT < index name >
# with options
PUT < index name >
{
}
32
Q

Explain dynamic vs explicit mapping

A

TODO

33
Q

Where do the default settings for an index come from?

A

TODO

34
Q

How to index an object?

A
# auto generated id
POST < index name >/_doc
# With a given id
PUT < index name >/_doc/< id >
35
Q

How to fetch an object?

A

With metadata

GET < index name >/_doc/< id >

Without metadata (source only)

GET < index name >/_source/< id >

36
Q

How are ids auto generated for elastic objects?

A

A UUID is generated.

37
Q

What are the 2 types of updates you can perform to an object?

A
  • Doc update
POST < index name >/_update/< id >
{
  "doc": {
      "lastname": "new last name"
  }
}
  • Script update
POST < index name >/_update/< id >
{
   "script": {
      "lang": "painless",
      "source": "ctx._source.remove('field')"
    }
}
38
Q

What scripting languages are supported by elastic?

A
  • painless

TODO

39
Q

How to remove fields from an object?

A

Use the update api with a script.

For example:

ctx._source.remove('fieldname')

40
Q

How do you delete an object from the index?

A

DELETE [INDEX-NAME]/_doc/[ID]

41
Q

What’s the file format for bulk indexing?

A

NDJSON (newline delimited json) (http://ndjson.org/)

First line: metadata
{ "index": { "_id": "…." } }

Second line: source
{ "field": …., "field2": …. }

and so on
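
The alternating metadata/source layout can be generated from a list of documents. This is a minimal sketch with a hypothetical helper `to_bulk_ndjson` and made-up documents; it builds the payload as a string and does not send it anywhere.

```python
import json

def to_bulk_ndjson(docs):
    """Turn (id, source) pairs into a bulk NDJSON payload.

    Each document becomes two lines: an action/metadata line and a source line.
    The payload ends with a newline, which the bulk API expects.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# Hypothetical documents
payload = to_bulk_ndjson([
    ("1", {"name": "Ada", "age": 36}),
    ("2", {"name": "Linus", "age": 54}),
])
print(payload, end="")
```

Because `json.dumps` never emits raw newlines by default, each document stays on a single line, which is what the format requires.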

42
Q

How to bulk index objects?

A

curl -u '< user >:< password >' -k -H 'Content-Type: application/x-ndjson' -X POST 'https://localhost:9200/[index-name]/_bulk?pretty' --data-binary @file.json > output.json

43
Q

What formats does the bulk api support?

A
  • ndjson. Apparently that’s the only one (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)