Mappings and text analysis Flashcards

1
Q

How do you set the mappings of an index and what is the structure?

A
PUT < index >
{
  "mappings": {
    "properties": {
      "< field >": {
        "type": "< type >"
      }
    }
  }
}
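
A minimal sketch of assembling that request body in Python. The helper name build_mappings and the example field names are hypothetical; only the mappings/properties/type nesting comes from the card above.

```python
import json

def build_mappings(field_types: dict[str, str]) -> dict:
    # Wrap each field name in {"type": ...} under mappings.properties.
    properties = {name: {"type": type_} for name, type_ in field_types.items()}
    return {"mappings": {"properties": properties}}

# Hypothetical fields, purely for illustration.
body = build_mappings({"title": "text", "status": "keyword"})
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of PUT < index >.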
2
Q

What’s the difference between keyword and text types?

A
  • Text is analysed: it is broken down into individual tokens.
  • Keyword is stored as is, as a single token. It is not analysed.
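
One way to picture the difference, as a rough Python sketch. The tokenisation here is a simplification chosen for illustration, not Elasticsearch's actual implementation, and both function names are hypothetical.

```python
import re

def index_as_text(value: str) -> list[str]:
    # Simplified stand-in for analysis: lowercase, then split into word tokens.
    return re.findall(r"\w+", value.lower())

def index_as_keyword(value: str) -> list[str]:
    # A keyword field keeps the whole value as one exact token.
    return [value]

print(index_as_text("New York City"))     # ['new', 'york', 'city']
print(index_as_keyword("New York City"))  # ['New York City']
```

A search for "york" can match the text version, while the keyword version only matches the exact string "New York City".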

3
Q

Name the main Elasticsearch data types and their applications.

A

Numerical

integer, short, long

Floating point

float, double, scaled_float

Text

text, keyword

Specific purpose

geo_point
ip
date (stored in UTC)

Other

boolean

4
Q

How are date types stored in Elasticsearch?

A

Dates are always stored internally in UTC, as a long number of milliseconds since the epoch.
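
To illustrate what "stored in UTC" means, here is a small Python sketch that normalises a timestamp with an explicit offset to UTC epoch milliseconds. The function name to_epoch_millis is hypothetical, and this mimics the idea only, not Elasticsearch's date parsing.

```python
from datetime import datetime, timezone

def to_epoch_millis(iso_string: str) -> int:
    # Parse an ISO 8601 timestamp that carries an explicit UTC offset.
    parsed = datetime.fromisoformat(iso_string)
    # Convert to UTC, then express as milliseconds since the epoch.
    as_utc = parsed.astimezone(timezone.utc)
    return int(as_utc.timestamp() * 1000)

# The same instant written with two different offsets.
print(to_epoch_millis("2024-01-01T02:00:00+02:00"))  # 1704067200000
print(to_epoch_millis("2024-01-01T00:00:00+00:00"))  # 1704067200000
```

Both inputs describe the same instant, so they end up as the same stored value.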

5
Q

What is a text analyser in Elasticsearch?

A

It’s a way to process a string and break it down into tokens that are used for indexing and searching.

6
Q

How do you use the analyze API and what is it for?

A

It is used for testing what an analyser outputs for a given input.

POST _analyze
{
  "analyzer": "standard",
  "text": "The 3 QUICK BRown-fox jumped"
}
7
Q

How does the “standard” analyser work?

A
  • splits words on hyphens
  • keeps apostrophes (doesn’t assume the text is in any specific language)
  • lowercases the tokens
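
A rough imitation of that behaviour in Python, assuming a regex is close enough for illustration. The real analyser uses Unicode text segmentation, so treat this as a sketch; the function name standard_like is hypothetical.

```python
import re

def standard_like(text: str) -> list[str]:
    # Lowercase, then keep runs of word characters; an apostrophe inside
    # a word is kept, while hyphens and spaces separate tokens.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

print(standard_like("The 3 QUICK BRown-fox jumped"))
# ['the', '3', 'quick', 'brown', 'fox', 'jumped']
print(standard_like("don't stop"))
# ["don't", 'stop']
```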
8
Q

How does the “english” analyser work?

A
  • lowercases the tokens
  • removes English stop words (“the”, “of”, etc.)
  • converts words into their base form (stemming)
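
The three steps can be sketched in Python as below. The stop word set is a tiny illustrative subset and the suffix stripping is a naive stand-in for a real stemmer, so both are assumptions made for the example; the function names are hypothetical.

```python
import re

STOP_WORDS = {"the", "of", "a", "an", "and", "is"}  # tiny illustrative subset
SUFFIXES = ("ing", "ed", "es", "s")  # naive stand-in for a real stemmer

def naive_stem(token: str) -> str:
    # Strip the first matching suffix, but leave very short stems alone.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def english_like(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text.lower())            # lowercase and split
    tokens = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    return [naive_stem(t) for t in tokens]               # reduce to base form

print(english_like("The QUICK foxes jumped"))  # ['quick', 'fox', 'jump']
```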
9
Q

What is a stop word?

A

Common words that are not relevant for searching, such as “the” and “of”.

10
Q

What is stemming?

A

The process of converting a word into its base form, for example: “jumped” -> “jump”.

11
Q

What is the “simple” analyser?

A
  • splits on any non-letter character (spaces, digits, hyphens, apostrophes, etc.) and discards it
  • lowercases the tokens
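
A rough Python imitation, assuming that "letters only" is a fair approximation of the tokenizer; the function name simple_like is hypothetical.

```python
import re

def simple_like(text: str) -> list[str]:
    # Keep only runs of letters; digits, punctuation and spaces all split.
    return re.findall(r"[^\W\d_]+", text.lower())

print(simple_like("The 3 QUICK BRown-fox jumped"))
# ['the', 'quick', 'brown', 'fox', 'jumped']
```

Note that the digit 3 disappears entirely.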
12
Q

What is the “whitespace” analyser?

A
  • Doesn’t lowercase; it keeps the original case.
  • Only splits on whitespace.
  • Keeps punctuation.
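
In Python terms this is close to a plain split on whitespace. This is a sketch of the behaviour, not the actual implementation, and the function name whitespace_like is hypothetical.

```python
def whitespace_like(text: str) -> list[str]:
    # Split on runs of whitespace only; case and punctuation are untouched.
    return text.split()

print(whitespace_like("The 3 QUICK BRown-fox jumped."))
# ['The', '3', 'QUICK', 'BRown-fox', 'jumped.']
```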
13
Q

What are the 3 components of an analyser?

A
  • Character filters (zero or more, applied first)
  • Tokenizer (exactly one)
  • Token filters (zero or more, applied last)
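
How the three components chain together can be sketched in Python as follows. All function names are hypothetical, and the emoticon mapping and stop word set are illustrative assumptions.

```python
def analyse(text, char_filters, tokenizer, token_filters):
    # 1. Character filters rewrite the raw string.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. The tokenizer splits the string into tokens.
    tokens = tokenizer(text)
    # 3. Token filters transform, add or remove tokens.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

def emoticons(text):   # character filter
    return text.replace(":)", "happy").replace(":(", "sad")

def whitespace(text):  # tokenizer
    return text.split()

def lowercase(tokens):  # token filter
    return [t.lower() for t in tokens]

def stop(tokens):       # token filter
    return [t for t in tokens if t not in {"the", "of", "is"}]

print(analyse("The day is :)", [emoticons], whitespace, [lowercase, stop]))
# ['day', 'happy']
```

The order matters: lowercase runs before stop so that "The" is recognised as a stop word.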
14
Q

How do you define an analyser?

A

In the settings section of the index:

PUT < index >
{
  "settings": {
    "analysis": {
      "analyzer": {
        "< new analyser name >": {
          "type": "...",
          "tokenizer": "< tokenizer >",
          "filter": ["< token filter name >"],
          "char_filter": ["< character filter >"]
        }
      }
    }
  }
}

TODO: what is the type in the analyser?

15
Q

What are some of the most common tokenizers in Elasticsearch?

A

TODO: Check out the documentation.

16
Q

How do you specify the analyser of a field?

A

In the mappings properties:

"< field >": {
    "type": ....,
    "analyzer": "< analyser name >"
}
17
Q

How do you define a token filter?

A

In the settings section of the index:

PUT < index >
{
  "settings": {
    "analysis": {
      "filter": {
        "< token filter name >": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

TODO: check the documentation on this.

18
Q

How do you define a new character filter?

A

In the settings section of the index:

PUT < index >
{
  "settings": {
    "analysis": {
      "char_filter": {
        "< character filter name >": {
          "type": "mapping",
          "mappings": [":) => happy", ":( => sad"]
        }
      }
    }
  }
}

TODO: check the documentation on this.

19
Q

What is a multi field?

A

It’s a way to index the same field in different ways, using different analysers.

20
Q

How do you define a multi field?

A
{
  "properties": {
    "< field >": {
      "type": ....,
      "fields": {
        "< multi field name >": {
          "type": ....
        }
      }
    }
  }
}
21
Q

How do you reference a multi field in a query?

A

Use a dot between the field name and the multi field name, for example:

field.subfield

22
Q

How do you set up a field that holds a nested array (array of objects)?

A

In the mappings:

{
  "< field >": {
    "type": "nested"
  }
}

23
Q

What is the problem with not mapping arrays of objects as nested?

A

Arrays of objects are flattened by default.

For example:
field: [
    {a: 1, b: 2},
    {a: 10, b: 20}
]

Effectively becomes:

field.a: [1, 10]
field.b: [2, 20]

So the values lose their relationship to the original objects, and searches may return confusing results.
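
The false match can be reproduced with a small Python sketch. The flatten helper is hypothetical and only mimics the flattening described above.

```python
def flatten(field: str, objects: list[dict]) -> dict[str, list]:
    # Collect every value of each key into one list per "field.key".
    flat: dict[str, list] = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

objects = [{"a": 1, "b": 2}, {"a": 10, "b": 20}]
flat = flatten("field", objects)
print(flat)  # {'field.a': [1, 10], 'field.b': [2, 20]}

# Query: a == 1 AND b == 20. No single object satisfies both conditions.
flattened_match = 1 in flat["field.a"] and 20 in flat["field.b"]
per_object_match = any(o["a"] == 1 and o["b"] == 20 for o in objects)
print(flattened_match)   # True  (false positive)
print(per_object_match)  # False (what a nested mapping preserves)
```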

24
Q

How do you search nested objects?

A
"query": {
  "nested": {
    "path": "< field >",
    "query": {
      ..... < actual query > ....
    }
  }
}
25
Q

How do you specify a relationship between objects (like a join table)?

A
  • Use sparingly, as it’s not very performant.
  • Use the “join” type:
{
   "type": "join",
   "relations": {
      "< parent name >": "< child name >"
   }
}
26
Q

What’s the limitation of using join fields?

A

Connected objects need to be indexed in the same shard, so when indexing a child you need to specify “?routing=< id of the parent >”. That way the child is routed to the same shard as its parent.
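
Why the routing value keeps parent and child together can be sketched in Python as below. This is a simplified model: zlib.crc32 is a stand-in hash, Elasticsearch uses its own hash function and formula, and pick_shard and the ids are hypothetical.

```python
import zlib

def pick_shard(routing: str, number_of_shards: int) -> int:
    # Stand-in for "hash the routing value, then take it modulo the shard count".
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

shards = 5
parent_shard = pick_shard("question-1", shards)  # parent routed by its own id
child_shard = pick_shard("question-1", shards)   # child sent with ?routing=question-1
print(parent_shard == child_shard)  # True
```

Without the routing parameter the child would be hashed by its own id and could land on a different shard.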

27
Q

How do you index the parent and child object of a join document?

A

{
  "< field >": {
    "name": "< relationship name >"
  }
}

Example:

# parent
PUT < index >/_doc/< parent id >
{
  "qa": {
    "name": "question"
  }
}

# child
PUT < index >/_doc/< child id >?routing=< parent id >
{
  "qa": {
    "name": "answer",
    "parent": "< parent id >"
  }
}