Mappings and text analysis Flashcards by Giorgenes G

How do you set the mappings of an index and what is the structure?

PUT <index>
{
  "mappings": {
    "properties": {
      "<field>": {
        "type": "<type>"
      }
    }
  }
}
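
As a rough illustration, the same body can be assembled programmatically. This is only a sketch: `build_mapping` and the field names are hypothetical, and nothing is sent to a cluster.

```python
import json

def build_mapping(fields):
    """Return an index-creation body mapping each field name to a type."""
    properties = {name: {"type": field_type} for name, field_type in fields.items()}
    return {"mappings": {"properties": properties}}

# Hypothetical fields, chosen only for illustration.
body = build_mapping({"title": "text", "status": "keyword", "created_at": "date"})
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of `PUT <index>`.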

What’s the difference between keyword and text types?

- Text is analysed: it is broken down into individual tokens.

- Keyword is stored as is, as a single token. It’s not analysed.

Name the main elastic data types and their applications.

Numerical

integer, short, long

Floating point

float, double, scaled_float

Text

text, keyword

Specific purpose

geo_point
ip
date (stored in UTC)

Other

boolean

How are date types stored in elastic?

Dates are always stored in UTC (internally, as a long number of milliseconds since the epoch).

What is a text analyser in elastic?

It’s a way to process a string and break it down into tokens that are used for indexing and searching.

How do you use the analyze API and what is it for?

It’s used for testing an analyser’s output for a given input.

POST _analyze
{
  "analyzer": "standard",
  "text": "The 3 QUICK BRown-fox jumped"
}

How does the “standard” analyser work?

splits words on hyphens
keeps apostrophes (doesn’t assume the text is in any specific language)
lowercases the tokens
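
The behaviour above can be approximated in a few lines. This is a rough sketch only: the real analyser uses Unicode text segmentation rules, while `standard_like` is a hypothetical regex-based stand-in.

```python
import re

def standard_like(text):
    """Approximate the standard analyser: split on non-word characters,
    keep apostrophes inside words, and lowercase every token."""
    return [token.lower() for token in re.findall(r"\w+(?:'\w+)*", text)]

print(standard_like("The 3 QUICK BRown-fox jumped"))
# ['the', '3', 'quick', 'brown', 'fox', 'jumped']
print(standard_like("Don't stop"))
# ["don't", 'stop']
```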

How does the “english” analyser work?

lowercases the tokens
removes English stop words (“the”, “of”, etc.)
converts words into their base form (stemming)
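
A toy sketch of these three steps. The stop-word set and the suffix-stripping `toy_stem` are illustrative assumptions; the real analyser uses a full English stop list and a proper stemming algorithm.

```python
import re

# Tiny illustrative subset, not the real English stop list.
STOP_WORDS = {"the", "of", "a", "an", "and"}

def toy_stem(token):
    """Strip a few common suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def english_like(text):
    """Lowercase, drop stop words, then stem what is left."""
    tokens = [token.lower() for token in re.findall(r"\w+", text)]
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]

print(english_like("The 3 QUICK BRown-fox jumped"))
# ['3', 'quick', 'brown', 'fox', 'jump']
```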

What is a STOP WORD?

Common words that are not relevant for searching, such as “the” and “of”.

What is stemming?

The process of converting a word into its base form, for example: “jumped” -> “jump”.

What is the “simple” analyser?

splits on any non-letter character (space, digits, -, ’, etc.) and discards those characters
lowercases the tokens
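
A minimal approximation of that behaviour, assuming a regex that keeps only runs of letters (`simple_like` is a hypothetical name):

```python
import re

def simple_like(text):
    """Keep runs of letters only, lowercased; digits and punctuation are dropped."""
    return re.findall(r"[^\W\d_]+", text.lower())

print(simple_like("The 3 QUICK BRown-fox jumped"))
# ['the', 'quick', 'brown', 'fox', 'jumped']
```

Note that the "3" disappears, since digits are not letters.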

What is the “whitespace” analyser?

Doesn’t lowercase: keeps the original case
Only splits on whitespace
Keeps punctuation.
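
In Python terms this is roughly a plain whitespace split (`whitespace_like` is a hypothetical name for the sketch):

```python
def whitespace_like(text):
    """Split on runs of whitespace only; case and punctuation are untouched."""
    return text.split()

print(whitespace_like("The 3 QUICK BRown-fox jumped."))
# ['The', '3', 'QUICK', 'BRown-fox', 'jumped.']
```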

What are the 3 components of an analyser?

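Character filters (zero or more; applied first, to the raw text)
Tokenizer (exactly one; splits the text into tokens)
Token filters (zero or more; applied last, to the tokens)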
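
An analyser runs its components in a fixed order: character filters, then the tokenizer, then token filters. A minimal sketch of that pipeline, where every function name is hypothetical and the filters are simplified stand-ins:

```python
def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through char filters, one tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)          # string -> string
    tokens = tokenizer(text)              # string -> list of tokens
    for token_filter in token_filters:
        tokens = token_filter(tokens)     # list -> list
    return tokens

def replace_emoticons(text):
    return text.replace(":)", "happy").replace(":(", "sad")

def whitespace_tokenizer(text):
    return text.split()

def lowercase(tokens):
    return [token.lower() for token in tokens]

def remove_stop_words(tokens):
    stop_words = {"the", "of", "a"}  # tiny illustrative subset
    return [token for token in tokens if token not in stop_words]

print(analyze("The King of Pop :)",
              [replace_emoticons], whitespace_tokenizer,
              [lowercase, remove_stop_words]))
# ['king', 'pop', 'happy']
```

The order of the token filters matters here: the stop-word filter only matches because the tokens were lowercased first.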

How do you define an analyser?

In the settings section of the index:

PUT <index>
{
  "settings": {
    "analysis": {
      "analyzer": {
        "<new analyser name>": {
          "type": "<type>",
          "char_filter": ["<character filter>"],
          "tokenizer": "<tokenizer>",
          "filter": ["<token filter name>"]
        }
      }
    }
  }
}

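Note: the type is "custom" for an analyser assembled from a tokenizer and filters; it can also be the name of a built-in analyser (for example "standard") in order to reconfigure it.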

What are some of the most common tokenizers for elastic?

TODO: Check out the documentation.

How do you specify the analyser of a field?

In the mappings properties:

"<field>": {
  "type": "<type>",
  "analyzer": "<analyser name>"
}

How do you define a token filter?

In the settings section of the index:

PUT <index>
{
  "settings": {
    "analysis": {
      "filter": {
        "<token filter name>": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

TODO: check the documentation on this.

How do you define a new character filter?

In the settings section of the index:

PUT <index>
{
  "settings": {
    "analysis": {
      "char_filter": {
        "<character filter name>": {
          "type": "mapping",
          "mappings": [":) => happy", ":( => sad"]
        }
      }
    }
  }
}
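
To see what a "mapping" character filter does to the text before tokenization, here is a simplified sketch. `parse_rules` and `apply_mapping_char_filter` are hypothetical helpers, and sequential replacement is an assumption that may differ from the real filter when rules overlap.

```python
def parse_rules(rules):
    """Turn 'key => value' strings into (key, value) pairs."""
    pairs = []
    for rule in rules:
        key, value = rule.split("=>")
        pairs.append((key.strip(), value.strip()))
    return pairs

def apply_mapping_char_filter(text, rules):
    """Replace every occurrence of each key with its value, rule by rule."""
    for key, value in parse_rules(rules):
        text = text.replace(key, value)
    return text

rules = [":) => happy", ":( => sad"]
print(apply_mapping_char_filter("I feel :) today, not :(", rules))
# I feel happy today, not sad
```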

TODO: check the documentation on this.

What is a multi field?

It’s a way to index the same field in different ways, using different analysers.

How do you define a multi field?

{
  "properties": {
    "<field>": {
      "type": "<type>",
      "fields": {
        "<multi field name>": {
          "type": "<type>"
        }
      }
    }
  }
}

How do you reference a multi field in a query?

Use a dot:

for example:

field.subfield

How do you set up a field that holds a nested array (an array of objects)?

In the mappings:

{
  "<field>": {
    "type": "nested"
  }
}

What is the problem with not mapping arrays of objects as nested?

Arrays of objects are flattened by default.

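For example:
field: [
    {a: 1, b: 2},
    {a: 10, b: 20}
]

Effectively becomes:

field.a: [1, 10]
field.b: [2, 20]

So it loses the relationship between the values of each object and may return confusing search results: a search for a = 1 and b = 20 matches, even though no single object has both values.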
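
The false match can be reproduced with plain Python data. This is only a model of the flattening, with hypothetical helper names, not how the index stores documents internally.

```python
objects = [{"a": 1, "b": 2}, {"a": 10, "b": 20}]

def flatten(items):
    """Collect the values of each key across all objects, losing the pairing."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(key, []).append(value)
    return flat

def matches_flattened(flat, a, b):
    """Match if 'a' and 'b' appear anywhere, not necessarily in the same object."""
    return a in flat["a"] and b in flat["b"]

def matches_nested(items, a, b):
    """Match only if a single object holds both values."""
    return any(item["a"] == a and item["b"] == b for item in items)

flat = flatten(objects)
print(flat)                             # {'a': [1, 10], 'b': [2, 20]}
print(matches_flattened(flat, 1, 20))   # True  (false positive)
print(matches_nested(objects, 1, 20))   # False (correct)
```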

How do you search nested objects?

```
{
  "query": {
    "nested": {
      "path": "<field>",
      "query": { ... <actual query> ... }
    }
  }
}
```

How do you specify a relationship between objects (like a join table)?

- Use the "join" type
- Use sparingly as it's not very performant

```
{
  "type": "join",
  "relations": {
    "<parent name>": "<child name>"
  }
}
```

What's the limitation of using join fields?

Connected objects need to be indexed in the same shard, so when indexing a child you need to specify "?routing=<id of the parent>", so that the child is routed to the same shard as its parent.
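
Why routing by the parent id works can be sketched as follows. The shard is derived, roughly, from a hash of the routing value (the document's own id by default) modulo the number of primary shards. Here `zlib.crc32` is only a stand-in for the real hash function, and the ids and shard count are hypothetical.

```python
import zlib

def shard_for(routing_value, num_primary_shards):
    """Pick a shard from a routing value (crc32 stands in for the real hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

NUM_SHARDS = 5             # hypothetical index setting
parent_id = "question-1"   # hypothetical ids
child_id = "answer-7"

parent_shard = shard_for(parent_id, NUM_SHARDS)   # default routing: its own id
child_default = shard_for(child_id, NUM_SHARDS)   # child without ?routing=
child_routed = shard_for(parent_id, NUM_SHARDS)   # child with ?routing=<parent id>

print(parent_shard, child_default, child_routed)
```

With the parent id as the routing value the child always lands on the parent's shard; without it, the two only coincide by chance.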

How do you index the parent and child object of a join document?

Set the join field to the name of the relation; a child document also names its parent:

{ "<field>": { "name": "<relation name>" } }

Example:

```
# parent
PUT <index>/_doc/<parent id>
{
  "qa": {
    "name": "question"
  }
}
```

```
# child
PUT <index>/_doc/<child id>?routing=<parent id>
{
  "qa": {
    "name": "answer",
    "parent": "<parent id>"
  }
}
```
