General Flashcards
What is the latency of various artifacts?
EMRs are hash tables, so lookup is constant time;
FSTs are linear in utterance length;
and PMRs are very slow.
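A minimal sketch of why those costs differ, using hypothetical stand-ins (not the real artifact implementations): an EMR behaves like a dict, so a lookup is one hash probe, while an FST has to walk the utterance token by token.

def emr_lookup(emr: dict, utterance: str):
    # EMR-style lookup: a single hash-table probe, constant time
    # regardless of how long the utterance is.
    return emr.get(utterance)

def fst_match(transitions: dict, utterance: str):
    # FST-style matching: advance one state per token, so cost grows
    # linearly with utterance length.
    state = 0
    for token in utterance.split():
        state = transitions.get((state, token))
        if state is None:
            return None
    return state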
What are the only two supported model variants/endpoints? How is traffic split?
Control and Treatment. Control models receive 99% of production traffic. Treatment models receive 100% of beta population traffic plus 1% of production traffic (randomly selected utterances from the general population via Project Guardian). This single Treatment set of models is deployed monolithically, shared by all domains, and used both for testing new features and verifying Control candidates.
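A hedged sketch of that split, where a simple random draw stands in for the Project Guardian selection (the function and return values are illustrative only):

import random

def route_utterance(is_beta_user: bool) -> str:
    # Beta population traffic always goes to the Treatment models.
    if is_beta_user:
        return "treatment"
    # Production traffic: 1% randomly sampled to Treatment, 99% to Control.
    return "treatment" if random.random() < 0.01 else "control"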
What is the Parallel Weblabs program and what does it consist of?
The program consists of two main phases:
Multiple Treatment Models and Nanobots Experimentation.
Multiple Treatment Models, the first phase of the Parallel Weblabs program, will leverage the current NLU Service architecture to enable experimentation.
Nanobots Experimentation will leverage the Nanobots architecture to run multiple versions of machine learning models in a horizontally scalable fleet.
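A rough illustration of the Nanobots idea, assuming each request carries an experiment assignment that maps to one of several model versions served by the fleet (all names here are hypothetical):

MODEL_VERSIONS = {
    "control": "nlu-model-v1",
    "treatment_a": "nlu-model-v2",
    "treatment_b": "nlu-model-v3",
}

def pick_model(experiment_assignment: str) -> str:
    # Unassigned traffic falls back to the control model version.
    return MODEL_VERSIONS.get(experiment_assignment, MODEL_VERSIONS["control"])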
What are EEEP data?
Like CLUE, but collected after the feature has already been released. ADS DLs sit down in a room with a device and try to interact using the same feature. For this reason EEEP data are not critical.
What are CLUE data?
CLUE data are non-critical and typically stored on S3. Like EEEP they are collected by ADS DLs, but before a feature goes out.
What are CLEO data?
CLEO data come from an Alexa skill asking people to say things to the device. For this reason they are critical.
Previously stored in a deprecated S3 bucket, now accessible from CleoView.
Can you concatenate data sets in Dory?
YES
A new data loader type — DATA_CONCATENATION — is now available to combine a list of processed annotations into one single file.
With this data loader, you can concatenate different data sets with the same data format into a single file that can be later processed and used in a modifier that utilizes the DATA & PROCESSORS scheme: “modify_exact_match_rules”, “add_training_data”, “add_synthetic_reranker_training_data”, and “add_test_data”.
"data": {
    "goldens_garbage_concatenation": {
        "annotations": {
            "type": DATA_CONCATENATION,
            "data_sets": ["goldens", "garbage"],
            "data_format": "tsv"
        },
        PROCESSOR_ORDER: ["no_dip"]
    },
    "goldens": {
        "annotations": CONFIG_GOLDEN,
        PROCESSOR_ORDER: [
            "dip_train",
            "remove_partial"
        ]
    },
    "garbage": {
        "annotations": CONFIG_GARBAGE,
        PROCESSOR_ORDER: ["no_dip"],
    },
},
PROCESSORS: {
    "no_dip": {PROCESSOR_TYPE: NO_DIP},
    "remove_partial": {
        PROCESSOR_TYPE: FILTER_NLU_ANNOTATIONS,
        TOKEN_REGEXES: ("^-.", ".-$"),
        REVERT_MATCH: True
    },
}
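In this example, the goldens and garbage data sets are concatenated into goldens_garbage_concatenation, which then runs only the no_dip processor; the individual sets keep their own processor chains (dip_train and remove_partial for goldens, no_dip for garbage), and the concatenated set can then be referenced from any of the DATA & PROCESSORS modifiers listed above.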
Where are DIPs stored?
In BluDIP
Where can you find enum_converters?
In BluModel data