Sign Language Flashcards

Question 1

Q

Video recognition: what is the problem with a per-frame ConvNet plus pooling of features?

Answer

A

No temporal information: pooling over frames ignores their order.
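
A minimal sketch of why pooling loses order, assuming each frame has already been reduced to a feature vector. The toy numbers and the `mean_pool` helper are illustrative only, not taken from any specific model. A clip and the same clip played backwards give identical pooled features.

```python
def mean_pool(frame_features):
    """Average per-frame feature vectors over time (hypothetical helper)."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


# Three frames, each described by a 2-dimensional toy feature vector.
clip = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0]]
reversed_clip = clip[::-1]

forward = mean_pool(clip)
backward = mean_pool(reversed_clip)

# The pooled vectors match, so a classifier on top cannot tell
# the clip from its time-reversed version.
print(forward == backward)
```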

Question 2

Q

What are the main types of networks for video recognition?

Answer

A

2D pretrained ConvNet with temporal aggregation via pooling or an LSTM
3D ConvNet model

One of the best is I3D, an inflated model created from the 2D Inception model.
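
A rough sketch of the inflation idea, assuming the usual description of I3D: each 2D filter is repeated along a new time axis and divided by the number of repeats. The helper names and toy values below are hypothetical. On a static clip (the same frame repeated), the inflated filter gives the same response as the original 2D filter.

```python
def inflate_kernel(kernel_2d, time_size):
    """Repeat a 2D kernel along time and rescale by 1/time_size."""
    return [
        [[w / time_size for w in row] for row in kernel_2d]
        for _ in range(time_size)
    ]


def response_2d(patch, kernel):
    """Dot product of one image patch with a 2D kernel."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip_patch, kernel_3d):
    """Dot product of a stack of patches with a 3D kernel."""
    return sum(response_2d(f, k) for f, k in zip(clip_patch, kernel_3d))


kernel = [[1.0, -2.0], [0.5, 4.0]]   # toy 2x2 filter
frame = [[2.0, 1.0], [4.0, 3.0]]     # toy 2x2 image patch
time_size = 4

kernel_3d = inflate_kernel(kernel, time_size)
static_clip = [frame] * time_size    # same frame repeated in time

print(response_2d(frame, kernel), response_3d(static_clip, kernel_3d))
```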

Question 3

Q

What is an important data preprocessing step in video recognition?

Answer

A

Temporal subsampling: reduce the 25 video frames per second to something like 2-5.
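
One simple way to do this, sketched under the assumption that frames are kept at a fixed stride. The function name and defaults are illustrative. Because the stride is rounded to an integer, the effective frame rate is only approximately the target.

```python
def subsample_indices(num_frames, source_fps=25, target_fps=5):
    """Return the indices of the frames to keep when reducing the frame rate."""
    # Keep one frame out of every `stride`; never go below a stride of 1.
    stride = max(1, round(source_fps / target_fps))
    return list(range(0, num_frames, stride))


# A 2-second clip at 25 fps has 50 frames; at about 5 fps we keep every 5th.
kept = subsample_indices(50, source_fps=25, target_fps=5)
print(len(kept), kept[:3])
```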

Question 4

Q

What is the problem with 3d video recognition models?

Answer

A

They have many parameters and are hard to train, so shallow architectures are usually used.
The videos are usually subsampled in both pixel resolution and time.
Also look at temporal strides.
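
A back-of-the-envelope sketch of both points, using the standard convolution formulas. The layer sizes are arbitrary examples, not taken from a specific architecture. Going from a 3x3 to a 3x3x3 kernel roughly triples the weights of a layer, and a temporal stride of 2 halves the number of time steps.

```python
def conv_params(in_channels, out_channels, kernel_shape):
    """Weights plus biases of one convolution layer."""
    weights = in_channels * out_channels
    for k in kernel_shape:
        weights *= k
    return weights + out_channels


def output_length(num_frames, kernel_t, stride_t, padding=0):
    """Number of time steps after a convolution along the time axis."""
    return (num_frames + 2 * padding - kernel_t) // stride_t + 1


params_2d = conv_params(64, 64, (3, 3))      # 2D layer: 3x3 kernel
params_3d = conv_params(64, 64, (3, 3, 3))   # 3D layer: 3x3x3 kernel
print(params_2d, params_3d)

# 16 input frames, temporal kernel 3, temporal stride 2, padding 1.
print(output_length(16, kernel_t=3, stride_t=2, padding=1))
```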

Question 5

Q

What features were used in the Oscar paper?

Answer

A

Full body
Hands
Mouth

Question 6

Q

Sign paper: what models did they use to embed each feature?

Answer

A

I3D for video
AV-HuBERT for non-manual signs
Deep Hand for hands

(6 cards)