Sign Language Flashcards

1
Q

Video recognition: what is the problem with a per-frame ConvNet followed by feature pooling?

A

No temporal information: pooling over frames discards their order.
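
A minimal sketch of the point above, assuming each frame has already been reduced to a feature vector (the toy feature values are made up for illustration). Averaging over frames gives the same result for a clip and for its time-reversed copy, so the pooled feature cannot tell them apart.

```python
def mean_pool(frame_features):
    """Average per-frame feature vectors over time (element-wise)."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

# Hypothetical per-frame features for a 3-frame clip.
clip = [[1.0, 0.0], [2.0, 1.0], [3.0, 5.0]]
reversed_clip = clip[::-1]  # same frames, opposite order

pooled_forward = mean_pool(clip)
pooled_backward = mean_pool(reversed_clip)

# Both orders pool to the same vector: the ordering is lost.
print(pooled_forward == pooled_backward)
```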

2
Q

What are the main types of networks for video recognition?

A

2D pre-trained ConvNet with temporal aggregation via pooling or an LSTM
3D model

One of the best is I3D, an inflated model created from the 2D Inception model.
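
A minimal sketch of what "inflating" a 2D filter into a 3D one can look like, assuming the common scheme of repeating the 2D weights along the time axis and dividing by the temporal depth. The kernel and patch values are made up, and the helper names are hypothetical. With this rescaling, a clip made of identical frames gives the same response under the inflated filter as a single frame gives under the 2D filter.

```python
def inflate_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel along time and rescale each copy by 1/time_depth."""
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]

def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an image patch of the same size."""
    return sum(w * x
               for k_row, p_row in zip(kernel_2d, patch)
               for w, x in zip(k_row, p_row))

def response_3d(kernel_3d, patches):
    """Dot product of a 3D kernel with a stack of patches, one per time step."""
    return sum(response_2d(k, p) for k, p in zip(kernel_3d, patches))

# Hypothetical 2x2 kernel and patch.
kernel = [[1.0, -2.0], [0.5, 4.0]]
patch = [[2.0, 1.0], [4.0, 0.5]]

static_clip = [patch] * 4          # the same frame repeated 4 times
inflated = inflate_kernel(kernel, 4)

print(response_2d(kernel, patch), response_3d(inflated, static_clip))
```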

3
Q

What is an important data preprocessing step in video recognition?

A

Subsampling the frame rate from 25 frames per second down to something like 2-5.
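
A minimal sketch of this kind of temporal subsampling, assuming frames are addressed by index and that evenly spaced frames are kept. The function name and the 50-frame clip are illustrative assumptions, not taken from any particular pipeline.

```python
def subsample_indices(num_frames, source_fps, target_fps):
    """Indices of the frames to keep when going from source_fps to about target_fps."""
    step = source_fps / target_fps  # distance between kept frames, in source frames
    indices = []
    position = 0.0
    while int(position) < num_frames:
        indices.append(int(position))
        position += step
    return indices

# A hypothetical 2-second clip at 25 fps (50 frames), reduced to 5 fps.
kept = subsample_indices(num_frames=50, source_fps=25, target_fps=5)
print(kept)
```

Lower target rates keep fewer frames: at 2 fps the step becomes 12.5 source frames, so the kept indices are rounded down to whole frames.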

4
Q

What is the problem with 3D video recognition models?

A

They have many parameters, which makes them hard to train, so shallow architectures are usually used.
Videos are usually subsampled in both pixel resolution and time.
Also look at temporal strides.
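
A rough sketch of the arithmetic behind this, assuming standard convolution layers with a bias term and no padding. The layer sizes are arbitrary examples. A 3D kernel multiplies the weight count by its temporal depth, and a temporal stride greater than 1 shortens the time axis of the output.

```python
def conv2d_params(k, c_in, c_out):
    """Weights plus biases of a 2D convolution with a k x k kernel."""
    return k * k * c_in * c_out + c_out

def conv3d_params(t, k, c_in, c_out):
    """Weights plus biases of a 3D convolution with a t x k x k kernel."""
    return t * k * k * c_in * c_out + c_out

def temporal_output_length(num_frames, t, stride):
    """Output length along time for kernel depth t, no padding."""
    return (num_frames - t) // stride + 1

# Example layer: 64 input channels, 64 output channels, 3x3 spatial kernel.
params_2d = conv2d_params(3, 64, 64)
params_3d = conv3d_params(3, 3, 64, 64)
print(params_2d, params_3d)

# A 16-frame clip with temporal kernel depth 3, stride 1 versus stride 2.
print(temporal_output_length(16, 3, 1), temporal_output_length(16, 3, 2))
```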

5
Q

What features were used in the Oscar paper?

A

Full body
Hands
Mouth

6
Q

Sign paper: what models did they use to embed each feature?

A

I3D for video
AV-HuBERT for non-manual signs
DeepHand for hands
