TinyML Flashcards
What is TinyML?
Running ML at less than one milliwatt.
This means that TinyML:
1) Enables battery-powered or energy-harvesting devices
2) Works under power constraints that rule out being network-connected most of the time
3) Scales to trillions of cheap, independent sensors
4) Requires code that can run in just kilobytes of memory
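For a sense of scale on that last point, here is a minimal sketch of how parameter memory adds up against a microcontroller's SRAM budget (layer sizes and the 256 KB budget are made-up, illustrative values):

```python
# Rough memory-footprint arithmetic for a hypothetical int8 CNN
# targeting a microcontroller with 256 KB of SRAM (illustrative budget).

layers = {
    # layer name: parameter count (made-up values for illustration)
    "conv1": 3 * 3 * 3 * 8,    # 3x3 kernel, 3 in channels, 8 out channels
    "conv2": 3 * 3 * 8 * 16,
    "fc":    16 * 10,          # 16 features -> 10 classes
}

BYTES_PER_PARAM = 1  # int8-quantized weights
total_bytes = sum(layers.values()) * BYTES_PER_PARAM

SRAM_BUDGET = 256 * 1024  # 256 KB
print(f"model weights: {total_bytes / 1024:.1f} KB "
      f"({100 * total_bytes / SRAM_BUDGET:.1f}% of a 256 KB budget)")
```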
Challenges
Deep learning is everywhere, but models are computationally costly. How do we make them lightweight and fast?
We have models for vision, language, and multimodal tasks.
The error rate for image classification has dropped drastically over the past years, but at a cost: large amounts of compute applied to large amounts of data. Per the referenced chart, that is almost 9 billion MACs (multiply-accumulate operations) per inference.
Our goal is to reduce compute without reducing accuracy, and to deploy at a lower cost, e.g., under 1 billion MACs.
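To make the MAC counts concrete, a minimal sketch of the standard formula for one convolutional layer (the layer dimensions below are illustrative, loosely ResNet-style):

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a standard 2D convolution:
    each output element needs k*k*c_in multiply-accumulates."""
    return h_out * w_out * c_out * (k * k * c_in)

# Illustrative layer: 56x56 output, 64 -> 64 channels, 3x3 kernel.
macs = conv2d_macs(h_out=56, w_out=56, c_in=64, c_out=64, k=3)
print(f"{macs / 1e6:.1f} M MACs for one layer")
# Summing such terms over every layer of a large classifier is how
# totals on the order of billions of MACs arise.
```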
Why do we need Model Compression? Why do we need to make AI more efficient?
The computing capacity of modern TPUs (Tensor Processing Units) falls short of what today's models, with their billions of parameters, demand. This is a supply-and-demand problem, and the gap will only widen if we don't compress models: parameter counts are growing faster than what hardware can offer.
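To make the gap concrete, a back-of-the-envelope comparison (public model and hardware figures, used here only for illustration):

```python
# Parameter memory of a 175-billion-parameter model (GPT-3 scale)
# versus the on-board memory of a single modern accelerator.

params = 175e9
bytes_per_param = 2                          # fp16
model_gb = params * bytes_per_param / 1e9    # 350 GB of weights alone

accelerator_gb = 80                          # e.g., one NVIDIA A100 80 GB

print(f"model: {model_gb:.0f} GB vs accelerator: {accelerator_gb} GB")
print(f"ratio: {model_gb / accelerator_gb:.1f}x -- the supply-demand gap")
```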
How can we compress a model?
We can apply techniques such as sparsity, pruning, and quantization, and build efficient hardware that runs compressed models directly on the device, with no need to decompress first. Deep Compression and EIE (Efficient Inference Engine) opened up new opportunities for building hardware accelerators for sparse and compressed neural networks.
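A minimal NumPy sketch of two of these techniques, magnitude pruning followed by uniform int8 quantization, on a toy weight matrix (this illustrates the idea, not the full Deep Compression pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # toy weight matrix

# 1) Magnitude pruning: zero out the 50% of weights with smallest |w|.
threshold = np.quantile(np.abs(w), 0.5)
mask = np.abs(w) >= threshold
w_pruned = w * mask

# 2) Uniform symmetric quantization to int8.
scale = np.abs(w_pruned).max() / 127
w_int8 = np.round(w_pruned / scale).astype(np.int8)

# At inference, dequantize on the fly: w ~ w_int8 * scale.
print(f"sparsity: {1 - mask.mean():.0%}, scale: {scale:.4f}")
```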
What’s the impact of pruning and sparsity?
Publications on pruning and sparse neural networks have grown rapidly since 2015. This has also led to real-world adoption, e.g., the design of the NVIDIA Ampere Sparse Tensor Core.
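The Ampere Sparse Tensor Core exploits a 2:4 structured-sparsity pattern (at most 2 nonzeros in every group of 4 weights). A minimal sketch of enforcing that pattern on a toy weight vector:

```python
import numpy as np

def prune_2_of_4(w):
    """Keep the 2 largest-magnitude weights in each group of 4,
    zeroing the rest (2:4 structured sparsity)."""
    w = w.reshape(-1, 4).copy()
    # indices of the 2 smallest-magnitude entries per group
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01])
print(prune_2_of_4(w))
# -> [ 0.9  0.   0.  -0.7  0.   0.3 -0.4  0. ]
```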
How much computing power is behind several of the vision models?
+ The error rate has dropped drastically thanks to large-scale data, powerful computing, and advanced algorithms.
+ Amazing applications, e.g., categorizing photos on a phone, but the phone has limited computing power, around 10 watts.
+ On-device pose estimation.
+ IoT devices, e.g., microcontrollers, are pretty small and low-powered. How can we make models small enough to run on such devices? E.g., person detection: is a stranger at home?
+ On-device training to customize a model, e.g., when we don't have a connection to the internet.
+ On-device training can enable better privacy, lower cost, customization, and life-long learning. Training is more expensive than inference and is hard to fit into limited memory (see the sketch after this list).
+ Promptable segmentation: you can instruct the model to segment anything in the image based on a pointer (Segment Anything Model).
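On the training-memory bullet above, a back-of-the-envelope sketch of why training needs so much more memory than inference (the layer size is made up; Adam is assumed as the optimizer):

```python
# Illustrative memory accounting for one weight tensor of N parameters,
# trained with Adam at fp32. The bookkeeping is standard; N is made up.

N = 1_000_000           # 1M parameters
FP32 = 4                # bytes per value

inference = N * FP32    # weights only
training = (
    N * FP32            # weights
    + N * FP32          # gradients
    + 2 * N * FP32      # Adam momentum + variance states
)                       # activations saved for backprop come on top

print(f"inference: {inference/1e6:.0f} MB, training: >= {training/1e6:.0f} MB")
# -> inference: 4 MB, training: >= 16 MB (4x, before counting activations)
```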
How to accelerate from 12 images per second to 800 images per second?
+ Efficient image generation: GAN compression for video, e.g., translating a horse into a zebra, or photo editing on mobile phones and tablets.
+ Stable Diffusion: inpainting masked objects in images.
+ Latency and memory are better metrics than reduction of the MAC (multiply-accumulate) count (see the latency sketch after this list).
+ Creating personalized images based on user-specified inputs: computationally expensive, limited composability, limited editability.
+ 3D generation: diffusion models create realistic videos from a natural-language description; model size is 5.6 billion parameters.
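On the latency-vs-MACs bullet above, a minimal sketch of measuring wall-clock latency directly instead of counting MACs; two ops with the same MAC count can differ in latency because of memory traffic and parallelism:

```python
import time
import numpy as np

def latency_ms(fn, warmup=3, iters=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()                             # warm caches, JITs, etc.
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.median(times))

a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)
print(f"512x512 matmul: {latency_ms(lambda: a @ b):.2f} ms")
```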
Discriminative Model vs Generative Model
Discriminative models take an input, e.g., an image, and predict something about it.
Generative models can generate new images, not only describe what's in an image.
And we can give a natural-language prompt to generate an image.