Kaoken explores the performance of handcrafted implementations of common PyTorch layers.
The results show that, for smaller models, these "baked" layers enable real-time inference without a GPU.
More details are in the README.
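
To make the idea of a handcrafted layer concrete, here is a minimal sketch in plain Python. It is not taken from the project: the `BakedLinear` class, the `relu` helper, and the reading of "baked" as "weights fixed at construction time" are all assumptions made for illustration, and a real implementation would be written for speed rather than clarity.

```python
from typing import List, Sequence


class BakedLinear:
    """Hypothetical dense layer whose weights are frozen at construction.

    Assumption: "baked" is read here as "parameters fixed ahead of time";
    the project's own definition may differ.
    """

    def __init__(self, weights: Sequence[Sequence[float]], bias: Sequence[float]):
        # Store parameters as immutable tuples so inference cannot alter them.
        self.weights = tuple(tuple(float(w) for w in row) for row in weights)
        self.bias = tuple(float(b) for b in bias)
        if len(self.weights) != len(self.bias):
            raise ValueError("weights must have one row per bias entry")

    def __call__(self, x: Sequence[float]) -> List[float]:
        # Compute y = W x + b with an explicit dot product per output unit.
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def relu(values: Sequence[float]) -> List[float]:
    # Element-wise max(0, v).
    return [max(0.0, v) for v in values]


layer = BakedLinear(weights=[[1.0, -2.0], [0.5, 0.5]], bias=[0.0, 1.0])
output = relu(layer([3.0, 1.0]))
print(output)
```

The sketch only shows the shape of the approach: a layer reduced to a fixed set of numbers and a small forward function, with no framework dispatch or autograd bookkeeping on the inference path.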
Show HN: Speed up model inference on CPU with hand crafted layer implementations | Heykuki News