Optimizing LLMs for efficient inference on Edge

References

2024

  1. Optimized Transformer Models: ℓ′ BERT with CNN-like Pruning and Quantization
    Muhammad Hamis Haider, Stephany Valarezo-Plaza, Sayed Muhsin, and 2 more authors
    In 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024