Pruning for Large Language Models — From SparseGPT to KV-Cache Pruning | 31 March 2026 | AI Accelerator, Pruning, LLM, SparseGPT, Wanda, Model Compression, Sparsity, KV-Cache, Transformer, Inference Optimization, Structured Pruning, Unstructured Pruning, 2:4 Sparsity, SliceGPT, Attention Head Pruning, Dynamic Sparsity
Post-Training Quantization (PTQ): A Comprehensive Deep Dive | 31 March 2026 | AI Accelerator, Quantization, PTQ, Model Compression, Inference Optimization, TensorRT, GPTQ, SmoothQuant, AWQ, LLM, Edge Deployment
RNN - LSTM - LLM Summary | 21 June 2024 | Artificial Intelligence, Deep Learning, Basic, RNN, LSTM, LLM, Transformer