Abhimanyu's Blog
Quantization algorithms and tradeoffs for efficient LLM inference.
Benefits of RLHF and the mathematical foundations of algorithms for learning from preference data.
My notes and CUDA kernels for joint shared memory and register tiling for matrix multiplication.
Notes on positional embedding choices such as sinusoidal and RoPE.
Calculations behind some integrals presented in the seminal paper “Auto-Encoding Variational Bayes”.
SVD as a tool to explain what L2 Regularization ‘does’ for Linear Regression.
I derive equations for Backpropagation-Through-Time (BPTT) for an LSTM and implement one using NumPy.
I derive equations for Backpropagation-Through-Time (BPTT) for an RNN and implement one using NumPy.
Interpreting matrix factors provided by SVD.
I show gradient calculations for Softmax.