\(\rightarrow\) Transformer series
- Why is it called KV cache, and not QKV cache?
- RoPE: details, block expansion, and code
- Scalable Softmax
- Some trends to speed up autoregressive inference of LLMs (unfinished).
\(\rightarrow\) Some CUDA/C++ learning notes:
- GPU architecture and warp scheduling.
- Occupancy, compute intensity, and tiling.
- DRAM banks and why they matter for code optimization.
\(\rightarrow\) Random C++:
\(\rightarrow\) Technical:
- Handling checkpoints from the terminal: some useful tricks.
- Quick reset of my compute pod.
- Speed up your migration to Vim.
\(\rightarrow\) Math:
\(\rightarrow\) Some GitHub repos:
- Training ML models with CUDA/C++ \(\rightarrow\) GitHub link.
- GPT-2 factorized with (multiple) Kronecker factors \(\rightarrow\) GitHub link.
- Backpropagation from scratch \(\rightarrow\) GitHub link.
- Randomized algorithms \(\rightarrow\) GitHub link.
- Reinforcement learning \(\rightarrow\) GitHub link.
- I’m based in Passau, Germany.
- I graduated with a master’s degree in software engineering in 2019, and I am currently pursuing a second master’s degree.
- Gmail: ayoub.benayad.467; LinkedIn; Instagram.