Jianyu Huang's Blog
  • Jan 23, 2026

    Open Source Model Quantization Strategies

  • Jan 18, 2026

    MoE Parallel Folding

  • Jan 13, 2026

    Engram: Scaling Large Language Models via Conditional Memory Lookup

  • Jan 6, 2026

    Manifold-Constrained Hyper-Connections

  • Dec 29, 2025

    High-Performance Matmul Kernels on NVIDIA Hopper

  • Dec 28, 2025

    Rollout Routing Replay: Stabilizing MoE Reinforcement Learning

  • Dec 27, 2025

    LLM Agent Memory

  • Dec 26, 2025

    Self-Play SWE-RL: Superintelligent Agents via Autonomous Bug Discovery

  • Dec 25, 2025

    LLMs as Improvement Operators & Parallel-Distill-Refine (PDR)

  • Dec 24, 2025

    Reading Note on Performance Hints

  • Dec 19, 2025

    Reading Note on SonicMoE

  • Dec 15, 2025

    Reading Note on Nvidia Nemotron 3

  • Dec 14, 2025

    Interplay of Training Stages

  • Dec 13, 2025

    Linear Attention: Kimi Delta Attention

  • Dec 11, 2025

    Pure BF16 Training via Stochastic Rounding

  • Dec 10, 2025

    Adaptive NVFP4 Quantization

  • Dec 8, 2025

    LongCat Flash

  • Dec 7, 2025

    MXFP8 Training

  • Dec 6, 2025

    Tokenizer Learning

  • Dec 5, 2025

    LLM Architecture Evolution

  • Dec 2, 2025

    First-Order Approximation for Stable LLM-RL Training

  • Dec 1, 2025

    DeepSeek-V3.2 Reading Note

  • Nov 30, 2025

    vLLM V1 Understanding

  • Nov 29, 2025

    Smol Training Playbook Reading Note

  • Nov 28, 2025

    Infra Math for LLM Training

  • Nov 23, 2025

    Predictable Scaling of Reinforcement Learning for LLMs

  • Nov 22, 2025

    Reasoning Limits of LLMs Under RLVR

  • Nov 21, 2025

    LoRA, Manifolds, and OPD

  • Nov 21, 2025

    NVFP4: Stable 4-Bit Training at 10 Trillion Tokens

  • Aug 4, 2025

    Kimi-K2 Reading Note

  • Jun 25, 2025

    Recent RL Infra Related Papers

  • Jun 23, 2025

    High Precision Used for Reasoning Recipes

  • May 15, 2025

    DeepSeek-V3's Hardware-Aware Design

  • May 5, 2025

    Summary on Llama-Nemotron

  • May 4, 2025

    Summary on StreamRL

  • May 1, 2025

    DAPO Reading Note

  • Mar 30, 2025

    Disaggregate Prefill and Decoding

  • Jan 20, 2025

    Summary on DeepSeek R1 and Kimi k1.5

  • Jan 18, 2025

    Summary on MiniMax-01

  • Dec 30, 2024

    Summary on Zero Bubble

  • Dec 29, 2024

    CUDA H100 GEMM Optimization

  • Dec 28, 2024

    Summary of SemiAnalysis o1 Reasoning Report

  • Dec 27, 2024

    People Retrospective: Communication, Growth, Collaboration, and Challenges

  • Dec 26, 2024

    Summary on DeepSeek V3

  • Dec 23, 2024

    Scaling Law

  • Dec 15, 2024

    Understand Speculative Decoding for LLM Inference

  • Nov 12, 2024

    Notes on Reading Hunyuan Model

  • Nov 11, 2024

    Educational Materials for GEMM Optimizations on CPUs and GPUs

  • Jianyu Huang
  • jianyu0huang@gmail.com

A record of technical thoughts.