ARGUS: Production-Scale Tracing and Performance Diagnosis for 10,000+ GPU Clusters
Weekly Paper Notes: one of the top picks from the 2026-06-20 CS paper digest. Area: Distributed Computing. Authors: Jiasheng Zhou, Longbin Zeng, Clavis Chen, Ruiming Lu et al. arXiv: 2606.20374. TL;DR: ARGUS is a tracing and performance-diagnosis system designed for always-on operation on production LLM training clusters with more than 10,000 GPUs. The central insight is that no single profiler can be cheap, deep, and continuous all at once, so ARGUS decomposes observation along the training call hierarchy into three independent collection channels: CPU call stacks, framework semantics, and GPU kernel execution....
Building Software Systems at Google and Lessons Learned — Jeff Dean (Stanford, 2010)
This is the talk every backend engineer eventually watches. Jeff Dean walks Stanford’s distinguished lecture audience through eleven years of evolution in Google’s search infrastructure — from a single-machine inverted index in 1999 to a planet-scale system serving thousands of queries per second with sub-second updates. The value of the lecture isn’t the specific numbers; it’s how he reasons about each rewrite as a response to one constraint becoming unbearable, and the design patterns that survived across seven major rewrites....
Sierra's Voice Agent Architecture — Zach Reneau-Wedeen on Modular Multi-Model Pipelines
Sierra powers customer-experience voice agents for a large chunk of the Fortune 20, and in this Interrupt-26 conversation Zach Reneau-Wedeen (Head of Product) walks through what their production agent harness actually looks like. The headline: a voice agent in production does not look like the canonical “LLM-in-a-loop calling tools” diagram everyone draws on whiteboards. It looks like a multi-model ensemble pipeline with speculative execution baked in. “Coding agents are good at file systems, so let’s materialize everything into a file system”: the opening framing is a useful contrarian take. Coding agents have a runaway lead on capability because they happen to operate on substrates (file systems, Git, grep) that the underlying models were already extremely good at....
The Bi-Channel Networking Paradigm for Database Systems in the Cloud
Weekly Paper Notes: one of the top picks from the 2026-06-20 CS paper digest. Area: Databases / Systems. Authors: Georg Kreuzmayr (TigerBeetle), Muhammad El-Hindi (TUM), Benjamin Wagner (Firebolt), Tobias Ziegler (TigerBeetle), Viktor Leis (TUM). arXiv: 2606.19969. TL;DR: For two decades, distributed database systems treated the network as an opaque, kernel-managed pipe, and the kernel TCP stack was fast enough that this abstraction was free. It isn’t anymore....
The Google File System (2003)
Seminal Paper of the Week: the paper that quietly defined what “cloud storage” looks like from the inside. Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung (Google). Published: SOSP ’03 (19th ACM Symposium on Operating Systems Principles), October 2003. Canonical link: The Google File System (Google research mirror) · ACM DOI 10.1145/945445.945450. TL;DR: In 2003, Ghemawat, Gobioff and Leung described how Google was running a multi-thousand-node, petabyte-scale distributed file system on commodity hardware, and how the design assumptions diverged so sharply from the established POSIX-file-system lineage that almost every architectural decision in the paper looks like a heresy until you read the workload section....
Why AI Labs With Unlimited GPUs Still Fail — Anjney Midha on Culture, Mission, and Execution
Anjney Midha (AMP, formerly a16z, board member at several frontier labs) sits down with Latent Space for an hour on a question that wouldn’t have made sense in 2023: why are well-funded AI labs with all the compute they need failing to ship? His answer isn’t compute, it isn’t talent density, and it isn’t model architecture; it’s culture, mission alignment, and the boring details of execution. The diagnosis: culture, not capital. Midha opens with the observation that has been circulating quietly inside frontier-lab boards for months: many of the best-funded labs of the 2024–2025 cohort have all the cash and all the compute they need and still can’t ship competitive models....
AgileOS: A GPU Operating System Layer for Protected CUDA Services
Weekly Paper Notes: one of the top picks from the 2026-06-13 CS paper digest. Area: Operating Systems / Systems. Authors: Zhuoping Yang, Yiyu Shi, Alex Jones. arXiv: 2606.06697. TL;DR: The GPU has quietly become a multi-tenant device: applications no longer just dispatch compute kernels; they call into vendor libraries (cuFFT, cuBLAS, NCCL), interact with GPU-resident services, and touch storage and network adapters through GPUDirect paths. But the CUDA programming model still hands each process the full keys to the device: its own context, raw device pointers, runtime handles, module loader, and direct kernel launch....
Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers
Weekly Paper Notes: one of the top picks from the 2026-06-13 CS paper digest. Area: Distributed Computing. Authors: Samuel Erickson, Mikael Johansson (KTH). arXiv: 2606.13287. TL;DR: In asynchronous SGD (ASGD), workers compute gradients on possibly stale parameters and push updates without waiting for slow peers. That keeps all the GPUs busy, but it also lets slow workers (“stragglers”) inject large delays into the update stream, which classical analyses say should slow convergence in proportion to the maximum delay across the workers....
End-to-End Arguments in System Design (1984)
Seminal Paper of the Week: a foundational systems paper that quietly shapes how every distributed system you use is layered. Authors: Jerome H. Saltzer, David P. Reed, David D. Clark (MIT). Published: ACM Transactions on Computer Systems 2(4), November 1984. Canonical link: End-to-End Arguments in System Design (MIT) · ACM DOI 10.1145/357401.357402. TL;DR: The end-to-end argument is a layering principle: a function should be implemented in a lower layer of a system only when it can be completely and correctly implemented at that layer, and when implementing it there provides a clear performance benefit over implementing it at the endpoints....
RAG Is Dead, Right? Why Hybrid, Tool-Rich Retrieval Is the New Default for Agentic Search
Kuba Rogut, deployed engineer at Turbopuffer, gave one of the more refreshingly direct takes on the “RAG is dead” meme that’s been making the rounds on X. His argument, in one sentence: RAG isn’t dead — what’s dead is the strawman version where RAG means “embed everything once, run a single vector lookup, dump it into the LLM context.” The actual frontier is hybrid, tool-rich retrieval, where embeddings, BM25, grep, glob, regex, and filters are all tools an agent can compose iteratively....