llm
2 Posts
Monitoring the Context Window in LLM Applications
A 2025 guide to measuring, managing, and gating LLM context usage—tokens, occupancy, truncation, and drift. Practical patterns: slot-based memory, RAG, summaries, hard caps, and provider-aware telemetry…
Best Choices for Streaming Responses in LLM Applications: A Front-End Perspective
A practical front-end guide to streaming LLM responses—SSE vs WebSockets vs fetch streaming, event protocols, interruptibility, UX patterns, and production SLOs…