The Problem: Currently, if a teacher (or lead dev) wants 50 students (or junior devs) to use an LLM with a specific, deep context (e.g., a 50-page curriculum or a complex repo), all 50 users have to re-upload and re-tokenize that context. It’s redundant, expensive, and forces everyone to have a high-tier subscription.
The Solution: USST allows a "Sponsor" (authenticated, paid account) to run a Deep Research session once and mint a signed Context Token. Downstream users (anonymous/free tier) pass this token in their prompt. The provider loads the pre-computed KV cache/context state without re-processing the original tokens.
Decouples payment from utility: Sponsor pays the heavy compute; Users pay the inference. Privacy: Users don't need the Sponsor's credentials, just the token. Efficiency: Removes the "Linear Bleed" of context re-computation.
I wrote up the full architecture and the "why" here: https://medium.com/@madhusudan.gopanna/the-8-6-billion-oppor...
The Protocol Spec / Repo is the main link above.
Would love feedback on the abuse vectors and how this fits with current provider caching (like Anthropic’s prompt caching).