Show HN: Open Responses – Drop-In OpenAI Responses API Alternative for Any LLM

13 points

a year ago

Hello HN! I just open-sourced Open Responses, a self-hosted implementation of OpenAI’s new Responses API that works with any LLM backend. It lets you self-host your own server compatible with the official API, but you’re free to plug in Claude, Qwen, R1, or other models. It’s a drop-in replacement.

The motivation: I wanted to use the awesome new agents SDK & the Responses API from OpenAI but with other models, including locally.

To try it out, just run:

  npx -y open-responses init
  # or uvx/pipx

This is an early release, and I’d really love feedback and help from the community to make it better.

Docs: https://docs.julep.ai/responses Repo: https://github.com/julep-ai/open-responses

Why:

  - Use any model
  - Self-hosted and Private
  - Drop-in compatibility
  - Easy to deploy via docker-compose or our CLI
  - Built-in support for tools

Whenever the model wants to use a tool (like web_search), the server automatically executes those with open & pluggable alternatives (e.g. Brave Search API).

For example, once you have the service up, to use Agents SDK with Claude, you can do:

  from openai import AsyncOpenAI
  from agents import set_default_openai_client

  # Create and configure the OpenAI client
  custom_client = AsyncOpenAI(base_url="http://localhost:8080/", api_key="your_api_key")
  set_default_openai_client(custom_client)

  from agents import Agent, Runner
  
  agent = Agent(
      name="Test Agent",
      instructions="You are a helpful assistant.",
      model="claude-3.5-sonnet",
  )
  
  result = await Runner.run(agent, "Hello! Are you working correctly?")

This will route the call to Claude (through its API) and return the summary, just as if OpenAI’s API handled it!

Roadmap:

  - Full support for streaming, and voice agents.
  - Add file search integration using pgvector.
  - Support more models out-of-the-box.
  - Helm chart for Kubernetes.

Please give feedback! I’d love to know what features or improvements are most important. For example, is fine-tuning or vector DB integration something you’d want? Does this sound useful for your projects?

Thanks for reading, and I hope some of you will try it out or even contribute!

5 comments

from openai import AsyncOpenAI from agents import set_default_openai_client # Create and configure the OpenAI client custom_client = AsyncOpenAI(base_url="http://localhost:8080/", api_key="your_api_key") set_default_openai_client(custom_client) from agents import Agent, Runner agent = Agent( name="Test Agent", instructions="You are a helpful assistant.", model="claude-3.5-sonnet", ) result = await Runner.run(agent, "Hello! Are you working correctly?")