Server Response Format
# support
a
Call ID: ed43ae5a-2a31-4a17-80d3-ae5c9ef5028d

Dear Vapi Support: This question is in regard to a Custom LLM implementation. The documentation (1) states:

"5. Receive and Format Response: The response from OpenAI, containing the generated text, is received and formatted according to Vapi's expected structure.
6. Send Response to Vapi: The formatted response is sent back to Vapi as a JSON object."

What format am I supposed to send the response back in? I hear the initial "Hello." message, and my server receives the follow-up message from Vapi. However, when I send a response, the message isn't played back to me. I have tried returning a response in the same format that OpenAI returns, but I still don't hear the agent's message.

(1) https://docs.vapi.ai/customization/custom-llm/using-your-server
v
To send a response back to Vapi in a Custom LLM implementation, format your server's reply as a JSON object with the generated message in a field like "content" or "text". For example:
Copy code
json
{
  "content": "Your generated message here"
}
or
Copy code
json
{
  "text": "Your generated message here"
}
If one doesn't work, try the other, as the documentation does not specify a single required field name.
Source: [Vapi Custom LLM server response format](https://docs.vapi.ai/customization/custom-llm/using-your-server)
a
When I look at the logs, I don't see that Vapi is receiving my response. Please let me know if you have any guidance on this.
This is my code:
Copy code
import { once } from "node:events";
import * as https from "node:https";
import { readFileSync } from "node:fs";
import { IncomingMessage, ServerResponse } from "node:http";
import { OpenAI } from "openai";
import { StreamBuffer } from "./stream_buffer.js";

const {
  OPENAI_API_KEY = "",
} = process.env;

const openAI = new OpenAI({ "apiKey": OPENAI_API_KEY });

const options = {
  key: readFileSync("/home/farar/.certs/farar_net.key"),
  cert: readFileSync("/home/farar/.certs/farar_net.pub"),
};

const server = https.createServer(options);

server.on("error", console.error);

server.on("request", (req: IncomingMessage, res: ServerResponse & { req: IncomingMessage; }) => {
  void (async () => {
    const url = new URL(`https://farar.net${req.url ?? ""}`);
    if (!(url.pathname == "/chat/completions" && req.method == "POST")) {
      res.writeHead(404);
      res.end();
      return;
    }

    const streamBuffer = new StreamBuffer();
    req.pipe(streamBuffer);
    await once(req, "end");

    const requestBody = JSON.parse(streamBuffer.buffer.toString("utf-8"));
    console.log("requestBody", requestBody);

    const { model, messages, temperature, max_tokens } = requestBody;

    const data = {
      model: model,
      messages: messages,
      temperature: temperature,
      max_completion_tokens: max_tokens
    } as OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming;
    console.log("data", data);

    const responseBody = JSON.stringify(await openAI.chat.completions.create(data));
    res.setHeader("Content-Length", Buffer.byteLength(responseBody));
    res.setHeader("Content-Type", "application/json");
    res.end(responseBody);
    console.log("responseBody", responseBody);
  })();
});

server.listen(8443, "0.0.0.0");
p
Do you have ngrok or something similar running? You also need to stream the response in chunks.
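A minimal sketch of what streaming in chunks looks like at the transport level, assuming a plain node:http ServerResponse (the chunk payloads themselves are described just below, and a full working implementation appears later in the thread):
Copy code
import type { ServerResponse } from "node:http";

// Minimal SSE transport sketch: each payload goes out as a `data:` frame,
// and the stream is terminated with a literal [DONE] sentinel.
function sendEventStream(res: ServerResponse, payloads: object[]): void {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/event-stream");
  for (const payload of payloads) {
    res.write(`data: ${JSON.stringify(payload)}\n\n`);
  }
  res.end("data: [DONE]\n\n");
}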
a
@Pigeon Thank you for providing guidance. I am not using ngrok; the Node server binds to a public interface (the server is a VPS on the public internet). I tried streaming the response without ending the HTTP connection, but that didn't seem to work either. Are you aware of a Node.js example that I can study?
For clarification, I receive the request from Vapi and the OpenAI request completes normally; however, when I send the response back to Vapi, I don't hear the voice, and the response doesn't show up in the logs.
p
OK, I understand your problem. I don't know of a Node example, but the Python one at https://github.com/VapiAI/example-custom-llm works well; you need to return the same chunks as OpenAI returns. It's Python, but it's not that different:
Copy code
chunk = ChatCompletionChunk(
    id=f"chatcmpl-{call_id}",
    created=int(time.time()),
    model=groq_settings.GROQ_MODEL_LLAMA_70_V,
    choices=[ChunkChoice(index=0, delta=delta, finish_reason=finish_reason)],
)
It returns chunks like this:
return f"data: {json.dumps(chunk.model_dump())}\n\n"
You can read about streaming response here: https://platform.openai.com/docs/guides/streaming-responses?api-mode=chat
You can choose JS for the code samples in the OpenAI docs.
You need to return the deltas:
chunk.choices[0].delta.content
My initial delta looks like this:
Copy code
{
  "id": "chatcmpl-x",
  "object": "chat.completion.chunk",
  "created": 1747224956,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "",
        "refusal": null,
        "tool_calls": null,
        "function_call": null
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}
Chunk with content:
Copy code
{
  "id": "chatcmpl-x",
  "object": "chat.completion.chunk",
  "created": 1747224966,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "LLM RESPONSE TEXT",
        "refusal": null,
        "tool_calls": null,
        "function_call": null
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}
Finally, a chunk with finish_reason: stop:
Copy code
{
  "id": "chatcmpl-x",
  "object": "chat.completion.chunk",
  "created": 1747224966,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": null,
        "refusal": null,
        "tool_calls": null,
        "function_call": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}
And this works.
I mean, just give ChatGPT this info and ask it to update your code 🙂
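The same three-chunk sequence, sketched in TypeScript in case your backend is not OpenAI and you have to build the chunks yourself. The helper name and the model string are illustrative; the object shape mirrors the JSON chunks above:
Copy code
// Illustrative helper: build OpenAI-style chat.completion.chunk objects.
type Delta = { role?: "assistant"; content?: string | null };

function makeChunk(id: string, model: string, delta: Delta, finishReason: "stop" | null) {
  return {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  };
}

const id = `chatcmpl-${Date.now()}`;      // any id works; keep it stable within one reply
const model = "llama-3.3-70b-versatile";  // taken from the example above

const frames = [
  makeChunk(id, model, { role: "assistant", content: "" }, null), // initial role delta
  makeChunk(id, model, { content: "LLM RESPONSE TEXT" }, null),   // one or more content deltas
  makeChunk(id, model, {}, "stop"),                               // final chunk with finish_reason
].map((chunk) => `data: ${JSON.stringify(chunk)}\n\n`);
// Write each frame to the response, then finish with "data: [DONE]\n\n".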
a
@Pigeon I'm really grateful for your help. It is working perfectly now that I am using streaming.
p
Nice! Happy building!
a
This is the implementation that is working (which I adapted based on your guidance):
Copy code
/* eslint-disable @typescript-eslint/no-unsafe-assignment */
import { once } from "node:events";
import * as https from "node:https";
import { readFileSync } from "node:fs";
import { IncomingMessage, ServerResponse } from "node:http";
import { OpenAI } from "openai";
import { StreamBuffer } from "./stream_buffer.js";

const {
  OPENAI_API_KEY = "",
} = process.env;


const openAI = new OpenAI({ "apiKey": OPENAI_API_KEY });

const options = {
  key: readFileSync("/home/farar/.certs/farar_net.key"),
  cert: readFileSync("/home/farar/.certs/farar_net.pub"),
};

const server = https.createServer(options);

server.on("error", console.error);

let mutex = Promise.resolve();

server.on("request", (req: IncomingMessage, res: ServerResponse & { req: IncomingMessage; }) => {

  mutex = (async () => {
    await mutex;
    const url = new URL(`https://farar.net${req.url ?? ""}`);
    if (!(url.pathname == "/chat/completions" && req.method == "POST")) {
      res.writeHead(404);
      res.end();
      return;
    }

    const streamBuffer = new StreamBuffer();

    req.pipe(streamBuffer);

    await once(req, "end");

    const requestBody = JSON.parse(streamBuffer.buffer.toString("utf-8"));

    const { model, messages } = requestBody;

    const data = {
      model: model,
      messages: messages,
      temperature: 1,
      stream: true
    } as unknown as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming;
    
    const stream = await openAI.chat.completions.create(data);

    res.setHeader("Content-Type", "text/event-stream");
    res.statusCode = 200;

    for await (const chunk of stream) {
      const responseBody = `data: ${JSON.stringify(chunk)}\n\n`;
      console.log(responseBody);
      res.write(responseBody);
    }
    res.end("data: [DONE]\n\n");
  })();
});

server.listen(8443, "0.0.0.0");
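A quick way to sanity-check the endpoint from another machine, assuming Node 18+ in an ESM context and that the server above is reachable at https://farar.net:8443 (the model name and prompt below are just placeholders; Vapi sends its own request body):
Copy code
// Hypothetical test client: POST an OpenAI-style request and print the SSE frames.
const res = await fetch("https://farar.net:8443/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-4o-mini", // placeholder model name
    messages: [{ role: "user", content: "Say hello." }],
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();

// Stream the response body to stdout as it arrives.
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}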
v
Awesome solution!
a
The streaming approach seems to work well; however, does anyone know if it is possible to return the entire chat completion in a single (non-streaming) response and have it processed by Vapi?
v
Hey, could you please schedule a brief call at your convenience so we can discuss this matter in detail? Kindly use the following link to book a suitable time: .
a
Kyle: It appears that Vapi expects a streaming chunk response modeled on OpenAI's chat completion chunk format. We would like to use Anthropic. Does Vapi provide a function that converts Anthropic streaming responses to the OpenAI format? I can transform the messages from Anthropic's model to OpenAI's myself; I just wanted to check whether this work has already been done by Vapi.
I scheduled the meeting as you suggested. The earliest meeting availability is in late June.
I resolved the race condition in the MVP.
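Nothing in this thread confirms that Vapi ships such a converter, but the transform itself is small. A sketch of mapping Anthropic streaming events onto the OpenAI chunk shape, assuming the @anthropic-ai/sdk streaming event types (content_block_delta / message_stop); system-message handling, tool calls, and the max_tokens value are simplifications:
Copy code
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Sketch: yield OpenAI-style chat.completion.chunk objects built from Anthropic events.
async function* anthropicToOpenAIChunks(
  model: string,
  messages: { role: "user" | "assistant"; content: string }[],
): AsyncGenerator<object> {
  const id = `chatcmpl-${Date.now()}`;
  const created = Math.floor(Date.now() / 1000);

  const stream = await anthropic.messages.create({
    model,
    messages,
    max_tokens: 1024, // assumed limit; Anthropic requires max_tokens
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      // Text delta -> OpenAI-style content delta.
      yield {
        id, object: "chat.completion.chunk", created, model,
        choices: [{ index: 0, delta: { content: event.delta.text }, finish_reason: null }],
      };
    } else if (event.type === "message_stop") {
      // End of message -> final chunk with finish_reason "stop".
      yield {
        id, object: "chat.completion.chunk", created, model,
        choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
      };
    }
  }
}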
v
Glad to hear your issue was resolved. Please reach out to us if you have any other questions regarding this issue.