Assistant transcriptions being used in LLM calls
# support
l
Hey! I'm analysing some of my VAPI call logs and I'm noticing that bot messages sent to the LLM seem like they are being transcribed from the TTS-produced audio? Is this the case? It's causing me quite a bit of pain when attempting to produce certain types of behaviour w.r.t. TTS pronunciation.
v
@lucalp Hello! I can assure you that the messages sent to the Large Language Model (LLM) are not transcribed from the Text-to-Speech (TTS) audio. The LLM receives the text input directly as it is provided. If you're experiencing issues with TTS pronunciation, it might be related to the way the text is being formatted or the specific phonetics of the words in question. If you need assistance with improving TTS pronunciation or have any other concerns, please let me know, and I'll do my best to help you out!
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
s
Can you explain your issue with expected vs. actual responses, along with the call ID and timestamp?
l
The issue is that text generated by the LLM is then passed back in the next turn with transcription errors. An example is call 4eb67fd4-3baf-447b-9a8f-a3eb33b7a649, where an entire word is missing in the message timestamped 1724776190900.997. I'm going off the fact that the previous LLM turn didn't generate that text, since it's not what gets synthesised by the TTS.
s
Set modelOutputInMessagesEnabled to true in your assistant config. This flag determines whether the model's output is used in the conversation history rather than the transcription of the assistant's speech.
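A minimal sketch of flipping that flag over the REST API, assuming the standard PATCH /assistant/:id endpoint; the assistant ID here is a placeholder:

```ts
// Hypothetical sketch: update an existing assistant so the model's own
// output, rather than the transcription of the synthesized speech, is
// written into the conversation history.
const VAPI_KEY = process.env.VAPI_API_KEY!;
const ASSISTANT_ID = "your-assistant-id"; // placeholder

async function enableModelOutputInMessages(): Promise<unknown> {
  const res = await fetch(`https://api.vapi.ai/assistant/${ASSISTANT_ID}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ modelOutputInMessagesEnabled: true }),
  });
  if (!res.ok) throw new Error(`VAPI returned ${res.status}`);
  return res.json();
}

enableModelOutputInMessages().catch(console.error);
```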
l
Great, thanks, will give this a go. Out of curiosity, what's the rationale for using the transcription over the model output by default?
s
The transcription reflects what actually gets voiced out, which can differ from the LLM-generated response; LLM responses are transformed for voice providers before synthesis.
a
@Shubham Bajaj Which model is used to transcribe the agent's speech? We're seeing lots of errors in the agent's transcribed text, which gives us a wrong picture of call quality from the transcript. Also, is this same transcribed text fed back to the LLM in the next turn, or is it only used to display the transcript in the logs?
@Vapi Can you assign this issue to a team member?
a
The model used to transcribe the agent's speech is whatever is configured as the transcriber. So if your assistant uses 11Labs as the transcriber, 11Labs is transcribing the user speech as well as the agent speech. This is the behavior observed when modelOutputInMessagesEnabled is false. So the answer to your question is yes, it is fed back to the LLM when the setting is false, but won't be fed back if it's true.
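Putting the two pieces together, a sketch of the relevant assistant fields; the transcriber provider and model shown are illustrative placeholders, not a recommendation:

```ts
// Hypothetical sketch of the two settings discussed in this thread.
// Substitute whatever transcriber your assistant actually uses.
const assistantConfig = {
  transcriber: {
    provider: "deepgram", // transcribes user speech, and agent speech too
    model: "nova-2",      // when modelOutputInMessagesEnabled is false
  },
  // false: history contains the transcription of the agent's speech.
  // true:  history contains the model's raw output instead.
  modelOutputInMessagesEnabled: false,
};
```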