### Answer
When addressing the issue of "9:00 am" being read as "9 double 0", it's important to understand the components involved in processing and generating speech responses in Vapi. The process typically involves a transcription step (Speech-to-Text, STT) followed by processing through a Large Language Model (LLM), and finally converting text responses back to speech (Text-to-Speech, TTS) [(1)](
) .
If the issue lies in how "9:00 am" is being interpreted or generated in the transcript, it could be related to the STT component. However, if "9:00 am" is being correctly transcribed but is incorrectly articulated in the final speech output, the issue might be with the TTS component or the way the model generates responses based on the text input.
Vapi acts as an orchestration layer over STT, LLM, and TTS providers, enabling developers to configure and customize these components [(2)](
) . This means you have control over various aspects of how input is processed and how responses are generated and delivered.
To address this issue, you may consider examining and adjusting:
1. The
transcriber settings to ensure accurate transcription of speech to text, focusing on how numbers and time are recognized .
2. The
model configuration, specifically the system prompt and how it instructs the LLM to process and generate responses based on the transcribed text .
3. The
voice settings to adjust how the TTS component articulates the generated text responses, ensuring proper pronunciation and format of time expressions .
Ultimately, troubleshooting may involve testing adjustments in these areas to pinpoint and resolve the specific cause of the issue.