9:00 am is read as 9 double 0
I can't tell whether this issue comes from the model (when I tell it to write 9 am as nine am) or from the transcript model. In the transcript, are you guys showing the model's generation directly, or does it go through the transcription pipeline as well? If it's the model's fault, I can tackle it there. If it's the transcriber's fault, I don't think I can really do anything.
### Answer

When addressing the issue of "9:00 am" being read as "9 double 0", it's important to understand the components involved in processing and generating speech responses in Vapi. The process typically involves a transcription step (Speech-to-Text, STT), followed by processing through a Large Language Model (LLM), and finally converting text responses back to speech (Text-to-Speech, TTS) (1).

If the issue lies in how "9:00 am" is being interpreted or generated in the transcript, it could be related to the STT component. However, if "9:00 am" is being correctly transcribed but is incorrectly articulated in the final speech output, the issue might be with the TTS component or the way the model generates responses based on the text input.

Vapi acts as an orchestration layer over STT, LLM, and TTS providers, enabling developers to configure and customize these components (2). This means you have control over various aspects of how input is processed and how responses are generated and delivered. To address this issue, you may consider examining and adjusting:

1. The transcriber settings, to ensure accurate transcription of speech to text, focusing on how numbers and times are recognized.
2. The model configuration, specifically the system prompt and how it instructs the LLM to process and generate responses based on the transcribed text.
3. The voice settings, to adjust how the TTS component articulates the generated text responses, ensuring proper pronunciation and format of time expressions.

Ultimately, troubleshooting may involve testing adjustments in these areas to pinpoint and resolve the specific cause of the issue.
All the text generated by the LLM is passed directly to the TTS, so the "double 0" comes from what the model writes, not from the transcriber. To fix it, you should experiment with the prompt.
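If you want to check whether a prompt change is actually working, one option is to scan the model's raw text for on-the-hour times that still contain ":00". This is only a rough sketch: the function name and the regular expression are my own, not part of Vapi, and it assumes you can copy the LLM output out of your logs or transcript.

```python
import re

# Matches on-the-hour times such as "9:00 am", "10:00PM" or "9:00 a.m."
# (hypothetical pattern, written for this example only).
ON_THE_HOUR = re.compile(
    r"\b\d{1,2}:00\s*[ap]\.?m\.?(?![a-z])", re.IGNORECASE
)


def find_on_the_hour_times(text: str) -> list[str]:
    """Return every ':00' time found in the model's text."""
    return [match.group(0) for match in ON_THE_HOUR.finditer(text)]
```

If this returns anything for a given response, the model is still writing the zeros and the prompt needs more work.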
Anyone solve it yet?
I saw that someone prompted the agent with an example of how to say 9:00 PM with something like "Say nine pee em"
"you are forbidden to write two zeros in times like 9:00 am, always write 9 am. If it is 9:20 keep it that way" works for me mostly
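Since the prompt only works "mostly", a fallback could be to rewrite the text yourself before it reaches the TTS. The sketch below assumes you have somewhere to run your own code on the model output (for example, a server of your own sitting between the LLM and Vapi); whether your setup allows that is an assumption, and `drop_zero_minutes` is a made-up helper name, not a Vapi function.

```python
import re

# ":00" is removed only when an am/pm marker follows, so "9:20 am"
# and 24-hour times like "19:00" are left untouched.
_ZERO_MINUTES = re.compile(r"\b(\d{1,2}):00(?=\s*[ap]\.?m\b)", re.IGNORECASE)


def drop_zero_minutes(text: str) -> str:
    """Rewrite '9:00 am' as '9 am'; leave every other time as it is."""
    return _ZERO_MINUTES.sub(r"\1", text)


if __name__ == "__main__":
    print(drop_zero_minutes("Your appointment is at 9:00 am."))
```

This mirrors the prompt rule above (drop the two zeros, keep 9:20 as it is), just enforced in code rather than left to the model.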