Custom transcriber not working
# support
e
Getting no errors in my logs, but the new “custom-transcriber” does not work. When I call the assistant, it just does not answer, as if the response sent from my code to Vapi is not being handled properly, so the TTS model does not read the transcription out loud.
@User it would be great if someone could check this for me and verify whether I am doing something wrong, since this is a new feature 🙂
@Shubham Bajaj and @Sahil I have sent you more details by DM
v
@Esrever I'm sorry to hear you're experiencing issues with the "custom-transcriber". Let's try to troubleshoot the problem together. Could you please provide me with more details about how you're calling the assistant and the configuration you're using for the "custom-transcriber"? This information will help me understand the situation better and assist you more effectively.
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
e
can I ask you more questions here?
v
@Esrever Absolutely, feel free to ask any questions you have here. I'm here to help you with your Voice API queries and issues. What would you like to know more about?
e
In what format do you send audio to my custom-transcriber server?
v
@Esrever The format sent to your custom transcriber server typically depends on the specifications you've set up for your server to receive data. Generally, voice data is sent as an audio stream or audio file, and the most common formats for audio data are PCM (Pulse Code Modulation), WAV (Waveform Audio File Format), MP3 (MPEG Audio Layer III), or other similar audio formats.

If you're referring to the transcription results, they are usually sent back as JSON objects containing the transcribed text along with other metadata such as timestamps, confidence scores, and speaker labels if the model supports it.

For the exact format and specifications, you would need to refer to the documentation for the custom transcriber you are using or the API guidelines provided by VAPI. If you have specific requirements or a particular format in mind, please let me know, and I can provide more detailed information or guidance.
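For illustration, a transcription result sent back to Vapi over the socket might look like the following sketch. The `type`, `transcription`, and `channel` field names are assumptions, not confirmed API details; verify them against the Vapi custom-transcriber docs.

```typescript
// A sketch of the JSON a custom-transcriber server might send back to
// Vapi over the WebSocket. Field names are assumptions; confirm them
// against the Vapi custom-transcriber documentation.
interface TranscriberResponse {
  type: "transcriber-response";      // assumed message discriminator
  transcription: string;             // the transcribed text
  channel: "customer" | "assistant"; // assumed audio-channel label
}

const example: TranscriberResponse = {
  type: "transcriber-response",
  transcription: "Hello, how can I help you today?",
  channel: "customer",
};

console.log(JSON.stringify(example));
```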
e
do you send audio data in chunks?
v
@Esrever Yes, audio data is typically sent in chunks to ensure efficient, real-time processing, especially for streaming audio. This method allows the transcriber to start processing audio while the rest of the data is still being sent, reducing latency and improving the user experience for applications that require immediate feedback, such as voice assistants or real-time captioning.

The size of these chunks can often be configured based on the requirements of the transcriber and the network conditions. For instance, in the context of VAPI, you might have the option to set parameters like `inputMinCharacters` to determine the minimum chunk size based on the number of characters, or `inputPunctuationBoundaries` to decide how the audio is split based on punctuation for the voice generation. If you need to adjust these settings or have any specific requirements for the chunk size or format, please let me know, and I can assist you further.
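A minimal sketch of a server receiving those chunks, assuming Vapi connects as a WebSocket client, sends an initial JSON control message, and then streams raw audio as binary frames (an assumption to confirm against the Vapi docs):

```typescript
// Minimal sketch of a custom-transcriber WebSocket server receiving
// chunked audio. Assumes Vapi connects as a client, sends an initial
// JSON control message, and then streams raw PCM as binary frames;
// confirm the actual protocol against the Vapi docs.
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws) => {
  ws.on("message", (data, isBinary) => {
    if (!isBinary) {
      // Text frames carry JSON metadata (e.g. sample rate, channels)
      console.log("control message:", data.toString());
      return;
    }
    // Each binary frame is one chunk of raw audio
    console.log(`audio chunk received: ${(data as Buffer).length} bytes`);
  });
});
```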
e
What is the size of each chunk that is being sent to my server?
v
@Esrever The size of each audio chunk sent to your custom transcriber server can vary based on the configuration of the Voice API (VAPI) you are using. Typically, chunk sizes can range from a few hundred milliseconds to a few seconds of audio data. For example, a common chunk size for real-time audio processing might be around 20 milliseconds to 1 second. However, the exact chunk size can depend on several factors, including the desired latency, network bandwidth, and the processing capabilities of your server.

If you have specific requirements or need to know the exact chunk size that VAPI is using to send audio data to your server, you should refer to the VAPI documentation or the settings within your VAPI configuration. If this information is not readily available or if you have further questions, please let me know, and I can provide additional assistance or direct you to the appropriate support channel.
e
how do I control the chunk size?
v
@Esrever To control the chunk size of the audio data being sent to your custom transcriber server, you would typically need to configure the settings within your Voice API (VAPI) integration or the transcriber service itself. This could involve setting parameters that define the duration or size of the audio chunks. However, the ability to control chunk size, and the method for doing so, can vary greatly depending on the specific API or service you are using. Some APIs may allow you to set chunk size directly, while others may manage it internally based on optimal performance settings.

If you're using a particular Voice API and need to adjust the chunk size, look for settings related to streaming or real-time audio processing in the API's documentation. If the documentation does not provide the necessary details, or if the feature is not exposed for customization, you may need to contact the API provider's support team for further assistance.

Since you're working with a custom transcriber, you might also have the option to handle the incoming audio stream in a way that suits your needs by buffering the audio data into the desired chunk sizes within your server's code; see the sketch below. If you require specific instructions or there's a particular configuration you're trying to achieve, please provide more details or the documentation you have, and I'll do my best to assist you further. If the information is not available to me, I would recommend reaching out to the support team for more detailed guidance.
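A sketch of that server-side re-buffering approach. `CHUNK_BYTES` and `onAudioFrame` are hypothetical names, and the chunk size is computed under an assumed audio format (16 kHz, 16-bit, stereo PCM):

```typescript
// Sketch of re-buffering incoming audio into fixed-size chunks on your
// own server, independent of the frame size Vapi uses on the wire.
// CHUNK_BYTES is a hypothetical target: 16000 Hz * 2 bytes/sample *
// 2 channels * 0.1 s = 6400 bytes per 100 ms of 16-bit stereo PCM.
const CHUNK_BYTES = 6400;
let pending = Buffer.alloc(0);

function onAudioFrame(frame: Buffer, emit: (chunk: Buffer) => void): void {
  pending = Buffer.concat([pending, frame]);
  // Emit complete fixed-size chunks; keep the remainder for later
  while (pending.length >= CHUNK_BYTES) {
    emit(pending.subarray(0, CHUNK_BYTES));
    pending = pending.subarray(CHUNK_BYTES);
  }
}
```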
e
I mean control the chunk size of the audio that VAPI sends to my server
v
@Esrever Controlling the chunk size of the audio that VAPI sends to your server is typically managed through the VAPI configuration settings. However, the ability to customize these settings can depend on the level of control the API provides. If VAPI allows for customization of the audio chunk size, you would usually find this option in the API's settings when you set up the integration with your custom transcriber. Look for parameters related to audio streaming, buffer size, or chunk size; you might need to specify these settings in your API request or configuration file.

If you're unable to find the settings, or if the documentation does not specify how to control the chunk size, it's possible that VAPI manages this internally and does not expose this setting for external configuration. In such cases, the chunk size may be optimized for general use and may not be adjustable by the end user.

For precise instructions, or to confirm whether you can control the chunk size, you would need to consult the VAPI documentation or reach out to the VAPI support team. If you have already sent details via DM as mentioned earlier, please let me know if there's anything specific from that information you'd like me to review, or if you need me to direct you to the support team for further assistance.
s
@Esrever can you please share the call id?
e
@Shubham Bajaj I have sent it to you by DM
@Vapi How will Vapi initiate a websocket connection with server.url?
v
@Esrever Please ask support for more info on this.
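As far as I can tell (an assumption worth confirming with support), Vapi acts as the WebSocket client: it dials the URL you configure in server.url, so your transcriber only needs to listen for the incoming connection and reply on the same socket. A minimal sketch, with a hypothetical /transcriber path and an assumed response shape:

```typescript
// Sketch of the response path: Vapi (assumed to be the WebSocket
// client) dials your server.url; your server replies on the same
// socket with JSON transcription messages. The path and message
// shape below are assumptions, not confirmed API details.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080, path: "/transcriber" });

wss.on("connection", (ws: WebSocket) => {
  // Call this whenever your speech-to-text engine yields a final result
  const sendTranscript = (text: string) =>
    ws.send(
      JSON.stringify({
        type: "transcriber-response", // assumed discriminator
        transcription: text,
        channel: "customer",          // assumed channel label
      })
    );

  sendTranscript("Hello!"); // demo only
});
```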
a
I also require some guidance on the same issue. I have added my transcriber provider, URL, and authentication details. The request to the call API is generated, but the moment I pick up the call, it disconnects because the transcriber pipeline does not work properly. It would be highly appreciated if you could provide a sample payload for the custom transcriber.