We use ElevenLabs to synthesize the output text from the agent and turn it into speech to output back to the user. ElevenLabs has incredibly realistic voices (and you can even copy your own)!
We use Deepgram to transcribe the input audio from the user and turn it into text to feed into our agent. It's incredibly accurate and very easy to use!
We leveraged prompt engineering on top of GPT-4 to create our intelligent technical interviewer. It's truly impressive how LLMs can understand and generate code!