@msolstice If you search for a term, then the API takes you to the best match in the file. So, yep, the API can definitely be used to navigate around video & audio based on keywords/phrases.
You could make a fifteen-hour-long video scavenger hunt if you wanted—oh no, ideas for a demo.
Huh, I'm kind of confused. Is it parsing the audio into words and then comparing those with a fuzzy match? Or is it something much subtler, using phonemes rather than words? It seems that the words shown aren't actually that accurate, but the fuzzy match *is*.
Given that searching "conquer" matches the spoken word "concrete" which is closed-captioned as "conquering", but "concrete" doesn't seem to find this snippet, I'm guessing it's more like the first thing I guessed...
Curious also at which stage(s) in the process the AI/ML is used.
@malcolm_ocean Great point! There are circumstances where the match will be imperfect. We are constantly working on improving the algorithm by increasing both precision and recall (check out https://en.wikipedia.org/wiki/Pr... for a very detailed discussion on precision and recall in search).
It's useful to think of the results in the Google sense where your first result may not necessarily be the best result. You might have to venture into the next few results (keep clicking 'next' and the engine will 'lower its standards') to really find what you are looking for. The UI for this isn't super simple yet since it's a new idea—how do you present search results in video and audio?
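To make the precision/recall trade-off above concrete, here's a tiny illustrative sketch (toy data, not Deepgram's evaluation code): widening the result set by clicking 'next' tends to raise recall at the cost of precision.

```python
# Toy precision/recall calculation for a search result set.
# "returned" = snippets the engine showed; "relevant" = snippets that truly match.
def precision_recall(returned, relevant):
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 2 results shown, 1 of them truly relevant,
# out of 3 relevant snippets in the whole file.
p, r = precision_recall(["clip1", "clip3"], ["clip1", "clip2", "clip4"])
```

Returning more results ("lowering standards") can only grow the `hits` set, so recall goes up while precision usually drops.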
We have achieved much better accuracy than traditional search based on transcriptions, but we always are trying to improve!
@malcolm_ocean I noticed I missed a few points you asked about. The AI is used in the indexing stage and in prediction layers that are built on top of the search (the prediction is a 'special' thing that customers have to request).
The search does work on a fuzzy model: matches of words and sounds are weighted and compared based on their probability of being correct and how far they are from your query.
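A minimal sketch of that idea (purely illustrative, not Deepgram's actual model): score each indexed candidate by combining the recognizer's confidence that the word is correct with the word's closeness to the query string.

```python
# Illustrative weighted fuzzy match: recognizer confidence x string similarity.
# The index entries and confidences below are made-up example data.
from difflib import SequenceMatcher

def fuzzy_score(query, candidate, confidence):
    # similarity in [0, 1]: how close the candidate is to the query
    similarity = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    # weight the match by the probability that the candidate was heard correctly
    return confidence * similarity

# (word heard, recognizer confidence) pairs from a hypothetical index
index = [("conquering", 0.6), ("concrete", 0.9), ("banana", 0.99)]
ranked = sorted(index, key=lambda wc: fuzzy_score("conquer", *wc), reverse=True)
```

This shows why "conquer" can surface a snippet whose transcript shows a different word: a high-confidence near-miss and a low-confidence close match can score similarly, so the ranking depends on both factors.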
@stephensonsco hm, I still don't quite understand. Are you using the neural networks directly on the audio data? Like with the audio as input nodes? Or both that and the transcription? Or...
@malcolm_ocean The NN is used (more or less) directly on the audio waveform to produce a searchable index. When you hit the search button, the index is queried and the most relevant results are returned back to you. The NN is not active during that query stage though, just the indexing. I hope that helps!
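The two-stage split described above (model runs once at indexing time; queries are just lookups) can be sketched roughly like this. The `transcribe` function here is a stand-in for whatever the NN produces from the waveform; everything else is hypothetical structure, not Deepgram's implementation.

```python
# Toy index-then-query pipeline: the model is only involved while indexing.
def build_index(audio_segments, transcribe):
    # transcribe: stand-in for the NN mapping a waveform segment -> words
    index = {}
    for timestamp, segment in audio_segments:
        for word in transcribe(segment):
            index.setdefault(word, []).append(timestamp)
    return index

def search(index, word):
    # no model runs at query time -- just a lookup into the prebuilt index
    return index.get(word, [])

# Fake "audio" segments, with a trivial transcriber for illustration.
segments = [(0.0, "hello world"), (5.0, "world peace")]
idx = build_index(segments, transcribe=lambda s: s.split())
```

The payoff of this design is that search stays fast and cheap no matter how expensive the model is, since inference cost is paid once per file rather than once per query.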
Wow. This is really great execution. Can't wait to see this sort of indexing of AV content to become more commonplace.
This is really interesting. It could have applications for audiobooks, particularly non-fiction content. Navigation/searchability is something most commercial digital audiobook providers struggle with, which really limits the usefulness of audiobooks as a reference resource.
Wow, this could help so many people out there! Can you please explain the techniques you used? I can tell from the company's name that you use deep learning in some way, but I'd be happy to hear more about it. Thanks!
As a voice enthusiast, I really love what you guys are doing!
Best of luck!
Congrats on the launch! Seems like the folks over at @bumpersfm would have a great dataset and application for DeepGram.
This is a fascinating concept! The ability to search through speech and video content in such a precise way opens up countless possibilities for creators, researchers, and businesses.