Which would be a more interesting web API: speech recognition or AR object detection?

My startup is currently trying to figure out where to prioritize resources for the next model to add to our library, after we recently launched an AI text summary API in beta. The options are either a speech recognition model that requires no additional data (for fine-tuning, as others do) and performs very well across a wide range of tasks (from videos, conferences, presentations, and dictation all the way to phone calls and movies), or an object detection API, for example for augmented reality, though it could also be used for other things such as security footage analysis. Thank you so much for your input!