Humalike - Give your AI agents the social intelligence they're missing

Today's models are capable enough. Smart enough. Fast enough. But we still feel they don’t fit in the room. Humalike is building the behavioral infrastructure for humanlike AI agents. The social skills & proactiveness your agents have been missing. APIs, models, benchmarks.

How are you actually measuring "humanlike" behavior beyond the benchmarks you ship, and can customers plug in their own eval scenarios to test against their specific use case?

 At the end of the day it's your agent, so you can evaluate it in whatever way you want. We want to let you forget about the interaction/social part so you can focus on what matters for your agent.

 To be honest, that's not something we have fully figured out yet. We have some things in place (observability, intuition and "simple" benchmarks), but there is still a long way to go in how to measure it. As for eval scenarios, yes! That's exactly what the "Social Observability" API does!!

Tried the API over the weekend and the proactive context layer actually feels useful, not gimmicky. Liked that it picks up on social cues I usually have to script by hand.

 Tysm!

 That's amazing to hear, Hava! Have a great day, and thx for the supp!

The social intelligence angle is a sharp wedge. Most agent tooling optimizes for finishing the task and forgets how the interaction actually lands. In practice, are you scoring tone and context, or injecting it into the responses themselves? Feels like something multi-agent setups are going to need soon.

 Thanks for the support. We equip your agent with the context and judgment it needs to perform better.

 Totally! One of the components is Social Observability, which we also use ourselves to work out how to score evals. In practice, to keep it simple: if you don't complain and feel satisfied overall with the interaction, you had a good experience! I think we can all relate to how annoying, impersonal and generally unaware agents are (in any use case/product) :))

the turn-taking problem is so real. I've seen plenty of AI agents that are technically capable but socially exhausting — they jump in too fast, over-explain, and never read the room. curious how you're benchmarking this though — what does 'good' social behavior look like as a metric? is it response timing, or something more nuanced like knowing when a user is thinking vs actually done talking?

 It's a combination of factors, and the ultimate judges are humans: do they speak with him or ignore him? Do they get annoyed by him? Do they trust him? But this can only be evaluated over a long time horizon, so we also look at short-term signals, as you noted.

 100%! The experience should just feel right. Our metric isn't fully accurate yet, but if you don't complain and have a good experience, that means it's working :))

the missing layer isn't intelligence, it's calibration. models can generate perfect answers but they don't know when they're supposed to be quiet. the social debt shows up the moment you drop an agent into a real slack or discord. it either lurks awkwardly or overshares. the version that reads the room first and speaks second is the one people let stay.

curious how you're benchmarking "fits in the room." vibes are hard to measure. is it deference patterns, timing, response latency to social cues? that spec is the whole product.

 Hey 👋 humans are the ultimate judges. At the end of the day, what matters is: "Do people like the agent?", "Are they annoyed by him?", "Do they socially sanction it?" These questions matter the most, but they can only be answered over a long time horizon.

The benchmark piece is interesting here. For agents, “social intelligence” can get fuzzy fast. I’d want to see failure cases like interrupting too often or being too passive, not just success scores. Are you measuring those negative behaviors too?

 Hey! The Social Observability component evaluates these failure modes. If anything, we focus on failure modes even more than on success stories.
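To make the "interrupting too often vs. being too passive" distinction concrete, here is a minimal sketch of how those two failure modes could be scored separately. It assumes each moment where the agent could have replied has been labeled by a human as "should speak" or "should stay quiet"; the `DecisionPoint` schema and `failure_rates` function are hypothetical illustrations, not how Social Observability works.

```python
from dataclasses import dataclass


@dataclass
class DecisionPoint:
    """One moment where the agent could have replied (hypothetical schema)."""
    should_speak: bool  # human label: was a reply appropriate here?
    agent_spoke: bool   # what the agent actually did


def failure_rates(points: list[DecisionPoint]) -> dict[str, float]:
    """Score the two negative behaviors separately instead of one success score.

    over_eager: share of "stay quiet" moments where the agent spoke anyway
                (interrupting, jumping in too fast).
    passive:    share of "should speak" moments where the agent stayed silent.
    """
    quiet = [p for p in points if not p.should_speak]
    needed = [p for p in points if p.should_speak]
    over_eager = sum(p.agent_spoke for p in quiet) / len(quiet) if quiet else 0.0
    passive = sum(not p.agent_spoke for p in needed) / len(needed) if needed else 0.0
    return {"over_eager": over_eager, "passive": passive}


labeled = [
    DecisionPoint(should_speak=False, agent_spoke=True),   # jumped in uninvited
    DecisionPoint(should_speak=False, agent_spoke=False),
    DecisionPoint(should_speak=True, agent_spoke=False),   # missed its cue
    DecisionPoint(should_speak=True, agent_spoke=True),
    DecisionPoint(should_speak=True, agent_spoke=True),
]
print(failure_rates(labeled))
```

Keeping the two rates apart matters because a single accuracy number can hide the trade-off: an agent that never speaks has zero over-eagerness but maximal passivity.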

Congratulations on the launch!
How does Turn Taking decide the best moment for an agent to join a conversation? Does it adapt differently for fast-moving group chats? Really curious about the underlying approach.


 Thanks for the support and the question! Turn Taking uses other components, like Social Signals, to make better judgments. Social Signals keeps track of typing speeds and recognizes when a chat is dynamic vs. quiet. Turn Taking also handles interruptions, i.e. when the agent has already started processing and another message appears in the group chat. That way the agent is never spammy :))
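Here is a minimal sketch of the kind of decision described in that reply, under loud assumptions: it uses the recent message rate as a stand-in for the typing-speed signal (typing indicators differ by platform), and every name and threshold in it (`ChatActivity`, `should_send`, `busy_threshold`, the gap lengths) is hypothetical. It is not Humalike's API or its actual Turn Taking logic. A drafted reply is dropped if someone else posted while it was being composed, and in a fast-moving chat the agent waits for a longer pause before taking a turn.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    author: str
    timestamp: float  # seconds


@dataclass
class ChatActivity:
    """Hypothetical stand-in for a social-signals tracker: stores recent messages."""
    window_s: float = 60.0
    messages: list[Message] = field(default_factory=list)

    def add(self, msg: Message) -> None:
        self.messages.append(msg)

    def messages_per_min(self, now: float) -> float:
        # Count messages inside the sliding window and scale to a per-minute rate.
        recent = [m for m in self.messages if 0 <= now - m.timestamp <= self.window_s]
        return len(recent) * 60.0 / self.window_s

    def is_dynamic(self, now: float, busy_threshold: float = 6.0) -> bool:
        return self.messages_per_min(now) >= busy_threshold


def should_send(draft_started_at: float, activity: ChatActivity, now: float,
                agent: str, quiet_gap_s: float = 2.0, busy_gap_s: float = 8.0) -> bool:
    """Decide whether a reply drafted since `draft_started_at` should be posted now."""
    others = [m for m in activity.messages if m.author != agent]
    # Interruption: someone posted while the agent was still composing, so the
    # draft is stale and should be discarded or regenerated instead of sent.
    if any(m.timestamp > draft_started_at for m in others):
        return False
    if not others:
        return True
    # Fast-moving chats demand a longer pause before the agent takes the floor.
    required_gap = busy_gap_s if activity.is_dynamic(now) else quiet_gap_s
    return now - max(m.timestamp for m in others) >= required_gap


chat = ChatActivity()
chat.add(Message("ana", 0.0))
chat.add(Message("ben", 1.0))
print(should_send(draft_started_at=1.5, activity=chat, now=4.0, agent="bot"))  # True
chat.add(Message("ana", 3.0))  # ana posts while the bot is still drafting
print(should_send(draft_started_at=1.5, activity=chat, now=6.0, agent="bot"))  # False
```

The thresholds are placeholders; in practice they would need tuning per channel, and a real system would regenerate the reply rather than simply dropping it.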

Interesting... may test this with the RAG system I built. Good luck with the launch!

 Amazing :)) Get back to us if you encounter any issues/weird behavior!

Social intelligence is exactly the layer that separates a demo from something a business will actually put on the phone. In production the failures are almost never 'wrong answer' - they're tone, over-promising, or not knowing when to shut up and hand off to a human. How are you measuring 'social' correctness? That's the part that's brutal to eval. Congrats on the launch.

 Thanks for the supp David!

 Yes, it's hard to measure and evaluate, especially in an automated way, since it usually takes a human to judge what is appropriate in a social setting. We have an in-house research team working on evals, and we open-sourced one of them, which targets how well an LLM adjusts to the group it is speaking to.

Paper link:

By the way, I see you're the maker of Worvi, which could benefit from the Humalike APIs. If you have more feedback/ideas, please let us know!

The behavioral angle feels really fresh, not just another wrapper around an LLM. The proactivity piece is what stood out when I poked around: agents actually initiating instead of waiting on prompts.

 Indeed! Tysm for the supp!