What's great
Vy is a phenomenal product. Its vision-first approach, as opposed to the DOM-based approach that other computer-use agents (CUAs) use, is truly game-changing. It helps the model understand the actual content of text or images rather than just extracting raw text.
What needs improvement
Vy's speed of thinking and clicking is definitely an area for improvement. On top of that, the model sometimes makes assumptions, which frustrates the user.
vs Alternatives
I believe Vy takes a vision approach similar to ChatGPT Atlas. However, its model feels very lightweight and accurate. Moreover, Vy has access to all the tools a desktop user has, along with a general understanding of how everything works. The model is also proprietary rather than a wrapper, which helps it reject sophisticated prompt injection attacks that some other products, such as Browser Use, may fail at. The Claude web browser extension is quite similar, but it takes a DOM approach, which falls short in the long run. Vision is generalized, while DOM approaches are specialized to certain tasks.
What setup steps are required on macOS and Windows?
You can find it in the FAQs.
How is on-device data stored and secured?
It seems pretty secure. Their user-friendly terms of service are also really nice, since they disclose everything up front.
What happens if an app window is minimized or hidden?
It works fine. It gets confused for a second if you do it mid-session, but it adapts very quickly.
