I ve been working on some AI projects recently things like scheduled agents, API responders, and multi-agent systems that need to run continuously. One of the biggest headaches I ve run into is deployment.
Most cloud platforms (AWS, GCP, etc.) are built for stateless apps or short-lived functions. But for long-running, stateful agents, the kind that need to persist data, auto-recover from crashes, and expose custom endpoints it gets surprisingly messy. I ve spent so much time setting up VMs, Docker configs, and recovery logic than actually writing agent behavior logic.