Aaron Chan

Building a coupon aggregator that doesn't suck: Technical FAQ & Behind the Scenes 🛠️

Hi PH community! 👋 Chloe here, the maker of Saveyz.

While building a US-focused coupon platform, I quickly realized that the UI/UX is the easy part. The real nightmare is the data. Promo codes expire rapidly, merchants change conditions, and affiliate data feeds are notoriously messy.

Since there are a lot of makers and engineers here, I thought I'd share a quick technical FAQ on how we handle the "Data Decay" problem behind the scenes:

Q: Where does the deal data come from?
A: We aggregate data from multiple major affiliate networks (such as Rakuten, Webgains, and CJ). The first hurdle was normalizing their entirely different API schemas and CSV structures into a single unified MySQL database using Python and pandas.
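To make the normalization step concrete, here is a minimal sketch of the idea, not our production code. The network names, column names, and the unified fields (other than schedule_end) are hypothetical placeholders, and it uses only the standard library; with pandas, the same mapping could be passed to DataFrame.rename(columns=...).

```python
import csv
import io

# Hypothetical per-network column mappings: source column -> unified column.
# Real feeds have many more fields; these names are illustrative only.
FIELD_MAPS = {
    "network_a": {"coupon": "code", "headline": "title", "expires": "schedule_end"},
    "network_b": {"promo_code": "code", "name": "title", "end_date": "schedule_end"},
}


def normalize_rows(network, rows):
    """Rename each row's columns to the unified schema and trim whitespace."""
    mapping = FIELD_MAPS[network]
    normalized = []
    for row in rows:
        record = {
            unified: (row.get(source) or "").strip()
            for source, unified in mapping.items()
        }
        record["network"] = network  # remember where the deal came from
        normalized.append(record)
    return normalized


def normalize_csv(network, csv_text):
    """Parse a CSV feed and return rows in the unified schema."""
    return normalize_rows(network, csv.DictReader(io.StringIO(csv_text)))


sample_csv = "promo_code,name,end_date\nSAVE20, 20% off shoes ,2025-01-31\n"
unified = normalize_csv("network_b", sample_csv)
```

Keeping one mapping per network means a new feed is mostly a new dictionary entry rather than new parsing code.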

Q: How do you solve the "Fake/Expired Code" problem?
A: Hard expirations. Automated Python cron jobs run daily and compare each code's schedule_end date against the current date in US Eastern Time. Any code that is expired, or flagged as inactive by the source network, gets soft-deleted right away (is_deleted=1). We believe it's better to show 5 verified codes than 50 dead ones.
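A rough sketch of that expiry check, under a few assumptions: deals are plain dictionaries here instead of MySQL rows, schedule_end is an ISO date string, "active" is a hypothetical name for the network's status flag, and a code is treated as valid through the end of its schedule_end day in Eastern Time.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

EASTERN = "America/New_York"  # IANA zone; covers both EST and EDT


def eastern_today():
    """Current calendar date in US Eastern Time."""
    return datetime.now(ZoneInfo(EASTERN)).date()


def should_soft_delete(deal, today):
    """True if the network marked the deal inactive or its end date has passed."""
    if not deal.get("active", True):
        return True
    end = deal.get("schedule_end")
    # Assumption: the deal is still valid on its end date, expired the day after.
    return end is not None and date.fromisoformat(end) < today


def sweep(deals, today=None):
    """Mark expired or inactive deals as soft-deleted (is_deleted=1)."""
    today = today or eastern_today()
    for deal in deals:
        if should_soft_delete(deal, today):
            deal["is_deleted"] = 1
    return deals
```

Passing today in explicitly makes the rule easy to test; against a real database the same condition would more likely be a single UPDATE statement.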

Q: Affiliate data usually has terrible, spammy titles. How do you fix that?
A: This was my biggest headache! Raw data often looks like "[US ONLY] 20% OFF ALL ITEMS (Excludes Sale)!!!"
We built a two-step cleaning pipeline:

  1. RegEx: Stripping out brackets, country codes, and redundant whitespace.

  2. AI Layer: We push the raw descriptions and parent categories through an LLM API to generate clean, SEO-friendly titles and human-readable "About Us" sections for the merchants.
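For the RegEx step, a minimal sketch could look like the following. The patterns and the list of country codes are illustrative assumptions, not our exact rules, and the LLM step is not shown.

```python
import re

# Illustrative patterns only; a real feed needs a longer, tested list.
BRACKET_TAG = re.compile(r"\[[^\]]*\]")  # e.g. "[US ONLY]"
COUNTRY_TAG = re.compile(r"\b(?:US|USA|UK|CA)\s+ONLY\b[\s:\-|]*", re.IGNORECASE)
TRAILING_NOISE = re.compile(r"[!\s]+$")  # trailing "!!!" and spaces
WHITESPACE = re.compile(r"\s+")


def clean_title(raw):
    """Strip bracketed tags, bare country tags, and redundant whitespace."""
    text = BRACKET_TAG.sub(" ", raw)
    text = COUNTRY_TAG.sub(" ", text)
    text = TRAILING_NOISE.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


cleaned = clean_title("[US ONLY] 20% OFF ALL ITEMS (Excludes Sale)!!!")
# cleaned == "20% OFF ALL ITEMS (Excludes Sale)"
```

The parenthetical condition is kept on purpose in this sketch, since it carries real terms that the later rewriting step can use.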

Q: How do you prevent duplicate deals?
A: Before inserting a new promo code, our script pulls the last 10 deals for that specific domain. The AI evaluates the new deal against this recent history. If it's too similar to an existing active deal, we flag it and skip the database insert so the UI isn't cluttered with near-duplicates.
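Here is a simplified sketch of the duplicate check. To keep it runnable without an API key, it swaps the LLM comparison for plain string similarity from Python's difflib, so treat it as a stand-in for the idea and not as what the AI actually does. The 0.85 threshold is an arbitrary assumption, and recent_titles is assumed to be ordered oldest to newest.

```python
from difflib import SequenceMatcher

HISTORY_SIZE = 10            # compare against the last 10 deals for the domain
SIMILARITY_THRESHOLD = 0.85  # hypothetical cutoff; tune on real data


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.lower().split())


def is_duplicate(new_title, recent_titles, threshold=SIMILARITY_THRESHOLD):
    """True if new_title is too similar to any of the most recent titles."""
    candidate = normalize(new_title)
    for existing in recent_titles[-HISTORY_SIZE:]:
        ratio = SequenceMatcher(None, candidate, normalize(existing)).ratio()
        if ratio >= threshold:
            return True
    return False


history = ["20% Off  All Items", "Free shipping over $50"]
if not is_duplicate("Buy one get one free", history):
    pass  # safe to insert the new deal
```

A cheap check like this could also run before the AI call, so obvious duplicates never consume API quota.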

Q: What was the biggest technical bottleneck?
A: AI Rate Limits (HTTP 429 errors)! Processing thousands of deals daily kept hitting API limits. I had to build a custom API Key rotation system that automatically switches keys and retries with exponential backoff when it detects a timeout or limit error.
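The rotation logic can be sketched roughly like this. RateLimitError, request_fn, and the default delays are hypothetical placeholders: in practice the exception would be whatever the LLM client raises on a 429 or timeout, and request_fn would wrap the real API call.

```python
import itertools
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 or timeout raised by the LLM client."""


def call_with_rotation(request_fn, api_keys, max_attempts=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call request_fn(key), switching keys and backing off after each failure."""
    keys = itertools.cycle(api_keys)
    for attempt in range(max_attempts):
        key = next(keys)  # rotate to the next key on every attempt
        try:
            return request_fn(key)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            # Exponential backoff (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The sleep parameter exists so tests can pass a fake and avoid real waiting; the jitter keeps parallel workers from retrying in lockstep.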

Question for the community: 👇
For other indie hackers building aggregators or working with messy third-party APIs: How do you handle data validation, deduplication, and LLM rate limits in your own projects?

Would love to geek out over your tech stacks and exchange insights! 🚀
