A few days ago, I launched GenPrinting - a platform that turns a single sentence into a framed poster delivered to your door.
One of the first pieces of feedback I received was clear:
"The product is great, but the upscaling step takes too long."
They were right.
The original flow
Every poster was upscaled to print resolution as part of the checkout flow, before payment. That added 40–50 seconds between a customer deciding they wanted a poster and actually completing payment.
The pipeline looked like this:
Type → Pick → Frame → Upscale → Pay → Done
On the web, that kind of delay is costly. Every extra second between intent and action is another chance for a customer to second-guess, get distracted, or close the tab.
The naive instinct is to optimise the slow step. Faster model, parallel jobs, a nicer progress bar. I started down that path and stopped - none of it changes the fact that the customer is staring at a spinner before they've paid.
The redesign
The fix wasn't to make upscaling faster. It was to move it out of the customer's path entirely.
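Upscaling now happens after payment, asynchronously in the background, and fulfilment is kicked off once the print-ready image exists. The customer-facing flow no longer contains the upscale step at all: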
Type → Pick → Frame → Pay → Done
Here's how the pieces fit together.
1. The Stripe webhook is the trigger
When the checkout.session.completed event fires, the webhook creates the order rows and immediately kicks off the upscale job in the background. The customer is already on the success page - they don't know or care that this is happening.
The important detail: the upscale call isn't awaited inside the webhook. Stripe expects a quick 2xx response; if the handler is slow enough to time out, Stripe treats the delivery as failed, retries the webhook, and you risk duplicate work. Next.js's after() helper turned out to be the cleanest way to run the work after the response has been sent.
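A minimal sketch of that "respond first, work later" shape, written so it runs without Next.js or Stripe installed. Every name here (WebhookEvent, WebhookDeps, handleWebhook, defer) is hypothetical and not taken from the real codebase or from either library. In a Next.js route handler, the after function imported from next/server could plausibly be passed in as defer.

```typescript
// Hypothetical shapes for illustration; these are not Stripe or Next.js types.
type WebhookEvent = { type: string; sessionId: string };

interface WebhookDeps {
  createOrder(sessionId: string): Promise<string>; // resolves to the new order id
  startUpscale(orderId: string): Promise<void>;    // slow: tens of seconds
  defer(task: () => Promise<void>): void;          // e.g. after() from next/server
}

async function handleWebhook(
  event: WebhookEvent,
  deps: WebhookDeps,
): Promise<{ status: number }> {
  if (event.type !== "checkout.session.completed") {
    return { status: 200 }; // acknowledge events we don't act on
  }

  const orderId = await deps.createOrder(event.sessionId);

  // Deliberately not awaited: the task is handed to the scheduler, so the
  // 200 below is returned without waiting for the upscale to finish.
  deps.defer(() => deps.startUpscale(orderId));

  return { status: 200 };
}
```

Injecting defer keeps the handler testable: a test can collect the queued tasks and confirm that nothing slow ran before the response was produced.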
2. The upscale endpoint owns the slow step
A separate internal endpoint is the only place that talks to the upscaling model. It writes the upscaled image back to storage, stamps the order with the print-ready asset, and hands off to Gelato for fulfilment. The order row carries enough state - current status, attempt count, when the last attempt started - for any other part of the system to pick up where things left off.
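As a rough illustration of the state the order row carries, here is one way it could be modelled. The field names, status values, and helper functions are assumptions for the sketch, not the actual schema.

```typescript
// Hypothetical order state; names and status values are assumed.
type UpscaleStatus = "pending" | "upscaling" | "ready" | "needs_review";

interface OrderRow {
  id: string;
  status: UpscaleStatus;
  attemptCount: number;
  lastAttemptStartedAt: Date | null; // null until the first attempt begins
  printAssetUrl: string | null;      // set once the upscaled image is stored
}

// Stamp the row before calling the upscaling model, so anything else reading
// the row can tell an attempt is in flight and when it started.
function beginAttempt(order: OrderRow, now: Date): OrderRow {
  return {
    ...order,
    status: "upscaling",
    attemptCount: order.attemptCount + 1,
    lastAttemptStartedAt: now,
  };
}

// Record the print-ready asset; handing off to fulfilment would follow this.
function completeAttempt(order: OrderRow, printAssetUrl: string): OrderRow {
  return { ...order, status: "ready", printAssetUrl };
}
```

Both helpers return a new row rather than mutating the input, which makes each state transition easy to persist as a single update.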
3. Failure handling, when nothing is in front of the user
Moving work to the background means the user is no longer there to see it fail. That changes what "robust" has to mean.
In the synchronous version, a failed upscale was easy: show an error, let them retry. In the async version, the customer has already paid and walked away. The system has to recover on its own.
So I added a scheduled cron job that runs every 10 minutes and looks for orders whose upscale never finished. A few details I'd have skipped in a v0 but ended up needing:
- An attempt counter so a broken job doesn't retry forever. After 3 tries the order is flagged for manual review and I get an ops alert email.
- A staleness threshold so I only retry jobs that have actually been quiet for 10 minutes - not ones that are mid-upscale right now.
- A cap on how many orders a single run will sweep, so a bad day doesn't turn into a thundering herd against the upscaling provider.
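The three rules above can be sketched as a single pure function that decides what one sweep should do. The 3-attempt limit and 10-minute threshold come from the list above. The per-run cap of 20 is an assumed placeholder, and the type and function names are hypothetical.

```typescript
// Hypothetical row shape for the sweep; field names are assumed.
interface PendingOrder {
  id: string;
  status: "pending" | "upscaling" | "ready" | "needs_review";
  attemptCount: number;
  lastAttemptStartedAt: Date | null;
}

const MAX_ATTEMPTS = 3;                // after 3 tries, flag for manual review
const STALE_AFTER_MS = 10 * 60 * 1000; // only touch jobs quiet for 10 minutes
const MAX_PER_RUN = 20;                // assumed value for the per-run cap

function planSweep(
  orders: PendingOrder[],
  now: Date,
): { retry: PendingOrder[]; flag: PendingOrder[] } {
  const retry: PendingOrder[] = [];
  const flag: PendingOrder[] = [];

  for (const order of orders) {
    // Finished or already-flagged orders need nothing from the sweep.
    if (order.status === "ready" || order.status === "needs_review") continue;

    // Assumption: a row that never recorded an attempt counts as stale.
    const startedAt = order.lastAttemptStartedAt?.getTime() ?? 0;
    if (now.getTime() - startedAt < STALE_AFTER_MS) continue; // may be mid-upscale

    if (order.attemptCount >= MAX_ATTEMPTS) {
      flag.push(order);  // caller marks for review and sends the ops alert
    } else if (retry.length < MAX_PER_RUN) {
      retry.push(order); // anything over the cap waits for the next run
    }
  }

  return { retry, flag };
}
```

Keeping the decision separate from the side effects (re-triggering the upscale, sending the alert) means the retry policy can be checked with plain data and a fixed clock.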
4. What the customer sees
Nothing. That's the point.
They get the order confirmation email immediately, a shipping update from Gelato a few hours later, and the poster on their doorstep. The 40–50 seconds of upscaling, the retries, the sweeps - none of it is exposed.
The broader product lesson
The lesson is a product one, not an engineering one:
Anything users have to wait for should be hidden, not optimised.
It's tempting to spend engineering effort shaving seconds off a slow step. Sometimes that's the right call. But often, the better move is to ask whether the user needs to be there at all while it happens.
If the work can be moved out of the critical path, move it. The fastest step is the one your customer never sees.