The lesson I learned from moving image upscaling after payment

A few days ago, I launched GenPrinting - a platform that turns a single sentence into a framed poster delivered to your door.

One of the first pieces of feedback I received was clear:

"The product is great, but the upscaling step takes too long."

They were right.

The original flow

Originally, every poster was upscaled to print resolution as part of the checkout flow, before payment. That step added 40–50 seconds between a customer deciding they wanted a poster and actually completing payment.

The pipeline looked like this:

Type → Pick → Frame → Upscale → Pay → Done

On the web, that kind of delay is costly. Every extra second between intent and action is another chance for a customer to second-guess, get distracted, or close the tab.

The naive instinct is to optimise the slow step. Faster model, parallel jobs, a nicer progress bar. I started down that path and stopped - none of it changes the fact that the customer is staring at a spinner before they've paid.

The redesign

The fix wasn't to make upscaling faster. It was to move it out of the customer's path entirely.

Upscaling now happens after payment, asynchronously in the background, as the first step of fulfilment:

Type → Pick → Frame → Pay → Done

Here's how the pieces fit together.

1. The Stripe webhook is the trigger

When the checkout.session.completed event fires, the webhook creates the order rows and immediately kicks off the upscale job in the background. The customer is already on the success page - they don't know or care that this is happening.

The important detail: the upscale call isn't awaited inside the webhook. Stripe needs a fast 200 back, otherwise it will retry the webhook and you risk duplicate work. Next.js's after() helper turned out to be the cleanest way to run the work after the response has been sent.
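Here's a minimal sketch of that "acknowledge fast, defer the slow work" pattern. It isn't GenPrinting's actual code. The event is trimmed to the fields used, orders live in an in-memory Map, and `runAfterResponse` is a hypothetical stand-in for Next.js's `after()`; the order fields and `startUpscale` are illustrative names too. Because Stripe can deliver the same event more than once, order creation is made idempotent on the checkout session id.

```typescript
// Sketch only: hypothetical names and an in-memory store, not a real Stripe/Next.js integration.

type CheckoutSessionCompleted = {
  id: string; // Stripe event id
  type: "checkout.session.completed";
  data: { object: { id: string; metadata: { posterId: string } } };
};

type Order = {
  id: string;
  posterId: string;
  status: "paid" | "upscaling" | "submitted" | "upscale_failed" | "needs_review";
  upscaleAttempts: number;
  lastAttemptStartedAt: number | null; // epoch ms
};

type Deps = {
  orders: Map<string, Order>; // keyed by checkout session id
  runAfterResponse: (task: () => void) => void; // stand-in for Next.js after()
  startUpscale: (orderId: string) => void; // fire-and-forget call to the upscale endpoint
};

function handleCheckoutCompleted(event: CheckoutSessionCompleted, deps: Deps): { status: number } {
  const session = event.data.object;

  // A retried delivery of the same event must not create a second order or a second job.
  if (deps.orders.has(session.id)) {
    return { status: 200 };
  }

  deps.orders.set(session.id, {
    id: session.id,
    posterId: session.metadata.posterId,
    status: "paid",
    upscaleAttempts: 0,
    lastAttemptStartedAt: null,
  });

  // Not awaited: the slow upscale is scheduled to start only after the 200 has gone back to Stripe.
  deps.runAfterResponse(() => deps.startUpscale(session.id));
  return { status: 200 };
}
```

In a real route handler you would first verify the signature with `stripe.webhooks.constructEvent(rawBody, signature, secret)` and, on recent Next.js versions, pass `after` from `next/server` as the scheduler. Injecting it here just keeps the sketch testable.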

2. The upscale endpoint owns the slow step

A separate internal endpoint is the only place that talks to the upscaling model. It writes the upscaled image back to storage, stamps the order with the print-ready asset, and hands off to Gelato for fulfilment. The order row carries enough state - current status, attempt count, when the last attempt started - for any other part of the system to pick up where things left off.
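A sketch of that endpoint's core logic, assuming the three external steps are injected as hypothetical functions (`upscale` for the model, `saveAsset` for storage, `submitToGelato` for the fulfilment call). The key idea is that the attempt is recorded before the slow work starts, so anything else reading the order can tell a job in progress from one that stalled.

```typescript
// Sketch only: all names are hypothetical; a real endpoint would persist each state change to the database.

type UpscaleStatus = "paid" | "upscaling" | "submitted" | "upscale_failed";

type Order = {
  id: string;
  sourceImageUrl: string;
  status: UpscaleStatus;
  upscaleAttempts: number;
  lastAttemptStartedAt: number | null; // epoch ms
  printAssetUrl: string | null;
};

type UpscaleDeps = {
  upscale: (imageUrl: string) => Promise<Uint8Array>; // the slow model call
  saveAsset: (orderId: string, bytes: Uint8Array) => Promise<string>; // returns the stored asset URL
  submitToGelato: (orderId: string, assetUrl: string) => Promise<void>;
  now: () => number;
};

async function runUpscaleJob(order: Order, deps: UpscaleDeps): Promise<Order> {
  // Stamp the attempt first so the sweeper can count retries and judge staleness.
  order.status = "upscaling";
  order.upscaleAttempts += 1;
  order.lastAttemptStartedAt = deps.now();

  try {
    const bytes = await deps.upscale(order.sourceImageUrl);
    order.printAssetUrl = await deps.saveAsset(order.id, bytes);
    await deps.submitToGelato(order.id, order.printAssetUrl);
    order.status = "submitted";
  } catch {
    // Nobody is watching, so leave state behind for the scheduled sweep instead of throwing.
    order.status = "upscale_failed";
  }
  return order;
}
```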

3. Failure handling, when nothing is in front of the user

Moving work to the background means the user is no longer there to see it fail. That changes what "robust" has to mean.

In the synchronous version, a failed upscale was easy: show an error, let them retry. In the async version, the customer has already paid and walked away. The system has to recover on its own.

So I added a scheduled cron job that runs every 10 minutes and looks for orders whose upscale never finished. A few details I'd have skipped in a v0 but ended up needing:

An attempt counter so a broken job doesn't retry forever. After 3 tries the order is flagged for manual review and I get an ops alert email.
A staleness threshold so I only retry jobs that have actually been quiet for 10 minutes - not ones that are mid-upscale right now.
A cap on how many orders a single run will sweep, so a bad day doesn't turn into a thundering herd against the upscaling provider.
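The three rules combine into a small selection step. Below is a sketch of that step as a pure function, assuming the limits from above (3 attempts, 10 minutes of quiet). The batch cap value of 20, the field names, and the status values are hypothetical. The cron handler would call it, re-invoke the upscale endpoint for `retry`, and mark `flagForReview` orders and send the ops alert.

```typescript
// Sketch only: selection logic for the scheduled sweep; names and the batch cap are hypothetical.

type SweepOrder = {
  id: string;
  status: "paid" | "upscaling" | "submitted" | "upscale_failed" | "needs_review";
  upscaleAttempts: number;
  lastAttemptStartedAt: number | null; // epoch ms; null if no attempt ever started
  paidAt: number; // epoch ms
};

const MAX_ATTEMPTS = 3;
const STALE_AFTER_MS = 10 * 60 * 1000;
const MAX_ORDERS_PER_SWEEP = 20;

type SweepPlan = { retry: string[]; flagForReview: string[] };

function planSweep(orders: SweepOrder[], now: number): SweepPlan {
  const plan: SweepPlan = { retry: [], flagForReview: [] };

  const candidates = orders
    // Unfinished: not yet handed to Gelato and not already escalated.
    .filter((o) => o.status === "paid" || o.status === "upscaling" || o.status === "upscale_failed")
    // Quiet for long enough; a recent attempt may still be mid-upscale.
    .filter((o) => now - (o.lastAttemptStartedAt ?? o.paidAt) >= STALE_AFTER_MS)
    // Oldest orders first.
    .sort((a, b) => a.paidAt - b.paidAt)
    // Cap the batch so a backlog doesn't hit the upscaling provider all at once.
    .slice(0, MAX_ORDERS_PER_SWEEP);

  for (const o of candidates) {
    if (o.upscaleAttempts >= MAX_ATTEMPTS) {
      plan.flagForReview.push(o.id);
    } else {
      plan.retry.push(o.id);
    }
  }
  return plan;
}
```

Keeping the decision pure makes it easy to test without a database or a clock, and the cron route stays a thin wrapper around it.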

4. What the customer sees

Nothing. That's the point.

They get the order confirmation email immediately, a shipping update from Gelato a few hours later, and the poster on their doorstep. The 40–50 seconds of upscaling, the retries, the sweeps - none of it is exposed.

The broader product lesson

The lesson is a product one, not an engineering one:

Anything users have to wait for should be hidden, not optimised.

It's tempting to spend engineering effort shaving seconds off a slow step. Sometimes that's the right call. But often, the better move is to ask whether the user needs to be there at all while it happens.

If the work can be moved out of the critical path, move it. The fastest step is the one your customer never sees.