What building an AI-powered document builder taught me about LLM UX
Latency, error handling, and when users actually trust AI output.
useResume.ai started as a side project - an AI-powered API platform for building documents. It grew to 14,000+ users and became profitable, and along the way it taught me that building LLM-powered products is different from building traditional software.
Latency wasn't the problem I expected
When I started building this, I was worried about how users would handle waiting for the AI to respond. LLM calls can take 30-45 seconds, sometimes longer. I assumed this would frustrate people and make them leave.
I was wrong. Users didn't mind waiting at all, as long as something was happening on screen: an engaging loading animation, visible progress, some movement - anything that communicated "working on it, just wait". People are surprisingly patient when they know the system is doing something for them.
What users actually cared about was the quality of the output. A fast but mediocre suggestion was worse than a slower but good one. They were willing to wait even a minute if what came back was genuinely useful. This changed how I thought about the product - instead of optimizing for speed, I focused on making the output as good as possible and just made sure the waiting experience felt active and responsive.
It also opened up the possibility of chaining LLM calls to build better responses - something like the sketch below.
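Here is a rough sketch of what that chaining looks like. The OpenAI Python SDK, the "gpt-4o" model name, and the prompts are illustrative assumptions, not the production setup - the point is just that a second pass can trade extra latency for better output.

```python
from openai import OpenAI

client = OpenAI()

def call_llm(system: str, user: str) -> str:
    """One chat completion; returns the text of the first choice."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def generate_bullet(raw_experience: str) -> str:
    # Pass 1: draft a resume bullet from the user's own description.
    draft = call_llm(
        "You write concise, accomplishment-focused resume bullet points.",
        f"Turn this into one resume bullet:\n{raw_experience}",
    )
    # Pass 2: feed the draft back in and ask for a tighter, more specific
    # rewrite - slower overall, but the final output is what users judge.
    return call_llm(
        "You are an editor. Rewrite the bullet to be specific and quantified, "
        "without inventing facts that are not in the original description.",
        f"Original description:\n{raw_experience}\n\nDraft bullet:\n{draft}",
    )
```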
Errors are part of the experience
LLMs fail in ways traditional software doesn't: they sometimes make things up, or they produce output that is technically fine but useless. How those failures are handled determines whether users trust the product.
In our case, the model would occasionally invent job titles or skills the user never mentioned, so we added validation that cross-referenced the output against what the user actually entered. Sometimes there were tone mismatches - e.g. a finance professional getting suggestions written for a creative role - which meant fine-tuning prompts and adding industry-specific guidance. And generic output like "managed projects" instead of specific accomplishments required feedback loops that detected vague language and asked the user for more detail. A simplified version of those checks is sketched below.
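A heavily simplified version of the two string-level checks - flagging terms the user never mentioned, and flagging vague filler language - might look like this. The phrase list and the idea of passing in candidate terms are illustrative assumptions, not the production implementation.

```python
# Illustrative phrases only; a real list would be longer and tuned over time.
VAGUE_PHRASES = ["managed projects", "responsible for", "worked on various", "helped with"]

def find_unsupported_terms(ai_output: str, user_input: str, candidate_terms: list[str]) -> list[str]:
    """Return terms (skills, job titles) that appear in the AI output
    but nowhere in what the user actually entered."""
    output_lower = ai_output.lower()
    input_lower = user_input.lower()
    return [
        term for term in candidate_terms
        if term.lower() in output_lower and term.lower() not in input_lower
    ]

def find_vague_language(ai_output: str) -> list[str]:
    """Return generic phrases that should trigger a follow-up question
    asking the user for a concrete accomplishment instead."""
    output_lower = ai_output.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in output_lower]

# Example: "Kubernetes" was never mentioned by the user, so it gets flagged.
flagged = find_unsupported_terms(
    ai_output="Led Kubernetes migrations and managed projects.",
    user_input="I maintained our AWS deployment scripts.",
    candidate_terms=["Kubernetes", "AWS"],
)
```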
How do you know when something is wrong?
This was the tricky part. With traditional software, you can write unit tests. But LLM outputs are non-deterministic: the same prompt can produce different results each time. So how do you know whether your system prompt is actually working?
The answer was human evals - let users tell you when something goes wrong. We added simple thumbs up/down buttons after every AI output. When someone clicked thumbs down, we logged the input, the output, and any feedback they provided (roughly as sketched below). Then we went back to the system prompt and adjusted, and kept adjusting until we were only getting thumbs up.
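As a sketch, the thumbs-down path can be as small as one endpoint that stores everything needed to reproduce the bad output later. FastAPI and SQLite here are illustrative choices, not a description of the actual stack.

```python
import sqlite3
from datetime import datetime, timezone

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
db = sqlite3.connect("feedback.db", check_same_thread=False)
db.execute(
    "CREATE TABLE IF NOT EXISTS feedback "
    "(ts TEXT, rating TEXT, user_input TEXT, ai_output TEXT, comment TEXT)"
)

class Feedback(BaseModel):
    rating: str        # "up" or "down"
    user_input: str    # what the user gave the model
    ai_output: str     # what the model returned
    comment: str = ""  # optional free-text explanation

@app.post("/feedback")
def log_feedback(fb: Feedback):
    # Store everything needed to replay the bad case when tuning the prompt.
    db.execute(
        "INSERT INTO feedback VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), fb.rating, fb.user_input, fb.ai_output, fb.comment),
    )
    db.commit()
    return {"ok": True}
```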
But you also need guardrails before pushing changes to production. We added LLM-as-a-judge tests for all our important endpoints - basically another LLM that evaluates whether the endpoint output consistently meets our criteria. Is it professional? Does it match the user's industry? Is it specific enough? These tests run every time we change the prompts and push to production. They don't catch everything, but they catch the obvious drops in quality before users do. A minimal version looks something like the test below.
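This is a hedged sketch of such a test, again assuming the OpenAI SDK and "gpt-4o" purely for illustration; the judge criteria are paraphrased from the ones above, and endpoint_under_test is a hypothetical stand-in for the real production prompt.

```python
import json

from openai import OpenAI

client = OpenAI()

JUDGE_SYSTEM = (
    "You grade resume suggestions. Reply with JSON only: "
    '{"professional": true|false, "matches_industry": true|false, "specific": true|false}'
)

def endpoint_under_test(profile: str) -> str:
    # Hypothetical stand-in for the production endpoint whose prompt is being changed.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You write resume suggestions tailored to the user's industry."},
            {"role": "user", "content": profile},
        ],
    )
    return resp.choices[0].message.content

def judge(profile: str, suggestion: str) -> dict:
    # A second model grades the first model's output against fixed criteria.
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_SYSTEM},
            {"role": "user", "content": f"Profile: {profile}\n\nSuggestion: {suggestion}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

def test_finance_profile():
    # One representative case; in practice a small suite of profiles runs on every prompt change.
    profile = "Senior accountant, 8 years in corporate finance"
    verdict = judge(profile, endpoint_under_test(profile))
    assert all(verdict.values()), f"Judge flagged the output: {verdict}"
```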
Pricing fit the use case
We tried different pricing models - a free tier with limits, subscriptions, a one-time purchase. What I realized was that the nature of the business called for a non-subscription model: people don't need a resume builder every month. They need it intensely for a few days when they're job hunting, then not at all for months or years.
So we landed on pay once, get several days of access. This matched how people actually used the product: they could build and refine their resume during their job-search window without worrying about forgetting to cancel a subscription later. Users appreciated this - and they came back when they needed to update their resume for the next job search.