Introduction
The evolution of image generation from a mere “image tool” to a “visual thinking partner” reveals a layered design logic that all AI product managers can learn from.

Recently, the product community has been discussing a phenomenon: why some AI image generation products go unused even when free, while others charge $200/month and remain in high demand?
On April 21, OpenAI’s release of ChatGPT Images 2.0 provided an interesting answer. Instead of competing on “image quality,” it innovated structurally at the product level—injecting reasoning capabilities into image generation, allowing the AI to “think” before generating images, using this “thinking” ability as a core lever for paid conversion.
This article dissects its key decisions around user segmentation, pricing design, and workflow integration, in the hope of inspiring others building AI products.
The Pricing Dilemma in AI Products
AI product managers often face a common dilemma—users perceive AI capabilities as “good enough.”
For instance, in image generation, telling users that “our model improved by 15% on the FID metric” usually elicits a response of “oh, it does look a bit clearer.” A technology upgrade that took three months to optimize may only be seen as a “slight improvement” by users. Moreover, as competitors also enhance their offerings, users’ perception of quality differences becomes further dulled.
This leads to a pricing dilemma: if the core selling point of a product is “better quality,” users find it hard to pay for “a bit better.”
ChatGPT Images 2.0’s approach is noteworthy. It did not focus on “looking better” but created a new capability dimension—“thinking image generation.” This difference is not a matter of degree (a bit better vs. much better) but of category (can do vs. cannot do).
Specifically, Images 2.0 offers two modes:
- Instant Mode: Open to all users, focusing on “better basic image generation”—more accurate text rendering, better instruction adherence, and support for more languages. This is an upgrade of “doing better.”
- Thinking Mode: Available only to paid users, emphasizing “thinking before generating”—the AI first searches for reference information, plans composition logic, generates multiple stylistically consistent images, and finally checks spelling and positioning. This is an upgrade of “doing new things.”
The product design insight here is that the paid conversion of AI products is more about creating new capabilities as “category differences” rather than optimizing existing capabilities as “degree differences.” Users are unlikely to pay for “a bit better” but will pay for “what I couldn’t do before, now I can.”
Key Layered Design: Pricing Based on “Substituted Labor Costs”
Further dissecting the layered logic of Images 2.0 reveals a deeper design principle.
What does Instant Mode replace? It substitutes the behavior of searching for images on search engines or downloading them from free stock-image sites. That behavior has a time cost of about 5-10 minutes, so its replacement value is low, and offering it for free is reasonable: it cultivates the habit of opening ChatGPT whenever users need an image.
What does Thinking Mode replace? It replaces the behavior of users spending 30 minutes on Canva to create an infographic or waiting two hours for a designer to deliver a draft. This behavior has a time cost ranging from 30 minutes to several hours, making its replacement value much higher, thus justifying it as a paid feature that users are willing to pay for.
In other words, OpenAI’s pricing anchor is not based on “Thinking Mode consuming more computing power, hence more expensive,” but rather on “Thinking Mode saving you more labor costs, hence more valuable.”
An important insight here is that AI product pricing should not anchor on the cost side (how much computing power I consumed) but on the value side (how much time and labor costs I saved for users).
I have organized this thought into a simple layered decision framework for reference:
- Identify users’ current alternatives. How would users complete this task without your product? What tools would they use? How much time would it take?
- Categorize features based on the cost of alternatives. Capabilities with low replacement costs (searching for images → free image generation) should be placed in the free layer for user acquisition; capabilities with high replacement costs (hiring a designer → AI auto-layout) should be placed in the paid layer for conversion.
- Ensure that the capabilities in the paid layer represent “category differences” rather than “degree differences.” Users are insensitive to “20% faster” but very sensitive to “what I couldn’t do before, now I can.”
- Use data from the free layer to validate assumptions about the paid layer’s demand. If free users frequently attempt a certain type of complex task but do not achieve satisfactory results, it indicates that this type of task can serve as a selling point for the paid layer.
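The layered decision framework above can be sketched as a small classifier. The threshold, feature names, and time costs below are illustrative assumptions, not data from the article:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    alternative: str           # how users complete the task today
    alt_minutes: float         # time cost of that alternative
    category_difference: bool  # "can do vs cannot do", not "a bit better"

# Hypothetical cutoff: alternatives cheaper than this belong in the free layer.
FREE_TIER_MAX_MINUTES = 15

def tier(f: Feature) -> str:
    """Place a feature in the free or paid layer by substituted labor cost."""
    if f.alt_minutes <= FREE_TIER_MAX_MINUTES:
        return "free"  # low replacement value: use it for acquisition
    # A paid feature should also be a category difference, not a degree difference.
    return "paid" if f.category_difference else "free"

features = [
    Feature("basic image generation", "search a stock-photo site", 8, False),
    Feature("thinking-mode infographic", "30+ min in a design tool", 45, True),
]
for f in features:
    print(f.name, "->", tier(f))
```

The point of encoding it this way is that the pricing decision becomes an explicit function of the user's alternative, not of your own compute cost.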
New Interaction Design Challenge: What Users Think When AI Needs to “Think”
The Thinking Mode introduces a new interaction challenge: the generation time has increased.
Previously, AI image generation was a “second output” experience—input a prompt, wait 3-5 seconds, and the image appears. However, Thinking Mode requires executing a multi-step process of searching, planning, generating, and verifying, which may take several minutes for complex tasks.
A few minutes may not seem long, but in users’ psychological perception, it falls into a dangerous zone.
We all know the “3-second rule” in product design—if a webpage takes more than 3 seconds to load, the user dropout rate skyrockets. However, this rule applies to scenarios of “waiting for an unknown result.” If users can see progress and understand “what is happening,” their patience during the wait will significantly increase.
This is a core interaction proposition for agent-type AI products—Thinking ≠ Waiting; you need to make users perceive that “AI is thinking” rather than “AI is stuck.”
How to achieve this? I have summarized three effective strategies from several well-executed products:
- Show the thinking process. Similar to what ChatGPT’s reasoning model is already doing—display the AI’s thought chain, allowing users to see “searching for reference materials,” “planning layout,” and “checking text.” Users see a transparent workflow instead of a spinning loading animation.
- Provide incremental outputs. Don’t make users wait until the final result to see anything. Show a draft composition (within seconds), then gradually fill in details (in tens of seconds), and finally deliver the complete product (in minutes). Users can see progress at each stage, significantly reducing anxiety.
- Allow user intervention. Permit users to intervene during the thinking process—for example, if the AI plans a three-column layout, users can say “I want two columns” at this stage rather than waiting for the final product to come out and then starting over. This not only reduces waiting anxiety but also effectively enhances the quality of generation.
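The three strategies above share one mechanical idea: stream status updates instead of blocking until the final result. A minimal sketch, in which the stage names and durations are illustrative assumptions rather than OpenAI's actual pipeline:

```python
import time
from typing import Iterator

# Hypothetical stages for a "thinking" image-generation pipeline.
STAGES = [
    ("searching reference material", 0.05),
    ("planning layout", 0.05),
    ("generating draft", 0.05),
    ("checking text and positions", 0.05),
]

def run_with_progress() -> Iterator[str]:
    """Yield one status line per stage so the UI shows 'thinking', not 'stuck'."""
    total = len(STAGES)
    for i, (label, seconds) in enumerate(STAGES, start=1):
        yield f"[{i}/{total}] {label}..."
        time.sleep(seconds)  # stand-in for the real work of that stage
    yield "done"

for status in run_with_progress():
    print(status)
```

Because the function is a generator, the UI layer can render each status as it arrives, and an "intervene" action can simply stop consuming the iterator and restart it with revised instructions.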
Another detail worth noting is that some users in testing found that the iterative editing in Thinking Mode would yield diminishing returns after 1-2 rounds—more edits led to worse results, ultimately forcing users to start a new session from scratch. A workaround is to allow users to drag the current image into a new dialogue to restart.
This suggests a problem of “context pollution” in the reasoning chain. For product managers, a feasible product strategy is to add a button in the editing interface that allows users to “restart based on the current image,” transforming technical limitations into a natural interaction process, thereby reducing user frustration.
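The "restart based on the current image" button can be sketched as a session object that carries the image forward while discarding the polluted edit history. All names here are hypothetical illustrations of the product strategy, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class EditSession:
    image: str                                    # reference to the current image
    history: list = field(default_factory=list)   # accumulated edit instructions

    def edit(self, instruction: str) -> None:
        """Record an edit; in a real product this would also call the model."""
        self.history.append(instruction)

    def restart_from_current(self) -> "EditSession":
        """Keep the image, drop the context that degrades further edits."""
        return EditSession(image=self.image)
```

The key design choice is that the reset is a first-class action in the UI, so the user never has to discover the drag-into-new-chat workaround on their own.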
Ecological Binding Strategy: Image Generation as a Layer of Stickiness
ChatGPT Images 2.0 also includes a strategic move that is easy to overlook: it has been embedded directly in Codex (OpenAI's coding tool), letting users generate images in the coding environment with their existing ChatGPT subscription, no separate API key required.
This is not about creating an “image generation product.” Instead, it uses image generation as a “stickiness layer” to enhance user retention across the entire Codex ecosystem.
Over the past year, we have seen OpenAI continuously add capabilities to Codex: coding → computer control → image generation → memory → browsing. With each added layer, the cost of user migration increases slightly. When users complete coding, image generation, document writing, and prototype design all within the same tool, the cost of switching to competitors becomes very high.
At the same time, OpenAI announced the complete discontinuation of DALL-E 2 and DALL-E 3 on May 12. This is not only a technology upgrade; it also forces existing developers to migrate to the new gpt-image-2 API. The new API shifts from per-image billing to per-token billing, meaning that once developers migrate, they must rebuild their cost models, which further raises switching costs.
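Why per-token billing forces a cost-model rebuild can be shown with a toy comparison. All rates and token counts below are made-up placeholders, not actual gpt-image-2 pricing:

```python
# Hypothetical rates for illustration only.
PER_IMAGE_RATE = 0.04      # USD per image under old per-image billing
PER_TOKEN_RATE = 0.00001   # USD per output token under new per-token billing
TOKENS_PER_IMAGE = {       # token usage now varies with quality settings
    "low": 1_000,
    "high": 6_000,
}

def cost_per_image_billing(n_images: int) -> float:
    """Old model: cost is a flat function of image count."""
    return n_images * PER_IMAGE_RATE

def cost_per_token_billing(n_images: int, quality: str) -> float:
    """New model: cost depends on quality via token consumption."""
    return n_images * TOKENS_PER_IMAGE[quality] * PER_TOKEN_RATE
```

Under the old model a developer only had to forecast image volume; under the new one, cost also depends on quality and prompt-driven token usage, which is exactly the restructuring that raises switching costs.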
The product strategy insight here is that the competitiveness of AI products lies not in how strong a single function is, but in how deeply multiple functions combine to build workflows.
Users will not stay because your image generation is 10% better than others, but they will stay because your image generation + coding + browsing + memory form a complete workflow that they cannot leave. This logic is similar to the strategy of WeChat mini-programs—individual mini-programs may not be strong enough, but when your payment, social, content, and services are all within the WeChat ecosystem, it becomes hard to leave.
For product managers working on AI products, I have a specific suggestion: do not plan each AI capability as an independent function; instead, think about how these capabilities can form a “workflow loop.” The more complete the loop, the higher the user migration cost and the deeper the product barrier.
Competitive Insights: From “Whose Images Look Better” to “Who is More Deeply Integrated into Workflows”
Finally, I want to discuss the changing competitive landscape, as the logic behind this applies to all AI products.
Currently, the AI image generation sector has formed three distinct competitive routes:
- OpenAI has chosen “reasoning capabilities + ecological binding.” Its core differentiation is not image quality (though it is good), but the completeness of the workflow brought by the Thinking Mode and the deep integration into the Codex ecosystem.
- Google (Gemini / Nano Banana) has opted for "price advantage + ecological binding." At the same resolution, its cost is about one-third of OpenAI's, and it integrates deeply with Google Workspace. It generated 1 billion images in 53 days, relying primarily on low prices and Google's vast user base.
- The open-source camp (Stable Diffusion, Flux, etc.) has chosen “freedom + zero cost.” The quality of single images continues to catch up with closed-source models, but in terms of multi-image consistency, reasoning validation, and workflow integration, they struggle to compete in the short term.
These three routes reflect a general pattern of AI product competition evolution—the first stage competes on quality (whose model is better), the second stage competes on price (who is cheaper), and the third stage competes on ecosystem (who is more deeply integrated into workflows).
We have already fully traversed these three stages in the LLM text field. Now, image generation has also reached the third stage.
For product managers, it is crucial to recognize which stage your product is in. If you are still in the first stage, focusing on model quality is correct; if you have already entered the third stage, piling on quality yields diminishing returns, and you should focus your efforts on workflow integration and ecosystem building.
Conclusion
Returning to the initial question: why do some AI products go unused even when free, while others charge $200/month and remain in high demand?
ChatGPT Images 2.0 provides the answer: users pay for “new capabilities” rather than “better performance”; they pay for “saved labor costs” rather than “consumed computing power”; and they are locked in by a “complete workflow” rather than by the “quality of a single function.”
These three principles apply to nearly all AI product designs.
If you are working on the paid design of an AI product, consider asking yourself three questions:
- Is the difference between my paid and free features a “degree difference” or a “category difference”?
- Is my pricing anchor on the cost side (computing power consumption) or the value side (substituted labor costs)?
- Do the various AI capabilities in my product form a workflow loop?
Clarifying these three questions will also clarify the path to paid conversion.