6 Prompt Patterns That Consistently Produce Realistic AI Product Photos

Quick Answer

Realistic AI product photography is consistent when the prompt follows a pattern, not a paragraph. The six prompt patterns for AI product photography that consistently produce realistic results are: Subject + Surface + Light + Lens + Finish, Pose + Framing + Angle, Reference-Image Anchoring, Color Lock, Constraint and Exclusion (which behaves differently in diffusion models versus instruction-tuned models like Imagen and Nano Banana), and Save and Apply. Each pattern controls a specific axis of the image, breaks in a specific way, and stops scaling at the same point: when a brand needs the same prompt to run across hundreds of SKUs without drift. That last gap is what reusable saved configurations like Nightjar's Photography Styles, Compositions, and Recipes are built to close.

Writing a Prompt vs Applying a Pattern

Most articles about prompt engineering for product photography give the reader a fish: thirty copy-paste prompts with no explanation of why any of them work. That is fine for a first image. It is not a production system. A prompt is a one-time creative act. A prompt pattern is a structure: a sequence of slots filled in for any product, any catalog, any campaign. Writing a prompt is improvisation. Applying a pattern is engineering.

The six prompt patterns for AI product photography below get a brand around 80 percent of the way to consistent, on-brand output. The remaining 20 percent (catalog scale, team reuse, drift across hundreds of SKUs) is where prompting alone breaks down, and the closing section addresses that honestly. For broader context, see the guide to consistent AI product photography.

One 2026 nuance most articles miss: these patterns are not architecture-agnostic. Diffusion models (Stable Diffusion, SDXL, Flux variants) and instruction-tuned multimodal models (Google Imagen, Gemini Nano Banana, GPT-image, DALL-E 3) honor the same patterns differently, especially around negation. Where the difference matters, the pattern below flags it.

A prompt produces one image. A pattern produces a hundred.

How to Read This List

Each pattern follows the same six-part structure, in the same order: slot template, example prompt, mechanism, what it controls and what it does not, common failure mode, and how it becomes reusable in a Recipe-driven workflow.

The patterns are independent. A reader can use Pattern 1 without Pattern 3, or Pattern 4 without Pattern 5. They compose; they do not depend on each other. Where a pattern behaves differently between diffusion and instruction-tuned models, the mechanism block flags it.

The six patterns:

  1. Subject + Surface + Light + Lens + Finish
  2. Pose + Framing + Angle (Composition)
  3. Reference-Image Anchoring
  4. Color Lock
  5. Constraint and Exclusion (the 2026 negative-prompt split)
  6. Save and Apply

Pattern 1: Subject + Surface + Light + Lens + Finish

This is the foundation pattern. It is the one that fixes the "my AI product images look fake or plastic" complaint.

Slot template

[Subject with material and color] on [surface or background], [light source and direction], [camera and lens], [finish and quality terms]

Example prompt

A matte ceramic coffee mug in warm cream color,
sitting on a smooth concrete surface,
lit by a single large softbox at 45 degrees from camera left,
shot on a medium-format camera with a 100mm macro lens at f/8,
sharp commercial product photography, soft directional shadow,
photorealistic, fine surface detail.
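
For teams that assemble prompts programmatically, a minimal sketch of the five-slot template as code. The function and slot names are illustrative, not a Nightjar or model API; the point is that each slot stays a single, named phrase.

# Illustrative sketch: the five slots of Pattern 1 as explicit, named fields.
# Keeping each slot to one well-chosen phrase is the point; the function only
# joins them in the fixed order the pattern prescribes.

def build_packshot_prompt(subject: str, surface: str, light: str,
                          lens: str, finish: str) -> str:
    """Assemble a Pattern 1 prompt from its five slots."""
    return ", ".join([subject, surface, light, lens, finish])

prompt = build_packshot_prompt(
    subject="a matte ceramic coffee mug in warm cream color",
    surface="sitting on a smooth concrete surface",
    light="lit by a single large softbox at 45 degrees from camera left",
    lens="shot on a medium-format camera with a 100mm macro lens at f/8",
    finish="sharp commercial product photography, soft directional shadow, photorealistic, fine surface detail",
)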

Mechanism

Photographers think in terms of aperture, shutter speed, and focal plane: continuous physical parameters, not vague adjectives (Bokeh Diffusion, arXiv:2503.08434). When a prompt borrows that vocabulary, image models trained on millions of EXIF-tagged photographs pull the output toward the optical statistics those numbers correspond to. "85mm at f/2.8" steers toward images tagged with that lens behavior, which look like photographs because they are (Google Developers Blog).

The lens line is the highest-leverage slot for perceived realism. The brain reads "real photograph" through optical signature: depth of field, distortion, bokeh shape. A 50mm lens matches the perspective of the human eye and is the ecommerce default (Practical Ecommerce). 85mm gives more working distance for medium products like footwear and small electronics (Tom Crowl). 100mm macro is the standard for jewelry, beauty, and watches (Master Product Photography). Anything under 50mm produces visible barrel distortion (Packshot Creator).

One light source described well beats three described loosely. Three-point lighting is the canonical studio setup (MasterClass, StudioBinder), but when a prompt names three lights the model averages their behavior in attention space and produces muddy results. Google's product photography template names a single lighting setup with a single purpose, not a multi-source list.

What it controls, what it does not

Controls: material reading, lighting direction, depth of field, perceived sharpness, camera realism. Does not control: model identity, repeatable framing across SKUs, consistent surface across a catalog, exact hex color.

Common failure mode

Listing twelve adjectives and three lighting types in one prompt. The model averages, and the lighting reads as muddy. The fix is one well-described light source plus one focal length plus one finish, not a thesaurus of all three.

How this becomes reusable in Nightjar

Nightjar has a feature called Photography Styles: reusable saved containers for camera feel, lighting, mood, color scheme, texture, and atmosphere. The first time a user picks one, it is a curated template; once a Photography Style works for the brand, it can be saved and applied across every Generation. The 150+ curated Photography Styles that ship with Nightjar are the catalog of pre-built Subject + Surface + Light + Lens + Finish patterns. A custom Photography Style can also be extracted from one to five reference Assets, which is the editor-side version of "save this lens-and-light combination for next time." For a deeper walkthrough, see the Photography Style explainer.

Pattern 2: Pose + Framing + Angle (Composition)

This is the geometry pattern. It is the one that fixes the "the camera angle keeps changing across my catalog" complaint.

Slot template

[Shot type] [angle] of [subject], [pose or arrangement], [crop and product scale], [composition note]

Example prompt

Three-quarter angle hero shot of the bag,
rotated 30 degrees from front,
slight 10-degree downward tilt,
product centered horizontally, occupying 70 percent of frame height,
generous negative space above for ad copy,
4:5 aspect.

Mechanism

Camera-position and shot-type vocabulary (low angle, three-quarter, top-down, medium-full, knolling, rule of thirds, negative space on the left) maps to compositional priors the model learned from labelled image data. These tokens act on the geometry of the output, not on its surface.

This is independent of light and material vocabulary. A model trained on millions of product photos has learned that "three-quarter angle, 30-degree rotation, 1:1 crop" is a compositional pattern that recurs regardless of lighting (Google Cloud Blog). E-commerce conventions like packshot, hero, lifestyle, flat lay, ghost mannequin, and three-quarter each have a specific compositional signature (Squareshot). Naming the convention is shorter and more reliable than describing it.

Keeping pose and framing in their own slots is what makes a 200-SKU catalog look like one shoot. The pose stays fixed even as products and Photography Styles change.
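
A short illustrative sketch (not a Nightjar API) of why the separation transfers: the composition string is written once and reused, while the subject and style strings change per SKU. Changing the style never touches the geometry.

# Illustrative: one fixed composition string, reused across SKUs while the
# subject and lighting vary. The pose slot and the style slot never mix.

COMPOSITION = ("three-quarter angle hero shot of {subject}, "
               "rotated 30 degrees from front, slight 10-degree downward tilt, "
               "centered, occupying 70 percent of frame height, 4:5 aspect")

def catalog_prompt(subject: str, style: str) -> str:
    # Geometry comes from the fixed composition; look comes from the style string.
    return COMPOSITION.format(subject=subject) + ", " + style

# Hypothetical SKUs: the pose stays fixed, the subject and lighting change.
for subject, style in [
    ("the leather tote bag", "soft window light from camera left, matte finish"),
    ("the canvas backpack", "single softbox overhead, sharp commercial focus"),
]:
    print(catalog_prompt(subject, style))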

What it controls, what it does not

Controls: angle, framing, pose, product placement, crop, scale, negative space. Does not control: lighting, color, material reading, mood, atmosphere.

Common failure mode

Mixing pose vocabulary into the lighting line. "Soft golden hour from a 45-degree angle" is ambiguous: the 45 degrees could be the light direction or the camera angle, and models guess. A guess in the geometry slot is the most visually obvious miss. Keep pose tokens separate from lighting tokens.

How this becomes reusable in Nightjar

Nightjar has a feature called Compositions: a reusable arrangement that controls framing, camera angle, product placement, model pose, crop, and product scale, distinct from the Photography Style. The first time a user picks one it is a template from Nightjar's curated Composition library; once a custom Composition works, it can be saved from a single reference Asset and reused across the catalog. The deliberate separation of Photography Style (camera, lighting, mood) from Composition (pose, framing, angle) is the structural reason patterns transfer cleanly across a catalog: changing the pose does not change the lighting, and changing the lighting does not change the pose. For more on controlling camera angle in AI product photos, see the help-desk guide.

Pattern 3: Reference-Image Anchoring

Most competitor articles bury reference images in a footnote. Reference images are a first-class pattern with their own mechanism and their own failure modes.

Slot template

Use [reference image] as [aspect to anchor: structure / style / identity / texture]. Combine with [other reference] for [other aspect]. Apply to [new subject and scene].

Example prompt

Using the studio lighting and color grade from @image1
(an existing brand campaign shot)
and the actual product from @image2
(the real product upload),
generate a new image of the product on a polished walnut surface,
three-quarter angle, soft window light from camera left, 4:5 aspect.

Mechanism

This is the only pattern that does not depend on text vocabulary mapping. A reference image gives the model a high-dimensional visual anchor that text cannot describe with equivalent fidelity.

There are two underlying mechanisms, depending on the architecture. For diffusion stacks, the dominant mechanism is the IP-Adapter: a decoupled cross-attention layer that separates text and image features so the image can drive concept and style without overriding the prompt. The adapter adds only 22M extra parameters, preserves identity and aesthetic, and can match or beat fine-tuned image-prompt models (Ye et al., arXiv:2308.06721). For instruction-tuned models like Gemini and Imagen, the reference is passed in the same multimodal context as the prompt, and the model uses end-to-end semantic understanding to interpret which part of which image governs which part of the output. This is why Google recommends an explicit relationship sentence: "Using A as the structure and B as the texture..." (Google Cloud Blog).

IP-Adapter and image-to-image are different mechanisms (Stable Diffusion Art). img2img uses the source image's color and structure to guide denoising; IP-Adapter uses the concepts the model detects in the reference. The first preserves layout, the second preserves identity and style. Unlike img2img and ControlNet, IP-Adapters do not require the reference to match the output aspect ratio.

Either way, one operational rule: assign one role per reference, explicitly.
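
For the diffusion path, a minimal diffusers-style sketch of that rule: one reference, one role, passed as the IP-Adapter image. The checkpoint and adapter weight names are common public ones and may differ in a given setup; treat this as a sketch, not a recipe.

# Hedged sketch of the diffusion-side mechanism (IP-Adapter in diffusers).
# Model and adapter weight names are illustrative; swap in your own stack.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference drives the output

# The @image1 role from the example above: lighting and color grade only.
reference = load_image("brand_campaign_shot.png")

image = pipe(
    prompt="the product on a polished walnut surface, three-quarter angle, "
           "soft window light from camera left",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]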

What it controls, what it does not

Controls: style transfer, identity continuity, lighting and color grading carry-over, multi-source composition. Does not control: anything the references do not contain. If the reference has no clear lighting direction, you cannot extract one.

Common failure mode

Sending five reference images with no relationship instruction. The model averages every visual feature across all five inputs and produces a blurred middle. The fix is one explicit role per reference: "Use A for lighting, B for product, C for background." Google's own multi-image template enforces this structure.

How this becomes reusable in Nightjar

Nightjar's Edit tab is a multi-image board that supports up to 8 input Assets, each referenced directly in the prompt as @image1, @image2, and so on. This is the structural-prompt control most prompt-only tools cannot express. The same reference-image mechanism is also how Nightjar's custom Photography Styles, Compositions, and Fashion Models are built: extract the reusable ingredient once from one to five references, save it as an object, apply across the catalog without re-uploading. For practical guidance on blending products into stock photo backgrounds, or on matching AI photos to real photos for brand consistency, see the help-desk articles.

Pattern 4: Color Lock

This is the exact-color pattern. It addresses the "the color is close but not exact" complaint that breaks colorway catalogs.

Slot template

[Subject], color: [exact specification], preserve [structure / shadow / fold detail] from the source, [framing and lighting if generative]

Example prompt

Recolor the bag in @image1 to exact hex #2A6F4A (deep forest green).
Preserve all stitching, hardware, shadow detail, fold geometry,
and material texture from the source.
Studio lighting and framing identical to source.

Mechanism

AI image models drift on color because named colors are ranges. "Forest green" is a range. "Light beige" is a wider range. Models have learned a probability distribution over what those words mean, not an exact value.

Three specifiers in priority order:

  1. Hex code (#2A6F4A). Explicit, model-readable, stable.
  2. Pantone or industry shade (Pantone 18-3838 TPX, RAL 6002). Narrower than a named color.
  3. Reference image of a known correct sample. Highest fidelity if the model supports it.

For diffusion models, color tokens often live in the same attention pool as material tokens, which is why "matte sage green" reads more cleanly than "sage green," and why a hex code passed verbatim binds tighter than a name. For instruction-tuned models, hex codes are read literally; Google recommends quoting exact text or values when precision matters (Google Cloud Blog).

The structure-preservation clause matters as much as the color. Pass a hex value (or Pantone or RAL code) and explicitly tell the model to preserve structure, stitching, shadow, and fold detail from the source asset. Without the preservation clause, even a correct hex code produces a freshly generated product around the color and the original geometry drifts.
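
A small illustrative helper (not a Nightjar command) that keeps the two halves of the pattern together, so a hex value never reaches the model without its preservation clause.

# Illustrative: Pattern 4 as a tiny builder. The hex value and the
# preservation clause travel together, so a correct color never arrives
# without the instruction to keep the source geometry.

def recolor_instruction(target_hex: str, color_name: str,
                        source_ref: str = "@image1") -> str:
    return (
        f"Recolor the product in {source_ref} to exact hex {target_hex} ({color_name}). "
        "Preserve all stitching, hardware, shadow detail, fold geometry, "
        "and material texture from the source. "
        "Studio lighting and framing identical to source."
    )

print(recolor_instruction("#2A6F4A", "deep forest green"))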

What it controls, what it does not

Controls: exact target color, preserved structure when paired with a source asset. Does not control: how lighting interacts with the new color. A green that looks correct in flat light may look different under warm tungsten. The lighting slot still belongs to Pattern 1.

Common failure mode

Specifying a hex code without asking the model to preserve the source. The model regenerates the entire product around the color and the structure drifts. The fix is to anchor the source asset (Pattern 3) and add an explicit preservation clause to the prompt.

How this becomes reusable in Nightjar

Nightjar exposes this directly through the editor's /color command, which binds a hex value structurally rather than burying it in prose, and through the Recolor Edit Shortcut, a fast path in the Edit tab for color-only changes. The source Asset stays anchored. The capability helps preserve lighting, shadows, texture, and product structure while changing color; it does not promise pixel-identical fidelity, because color rendering depends on how light interacts with the new color. For reference-anchored color preservation across a brand, see matching AI photos to real photos for brand consistency.

Pattern 5: Constraint and Exclusion (the 2026 Negative-Prompt Split)

This is the most under-explained pattern in competitor content. It is also the one most articles get wrong by treating "negative prompt" as one thing.

Slot template

For diffusion models with an explicit negative prompt field (Stable Diffusion, SDXL, Flux variants):

Positive: [full positive prompt]
Negative: [comma-separated list of unwanted features]

For instruction-tuned multimodal models (Imagen, Gemini Nano Banana, GPT-image, DALL-E 3):

[Positive description]. Render this as [target framing]. Critical: [explicit positive reframings of what to exclude].

Example prompts

For diffusion (Stable Diffusion, SDXL, Flux):

Positive: studio packshot of a glass perfume bottle on white seamless backdrop,
single softbox above, 85mm lens at f/8, sharp focus, commercial photography
Negative: plastic, blurry, distorted reflection, double bottle, hands,
text, watermark, logo, low quality, oversaturated

For instruction-tuned (Imagen, Nano Banana, GPT-image):

A studio packshot of a glass perfume bottle on a seamless pure white
background (RGB 255). Single softbox lighting from above, 85mm lens at f/8,
sharp commercial focus. The image should contain only the bottle and its
shadow, with no hands, no text on the label, no watermark, no logo overlay,
no reflective studio equipment visible.

Mechanism

Negative prompts are not deprecated. They have bifurcated. The diffusion track still uses an explicit negative prompt field with classifier-free guidance. The instruction-tuned track has eliminated the field but still responds to negation phrasing inside the prompt body, with positive reframing being the most reliable form. Anyone copying a negative prompt list from a Stable Diffusion guide into Nano Banana is loading dead weight. Anyone telling Stable Diffusion "no cars" without using the negative field is wasting tokens on weak guidance.

Stable Diffusion's negative prompt field works through Classifier-Free Guidance: at each denoising step, the model makes two predictions, one conditioned on the positive prompt and one on the negative, then steers the output away from the negative (arXiv:2406.02965). The CFG scale parameter controls strength. This is a real, mechanical control at the model level.
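
In code, that steering step looks roughly like the sketch below. It is simplified (real pipelines batch the two predictions and handle schedulers), and `unet`, `latents`, and the embeddings are placeholders for whatever the pipeline provides; the arithmetic is the part classifier-free guidance defines.

# Simplified sketch of one denoising step with a negative prompt under
# classifier-free guidance. All arguments are placeholders for pipeline objects.

def guided_noise_prediction(unet, latents, t, positive_emb, negative_emb,
                            guidance_scale: float = 7.5):
    noise_pos = unet(latents, t, encoder_hidden_states=positive_emb)  # conditioned on the positive prompt
    noise_neg = unet(latents, t, encoder_hidden_states=negative_emb)  # conditioned on the negative prompt
    # Steer away from the negative prediction, toward the positive one.
    return noise_neg + guidance_scale * (noise_pos - noise_neg)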

Google's Nano Banana Pro API has no such field, and passing one returns a 400 error (Apiyi). Google's own guidance is to use positive framing inside the prompt: "Use positive framing: Describe what you want, not what you don't want (e.g., 'empty street' instead of 'no cars')." Two architectures, two contracts, one practical rule: know which one you are talking to before writing a constraint.

For instruction-tuned models, two phrasings work:

  1. Positive reframing. "Empty street" beats "no cars." This is Google's official recommendation.
  2. Explicit declarative negation. "Do not include text, watermarks, logos, or studio reflections." Max Woolf's empirical Nano Banana testing found this works reliably, especially with ALL CAPS for critical constraints (Max Woolf).

The model-architecture matrix:

Model family | Separate negative prompt field | Inline negation phrasing | Recommended pattern
Stable Diffusion / SDXL / Flux | Yes (CFG-based) | Works as positive prompt only | Comma-separated negative keyword list in the negative field
Imagen / Nano Banana / Gemini image | No (passing negativePrompt returns 400) | Works as natural-language instruction | Embed exclusions as positive descriptions ("empty street," "no text on label")
GPT-image / DALL-E 3 | No | Works as natural-language instruction | Same as above; use clear declarative sentences
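
A hedged sketch of the practical rule in the matrix: route the same exclusion list differently depending on the architecture. The function and payload shapes are illustrative, not any vendor's API contract.

# Illustrative router for Pattern 5. The payload shapes are made up for the
# sketch; the point is the architectural split, not a real API.

def build_request(prompt: str, exclusions: list[str], architecture: str) -> dict:
    if architecture == "diffusion":
        # Diffusion stacks: keep the comma-separated list in a dedicated field.
        return {"prompt": prompt, "negative_prompt": ", ".join(exclusions)}
    # Instruction-tuned models: no negative field; fold exclusions into the
    # prompt body as declarative constraints.
    constraints = ", ".join(f"no {item}" for item in exclusions)
    return {"prompt": f"{prompt}. Critical: {constraints}."}

exclusions = ["hands", "text on the label", "watermark", "logo overlay"]
build_request("studio packshot of a glass perfume bottle", exclusions, "diffusion")
build_request("studio packshot of a glass perfume bottle", exclusions, "instruction-tuned")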

What it controls, what it does not

Controls: removal of common artifacts and unwanted elements, marketplace compliance language (no text, no overlays, no watermarks). Does not control: anything not present in the model's vocabulary. Telling a model "no extra fingers" cannot rescue a model that does not understand fingers; it can only bias against them.

Common failure mode

Pasting a 50-word negative prompt copied from Reddit into an instruction-tuned model that has no negative prompt field. The model either silently ignores it or, worse, treats the negative list as a positive description because the field does not exist and the prompt is concatenated. The fix is to know which architecture you are talking to and rewrite the constraint as a positive instruction for instruction-tuned models.

How this becomes reusable in Nightjar

Nightjar bakes the most important constraint, product preservation, into its defaults. Upscale is target-resolution based (2K and 4K long-edge), not creative reinterpretation, so it does not invent label text, hands, or studio reflections. The Recolor and Product Placement Edit Shortcuts anchor to the source Asset so structure, text, logos, and material detail are preserved by design rather than by prompt-engineered exclusion lists. Custom Directions are the slot for additional constraints layered on top of the ingredient system, the cleanest place to keep marketplace-compliance language like "no text on label, no studio reflections, pure white background" without re-typing it every Generation. For more on how Nightjar handles preservation, see AI upscale and product preservation.

Pattern 6: Save and Apply

This is the pattern that does not live inside the prompt box.

Slot template

Run patterns 1 through 5 once. Save the resulting configuration as a named, structured object. Apply the named object to the next product without rewriting the slots.

Example application

A Shopify brand defines one Recipe for its packshot system:

  • Photography Style: clean studio (custom, extracted from 4 brand reference images)
  • Composition: centered three-quarter, 70 percent fill, 1:1
  • Fashion Model: none
  • Background: solid white #FFFFFF
  • Custom Directions: "Preserve product label text exactly. No reflective studio equipment in frame."
  • Aspect ratio: 1:1
  • Resolution: 2K
  • Output format: JPEG
  • Image count: 4

The next 200 SKUs run through this Recipe in one click each.
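
As a data structure, save-and-apply is roughly the sketch below: a structured object with separate fields, serialized once and reapplied per SKU. The field names echo the example above but are illustrative, not Nightjar's actual schema.

# Illustrative sketch of a Recipe as a structured saved object.
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Recipe:
    photography_style: str
    composition: str
    background: str
    custom_directions: str
    aspect_ratio: str = "1:1"
    resolution: str = "2K"
    output_format: str = "JPEG"
    image_count: int = 4

packshot = Recipe(
    photography_style="clean studio (custom, from 4 brand references)",
    composition="centered three-quarter, 70 percent fill",
    background="solid white #FFFFFF",
    custom_directions="Preserve product label text exactly. "
                      "No reflective studio equipment in frame.",
)

# Save once, apply per SKU: only the product asset changes between runs.
saved = json.dumps(asdict(packshot))
jobs = [{"recipe": json.loads(saved), "product_asset": sku}
        for sku in ["mug_001.png", "mug_002.png"]]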

Mechanism

Patterns 1 through 5 are reusable structures, but they are not reusable objects. Each next product still requires the writer to fill the slots correctly, and small phrasing differences across two attempts produce different outputs. The fix is to move the locked configuration out of the prompt and into a structured saved object.

"Save" means different things in different tools. In raw diffusion stacks, it is a saved prompt template plus a fixed seed plus a saved IP-Adapter or LoRA reference set: brittle, since prompt-only tools collapse the variables into one text blob. In Nightjar, it is a Recipe: a structured saved object that captures Photography Style, Composition, Fashion Model, Background, Custom Directions, image count, aspect ratio, resolution, and output format together.

The mechanism that makes this work at scale is separation. Recipe-style structured saves keep style, composition, model identity, and output settings as distinct fields rather than collapsing them into one string. The writer can change pose without touching lighting and change lighting without touching pose. Two images from the same saved configuration look like the same shoot, even if generated months apart.

What it controls, what it does not

Controls: consistency at scale, transferability across SKUs, transferability across team members, transferability across time. Does not control: the underlying photographic patterns. A saved configuration is a wrapper. Patterns 1 through 5 still have to be correct. The wrapper just stops them from being re-typed each time.

Common failure mode

A founder writes one good prompt for one product, then asks an assistant or agency to "do the same thing" for 50 more SKUs. The assistant retypes the prompt slightly differently every time. The catalog drifts. This is the prompting-as-production failure mode that no amount of better prompt-writing solves; it requires moving the configuration out of the prompt box and into a structured saved object.

How this becomes reusable in Nightjar

Nightjar has a feature called Recipes: a Team-owned structured saved object that captures Photography Style, Composition, Fashion Model, Background, Custom Directions, image count, aspect ratio, resolution, and output format. A Recipe does not save product Assets or generated outputs; it saves the controllable direction. A Team can have up to 100 active Recipes, all shared across Team members, which is what turns the brand's prompt patterns into shared infrastructure rather than tribal knowledge inside one person's account. For practical guidance on consistent AI product photos, or on building a consistent lookbook style for a new collection, see the help-desk articles.

How the Six Patterns Compare Across Model Architectures

Each of the six patterns behaves slightly differently across the three dominant 2026 image-model architectures. The matrix below shows how each pattern translates.

Pattern | Diffusion (SD, SDXL, Flux) | Instruction-tuned (Imagen, Nano Banana, GPT-image) | Production tools (Nightjar, Photoroom, Claid)
1. Subject + Surface + Light + Lens + Finish | Keyword-pool prompt; vocabulary maps to attention weights | Narrative paragraph form recommended | Saved as a reusable Photography Style; ingredient persists when prompts change
2. Pose + Framing + Angle | Composition tokens in keyword pool | Composition vocabulary inside the same descriptive paragraph | Saved as a reusable Composition; separated from style
3. Reference-Image Anchoring | IP-Adapter, ControlNet, img2img | Multi-image multimodal context with explicit role assignment | Edit-tab @image1, @image2 syntax with up to 8 inputs
4. Color Lock | Hex tokens in prompt plus img2img anchor | Hex tokens read literally; positive preservation clause required | /color command and Recolor Edit Shortcut
5. Constraint and Exclusion | Explicit negative prompt field with CFG | No field; positive reframing or declarative negation | Product preservation by default; Custom Directions for additional rules
6. Save and Apply | Saved prompt template, fixed seed, saved adapter | No native save layer; teams build their own | Recipes capture the full structured configuration

No tool wins every column. Diffusion stacks are unmatched on negative-prompt control through CFG. Instruction-tuned models are stronger on multi-image semantic understanding and clean text rendering. Production tools win on the save-and-reuse axis, because that is what they are designed for.

Where Prompt-Only Workflows Stop Scaling

The six patterns above get a brand around 80 percent of the way to consistent, on-brand AI product photography. The remaining 20 percent is a different problem. It is not "how do I write a better prompt." It is "how do I make sure the prompt I just wrote runs the same way 200 more times, across two team members, across three months, and across a catalog refresh."

Three documented failure modes of prompt-only workflows at catalog scale:

  1. Drift across operators. A founder writes a prompt; an assistant retypes it; an agency rewrites it; an LLM-generated rewording adds an adjective. By image 100, the catalog has fragmented into four micro-styles that no longer look like one brand.
  2. Drift across architectures. A team migrates from Stable Diffusion to Nano Banana mid-year. The negative prompt list silently stops working, image quality degrades, and the team blames the model rather than the architecture mismatch (Pattern 5).
  3. Drift across time. A brand returns six months later for a seasonal refresh. The prompt that worked is gone, the seed is gone, the reference images are unlabeled in a downloads folder. The next batch is a new shoot in everything but name.

The fix is structural. Move the configuration out of the prompt box and into a structured saved object. Then move the saved object out of one person's account and into a shared Team workspace.

Prompt patterns scale until two things happen: a second person starts writing prompts, or a second month passes. After that, the only reliable way to keep a catalog visually unified is to save the configuration as a structured object that can be re-applied without re-typing. Tools like Nightjar are built to be that object layer. Photography Styles save Pattern 1. Compositions save Pattern 2. The Edit-tab @image syntax expresses Pattern 3. The /color command expresses Pattern 4. Product preservation defaults plus Custom Directions express Pattern 5. Recipes plus Teams express Pattern 6. The Library keeps everything findable months later through AI semantic search, the operational version of "I know the reference image is in there somewhere."

For broader context, see the guide to consistent AI product photography, or the consistent AI product photos guidance for additional tactical detail.

Frequently Asked Questions

Do negative prompts still work in 2026? Yes for diffusion models (Stable Diffusion, SDXL, Flux variants), where Classifier-Free Guidance gives the negative prompt field a real, mechanical effect at each denoising step (arXiv:2406.02965). No for instruction-tuned models like Google Imagen and Gemini Nano Banana, which have no negative prompt field; passing negativePrompt to the Nano Banana Pro API returns a 400 error (Apiyi). For instruction-tuned models, rewrite the negative list as positive instructions inside the prompt body ("empty street" beats "no cars"); Google's own documentation recommends this approach (Google Cloud Blog).

What camera vocabulary do AI image models actually understand? The reliable vocabulary is the same language professional photographers use: focal length (50mm, 85mm, 100mm macro), aperture (f/2.8, f/8), shot type (three-quarter, top-down, packshot, hero), and lighting setup (single softbox at 45 degrees, three-point, golden hour backlighting). Google's official Nano Banana prompting guide explicitly accepts terms like "three-point softbox setup," "Chiaroscuro lighting with harsh, high contrast," and "cinematic color grading with muted teal tones." The 50mm lens matches the human eye and is the ecommerce default; 85mm gives more working distance for medium products; 100mm macro is the standard for jewelry and beauty close-ups.

How do I keep the same product across many AI shots? Anchor the actual product as a reference image (Pattern 3) rather than describing it in text. For diffusion stacks, that means an IP-Adapter or img2img workflow with the source product as the conditioning image (Ye et al., arXiv:2308.06721). For instruction-tuned models, pass the product as a multimodal input with an explicit role sentence ("Use @image1 as the actual product, place it in this new scene"). For catalog-scale work where the same product needs to appear across hundreds of shots in the same style, save the surrounding direction (lighting, composition, background) as a reusable configuration so only the product Asset changes between Generations.

How long should an AI image prompt be for a product photo? Long enough to fill the five slots in Pattern 1 (subject, surface, light, lens, finish) and short enough that no slot has more than one or two well-chosen tokens. Google's product photography template is two sentences in narrative paragraph form, not a 50-keyword list. The single most common mistake is overstuffing: listing twelve adjectives and three lighting types in one prompt makes the model average them into muddy output. One light source described well beats three light sources described loosely.

Can I use a reference image instead of writing a prompt? Yes, and for many product photography tasks the reference image is the more reliable input. A reference gives the model a high-dimensional visual anchor that text cannot describe with equivalent fidelity. The two underlying mechanisms are IP-Adapter for diffusion models (a 22M-parameter adapter that separates text and image conditioning, arXiv:2308.06721) and multi-image multimodal context for instruction-tuned models. For multi-image work, assign one explicit role per reference: "Use A as the structure and B as the texture" beats sending five images and hoping the model averages them well.

Why do my AI product images look fake or plastic? Almost always because the prompt is missing the optical signature that makes a photograph read as a photograph: focal length, aperture, lighting source, and finish vocabulary. The brain reads "real photograph" through depth of field, distortion, and bokeh shape, all of which require the lens and lighting slots in Pattern 1. Adding "85mm at f/2.8, single softbox at 45 degrees from camera left, matte ceramic finish" to a prompt that previously said only "studio product photo" usually produces an immediately more believable result. The deeper fix is to save that lens-and-light combination as a reusable Photography Style so the optical signature stays consistent across the catalog.

What is the best prompt structure for realistic AI product photos? There is not one structure; there are six patterns that compose. Pattern 1 (Subject + Surface + Light + Lens + Finish) handles the photographic feel; Pattern 2 (Pose + Framing + Angle) handles the geometry; Pattern 3 (Reference-Image Anchoring) handles identity and style transfer; Pattern 4 (Color Lock) handles exact color; Pattern 5 (Constraint and Exclusion) handles what the image must not contain, with separate rules for diffusion versus instruction-tuned models; Pattern 6 (Save and Apply) handles the part prompts cannot solve, which is reusing the entire setup at catalog scale. Most production teams use all six in combination.

