Stress Test — 2026-05-13

AI Quality Report

We ran 120 AI generations across the hardest real-world edge cases we could find, catalogued the failure modes that showed up, and fixed what we could. Here's exactly what works, what doesn't, and why.

120

Images generated

10 photos × 4 configs × 3 variants

0

API failures

100% pipeline stability

3

Prompt versions

v2 → v3 → v4 in one session

7.4

Quality score

Up from 6.0 (v2 baseline)
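The 120 figure is the product 10 photos × 4 configs × 3 variants. A minimal sketch of how such a test matrix could be enumerated is below. The photo identifiers are hypothetical, and treating the 4 configs as the four modes covered in this report is an assumption, since the report does not list them.

```python
from itertools import product

# Hypothetical identifiers: the report does not name the photos or the configs.
PHOTOS = [f"photo_{i:02d}" for i in range(1, 11)]  # 10 test photos
CONFIGS = ["empty_staging", "restyle", "declutter", "premium"]  # assumed: one config per mode
VARIANTS = [1, 2, 3]  # 3 variants per photo/config pair


def build_test_matrix():
    """Return every (photo, config, variant) combination as a list of tuples."""
    return list(product(PHOTOS, CONFIGS, VARIANTS))


matrix = build_test_matrix()
print(len(matrix))  # 10 * 4 * 3 = 120
```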

Quality Score Evolution

v2

6.0/10

Baseline

Basic architecture lock + room-type rules

v3

6.8/10

Better prompts

Mandatory furniture, darkness rule, aggressive premium

v4

7.4/10

Current

MANDATORY style overrides, anti-reimagination declutter

Results by Mode

🏠

Stage Empty Room

AI places furniture from scratch in an empty room

85%

Pass rate

✅ Full furniture sets in standard rooms

✅ Correct room type (sofa, not dining table)

✅ Walls, floors, windows untouched

⚠️ Luxury/Scandinavian in industrial brick rooms → rug only (model ceiling)

🎨

Restyle Room

AI replaces furniture with a new style while preserving architecture

60%

Pass rate

✅ Arch geometry fully preserved

✅ Mirror reflections update coherently

✅ Open doorways stay open

⚠️ Strongly-styled rooms (dark velvet, saturated walls) resist replacement (model ceiling)

✨

Declutter & Clean

AI removes clutter, boxes and mess — nothing structural is changed

75%

Pass rate

✅ Large furniture stays in place

✅ Boxes, bags, scattered items removed

✅ Walls and floors untouched

⚠️ Completely dark/windowless rooms → AI reimagines the whole room (model ceiling)

⭐

Premium Listing

AI improves lighting and atmosphere without changing furniture or layout

70%

Pass rate

✅ 2 of 3 variants fully brightened on dark rooms

✅ Furniture and layout unchanged

✅ Improved after v3 prompt strengthening

⚠️ 1 of 3 variants may stay dark, depending on the seed. Workaround: always generate 3 variants

Prompt Changes Applied

Prompt v3 (2026-05-13)

Rule 4: "DO NOT solve darkness by inventing windows or skylights. Darkness is never a reason to add a window."

Rule 5: Floor material frozen, but PLACING FURNITURE ON THE FLOOR IS FULLY PERMITTED.

empty_staging: "FURNITURE PLACEMENT IS MANDATORY. Not just a rug. Sofa + seating + tables required."

restyle: "REMOVE ALL movable items regardless of how expensive or stylish they look. Even luxury furniture."

premium: "Increase brightness SUBSTANTIALLY. Lift dark shadows aggressively. MAY open existing window coverings."

CLOSING_LOCK: Added window count assertion (same count as original, no new windows).

NEGATIVE_PROMPT: Added "hallucinated windows", "added windows", "added fireplace", "new skylight".

Prompt v4 (2026-05-13)

luxury style: "MANDATORY FURNITURE PLACEMENT — regardless of room style, whether industrial, rustic, brick, or minimalist."

scandinavian style: "MANDATORY FURNITURE PLACEMENT — regardless of existing room style, whether industrial, rustic, dark, or modern."

declutter: "CRITICAL: Do NOT reimagine, redesign, or recreate this room. Output must look like the EXACT SAME ROOM from the EXACT SAME ANGLE."

UX (SettingsScreen): Contextual banners by mode: restyle → amber, empty_staging → blue, declutter → green, premium → purple.
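To make the layering of these changes concrete, here is a minimal sketch of how shared rules, a mode instruction, an optional style override, and the closing lock might be assembled into one prompt. The structure, the `build_prompt` name, and the abbreviated fragment texts are assumptions for illustration. The report does not show the real prompt template or how the negative prompt is passed to the model.

```python
# Abbreviated, illustrative fragments based on the changes listed above.
# The real prompt texts and their order are not shown in the report.
SHARED_RULES = [
    "Rule 4: DO NOT solve darkness by inventing windows or skylights.",
    "Rule 5: Floor material frozen. Placing furniture on the floor is permitted.",
]
MODE_INSTRUCTIONS = {
    "empty_staging": "FURNITURE PLACEMENT IS MANDATORY. Not just a rug.",
    "restyle": "REMOVE ALL movable items regardless of how stylish they look.",
    "declutter": "CRITICAL: Do NOT reimagine, redesign, or recreate this room.",
    "premium": "Increase brightness SUBSTANTIALLY. Lift dark shadows aggressively.",
}
STYLE_OVERRIDES = {
    "luxury": "MANDATORY FURNITURE PLACEMENT regardless of room style.",
    "scandinavian": "MANDATORY FURNITURE PLACEMENT regardless of existing room style.",
}
CLOSING_LOCK = "Window count must match the original. No new windows."
NEGATIVE_PROMPT = "hallucinated windows, added windows, added fireplace, new skylight"


def build_prompt(mode, style=None):
    """Assemble prompt text for one generation (hypothetical helper)."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown mode: {mode}")
    parts = list(SHARED_RULES)
    parts.append(MODE_INSTRUCTIONS[mode])
    # Styles without a v4 override (for example modern) add nothing extra here.
    if style in STYLE_OVERRIDES:
        parts.append(STYLE_OVERRIDES[style])
    # The closing lock goes last so the window assertion ends the prompt.
    parts.append(CLOSING_LOCK)
    return {"prompt": "\n".join(parts), "negative_prompt": NEGATIVE_PROMPT}


request = build_prompt("empty_staging", style="luxury")
print(request["prompt"])
```

Keeping each fragment in its own table means a prompt version bump touches one entry rather than a monolithic string.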

Model Ceilings — What Cannot Be Fixed

These failures come from how Flux Kontext Pro behaves, and none of the prompt changes we tried fixed them. They need changes to the pipeline or product-level workarounds.

🛋️

Restyle on strongly-styled furnished rooms

When the input has saturated, well-composed furniture (dark velvet sofa, matched color scheme), the model keeps it. Text instructions cannot override dominant visual conditioning.

Requires 2-stage pipeline: Stage 1 neutralize to beige/plain → Stage 2 empty_staging with target style. Doubles generation cost (~$0.08). Roadmap item.

Roadmap: 2-stage pipeline
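The 2-stage idea can be sketched as two chained generation calls. This is an outline under stated assumptions, not the planned implementation: `generate` is a hypothetical callable standing in for whatever invokes the model, and both prompt strings are placeholders.

```python
from typing import Callable, Tuple

# Hypothetical signature: takes image bytes and a prompt, returns new image bytes.
GenerateFn = Callable[[bytes, str], bytes]

# Placeholder prompt for stage 1. The real wording is not in the report.
NEUTRALIZE_PROMPT = (
    "Replace all furniture and decor with plain beige, unstyled equivalents. "
    "Keep the architecture unchanged."
)


def two_stage_restyle(image: bytes, target_style: str, generate: GenerateFn) -> Tuple[bytes, int]:
    """Run neutralize, then stage. Returns the final image and the number of model calls."""
    # Stage 1: strip the dominant visual style so it no longer conditions the output.
    neutral = generate(image, NEUTRALIZE_PROMPT)
    # Stage 2: stage the neutralized room in the requested style.
    staged = generate(neutral, f"empty_staging with {target_style} style")
    return staged, 2  # two calls, which is why the generation cost doubles
```

Passing `generate` in as a parameter keeps the sketch independent of any particular client library and makes it easy to test with a stub.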

🧱

Luxury/Scandi in industrial/brick rooms

When the room has exposed brick, track lights, and an industrial aesthetic, the model refuses to place velvet or birch furniture. Only a rug is placed. Modern + neutral_sale work fine in the same room.

The "MANDATORY PLACEMENT regardless of room style" instruction added in v4 had no effect. We recommend modern or neutral_sale for industrial rooms.

UI warning planned
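The planned UI warning could be driven by a check like the one below. This is a minimal sketch: how a room gets flagged as industrial (a user toggle, a classifier) is out of scope and assumed to be known, and the function name and message text are hypothetical.

```python
from typing import Optional

# Styles the report found to place only a rug in industrial/brick rooms.
RESISTANT_STYLES = {"luxury", "scandinavian"}
# Styles the report found to work in the same rooms.
FALLBACK_STYLES = ["modern", "neutral_sale"]


def style_warning(room_is_industrial: bool, style: str) -> Optional[str]:
    """Return a warning string for a risky room/style pairing, or None if the pairing is fine."""
    if room_is_industrial and style in RESISTANT_STYLES:
        suggestions = ", ".join(FALLBACK_STYLES)
        return f"'{style}' often places only a rug in industrial rooms. Try: {suggestions}."
    return None


print(style_warning(True, "luxury"))
print(style_warning(True, "modern"))  # None
```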

🌑

Declutter on completely windowless dark rooms

When a room has zero architectural anchors visible (all walls hidden by shelving, no windows, no light source), the model cannot remove clutter without reference points — it reimagines the entire room.

The "CRITICAL: do not reimagine" instruction added in v4 had no effect. This case needs upload-time darkness detection to warn users before generation.

Roadmap: darkness detection
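One simple way to approximate upload-time darkness detection is a mean-brightness check over the image's pixels. The sketch below applies Rec. 709 luma weights to 8-bit RGB values. The threshold is a hypothetical placeholder that would need calibrating against real photos, and decoding the upload into pixels is left out.

```python
# Hypothetical cut-off on a 0..1 scale. Needs tuning on real uploads.
DARK_THRESHOLD = 0.15


def mean_luma(pixels):
    """Average Rec. 709 luma of 8-bit (r, g, b) pixels, scaled to the range 0..1."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.2126 * r + 0.7152 * g + 0.0722 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels")
    return total / (count * 255.0)


def is_too_dark(pixels, threshold=DARK_THRESHOLD):
    """True when the image is dark enough that the user should be warned before generating."""
    return mean_luma(pixels) < threshold


print(is_too_dark([(10, 10, 10)] * 4))     # True
print(is_too_dark([(200, 200, 200)] * 4))  # False
```

A mean-brightness check only catches underexposure. Detecting missing architectural anchors (hidden walls, no windows) would need something more than this.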

🎲

Premium mode seed variance on very dark rooms

On deeply underexposed rooms, some seeds latch onto the dark input and don't diverge — the output stays dark. Other seeds brighten correctly.

Workaround: always generate 3 variants. 2/3 work correctly. The 1/3 failure rate is acceptable for practical use.

Workaround: 3 variants
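A rough way to see why three variants help: if the observed 1-in-3 failure is treated as an independent per-seed probability (an assumption, since the report's sample is small and does not establish independence), the chance that all three variants stay dark is (1/3)³ = 1/27, or about 3.7%.

```python
from fractions import Fraction


def p_at_least_one_success(p_fail, n):
    """Probability that at least one of n independent variants succeeds."""
    return 1 - p_fail ** n


p = p_at_least_one_success(Fraction(1, 3), 3)
print(p)         # 26/27
print(float(p))  # roughly 0.963
```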

Conclusions

✅ Pipeline is rock solid: 120 generations, 0 API failures, 0 data loss bugs.

✅ Architecture preservation is excellent: windows, arches, brick, sloped ceilings, mirrors all survive.

✅ empty_staging works for all room types with modern/minimalist/cozy_family/neutral_sale styles.

✅ Prompt engineering moved the quality score from 6.0 → 7.4 in one session.

⚠️ Restyle on strongly-styled rooms (dark velvet, heavily saturated) needs a 2-stage pipeline. Pure prompt work hit the ceiling.

⚠️ Luxury + Scandinavian in industrial rooms: only a rug. Model ceiling confirmed. Recommend modern/neutral_sale for these rooms.

⚠️ Declutter on completely dark/windowless rooms reimagines the room entirely. Upload-time brightness detection needed.

Test photos from Pexels (free license). AI by Flux Kontext Pro via fal.ai. Report date: 2026-05-13.