Stress Test — 2026-05-13

AI Quality Report

We ran 120 AI generations across the hardest real-world edge cases, found every failure mode, and fixed what we could. Here's exactly what works, what doesn't, and why.

120 images generated (10 photos × 4 configs × 3 variants)
0 API failures (100% pipeline stability)
3 prompt versions (v2 → v3 → v4 in one session)
7.4 quality score (up from 6.0, the v2 baseline)
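
The 120 figure is the product of the test matrix. A minimal sketch of enumerating it, assuming placeholder photo and config identifiers (the report gives only the counts, not the names):

```python
from itertools import product

# Placeholder identifiers: hypothetical names, only the counts come from the report.
PHOTOS = [f"photo_{i:02d}" for i in range(1, 11)]  # 10 test photos
CONFIGS = [f"config_{i}" for i in range(1, 5)]     # 4 configs
VARIANTS = [1, 2, 3]                               # 3 variants per photo/config pair


def build_test_matrix():
    """Return every (photo, config, variant) combination to generate."""
    return list(product(PHOTOS, CONFIGS, VARIANTS))


matrix = build_test_matrix()
print(len(matrix))  # 120
```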

Quality Score Evolution

v2 · 6/10 · Baseline
Basic architecture lock + room-type rules

v3 · 6.8/10 · Better prompts
Mandatory furniture, darkness rule, aggressive premium

v4 · 7.4/10 · Current
MANDATORY style overrides, anti-reimagination declutter

Results by Mode

🏠

Stage Empty Room

AI places furniture from scratch in an empty room

85%
Pass rate
Full furniture sets in standard rooms
Correct room type (sofa not dining table)
Walls, floors, windows untouched
Luxury/Scandinavian in industrial brick rooms → rug only (model ceiling)
🎨

Restyle Room

AI replaces furniture with a new style while preserving architecture

60%
Pass rate
Arch geometry fully preserved
Mirror reflections update coherently
Open doorways stay open
Strongly-styled rooms (dark velvet, saturated walls) resist replacement (model ceiling)

Declutter & Clean

AI removes clutter, boxes and mess — nothing structural is changed

75%
Pass rate
Large furniture stays in place
Boxes, bags, scattered items removed
Walls and floors untouched
Completely dark/windowless rooms → AI reimagines the whole room (model ceiling)

Premium Listing

AI improves lighting and atmosphere without changing furniture or layout

70%
Pass rate
2/3 variants fully brightened on dark rooms
Furniture and layout unchanged
Improved after v3 prompt strengthening
1/3 variants seed-dependent → may stay dark. Workaround: always generate 3 variants

Prompt Changes Applied

Prompt v3 (2026-05-13)
Rule 4: "DO NOT solve darkness by inventing windows or skylights. Darkness is never a reason to add a window."
Rule 5: Floor material frozen, but PLACING FURNITURE ON THE FLOOR IS FULLY PERMITTED.
empty_staging: "FURNITURE PLACEMENT IS MANDATORY. Not just a rug. Sofa + seating + tables required."
restyle: "REMOVE ALL movable items regardless of how expensive or stylish they look. Even luxury furniture."
premium: "Increase brightness SUBSTANTIALLY. Lift dark shadows aggressively. MAY open existing window coverings."
CLOSING_LOCK: Added window count assertion: same count as original, no new windows.
NEGATIVE_PROMPT: Added: hallucinated windows, added windows, added fireplace, new skylight.
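
One way the v3 pieces could fit together is sketched below. The quoted rule texts and the negative terms come from the list above; the `build_prompt` function, the assembly order, the returned dictionary keys, and the wording of the closing lock are assumptions for illustration, not the actual pipeline code.

```python
# Quoted from the v3 changes (Rule 4).
GLOBAL_RULES = [
    "DO NOT solve darkness by inventing windows or skylights. "
    "Darkness is never a reason to add a window.",
]

# Per-mode instructions quoted from the v3 changes.
MODE_RULES = {
    "empty_staging": "FURNITURE PLACEMENT IS MANDATORY. Not just a rug. "
                     "Sofa + seating + tables required.",
    "restyle": "REMOVE ALL movable items regardless of how expensive or "
               "stylish they look. Even luxury furniture.",
    "premium": "Increase brightness SUBSTANTIALLY. Lift dark shadows "
               "aggressively. MAY open existing window coverings.",
}

# Paraphrase of the window count assertion; the real wording is not in the report.
CLOSING_LOCK = "Window count must match the original: no new windows."

# Terms added to the negative prompt in v3.
NEGATIVE_TERMS = ["hallucinated windows", "added windows", "added fireplace", "new skylight"]


def build_prompt(mode: str) -> dict:
    """Assemble positive and negative prompt text for one mode (hypothetical layout)."""
    if mode not in MODE_RULES:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = [*GLOBAL_RULES, MODE_RULES[mode], CLOSING_LOCK]
    return {"prompt": "\n".join(parts), "negative_prompt": ", ".join(NEGATIVE_TERMS)}
```

Putting the closing lock last reflects its name only; the report does not say where it sits in the real prompt.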
Prompt v4 (2026-05-13)
luxury style: "MANDATORY FURNITURE PLACEMENT — regardless of room style, whether industrial, rustic, brick, or minimalist."
scandinavian style: "MANDATORY FURNITURE PLACEMENT — regardless of existing room style, whether industrial, rustic, dark, or modern."
declutter: "CRITICAL: Do NOT reimagine, redesign, or recreate this room. Output must look like the EXACT SAME ROOM from the EXACT SAME ANGLE."
UX (SettingsScreen): Contextual banners: restyle→amber, empty_staging→blue, declutter→green, premium→purple.
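
The banner rule is a plain lookup from mode to color. A language-neutral sketch in Python (the real SettingsScreen code is not shown in this report, and the function name and gray fallback are hypothetical):

```python
# Mode-to-color mapping taken from the v4 UX note.
BANNER_COLORS = {
    "restyle": "amber",
    "empty_staging": "blue",
    "declutter": "green",
    "premium": "purple",
}


def banner_color(mode: str, default: str = "gray") -> str:
    """Return the banner color for a mode, or a fallback for unknown modes."""
    return BANNER_COLORS.get(mode, default)
```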

Model Ceilings — What Cannot Be Fixed

These failures are fundamental to how Flux Kontext Pro works. No amount of prompt engineering fixes them — they require architectural changes to the pipeline.

🛋️

Restyle on strongly-styled furnished rooms

When the input has saturated, well-composed furniture (dark velvet sofa, matched color scheme), the model keeps it. Text instructions cannot override dominant visual conditioning.

Requires 2-stage pipeline: Stage 1 neutralize to beige/plain → Stage 2 empty_staging with target style. Doubles generation cost (~$0.08). Roadmap item.
Roadmap: 2-stage pipeline
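
The proposed two-stage flow could be sketched as follows. `generate` stands in for whatever function calls the image model; its signature, the `neutralize` mode name, and the `plain` style name are assumptions, since this pipeline does not exist yet.

```python
from typing import Callable

# Hypothetical signature: generate(image, mode, style) -> image.
GenerateFn = Callable[[bytes, str, str], bytes]


def two_stage_restyle(image: bytes, target_style: str, generate: GenerateFn) -> bytes:
    """Restyle by neutralizing the room first, then staging it in the target style."""
    # Stage 1: reduce the room to a beige/plain look so the strong styling
    # no longer dominates the model's visual conditioning.
    neutral = generate(image, "neutralize", "plain")
    # Stage 2: run empty_staging on the neutralized image with the target style.
    return generate(neutral, "empty_staging", target_style)
```

Each input image triggers two model calls instead of one, which is where the doubled generation cost comes from.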
🧱

Luxury/Scandi in industrial/brick rooms

When the room has exposed brick, track lights, and industrial aesthetic, the model refuses to place velvet or birch furniture. Only a rug is placed. Modern + neutral_sale work fine in the same room.

"MANDATORY PLACEMENT regardless of room style" instruction added in v4 — zero effect. Recommend modern or neutral_sale for industrial rooms.
UI warning planned
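
A minimal sketch of the planned warning, assuming the app already knows whether a room is industrial (how that flag is obtained, the function name, and the message text are all hypothetical):

```python
from typing import Optional

# Styles the report found to place only a rug in industrial/brick rooms.
RISKY_STYLES = {"luxury", "scandinavian"}
# Styles the report recommends for those rooms instead.
RECOMMENDED_STYLES = ("modern", "neutral_sale")


def style_warning(style: str, room_is_industrial: bool) -> Optional[str]:
    """Return a warning message for risky style/room pairs, else None."""
    if room_is_industrial and style in RISKY_STYLES:
        alternatives = " or ".join(RECOMMENDED_STYLES)
        return f"'{style}' often places only a rug in industrial rooms. Try {alternatives}."
    return None
```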
🌑

Declutter on completely windowless dark rooms

When a room has zero architectural anchors visible (all walls hidden by shelving, no windows, no light source), the model cannot remove clutter without reference points — it reimagines the entire room.

"CRITICAL: do not reimagine" instruction in v4 — zero effect. Needs upload-time darkness detection to warn users before generation.
Roadmap: darkness detection
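
Upload-time darkness detection could start from something as simple as mean brightness. The sketch below averages Rec. 709 luma over 8-bit RGB pixels; the threshold value and function names are assumptions and would need tuning against real uploads. Note that brightness is only a proxy: the failure described above also depends on hidden walls and missing windows, which this check does not see.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

DARK_THRESHOLD = 40.0  # hypothetical cutoff on a 0-255 scale


def mean_luma(pixels: Iterable[Pixel]) -> float:
    """Average Rec. 709 luma over all pixels (0 = black, 255 = white)."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.2126 * r + 0.7152 * g + 0.0722 * b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


def is_too_dark(pixels: Iterable[Pixel], threshold: float = DARK_THRESHOLD) -> bool:
    """True if the image's mean luma falls below the warning threshold."""
    return mean_luma(pixels) < threshold
```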
🎲

Premium mode seed variance on very dark rooms

On deeply underexposed rooms, some seeds latch onto the dark input and don't diverge — the output stays dark. Other seeds brighten correctly.

Workaround: always generate 3 variants. 2/3 work correctly. The 1/3 failure rate is acceptable for practical use.
Workaround: 3 variants
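
To see why three variants is a reasonable workaround: if each variant brightens correctly with probability 2/3 and seeds behave independently (an assumption; the report only gives the observed 2/3 rate), the chance that every variant stays dark shrinks quickly with the number of variants.

```python
from fractions import Fraction


def p_at_least_one_success(p_success: Fraction, n_variants: int) -> Fraction:
    """Probability that at least one of n independent variants succeeds."""
    return 1 - (1 - p_success) ** n_variants


p = p_at_least_one_success(Fraction(2, 3), 3)
print(p)  # 26/27, roughly 96%
```

Under that independence assumption, about 1 in 27 dark-room jobs would still return three dark variants.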

Conclusions

Pipeline is rock solid — 120 generations, 0 API failures, 0 data loss bugs.
Architecture preservation is excellent — windows, arches, brick, sloped ceilings, mirrors all survive.
empty_staging works for all room types with modern/minimalist/cozy_family/neutral_sale styles.
Prompt engineering moved the quality score from 6.0 → 7.4 in one session.
⚠️ Restyle on strongly-styled rooms (dark velvet, heavily saturated) needs a 2-stage pipeline; pure prompt work hit the ceiling.
⚠️ Luxury + Scandinavian in industrial rooms: only a rug. Model ceiling confirmed. Recommend modern/neutral_sale for these rooms.
⚠️ Declutter on completely dark/windowless rooms reimagines the room entirely. Upload-time brightness detection needed.

Test photos from Pexels (free license). AI by Flux Kontext Pro via fal.ai. Report date: 2026-05-13.