MatEval: Evaluating Indoor 3D Scene Material Recovery from a Single Image

Simon Fraser University
Conference on Robots and Vision (CRV) 2026
Overview of the material recovery problem.
Overview of our problem statement. The input is a single-view RGB image I_0 and the geometry of the scene G. To focus on material recovery, we use ground-truth geometry, factoring out geometric errors. The output is the reconstructed lighting L(p, e) and the recovered procedural materials M(θ) or image-based materials UV{a, m, r} of the scene, which we render to images I_DiffProcMat and I_MonoIR respectively.

Abstract

Converting a single image to a 3D scene with geometry, materials, and lighting is a challenging problem. While geometry reconstruction from a single view has been extensively studied, material recovery for the single-image-to-scene task remains underexplored. Recent advances in differentiable procedural materials, inverse rendering, and texture generation could potentially be applied to this task, but they have not been systematically evaluated in a benchmark.

In this project, we establish a comprehensive benchmark for material recovery in the single-image-to-scene task. We assume ground-truth 3D geometry as input to isolate material estimation from geometric error. We evaluate three families of methods inspired by recent state-of-the-art approaches in inverse rendering, texture generation, and single-image-to-scene reconstruction.

Our results show that single-view inverse rendering baselines outperform procedural material baselines (albedo PSNR of 19.40 vs. 13.60 on original views), highlighting the strong potential of single-view inverse rendering for material recovery in the single-image-to-scene task. We will release the full dataset, evaluation code, and baseline implementations to support future work.

Benchmark Dataset

MatEval consists of 180 scenes from 3D-FRONT and 4 high-quality scenes from Bitterli. Each scene provides one original camera view plus four novel views (5° camera-center rotations), with ground-truth RGB, albedo, and part-level segmentation. Compared with OpenRooms, our scenes use professionally designed 3D-FUTURE furniture, are more densely populated (13.2 objects/scene vs. 9.4), and look more photorealistic.
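
The novel views above are described as 5° camera-center rotations. A minimal sketch of how such views could be generated, assuming the camera position stays fixed and the orientation is offset by ±5° of yaw and ±5° of pitch. The choice of axes, the -z viewing direction, and all function names are illustrative assumptions, not the benchmark's documented procedure:

```python
import math

def rot_y(deg):
    """3x3 rotation about the camera's local y (up) axis: yaw."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(deg):
    """3x3 rotation about the camera's local x (right) axis: pitch."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def forward(cam_to_world):
    """Viewing direction in world space, assuming the camera looks down -z."""
    return [-cam_to_world[i][2] for i in range(3)]

def novel_view_rotations(cam_to_world, angle_deg=5.0):
    """Return four rotated camera orientations; the camera center is unchanged.

    Assumption: two yaw and two pitch offsets. Right-multiplying applies
    each offset in the camera's local frame.
    """
    offsets = [rot_y(angle_deg), rot_y(-angle_deg),
               rot_x(angle_deg), rot_x(-angle_deg)]
    return [matmul(cam_to_world, d) for d in offsets]
```

Because only the orientation changes, each novel view shares most of its content with the original view and exposes a thin band of previously unobserved surface at the image border.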

Comparison of MatEval and OpenRooms renderings.
Comparison between MatEval (ours) and OpenRooms. MatEval includes scenes curated from the 3D-FRONT (left) and Bitterli (middle) datasets. Compared with OpenRooms (right), our dataset features furniture with more diverse and photorealistic material textures, leading to richer visual variety across indoor scenes.

Benchmarked Baselines

We evaluate three families of baselines:

  • Differentiable Procedural Materials (DiffProcMat)
  • Monocular Inverse Rendering + Nearest Neighbor (MonoIR + NN)
  • Monocular Inverse Rendering + TEXGen (MonoIR + TEXGen)

MonoIR variants use ColorfulShading, RGBX, or Marigold to estimate camera-space materials, which are then projected onto the 3D geometry. All baselines run under a three-stage framework: material initialization → light initialization → joint optimization.
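
To make the three-stage structure concrete, here is a toy sketch on a deliberately simplified image formation model in which each pixel is an albedo value multiplied by a single scalar light intensity. The model, the function names, the learning rate, and the sample values are all assumptions chosen for illustration. The actual baselines use full renderers and richer material and lighting representations.

```python
def render(albedo, light):
    """Toy image formation: pixel = albedo * scalar light intensity."""
    return [a * light for a in albedo]

def loss(albedo, light, image):
    """Sum of squared differences between the rendering and the input image."""
    return sum((r - i) ** 2 for r, i in zip(render(albedo, light), image))

def init_light(albedo, image):
    """Stage 2: closed-form least-squares light for a fixed albedo."""
    return sum(a * i for a, i in zip(albedo, image)) / sum(a * a for a in albedo)

def joint_optimize(albedo, light, image, lr=0.1, steps=300):
    """Stage 3: gradient descent on albedo and light together."""
    albedo = list(albedo)
    for _ in range(steps):
        resid = [a * light - i for a, i in zip(albedo, image)]
        grad_light = sum(2 * r * a for r, a in zip(resid, albedo))
        grad_albedo = [2 * r * light for r in resid]
        light -= lr * grad_light
        # Keep albedo in the physically meaningful range [0, 1].
        albedo = [min(1.0, max(0.0, a - lr * g))
                  for a, g in zip(albedo, grad_albedo)]
    return albedo, light

image = [0.10, 0.30, 0.45, 0.20]          # hypothetical observed pixels
albedo0 = [0.25, 0.50, 0.80, 0.50]        # stage 1: hypothetical initial estimate
light0 = init_light(albedo0, image)       # stage 2: light initialization
albedo1, light1 = joint_optimize(albedo0, light0, image)  # stage 3
```

Even in this toy setting the product of albedo and light is what the image constrains, so a good material initialization matters: the joint stage can reduce the rendering error without fixing a wrong split between albedo and illumination.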

Pipeline for lifting 2D materials to 3D.
Overview of the pipeline for lifting 2D materials to 3D. We lift the 2D estimates {a, m, r} to 3D in UV space for each object i. For the nearest-neighbor approach: (1) map the estimated 2D material properties {a, m, r} to a point cloud p_partial{a, m, r} and fill unobserved regions with a nearest-neighbor lookup to obtain p_nn{a, m, r}, and (2) project p_nn{a, m, r} to UV space as UV_nn{a, m, r}. For the texture generation approach: (1) map the albedo and object mask from camera space to UV space as UV_partial{a} and UV_partial{M_i}, (2) obtain the position map and mask map UV_full{M_i} from the 3D model of the object G_i, and (3) apply TEXGen to fill unobserved regions and obtain UV_texgen{a}.
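
The two steps of the nearest-neighbor approach can be sketched as follows. This is a minimal brute-force illustration, assuming each surface point carries a 3D position, a UV coordinate in [0, 1], and either an observed material value or `None`. The data layout and the function names `nn_fill` and `splat_to_uv` are hypothetical, and a practical implementation would use a spatial index and array operations instead of Python lists.

```python
import math

def nn_fill(points, values):
    """Step 1: fill unobserved points (None) with the nearest observed value.

    points: list of (x, y, z) positions on the object surface.
    values: list of material values (e.g. albedo tuples) or None if unobserved.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("no observed points to copy from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            nearest = min(observed, key=lambda k: math.dist(points[i], points[k]))
            filled[i] = values[nearest]
    return filled

def splat_to_uv(uvs, values, size):
    """Step 2: write per-point values into a size x size UV texture.

    Each point lands in the texel containing its (u, v) coordinate;
    texels that receive no point stay None.
    """
    texture = [[None] * size for _ in range(size)]
    for (u, v), value in zip(uvs, values):
        col = min(int(u * size), size - 1)
        row = min(int(v * size), size - 1)
        texture[row][col] = value
    return texture
```

For example, with three points where only the first two are observed, the third point copies the value of whichever observed point is closer in 3D, and the filled values are then written into the texture at their UV locations.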

Key Finding

Single-view inverse rendering substantially outperforms procedural-material optimization. On albedo recovery from the original camera view (3D-FRONT), the best MonoIR + NN baseline reaches a PSNR of 19.40 and the best MonoIR + TEXGen baseline reaches 19.71, versus 13.60 for the strongest DiffProcMat baseline. With ground-truth albedo as input, the same nearest-neighbor lifting pipeline reaches 32.98, which shows how much headroom remains for the 2D estimators themselves.

Method family                  Best baseline   Albedo PSNR ↑
DiffProcMat                    PSDR-Room       13.60
MonoIR + NN                    RGBX^a          19.40
MonoIR + TEXGen                RGBX^a          19.71
Upper bound (GT albedo → NN)   -               32.98

3D-FRONT, original camera view, unaligned albedo. See the paper for full results across RGB, novel views, and scale-aligned albedo.
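
For readers unfamiliar with the metric, the sketch below computes PSNR on flattened pixel values in [0, 1] and shows one possible form of scale alignment. The alignment used here, a single least-squares scalar applied to the prediction, is an assumption for illustration. The paper's exact alignment protocol is not described on this page and may differ (for example, per-channel scaling).

```python
import math

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between two equal-length lists of pixel values."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)

def scale_align(pred, gt):
    """Rescale pred by the scalar s that minimizes sum((s * pred - gt)^2).

    Assumed form of alignment: one global least-squares scale factor.
    """
    denom = sum(p * p for p in pred)
    s = sum(p * g for p, g in zip(pred, gt)) / denom if denom > 0 else 1.0
    return [s * p for p in pred]

gt = [0.2, 0.4, 0.6, 0.8]
pred = [0.5 * g for g in gt]              # correct texture, wrong overall brightness
unaligned = psnr(pred, gt)                # penalizes the brightness error
aligned = psnr(scale_align(pred, gt), gt) # ignores a global brightness error
```

Reporting both variants separates errors in texture detail from errors in overall albedo scale, which a single image cannot fully disambiguate from lighting intensity.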

Qualitative Results

Qualitative comparisons on original camera view.
Qualitative comparisons on the original camera view on 3D-FRONT scenes. “PSDR-Room*” indicates the PSDR-Room baseline using meshes segmented by PartField. Superscripts denote which material channels are estimated by MonoIR methods. MonoIR methods (last 3 columns) recover diverse material textures and avoid the local minima issue seen in DiffProcMat (2nd and 3rd columns).
Novel-view albedo on unobserved regions.
Qualitative comparison of aligned albedo for novel view renderings on 3D-FRONT scenes. Darkened regions are previously observed regions. TEXGen generates reasonable textures for unobserved regions.

Limitations

Mirror region failures. All baselines struggle with mirrors. In this scene from the Bitterli dataset, for the mirror on the closet, DiffProcMat assigns a median color and MonoIR produces a “baked in” appearance.
TEXGen noise issues on featureless regions.
TEXGen noise issues. The darkened regions are previously observed regions. TEXGen generates noise in featureless regions (e.g., ceilings).

BibTeX

@inproceedings{yang2026mateval,
  title     = {{MatEval}: Evaluating Indoor {3D} Scene Material Recovery from a Single Image},
  author    = {Yang, Dongchen and Savva, Manolis},
  booktitle = {Proceedings of the Conference on Robots and Vision (CRV)},
  year      = {2026}
}
