Architecture, Design, and Creativity @ ICCV 2023

Some of the latest works in computer vision related to architecture, design, and creativity, presented at the International Conference on Computer Vision in Paris last week.

Oct 09, 2023

Midjourney generating art, houses, and urban scenes. Seemingly good, but does it really carry meaning? Do we need ‘smart’ machines that generate designs for us? How can we truly complement architecture, design, and creativity with the surge of large models? What should we ask (of) these models?

This short post summarizes some of the latest computer vision works and insights relating to architecture, design, and/or creativity that could be found at last week’s International Conference on Computer Vision (ICCV), held in Paris, Oct 2–6, 2023. The post is split in two: a summary of our own ICCV workshop and a short list of ICCV papers that caught my eye.

1st Computer Vision Aided Architectural Design Workshop

Together with an awesome team of scholars from Delft and Stanford, we organized the 1st Workshop on Computer Vision Aided Architectural Design.

Team: Seyran Khademi (Delft), Fatemeh Mostafavi (Delft), Jan van Gemert (Delft), Iro Armeni (Stanford), Michael Franzen (Archilyse), Matthias Standfest (Archilyse)

Bridging the Gap

An important goal of bringing together enthusiasts in this cross-disciplinary field is to create a more effective dialogue between the (architectural) design discipline, whether scholars or practitioners, and computer science. Such a dialogue is critical for understanding which directions within computer science will ultimately complement architecture, design, or creativity in a meaningful manner. Key questions here include: how to set up and evaluate certain (potentially to-be-automated) tasks; which tools could be fruitful in design; what the limitations of the trained mathematical models are and to what extent they are transparent (e.g. in terms of justification: why do they work, or why not?); and which datasets are missing and how to clean them up and make them trustworthy. To a large extent, these questions can only be critically investigated when designers and computer scientists sit together and cooperate. Hence, the workshop was also set up to rethink the steps toward a more profound interface between design, creativity, and computer science.

I. Talks

The workshop was a mix of several things. We had, first of all, a rich and diverse set of amazing speakers (see website for more details on the speakers):

Noah Snavely, Cornell Tech and Google Research.
Noah’s talk was mostly about 3D understanding and depicting scenes from images, and his endeavor to create “a digital replica of all of the world’s built-environment”.
Daniel Aliaga, Purdue University.
Daniel’s talk shed light on the many research projects he and his team have carried out on, for example, generative modelling of urban scenes (mostly at the city scale).
Francis Engelmann, ETH Zurich.
Francis’ talk was about the different representations and scales in 3D scene understanding.
Matthias Standfest, Founder of Archilyse A.G.
Matthias’ talk was about the development of the most feature-rich architectural dataset that exists at this moment. Swiss Dwellings is the first large-scale dataset that contains architectural data at the level of the building.

II. Competition

Another part of the mix was a free-to-the-public competition on Floor Plan Auto-completion at Scale. We hosted the competition online (on CodaLab) and developed our own dataset for it, called Modified Swiss Dwellings (MSD). MSD is the first machine learning dataset that can be easily used for floor plan generation at the building level. (Until now, even though floor plan generation has been a hot topic among computer scientists, floor plan generative models have been developed for single-unit apartments only, simply because large-scale floor plan datasets contained nothing but single-unit apartments. “Simply” might be a bit misleading here, as it remains unknown whether the state-of-the-art statistical models generalize well to floor plans that consist of multiple apartments.) MSD contains 4500+ well-curated and consistently formatted medium- to large-scale floor plans, for which image, geometry, and graph representations are made public. (Floor plans range from having one to about twenty apartments.) See our GitHub for how to load, use, and manipulate the different data types. We had several amazing contenders, and the results of the workshop will become available on our website soon!

III. Papers and Poster

To accommodate a rich set of contributors to our workshop, we included a paper track and a poster panel as well. The papers accepted to our workshop (which will be part of the official ICCVW proceedings as well!) are as follows:

Scalable MAV Indoor Reconstruction with Neural Implicit Surfaces
PanoStyle: Semantic, Geometry-Aware, and Shading Independent Photorealistic Style Transfer for Indoor Panoramic Scenes
MARL: Multi-scale Archetype Representation Learning for Building Energy Estimation
SSIG: A Visually-Guided Graph Edit Distance for Floor Plan Similarity (our own)
Floor Plan Reconstruction from Sparse Views: Combining Graph Neural Network with Constrained Diffusion

(Missing links will become available soon.)

The above-mentioned articles were presented through posters at the final stage of the workshop. Besides the accepted papers, several other works (from the ICCV main proceedings and outside ICCV) were presented as well:

Doppelgangers: Learning to Disambiguate Images of Similar Structures
SGAligner: 3D Scene Alignment with Scene Graphs
GlobalMapper: Arbitrary-Shaped Urban Layout Generation
Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer
Carbon Image Project
A Taxonomy of Visual Data in Architectural Design

Papers

Below, I highlight some of the ICCV conference papers that caught my attention the most. For now, I have only added comments to the first paper. In due time, I will add comments to the others as well.

—

Re:PolyWorld — A Graph Neural Network for Polygonal Scene Parsing
Authors: Stefano Zorzi, Friedrich Fraundorfer

Relevancy

In this paper, the task is to extract precise structure from raster imagery of built-environment-like scenes (whether images of floor plans, satellite images of buildings, or photographs of houses). Some examples are given on the right of the figure below. This task, which goes under the banner of reconstruction or vectorization, is highly relevant for many applications. Most architectural or built-environment data is available in image, video, or point-cloud (or some other ‘raw’) form. For many downstream tasks, however, the structure is what is wanted, such as reconstructed floor plans, building detections, or parsed wireframes (again, see the figure below). Structural representations can be imported directly as geometry into CAD software, or used as a tool for fast analysis, e.g. for research purposes or investigation in general.

Key contribution

The main improvement over other methods seems to lie in how the structure (the set of closed-loop polygons) is represented. The structure of the scene is represented as an ‘extended’ permutation matrix that describes the set of connected polygons as a graph, in which each room effectively becomes a cycle. This representation is leveraged in a smart way in the choice and design of the model and in formulating the objective. The method improves the state of the art on several benchmarks.
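
To make the idea of "polygons as cycles in a permutation matrix" a bit more tangible, here is a minimal sketch in Python. It is my own simplified illustration and not the authors' implementation: it uses a plain permutation matrix (the paper's ‘extended’ variant is not modelled here), the function name `polygons_from_permutation` and the toy vertices are hypothetical, and I assume that a vertex mapped to itself is simply unused.

```python
import numpy as np


def polygons_from_permutation(perm):
    """Decode closed polygons (cycles) from a permutation matrix.

    perm[i, j] == 1 means "vertex i is followed by vertex j".
    Returns a list of polygons, each a list of vertex indices.
    """
    perm = np.asarray(perm)
    n = perm.shape[0]
    # A permutation matrix is square, binary, with one 1 per row and column.
    valid = (
        perm.shape == (n, n)
        and np.isin(perm, (0, 1)).all()
        and (perm.sum(axis=0) == 1).all()
        and (perm.sum(axis=1) == 1).all()
    )
    if not valid:
        raise ValueError("not a permutation matrix")

    successor = perm.argmax(axis=1)  # next vertex for every vertex
    visited = np.zeros(n, dtype=bool)
    polygons = []
    for start in range(n):
        if visited[start]:
            continue
        # Follow successors until we return to an already visited vertex.
        cycle = []
        v = start
        while not visited[v]:
            visited[v] = True
            cycle.append(int(v))
            v = successor[v]
        # Assumption of this sketch: cycles with fewer than three
        # vertices (e.g. a vertex mapped to itself) are not polygons.
        if len(cycle) >= 3:
            polygons.append(cycle)
    return polygons


# Toy example with eight hypothetical vertices:
# a triangle (0, 1, 2), a quadrilateral (3, 4, 5, 6), and an unused vertex 7.
successor = [1, 2, 0, 4, 5, 6, 3, 7]
perm = np.zeros((8, 8), dtype=int)
perm[np.arange(8), successor] = 1

print(polygons_from_permutation(perm))
```

In this toy setting, each row of the matrix answers one question ("which vertex comes next?"), so a whole set of closed polygons fits into a single matrix that a network can predict in one go.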

—

GlueStick: Robust Image Matching by Sticking Points and Lines Together
Rémi Pautrat, Iago Suárez, Yifan Yu, Marc Pollefeys, Viktor Larsson

—

Doppelgangers: Learning to Disambiguate Images of Similar Structures
Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely

—

SGAligner: 3D Scene Alignment with Scene Graphs
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni

—

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

—

GlobalMapper: Arbitrary-Shaped Urban Layout Generation
Liu He, Daniel Aliaga

—

EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
Gabriele Berton, Gabriele Trivigno, Barbara Caputo, Carlo Masone

—

Global Features are All You Need for Image Retrieval and Reranking
Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, André Araujo, Bingyi Cao

—

Diffusion Model as Representation Learner
Xingyi Yang, Xinchao Wang

—

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

—

I hope you enjoyed reading it.

Cheers,
Casper
