I have a potential project for my e-commerce where I want to allow users to upload images of their house exteriors and impaint awnings.
2) If these are reasonable, a WebGPU demo would be great..
https://characterdesignreferences.com/artist-of-the-week-3/m...
Edit: I think I found it https://huggingface.co/hustvl/Moebius
I have an example of interior decorating inpainting where I replaced a large floor-to-ceiling window with a mirror, and the result was pretty impressive using NB Pro from nearly a year ago.
Locally hostable? For my money I'd argue Flux.2 Klein but Qwen-Edit still puts in the work.
(I'm counting only times I used generative editing options in my Galaxy phone - if I were to take your question literally, it would be "at least once every other day", simply due to rotating and cropping.)
I do agree, however, that the Flux2 family is the SoTA at the moment. Running locally via something like Comfy gets incredible results.
So you're saying that, if I can calculate from the picture the position (height, inclination and such), and I can render the model (should be doable) for that height and angle, my best course of action could be to combine original + render and only at the end use a visual model? That could be interesting.
What art?
We’re talking about generated pictures, aka slop, not art made by a real human.
And I don’t know if you’ve been paying attention but people seem to be pretty tired of the slop. I don’t think it would be appreciated nearly as much as you think.
People are tired of marketing. AI generated slop people are annoyed with, is garbage produced for marketing reasons, and it's distinctly noticeable precisely because all the bottom-feeder marketing houses switched to using it. But it's not the AI itself that's the problem here. Slop was here before, but it was made with cheap protein-based image generators. Silicon-based generators are just cheaper.
> People are tired of marketing.
You know what, I'll give you that one. I find most generated art pretty tasteless, but I have enjoyed the occasional piece of fiction with small generated elements for atmosphere. I still hesitate to call it 'art', but I will grant it's not all 'slop'.
But for the second part:
> But it's not the AI itself that's the problem here. Slop was here before, but it was made with cheap protein-based image generators. Silicon-based generators are just cheaper.
I think the problem is how much cheaper it is now. I would estimate generating a picture is at least 2 orders of magnitude cheaper than paying even a cheap human, so with the same amount of money being invested into slop we are due for - and seeing - a huge tidal wave of it, because the same amount of money turns out way more crap now.

(*) Equal Contribution, (†) Project Leader, (📧) Corresponding Author.
1Huazhong University of Science and Technology 2VIVO AI Lab
While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2\% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting.

Overall pipeline of Moebius. We adopt the Latent Diffusion Model (LDM) framework equipped with Latent Categories Guidance (LCG). To achieve extreme architectural efficiency, the denoising U-Net is systematically restructured using our proposed LλM I blocks (detailed in Sec. 3.2). Furthermore, an adaptive multi-granularity distillation strategy (Sec. 3.3) is applied during training to align our lightweight specialist with the high-capacity teacher, successfully mitigating the capacity drop caused by extreme structural compression.










































































@misc{DuanAndXu2026Moebius,
title={Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance},
author={Kangsheng Duan and Ziyang Xu and Wenyu Liu and Xiaohu Ruan and Xiaoxin Chen and Xinggang Wang},
year={2026},
eprint={2606.19195},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.19195},
}