Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Published: August 11, 2025

Note: this is the English version paired with the Chinese post Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation.

Abstract

Visual effects (VFX) are central to modern video and film production. Although recent video generation models enable low-cost VFX creation, they are typically trained with single-effect LoRA adapters and therefore cannot produce multiple effects at user-specified locations. To address cross-effect interference and the lack of spatial controllability in joint multi-VFX training, we propose Omni-Effects, the first unified framework for prompt-driven and spatially controllable composite VFX generation. The core design includes: 1) a LoRA-based Mixture-of-Experts (LoRA-MoE) module that integrates diverse effects in a single model while alleviating inter-task interference; and 2) a Spatial-Aware Prompt (SAP) module that injects spatial masks into text tokens for precise spatial control, equipped with an Independent-Information Flow (IIF) submodule to isolate control signals of different effects and avoid unwanted blending. We further construct the Omni-VFX dataset and a dedicated VFX evaluation protocol. Extensive experiments demonstrate that Omni-Effects achieves accurate spatial control and diverse, high-quality effects, supporting user-defined effect types and locations.

Introduction

Briefly introduce:

the role and cost of traditional VFX production;
limitations of single-effect LoRA-based methods for real-world workflows;
the need for unified, spatially-controllable multi-effect generation.

Method

1. Problem Definition

Formulate unified VFX generation with:

multiple effect types;
user-specified spatial regions;
quality, independence and controllability requirements.
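
As one illustration of the independence requirement, the sketch below builds an attention mask in which the condition tokens of each effect can only see their own group, so a change to one effect's condition cannot leak into another's. It is a minimal NumPy sketch and not the paper's implementation: the token layout (video tokens first, then one group of condition tokens per effect), the rule that video tokens may attend to every token, and the names `build_independent_flow_mask` and `masked_attention` are all assumptions made for illustration.

```python
import numpy as np


def build_independent_flow_mask(num_video_tokens, cond_token_counts):
    """Return a boolean mask where mask[q, k] is True if query q may attend to key k.

    Assumed layout: [video tokens] + [condition tokens of effect 0] + [effect 1] + ...
    """
    total = num_video_tokens + sum(cond_token_counts)
    mask = np.zeros((total, total), dtype=bool)
    # Assumption: video tokens read from every token, including all conditions.
    mask[:num_video_tokens, :] = True
    start = num_video_tokens
    for count in cond_token_counts:
        end = start + count
        # Each condition group attends only to itself, never to other groups.
        mask[start:end, start:end] = True
        start = end
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least its own group, so the row maximum is finite.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Hypothetical sizes: 4 video tokens and two effects with 2 and 3 condition tokens.
mask = build_independent_flow_mask(4, [2, 3])
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(9, 8)) for _ in range(3))
out = masked_attention(q, k, v, mask)
```

With this mask, perturbing the keys and values of the second effect leaves the outputs at the first effect's condition positions unchanged, while the video tokens still receive both signals.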

2. Approach

Describe:

the LoRA-MoE module: expert design, routing / combination strategy and how it reduces cross-effect interference;
the SAP module: how spatial masks are embedded into prompts;
the IIF design: how information flow is separated across effects.
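
To make the LoRA-MoE idea concrete, here is a minimal NumPy sketch of a linear layer with a frozen base weight and several low-rank experts mixed by a per-token router. The post does not specify the routing or combination strategy, so the dense softmax gating, the zero initialization of the up-projection, the `alpha / rank` scaling, and the class name `LoRAMoELinear` are all assumptions chosen for illustration, not the authors' design.

```python
import numpy as np


class LoRAMoELinear:
    """Frozen linear layer plus a softmax-gated mixture of LoRA experts (sketch)."""

    def __init__(self, d_in, d_out, num_experts, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(0.0, 0.02, (d_in, d_out))   # frozen base weight
        self.router = rng.normal(0.0, 0.02, (d_in, num_experts))
        self.lora_a = rng.normal(0.0, 0.02, (num_experts, d_in, rank))
        # Zero init: before any training the layer equals the frozen base layer.
        self.lora_b = np.zeros((num_experts, rank, d_out))
        self.scale = alpha / rank

    def gate(self, x):
        """Per-token softmax over experts, shape (tokens, experts)."""
        logits = x @ self.router
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=-1, keepdims=True)

    def __call__(self, x):
        gates = self.gate(x)
        # Down-project with every expert: (tokens, experts, rank).
        low = np.einsum("td,edr->ter", x, self.lora_a)
        # Up-project with every expert: (tokens, experts, d_out).
        delta = np.einsum("ter,ero->teo", low, self.lora_b)
        # Weight each expert's update by its gate and sum over experts.
        mixed = np.einsum("te,teo->to", gates, delta)
        return x @ self.weight + self.scale * mixed


# Hypothetical sizes: 5 tokens, 8 input features, 6 output features, 3 experts.
layer = LoRAMoELinear(d_in=8, d_out=6, num_experts=3, rank=2)
tokens = np.random.default_rng(1).normal(size=(5, 8))
output = layer(tokens)
```

Because each expert owns its own low-rank update and the router decides how much of each update a token receives, different effects can in principle be absorbed by different experts instead of competing for a single shared adapter.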

3. Data and Training

Summarize the Omni-VFX dataset construction pipeline and the training setup for the unified model.

4. Results and Analysis

Highlight:

single-effect quality vs. single-LoRA baselines;
spatial accuracy compared with existing editing / generation methods;
independence of multiple effects on the same frame.

Conclusion and Future Work

Summarize the contributions and outline:

extension to higher-resolution and production-grade VFX;
better temporal modeling for long videos;
interactive tools built on top of Omni-Effects.

References

Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation (arXiv)

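[AAAI 2026] Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Personal Summary

This project took a long time and the workload was fairly heavy. Here is how it evolved. The initial idea was simply to build a unified model that could generate several effects in one video, but it turned out that some effects are hard to reconcile with each other: some act at the object level (for example, an object disappearing or exploding), while others act on the whole frame (for example, heavy snowfall or a "flowery world" effect). At the time, nobody seemed to have worked on controllable, coordinated multi-effect composition, so I kept going down this path. I started with a ControlNet-style design, but it adds considerable compute. I later came across EasyControl and felt that implementing the masking at the attention level would work better, although in the end I personally did not see much difference in the results.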

Jasper (Jintao Chen)
