Generative image compression leverages the generative capabilities of diffusion models to achieve excellent perceptual fidelity at extremely low bitrates. However, existing methods overlook the non-uniform complexity of images, making it difficult to balance global perceptual quality with local texture consistency and to allocate coding resources efficiently. To address this issue, this paper proposes the Map-guided Masked Realistic Image Diffusion Codec (MRIDC), which optimizes the trade-off between local distortion and global perceptual quality in extremely low bitrate compression. MRIDC pairs a vector-quantized image encoder with a diffusion-based decoder. At the encoding stage, the Map-guided Latent Masking (MLM) module adaptively allocates coding resources according to image complexity. At the decoding stage, the Bidirectional Prediction Controllable Generation (BPCG) module completes the masked latent variables and reconstructs the image. Experimental results demonstrate that MRIDC achieves state-of-the-art (SOTA) perceptual compression quality at extremely low bitrates, effectively preserving feature consistency in key regions, advancing the perceptual rate-distortion curve, and establishing a new benchmark for balancing compression efficiency and visual fidelity.