Abstract

The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controllable Diffusion Transformer methods incur significant parameter and computational overhead and allocate resources inefficiently, because they do not account for the varying relevance of control information across transformer layers. To address this, we propose RelaCtrl, a Relevance-Guided Efficient Controllable Generation framework that integrates control signals into the Diffusion Transformer in an efficient, resource-optimized way. First, we evaluate the relevance of each layer in the Diffusion Transformer to the control information by assessing its "removal influence", i.e., the impact of skipping each control layer on both generation quality and control effectiveness during inference. Based on the strength of this relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computation. Additionally, to further improve efficiency, we replace the self-attention and MLP in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), which efficiently implements both the token mixer and the channel mixer. Qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity of PixArt-δ.
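To make the "removal influence" idea concrete, the sketch below ranks layers by how much skipping each control layer degrades two error metrics. It is only an illustration under stated assumptions: the `Metrics` fields, the relative-degradation formula, the `quality_weight` parameter, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    # Both fields are "lower is better" errors (hypothetical stand-ins for
    # a generation-quality metric and a control-fidelity metric).
    quality_error: float
    control_error: float


def removal_influence(baseline: Metrics, skipped: Metrics,
                      quality_weight: float = 0.5) -> float:
    """Weighted relative degradation caused by skipping one control layer.

    Assumed formula for illustration only: relative increase in each error,
    combined with a convex weight.
    """
    dq = (skipped.quality_error - baseline.quality_error) / baseline.quality_error
    dc = (skipped.control_error - baseline.control_error) / baseline.control_error
    return quality_weight * dq + (1.0 - quality_weight) * dc


def rank_layers(baseline: Metrics, per_layer: dict[int, Metrics]) -> list[int]:
    """Return layer indices ordered from most to least relevant."""
    scores = {i: removal_influence(baseline, m) for i, m in per_layer.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up measurements: metrics with all control layers active, and with
# one control layer skipped at a time.
baseline = Metrics(quality_error=10.0, control_error=1.0)
per_layer = {
    0: Metrics(quality_error=12.0, control_error=1.5),
    1: Metrics(quality_error=10.5, control_error=1.0),
    2: Metrics(quality_error=15.0, control_error=2.0),
}
ranking = rank_layers(baseline, per_layer)
```

With these made-up numbers, skipping layer 2 hurts both metrics the most, so it ranks as the most relevant layer, while layer 1 barely changes anything and ranks last.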

Some Visualizations of RelaCtrl

Method

The overall architecture of RelaCtrl. Control block locations are prioritized by ControlNet Relevance Score, ranked from highest to lowest. The direct duplication of the main branch used in the original ControlNet is replaced with the carefully designed Relevance-Guided Lightweight Control Block. Additionally, the Two-Dimensional Shuffle Mixer reduces model parameters and computational overhead while preserving performance.
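As a rough illustration of relevance-guided placement, the sketch below picks the highest-scoring layers as control block locations and shrinks each block's hidden width in proportion to its score. The function name, the proportional scaling rule, the `min_ratio` floor, and every number are assumptions made for this example; the actual allocation strategy in RelaCtrl may differ.

```python
def allocate_control_blocks(relevance: dict[int, float], num_blocks: int,
                            base_width: int, min_ratio: float = 0.25) -> dict[int, int]:
    """Map the top-`num_blocks` layers to a hidden width for their control block.

    Hypothetical rule: width scales with the layer's relevance relative to the
    most relevant layer, and never drops below `min_ratio * base_width`.
    """
    ranked = sorted(relevance, key=relevance.get, reverse=True)[:num_blocks]
    if not ranked:
        return {}
    top = relevance[ranked[0]]
    plan = {}
    for layer in ranked:
        # Fall back to the minimum ratio if no layer has positive relevance.
        ratio = max(min_ratio, relevance[layer] / top) if top > 0 else min_ratio
        plan[layer] = int(round(base_width * ratio))
    return plan


# Made-up relevance scores for four layers; 1152 is an arbitrary example width.
scores = {0: 0.35, 1: 0.025, 2: 0.75, 3: 0.6}
plan = allocate_control_blocks(scores, num_blocks=3, base_width=1152)
```

Here the least relevant layer (index 1) receives no control block at all, and the remaining blocks get progressively smaller widths, which is one simple way to express "fewer parameters where control information matters less".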

Results of Varying Conditions

Compared with SOTA Methods

Quantitative Results Compared with SOTA Methods

Efficiency Compared with SOTA Methods

More Visualizations of RelaCtrl

Canny to Realistic

The woman in the image is smiling warmly, wearing a fur-lined coat, with voluminous, curly hair.

A solitary, leafless tree stands in a vast desert with a glowing sunset behind it, casting long shadows.

A festive scene with decorated cupcakes and gingerbread houses, set against a warm, cozy Christmas tree background.

A bright, modern dining room with a wooden table, elegant chairs, a bouquet of flowers, and large windows.

Depth to Realistic

A cozy dining area with a table set for two, featuring a bouquet of red flowers and lush plants by large windows.

A young woman with short, platinum blonde hair and a casual expression, wearing a denim jacket in soft lighting.

A close-up of a man with a strong beard, intense eyes, and slightly tousled dark hair, exuding confidence.

A couple stands hand in hand, facing the sun in a serene, golden-hued landscape, surrounded by trees and nature.

HED to Realistic

A couple walks down a sunlit dirt path, surrounded by lush trees, bathed in the warm glow of the setting sun.

A happy golden retriever stands in a grassy field, panting with a joyful expression, surrounded by lush greenery.

A majestic snow-capped mountain range under dramatic skies, with sharp peaks and deep valleys, showcasing winter's grandeur.

A bride smiles radiantly, adorned with a floral crown of white roses and a veil, exuding joy on her special day.

Seg to Realistic

This is the same image of a snow-capped mountain range under dramatic skies, with sharp peaks and deep valleys.

This is the same beautiful image of a bride with a floral crown, smiling joyfully on her wedding day.

A man in a formal uniform, with medals on his chest, gazes thoughtfully to the side in a black-and-white portrait.

A hiker stands atop a rocky outcrop, gazing at a vast, misty mountain landscape below, with a winding path visible.

RelaCtrl with Different Community Models

Canny to Paint

A woman dressed in a colorful kimono with floral patterns stands gracefully, surrounded by painted lotus flowers in soft watercolors.

A woman with striking green eyes, wearing a red scarf, stands against a soft watercolor background with vibrant red splashes.

A solitary figure walks through a city street at sunset, with warm light reflecting off the wet pavement and tall buildings.

Depth to Oil

A rugged man with a thick beard and intense expression, dressed in a brown coat, painted in bold, dramatic oil strokes.

A woman with dramatic makeup and red face paint, featuring a star motif on her forehead, set against an intense, dark oil painting background.

A woman with platinum blonde hair and striking blue eyes gazes intensely at the viewer, her features highlighted in an oil painting style.

HED to Gufeng

A serene watercolor painting of a lush landscape, featuring multiple cascading waterfalls surrounded by vibrant green foliage and trees.

A watercolor scene depicting a bustling waterfront with traditional boats and a city skyline featuring a prominent clock tower in the distance.

A watercolor painting of a woman wearing a long coat, standing near a window, with soft tones and a serene expression.

Seg to Pixel

A pixelated depiction of a scenic town with castles, a bridge, and boats on a river, creating a nostalgic, retro vibe.

A pixelated view of a grand palace with golden domes, reflecting in the water, surrounded by lush greenery and flying birds.

A pixelated winter scene featuring a church surrounded by snow-covered trees, with towering mountains in the background.