IC-LoRA (In-Context LoRA) is the most significant architectural addition in LTX Video 2.3 for character and face consistency work. If you have trained LoRAs for LTX 2 and are upgrading to 2.3, understanding IC-LoRA is important — it changes both training and inference for identity-focused use cases.
This guide covers what IC-LoRA is, how it differs from standard LoRAs, which use cases it solves, and how to train and run one. It also covers when standard LoRAs remain the better choice and when the complexity of IC-LoRA training is unnecessary.
What IC-LoRA Actually Is
Standard LoRAs encode a style, motion pattern, or identity entirely through weight adjustments applied at every generation step. The model learns to "recall" the trained concept from the text trigger word. Identity consistency depends entirely on how strongly the training data stamped the concept into the weights.
IC-LoRA takes a different approach. IC stands for In-Context — instead of relying on trigger words alone, the LoRA learns to condition generation on a reference image provided at inference time. You provide a source image of the character or face at inference, and the model uses that image as a live conditioning signal throughout the generation, not just as a text-encoded memory.
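To make the distinction concrete, here is a minimal sketch in Python. The generate stub and its parameter names are illustrative, not the actual LTX 2.3 API; the point is where the identity signal enters.

```python
from typing import Optional

def generate(prompt: str, num_frames: int, reference_image: Optional[str] = None):
    """Stand-in for an LTX 2.3 inference call; the real API will differ."""
    ...

# Standard LoRA: identity lives in the trained weights and is recalled
# through a trigger word in the text prompt.
generate(prompt="ohwx_character walking through a rainy street", num_frames=121)

# IC-LoRA: a reference image is passed as a live conditioning signal,
# so identity does not depend on the trigger word alone.
generate(
    prompt="the character walking through a rainy street",
    num_frames=121,
    reference_image="character_ref.png",
)
```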
The result for character and face work is substantially better visual consistency than trigger-word-only LoRAs, especially across longer clips and more varied motion sequences.
The Two IC-LoRA Variants in LTX 2.3
LTX 2.3 ships with two official IC-LoRA control variants, both published on Hugging Face:
LTX-2.3-22b-IC-LoRA-Union-Control — General-purpose identity and style conditioning. The most flexible variant. Takes a single reference image and maintains visual consistency throughout the generated clip. Best for: character LoRAs where you want to preserve face, body, and clothing across varied motion.
LTX-2.3-22b-IC-LoRA-Motion-Track-Control — Adds motion trajectory conditioning on top of identity control. Takes a reference image plus motion tracking data. Best for: controlled character animation where you want to direct how the character moves in addition to what they look like.
For most character and face work, Union Control is the starting point. Motion-Track Control adds complexity that is only worth the investment if precise motion trajectory control is a production requirement.
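If you are fetching either variant yourself, the huggingface_hub library is the usual route. The repo id and filename below are assumptions based on the model names above; verify both against the official model cards before relying on them.

```python
from huggingface_hub import hf_hub_download

# Assumed org and filename; check the official LTX 2.3 model card for exact paths.
lora_path = hf_hub_download(
    repo_id="Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control",
    filename="ltx-2.3-22b-ic-lora-union-control.safetensors",
)
print(lora_path)  # local cache path to the downloaded weights
```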
IC-LoRA vs. Standard LoRA: When to Use Which
Standard LoRAs remain the right choice for style LoRAs (a visual aesthetic not tied to a specific person), motion LoRAs (a type of movement pattern), product LoRAs (a specific object's appearance), and world LoRAs (an environment or setting). For these categories, trigger-word conditioning works well and IC-LoRA adds unnecessary training complexity.
IC-LoRA is worth the extra training investment when your use case requires visual identity consistency for a specific person or character across multiple generations. The specific triggers: face consistency that holds across clips longer than 5 seconds, character animations where the subject must stay recognizable through varied motion sequences, and multi-cut scenes that require a consistent character appearance.
Training an IC-LoRA: Technical Requirements
IC-LoRA training uses the same LTX 2.3 trainer as standard LoRAs but with IC-LoRA-specific configuration. The same hard constraints apply:
Frame counts must follow the 8n+1 rule: valid values are 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121. Resolution must be divisible by 32. These are architectural requirements enforced by the model — they are not configurable.
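Both constraints are easy to check up front. A minimal validation helper, assuming only the limits listed above:

```python
VALID_FRAME_COUNTS = range(1, 122, 8)  # 1, 9, 17, ..., 121

def validate_clip(num_frames: int, width: int, height: int) -> None:
    """Check LTX 2.3's hard architectural constraints before a run."""
    if num_frames not in VALID_FRAME_COUNTS:
        raise ValueError(f"frame count {num_frames} is not 8n+1 (max 121)")
    if width % 32 or height % 32:
        raise ValueError(f"resolution {width}x{height} is not divisible by 32")

validate_clip(num_frames=121, width=768, height=512)  # passes silently
```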
For IC-LoRA specifically, training data quality matters more than for standard LoRAs because the model needs to learn to condition on visual features accurately. Recommendations from WaveSpeedAI and fal.ai documentation:
Dataset: 15-40 video clips for a face or character IC-LoRA. Include varied lighting conditions, multiple angles (front, 3/4, profile), and varied expressions. A smaller dataset of clean, unobstructed frames of a consistent subject outperforms a larger one where the identity is noisy or inconsistent.
Parameters: Rank 32, learning rate 1e-4, alpha equal to rank. These are the standard defaults and work well for IC-LoRA training; run a baseline with them before adjusting anything. A configuration sketch follows this list.
Checkpoint selection: Sample at checkpoint 500. If identity consistency looks strong, stop there. Pushing to 750-1000 can sharpen fine details but risks overfitting, especially on small datasets.
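Taken together, this guidance maps onto a small training configuration. The key names below are hypothetical, since trainers such as WaveSpeedAI's and fal.ai's each have their own schema, but the values mirror the recommendations above.

```python
# Illustrative IC-LoRA training config. Key names are hypothetical and vary
# by trainer (WaveSpeedAI, fal.ai); the values mirror the guidance above.
ic_lora_config = {
    "base_model": "ltx-video-2.3",
    "lora": {
        "rank": 32,
        "alpha": 32,  # alpha equal to rank
    },
    "learning_rate": 1e-4,
    "dataset": {
        "clips_dir": "data/face_clips/",  # 15-40 clips, varied angles and lighting
        "num_frames": 81,                 # must satisfy the 8n+1 rule
        "resolution": [768, 512],         # both dimensions divisible by 32
    },
    "checkpointing": {
        "save_every": 250,
        "sample_at": [500, 750, 1000],  # evaluate 500 first; push on only if it underfits
    },
}
```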
Using IC-LoRA at Inference Time
IC-LoRA inference requires providing a reference image alongside your text prompt. The reference image is the conditioning signal — it should be a high-quality, well-lit image of the character or face you want to appear in the generated video.
The model uses the reference image to maintain visual consistency throughout the clip. Identity consistency holds well for 5-20 second generations, which maps neatly to LTX 2.3's practical generation range. For clips beyond 20 seconds, drift accumulates — the character's features may shift subtly between the beginning and end of the sequence.
Practical inference tip: use a reference image with neutral expression and good facial visibility. Strong directional lighting, extreme angles, or heavy shadows in the reference image degrade consistency. A clean 3/4 face shot in diffuse lighting gives the model the strongest conditioning signal.
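Since reference quality is the dominant inference-time variable, it can be worth screening images before a run. A rough preflight sketch using Pillow; the resolution and luminance thresholds are illustrative heuristics, not documented LTX 2.3 limits:

```python
from PIL import Image, ImageStat

def preflight_reference(path: str, min_side: int = 512) -> None:
    """Heuristic quality checks for an IC-LoRA reference image.

    Thresholds are illustrative, not documented LTX 2.3 limits.
    """
    img = Image.open(path)
    if min(img.size) < min_side:
        raise ValueError(
            f"reference is {img.size[0]}x{img.size[1]}; "
            f"use at least {min_side}px on the short side"
        )
    # Mean luminance as a crude exposure check: very dark or blown-out
    # references give the model a weak conditioning signal.
    luminance = ImageStat.Stat(img.convert("L")).mean[0]
    if not 60 <= luminance <= 200:
        raise ValueError(
            f"mean luminance {luminance:.0f}/255 suggests lighting too dark "
            "or too harsh for reliable conditioning"
        )

preflight_reference("character_ref.png")  # raises if the image looks risky
```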
Training IC-LoRAs Without Code: Grix Face Recipe
If you want to train an IC-LoRA without managing training configuration directly, the Grix LoRA Trainer Face recipe handles IC-LoRA configuration automatically.
The Face recipe pre-configures the training for identity-consistent character generation: IC-LoRA architecture, optimal rank and learning rate defaults, dataset preparation guidance, and checkpoint selection recommendations. You upload your source clips, select Face from the recipe menu, and the trainer handles the rest.
The trained LoRA exports as a standard .safetensors file compatible with any LTX 2.3 inference endpoint. You can test it immediately in the Grix LoRA Studio without setting up a separate inference environment.
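Because the export is a plain .safetensors file, you can sanity-check it with the safetensors library before deploying. Tensor naming conventions vary by trainer, so the PEFT-style "lora_A" key pattern below is an assumption about the export format:

```python
from safetensors import safe_open

# Inspect the exported LoRA before deploying it.
with safe_open("face_ic_lora.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        if "lora_A" in key:  # the down-projection's first dim is the rank
            print(f"{key}: rank {f.get_tensor(key).shape[0]}")
            break
```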
For users who want full parameter control — custom rank, adjusted learning rate, specific checkpoint interval — the API-first path via WaveSpeedAI or the official fal.ai trainer gives direct access to every training variable.
FAQ
Can I convert an LTX 2 face LoRA to IC-LoRA for 2.3?
No. LTX 2 LoRAs are incompatible with LTX 2.3 due to the rebuilt VAE. IC-LoRA for 2.3 requires training from scratch on LTX 2.3 specifically. The silver lining: IC-LoRA typically requires fewer training clips than standard LoRAs for equivalent face consistency results.
How long does IC-LoRA training take?
On cloud GPU infrastructure (A100 or H100), a 20-30 clip face IC-LoRA trains in approximately 35-50 minutes. Local training on RTX 4090 hardware runs 90-120 minutes. Training time scales with dataset size and the number of steps — stopping at checkpoint 500 is often sufficient.
How many clips do I need for a face IC-LoRA?
15-25 clips for a single subject with good coverage of angles and lighting is a practical starting point. More clips help only if they add genuine coverage diversity — duplicate angles or similar lighting conditions add training time without improving consistency.
Does IC-LoRA work for non-human subjects?
Yes. IC-LoRA works for any subject requiring visual identity consistency — animals, creatures, specific vehicles or objects, stylized characters. The reference image conditioning is not face-specific; it works on any visual identity the model can encode.
What is the difference between Union Control and Motion-Track Control?
Union Control conditions generation on identity (what the character looks like). Motion-Track Control adds motion trajectory conditioning (how the character moves). Start with Union Control — it covers the majority of character consistency use cases. Motion-Track Control adds complexity that is only worth the investment for productions requiring precise motion direction alongside identity consistency.