RLHF and fine-tuning corrections designed to improve diversity representation in AI-generated imagery overfit, causing the model to apply them universally, regardless of historical or cultural context. The correction for one bias introduced a different factual inaccuracy.
Google paused people image generation in Gemini for approximately six weeks. The incident became a prominent example of how bias corrections in AI training can produce unexpected and opposite failures, and it accelerated discussion of AI training correction methodology and contextual appropriateness.
The Incident
In February 2024, users discovered that Google's Gemini image generation model was producing historically inaccurate images. When asked to generate images of "1943 German soldiers," it produced images of diverse groups of people including Black soldiers and women in German military uniforms. When asked to generate images of "the US founding fathers," it produced a racially diverse group.
The model was applying diversity corrections universally, generating varied racial representations in every context, without considering whether those corrections were contextually appropriate or historically accurate. Corrections designed to reduce the under-representation of non-white subjects in contemporary imagery were being applied to historical contexts where factual accuracy demanded a specific representation.
Google paused Gemini's ability to generate images of people on February 22, 2024. The pause lasted approximately six weeks.
Why It Happened
This is not a simple story about AI bias. The incident is specifically about RLHF overcorrection: a targeted training intervention (increase diversity in AI-generated imagery) produced an unintended consequence (universal diversity application regardless of context).
The underlying correction was legitimate. AI image generation systems had documented tendencies to under-represent non-white subjects in contemporary imagery, defaulting to white subjects when race was unspecified. The correction for that tendency, training the model to generate more diverse subjects, was reasonable. The failure was one of scope: the correction was applied to historical contexts where factual accuracy required a specific representation rather than a corrected one.
This is Temporal Coupling (Law V) at the training layer: the training intervention that produced correct behavior in 2024 contexts produced incorrect behavior in 1943 contexts. The correction was context-dependent. The training was not.
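The scope failure can be made concrete with a toy sketch. Google has not published how the correction was implemented; the version below, a prompt-rewriting layer that appends a diversity instruction, is purely an illustrative assumption, and every name and the regex-based "context detector" are invented for the example. The point is the structural difference between a correction that fires unconditionally and one that carries its own contextual constraint.

```python
# Hypothetical illustration only: a diversity correction modeled as a
# blanket prompt rewrite. All names and heuristics here are invented;
# this is not Google's actual mechanism.
import re

DIVERSITY_SUFFIX = ", depicting people of diverse ethnicities and genders"

# Crude stand-in for context detection. A real system would need far
# richer signals than a regex, which is part of why scoping is hard.
HISTORICAL_PATTERN = re.compile(
    r"\b(1[0-9]{3}|founding fathers|medieval)\b", re.IGNORECASE
)

def rewrite_unscoped(prompt: str) -> str:
    """The failure mode: the correction fires on every prompt."""
    return prompt + DIVERSITY_SUFFIX

def rewrite_scoped(prompt: str) -> str:
    """The intended scope: skip prompts pinned to a historical setting."""
    if HISTORICAL_PATTERN.search(prompt):
        return prompt  # historical accuracy takes precedence
    return prompt + DIVERSITY_SUFFIX

# The unscoped version appends the suffix even to a 1943 prompt;
# the scoped version leaves historical prompts unchanged while still
# applying the correction to contemporary ones.
print(rewrite_unscoped("1943 German soldiers"))
print(rewrite_scoped("1943 German soldiers"))
print(rewrite_scoped("a team of software engineers"))
```

The sketch also shows why the pause was the honest choice: the gating predicate is the hard part, and a crude one just trades one class of wrong outputs for another.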
The Exhibit
This incident is a case study in correction propagation failure — the same pattern that causes a bug fix in one module to break behavior in another. The training correction was a patch applied to the model's representation policy. The patch did not carry its own contextual constraints. It propagated universally.
The Gemini pause is significant not because the model was malicious or because Google failed to think about bias. It is significant because it demonstrates that bias correction is itself a form of training, subject to the same failure modes as any other training: overfitting, unintended scope, context sensitivity.
The Broader Pattern
Google's response — pausing the feature — was technically honest: "we don't know how to fix this without breaking something else, so we'll stop rather than leave the broken version running." That is the correct operational decision. It is also an explicit admission that bias correction in large models is not a solved problem.
The model didn't set out to introduce bias. It overcorrected for one bias and introduced another. Both are failures. Neither was intentional. That is the exhibit: you can do everything right and still get it wrong if you're correcting in the dark.