From Knowledge to Automation
Wikipedia has long been a space where humans write, edit, and debate knowledge. Yet in 2025, as large language models (LLMs) increasingly mediate between users and information, the visibility of human-created sources like Wikipedia is quietly diminishing. When AI chatbots summarize, paraphrase, or rewrite content originally curated by human editors, readers may never visit the underlying pages. The result is measurable: a decline in human pageviews and, potentially, a decline in the shared engagement that sustains Wikimedia projects.
"We are seeing declines in human pageviews on Wikipedia over the past few months, amounting to a decrease of roughly 8% compared to the same months in 2024."
This trend is not isolated. Social platforms, search engines, AI chatbots, and LLMs now filter and repurpose information that once led readers directly to Wikipedia. While these systems are technically remarkable, they risk reproducing a familiar pattern—automation built on human labor, followed by the gradual erasure of the laborers themselves.
The Problem of Erasure
Erasure is not new. Marginalized communities—particularly LGBTQ+ people—have long experienced it through omission, distortion, or outright censorship. In recent years, generative AI has introduced new forms of this erasure. A Wired investigation in April 2024 revealed that AI-generated depictions of queer people were frequently stereotypical, inaccurate, or entirely absent.
During my presentation Célébrer nos histoires : fiertés, récits et mémoire collective ("Celebrating our stories: pride, narratives, and collective memory"; Nantes, August 2025), I observed similar biases when generating images for the word "amour", the French word for "love". Most systems produced heteronormative imagery by default, reinforcing a narrow and sanitized vision of affection. Even more troublingly, when asked to depict a "gay person" or a "trans man", several major AI models resorted to clichés. The prompt "gay person", for example, often returned the image of a man holding a rainbow flag. I found myself wondering: must queerness always be reduced to a flag? Would anyone expect me to carry one in everyday life?
These are not merely aesthetic issues. They reveal deeper biases embedded in datasets and training pipelines across different stages of AI development:
- Data Collection: Filtering that removes legitimate LGBTQ+ materials alongside genuinely harmful content.
- Tokenization: Fragmentation or neglect of neopronouns and non-binary identifiers because of their statistical rarity (see the sketch after this list).
- Training: Exclusion of documents containing LGBTQ+ terminology and themes.
- Post-Training: Overzealous moderation that flags community discussions of identity or culture as "adult content."
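To make the tokenization stage concrete, here is a minimal sketch in Python. It assumes the open-source tiktoken library and its cl100k_base vocabulary purely as an example; exact splits vary by tokenizer, but rarer identifiers such as neopronouns typically fragment into more subword pieces than common pronouns, weakening the statistical signal a model can learn for them.

```python
import tiktoken  # pip install tiktoken

# Compare how a BPE vocabulary splits common pronouns vs. neopronouns.
# Exact splits depend on the tokenizer; cl100k_base is only one example.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["he", "she", "they", "xe", "xem", "ze", "hir", "faer"]:
    ids = enc.encode(" " + word)  # leading space mimics mid-sentence usage
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word:>5} -> {len(ids)} token(s): {pieces}")
```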
This pipeline is worrying. If queer topics and lived experiences are removed or filtered out at the very first stage of data collection, AI models never have the opportunity to learn about queerness at all. The result is a cycle in which certain identities become unrepresentable, technically filtered out in the name of "safety".
Queering Wikipedia and Defending Representation
In Queering Wikipedia (October 2025), I argued that Wikimedia projects remain among the few digital spaces where queer histories and identities can be documented with nuance, multilingualism, and collective oversight. Unlike proprietary AI datasets, Wikimedia's infrastructure is open, verifiable, and debate-driven. Every edit leaves a trace; every disagreement becomes a public record of how knowledge evolves through human dialogue.
This transparency is crucial. When AI systems remove or distort queer representation, Wikimedia communities can reassert the human layer of context that machines often overlook. Through collective authorship, citation, and discussion, volunteers continue to write us back into history—one entry at a time.
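That trace is also machine-readable. As a small illustration, the public MediaWiki Action API can list any article's revision history; the sketch below fetches the five most recent edits to the English Wikipedia article "LGBT history" (the title is only an example), with timestamp, editor, and edit summary.

```python
import requests

# Sketch: list the five most recent revisions of an article via the
# public MediaWiki Action API. Each revision records who edited, when,
# and the edit summary, making the editorial trace publicly auditable.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "titles": "LGBT history",  # example article; any title works
        "rvprop": "timestamp|user|comment",
        "rvlimit": 5,
        "redirects": 1,
        "format": "json",
    },
    headers={"User-Agent": "edit-trace-sketch/0.1 (example)"},
    timeout=30,
)
resp.raise_for_status()
for page in resp.json()["query"]["pages"].values():
    for rev in page.get("revisions", []):
        print(rev["timestamp"], rev["user"], "-", rev.get("comment", ""))
```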
Expanding LGBTQ+ Content Systematically
To counter digital erasure, communities must move beyond isolated edits. We need systematic strategies to expand LGBTQ+ content and ensure inclusive data models across Wikimedia projects:
For Wikipedia and Sister Projects
- Create comprehensive biographical entries: Document historical and contemporary LGBTQ+ figures across regions and languages.
- Document history, culture, and media: Include archives, organizations, movements, and events often overlooked in mainstream narratives.
- Ensure multilingual coverage: Translate and contextualize content for local Wikimedia editions to foster global accessibility.
For Wikidata and Structured Data Projects
- Develop inclusive data models: Refine how identity is represented in structured data to reflect complexity and fluidity.
- Respect self-identification: Prioritize labels and statements that mirror how people and communities describe themselves.
- Document consensus: Make visible the discussions and decisions that shape identity-related properties and statements (one way to inspect such statements and their references is sketched after this list).
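To ground these points, here is an illustrative sketch, not a proposal for a new model, that uses the public Wikidata Query Service to inspect how identity statements are expressed and referenced today: it lists people with a sex or gender (P21) statement of trans man (Q2449503), together with the reference URL (P854) backing each claim, where one exists.

```python
import requests

# Sketch: ask the public Wikidata SPARQL endpoint for people whose
# "sex or gender" (P21) statement is "trans man" (Q2449503), together
# with the reference URL (P854) backing each statement, when present.
QUERY = """
SELECT ?person ?personLabel ?refURL WHERE {
  ?person p:P21 ?stmt .
  ?stmt ps:P21 wd:Q2449503 .
  OPTIONAL { ?stmt prov:wasDerivedFrom/pr:P854 ?refURL . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "identity-statements-sketch/0.1 (example)"},
    timeout=60,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    label = row["personLabel"]["value"]
    ref = row.get("refURL", {}).get("value", "(no reference URL)")
    print(label, "-", ref)
```

Because each statement can carry its own references, the provenance of an identity claim stays visible and contestable rather than being silently baked in.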
Building a Fairer AI Ecosystem
Wikimedia's role extends beyond knowledge creation—it models ethical data practices. To ensure that open knowledge remains inclusive and verifiable in the age of AI, five principles can guide both Wikimedia and the broader AI ecosystem:
- Mandate diverse data sources: AI models must be trained on datasets reflecting global diversity, not sanitized corpora.
- Create quality metrics: Evaluate inclusivity, linguistic variety, and cultural representation, not just predictive accuracy (a toy illustration follows this list).
- Fund WikiProjects: Support thematic and regional initiatives documenting marginalized histories and identities.
- Establish attribution standards: Require transparent citation and acknowledgment of Wikimedia and other open knowledge sources.
- Support multilingual efforts: Strengthen smaller language Wikipedias as vital nodes of cultural diversity.
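As a deliberately simple illustration of what such a metric could look like, the sketch below computes the share of documents in a corpus sample that mention any term from a small identity vocabulary, plus per-term document frequency. Everything in it, including the term list, is a hypothetical placeholder; real inclusivity evaluation would need curated multilingual vocabularies, context-aware matching, and community review.

```python
from collections import Counter

# Toy sketch of a "representation coverage" metric over a corpus sample.
# The term list is a hypothetical placeholder, not a recommended benchmark.
IDENTITY_TERMS = {"lesbian", "gay", "bisexual", "transgender", "queer",
                  "intersex", "non-binary", "asexual"}

def coverage(documents):
    """Return the share of documents mentioning any identity term,
    and how many documents each term appears in."""
    term_docs = Counter()
    hits = 0
    for doc in documents:
        text = doc.lower()
        present = {t for t in IDENTITY_TERMS if t in text}
        if present:
            hits += 1
            term_docs.update(present)
    return {
        "doc_share": hits / len(documents) if documents else 0.0,
        "term_document_frequency": dict(term_docs),
    }

sample = [
    "A biography of a transgender activist.",
    "An article on medieval architecture.",
    "Notes from a queer film festival archive.",
]
print(coverage(sample))
```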
Document Queerness, Defeat Erasure
The current AI wave risks repeating historical cycles of disappearance. Yet Wikimedia offers a counter-model: a living, participatory archive of human experience. Its strength lies not only in openness, but in its people—the editors, translators, verifiers, and readers who defend representation in every language.
If AI is trained on the sum of human knowledge, then the quality and inclusivity of that knowledge depend on us. The future of queer visibility—and of any marginalized identity—cannot be left to the black boxes of proprietary systems. It must remain in the hands of communities that document with care, challenge bias, and celebrate complexity.
Many of us grew up without seeing ourselves reflected in textbooks, media, or public discourse. Wikimedia communities have the power to change that narrative—not only for ourselves, but for generations to come. To ensure that AI does not erase us again, we must continue to write, to question, and to remember—together.
Slides of the talk are available here.
References
- Wired. Here's How Generative AI Depicts Queer People. April 2024.
- Marshall Miller. Wikipedia's Human Pageviews Decline as AI Grows. Diff, 17 October 2025.
- John Samuel. Célébrer nos histoires : fiertés, récits et mémoire collective. 50e Rencontres LGBTI+, Fédération LGBTI+, Nantes, France, August 2025.