Why AI Safety Requires Multilingual Human Experts

Puneet Kohli | March 9, 2026

Most AI safety research and testing is conducted in English. The majority of red-teaming exercises, preference evaluations, safety benchmarks, and alignment datasets are English-language. This creates a significant and growing blind spot. AI models are deployed globally, to users who speak hundreds of different languages, and the safety risks — harmful outputs, cultural insensitivity, factual errors, manipulation vectors, and bias patterns — vary dramatically across linguistic and cultural contexts. A model that has been thoroughly safety-tested in English may have critical vulnerabilities in other languages that go undiscovered until they cause harm in production.

At A Glance: Multilingual AI Safety

  • AI safety testing is overwhelmingly English-language, creating blind spots in models deployed globally to speakers of hundreds of languages.
  • Harmful content patterns, cultural taboos, manipulation techniques, and bias vectors differ across linguistic and cultural contexts.
  • Effective multilingual safety requires native-speaker experts who understand both the language and the cultural context, not just translation of English safety tests.
  • Finding multilingual safety experts is harder than finding English-language ones. The talent pools are smaller, more distributed, and harder to vet for cultural competence.
  • Organizations deploying AI globally should start with highest-deployment languages and build language-specific safety protocols reflecting local cultural context.

The Language Gap in AI Safety

The imbalance in safety testing is substantial. English is just one of the world’s thousands of languages, yet it dominates AI safety work. Red-teaming datasets are predominantly English. RLHF preference data is predominantly English. Safety benchmarks and evaluations are designed and tested primarily in English.

This matters because safety risks are not language-independent. Harmful content manifests differently across languages. A harmful instruction that is filtered effectively in English might bypass safety measures in Hindi, Arabic, or Swahili because the safety training was conducted primarily in English. Toxic language patterns, hate speech markers, and manipulation techniques vary by language and culture. A phrase that is benign in one language might be deeply offensive in another due to cultural context, historical associations, or semantic nuances that do not translate directly.

The gap extends beyond harmful content. Factual accuracy, cultural appropriateness, and helpfulness all have language-specific dimensions. A model that provides culturally appropriate advice in English might give culturally tone-deaf responses in Japanese, Yoruba, or Portuguese. Multilingual data is becoming a competitive edge not just for capability but for safety and user trust.

Why Translation Is Not Enough

Cultural Context Cannot Be Translated

The most naive approach to multilingual safety is translating English safety tests into other languages. This fails because safety risks are culturally embedded. An English safety test probing for racial bias based on American racial categories will not detect bias patterns relevant in India, Nigeria, or Brazil, where the relevant social categories and historical contexts are different.

Effective safety testing in a language requires understanding the cultural context of that language community: the social tensions, the historical sensitivities, the taboo topics, the common manipulation patterns, and the ways that harmful content is typically expressed. This understanding cannot be obtained through translation. It requires native speakers with deep cultural knowledge.

Linguistic Nuance Requires Native Expertise

Languages have different mechanisms for expressing politeness, formality, indirectness, sarcasm, and implication. A response that is helpfully direct in English might read as rude in Japanese if it omits expected honorifics. A Spanish response might address a user with the informal tú where a professional context calls for the formal usted. These nuances affect how users perceive a model’s safety and trustworthiness.

Safety Attack Vectors Are Language-Specific

Adversarial attacks on AI safety systems exploit language-specific features. Code-switching between languages in a single prompt can bypass safety filters designed for monolingual inputs. Homoglyphs — characters from different scripts that look similar — can be used to evade text-based safety checks. And language-specific euphemisms for harmful content may not be recognized by safety systems trained primarily on English patterns.
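As a concrete illustration, a minimal mixed-script check can surface the homoglyph pattern described above. This is a rough sketch in Python using the standard `unicodedata` module; the character-name prefix is only a crude proxy for a proper Unicode script-property lookup, and `flag_mixed_script` is a hypothetical helper, not part of any existing safety framework:

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Return the rough set of Unicode scripts used by letters in `text`.

    Uses the character-name prefix (e.g. "LATIN", "CYRILLIC") as a proxy;
    a production system would use a real script-property lookup.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0])
    return scripts

def flag_mixed_script(token: str) -> bool:
    """Flag tokens mixing scripts, a common homoglyph-evasion signal."""
    return len(scripts_in(token)) > 1

# A Cyrillic "а" (U+0430) hiding inside an otherwise Latin word:
print(flag_mixed_script("p\u0430ypal"))  # True
print(flag_mixed_script("paypal"))       # False
```

NFKC normalization (`unicodedata.normalize("NFKC", text)`) catches a related class of evasions, such as fullwidth Latin letters, but cross-script confusables like the Cyrillic "а" above survive normalization and need a check like this one.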

What Multilingual Safety Requires

Native-Speaker Domain Experts

Safety evaluation in each language should be conducted by native speakers who also have relevant domain expertise. For medical safety, this means clinicians who practice in the relevant language community. For legal safety, this means legal professionals familiar with the jurisdiction’s laws and norms. For general content safety, this means evaluators with deep cultural knowledge. These are the same qualities that define effective human evaluators who identify biased or harmful outputs — applied across linguistic boundaries.

Language-Specific Red-Team Protocols

Red-teaming for each language should include attack vectors specific to that language’s characteristics: script-specific exploits, language-specific euphemisms for harmful content, culturally specific manipulation techniques, and conversational patterns unique to that language community. Generic red-team protocols translated from English miss these vectors.
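One way to make such protocols concrete is to encode them as per-language configuration rather than prose, so each language's attack surface is explicit and reviewable. The sketch below is purely illustrative; the schema and field names are assumptions, not an existing standard:

```python
from dataclasses import dataclass, field

@dataclass
class RedTeamProtocol:
    """Illustrative per-language red-team protocol schema."""
    language: str                              # BCP 47 tag, e.g. "hi", "ar", "sw"
    scripts: list[str]                         # scripts to probe, e.g. ["Deva", "Latn"]
    euphemism_lexicon: list[str]               # coded terms supplied by native speakers
    code_switch_pairs: list[tuple[str, str]]   # language pairs for mixed-language prompts
    cultural_probes: list[str] = field(default_factory=list)  # locally sensitive topics

hindi = RedTeamProtocol(
    language="hi",
    scripts=["Deva", "Latn"],        # Devanagari plus romanized "Hinglish"
    euphemism_lexicon=[],            # populated by native-speaker experts, not translated
    code_switch_pairs=[("hi", "en")],
)
```

The point of the structure is that every field except `language` must come from native-speaker experts; none of it can be filled in by translating an English protocol.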

Culturally Calibrated Evaluation Rubrics

Evaluation rubrics for safety and quality should be developed by native speakers who understand what constitutes appropriate, helpful, and safe responses in their cultural context. What is considered helpful advice varies. What constitutes an appropriate tone varies. What topics are sensitive varies. Rubrics must reflect these variations.

Ongoing Monitoring, Not Just Pre-Launch Testing

Language-specific safety issues may emerge in production that were not anticipated during pre-launch testing. Continuous monitoring by native-speaker evaluators is necessary to catch emerging problems, new attack patterns, and evolving cultural sensitivities.

The Sourcing Challenge

Finding multilingual safety experts is harder than finding English-language ones. The talent pool for each language is smaller. Cultural competence is harder to assess from outside the culture. And the intersection of domain expertise, language fluency, and safety evaluation skill is a narrow profile. Organizations need partners with genuine global reach and cultural competence. Careerflow’s expert network spans multiple languages and cultural contexts, supporting AI labs that need safety evaluation beyond English. For teams building internally, the principles of recruiting experts for specialized AI tasks apply with additional constraints: language fluency must be native or near-native, and cultural competence must be verified, not assumed.

The challenge extends to quality assurance. QA for multilingual safety requires reviewers who can evaluate the quality of safety judgments in each language — which means having competent evaluators at both the annotation and review levels. Teams that understand how to evaluate safety across domains recognize that language is one of the most important dimensions of this evaluation.

Building a Multilingual Safety Program

Organizations deploying AI globally should approach multilingual safety systematically rather than reactively.

Start with the languages where the model has the highest deployment and the most user interaction. Prioritize languages where the gap between English safety testing and local safety requirements is largest — often languages with significant cultural distance from English.

Build language-specific red-team protocols that reflect the cultural context of each language community. Do not simply translate English protocols.

Use native-speaker reviewers for all safety-critical evaluations. Machine translation of safety evaluations is insufficient — the cultural nuances that matter most are precisely the ones that translation handles worst.

Maintain ongoing monitoring in each deployment language. Safety is not a one-time test. Cultural contexts evolve, new attack vectors emerge, and user behavior patterns change over time.

Track safety metrics by language to identify which languages have the largest gaps and where investment will have the highest impact.
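A minimal sketch of per-language tracking, assuming safety evaluations are logged as (language, passed) pairs; the schema is hypothetical:

```python
from collections import defaultdict

def failure_rates_by_language(results):
    """Aggregate safety-eval outcomes into a per-language failure rate.

    `results` is an iterable of (language, passed) pairs.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for lang, passed in results:
        totals[lang] += 1
        if not passed:
            failures[lang] += 1
    return {lang: failures[lang] / totals[lang] for lang in totals}

# Languages with the highest failure rates get investment priority.
rates = failure_rates_by_language([
    ("en", True), ("en", True), ("en", False),
    ("sw", False), ("sw", False), ("sw", True),
])
print(rates)  # {'en': 0.333..., 'sw': 0.666...}
```

Even a simple breakdown like this makes the English-versus-other-languages gap visible as a number rather than an anecdote.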

Conclusion

AI safety is a multilingual problem. Models that are safe in English may be unsafe in other languages, and the safety risks in each language are shaped by cultural context that English-language testing cannot capture. Organizations that invest in multilingual human expertise — native-speaker evaluators with cultural competence and domain knowledge — will deploy safer, more trustworthy AI globally.

The ones that treat multilingual safety as an afterthought, relying on English-only testing with occasional translation, will discover safety gaps through user harm rather than through testing. The cost of that discovery — in user trust, regulatory exposure, and reputational damage — far exceeds the investment in building genuine multilingual safety capabilities.
