Purpose and Development
Developed in collaboration with 261 domain experts from across India, IndQA features 2,278 questions covering 12 Indian languages and 10 cultural domains such as literature, food, history, spirituality, and daily life. Unlike conventional benchmarks such as MMMLU and MGSM, IndQA’s content is “natively written” rather than translated, which preserves authentic phrasing, intent, and cultural context.
OpenAI stated that the project aligns with its mission to build AI systems that understand people as they naturally speak and think, rather than relying solely on literal translation or Western linguistic structures.
Structure and Evaluation Method
IndQA employs a rubric-based evaluation system, a method designed for qualitative, context-sensitive assessment.
Each question includes:
- A culturally grounded prompt written in an Indian language,
- An English translation for cross-verification,
- A grading rubric defining expected answer criteria, and
- An ideal expert-level response.
Responses are evaluated on nuance, reasoning, and cultural correctness, with weighted scoring assigned to each criterion. The final score reflects how well an AI model aligns with expert expectations rather than just factual accuracy.
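The weighted scoring described above can be illustrated with a small sketch. It assumes that each rubric criterion carries a non-negative weight, that a grader marks each criterion as met or not met, and that the final score is the earned weight divided by the total possible weight. The names `Criterion` and `rubric_score`, along with the sample rubric and its weights, are hypothetical and are not taken from OpenAI's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line: what the answer should contain and how much it counts."""
    description: str
    weight: float  # assumed non-negative


def rubric_score(criteria: list[Criterion], met: list[bool]) -> float:
    """Return earned weight divided by total weight (between 0 and 1)."""
    if len(criteria) != len(met):
        raise ValueError("need exactly one met/not-met judgment per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(c.weight for c, ok in zip(criteria, met) if ok)
    return earned / total


# Hypothetical rubric for a single question; descriptions and weights are invented.
rubric = [
    Criterion("Names the correct regional tradition", 3.0),
    Criterion("Explains its historical origin", 2.0),
    Criterion("Uses the locally accepted term", 1.0),
]
judgments = [True, False, True]  # e.g. produced by a human or model grader
score = rubric_score(rubric, judgments)
print(f"{score:.2f}")
```

In this invented example the response satisfies the first and third criteria, earning four of six possible points, so it scores about 0.67. Missing a heavily weighted criterion costs more than missing a minor one, which is how a rubric can reward alignment with expert expectations rather than a simple right-or-wrong match.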
Languages and Cultural Scope
The IndQA benchmark spans 12 major languages: Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.
It explores 10 cultural and intellectual domains, including:
- Architecture & Design
- Arts & Culture
- Everyday Life
- Law & Ethics
- Media & Entertainment
- Religion & Spirituality
- Sports & Recreation
- Literature, Food, and History
OpenAI chose India as the starting point because of its immense linguistic diversity and because nearly one billion Indians communicate primarily in languages other than English.
🧾 Exam-Oriented Facts
IndQA includes 2,278 questions in 12 Indian languages across 10 cultural domains.
Developed with input from 261 domain experts across India.
Uses a rubric-based evaluation rather than multiple-choice format.
Questions were tested against the GPT-4o, OpenAI o3, GPT-4.5, and GPT-5 models.
Significance and Future Plans
Srinivas Narayanan, CTO of B2B Applications at OpenAI, explained that the goal was to ensure models understand “the nuances every culture cares about.” OpenAI plans to extend this framework to other countries and linguistic regions, improving AI inclusivity, fairness, and performance beyond English-speaking contexts. With India as ChatGPT’s second-largest market, IndQA demonstrates OpenAI’s commitment to making AI systems more accessible, culturally aligned, and reliable for non-English users worldwide.
