Institute of Policy Dynamics
NLP · Evaluation · Datathon 2.0

অলীকবচন Bengali LLM Hallucination Detection Challenge

Bengali is spoken by over 280 million people, and the large language models that increasingly mediate how they access information are remarkably fluent in it. They are also, frequently, confidently wrong — correctly naming the author of Bidrohi in English, then attributing it to Tagore in Bengali; describing the Liberation War with fluent prose and the wrong dates; fabricating a monument’s commissioning year in a paragraph that reads beautifully. These failures don’t sound wrong. This challenge asks teams to build a system that catches them.

Cultural-distance bands: 3
Task types: 4
Competition phases: 2
Teams advancing to Phase 2 review: 30
Teams reaching the IUT final: 15
Scoring metric: F1 on the hallucinated class

How it’s structured

Phase 1 — Kaggle Leaderboard

Teams submit prediction files directly to Kaggle, scored on public and private splits of the test set. Best two submissions carry forward.

Phase 2 — Solution Review & IUT Final

The top 30 teams submit a runnable notebook and short paper. Organizers re-score each on a held-out fold; the top 15 present in person at Islamic University of Technology.

What the data rewards understanding

C0: Globally stable facts

Universal science, world geography, mathematics. Models should get these right in any language; this is the control band.

C1: Culturally situated

The Bangladeshi answer differs from a Western or globally dominant default, or the fact exists only in the Bangladeshi context. This band is where the headline phenomenon lives.

C2: Contested or time-sensitive

Recent, disputed, or time-bound facts that shift under a model's training cutoff.

Given a Bengali prompt and a candidate response — sometimes with a supporting context passage, sometimes without — the task is to predict whether the response is faithful or hallucinated, scored on F1 for the hallucinated class. There is no conventional training set: a small labeled sample is released for pipeline validation, and detectors are judged on how well they generalize to a held-out test set.
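As a rough illustration of the metric, the sketch below computes F1 for the hallucinated class from gold and predicted labels. The label strings `"hallucinated"` and `"faithful"` and the function name `f1_hallucinated` are assumptions made for this example; the actual label encoding and submission format are defined in the Kaggle data description, and the official scorer may treat edge cases differently.

```python
from typing import Sequence

# Assumed label strings for illustration only; check the Kaggle schema.
HALLUCINATED = "hallucinated"
FAITHFUL = "faithful"


def f1_hallucinated(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = HALLUCINATED,
) -> float:
    """F1 score with the hallucinated class treated as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # caught hallucinations
    fp = sum(t != positive and p == positive for t, p in pairs)  # faithful flagged as hallucinated
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed hallucinations

    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so this sketch returns 0.0 by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not taken from the competition data.
gold = [HALLUCINATED, FAITHFUL, HALLUCINATED, FAITHFUL, HALLUCINATED]
pred = [HALLUCINATED, HALLUCINATED, FAITHFUL, FAITHFUL, HALLUCINATED]
score = f1_hallucinated(gold, pred)
```

In the toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3. Because the faithful class is not the positive class, a detector that labels everything faithful scores 0 under this metric regardless of its accuracy.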

অলীকবচন is Datathon 2.0, powered by the Institute of Policy Dynamics with the IUT Computer Society (IUTCS) and sponsored by Brain Lab, the research arm of the ICT Division’s EBLICT project. Full rules, data schema, and timeline are on Kaggle.
