NLP · Evaluation · Datathon 2.0

অলীকবচন — Bengali LLM Hallucination Detection Challenge

Bengali is spoken by over 280 million people, and the large language models that increasingly mediate how they access information are remarkably fluent in it. They are also, frequently, confidently wrong — correctly naming the author of Bidrohi in English, then attributing it to Tagore in Bengali; describing the Liberation War with fluent prose and the wrong dates; fabricating a monument’s commissioning year in a paragraph that reads beautifully. These failures don’t sound wrong. This challenge asks teams to build a system that catches them.

Three cultural-distance bands

Task types

Two competition phases

Top 30 teams advance to Phase 2 review

Top 15 teams reach the IUT final

F1, scored on the hallucinated class

How it’s structured

Phase 1 — Kaggle Leaderboard

Teams submit prediction files directly to Kaggle, where they are scored on public and private splits of the test set. Each team's best two submissions carry forward.

Phase 2 — Solution Review & IUT Final

The top 30 teams submit a runnable notebook and short paper. Organizers re-score each on a held-out fold; the top 15 present in person at Islamic University of Technology.

What the data rewards you for understanding

Globally stable facts

Universal science, world geography, mathematics. Models should get these right in any language — the control band.

Culturally situated

The Bangladeshi answer differs from a Western or globally dominant default, or the fact exists only in the Bangladeshi context. This is where the headline phenomenon lives.

Contested or time-sensitive

Recent, disputed, or time-bound facts that shift under a model's training cutoff.

Given a Bengali prompt and a candidate response — sometimes with a supporting context passage, sometimes without — the task is to predict whether the response is faithful or hallucinated, scored on F1 for the hallucinated class. There is no conventional training set: a small labeled sample is released for pipeline validation, and detectors are judged on how well they generalize to a held-out test set.
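To make the metric concrete, here is a minimal sketch of F1 computed with the hallucinated class as the positive class. The label strings `"hallucinated"` and `"faithful"` and the function name `f1_hallucinated` are assumptions for illustration only; the official label encoding and scoring script are defined on Kaggle and may differ.

```python
from typing import Sequence

# Assumed label strings for illustration; the official schema may use other values.
HALLUCINATED = "hallucinated"
FAITHFUL = "faithful"


def f1_hallucinated(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """F1 score with 'hallucinated' treated as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == HALLUCINATED and p == HALLUCINATED for t, p in pairs)
    fp = sum(t != HALLUCINATED and p == HALLUCINATED for t, p in pairs)
    fn = sum(t == HALLUCINATED and p != HALLUCINATED for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when no hallucinated label appears
    # in either the gold labels or the predictions.
    return 2 * tp / denom if denom else 0.0


# Toy example: 2 true positives, 1 false positive, 1 false negative.
gold = [HALLUCINATED, FAITHFUL, HALLUCINATED, FAITHFUL, HALLUCINATED]
pred = [HALLUCINATED, HALLUCINATED, FAITHFUL, FAITHFUL, HALLUCINATED]
print(round(f1_hallucinated(gold, pred), 3))  # 0.667
```

Because only the hallucinated class counts, a detector that flags every response reaches perfect recall but is penalized on precision, and one that flags nothing scores zero under the convention assumed above.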

অলীকবচন is Datathon 2.0, powered by the Institute of Policy Dynamics with the IUT Computer Society (IUTCS) and sponsored by Brain Lab, the research arm of the ICT Division’s EBLICT project. Full rules, data schema, and timeline are on Kaggle.