This project evaluates predicted MSA answers for open-ended multimodal QA.
Annotators should compare the predicted answer against the reference answer and the provided image/context, assign a 1-10 score for each required criterion, and write a short rationale plus any notable findings.
The task template includes the scoring guide and required fields directly in the interface.
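The workflow above can be sketched as a small annotation record with score validation. This is a minimal illustration, not the project's actual schema: the field names (`scores`, `rationale`, `findings`) and the criterion names are hypothetical, and the real required criteria are defined in the task template itself.

```python
from dataclasses import dataclass

# Hypothetical sketch of one annotation record; actual criteria and
# field names come from the task template, not this example.
@dataclass
class Annotation:
    scores: dict      # criterion name -> integer score in 1-10
    rationale: str    # short justification for the scores
    findings: str     # notable issues or observations

    def validate(self) -> None:
        # Every score must be an integer in the 1-10 range.
        for criterion, score in self.scores.items():
            if not (isinstance(score, int) and 1 <= score <= 10):
                raise ValueError(f"{criterion}: score {score!r} not in 1-10")

# Example usage with made-up criteria:
ann = Annotation(
    scores={"accuracy": 8, "completeness": 6},
    rationale="Matches the reference; omits one detail visible in the image.",
    findings="Predicted answer ignores the background object.",
)
ann.validate()  # raises ValueError if any score is out of range
```

A per-criterion integer check like this catches the most common annotation errors (out-of-range or non-numeric scores) before submission.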