Multimodal AI QA Checking MSA LLM Judge

This project evaluates predicted MSA answers for open-ended multimodal question answering (QA).

For each item, annotators compare the predicted answer against the reference answer and the provided image/context, assign a 1-10 score for each required criterion, and record a short rationale along with any notable findings.
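As a rough illustration of what one annotation record might look like, here is a minimal Python sketch. The field names (`item_id`, `scores`, `rationale`, `findings`) and the validation rules are assumptions for this example, not the actual task template:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One judged QA item. Field names are illustrative, not the template's."""
    item_id: str
    scores: dict                                  # criterion name -> 1-10 integer score
    rationale: str                                # short justification for the scores
    findings: list = field(default_factory=list)  # notable issues observed

    def validate(self):
        # Every score must be an integer in the 1-10 range.
        for criterion, score in self.scores.items():
            if not (isinstance(score, int) and 1 <= score <= 10):
                raise ValueError(f"{criterion}: score {score!r} not in 1-10")
        # A rationale is required and must be non-empty.
        if not self.rationale.strip():
            raise ValueError("rationale must be non-empty")
        return True

ann = Annotation(
    item_id="qa-0001",
    scores={"correctness": 7, "completeness": 6},
    rationale="Matches the reference but omits a detail visible in the image.",
)
ann.validate()
```

A validation step like this can catch out-of-range scores or missing rationales before annotations are submitted.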

The task template includes the scoring guide and required fields directly in the interface.