This project evaluates predicted MSA answers for open-ended multimodal QA.
Annotators should compare the predicted answer against the reference answer and the provided image/context, assign a 1-10 score for each required criterion, and write a short rationale plus any notable findings.
The task template includes the scoring guide and required fields directly in the interface.
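The workflow above can be sketched as a small annotation record with score validation. This is a minimal illustration, not the project's actual schema: the field names (`scores`, `rationale`, `findings`) and the criterion names are hypothetical, and the real required criteria are defined in the task template itself.

```python
from dataclasses import dataclass

# Hypothetical sketch of one annotation record; actual criteria and
# field names come from the task template, not this example.
@dataclass
class Annotation:
    scores: dict      # criterion name -> integer score in 1-10
    rationale: str    # short justification for the scores
    findings: str     # notable issues or observations

    def validate(self) -> None:
        # Every score must be an integer in the 1-10 range.
        for criterion, score in self.scores.items():
            if not (isinstance(score, int) and 1 <= score <= 10):
                raise ValueError(f"{criterion}: score {score!r} not in 1-10")

# Example usage with made-up criteria:
ann = Annotation(
    scores={"accuracy": 8, "completeness": 6},
    rationale="Matches the reference; omits one detail visible in the image.",
    findings="Predicted answer ignores the background object.",
)
ann.validate()  # raises ValueError if any score is out of range
```

A per-criterion integer check like this catches the most common annotation errors (out-of-range or non-numeric scores) before submission.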