1. Pin the gpt-4o grader to a fixed API version so that it does not silently pick up the most recently released one.
2. Fix the grading API call logic for reasoning questions to make it more robust.
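
For illustration only, the two changes above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this change: `PINNED_GRADER_MODEL`, `parse_grade`, `grade_with_retry`, and the `call_grader` callable are hypothetical names, the version string is a placeholder (the version actually pinned is not stated here), and the "Correct"/"Incorrect" verdict format is assumed.

```python
import time
from typing import Callable, Optional

# Placeholder pinned version; an assumption, not the value used by this change.
PINNED_GRADER_MODEL = "gpt-4o-2024-05-13"


def parse_grade(raw: str) -> Optional[bool]:
    """Return True/False for a recognizable verdict, or None if unparseable."""
    text = raw.strip().lower()
    if text.startswith("correct"):
        return True
    if text.startswith("incorrect"):
        return False
    return None


def grade_with_retry(
    call_grader: Callable[[str, str], str],  # hypothetical: (model, prompt) -> raw reply
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bool]:
    """Call the grader with a pinned version, retrying on errors or bad output."""
    for attempt in range(max_attempts):
        try:
            verdict = parse_grade(call_grader(PINNED_GRADER_MODEL, prompt))
            if verdict is not None:
                return verdict
        except Exception:
            # Treat any API failure as transient and fall through to the retry.
            pass
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    return None  # caller decides how to score an ungradable item
```

Returning `None` after the last attempt, rather than raising, lets a single ungradable reasoning question be skipped or flagged without aborting the whole grading run.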