diff --git a/benchmarks/struct_indices/spider/README.md b/benchmarks/struct_indices/spider/README.md
index 1f0db4a9bd6fba9f274cda2a3d0967d4416a8d02..bb2cf191390553d4516162845670f562f8101bcb 100644
--- a/benchmarks/struct_indices/spider/README.md
+++ b/benchmarks/struct_indices/spider/README.md
@@ -31,7 +31,7 @@ python generate_sql.py --input spider-0_001 --output spider-0_001-pred --model g
 ./evaluate.sh spider-0_001 spider-0_001-pred
 ```
 
-5. **New**! Use `evaluate.py` to evalaute the generated SQLs against
+5. **New**! Use `evaluate.py` to evaluate the generated SQLs against
 golden SQLs by matching the natural language answers generated from their
 respective execution outputs. This is called
 [Answer Accuracy](https://ekzhu.medium.com/human-aligned-text-to-sql-evaluation-399123fa0a64).