diff --git a/benchmarks/struct_indices/spider/README.md b/benchmarks/struct_indices/spider/README.md
index 1f0db4a9bd6fba9f274cda2a3d0967d4416a8d02..bb2cf191390553d4516162845670f562f8101bcb 100644
--- a/benchmarks/struct_indices/spider/README.md
+++ b/benchmarks/struct_indices/spider/README.md
@@ -31,7 +31,7 @@ python generate_sql.py --input spider-0_001 --output spider-0_001-pred --model g
 ./evaluate.sh spider-0_001 spider-0_001-pred
 ```
 
-5. **New**! Use `evaluate.py` to evalaute the generated SQLs against
+5. **New**! Use `evaluate.py` to evaluate the generated SQLs against
    golden SQLs by matching the natural language answers generated from their
    respective execution outputs. This is called [Answer Accuracy](https://ekzhu.medium.com/human-aligned-text-to-sql-evaluation-399123fa0a64).