diff --git a/README.md b/README.md
index 77460da347ea8596a388e41f273fef6f80e7e420..65a51ad6196664d6d24bdf26f25fa4f2f42ee0d8 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 
 阅读[中文版本](README_ZH.md).
 
-# LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
+# 📖 LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
 
 **LongBench** is the first benchmark for bilingual, multitask, and comprehensive assessment of **long context understanding** capabilities of large language models. LongBench includes different languages (Chinese and English) to provide a more comprehensive evaluation of the large models' multilingual capabilities on long contexts. In addition, LongBench is composed of six major categories and twenty different tasks, covering key long-text application scenarios such as multi-document QA, single-document QA, summarization, Few-shot learning, code completion, and synthesis tasks.
 
@@ -22,14 +22,15 @@ LongBench includes 13 English tasks, 5 Chinese tasks, and 2 code tasks, with the
 | Synthetic Tasks | 2 | 1 | - |
 | Code Completion | - | - | 2 |
 
-## Table of Contents
-- [Leaderboard](#leaderboard)
-- [How to evaluate on LongBench](#how-to-evaluate-on-LongBench)
-- [Evaluation Result on Each Dataset](#evaluation-result-on-each-dataset)
-- [Acknowledgement](#acknowledgement)
-- [Citation](#citation)
-
-## Leaderboard
+## 🔍 Table of Contents
+- [🖥️ Leaderboard](#leaderboard)
+- [⚙️ How to evaluate on LongBench](#how-to-evaluate-on-LongBench)
+- [📊 Evaluation Result on Each Dataset](#evaluation-result-on-each-dataset)
+- [📄 Acknowledgement](#acknowledgement)
+- [📝 Citation](#citation)
+
+<a name="leaderboard"></a>
+## 🖥️ Leaderboard
 Here are the average scores (%) on the main task categories in both Chinese and English languages under the Zero-shot scenario. Please refer to this [link](task.md) for the evaluation metrics used for each task.
 
 > Note: For text exceeding the processing length capability of the model, we truncate from the middle of the text, preserving information from the beginning and end, in accordance with the observations from [Lost in the Middle](https://arxiv.org/abs/2307.03172). Experiments show that this truncation method has the least impact on model performance.
@@ -67,7 +68,8 @@ To more specifically analyze the models' relative performance under different co
 
 > Note: Assume that the model scores x on the data within a specific length range of a task, and y on all data of that task, then the model's **relative score** for that length range is (x/y-1). To better compare the trends of different models, we shift all the lines to 0 on 0-4k.
 
-## How to evaluate on LongBench
+<a name="how-to-evaluate-on-LongBench"></a>
+## ⚙️ How to evaluate on LongBench
 
 #### Load Data
 You can download and load the **LongBench** data through the Hugging Face datasets ([🤗 HF Repo](https://huggingface.co/datasets/THUDM/LongBench)):
@@ -111,7 +113,8 @@ python eval.py
 ```
 You can get the evaluation results on all datasets in `result.json`. Please note that in `config/`, we provide the input format suitable for each dataset and the maximum output length. Feel free to modify them to better suit the model you want to evaluate. After modification, when evaluating with [pred.py](pred.py), the data will be automatically organized according to the new format to get the corresponding model output.
 
-## Evaluation Result on Each Dataset
+<a name="evaluation-result-on-each-dataset"></a>
+## 📊 Evaluation Result on Each Dataset
 
 The following tables show the Zero-shot evaluation results (%) on all datasets, where Chinese datasets are denoted by "zh" (please refer to this [link](task.md) for the evaluation metrics used for each task).
 
@@ -186,11 +189,13 @@
 | ChatGLM2-6B | 3.2 | 2.1 | 5.5 |
 | ChatGLM2-6B-32k | 77.5 | 2.0 | 62.5 |
 
-## Acknowledgement
+<a name="acknowledgement"></a>
+## 📄 Acknowledgement
 - Some of the tasks of **LongBench** are based on the datasets proposed by previous researchers, including [HotpotQA](https://hotpotqa.github.io/), [2WikiMultihopQA](https://aclanthology.org/2020.coling-main.580/), [Musique](https://arxiv.org/abs/2108.00573), [DuReader](https://github.com/baidu/DuReader), [NarrativeQA](https://arxiv.org/pdf/1712.07040.pdf), [Qasper](https://arxiv.org/pdf/2105.03011.pdf), [GovReport](https://arxiv.org/pdf/2104.02112.pdf), [QMSum](https://arxiv.org/pdf/2104.05938.pdf), [VCSUM](https://arxiv.org/abs/2305.05280), [TriviaQA](https://nlp.cs.washington.edu/triviaqa/), [NQ](https://ai.google.com/research/NaturalQuestions/), [TREC](https://aclanthology.org/C02-1150.pdf), [LSHT](http://tcci.ccf.org.cn/conference/2014/dldoc/evatask6.pdf), [LCC](https://arxiv.org/abs/2306.14893) and [RepoBench-P](https://arxiv.org/abs/2306.03091).
 
-## Citation
+<a name="citation"></a>
+## 📝 Citation
 This is a joint work by **THU-KEG** and **Zhipu AI**. We are currently working on the paper, and the citation information will be updated when it's ready. Please stay tuned~
 
 When citing our work, please cite all of the original dataset papers. The relevant citation information is listed [here](refs/ref.bib).
 
diff --git a/README_ZH.md b/README_ZH.md
index 6bb46f258800d9b23102728456cdba41c40fb499..d0d71564dc810a4571ef41552d620827510e708a 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -5,7 +5,7 @@
 
 Read this in [English](README.md).
 
-# LongBench: 多任务中英双语长文本理解评测基准
+# 📖 LongBench: 多任务中英双语长文本理解评测基准
 
 **LongBench**是第一个多任务、中英双语、针对大语言模型**长文本理解能力**的评测基准。在目前大模型多语言能力引起广泛关注的背景下,LongBench涵盖了不同的语言（中文和英文），以此来对大模型在长文本下的多语言能力进行更全面的评估。同时，LongBench由六大类、二十个不同的任务组成，覆盖了单文档QA、多文档QA、摘要、Few-shot学习、代码补全和合成任务等关键的长文本应用场景。
 
@@ -22,14 +22,15 @@ LongBench包含13个英文任务、5个中文任务和2个代码任务,多数
 | 合成任务 | 2 | 1 | - |
 | 代码补全 | - | - | 2 |
 
-## 目录
-- [排行榜](#排行榜)
-- [如何在LongBench上评测模型](#如何在LongBench上评测模型)
-- [详细评测结果](#详细评测结果)
-- [致谢](#致谢)
-- [引用](#引用)
+## 🔍 目录
+- [🖥️ 排行榜](#排行榜)
+- [⚙️ 如何在LongBench上评测模型](#如何在LongBench上评测模型)
+- [📊 详细评测结果](#详细评测结果)
+- [📄 致谢](#致谢)
+- [📝 引用](#引用)
 
-## 排行榜
+<a name="排行榜"></a>
+## 🖥️ 排行榜
 我们在这里展示了所有模型在Zero-shot场景下,在中文和英文各大类任务上得分的平均值（%),各任务评估所用指标请参考[这里](task_zh.md)。
 
 > 注:对于超出模型处理长度能力的文本,参考[Lost in the Middle](https://arxiv.org/abs/2307.03172)的观察,我们从文本中间进行截断,保持前后部分的信息。实验表明,这种截断方式对模型性能影响最小。
@@ -65,7 +66,8 @@ LongBench包含13个英文任务、5个中文任务和2个代码任务,多数
 
 > 注:假设模型在某个任务的特定长度范围内数据上得分为x,在该任务所有数据上得分为y,则模型在该长度范围的**相对分数**为(x/y-1)。为了更好比较不同模型的变化趋势,我们在0-4k将所有折线平移至0。
 
-## 如何在LongBench上评测模型
+<a name="如何在LongBench上评测模型"></a>
+## ⚙️ 如何在LongBench上评测模型
 
 #### 载入数据
 你可以通过Hugging Face datasets来下载并载入**LongBench**的数据（[🤗 HF Repo](https://huggingface.co/datasets/THUDM/LongBench)):
@@ -107,7 +109,8 @@ python eval.py
 ```
 可以在`result.json`中得到在各数据集上的评测结果。请注意,我们在`config/`下提供了我们总结出来的在各数据集上适合的输入格式和最大输出长度限制,在评测的时候可以进行修改以更好地适用你要评测的模型,修改后在[pred.py](pred.py)评测时会自动按照新的格式去整理数据并得到对应的模型输出。
 
-## 详细评测结果
+<a name="详细评测结果"></a>
+## 📊 详细评测结果
 下面的几张表格展示了模型在所有子任务数据集上的Zero-shot评测结果（%),其中的中文数据集以“zh”标示（各任务评估所用指标请参考[这里](task_zh.md))。
 
 #### 单文档QA
@@ -176,10 +179,12 @@ python eval.py
 | ChatGLM2-6B | 3.2 | 2.1 | 5.5 |
 | ChatGLM2-6B-32k | 77.5 | 2.0 | 62.5 |
 
-## 致谢
+<a name="致谢"></a>
+## 📄 致谢
 - **LongBench**的部分任务基于之前的研究者提出的数据集构建,包括[HotpotQA](https://hotpotqa.github.io/),[2WikiMultihopQA](https://aclanthology.org/2020.coling-main.580/),[Musique](https://arxiv.org/abs/2108.00573),[DuReader](https://github.com/baidu/DuReader),[NarrativeQA](https://arxiv.org/pdf/1712.07040.pdf),[Qasper](https://arxiv.org/pdf/2105.03011.pdf),[GovReport](https://arxiv.org/pdf/2104.02112.pdf),[QMSum](https://arxiv.org/pdf/2104.05938.pdf),[VCSUM](https://arxiv.org/abs/2305.05280),[TriviaQA](https://nlp.cs.washington.edu/triviaqa/),[NQ](https://ai.google.com/research/NaturalQuestions/),[TREC](https://aclanthology.org/C02-1150.pdf),[LSHT](http://tcci.ccf.org.cn/conference/2014/dldoc/evatask6.pdf),[LCC](https://arxiv.org/abs/2306.14893)和[RepoBench-P](https://arxiv.org/abs/2306.03091)。
 
-## 引用
+<a name="引用"></a>
+## 📝 引用
 本工作由**THU-KEG**和**Zhipu AI**共同完成，相关论文正在撰写中，届时将更新引用信息，敬请关注~
 
 如果您使用LongBench,请一并引用LongBench所基于的数据集对应的论文,相关引用信息在[这里](refs/ref.bib)。
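
The README hunks above point to the [🤗 HF Repo](https://huggingface.co/datasets/THUDM/LongBench) for loading the data. A minimal sketch of that step, assuming the `datasets` library and using `hotpotqa` purely as an illustrative subset name:

```python
# Minimal sketch: load one LongBench subset from the Hugging Face Hub.
# "hotpotqa" is only an illustrative subset name; substitute any LongBench task.
# Depending on your `datasets` version, trust_remote_code=True may be needed.
from datasets import load_dataset

data = load_dataset("THUDM/LongBench", "hotpotqa", split="test")

print(len(data))       # number of test examples in this subset
print(data[0].keys())  # inspect the fields of one example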
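```

Both READMEs also note that over-long inputs are truncated from the middle so that the beginning and end are preserved. A rough sketch of that idea, assuming a Hugging Face tokenizer and an illustrative 4096-token budget (LongBench's own implementation in the repository may differ):

```python
# Sketch of the middle-truncation rule: keep the head and tail of the token
# sequence and drop the middle when the prompt exceeds the budget.
# The tokenizer ("gpt2") and the 4096-token budget are illustrative assumptions.
from transformers import AutoTokenizer

def truncate_middle(prompt: str, tokenizer, max_tokens: int = 4096) -> str:
    ids = tokenizer.encode(prompt, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return prompt
    half = max_tokens // 2
    kept = ids[:half] + ids[len(ids) - half:]  # beginning + end, middle dropped
    return tokenizer.decode(kept, skip_special_tokens=True)

tok = AutoTokenizer.from_pretrained("gpt2")
truncated = truncate_middle("long context " * 5000, tok)
print(len(tok.encode(truncated, add_special_tokens=False)))  # roughly max_tokens
```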