diff --git a/README.md b/README.md
index 562b5aa3b3423fcca342488ab94b96ae265dbfd9..31c328044f111ec7b08b52b96a0acf75f0b96813 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ LongBench includes 14 English tasks, 5 Chinese tasks, and 2 code tasks, with the
 ## 🔥 Updates
 **[2023/10/30]** The new [ChatGLM3-6B-32k](https://huggingface.co/THUDM/chatglm3-6b-32k) chat model is out, with better proficiency at long context modeling and is especially good at long document based question answering, reasoning and summarization. Check out its [performance](#leaderboard) on LongBench.
+
 **[2023/08/29]** The [LongBench paper](https://arxiv.org/abs/2308.14508) is released, along with several important updates to LongBench:
 1. **More comprehensive datasets**: The MultiNews dataset for multi-document summarization is added to the summarization tasks, and the summarization task SAMSum is added to the Few-shot learning tasks, replacing the previous QA task NQ. TriviaQA and RepoBench-P are resampled to ensure a more appropriate data length;
 2. **More uniformed length distribution**: LongBench-E is obtained by uniform sampling according to length, featuring a comparable amount of test data in the length intervals of 0-4k, 4-8k, and 8k+, which is more suitable for evaluating the model's ability in different input lengths variation;
diff --git a/README_ZH.md b/README_ZH.md
index 1c82184632c89b8d9ff3f049212739da5417e07d..fc6990f9e7a07f852c69d44d70d6a7cad00ad59a 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -25,6 +25,7 @@ LongBench包含14个英文任务、5个中文任务和2个代码任务,多数
 ## 🔥 更新信息
 **[2023/10/30]** 新的[ChatGLM3-6B-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)chat模型已经发布,它更擅长长文本建模,尤其是基于长文档的问答、推理和总结。请在LongBench上查看其[性能](#排行榜)。
+
 **[2023/08/29]** [LongBench论文](https://arxiv.org/abs/2308.14508)发布,同时对LongBench进行了以下几项重要更新:
 1. **更全面的数据集**:在摘要任务中增加了多文档摘要MultiNews数据集,在Few-shot学习任务中增加了摘要任务SAMSum,代替之前的QA任务NQ,并对TriviaQA, RepoBench-P进行重新采样以保证数据长度更加合适;
 2. **更均匀的长度分布**:根据长度进行均匀采样得到了LongBench-E,其包含LongBench中的13个长度分布更加均匀的英文数据集,LongBench-E在0-4k,4-8k,8k+长度区间内均有数量相当的测试数据,更加适合评价模型在不同输入长度上的能力变化;