From 46361d3ffdfb46ec1d2f2dc77559a19941c2a5f2 Mon Sep 17 00:00:00 2001
From: Jerry Liu <jerryjliu98@gmail.com>
Date: Mon, 16 Jan 2023 11:36:37 -0800
Subject: [PATCH] [readme] update with discord + updated summary content (#237)

Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
---
 README.md      | 36 +++++++++++++++++++++---------------
 docs/index.rst |  1 -
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 693451d323..19fb872fa3 100644
--- a/README.md
+++ b/README.md
@@ -6,28 +6,34 @@ PyPi: https://pypi.org/project/gpt-index/.
 
 Documentation: https://gpt-index.readthedocs.io/en/latest/.
 
+Discord: https://discord.gg/dGcwcsnxhU.
+
 ## 🚀 Overview
 
 **NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!
 
 #### Context
-- LLMs are a phenomenal piece of technology for knowledge generation and reasoning.
-- A big limitation of LLMs is context size (e.g. OpenAI's `davinci` model for GPT-3 has a [limit](https://openai.com/api/pricing/) of 4096 tokens. Large, but not infinite).
+- LLMs are a phenomenal piece of technology for knowledge generation and reasoning.
+- A big limitation of LLMs is context size (e.g. Davinci's limit is 4096 tokens. Large, but not infinite).
 - The ability to feed "knowledge" to LLMs is restricted to this limited prompt size and model weights.
-- **Thought**: what if LLMs could have access to potentially a much larger database of knowledge without retraining/finetuning?
 
 #### Proposed Solution
-That's where **GPT Index** comes in. GPT Index is a simple, flexible interface between your external data and LLMs. It resolves the following pain points:
-- Provides simple data structures to resolve prompt size limitations.
-- Offers data connectors to your external data sources.
-- Offers you a comprehensive toolset trading off cost and performance.
+At its core, GPT Index contains a toolkit of **index data structures** designed to easily connect LLMs with your external data.
+GPT Index provides the following advantages:
+- Remove concerns over prompt size limitations.
+- Abstract common usage patterns to reduce boilerplate code in your LLM app.
+- Provide data connectors to your common data sources (Google Docs, Slack, etc.).
+- Provide cost transparency + tools that reduce cost while increasing performance.
+
 
-At the core of GPT Index is a **data structure**. Instead of relying on world knowledge encoded in model weights, a GPT Index data structure does the following:
+Each data structure offers distinct use cases and a variety of customizable parameters. These indices can then be
+*queried* in a general-purpose manner to perform any task you would typically use an LLM for:
+- Question-Answering
+- Summarization
+- Text Generation (Stories, TODOs, emails, etc.)
+- and more!
 
-- Uses a pre-trained LLM primarily for *reasoning*/*summarization* instead of prior knowledge.
-- Takes a large corpus of text data as input and builds a structured index over it (using an LLM or heuristics).
-- Allow users to *query* the index in order to synthesize an answer to the question - this requires both *traversal* of the index as well as a synthesis of the answer.
 
 ## 💡 Contributing
 
@@ -48,11 +54,11 @@ pip install gpt-index
 
 Examples are in the `examples` folder. Indices are in the `indices` folder (see list of indices below).
 
-To build a tree index:
+To build a simple vector store index:
 ```python
-from gpt_index import GPTTreeIndex, SimpleDirectoryReader
+from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
 documents = SimpleDirectoryReader('data').load_data()
-index = GPTTreeIndex(documents)
+index = GPTSimpleVectorIndex(documents)
 ```
 
 To save to and load from disk:
@@ -60,7 +66,7 @@ To save to and load from disk:
 # save to disk
 index.save_to_disk('index.json')
 # load from disk
-index = GPTTreeIndex.load_from_disk('index.json')
+index = GPTSimpleVectorIndex.load_from_disk('index.json')
 ```
 
 To query:
diff --git a/docs/index.rst b/docs/index.rst
index 2e9f5f7bc9..9b39c6ac61 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -24,7 +24,6 @@ Context
 - LLMs are a phenomenal piece of technology for knowledge generation and reasoning.
 - A big limitation of LLMs is context size (e.g. Davinci's limit is 4096 tokens. Large, but not infinite).
 - The ability to feed "knowledge" to LLMs is restricted to this limited prompt size and model weights.
-- **Thought**: What if LLMs can have access to potentially a much larger database of knowledge without retraining/finetuning?
 
 Proposed Solution
 ^^^^^^^^^^^^^^^^^
--
GitLab
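The snippets scattered across the hunks above fit together as one workflow. Below is a minimal end-to-end sketch of the post-patch API: the build, save, and load calls are taken directly from the diff, while the final `index.query(...)` call and the printable `response` object are assumptions about the gpt-index interface of this era, since the diff context cuts off right after the "To query:" heading. The query string and the `data/` directory are illustrative placeholders.

```python
# Minimal end-to-end sketch of the usage this patch documents (gpt-index, Jan 2023).
# Assumes a local 'data/' directory containing text files to index.
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load documents and build the simple vector store index, as in the README snippet.
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents)

# Round-trip the index through disk, as in the save/load snippet.
index.save_to_disk('index.json')
index = GPTSimpleVectorIndex.load_from_disk('index.json')

# Query the index. The diff truncates before the "To query:" example, so this
# call is an assumed usage of the contemporary API, not taken from the patch.
response = index.query("What do these documents say about context size?")
print(response)
```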