v0.10.24 (#12291) · 3d3a8b96
Logan authored 11 months ago


CONTRIBUTING.md 21.98 KiB

Contributing to LlamaIndex

Interested in contributing to LlamaIndex? Here's how to get started!

Contribution Guideline

The best part of LlamaIndex is our community of users and contributors.

What should I work on?

Extend core modules by contributing an integration
Contribute a Tool, Reader, Pack, or Dataset (formerly from llama-hub)
Add new capabilities to core
Fix bugs
Add usage examples
Add experimental features
Improve code quality & documentation

Also, join our Discord for ideas and discussions: https://discord.gg/dGcwcsnxhU.

1. Extend Core Modules

The most impactful way to contribute to LlamaIndex is by extending our core modules:

We welcome contributions to all of the modules shown above. So far, we have implemented a core set of functionality for each, all of which is encapsulated in the LlamaIndex core package. As a contributor, you can help each module unlock its full potential. Brief descriptions of these modules are provided below. You can also refer to their respective folders within this GitHub repository for some example integrations.

Contributing an integration involves submitting the source code for a new Python package. For now, these integrations will live in the LlamaIndex GitHub repository, and the team will be responsible for publishing the package to PyPI. (Having these packages live outside of this repository and be maintained by our community members is under consideration.)

Creating A New Integration Package

Both llama-index and llama-index-core come equipped with a command-line tool that can be used to initialize a new integration package.

cd ./llama-index-integrations/llms
llamaindex-cli new-package --kind "llms" --name "gemini"

Executing the above commands will create a new folder called llama-index-llms-gemini within the llama-index-integrations/llms directory.

Please be sure to add a detailed README for your new package, as it will appear on both llamahub.ai and the pypi.org website. In addition to preparing your source code and supplying a detailed README, we also ask that you fill in some metadata so that your package appears on llamahub.ai with the correct information. You do so by adding the required metadata under the [tool.llamahub] section of your new package's pyproject.toml.

Below is an example of the metadata required for all of our integration packages. Please replace the default author "llama-index" with your own GitHub username.

[tool.llamahub]
contains_example = false
import_path = "llama_index.llms.anthropic"

[tool.llamahub.class_authors]
Anthropic = "llama-index"

NOTE: We are making rapid improvements to the project, and as a result, some interfaces are still volatile. Specifically, we are actively working on making the following components more modular and extensible (uncolored boxes above): core indexes, document stores, index queries, and query runner.

Module Details

Below, we will describe what each module does, give a high-level idea of the interface, show existing implementations, and give some ideas for contribution.

Data Loaders

A data loader ingests data of any format from anywhere into Document objects, which can then be parsed and indexed.

Interface:

load_data takes arbitrary arguments as input (e.g. path to data), and outputs a sequence of Document objects.
lazy_load_data takes arbitrary arguments as input (e.g. path to data), and outputs an iterable of Document objects. This is a lazy version of load_data, which is useful for large datasets.

Note: If only lazy_load_data is implemented, load_data will be delegated to it.
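As a rough illustration of the interface above, here is a minimal sketch of a loader whose load_data delegates to lazy_load_data. The Document dataclass and the LineReader class are stand-ins invented for this example so that it runs without any dependencies; they are not the library's own classes, and a real integration would build on the base classes in the LlamaIndex core package.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Document:
    """Stand-in for a Document object (assumption: only text and metadata)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class LineReader:
    """Hypothetical loader: emits one Document per non-empty line of text."""

    def lazy_load_data(self, raw_text: str, source: str = "inline") -> Iterator[Document]:
        # Yield documents one at a time so a large input is never
        # materialized as a full list unless the caller asks for it.
        for line_no, line in enumerate(raw_text.splitlines(), start=1):
            if line.strip():
                yield Document(
                    text=line.strip(),
                    metadata={"source": source, "line": line_no},
                )

    def load_data(self, raw_text: str, source: str = "inline") -> List[Document]:
        # Eager version: delegate to the lazy one and collect the results.
        return list(self.lazy_load_data(raw_text, source=source))


docs = LineReader().load_data("first line\n\nsecond line\n")
for doc in docs:
    print(doc.metadata["line"], doc.text)
```

Writing the lazy method first and deriving the eager one from it keeps the two code paths from drifting apart.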

Examples:

Contributing a data loader is easy and super impactful for the community. The preferred way to contribute is by making a PR on the LlamaHub GitHub repository.

Ideas

Want to load something but there's no LlamaHub data loader for it yet? Make a PR!

Node Parser

A node parser parses Document objects into Node objects (atomic units of data that LlamaIndex operates over, e.g., a chunk of text, an image, or a table). It is responsible for splitting text (via text splitters) and explicitly modeling the relationships between units of data (e.g., A is the source of B, C is a chunk after D).

Interface: get_nodes_from_documents takes a sequence of Document objects as input, and outputs a sequence of Node objects.
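To make the interface concrete, the sketch below splits each document into paragraph nodes and records the relationships described above (source document, previous chunk, next chunk). The Document, Node, and ParagraphNodeParser classes, and the relationship keys "source", "previous", and "next", are all assumptions made for this example; the library's own classes and relationship types may differ.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class Document:
    """Stand-in document type for this sketch."""

    text: str
    doc_id: str = field(default_factory=_new_id)


@dataclass
class Node:
    """Stand-in node type; relationships maps a label to another object's id."""

    text: str
    node_id: str = field(default_factory=_new_id)
    relationships: Dict[str, str] = field(default_factory=dict)


class ParagraphNodeParser:
    """Hypothetical parser: one Node per blank-line-separated paragraph."""

    def get_nodes_from_documents(self, documents: Sequence[Document]) -> List[Node]:
        nodes: List[Node] = []
        for doc in documents:
            paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
            # Every node points back to the document it came from.
            doc_nodes = [
                Node(text=p, relationships={"source": doc.doc_id}) for p in paragraphs
            ]
            # Link neighbouring chunks within the same document.
            for prev_node, next_node in zip(doc_nodes, doc_nodes[1:]):
                prev_node.relationships["next"] = next_node.node_id
                next_node.relationships["previous"] = prev_node.node_id
            nodes.extend(doc_nodes)
        return nodes


doc = Document(text="Intro paragraph.\n\nBody paragraph.\n\nClosing paragraph.")
nodes = ParagraphNodeParser().get_nodes_from_documents([doc])
```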

Examples:

Simple Node Parser

See the API reference for full details.

Ideas:

Add new Node relationships to model hierarchical documents (e.g. play-act-scene, chapter-section-heading).

Text Splitters

A text splitter splits a long text str into smaller str chunks of a desired size according to a splitting "strategy". This matters because LLMs have a limited context window, and the quality of the text chunks used as context affects the quality of query results.

Interface: split_text takes a str as input and outputs a sequence of str.
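One of the simplest possible strategies is a fixed-size character window with some overlap between neighbouring chunks. The sketch below shows that strategy behind a split_text method. FixedSizeTextSplitter and its chunk_size and chunk_overlap parameters are names chosen for this example, not an existing LlamaIndex class; a practical splitter would more likely count tokens and respect sentence boundaries.

```python
from typing import List


class FixedSizeTextSplitter:
    """Hypothetical splitter: fixed-size character windows with overlap."""

    def __init__(self, chunk_size: int = 20, chunk_overlap: int = 5) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_text(self, text: str) -> List[str]:
        # Each window starts (chunk_size - chunk_overlap) characters after
        # the previous one, so neighbouring chunks share chunk_overlap chars.
        step = self.chunk_size - self.chunk_overlap
        chunks: List[str] = []
        for start in range(0, len(text), step):
            chunks.append(text[start:start + self.chunk_size])
            # Stop once a window reaches the end of the text, so no
            # trailing chunk consisting only of overlap is emitted.
            if start + self.chunk_size >= len(text):
                break
        return chunks


splitter = FixedSizeTextSplitter(chunk_size=10, chunk_overlap=2)
chunks = splitter.split_text("abcdefghijklmnopqrstuvwxyz")
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap means text that straddles a chunk boundary still appears intact in at least one chunk, at the cost of some duplicated content.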

Examples:

Document/Index/KV Stores
