Contributing to LlamaIndex
Interested in contributing to LlamaIndex? Here's how to get started!
Contribution Guideline
The best part of LlamaIndex is our community of users and contributors.
What should I work on?
- Extend core modules by contributing an integration
- Contribute a Tool, Reader, Pack, or Dataset (formerly from llama-hub)
- Add new capabilities to core
- Fix bugs
- Add usage examples
- Add experimental features
- Improve code quality & documentation
Also, join our Discord for ideas and discussions: https://discord.gg/dGcwcsnxhU.

Extend Core Modules
The most impactful way to contribute to LlamaIndex is by extending our core modules:
We welcome contributions in all modules shown above. So far, we have implemented a core set of functionalities for each, all of which are encapsulated in the LlamaIndex core package. As a contributor, you can help each module unlock its full potential. Brief descriptions of these modules are provided below. You can also refer to their respective folders within this GitHub repository for some example integrations.
Contributing an integration involves submitting the source code for a new Python package. For now, these integrations will live in the LlamaIndex GitHub repository and the team will be responsible for publishing the package to PyPI. (Having these packages live outside of this repository and be maintained by our community members is under consideration.)
Creating A New Integration Package
Both llama-index and llama-index-core come equipped with a command-line tool that can be used to initialize a new integration package.
cd ./llama-index-integrations/llms
llamaindex-cli new-package --kind "llms" --name "gemini"
Executing the above commands will create a new folder called llama-index-llms-gemini within the llama-index-integrations/llms directory.
Please be sure to add a detailed README for your new package, as it will appear on both llamahub.ai and the PyPI.org website.
In addition to preparing your source code and supplying a detailed README, we also ask that you fill in some metadata so that your package appears in llamahub.ai with the correct information. You do so by adding the required metadata under the [tool.llamahub] section of your new package's pyproject.toml.
Below is an example of the metadata required for all of our integration packages. Please replace the default author "llama-index" with your own GitHub username.
[tool.llamahub]
contains_example = false
import_path = "llama_index.llms.anthropic"
[tool.llamahub.class_authors]
Anthropic = "llama-index"
NOTE: We are making rapid improvements to the project, and as a result, some interfaces are still volatile. Specifically, we are actively working on making the following components more modular and extensible (uncolored boxes above): core indexes, document stores, index queries, and query runner.
Module Details
Below, we will describe what each module does, give a high-level idea of the interface, show existing implementations, and give some ideas for contribution.
Data Loaders
A data loader ingests data of any format from anywhere into Document objects, which can then be parsed and indexed.
Interface:
- load_data takes arbitrary arguments as input (e.g. path to data), and outputs a sequence of Document objects.
- lazy_load_data takes arbitrary arguments as input (e.g. path to data), and outputs an iterable object of Document objects. This is a lazy version of load_data, which is useful for large datasets.
Note: If only lazy_load_data is implemented, load_data will be delegated to it.
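The interface above can be sketched roughly as follows. This is a self-contained illustration rather than the actual LlamaIndex code: Document and SketchReader are simplified stand-ins (assumptions) for the classes in LlamaIndex core, and LineReader is a hypothetical loader. In a real integration you would subclass the reader base class from LlamaIndex core instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Document:
    """Simplified stand-in for LlamaIndex's Document (assumption)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchReader:
    """Stand-in base class showing how load_data can delegate to lazy_load_data."""

    def lazy_load_data(self, *args: Any, **kwargs: Any) -> Iterable[Document]:
        raise NotImplementedError("subclasses implement lazy_load_data")

    def load_data(self, *args: Any, **kwargs: Any) -> List[Document]:
        # Eager version: consume the lazy iterable into a list.
        return list(self.lazy_load_data(*args, **kwargs))


class LineReader(SketchReader):
    """Hypothetical loader: one Document per non-empty line of a text blob."""

    def lazy_load_data(self, text: str) -> Iterable[Document]:
        for line_no, line in enumerate(text.splitlines(), start=1):
            if line.strip():
                # Yield one Document at a time so large inputs stay cheap.
                yield Document(text=line.strip(), metadata={"line": line_no})


docs = LineReader().load_data("first record\n\nsecond record\n")
print([d.text for d in docs])  # → ['first record', 'second record']
```

Only lazy_load_data is implemented in LineReader, so load_data is inherited and simply collects the lazily produced documents.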
Examples:
Contributing a data loader is easy and super impactful for the community. The preferred way to contribute is by making a PR at LlamaHub GitHub.
Ideas:
- Want to load something but there's no LlamaHub data loader for it yet? Make a PR!
Node Parser
A node parser parses Document objects into Node objects (atomic units of data that LlamaIndex operates over, e.g., a chunk of text, an image, or a table).
It is responsible for splitting text (via text splitters) and explicitly modeling the relationship between units of data (e.g. A is the source of B, C is a chunk after D).
Interface: get_nodes_from_documents takes a sequence of Document objects as input, and outputs a sequence of Node objects.
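To make the idea concrete, here is a minimal sketch of a node parser that splits each document into paragraphs and records relationships between the resulting nodes. The Document and Node classes are simplified stand-ins, and the relationship keys ("source", "previous", "next") are names chosen for this illustration, not the LlamaIndex API. It is written as a plain function for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Document:
    """Simplified stand-in for LlamaIndex's Document (assumption)."""

    doc_id: str
    text: str


@dataclass
class Node:
    """Simplified stand-in for LlamaIndex's Node (assumption)."""

    node_id: str
    text: str
    relationships: Dict[str, str] = field(default_factory=dict)


def get_nodes_from_documents(documents: Sequence[Document]) -> List[Node]:
    """Split each document on blank lines and link the resulting nodes."""
    nodes: List[Node] = []
    for doc in documents:
        paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
        previous: Optional[Node] = None
        for i, paragraph in enumerate(paragraphs):
            node = Node(
                node_id=f"{doc.doc_id}-{i}",
                text=paragraph,
                relationships={"source": doc.doc_id},  # the doc is the source
            )
            if previous is not None:
                # Model "C is a chunk after D" in both directions.
                node.relationships["previous"] = previous.node_id
                previous.relationships["next"] = node.node_id
            nodes.append(node)
            previous = node
    return nodes


nodes = get_nodes_from_documents([Document("doc1", "Intro.\n\nDetails.")])
print([(n.node_id, n.relationships) for n in nodes])
```

Links are never created across documents, because the previous-node pointer is reset for each document.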
Examples:
See the API reference for full details.
Ideas:
- Add new Node relationships to model hierarchical documents (e.g. play-act-scene, chapter-section-heading).
Text Splitters
A text splitter splits a long text str into smaller text str chunks of a desired size, using a given splitting "strategy". This matters because LLMs have a limited context window size, and the quality of the text chunks used as context impacts the quality of query results.
Interface: split_text takes a str as input, and outputs a sequence of str.
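As a rough illustration of this interface, the sketch below uses a deliberately naive strategy: fixed-size character windows with some overlap. The chunk_size and chunk_overlap parameter names are assumptions for this example, and existing splitters typically use more careful strategies (for instance, respecting token or sentence boundaries).

```python
from typing import List


def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> List[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters so that context
    is not lost entirely at chunk boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

A larger overlap gives each chunk more shared context at the cost of producing more chunks for the same text.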
Examples: