Unverified commit 8f8d4635, authored by Graden Rea, committed by GitHub

Add Groq LLM integration (#11317)


* Generate skeleton for Groq LLM integration.

* Use OpenAI-like to implement Groq LLM integration.

Co-authored-by: gavinsherry <gavin.sherry@gmail.com>

---------

Co-authored-by: gavinsherry <gavin.sherry@gmail.com>
parent 50806ba5
Showing with 677 additions and 0 deletions
%% Cell type:markdown id:4d1b897a tags:
<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/llm/groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id:2e33dced-e587-4397-81b3-d6606aa1738a tags:
# Groq
Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single core streaming architecture that sets the standard for GenAI inference speed with predictable and repeatable performance for any given workload.
Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:
* Achieve uncompromised low latency and performance for real-time AI and HPC inferences 🔥
* Know the exact performance and compute time for any given workload 🔮
* Take advantage of our cutting-edge technology to stay ahead of the competition 💪
Want more Groq? Check out our [website](https://groq.com) for more resources and join our [Discord community](https://discord.gg/JvNsBDKeCG) to connect with our developers!
%% Cell type:markdown id:5863dde9-84a0-4c33-ad52-cc767442f63f tags:
## Setup
%% Cell type:markdown id:833bdb2b tags:
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%% Cell type:code id:4aff387e tags:
``` python
%pip install llama-index-llms-groq
```
%% Cell type:code id:9bbbc106 tags:
``` python
!pip install llama-index
```
%% Cell type:code id:ad297f19-998f-4485-aa2f-d67020058b7d tags:
``` python
from llama_index.llms.groq import Groq
```
%% Output
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
%% Cell type:markdown id:4eefec25 tags:
Create an API key at the [Groq console](https://console.groq.com/keys), then set it as the `GROQ_API_KEY` environment variable:
```bash
export GROQ_API_KEY=<your api key>
```
Alternatively, you can pass your API key to the LLM when you initialize it:
%% Cell type:code id:152ced37-9a42-47be-9a39-4218521f5e72 tags:
``` python
llm = Groq(model="mixtral-8x7b-32768", api_key="your_api_key")
```
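%% Cell type:markdown id: tags:
If `GROQ_API_KEY` is set in the environment, the `api_key` argument can be omitted; a minimal sketch of that alternative:
%% Cell type:code id: tags:
``` python
# Assumes GROQ_API_KEY has been exported in the environment.
llm = Groq(model="mixtral-8x7b-32768")
```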
%% Cell type:markdown id:562455fe tags:
A list of available LLM models can be found [here](https://console.groq.com/docs/models).
%% Cell type:code id:d61b10bb-e911-47fb-8e84-19828cf224be tags:
``` python
response = llm.complete("Explain the importance of low latency LLMs")
```
%% Cell type:code id:3bd14f4e-c245-4384-a471-97e4ddfcb40e tags:
``` python
print(response)
```
%% Output
Low latency Large Language Models (LLMs) are important in certain applications due to their ability to process and respond to inputs quickly. Latency refers to the time delay between a user's request and the system's response. In some real-time or time-sensitive applications, low latency is critical to ensure a smooth user experience and prevent delays or lag.
For example, in conversational agents or chatbots, users expect quick and responsive interactions. If the system takes too long to process and respond to user inputs, it can negatively impact the user experience and lead to frustration. Similarly, in applications such as real-time language translation or speech recognition, low latency is essential to provide accurate and timely feedback to the user.
Furthermore, low latency LLMs can enable new use cases and applications that require real-time or near real-time processing of language inputs. For instance, in the field of autonomous vehicles, low latency LLMs can be used for real-time speech recognition and natural language understanding, enabling voice-controlled interfaces that allow drivers to keep their hands on the wheel and eyes on the road.
In summary, low latency LLMs are important for providing a smooth and responsive user experience, enabling real-time or near real-time processing of language inputs, and unlocking new use cases and applications that require real-time or near real-time processing of language inputs.
%% Cell type:markdown id:3ba9503c-b440-43c6-a50c-676c79993813 tags:
#### Call `chat` with a list of messages
%% Cell type:code id:ee8a4a55-5680-4dc6-a44c-fc8ad7892f80 tags:
``` python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
```
%% Cell type:code id:2a9bfe53-d15b-4e75-9d91-8c5d024f4eda tags:
``` python
print(resp)
```
%% Output
assistant: Arr, I be known as Captain Redbeard, the fiercest pirate on the seven seas! But ye can call me Cap'n Redbeard for short. I'm a fearsome pirate with a love for treasure and adventure, and I'm always ready for a good time! Whether I'm swabbin' the deck or swiggin' grog, I'm always up for a bit of fun. So hoist the Jolly Roger and let's set sail for adventure, me hearties!
%% Cell type:markdown id:25ad1b00-28fc-4bcd-96c4-d5b35605721a tags:
### Streaming
%% Cell type:markdown id:13c641fa-345a-4dce-87c5-ab1f6dcf4757 tags:
Using `stream_complete` endpoint
%% Cell type:code id:06da1ef1-2f6b-497c-847b-62dd2df11491 tags:
``` python
response = llm.stream_complete("Explain the importance of low latency LLMs")
```
%% Cell type:code id:1b851def-5160-46e5-a30c-5a3ef2356b79 tags:
``` python
for r in response:
    print(r.delta, end="")
```
%% Output
Low latency Large Language Models (LLMs) are important in the field of artificial intelligence and natural language processing (NLP) due to several reasons:
1. Real-time applications: Low latency LLMs are essential for real-time applications such as chatbots, voice assistants, and real-time translation services. These applications require immediate responses, and high latency can result in a poor user experience.
2. Improved user experience: Low latency LLMs can provide a more seamless and responsive user experience. Users are more likely to continue using a service that provides quick and accurate responses, leading to higher user engagement and satisfaction.
3. Better decision-making: In some applications, such as financial trading or autonomous vehicles, low latency LLMs can provide critical information in real-time, enabling better decision-making and reducing the risk of accidents.
4. Scalability: Low latency LLMs can handle a higher volume of requests, making them more scalable and suitable for large-scale applications.
5. Competitive advantage: Low latency LLMs can provide a competitive advantage in industries where real-time decision-making and responsiveness are critical. For example, in online gaming or e-commerce, low latency LLMs can provide a more immersive and engaging user experience, leading to higher customer loyalty and revenue.
In summary, low latency LLMs are essential for real-time applications, providing a better user experience, enabling better decision-making, improving scalability, and providing a competitive advantage. As LLMs continue to play an increasingly important role in various industries, low latency will become even more critical for their success.
%% Cell type:markdown id:ca52051d-6b28-49d7-98f5-82e266a1c7a6 tags:
Using `stream_chat` endpoint
%% Cell type:code id:fe553190-52a9-436d-84ae-4dd99a1808f4 tags:
``` python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
```
%% Cell type:code id:154c503c-f893-4b6b-8a65-a9a27b636046 tags:
``` python
for r in resp:
    print(r.delta, end="")
```
%% Output
Arr, I be known as Captain Candybeard! A more colorful and swashbuckling pirate, ye will never find!
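%% Cell type:markdown id: tags:
### Async
The endpoints above also have async counterparts on the same `llm` object. A minimal sketch, assuming the standard llama-index async LLM interface (`acomplete` mirrors `complete`):
%% Cell type:code id: tags:
``` python
# Async variant of complete(); notebooks support top-level await.
response = await llm.acomplete("Explain the importance of low latency LLMs")
print(response)
```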
@@ -94,6 +94,15 @@ maxdepth: 1
/examples/llm/gradient_model_adapter.ipynb
```
## Groq
```{toctree}
---
maxdepth: 1
---
/examples/llm/groq.ipynb
```
## Hugging Face
```{toctree}
......
llama_index/_static
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
bin/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
etc/
include/
lib/
lib64/
parts/
sdist/
share/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
notebooks/
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pyvenv.cfg
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Jetbrains
.idea
modules/
*.swp
# VsCode
.vscode
# pipenv
Pipfile
Pipfile.lock
# pyright
pyrightconfig.json
poetry_requirements(
    name="poetry",
)
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help:	## Show all Makefile targets.
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format:	## Run code autoformatters (black).
	pre-commit install
	git ls-files | xargs pre-commit run black --files

lint:	## Run linters: pre-commit (black, ruff, codespell) and mypy
	pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test:	## Run tests via pytest.
	pytest tests

watch-docs:	## Build and watch documentation.
	sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
# LlamaIndex Llms Integration: Groq
Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single core streaming architecture that sets the standard for GenAI inference speed with predictable and repeatable performance for any given workload.
Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:
- Achieve uncompromised low latency and performance for real-time AI and HPC inferences 🔥
- Know the exact performance and compute time for any given workload 🔮
- Take advantage of our cutting-edge technology to stay ahead of the competition 💪
Want more Groq? Check out our [website](https://groq.com) for more resources and join our [Discord community](https://discord.gg/JvNsBDKeCG) to connect with our developers!
## Develop
To create a development environment, install Poetry, then run:
```bash
poetry install --with dev
```
## Testing
To test the integration, first enter the Poetry virtual environment:
```bash
poetry shell
```
Then tests can be run with make:
```bash
make test
```
### Integration tests
Integration tests will be skipped unless an API key is provided. API keys can be created at the [Groq console](https://console.groq.com/keys).
Once created, store the API key in an environment variable and run the tests:
```bash
export GROQ_API_KEY=<your key here>
make test
```
## Linting and Formatting
Linting and code formatting can be run with make:
```bash
make format
make lint
```
python_sources()
from llama_index.llms.groq.base import Groq
__all__ = ["Groq"]
import os
from typing import Any, Optional

from llama_index.llms.openai_like import OpenAILike


class Groq(OpenAILike):
    """Groq LLM: a thin wrapper over Groq's OpenAI-compatible endpoint."""

    def __init__(
        self,
        model: str,
        api_key: Optional[str] = None,
        api_base: str = "https://api.groq.com/openai/v1",
        is_chat_model: bool = True,
        **kwargs: Any,
    ) -> None:
        # Fall back to the GROQ_API_KEY environment variable when no key is passed.
        api_key = api_key or os.environ.get("GROQ_API_KEY", None)
        super().__init__(
            model=model,
            api_key=api_key,
            api_base=api_base,
            is_chat_model=is_chat_model,
            **kwargs,
        )

    @classmethod
    def class_name(cls) -> str:
        """Get class name."""
        return "Groq"
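A minimal usage sketch of this wrapper, assuming `GROQ_API_KEY` is exported and using the model name from the notebook above:

```python
from llama_index.llms.groq import Groq

# Assumes GROQ_API_KEY is set in the environment; the constructor falls back to it.
llm = Groq(model="mixtral-8x7b-32768")
print(llm.complete("Say hello in one short sentence."))
```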
[build-system]
build-backend = "poetry.core.masonry.api"
requires = ["poetry-core"]
[tool.codespell]
check-filenames = true
check-hidden = true
skip = "*.csv,*.html,*.json,*.jsonl,*.pdf,*.txt,*.ipynb"
[tool.llamahub]
contains_example = false
import_path = "llama_index.llms.groq"
[tool.llamahub.class_authors]
Groq = "gradenr"
[tool.mypy]
disallow_untyped_defs = true
exclude = ["_static", "build", "examples", "notebooks", "venv"]
ignore_missing_imports = true
python_version = "3.8"
[tool.poetry]
authors = ["Your Name <you@example.com>"]
description = "llama-index llms groq integration"
exclude = ["**/BUILD"]
license = "MIT"
name = "llama-index-llms-groq"
readme = "README.md"
version = "0.1.3"
[tool.poetry.dependencies]
python = ">=3.8.1,<4.0"
llama-index-core = "^0.10.1"
llama-index-llms-openai-like = "^0.1.3"
[tool.poetry.group.dev.dependencies]
ipython = "8.10.0"
jupyter = "^1.0.0"
mypy = "0.991"
pre-commit = "3.2.0"
pylint = "2.15.10"
pytest = "7.2.1"
pytest-mock = "3.11.1"
ruff = "0.0.292"
tree-sitter-languages = "^1.8.0"
types-Deprecated = ">=0.1.0"
types-PyYAML = "^6.0.12.12"
types-protobuf = "^4.24.0.4"
types-redis = "4.5.5.0"
types-requests = "2.28.11.8"
types-setuptools = "67.1.0.0"
[tool.poetry.group.dev.dependencies.black]
extras = ["jupyter"]
version = "<=23.9.1,>=23.7.0"
[tool.poetry.group.dev.dependencies.codespell]
extras = ["toml"]
version = ">=v2.2.6"
[[tool.poetry.packages]]
include = "llama_index/"
python_tests()
import os

import pytest

from llama_index.llms.groq import Groq


@pytest.mark.skipif("GROQ_API_KEY" not in os.environ, reason="No Groq API key")
def test_completion():
    groq = Groq(model="mixtral-8x7b-32768", temperature=0, max_tokens=2)
    resp = groq.complete("hello")
    assert resp.text == "Hello"


@pytest.mark.skipif("GROQ_API_KEY" not in os.environ, reason="No Groq API key")
def test_stream_completion():
    groq = Groq(model="mixtral-8x7b-32768", temperature=0, max_tokens=2)
    stream = groq.stream_complete("hello")
    text = None
    for chunk in stream:
        # chunk.text holds the accumulated response so far; keep the last value.
        text = chunk.text
    assert text == "Hello"
from llama_index.core.base.llms.base import BaseLLM

from llama_index.llms.groq import Groq


def test_embedding_class():
    names_of_base_classes = [b.__name__ for b in Groq.__mro__]
    assert BaseLLM.__name__ in names_of_base_classes