[Community] Resolve releasing error for `llama-index-embeddings-ipex-llm` with Intel GPU supports (#13511)
# Local Embeddings with IPEX-LLM on Intel GPU
> [IPEX-LLM](https://github.com/intel-analytics/ipex-llm/) is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.

This example goes over how to use LlamaIndex to conduct embedding tasks with `ipex-llm` optimizations on Intel GPU. This is useful in applications such as RAG and document QA.
> **Note**
>
> You could refer to [here](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/embeddings/llama-index-embeddings-ipex-llm/examples) for full examples of `IpexLLMEmbedding`. Note that to run on Intel GPU, you should specify `-d 'xpu'` as a command argument when running the examples.
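For instance, a run might look like the following (`basic.py` is a hypothetical script name; substitute the actual example file from the linked directory):

```bash
# Run an example with the embedding model placed on the Intel GPU.
python basic.py -d 'xpu'
```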
## Install Prerequisites
To benefit from IPEX-LLM on Intel GPUs, there are several prerequisite steps for tools installation and environment preparation.
If you are a Windows user, visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [**Install Prerequisites**](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to update GPU driver (optional) and install Conda.
If you are a Linux user, visit the [Install IPEX-LLM on Linux with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), and follow [**Install Prerequisites**](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-prerequisites) to install GPU driver, Intel® oneAPI Base Toolkit 2024.0, and Conda.
## Install `llama-index-embeddings-ipex-llm`
After completing the prerequisite steps, you should have a conda environment with all the prerequisites installed. Activate that environment and install `llama-index-embeddings-ipex-llm` as follows:
```bash
conda activate <your-conda-env-name>
pip install "llama-index-embeddings-ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
This step will also install `ipex-llm` and its dependencies.
> **Note**
>
> You can also use `https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/` as the `--extra-index-url`.
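As a quick sanity check (our suggestion, not part of the original instructions), you can confirm that PyTorch can see the Intel GPU; importing `intel_extension_for_pytorch` registers the `xpu` device:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401 -- registers the 'xpu' device

# Both lines should succeed and report your Intel GPU if the
# installation and drivers are set up correctly.
print(torch.xpu.is_available())
print(torch.xpu.get_device_name(0))
```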
## Runtime Configuration
For optimal performance, it is recommended to set several environment variables based on your device:
### For Windows Users with Intel Core Ultra integrated GPU
In Anaconda Prompt:
```cmd
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```
### For Linux Users with Intel Arc A-Series GPU
```bash
# Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
# Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
source /opt/intel/oneapi/setvars.sh
# Recommended Environment Variables for optimal performance
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
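As an optional check (not part of the original steps), the oneAPI `sycl-ls` utility lists the SYCL devices visible to the runtime:

```bash
# Your Arc GPU should show up as a 'level_zero:gpu' entry.
sycl-ls
```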
> **Note**
>
> The first time each model runs on an Intel iGPU, Intel Arc A300-Series, or Pro A60 GPU, it may take several minutes to compile.
>
> For other GPU types, please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) for Windows users, and [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id5) for Linux users.
## `IpexLLMEmbedding`
Setting `device="xpu"` when initializing `IpexLLMEmbedding` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:
```python
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
embedding_model = IpexLLMEmbedding(
    model_name="BAAI/bge-large-en-v1.5", device="xpu"
)
```
> Please note that `IpexLLMEmbedding` currently only provides optimization for Hugging Face BGE models.
You could then conduct the embedding tasks as normal:
```python
sentence = "IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency."
query = "What is IPEX-LLM?"
text_embedding = embedding_model.get_text_embedding(sentence)
print(f"embedding[:10]: {text_embedding[:10]}")
text_embeddings = embedding_model.get_text_embedding_batch([sentence, query])
print(f"text_embeddings[0][:10]: {text_embeddings[0][:10]}")
print(f"text_embeddings[1][:10]: {text_embeddings[1][:10]}")
query_embedding = embedding_model.get_query_embedding(query)
print(f"query_embedding[:10]: {query_embedding[:10]}")
```
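Since the embedding model implements the standard LlamaIndex embedding interface, it can be dropped into a retrieval pipeline. Below is a minimal sketch (the `Settings`/`VectorStoreIndex` wiring is our illustration, not part of the original example) that indexes the sentence above and retrieves it for the query:

```python
from llama_index.core import Document, Settings, VectorStoreIndex

# Route all embedding calls through the IPEX-LLM model on the Intel GPU.
Settings.embed_model = embedding_model

# Build a tiny in-memory index over one document and retrieve the
# best-matching node for the query, using embeddings only (no LLM needed).
index = VectorStoreIndex.from_documents([Document(text=sentence)])
retriever = index.as_retriever(similarity_top_k=1)
print(retriever.retrieve(query)[0].node.get_content())
```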
Accompanying changes in the package's `pyproject.toml`:

```diff
@@ -35,10 +35,10 @@ version = "0.1.1"
 
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 llama-index-core = "^0.10.0"
-ipex-llm = {allow-prereleases = true, extras = ["llama-index"], version = ">=2.1.0b20240423"}
-torch = {optional = true, version = "2.1.0a0"}
-torchvision = {optional = true, version = "0.16.0a0"}
-intel_extension_for_pytorch = {optional = true, version = "2.1.10+xpu"}
+ipex-llm = {allow-prereleases = true, extras = ["llama-index"], version = ">=2.1.0b20240514"}
+torch = {optional = true, source = "ipex-xpu-src-us", version = "2.1.0a0"}
+torchvision = {optional = true, source = "ipex-xpu-src-us", version = "0.16.0a0"}
+intel_extension_for_pytorch = {optional = true, source = "ipex-xpu-src-us", version = "2.1.10+xpu"}
 bigdl-core-xe-21 = {optional = true, version = "*"}
 bigdl-core-xe-esimd-21 = {optional = true, version = "*"}
@@ -63,3 +63,13 @@ types-protobuf = "^4.24.0.4"
 types-redis = "4.5.5.0"
 types-requests = "2.28.11.8" # TODO: unpin when mypy>0.991
 types-setuptools = "67.1.0.0"
+
+[[tool.poetry.source]]
+name = "ipex-xpu-src-us"
+priority = "explicit"
+url = "https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+
+[[tool.poetry.source]]
+name = "ipex-xpu-src-cn"
+priority = "supplemental"
+url = "https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/"
```
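In Poetry, a source marked `priority = "explicit"` is only consulted for packages that opt in via `source = "..."`, which is why the `torch`, `torchvision`, and `intel_extension_for_pytorch` entries now name `ipex-xpu-src-us`; the `supplemental` `ipex-xpu-src-cn` mirror remains available as an alternative index.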