Commit 936d11b

Add IPEX-LLM inference jupyter notebook and instructions (#3310)

* Initial commit for IPEX LLM notebook for BF16
* update notebook with commands for quantization
* update READMEs with instructions, fix quantization commands
* minor updates

1 parent d260c2a

File tree: 3 files changed, +437 −1 lines

examples/cpu/llm/README.md

Lines changed: 37 additions & 1 deletion

@@ -24,7 +24,7 @@
 docker build -f examples/cpu/llm/Dockerfile --build-arg COMPILE=ON --build-arg PORT_SSH=2345 -t ipex-llm:main .

 # Run the container with command below
-docker run --rm -it --privileged -v /dev/shm:/dev/shm ipex-llm:main bash
+docker run --rm -it --net host --privileged -v /dev/shm:/dev/shm ipex-llm:main bash

 # When the command prompt shows inside the docker container, enter llm examples directory
 cd llm
@@ -57,6 +57,42 @@
 source ./tools/env_activate.sh [inference|fine-tuning]
 ```

+## 2.3 [Optional] Setup for Running Jupyter Notebooks
+
+After setting up your Docker or conda environment, you can follow these additional steps to set up and run the Jupyter Notebooks. The port number can be changed.
+
+### 2.3.1 Jupyter Notebooks for Docker-based Environments
+
+```bash
+# Install dependencies
+pip install notebook matplotlib
+
+# Launch Jupyter Notebook
+jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
+```
+
+1. Open a web browser with the given URL and token.
+2. Open the notebook.
+3. Run all cells.
+
+### 2.3.2 Jupyter Notebooks for Conda-based Environments
+
+```bash
+# Install dependencies
+pip install notebook ipykernel matplotlib
+
+# Register ipykernel with conda
+python -m ipykernel install --user --name=IPEX-LLM
+
+# Launch Jupyter Notebook
+jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
+```
+
+1. Open a web browser with the given URL and token.
+2. Open the notebook.
+3. Change your Jupyter Notebook kernel to IPEX-LLM.
+4. Run all cells.
+
 <br>

 *Note*: In the `env_setup.sh` script a `prompt.json` file is downloaded, which provides prompt samples with pre-defined input token lengths for benchmarking.
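The Jupyter launch command above prints a URL containing an access token. When that URL scrolls by, `jupyter notebook list` reprints the running servers. A minimal shell sketch of pulling the token out of such a line, using only POSIX parameter expansion; the sample line is a hypothetical example of the listing format, not output from this repository:

```shell
# Hedged sketch: extract the access token from a Jupyter server listing line.
# The sample line below is a hypothetical example of `jupyter notebook list` output.
line="http://0.0.0.0:8888/?token=abc123 :: /root/llm"
token="${line#*token=}"   # drop everything through 'token='
token="${token%% *}"      # drop the trailing ' :: <dir>' part
echo "$token"             # prints "abc123"
```

The same two expansions work in any POSIX shell, so no extra dependency is needed inside the container.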

examples/cpu/llm/inference/README.md

Lines changed: 4 additions & 0 deletions

@@ -114,6 +114,10 @@
 python run.py --help # for more detailed usages

 *Note:* You may need to log in to your HuggingFace account to access the model files. Please refer to [HuggingFace login](https://huggingface.co/docs/huggingface_hub/quick-start#login).

+**Alternatively, you can run the Jupyter Notebook to see ipex.llm with BF16 and various other quick start examples.**
+
+Additional setup instructions for running the notebook can be found [here](../README.md#23-optional-setup-for-running-jupyter-notebooks).
+
 ## 2.1 Quick example for running Llama2-7b

 ### 2.1.1 To run generation task and benchmark performance
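A small side note on the conda instructions above: `python -m ipykernel install --user --name=IPEX-LLM` lowercases the kernel name when naming the kernelspec directory, which matters if you later look for or remove the kernel on disk. A minimal sketch of computing that path; the per-user prefix shown is the typical default location and is an assumption here:

```shell
# Hedged sketch: ipykernel lowercases the kernel name for its kernelspec directory.
# The ~/.local/share/jupyter prefix is the usual per-user default, assumed here.
name="IPEX-LLM"
dir="$HOME/.local/share/jupyter/kernels/$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
echo "$dir"
```

`jupyter kernelspec list` shows the registered kernels and their actual locations if the prefix differs on your system.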

0 commit comments