diff --git a/tutorials/README.md b/tutorials/README.md
index 817501a..23f1a4b 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -22,7 +22,7 @@ Step-by-step walkthroughs covering adapter function invocation, pipeline constru
 | Guide | Description |
 |-------|-------------|
 | [Using Mellea with Granite Switch](guides/mellea_with_granite_switch.md) | Connect Mellea to a Granite Switch model |
-| [Bring Your Own Adapter](guides/build_your_own_adapter.md) | Train, compose, and use custom adapters |
+| [Build Your Own Adapter](guides/build_your_own_adapter.md) | Train, compose, and use custom adapters |
 | [Compare Inference Throughput](guides/compare_inference_throughput.md) | Compare LoRA vs aLoRA based models in an inference race setup |
 
 
@@ -57,11 +57,11 @@ Best for: Seeing how adapter functions compose into multi-step applications
 
 
 
-### Path 3: Bring Your Own Adapter
+### Path 3: Build Your Own Adapter
 
 Best for: Custom adapter function development
 
-1. [Bring Your Own Adapter Guide](guides/build_your_own_adapter.md)
+1. [Build Your Own Adapter Guide](guides/build_your_own_adapter.md)
 2. [Configure Your Own Adapter Guide](guides/mellea_build_your_own_adapter.md)
 3. [Compose Your Checkpoint](notebooks/compose_granite_switch.ipynb)
 
diff --git a/tutorials/guides/build_your_own_adapter.md b/tutorials/guides/build_your_own_adapter.md
index 7af4d92..c7227a0 100644
--- a/tutorials/guides/build_your_own_adapter.md
+++ b/tutorials/guides/build_your_own_adapter.md
@@ -1,4 +1,4 @@
-# Bring Your Own Adapter (BYOA)
+# Build Your Own Adapter (BYOA)
 
 This guide explains how to train your own adapter (aLoRA or LoRA) and compose it into a Granite Switch model.
 
@@ -183,7 +183,7 @@ The base model's tokenizer and generation assets (`generation_config.json`, `mer
 
 ## Step 4: Use the Composed Model
 
-> **Note:** Custom (BYOA) adapters are not supported by [Mellea](https://github.com/generative-computing/mellea). Mellea only supports the official IBM Granite Library adapters. To invoke your custom adapters, use the chat template directly as shown below.
+You can invoke your custom adapter directly via the model's chat template (with HuggingFace or vLLM), or through [Mellea](https://github.com/generative-computing/mellea). Note that Mellea's *high-level* wrappers (Guardian, RAG, Core) are specific to the official IBM Granite Library adapters; a custom (BYOA) adapter is invoked by name through Mellea's lower-level `Intrinsic` interface, as shown in the **With Mellea** section below.
 
 ### With HuggingFace
 
@@ -247,6 +247,52 @@ response = client.chat.completions.create(
 print(response.choices[0].message.content)
 ```
 
+### With Mellea
+
+Mellea can drive the same vLLM server. Its high-level wrappers (Guardian, RAG, Core) only cover the official IBM Granite Library adapters, but a custom adapter is invoked by name through the lower-level `Intrinsic` interface:
+
+```bash
+pip install mellea
+```
+
+```python
+import json
+
+from mellea.backends.model_options import ModelOption
+from mellea.backends.openai import OpenAIBackend
+from mellea.stdlib.context import ChatContext
+from mellea.stdlib.components import Message, Intrinsic
+import mellea.stdlib.functional as mfuncs
+
+# Connect to the vLLM server from the "With vLLM" section above.
+# load_embedded_adapters=True autoloads the adapters composed into the model.
+backend = OpenAIBackend(
+    model_id="./composed-model",
+    base_url="http://localhost:8000/v1",
+    api_key="unused",  # vLLM doesn't require auth by default
+    load_embedded_adapters=True,
+)
+
+# Invoke your custom adapter by name (the io.yaml `name` from Step 2/3).
+context = ChatContext().add(Message("assistant", "Paris is the capital of France."))
+action = Intrinsic("uncertainty")
+
+out, _ = mfuncs.act(
+    action,
+    context,
+    backend,
+    model_options={ModelOption.TEMPERATURE: 0.0},
+    strategy=None,
+)
+
+# The adapter's io.yaml `response_format` (see Step 3) forces JSON output, and its
+# `transformations` rename `score` to `certainty`.
+result = json.loads(str(out))
+print(result["certainty"])
+```
+
+For the full walkthrough — including helper functions that wrap your adapter so it behaves like Mellea's built-in intrinsics — see [Build Your Own Adapter with Mellea](mellea_build_your_own_adapter.md).
+
 ## Next Steps
 
 - **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - minimal embedded-adapter invocation via the HuggingFace backend
diff --git a/tutorials/guides/compare_inference_throughput.md b/tutorials/guides/compare_inference_throughput.md
index 961172f..2ed2719 100644
--- a/tutorials/guides/compare_inference_throughput.md
+++ b/tutorials/guides/compare_inference_throughput.md
@@ -87,4 +87,4 @@ raced simultaneously.
 
 - **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - minimal embedded-adapter invocation via the HuggingFace backend
 - **[Using Mellea with Granite Switch](mellea_with_granite_switch.md)** - deeper Mellea integration details
-- **[Bring Your Own Adapter](build_your_own_adapter.md)** - train a custom adapter and compose it in
+- **[Build Your Own Adapter](build_your_own_adapter.md)** - train a custom adapter and compose it in
diff --git a/tutorials/guides/mellea_build_your_own_adapter.md b/tutorials/guides/mellea_build_your_own_adapter.md
index 46779ef..5853f09 100644
--- a/tutorials/guides/mellea_build_your_own_adapter.md
+++ b/tutorials/guides/mellea_build_your_own_adapter.md
@@ -1,4 +1,4 @@
-# Bring Your Own Adapter with Mellea
+# Build Your Own Adapter with Mellea
 
 This guide explains how to configure your own adapter with Mellea to be used by Granite Switch model.
 
@@ -6,7 +6,7 @@ This guide explains how to configure your own adapter with Mellea to be used by
 Together, Mellea + Granite Switch + vLLM provide a production-ready inference stack for adapter-based AI applications that can utilize custom adapters.
 
 - See [Mellea With Granite Switch](mellea_with_granite_switch.md) for a detailed explanation of how granite-switch and Mellea work together.
-- See [Bring Your Own Adapter](build_your_own_adapter.md) for info on how to train your own adapter.
+- See [Build Your Own Adapter](build_your_own_adapter.md) for info on how to train your own adapter.
 - See Mellea's [Lora and aLoRA adapters](https://docs.mellea.ai/advanced/lora-and-alora-adapters) for info on how to train your own custom adapters using Mellea.
 
 ## Prerequisites
@@ -61,7 +61,7 @@ out, _ = mfuncs.act(
 )
 
 # Adapter / Intrinsic processing in Mellea utilizes the io.yaml format forcing the output
-# to be a json. See the "Bring Your Own Adapter" linked example above.
+# to be a json. See the "Build Your Own Adapter" linked example above.
 result = json.loads(str(out))
 print(result)
 ```
diff --git a/tutorials/guides/mellea_with_granite_switch.md b/tutorials/guides/mellea_with_granite_switch.md
index e9946fd..2a8850c 100644
--- a/tutorials/guides/mellea_with_granite_switch.md
+++ b/tutorials/guides/mellea_with_granite_switch.md
@@ -247,7 +247,7 @@ print(f"Citations: {citations}")
 ## Next Steps
 
 - **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - Minimal embedded-adapter invocation via the HuggingFace backend
-- **[Bring Your Own Adapter](build_your_own_adapter.md)** - Train a custom adapter and compose it in
+- **[Build Your Own Adapter](build_your_own_adapter.md)** - Train a custom adapter and compose it in
 - **[Compare Inference Throughput](compare_inference_throughput.md)** - Benchmark ALORA vs LoRA on a 6-step RAG pipeline
 - **[Mellea Repository](https://github.com/generative-computing/mellea)** - Full documentation
 - **[Granite Models](https://huggingface.co/ibm-granite)**