Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions lib/crewai-tools/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,9 @@ qdrant-client = [
apify = [
"langchain-apify>=0.1.2,<1.0.0",
]
airweave = [
"airweave-sdk>=0.1.0",
]

databricks-sdk = [
"databricks-sdk>=0.46.0",
Expand Down
6 changes: 6 additions & 0 deletions lib/crewai-tools/src/crewai_tools/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@
from crewai_tools.aws.s3.reader_tool import S3ReaderTool
from crewai_tools.aws.s3.writer_tool import S3WriterTool
from crewai_tools.tools.ai_mind_tool.ai_mind_tool import AIMindTool
from crewai_tools.tools.airweave_tool.airweave_advanced_search_tool import (
AirweaveAdvancedSearchTool,
)
from crewai_tools.tools.airweave_tool.airweave_search_tool import AirweaveSearchTool
from crewai_tools.tools.apify_actors_tool.apify_actors_tool import ApifyActorsTool
from crewai_tools.tools.arxiv_paper_tool.arxiv_paper_tool import ArxivPaperTool
from crewai_tools.tools.brave_search_tool.brave_search_tool import BraveSearchTool
Expand Down Expand Up @@ -196,6 +200,8 @@

__all__ = [
"AIMindTool",
"AirweaveAdvancedSearchTool",
"AirweaveSearchTool",
"ApifyActorsTool",
"ArxivPaperTool",
"BedrockInvokeAgentTool",
Expand Down
361 changes: 361 additions & 0 deletions lib/crewai-tools/src/crewai_tools/tools/airweave_tool/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,361 @@
# Airweave Search Tools

Search across all your connected data sources (Stripe, GitHub, Notion, Slack, and 50+ more) using Airweave's unified search API.

## Installation

```bash
pip install 'crewai-tools[airweave]'
```

Or install the SDK directly:

```bash
pip install airweave-sdk
```

## Setup

### 1. Get your API key

Sign up at [https://app.airweave.ai](https://app.airweave.ai) and get your API key.

### 2. Set environment variable

```bash
export AIRWEAVE_API_KEY="your_api_key_here"
```

### 3. Create a collection and connect data sources

Through the Airweave dashboard:
1. Create a new collection
2. Add source connections (Stripe, GitHub, Notion, etc.)
3. Wait for initial sync to complete
4. Copy your collection's `readable_id`

## Tools

### AirweaveSearchTool

Basic search tool for straightforward queries. Mirrors the `client.collections.search()` method from the Airweave Python SDK.

**When to use:**
- Simple searches without filtering
- Quick lookups across all data sources
- When you don't need advanced features

**Example:**

```python
from crewai import Agent, Task, Crew
from crewai_tools import AirweaveSearchTool

# Initialize the tool
search_tool = AirweaveSearchTool(
collection_id="my-collection-id"
)

# Create an agent with the tool
agent = Agent(
role="Data Analyst",
goal="Find information from connected data sources",
tools=[search_tool],
verbose=True
)

# Create a task
task = Task(
description="Find all failed payments from the last month",
agent=agent,
expected_output="List of failed payments with customer details"
)

# Run the crew
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
```

**Get AI-generated answers:**

```python
from crewai import Task

task = Task(
description="""
What are the most common customer complaints this month?
Use response_type='completion' to get an AI-generated summary.
""",
agent=agent
)
```

### AirweaveAdvancedSearchTool

Advanced search tool with filtering, reranking, and fine-tuned control. Mirrors the `client.collections.search_advanced()` method from the Airweave Python SDK.

**When to use:**
- Filter by specific data sources
- Prioritize recent results
- Set minimum relevance scores
- Enable AI reranking for better results
- Choose specific search methods (hybrid/neural/keyword)

**Example:**

```python
from crewai_tools import AirweaveAdvancedSearchTool

# Initialize with advanced options
advanced_tool = AirweaveAdvancedSearchTool(
collection_id="my-collection-id"
)

agent = Agent(
role="Customer Support Analyst",
goal="Find recent customer issues from specific sources",
tools=[advanced_tool]
)

task = Task(
description="""
Find customer complaints about billing from Zendesk in the last week.
Use these parameters:
- source_filter: 'Zendesk'
- recency_bias: 0.8
- enable_reranking: True
- score_threshold: 0.7
""",
agent=agent
)
```

## Parameters

### AirweaveSearchTool Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | str | Required | Search query to find relevant information |
| `limit` | int | 10 | Maximum number of results (1-100) |
| `response_type` | str | "raw" | "raw" for search results, "completion" for AI answer |
| `recency_bias` | float | 0.0 | Weight for recent results (0.0-1.0) |

### AirweaveAdvancedSearchTool Parameters

All basic parameters plus:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `source_filter` | str | None | Filter by specific source (e.g., "Stripe", "GitHub") |
| `score_threshold` | float | None | Minimum similarity score (0.0-1.0) |
| `recency_bias` | float | 0.3 | Weight for recent results (0.0-1.0) |
| `enable_reranking` | bool | True | Enable AI reranking for better relevance |
| `search_method` | str | "hybrid" | "hybrid", "neural", or "keyword" |

### Tool Configuration (Constructor)

Both tools accept these configuration parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `collection_id` | str | Required | Your collection's readable ID |
| `base_url` | str | None | Custom API URL for self-hosted instances |
| `max_content_length` | int | 300 | Max characters to show per result |

## What Can Airweave Search?

Airweave connects to 50+ data sources including:

### Finance & Billing
- Stripe
- QuickBooks
- Chargebee

### Development Tools
- GitHub
- GitLab
- Jira
- Linear

### Collaboration
- Slack
- Microsoft Teams
- Discord

### Productivity
- Notion
- Google Drive
- Confluence
- Dropbox

### CRM & Support
- Salesforce
- HubSpot
- Zendesk
- Intercom

### And many more...

[See full list of integrations →](https://docs.airweave.ai/integrations)

## Features

✅ **Unified Search** - Search across all data sources with one query
✅ **Semantic Search** - Natural language understanding
✅ **Hybrid Search** - Combines vector and keyword search
✅ **AI Reranking** - Improves result relevance (Advanced)
✅ **AI Answers** - Get generated answers via `response_type="completion"`
✅ **Recency Bias** - Prioritize recent results
✅ **Source Filtering** - Search specific sources only (Advanced)
✅ **Score Threshold** - Filter by relevance (Advanced)
✅ **Incremental Sync** - Data stays automatically updated
✅ **Multi-tenant** - Each user has their own collections

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `AIRWEAVE_API_KEY` | Yes | - | Your Airweave API key |

## Response Types

### Raw Results (`response_type="raw"`)

Returns structured search results with:
- Content snippets
- Similarity scores
- Source information
- Entity IDs
- Creation timestamps
- URLs (when available)

### AI Completion (`response_type="completion"`)

Returns an AI-generated natural language answer based on the retrieved documents. Use this when you want:
- Summarized information
- Direct answers to questions
- Synthesized insights from multiple sources

## Search Methods (Advanced Tool)

### Hybrid (Default)
Combines semantic (neural) and keyword (BM25) search for best results.

### Neural
Pure semantic search using embeddings. Best for conceptual queries.

### Keyword
Traditional keyword search. Best for exact term matching.

## Use Cases

### Customer Support Agent
```python
agent = Agent(
role="Customer Support Specialist",
goal="Find and resolve customer issues",
tools=[AirweaveAdvancedSearchTool(collection_id="support-data")],
backstory="Expert at finding relevant customer data across Zendesk, Slack, and email"
)
```

### Sales Intelligence Agent
```python
agent = Agent(
role="Sales Intelligence Analyst",
goal="Research accounts and opportunities",
tools=[AirweaveSearchTool(collection_id="sales-data")],
backstory="Analyzes data from Salesforce, HubSpot, and LinkedIn"
)
```

### Financial Analysis Agent
```python
agent = Agent(
role="Financial Analyst",
goal="Analyze payment trends and issues",
tools=[AirweaveAdvancedSearchTool(collection_id="finance-data")],
backstory="Tracks payments, invoices, and transactions from Stripe and QuickBooks"
)
```

### Technical Documentation Agent
```python
agent = Agent(
role="Documentation Specialist",
goal="Answer technical questions from internal docs",
tools=[AirweaveSearchTool(collection_id="docs-collection")],
backstory="Searches through Notion, Confluence, and GitHub repos"
)
```

## Error Handling

The tools handle errors gracefully and return clear error messages:

- **Missing API key** - Clear instruction on how to set it
- **Collection not found** - Verifies collection exists
- **No results** - Suggests rephrasing query
- **API errors** - Returns error details for debugging

## Advanced Filtering Examples

### Filter by Multiple Sources (Future Enhancement)

Currently supports single source filtering. For multiple sources, create filter using SDK types:

```python
from airweave import Filter, FieldCondition, MatchAny

# In your agent's task description, specify the filter logic
# The tool currently supports single source_filter parameter
```

### Combine Multiple Filters

For complex filtering needs beyond single source, consider using the Airweave SDK directly within a custom tool or use multiple search calls.

## Best Practices

1. **Start with Basic Search** - Use `AirweaveSearchTool` for most queries
2. **Use Advanced When Needed** - Switch to `AirweaveAdvancedSearchTool` when you need filtering or reranking
3. **Set Appropriate Limits** - Higher limits for comprehensive searches, lower for quick lookups
4. **Enable Reranking for Complex Queries** - Better results for nuanced questions
5. **Use Recency Bias for Time-Sensitive Data** - Great for support tickets, recent transactions
6. **Choose Response Type Based on Need** - "raw" for structured data, "completion" for answers
7. **Filter by Source for Focused Searches** - Reduces noise when you know the source

## Troubleshooting

### "No results found"
- Check that your collection has data synced
- Verify sync jobs completed successfully
- Try broader search terms
- Lower score_threshold if using advanced search

### "Unable to generate an answer"
- Ensure you have relevant data in your collection
- Try response_type="raw" to see what results are available
- Rephrase your query to be more specific

### Import errors
```bash
pip install --upgrade airweave-sdk crewai-tools
```

## Learn More

- [Airweave Documentation](https://docs.airweave.ai)
- [API Reference](https://docs.airweave.ai/api-reference)
- [Python SDK](https://github.com/airweave-ai/python-sdk)
- [Get API Key](https://app.airweave.ai)
- [CrewAI Documentation](https://docs.crewai.com)

## Support

- Discord: [Join Airweave Community](https://discord.gg/airweave)
- Email: support@airweave.ai
- GitHub Issues: [airweave-ai/python-sdk](https://github.com/airweave-ai/python-sdk/issues)


Loading