Merged
391 changes: 201 additions & 190 deletions LICENSE

Large diffs are not rendered by default.

56 changes: 19 additions & 37 deletions README.md
@@ -1,12 +1,15 @@
# TABStack AI Python SDK
# Tabstack Python SDK

[![PyPI version](https://badge.fury.io/py/tabstack.svg)](https://badge.fury.io/py/tabstack)
[![Python Versions](https://img.shields.io/pypi/pyversions/tabstack.svg)](https://pypi.org/project/tabstack/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Tests](https://github.com/Mozilla-Ocho/tabstack-python/workflows/Tests/badge.svg)](https://github.com/Mozilla-Ocho/tabstack-python/actions)
[![codecov](https://codecov.io/gh/Mozilla-Ocho/tabstack-python/branch/main/graph/badge.svg)](https://codecov.io/gh/Mozilla-Ocho/tabstack-python)

Python SDK for [TABStack AI](https://tabstack.ai) - Extract, Generate, and Automate web content using AI.
> [!WARNING]
> **Early Release**: This SDK is in early development. The API may change in future releases as we refine and improve the library based on user feedback.

Python SDK for [Tabstack](https://tabstack.ai) - Extract, Generate, and Automate web content using AI.

## Features

@@ -58,11 +61,11 @@ pip install -e ".[dev]"
```python
import asyncio
import os
from tabstack import TABStack
from tabstack import Tabstack

async def main():
# Initialize the client with connection pooling
async with TABStack(
async with Tabstack(
api_key=os.getenv('TABSTACK_API_KEY'),
max_connections=100,
max_keepalive_connections=20
@@ -124,7 +127,7 @@ async def main():
)

# Automate web tasks (streaming)
async for event in tabs.automate.execute(
async for event in tabs.agent.automate(
task="Find the top 3 trending repositories and extract their details",
url="https://github.com/trending"
):
@@ -144,9 +147,9 @@ All methods are async and should be awaited. The client supports async context m
### Client Initialization

```python
from tabstack import TABStack
from tabstack import Tabstack

async with TABStack(
async with Tabstack(
api_key="your-api-key",
base_url="https://api.tabstack.ai/", # optional
max_connections=100, # optional
@@ -159,7 +162,7 @@ async with TABStack(
```

**Parameters:**
- `api_key` (str, required): Your TABStack API key
- `api_key` (str, required): Your Tabstack API key
- `base_url` (str, optional): API base URL. Default: `https://api.tabstack.ai/`
- `max_connections` (int, optional): Maximum concurrent connections. Default: `100`
- `max_keepalive_connections` (int, optional): Maximum idle connections to keep alive. Default: `20`
@@ -191,27 +194,6 @@ print(result.content)
print(result.metadata.title)
```

#### `extract.schema(url, instructions, nocache=False)`

Generate a JSON Schema by analyzing the structure of a webpage.

**Parameters:**
- `url` (str): URL to analyze
- `instructions` (str): Instructions for what data to extract (max 1000 characters)
- `nocache` (bool): Bypass cache. Default: `False`

**Returns:** `SchemaResponse` with generated `schema` dict

**Example:**
```python
result = await tabs.extract.schema(
url="https://example.com/products",
instructions="Extract product listings with name, price, and availability"
)
# Use the schema for extraction
data = await tabs.extract.json(url="https://example.com/products", schema=result.schema)
```

#### `extract.json(url, schema, nocache=False)`

Extract structured JSON data from a URL using a schema.
@@ -269,11 +251,11 @@ result = await tabs.generate.json(
)
```

### Automate Operator
### Agent Client

The Automate operator executes complex web automation tasks using natural language.
The Agent client executes complex web automation tasks using natural language.

#### `automate.execute(task, url=None, schema=None)`
#### `agent.automate(task, url=None, schema=None)`

Execute an AI-powered browser automation task (returns async iterator for Server-Sent Events).

@@ -305,7 +287,7 @@ schema = {
}
}

async for event in tabs.automate.execute(
async for event in tabs.agent.automate(
task="Find trending repositories and extract their names and star counts",
url="https://github.com/trending",
schema=schema
@@ -318,7 +300,7 @@ async for event in tabs.automate.execute(

## Working with JSON Schemas

TABStack uses standard JSON Schema for defining data structures. Here are common patterns:
Tabstack uses standard JSON Schema for defining data structures. Here are common patterns:

### Basic Object
```python
@@ -400,7 +382,7 @@ The SDK provides specific exception classes for different error scenarios:

```python
import asyncio
from tabstack import TABStack
from tabstack import Tabstack
from tabstack.exceptions import (
BadRequestError,
UnauthorizedError,
@@ -410,7 +392,7 @@ from tabstack.exceptions import (
)

async def main():
async with TABStack(api_key="your-api-key") as tabs:
async with Tabstack(api_key="your-api-key") as tabs:
try:
result = await tabs.extract.markdown(url="https://example.com")
except UnauthorizedError:
@@ -476,7 +458,7 @@ mypy tabstack/
```
tests/
├── conftest.py # Shared pytest fixtures
├── test_client.py # TABStack client tests
├── test_client.py # Tabstack client tests
├── test_extract.py # Extract operator tests
├── test_generate.py # Generate operator tests
├── test_automate.py # Automate operator tests
41 changes: 10 additions & 31 deletions examples/basic_usage.py
@@ -1,15 +1,15 @@
"""Example usage of TABStack AI SDK."""
"""Example usage of Tabstack SDK."""

import asyncio
import os

from tabstack import TABStack
from tabstack import Tabstack


async def main():
"""Run all examples."""
# Initialize the client with connection pooling
async with TABStack(
async with Tabstack(
api_key=os.getenv("TABSTACK_API_KEY", "your-api-key-here"),
max_connections=50,
max_keepalive_connections=10,
@@ -29,27 +29,8 @@ async def main():

print("\n")

# Example 2: Generate schema from URL
print("Example 2: Generate Schema")
print("-" * 50)
try:
result = await tabs.extract.schema(
url="https://news.ycombinator.com",
instructions="extract top stories with title, points, and author",
)
# result.schema is a JSON Schema dict that can be used directly
print(f"Generated schema: {result.schema}")
# You can now use this schema directly with extract.json()
# data = await tabs.extract.json(
# url="https://news.ycombinator.com", schema=result.schema
# )
except Exception as e:
print(f"Error: {e}")

print("\n")

# Example 3: Extract structured JSON data
print("Example 3: Extract Structured JSON")
# Example 2: Extract structured JSON data
print("Example 2: Extract Structured JSON")
print("-" * 50)
try:
schema = {
@@ -76,8 +57,8 @@ async def main():

print("\n")

# Example 4: Generate transformed content with AI
print("Example 4: Generate Transformed Content")
# Example 3: Generate transformed content with AI
print("Example 3: Generate Transformed Content")
print("-" * 50)
try:
summary_schema = {
@@ -109,15 +90,13 @@ async def main():

print("\n")

# Example 5: Automate web tasks (streaming)
print("Example 5: Web Automation (Streaming)")
# Example 4: Automate web tasks (streaming)
print("Example 4: Web Automation (Streaming)")
print("-" * 50)
try:
async for event in tabs.automate.execute(
async for event in tabs.agent.automate(
task="Find the top 3 trending repositories and extract their details",
url="https://github.com/trending",
guardrails="browse and extract only, don't interact with repositories",
max_iterations=20,
):
if event.type == "task:completed":
print(f"✓ Task completed: {event.data.get('finalAnswer', 'N/A')}")
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -5,12 +5,12 @@ build-backend = "setuptools.build_meta"
[project]
name = "tabstack"
version = "1.0.0"
description = "Python SDK for TABStack AI - Extract, Generate, and Automate web content"
description = "Python SDK for Tabstack - Extract, Generate, and Automate web content"
readme = "README.md"
requires-python = ">=3.10"
license = {text = "Apache-2.0"}
authors = [
{name = "TABStack", email = "support@tabstack.ai"}
{name = "Tabstack", email = "support@tabstack.ai"}
]
keywords = ["web-scraping", "ai", "automation", "data-extraction", "web-automation"]
classifiers = [
4 changes: 2 additions & 2 deletions setup.cfg
@@ -1,11 +1,11 @@
[metadata]
name = tabstack-ai
version = 1.0.0
description = Python SDK for TABStack AI - Extract, Generate, and Automate web content
description = Python SDK for Tabstack - Extract, Generate, and Automate web content
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/Mozilla-Ocho/tabstack-python
author = TABStack
author = Tabstack
author_email = support@tabstack.ai
license = MIT
classifiers =
6 changes: 3 additions & 3 deletions setup.py
@@ -7,14 +7,14 @@
with open("README.md", encoding="utf-8") as f:
long_description = f.read()
except FileNotFoundError:
long_description = "Python SDK for TABStack AI"
long_description = "Python SDK for Tabstack"

setup(
name="tabstack-ai",
version="1.0.0",
author="TABStack",
author="Tabstack",
author_email="support@tabstack.ai",
description="Python SDK for TABStack AI - Extract, Generate, and Automate web content",
description="Python SDK for Tabstack - Extract, Generate, and Automate web content",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/Mozilla-Ocho/tabstack-python",
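
The identifier renames in this change set (`TABStack` to `Tabstack`, `automate.execute` to `agent.automate`) could be applied to downstream code mechanically. The following is a rough sketch, not a tool shipped with the SDK. `RENAMES` and `migrate_source` are hypothetical names, and the patterns cover only those two renames. The removed `extract.schema` method and the `guardrails` / `max_iterations` arguments dropped from the example call would still need manual attention.

```python
import re

# Old-to-new spellings taken from this diff; patterns are deliberately narrow.
RENAMES = [
    (re.compile(r"\bTABStack\b"), "Tabstack"),
    (re.compile(r"\.automate\.execute\("), ".agent.automate("),
]


def migrate_source(source: str) -> str:
    """Apply each rename, in order, to a string of Python source code."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


old_import = "from tabstack import TABStack"
old_loop = "async for event in tabs.automate.execute(task='x'):\n    pass\n"

print(migrate_source(old_import))
print(migrate_source(old_loop), end="")
```

A plain text substitution like this also rewrites matches inside comments and string literals, so the result is worth reviewing before committing.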