diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..4efefc311 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,38 @@ +# Contributing to Agent Browser + +We love your input! We want to make contributing as easy and transparent as possible. + +## Pull Requests + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. Ensure the test suite passes. +4. Issue that pull request! + +## Development Setup + +```bash +git clone https://github.com/dextonai/agent-browser.git +cd agent-browser +npm install +npm run dev +``` + +## Code Style + +- Use TypeScript for all new code +- Follow existing patterns in the codebase +- Write meaningful commit messages +- Add comments for complex logic + +## Report Bugs + +Report bugs by opening a new issue. Include: +- A clear description +- Steps to reproduce +- Expected vs actual behavior +- Screenshots if applicable + +## License + +By contributing, you agree that your contributions will be licensed under the MIT License. diff --git a/README.md b/README.md index 8ede376c9..d0490a660 100644 --- a/README.md +++ b/README.md @@ -1,1322 +1,36 @@ -# agent-browser +# Agent Browser -Browser automation CLI for AI agents. Fast native Rust CLI. +[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) +[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md) +[![GitHub Stars](https://img.shields.io/github/stars/dextonai/agent-browser?style=social)](https://github.com/dextonai/agent-browser) -## Installation +A browser automation agent powered by AI. -### Global Installation (recommended) +## Features -Installs the native Rust binary: - -```bash -npm install -g agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### Project Installation (local dependency) - -For projects that want to pin the version in `package.json`: - -```bash -npm install agent-browser -agent-browser install -``` - -Then use via `package.json` scripts or by invoking `agent-browser` directly. - -### Homebrew (macOS) - -```bash -brew install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### Cargo (Rust) - -```bash -cargo install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### From Source - -```bash -git clone https://github.com/vercel-labs/agent-browser -cd agent-browser -pnpm install -pnpm build -pnpm build:native # Requires Rust (https://rustup.rs) -pnpm link --global # Makes agent-browser available globally -agent-browser install -``` - -### Linux Dependencies - -On Linux, install system dependencies: - -```bash -agent-browser install --with-deps -``` - -### Updating - -Upgrade to the latest version: - -```bash -agent-browser upgrade -``` - -Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically. - -### Requirements - -- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon. -- **Rust** - Only needed when building from source (see From Source above). +- 🤖 AI-powered browser automation +- 🌐 Web interaction and scraping +- 📊 Data extraction and analysis +- 🔒 Secure and sandboxed ## Quick Start ```bash -agent-browser open example.com -agent-browser snapshot # Get accessibility tree with refs -agent-browser click @e2 # Click by ref from snapshot -agent-browser fill @e3 "test@example.com" # Fill by ref -agent-browser get text @e1 # Get text by ref -agent-browser screenshot page.png -agent-browser close -``` - -### Traditional Selectors (also supported) - -```bash -agent-browser click "#submit" -agent-browser fill "#email" "test@example.com" -agent-browser find role button click --name "Submit" -``` - -## Commands - -### Core Commands - -```bash -agent-browser open # Navigate to URL (aliases: goto, navigate) -agent-browser click # Click element (--new-tab to open in new tab) -agent-browser dblclick # Double-click element -agent-browser focus # Focus element -agent-browser type # Type into element -agent-browser fill # Clear and fill -agent-browser press # Press key (Enter, Tab, Control+a) (alias: key) -agent-browser keyboard type # Type with real keystrokes (no selector, current focus) -agent-browser keyboard inserttext # Insert text without key events (no selector) -agent-browser keydown # Hold key down -agent-browser keyup # Release key -agent-browser hover # Hover element -agent-browser select # Select dropdown option -agent-browser check # Check checkbox -agent-browser uncheck # Uncheck checkbox -agent-browser scroll [px] # Scroll (up/down/left/right, --selector ) -agent-browser scrollintoview # Scroll element into view (alias: scrollinto) -agent-browser drag # Drag and drop -agent-browser upload # Upload files -agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path) -agent-browser screenshot --annotate # Annotated screenshot with numbered element labels -agent-browser screenshot --screenshot-dir ./shots # Save to custom directory -agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80 -agent-browser pdf # Save as PDF -agent-browser snapshot # Accessibility tree with refs (best for AI) -agent-browser eval # Run JavaScript (-b for base64, --stdin for piped input) -agent-browser connect # Connect to browser via CDP -agent-browser stream enable [--port ] # Start runtime WebSocket streaming -agent-browser stream status # Show runtime streaming state and bound port -agent-browser stream disable # Stop runtime WebSocket streaming -agent-browser close # Close browser (aliases: quit, exit) -agent-browser close --all # Close all active sessions -``` - -### Get Info - -```bash -agent-browser get text # Get text content -agent-browser get html # Get innerHTML -agent-browser get value # Get input value -agent-browser get attr # Get attribute -agent-browser get title # Get page title -agent-browser get url # Get current URL -agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging) -agent-browser get count # Count matching elements -agent-browser get box # Get bounding box -agent-browser get styles # Get computed styles -``` - -### Check State - -```bash -agent-browser is visible # Check if visible -agent-browser is enabled # Check if enabled -agent-browser is checked # Check if checked -``` - -### Find Elements (Semantic Locators) - -```bash -agent-browser find role [value] # By ARIA role -agent-browser find text # By text content -agent-browser find label