spidra-ruby

Official Ruby SDK for the Spidra web scraping and crawling API. Scrape pages, run browser actions, batch-process URLs, and crawl entire sites — all from Ruby, with no external dependencies.

Installation

gem install spidra

Or add it to your Gemfile:

gem "spidra"

Requires Ruby 2.7 or higher.

Quick start

require "spidra"

client = Spidra.new(ENV["SPIDRA_API_KEY"])

job = client.scrape.run(
  { urls: [{ url: "https://example.com/pricing" }],
    prompt: "Extract all pricing plans with name, price, and features",
    output: "json" }
)

puts job["content"]

Get your API key from app.spidra.io under Settings → API Keys.

Scraping

scrape.run

Submit a job and wait for it to finish. Returns the full result.

job = client.scrape.run(
  urls:   [{ url: "https://example.com" }],
  prompt: "Extract the main headline and subheading"
)

puts job["content"]

Pass poll_interval: and timeout: as keyword arguments to control how long it waits:

job = client.scrape.run(
  { urls: [{ url: "https://example.com" }], prompt: "..." },
  poll_interval: 5,
  timeout: 60
)

On timeout, run returns { "status" => "timeout", "jobId" => "..." } so you can keep polling with scrape.get.
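
For example, a minimal resume sketch (treating "failed" as the other terminal scrape status is an assumption here):

job = client.scrape.run(
  { urls: [{ url: "https://example.com" }], prompt: "Extract the headline" },
  timeout: 30
)

if job["status"] == "timeout"
  job_id = job["jobId"]
  # "completed" is documented; "failed" as the other terminal state is assumed.
  until %w[completed failed].include?(job["status"])
    sleep 5 # arbitrary polling interval
    job = client.scrape.get(job_id)
  end
end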

scrape.submit and scrape.get

Fire and forget — submit a job and check status yourself.

response = client.scrape.submit(
  urls:   [{ url: "https://example.com" }],
  prompt: "Extract the main headline"
)
job_id = response["jobId"]

# Later...
status = client.scrape.get(job_id)
puts status["content"] if status["status"] == "completed"

Scrape parameters

| Parameter | Type | Description |
| --- | --- | --- |
| urls | Array | Up to 3 entries. Each is { url: "..." } with an optional actions: array (see Browser actions below) |
| prompt | String | What to extract, in plain English |
| output | String | "markdown" (default) or "json" |
| schema | Hash | JSON Schema to enforce a specific output shape |
| use_proxy | Boolean | Route through a residential proxy |
| proxy_country | String | Two-letter country code, e.g. "us", "de", "jp" |
| extract_content_only | Boolean | Strip nav, ads, and boilerplate before extraction |
| screenshot | Boolean | Capture a viewport screenshot |
| full_page_screenshot | Boolean | Capture a full-page screenshot |
| cookies | String | Raw Cookie header for authenticated pages |
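
As a sketch of schema in practice, this asks for a fixed JSON shape; the plans/name/price fields are illustrative, not a required layout:

job = client.scrape.run(
  urls:   [{ url: "https://example.com/pricing" }],
  prompt: "Extract all pricing plans",
  output: "json",
  schema: {
    type: "object",
    properties: {
      plans: {
        type:  "array",
        items: {
          type: "object",
          properties: {
            name:  { type: "string" },
            price: { type: "string" }
          }
        }
      }
    }
  }
)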

Browser actions

Pass an actions: array inside a URL entry to interact with the page before extraction runs.

job = client.scrape.run(
  urls: [
    {
      url:     "https://example.com/products",
      actions: [
        { type: "click",  selector: "#accept-cookies" },
        { type: "wait",   duration: 1000 },
        { type: "scroll", to: "80%" }
      ]
    }
  ],
  prompt: "Extract all product names and prices"
)

Batch scraping

Submit up to 50 URLs in one request. They all run in parallel.

batch = client.batch.run(
  { urls: [
      "https://shop.example.com/product/1",
      "https://shop.example.com/product/2",
      "https://shop.example.com/product/3"
    ],
    prompt: "Extract product name, price, and stock status",
    output: "json" }
)

puts "#{batch["completedCount"]}/#{batch["totalUrls"]} completed"

batch["items"].each do |item|
  if item["status"] == "completed"
    puts item["result"].inspect
  else
    puts "Failed: #{item["url"]}#{item["error"]}"
  end
end

batch.submit and batch.get

response = client.batch.submit(
  urls:   ["https://example.com/1", "https://example.com/2"],
  prompt: "Extract the page title"
)
batch_id = response["batchId"]

result = client.batch.get(batch_id)
puts "#{result["completedCount"]}/#{result["totalUrls"]} done"

Retry failed items

if result["failedCount"] > 0
  client.batch.retry(batch_id)
end

Cancel a batch

client.batch.cancel(batch_id)

List past batches

page = client.batch.list(1, 20) # page, limit

page["jobs"].each do |job|
  puts "#{job["uuid"]} #{job["status"]}#{job["completedCount"]}/#{job["totalUrls"]}"
end

Crawling

job = client.crawl.run(
  { base_url:               "https://competitor.com/blog",
    crawl_instruction:      "Follow blog post links only — skip tag and category pages",
    transform_instruction:  "Extract post title, author, publish date, and a one-sentence summary",
    max_pages:              30,
    use_proxy:              true }
)

job["result"].each do |page|
  puts "#{page["url"]}: #{page["data"].inspect}"
end

Crawl jobs often take a few minutes. The default timeout for crawl.run is 300 seconds. Adjust with timeout: n if you expect longer runs.
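
For example, to give a large crawl 15 minutes:

job = client.crawl.run(
  { base_url:              "https://example.com/docs",
    crawl_instruction:     "Follow all documentation pages",
    transform_instruction: "Extract the page title",
    max_pages:             100 },
  timeout: 900 # 15 minutes
)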

crawl.submit and crawl.get

response = client.crawl.submit(
  base_url:              "https://example.com/docs",
  crawl_instruction:     "Follow all documentation pages",
  transform_instruction: "Extract the page title and a short content summary",
  max_pages:             50
)
job_id = response["jobId"]

status = client.crawl.get(job_id)
# status["status"]: "waiting" | "active" | "running" | "completed" | "failed"

Downloading raw content

result = client.crawl.pages(job_id)

result["pages"].each do |page|
  puts page["url"]
  # page["html_url"]     — download the raw HTML (expires in 1 hour)
  # page["markdown_url"] — download the Markdown version
end
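
Because the SDK itself has no external dependencies, Ruby's standard library is enough to fetch those links. A minimal sketch, assuming the URLs are plain GET-able links:

require "net/http"
require "uri"

result["pages"].each_with_index do |page, i|
  # The links expire after an hour, so download promptly.
  html = Net::HTTP.get(URI(page["html_url"]))
  File.write("crawl-page-#{i}.html", html)
end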

Re-extracting with a new prompt

result = client.crawl.extract(completed_job_id, "Extract product SKUs and prices as JSON")
new_job_id = result["jobId"]

extracted = client.crawl.get(new_job_id)

History and stats

history = client.crawl.history(1, 10)
puts "#{history["total"]} total crawl jobs"

stats = client.crawl.stats
puts "#{stats["total"]} all-time"

Logs

result = client.logs.list(
  status:     "failed",
  searchTerm: "amazon.com",
  dateStart:  "2024-01-01",
  dateEnd:    "2024-12-31",
  page:       1,
  limit:      20
)

result["logs"].each do |log|
  puts "#{log["urls"][0]["url"]}#{log["status"]} (#{log["credits_used"]} credits)"
end

# Full detail for a single log entry
log = client.logs.get(log_uuid)
puts log["result_data"].inspect

Usage statistics

rows = client.usage.get("30d") # "7d" | "30d" | "weekly"

rows.each do |row|
  puts "#{row["date"]}: #{row["requests"]} requests, #{row["credits"]} credits"
end

Error handling

require "spidra"

begin
  job = client.scrape.run(
    urls:   [{ url: "https://example.com" }],
    prompt: "Extract the headline"
  )
rescue Spidra::AuthenticationError
  puts "Invalid or missing API key"
rescue Spidra::InsufficientCreditsError
  puts "Account is out of credits"
rescue Spidra::RateLimitError
  puts "Rate limited — slow down"
rescue Spidra::ServerError => e
  puts "Server error (#{e.status}): #{e.message}"
rescue Spidra::Error => e
  puts "API error #{e.status}: #{e.message}"
end

| Exception | HTTP status | When |
| --- | --- | --- |
| Spidra::AuthenticationError | 401 | Missing or invalid API key |
| Spidra::InsufficientCreditsError | 403 | No credits remaining |
| Spidra::RateLimitError | 429 | Too many requests |
| Spidra::ServerError | 5xx | Unexpected server-side error |
| Spidra::Error | any | Base class for all Spidra exceptions |

All exceptions expose .status (HTTP status code) and .message.
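
With that, a coarse retry wrapper is straightforward to sketch; the attempt cap and backoff values here are arbitrary:

attempts = 0
begin
  job = client.scrape.run(
    urls:   [{ url: "https://example.com" }],
    prompt: "Extract the headline"
  )
rescue Spidra::RateLimitError, Spidra::ServerError => e
  attempts += 1
  raise if attempts > 3
  warn "Retrying after #{e.class} (status #{e.status})"
  sleep 2**attempts # exponential backoff: 2, 4, 8 seconds
  retry
end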

License

MIT. See LICENSE for details.
