RoieMarciano/wayback-endpoint-discovery-tool

Wayback Machine Endpoint Discovery Tool

A Python tool that discovers endpoints from historical Wayback Machine snapshots and identifies endpoints that no longer exist on the current website.

Features

  • Queries the Wayback Machine for historical snapshots (by default, from 2019 up to 3 months before the current date)
  • Keeps only snapshots that are at least a minimum number of days apart (10 by default), so the sample is spread evenly over time
  • Extracts endpoints using multiple detection methods:
    • JavaScript file analysis (fetch, axios, XMLHttpRequest patterns)
    • HTML link/URL extraction (matching API patterns)
    • Network request pattern analysis
  • Focuses on security-relevant endpoints (auth, admin, API, etc.)
  • Compares historical endpoints against those found on the current website
  • Identifies deprecated endpoints: those that appeared in past snapshots but are absent from the current site
  • Ranks results by frequency and recency
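
As a rough illustration of the JavaScript detection method listed above, the following sketch pulls URL literals out of fetch, axios, and XMLHttpRequest calls with regular expressions. The patterns, the keyword list, and the function names are assumptions made for this example; the tool's actual regexes are not shown in this README.

```python
import re

# Hypothetical patterns for illustration only; they handle quoted string
# literals and will miss template literals or URLs built at runtime.
JS_CALL_PATTERNS = [
    # fetch("/api/users") or fetch('https://host/api')
    re.compile(r"""fetch\(\s*["']([^"']+)["']"""),
    # axios.get("/api/x"), axios.post('/api/y'), and similar verbs
    re.compile(r"""axios\.(?:get|post|put|patch|delete)\(\s*["']([^"']+)["']"""),
    # xhr.open("GET", "/api/z")
    re.compile(r"""\.open\(\s*["'][A-Za-z]+["']\s*,\s*["']([^"']+)["']"""),
]

# Assumed keyword list standing in for "security-relevant" (auth, admin, API, ...).
SECURITY_KEYWORDS = ("api", "auth", "admin", "login", "token")


def extract_js_endpoints(js_source):
    """Return the set of URL literals passed to fetch/axios/XHR calls."""
    found = set()
    for pattern in JS_CALL_PATTERNS:
        found.update(pattern.findall(js_source))
    return found


def is_security_relevant(endpoint):
    """Return True if the endpoint contains any of the assumed keywords."""
    lowered = endpoint.lower()
    return any(keyword in lowered for keyword in SECURITY_KEYWORDS)
```

A simple substring match like this over-matches (for example, "api" inside an unrelated word), so a real filter would likely need stricter path-segment checks.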

Installation

  1. Install Python dependencies:
pip install -r requirements.txt

Usage

Basic usage:

python waybackSearch.py https://example.com

With custom options:

python waybackSearch.py https://example.com --from-year 2020 --months 6 --min-days 15 --limit 100

Arguments

  • url: Target website URL to analyze (required)
  • --from-year: Start year for snapshot search (default: 2019)
  • --months: How many months before the current date the search window ends (default: 3)
  • --min-days: Minimum days between snapshots to analyze (default: 10)
  • --limit: Maximum number of snapshots to analyze (default: 50)
  • --delay: Delay in seconds between requests (default: 0.5)
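
For reference, the arguments above could be declared with argparse roughly as follows. This is a sketch based only on the flag names and defaults listed here; the help strings and the build_parser name are assumptions, and the real script may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Discover deprecated endpoints from Wayback Machine snapshots"
    )
    parser.add_argument("url", help="Target website URL to analyze")
    parser.add_argument("--from-year", type=int, default=2019,
                        help="Start year for snapshot search")
    parser.add_argument("--months", type=int, default=3,
                        help="Months before today at which the search window ends")
    parser.add_argument("--min-days", type=int, default=10,
                        help="Minimum days between analyzed snapshots")
    parser.add_argument("--limit", type=int, default=50,
                        help="Maximum number of snapshots to analyze")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay in seconds between requests")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts --from-year to the attribute name from_year
    print(args.url, args.from_year, args.months, args.min_days, args.limit, args.delay)
```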

Output

The tool outputs:

  • Total number of historical endpoints found
  • Number of current endpoints
  • List of deprecated endpoints with:
    • Endpoint URL
    • Frequency (how many snapshots contained it)
    • First and last seen dates
    • List of snapshot dates
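
The per-endpoint fields listed above can be derived from a mapping of snapshot dates to the endpoints seen in each snapshot. The sketch below assumes that data shape and a simple sort order (most frequent first, ties broken by most recent last-seen date); the tool's actual data structures and ranking formula are not documented here, and find_deprecated is a hypothetical name.

```python
from collections import defaultdict


def find_deprecated(historical, current):
    """Build ranked records for endpoints seen historically but not currently.

    historical: dict mapping 'YYYY-MM-DD' date strings to sets of endpoints
    current:    set of endpoints found on the live site
    """
    seen_on = defaultdict(list)
    # ISO dates sort chronologically as plain strings
    for date in sorted(historical):
        for endpoint in historical[date]:
            if endpoint not in current:
                seen_on[endpoint].append(date)

    records = [
        {
            "endpoint": endpoint,
            "frequency": len(dates),
            "first_seen": dates[0],
            "last_seen": dates[-1],
            "snapshots": dates,
        }
        for endpoint, dates in seen_on.items()
    ]
    # Assumed ranking: frequency first, then recency of the last sighting
    records.sort(key=lambda r: (r["frequency"], r["last_seen"]), reverse=True)
    return records
```

Ranking by frequency first favors endpoints that were stable for a long time; a tool more interested in recently removed endpoints might swap the two sort keys.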

Example

$ python waybackSearch.py https://api.example.com

Starting analysis for: https://api.example.com
Searching snapshots from 2019 to 3 months ago...
Minimum days between snapshots: 10

[Step 1/3] Analyzing current website...
Fetching current website content...
Extracting endpoints from HTML...
Fetching JavaScript files...
Found 15 current endpoints

[Step 2/3] Fetching historical snapshots from Wayback Machine...
Found 42 snapshots (after filtering by minimum days spacing)
Analyzing 42 snapshots...

[Step 3/3] Extracting endpoints from historical snapshots...
  [1/42] Processing snapshot from 2024-01-15... Found 8 endpoints
  [2/42] Processing snapshot from 2024-02-01... Found 12 endpoints
  ...

================================================================================
WAYBACK MACHINE ENDPOINT ANALYSIS
================================================================================

Target URL: https://api.example.com

Analysis Statistics:
  - Total historical endpoints found: 28
  - Current endpoints found: 15
  - Deprecated endpoints (no longer exist): 13
  - Still active endpoints: 15

================================================================================
DEPRECATED ENDPOINTS (Found in history, not in current site)
================================================================================

[1] https://api.example.com/v1/users/delete
    Frequency: Found in 8 snapshot(s)
    First seen: 2024-01-15
    Last seen: 2024-03-20
    Snapshots: 2024-01-15, 2024-01-22, 2024-02-01, 2024-02-10, 2024-02-18
              ... and 3 more

How It Works

  1. Current Analysis: Fetches the current website and extracts all endpoints using multiple detection methods.

  2. Historical Snapshot Retrieval: Queries the Wayback Machine CDX API for snapshots captured between the start year (default: 2019) and a cutoff a given number of months before the current date (default: 3).

  3. Snapshot Filtering: Keeps only snapshots that are at least a minimum number of days apart (default: 10), which spreads the sample over time and reduces the number of requests.

  4. Historical Endpoint Extraction: For each snapshot, extracts endpoints from:
    • HTML content (links, forms, script tags)
    • JavaScript code (fetch calls, axios requests, etc.)
    • Network request patterns

  5. Comparison: Compares historical endpoints with current endpoints to identify deprecated ones.

  6. Ranking: Ranks deprecated endpoints by frequency (how often they appeared) and recency.
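
Steps 2 and 3 above can be sketched as a CDX query URL builder plus a greedy spacing filter. The function names, the choice of CDX fields, the status-code filter, and the 30-days-per-month approximation are all assumptions for illustration; the tool's actual query and filtering logic may differ.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target, from_year=2019, months_back=3, now=None):
    """Build a CDX API query URL for the given search window."""
    now = now or datetime.now()
    # Assumption: approximate a month as 30 days
    end = now - timedelta(days=30 * months_back)
    params = {
        "url": target,
        "from": str(from_year),
        "to": end.strftime("%Y%m%d"),
        "output": "json",
        "fl": "timestamp,original",   # only return the fields needed here
        "filter": "statuscode:200",   # assumed: skip redirects and errors
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_by_spacing(timestamps, min_days=10, limit=50):
    """Greedily keep 14-digit CDX timestamps at least min_days apart."""
    kept = []
    last = None
    for ts in sorted(timestamps):
        when = datetime.strptime(ts, "%Y%m%d%H%M%S")
        if last is None or when - last >= timedelta(days=min_days):
            kept.append(ts)
            last = when
        if len(kept) >= limit:
            break  # assumed: stop at the first `limit` snapshots, oldest first
    return kept
```

The greedy pass keeps the earliest snapshot and then each later one that clears the gap, so the result depends on where the first snapshot falls.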

Important Notes

Responsible Use

  • This tool relies on the Wayback Machine CDX API and archived content
  • Respect rate limits and use the service responsibly
  • The tool waits 0.5 seconds between requests by default
  • Increase the wait with the --delay flag if you encounter rate limiting
  • Review the Wayback Machine Terms of Service before use

Rate Limiting

  • Default delay: 0.5 seconds between requests
  • If you encounter 429 (Too Many Requests) errors, increase the delay
  • Example: --delay 1.0 for 1 second delay
  • The tool automatically handles rate limit responses and retries
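
One common way to handle 429 responses is to retry with an increasing wait. The sketch below assumes exponential backoff starting from the configured delay, and takes the HTTP call as a parameter so it stays independent of any particular client library. The README does not say how the tool's own retry logic works, so fetch_with_retry and its parameters are hypothetical.

```python
import time


def fetch_with_retry(fetch, url, delay=0.5, max_retries=3, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429, doubling the wait each time.

    fetch: any callable that returns a (status_code, body) tuple
    sleep: injectable for testing; defaults to time.sleep
    """
    wait = delay
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(wait)   # back off before the next attempt
            wait *= 2     # assumed policy: exponential backoff
    # Retries exhausted: return the last 429 response to the caller
    return status, body
```

With the default delay of 0.5, three retries wait 0.5, 1.0, and 2.0 seconds in turn before giving up.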

Ethical Use

  • Only use on websites you have permission to test
  • Respect website owners' privacy
  • Use findings responsibly and ethically
  • Consider responsible disclosure if you find security issues

Limitations

  • The Wayback Machine may not have snapshots for every website
  • Some endpoints may be dynamically generated and not visible in static snapshots
  • Rate limiting may affect the number of snapshots that can be analyzed
  • JavaScript-heavy sites may require additional analysis methods

License

This tool is provided as-is for security research and educational purposes.
