A Python tool that discovers endpoints from historical Wayback Machine snapshots and identifies endpoints that no longer exist on the current website.
- Queries Wayback Machine for historical snapshots (from 2019 to 3 months ago by default)
- Filters snapshots with minimum days spacing (10 days by default) for better time distribution
- Extracts endpoints using multiple detection methods:
  - JavaScript file analysis (fetch, axios, XMLHttpRequest patterns)
  - HTML link/URL extraction (matching API patterns)
  - Network request pattern analysis
- Focuses on security-relevant endpoints (auth, admin, API, etc.)
- Compares historical endpoints with current website
- Identifies deprecated endpoints that existed in the past but don't exist now
- Ranks results by frequency and recency
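As a rough illustration of the JavaScript analysis step, the sketch below pulls string-literal URLs out of fetch, axios, and XMLHttpRequest calls and flags security-relevant ones. The regular expressions, the keyword list, and the function names are assumptions made for this example; the patterns the tool actually uses are not shown in this README.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only, not the tool's real regexes.
JS_CALL_PATTERNS = [
    # fetch("/api/users")
    re.compile(r"""fetch\(\s*['"]([^'"]+)['"]"""),
    # axios.get("/api/users"), axios.post("/auth/login", data), ...
    re.compile(r"""axios\.(?:get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]"""),
    # xhr.open("GET", "/admin/stats")
    re.compile(r"""\.open\(\s*['"][A-Z]+['"]\s*,\s*['"]([^'"]+)['"]"""),
]

# Assumed keyword list for "security-relevant" endpoints.
SECURITY_KEYWORDS = ("api", "auth", "admin", "login", "token")


def extract_js_endpoints(js_source: str) -> set[str]:
    """Return every URL string literal passed to a recognised request call."""
    found: set[str] = set()
    for pattern in JS_CALL_PATTERNS:
        found.update(pattern.findall(js_source))
    return found


def is_security_relevant(endpoint: str) -> bool:
    """True if the endpoint contains any of the assumed keywords."""
    lowered = endpoint.lower()
    return any(keyword in lowered for keyword in SECURITY_KEYWORDS)
```

Regex matching like this only sees URLs written as plain string literals; endpoints assembled at runtime from variables or template strings would be missed.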
- Install Python dependencies:
pip install -r requirements.txt
Basic usage:
python waybackSearch.py https://example.com
With custom options:
python waybackSearch.py https://example.com --from-year 2020 --months 6 --min-days 15 --limit 100
- url: Target website URL to analyze (required)
- --from-year: Start year for snapshot search (default: 2019)
- --months: Minimum months back from current date to search (default: 3)
- --min-days: Minimum days between snapshots to analyze (default: 10)
- --limit: Maximum number of snapshots to analyze (default: 50)
- --delay: Delay in seconds between requests (default: 0.5)
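For reference, the options and defaults listed here could be declared with argparse roughly as follows. This is a sketch based only on the documented flags; the function name build_parser and the help strings are assumptions, and the real script may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented CLI options with their documented defaults."""
    parser = argparse.ArgumentParser(
        prog="waybackSearch.py",
        description="Find endpoints that appear in Wayback Machine snapshots "
                    "but no longer exist on the current site.",
    )
    parser.add_argument("url", help="Target website URL to analyze")
    parser.add_argument("--from-year", type=int, default=2019,
                        help="Start year for snapshot search")
    parser.add_argument("--months", type=int, default=3,
                        help="Minimum months back from the current date")
    parser.add_argument("--min-days", type=int, default=10,
                        help="Minimum days between analyzed snapshots")
    parser.add_argument("--limit", type=int, default=50,
                        help="Maximum number of snapshots to analyze")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay in seconds between requests")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["https://example.com", "--from-year", "2020", "--min-days", "15"]
)
print(args.url, args.from_year, args.months, args.min_days, args.limit, args.delay)
```

argparse converts dashes in flag names to underscores, so --from-year is read as args.from_year and --min-days as args.min_days.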
The tool outputs:
- Total number of historical endpoints found
- Number of current endpoints
- List of deprecated endpoints with:
  - Endpoint URL
  - Frequency (how many snapshots contained it)
  - First and last seen dates
  - List of snapshot dates
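The per-endpoint fields above can be derived from a mapping of snapshot date to the endpoints found in that snapshot. The sketch below is one way to do it, assuming ISO dates (YYYY-MM-DD) and a sort order of frequency first, then most recent last-seen date. The README only says results are ranked "by frequency and recency", so the exact ordering, the EndpointRecord class, and find_deprecated are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EndpointRecord:
    """One deprecated endpoint and the snapshots it appeared in."""
    url: str
    snapshot_dates: list[str] = field(default_factory=list)  # ISO dates, sorted

    @property
    def frequency(self) -> int:
        return len(self.snapshot_dates)

    @property
    def first_seen(self) -> str:
        return min(self.snapshot_dates)

    @property
    def last_seen(self) -> str:
        return max(self.snapshot_dates)


def find_deprecated(historical: dict[str, set[str]],
                    current: set[str]) -> list[EndpointRecord]:
    """historical maps a snapshot date to the endpoints seen in that snapshot."""
    dates_by_endpoint: dict[str, list[str]] = {}
    for date, endpoints in historical.items():
        for endpoint in endpoints:
            if endpoint not in current:  # keep only endpoints missing today
                dates_by_endpoint.setdefault(endpoint, []).append(date)

    records = [EndpointRecord(url, sorted(dates))
               for url, dates in dates_by_endpoint.items()]
    # Assumed ranking: most frequent first, then most recently seen.
    # The first sort by URL makes ties deterministic (Python sorts are stable).
    records.sort(key=lambda r: r.url)
    records.sort(key=lambda r: (r.frequency, r.last_seen), reverse=True)
    return records
```

ISO date strings sort chronologically as plain text, which is why min and max work here without parsing them into date objects.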
$ python waybackSearch.py https://api.example.com
Starting analysis for: https://api.example.com
Searching snapshots from 2019 to 3 months ago...
Minimum days between snapshots: 10
[Step 1/3] Analyzing current website...
Fetching current website content...
Extracting endpoints from HTML...
Fetching JavaScript files...
Found 15 current endpoints
[Step 2/3] Fetching historical snapshots from Wayback Machine...
Found 42 snapshots (after filtering by minimum days spacing)
Analyzing 42 snapshots...
[Step 3/3] Extracting endpoints from historical snapshots...
[1/42] Processing snapshot from 2024-01-15... Found 8 endpoints
[2/42] Processing snapshot from 2024-02-01... Found 12 endpoints
...
================================================================================
WAYBACK MACHINE ENDPOINT ANALYSIS
================================================================================
Target URL: https://api.example.com
Analysis Statistics:
- Total historical endpoints found: 28
- Current endpoints found: 15
- Deprecated endpoints (no longer exist): 13
- Still active endpoints: 15
================================================================================
DEPRECATED ENDPOINTS (Found in history, not in current site)
================================================================================
[1] https://api.example.com/v1/users/delete
Frequency: Found in 8 snapshot(s)
First seen: 2024-01-15
Last seen: 2024-03-20
Snapshots: 2024-01-15, 2024-01-22, 2024-02-01, 2024-02-10, 2024-02-18
... and 3 more
- Current Analysis: Fetches the current website and extracts all endpoints using multiple detection methods.
- Historical Snapshot Retrieval: Queries the Wayback Machine CDX API for snapshots taken between the start year (default: 2019) and a cutoff a given number of months before the current date (default: 3 months).
- Snapshot Filtering: Keeps only snapshots that are at least a minimum number of days apart (default: 10 days), which spreads them more evenly over time and reduces the number of requests.
- Historical Endpoint Extraction: For each snapshot, extracts endpoints from:
  - HTML content (links, forms, script tags)
  - JavaScript code (fetch calls, axios requests, etc.)
  - Network request patterns
- Comparison: Compares historical endpoints with current endpoints to identify deprecated ones.
- Ranking: Ranks deprecated endpoints by frequency (how often they appeared) and recency.
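The retrieval and filtering steps can be sketched as follows. The query parameters (url, from, to, output, fl, filter) are part of the public CDX API, and CDX timestamps use the YYYYMMDDhhmmss format. The function names, the choice of fields, the status-code filter, and the 30-day approximation of a month are assumptions for this example and may not match what the tool does.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(target_url: str, from_year: int = 2019,
                    months_back: int = 3,
                    today: datetime | None = None) -> str:
    """Build a CDX query URL covering from_year up to months_back months ago."""
    today = today or datetime.now()
    # Assumption: a month is approximated as 30 days.
    cutoff = today - timedelta(days=30 * months_back)
    params = {
        "url": target_url,
        "from": str(from_year),
        "to": cutoff.strftime("%Y%m%d"),
        "output": "json",
        "fl": "timestamp,original",   # only return these two fields
        "filter": "statuscode:200",   # assumed: skip redirects and errors
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_by_spacing(timestamps: list[str], min_days: int = 10) -> list[str]:
    """Keep timestamps that are at least min_days after the last kept one."""
    kept: list[str] = []
    last_kept: datetime | None = None
    for ts in sorted(timestamps):
        taken = datetime.strptime(ts, "%Y%m%d%H%M%S")
        if last_kept is None or taken - last_kept >= timedelta(days=min_days):
            kept.append(ts)
            last_kept = taken
    return kept
```

The spacing filter is greedy: it always keeps the earliest snapshot and then the next one that falls at least min_days later, so the result depends on where the series starts.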
- This tool uses the Wayback Machine CDX API and archived content
- Please respect rate limits and use responsibly
- The tool includes a default 0.5 second delay between requests
- Adjust with the --delay flag if you encounter rate limiting
- Review the Wayback Machine Terms of Service
- Default delay: 0.5 seconds between requests
- If you encounter 429 (Too Many Requests) errors, increase the delay
- Example: --delay 1.0 for a 1-second delay
- The tool automatically handles rate limit responses and retries
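The README does not describe how retries are implemented, so the following is only one plausible shape for the rate limit handling: wait, double the wait after each 429 response, and give up after a few attempts. The function name, the doubling strategy, the retry count, and the fetch callable (anything that returns a status code and body for a URL) are all assumptions for illustration.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(fetch: Callable[[str], Tuple[int, str]],
                       url: str,
                       delay: float = 0.5,
                       max_retries: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch(url); on HTTP 429, wait and retry with a doubled wait time.

    Returns the response body on success, or None if the request failed
    with another status or was still rate limited after max_retries retries.
    """
    wait = delay
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status == 200:
            sleep(delay)  # polite pause before the caller's next request
            return body
        if status != 429:
            return None   # not a rate limit problem, so retrying will not help
        if attempt < max_retries:
            sleep(wait)
            wait *= 2     # assumed strategy: exponential backoff
    return None
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting; in normal use the default time.sleep applies.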
- Only use on websites you have permission to test
- Respect website owners' privacy
- Use findings responsibly and ethically
- Consider responsible disclosure if you find security issues
- Wayback Machine may not have snapshots for all websites
- Some endpoints may be dynamically generated and not visible in static snapshots
- Rate limiting may affect the number of snapshots that can be analyzed
- JavaScript-heavy sites may require additional analysis methods
This tool is provided as-is for security research and educational purposes.