Title: PMID Batch Filtering Returns 400 Error Despite Documentation Claims
Description
The OpenAlex API returns a 400 "Invalid" error when attempting to use batch filtering with PMIDs, despite multiple sources in the documentation and community discussions indicating this should work. This affects the ability to efficiently retrieve works by PMID in bulk.
Expected Behavior
According to the following sources (OpenAlex/OurResearch communications and the openalexR package documentation), PMID batch filtering should work with pipe-separated values:
- [OurResearch Blog Post (Dec 21, 2022)](https://blog.ourresearch.org/fetch-multiple-dois-in-one-openalex-api-request/) states:
  "This technique works with all IDs in OpenAlex, to include OpenAlex IDs and PubMed Central IDs (PMID)."
- [Google Groups Discussion](https://groups.google.com/g/openalex-users/c/6xvPsguNM6A), where OpenAlex developer Casey states:
  "This is implemented and available! You can now filter works by MAG, PMID, or PMCID."
- [OpenAlex Community Forum](https://groups.google.com/g/openalex-community/c/5foVRPybEYM) shows this example:
  `https://api.openalex.org/works?filter=ids.pmid:38785209|38773515`
- [openalexR Package Documentation](https://docs.ropensci.org/openalexR/) shows this working example:
  ```r
  works_from_pmids <- oa_fetch(
    entity = "works",
    pmid = c("14907713", 32572199),
    verbose = TRUE
  )
  #> Requesting url: https://api.openalex.org/works?filter=pmid:14907713|32572199
  ```
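The documented pattern above can be sketched as a small URL builder. This is a minimal sketch, assuming the `filter=<field>:<id1>|<id2>` syntax quoted in these sources and a cap of 50 IDs per request (the batch size this report uses elsewhere); `build_pmid_filter_url` and `MAX_IDS_PER_FILTER` are hypothetical names, not part of any OpenAlex client.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
MAX_IDS_PER_FILTER = 50  # assumption: the per-request cap this report uses


def build_pmid_filter_url(pmids, field="pmid", mailto=None):
    """Build a works URL that ORs several PMIDs together with '|'."""
    values = [str(p) for p in pmids]
    if not values:
        raise ValueError("at least one PMID is required")
    if len(values) > MAX_IDS_PER_FILTER:
        raise ValueError(f"at most {MAX_IDS_PER_FILTER} PMIDs per request")
    params = {
        "filter": f"{field}:" + "|".join(values),
        # One page large enough to hold every requested work.
        "per-page": str(len(values)),
    }
    if mailto:
        params["mailto"] = mailto
    # urlencode percent-encodes ':' as %3A, '|' as %7C and '@' as %40.
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


print(build_pmid_filter_url(["14907713", 32572199]))
# https://api.openalex.org/works?filter=pmid%3A14907713%7C32572199&per-page=2
```

Per the results below, both raw and percent-encoded pipes fail, so the encoding chosen here is not expected to change the outcome; the `per-page` value matters only once the filter is accepted, since the default page may be smaller than a full batch.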
Actual Behavior
All attempts to use PMID batch filtering return a 400 error with the message "Invalid", regardless of:
- Filter syntax used (`pmid:` vs `ids.pmid:`)
- Number of PMIDs (fails even with a single PMID)
- URL encoding of the pipe character
- Presence of the `mailto` parameter
- Addition of the `per-page` parameter
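Because every failing variant returns the same body, a test script can flag this exact failure shape separately from other errors. A minimal sketch, assuming only the response body reported in this issue; `is_generic_invalid_error` is a hypothetical helper.

```python
def is_generic_invalid_error(status_code, body):
    """True for the bare 400 'Invalid' response reported in this issue."""
    return (
        status_code == 400
        and isinstance(body, dict)
        and body.get("error") is True
        and body.get("message") == "Invalid"
    )


reported = {"HTTP_status_code": 400, "error": True, "message": "Invalid"}
print(is_generic_invalid_error(400, reported))  # True
```

With `requests`, it would be called as `is_generic_invalid_error(response.status_code, response.json())`.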
Steps to Reproduce
```python
import requests

# Test 1: Single PMID with pmid: filter
response = requests.get(
    "https://api.openalex.org/works?filter=pmid:14907713&mailto=test@example.com"
)
print(f"Single PMID (pmid:): {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 2: Single PMID with ids.pmid: filter
response = requests.get(
    "https://api.openalex.org/works?filter=ids.pmid:14907713&mailto=test@example.com"
)
print(f"Single PMID (ids.pmid:): {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 3: Multiple PMIDs with pipe separator
response = requests.get(
    "https://api.openalex.org/works?filter=pmid:14907713|32572199&mailto=test@example.com"
)
print(f"Multiple PMIDs: {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 4: Direct lookup WORKS FINE
response = requests.get("https://api.openalex.org/works/pmid:14907713")
print(f"Direct lookup: {response.status_code}")
print(f"Found work: {response.json()['id']}")
# Output: 200, Found work: https://openalex.org/W1775749144
```
Comprehensive Testing Performed
We tested the following combinations:
- Filter field variations:
  - `filter=pmid:` ❌
  - `filter=ids.pmid:` ❌
  - `filter=openalex:` (with OpenAlex IDs) ❌
  - `filter=ids.openalex:` ❌
- PMID formats:
  - Short form: `14907713` ❌
  - Full URL form: `https://pubmed.ncbi.nlm.nih.gov/14907713` ❌
- Separator variations:
  - Pipe separator: `pmid:123|456|789` ❌
  - URL-encoded pipe: `pmid:123%7C456%7C789` ❌
  - Comma separator: `pmid:123,456,789` ❌
- Request variations:
  - With `mailto` parameter ✓ (still fails)
  - With `per-page=100` parameter ✓ (still fails)
  - Different batch sizes (1, 3, 20, 50 PMIDs) ❌
- Known-good PMIDs tested:
  - PMIDs from documentation examples: 14907713, 32572199
  - PMIDs from our dataset: 20468064, 25456007, 17885603
  - All are valid (confirmed via direct lookup)
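The PMID field, format, and separator combinations above can be enumerated so that every variant is retried the same way. A minimal sketch, assuming the values listed above (the `openalex:` variants and batch-size sweep are omitted because they use different inputs); `build_test_matrix` is a hypothetical helper, and the query is assembled by hand so each separator appears literally. As far as we can tell, `requests` normalizes URLs before sending and percent-encodes a bare `|` as `%7C`, so from Python the "pipe" and "encoded pipe" rows may reach the server identically.

```python
from itertools import product

WORKS_URL = "https://api.openalex.org/works"
FIELDS = ["pmid", "ids.pmid"]
SEPARATORS = {"pipe": "|", "encoded pipe": "%7C", "comma": ","}
PMID_FORMATS = {"short": "{}", "full URL": "https://pubmed.ncbi.nlm.nih.gov/{}"}


def build_test_matrix(pmids, mailto="test@example.com"):
    """Return one dict per (field, separator, PMID format) combination."""
    cases = []
    for field, sep_name, fmt_name in product(FIELDS, SEPARATORS, PMID_FORMATS):
        values = [PMID_FORMATS[fmt_name].format(p) for p in pmids]
        # Built by hand (no urlencode) so separators appear exactly as listed.
        query = f"filter={field}:{SEPARATORS[sep_name].join(values)}&mailto={mailto}"
        cases.append({"field": field, "separator": sep_name,
                      "format": fmt_name, "url": f"{WORKS_URL}?{query}"})
    return cases


if __name__ == "__main__":
    for case in build_test_matrix(["14907713", "32572199"]):
        print(case["field"], case["separator"], case["format"], case["url"])
```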
Additional Context
- DOI batch filtering works correctly as documented
- Direct PMID lookup works (e.g., `/works/pmid:14907713`)
- The failure is confined to the `filter` parameter: PMID filters fail (as did the `openalex:`/`ids.openalex:` variants listed above), while DOI filters succeed
- This forces users to make N individual API calls instead of roughly N/50 batch calls
- The error message "Invalid" is not descriptive enough to debug the issue
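Until batch filtering works, a client can try the documented batch filter first and fall back to the per-PMID lookup that does work (`/works/pmid:<PMID>`). A minimal sketch using only the standard library, assuming a batch size of 50 (this report's figure) and that a successful filtered response lists works under `results`; `fetch_works_by_pmid` and `http_get_json` are hypothetical helpers, and the `get` parameter exists so the logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

WORKS_URL = "https://api.openalex.org/works"


def http_get_json(url):
    """GET a URL and return (status_code, parsed JSON); error bodies are parsed too."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        try:
            return err.code, json.loads(err.read())
        except ValueError:
            return err.code, {}


def fetch_works_by_pmid(pmids, get=http_get_json, batch_size=50, mailto=None):
    """Try one filtered request per batch; fall back to per-PMID lookups on failure."""
    pmids = [str(p) for p in pmids]
    suffix = "?" + urllib.parse.urlencode({"mailto": mailto}) if mailto else ""
    works = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        params = {"filter": "pmid:" + "|".join(batch), "per-page": str(len(batch))}
        if mailto:
            params["mailto"] = mailto
        status, body = get(f"{WORKS_URL}?{urllib.parse.urlencode(params)}")
        if status == 200:
            works.extend(body.get("results", []))
            continue
        # Batch filter rejected (e.g. the 400 "Invalid" above): one call per PMID.
        for pmid in batch:
            status, body = get(f"{WORKS_URL}/pmid:{pmid}{suffix}")
            if status == 200:  # PMIDs that are not found are skipped silently
                works.append(body)
    return works
```

The fallback keeps working code paths intact today and automatically benefits from batching once the filter is accepted.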
Environment
- API endpoint: https://api.openalex.org/works
- Date tested: June 30, 2025
- No API key used (testing with polite pool via mailto parameter)
- Tested with: Python requests, aiohttp, and direct browser access
- User-Agent: Various (BioQueryous/1.0, Python requests default, Chrome)
Impact
This bug significantly impacts performance for users who need to retrieve multiple works by PMID, forcing them to use individual lookups instead of efficient batch requests. For example, retrieving 1000 PMIDs requires 1000 API calls instead of 20 (at 50 PMIDs per batch request).
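The call counts above follow from ceiling division by the batch size. A short sketch, assuming the 50-PMID batch size used in this report; `api_calls_needed` is a hypothetical helper.

```python
def api_calls_needed(n_pmids, batch_size=50):
    """Return (individual lookups, batched filter requests) for n_pmids PMIDs."""
    individual = n_pmids
    batched = (n_pmids + batch_size - 1) // batch_size  # ceiling division
    return individual, batched


print(api_calls_needed(1000))  # (1000, 20)
```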
Suggested Fix
Any of the following would resolve or mitigate the problem:
- Fix the PMID filter to work as documented
- Update documentation to reflect that PMID batch filtering is not supported
- Provide a more descriptive error message indicating why the filter is invalid
Related Issues
- The same issue likely affects PMCID filtering (mentioned in the same blog post but not tested)
- Possibly related to the filter field deprecation mentioned in docs (host_venue, alternate_host_venues)