FINRA BrokerCheck Scraper automates collection of broker, investment advisor, and firm profile data so compliance and research teams can move faster with fewer manual checks. It turns complex regulatory profiles into clean, analysis-ready records with direct links to profile pages and reports.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project collects structured professional and regulatory information from the FINRA BrokerCheck database for individuals and firms. It solves the problem of time-consuming manual lookups by converting search results and profile details into flattened, searchable records. It’s built for compliance officers, financial institutions, legal teams, and analysts who need reliable due diligence data at scale.
- Supports both individual (broker/advisor) and firm searches using names or identifier numbers
- Captures employment history, registration scope, exams, and disclosure flags in a single record
- Includes direct URLs to detail pages and downloadable PDF reports when available
- Provides optional inclusion of previous (inactive) registrations for full-history reviews
- Exports dataset-friendly results for audits, monitoring, and downstream analytics
| Feature | Description |
|---|---|
| Individual & firm search modes | Switch between broker/advisor lookups and firm/company searches using one input. |
| Identifier-based matching | Search by name or CRD/SEC identifiers to reduce ambiguity and improve precision. |
| Previous registration inclusion | Optionally include inactive profiles to support historical due diligence. |
| Flattened, analysis-ready output | Produces normalized fields suitable for spreadsheets, BI tools, and databases. |
| Employment history summarization | Captures current/previous employment counts and firm name summaries for quick review. |
| Licensing & exams extraction | Collects exam names, categories, and taken dates for credential verification. |
| Disclosure detection | Extracts disclosure flags and counts to surface risk signals quickly. |
| Direct report linking | Outputs profile URLs and PDF report links for fast verification and evidence trails. |
| Configurable limits | Control maximum unique results to keep runs predictable and cost-aware. |
| Field Name | Field Description |
|---|---|
| brokerId | Unique internal identifier for the broker/advisor profile result. |
| firstName | First name of the broker/advisor. |
| lastName | Last name of the broker/advisor. |
| middleName | Middle name (if present). |
| fullName | Full formatted name used for display and reporting. |
| crdNumber | CRD identifier for the broker/advisor or firm (when applicable). |
| otherNames | Alternate names/aliases associated with the profile. |
| bcScope | BrokerCheck scope/status indicator (e.g., active/inactive context). |
| iaScope | Investment advisor scope indicator (in-scope/out-of-scope context). |
| industryDays | Days of industry experience captured from the profile. |
| currentEmploymentsCount | Count of current employments/affiliations. |
| previousEmploymentsCount | Count of previous employments/affiliations. |
| totalEmploymentsCount | Total employments/affiliations count. |
| currentFirmName | Current firm name (if actively affiliated). |
| previousFirmNames | Previous firm name(s) associated with past registrations. |
| previousFirmCrds | CRD(s) for prior affiliated firms when available. |
| previousFirmCities | City information for prior affiliated firms when available. |
| previousFirmStates | State information for prior affiliated firms when available. |
| previousRegistrationBeginDates | Begin date(s) for previous registrations. |
| previousRegistrationEndDates | End date(s) for previous registrations. |
| stateExamCount | Count of state exams recorded. |
| principalExamCount | Count of principal exams recorded. |
| productExamCount | Count of product exams recorded. |
| totalExamCount | Total number of exams recorded. |
| examNames | Semicolon-delimited list of exam names. |
| examCategories | Semicolon-delimited list of exam categories (e.g., SIE, Series 6). |
| examTakenDates | Semicolon-delimited list of exam taken dates. |
| examScopes | Semicolon-delimited scope markers per exam entry. |
| approvedSroRegistrationCount | Count of approved SRO registrations. |
| approvedFinraRegistrationCount | Count of approved FINRA registrations. |
| approvedStateRegistrationCount | Count of approved state registrations. |
| totalRegistrationCount | Total registrations count. |
| disclosureFlag | Indicates whether disclosures exist on the profile. |
| iaDisclosureFlag | Indicates whether IA disclosures exist on the profile. |
| hasDisclosures | Boolean form of disclosure presence. |
| disclosuresCount | Number of disclosures found (if available). |
| hasBcComments | Indicates BrokerCheck comments presence (when available). |
| hasIaComments | Indicates IA comments presence (when available). |
| legacyReportStatusDescription | Status description for report availability/requests. |
| scrapedTimestamp | ISO timestamp when the record was collected. |
| detailPageUrl | Direct URL to the profile detail page. |
| pdfReportUrl | Direct URL to the PDF report when available. |
[
{
"brokerId": "4876562",
"firstName": "ROBERT",
"lastName": "SMITH",
"middleName": "EUGENE",
"fullName": "ROBERT EUGENE SMITH",
"crdNumber": "4876562",
"otherNames": "BOB SMITH; Smitty Smith",
"bcScope": "InActive",
"iaScope": "NotInScope",
"industryDays": 606,
"currentEmploymentsCount": 0,
"previousEmploymentsCount": 1,
"totalEmploymentsCount": 1,
"currentFirmName": null,
"previousFirmNames": "NYLIFE SECURITIES LLC",
"previousFirmCrds": "5167",
"previousFirmCities": "NEW BRAUNFELS",
"previousFirmStates": "TX",
"previousRegistrationBeginDates": "6/5/2017",
"previousRegistrationEndDates": "1/31/2019",
"stateExamCount": 0,
"principalExamCount": 0,
"productExamCount": 2,
"totalExamCount": 2,
"examNames": "Securities Industry Essentials Examination; Investment Company Products/Variable Contracts Representative Examination",
"examCategories": "SIE; Series 6",
"examTakenDates": "10/1/2018; 6/5/2017",
"examScopes": "BC; BC",
"approvedSroRegistrationCount": 0,
"approvedFinraRegistrationCount": 0,
"approvedStateRegistrationCount": 0,
"totalRegistrationCount": 0,
"disclosureFlag": "N",
"iaDisclosureFlag": "N",
"hasDisclosures": false,
"disclosuresCount": 0,
"hasBcComments": "N",
"hasIaComments": "N",
"legacyReportStatusDescription": "Not Requested",
"scrapedTimestamp": "2025-10-09T15:24:00.753Z",
"detailPageUrl": "https://brokercheck.finra.org/individual/summary/4876562",
"pdfReportUrl": "https://files.brokercheck.finra.org/individual/individual_4876562.pdf"
}
]
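The exam fields in the sample above are semicolon-delimited strings, so downstream code usually needs to split them back into one row per exam. The following is a minimal sketch, assuming the four exam lists are positionally aligned (the sample suggests this, but it is not stated explicitly) and that dates use the M/D/YYYY form shown. The helper names `split_field`, `to_iso_date`, and `expand_exams` are illustrative and not part of the project.

```python
from datetime import datetime


def split_field(value):
    """Split a semicolon-delimited string into trimmed parts; None or '' gives []."""
    if not value:
        return []
    return [part.strip() for part in value.split(";")]


def to_iso_date(us_date):
    """Convert an M/D/YYYY string such as '6/5/2017' to '2017-06-05'."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()


def expand_exams(record):
    """Return one dict per exam, assuming the four lists line up by position."""
    names = split_field(record.get("examNames"))
    categories = split_field(record.get("examCategories"))
    dates = split_field(record.get("examTakenDates"))
    scopes = split_field(record.get("examScopes"))
    # zip() stops at the shortest list, so a misaligned record yields fewer rows
    # instead of raising an error.
    return [
        {"name": n, "category": c, "takenDate": to_iso_date(d), "scope": s}
        for n, c, d, s in zip(names, categories, dates, scopes)
    ]


# Values copied from the sample record above.
record = {
    "examNames": "Securities Industry Essentials Examination; "
                 "Investment Company Products/Variable Contracts Representative Examination",
    "examCategories": "SIE; Series 6",
    "examTakenDates": "10/1/2018; 6/5/2017",
    "examScopes": "BC; BC",
}
exams = expand_exams(record)
for exam in exams:
    print(exam["category"], exam["takenDate"], exam["scope"])
```

Converting dates to ISO form at this step makes the rows sortable as plain strings, which the original M/D/YYYY values are not.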
FINRA BrokerCheck Scraper/
├── src/
│ ├── main.py
│ ├── runner.py
│ ├── pipelines/
│ │ ├── search_pipeline.py
│ │ ├── individual_pipeline.py
│ │ └── firm_pipeline.py
│ ├── clients/
│ │ ├── http_client.py
│ │ ├── brokercheck_client.py
│ │ └── rate_limiter.py
│ ├── parsers/
│ │ ├── search_results_parser.py
│ │ ├── individual_profile_parser.py
│ │ ├── firm_profile_parser.py
│ │ └── normalize.py
│ ├── models/
│ │ ├── input_schema.py
│ │ ├── broker_record.py
│ │ └── firm_record.py
│ ├── exporters/
│ │ ├── dataset_writer.py
│ │ └── field_mapping.py
│ ├── utils/
│ │ ├── dates.py
│ │ ├── strings.py
│ │ ├── validation.py
│ │ └── logging.py
│ └── config/
│ ├── settings.py
│ └── settings.example.json
├── tests/
│ ├── test_normalize.py
│ ├── test_parsers.py
│ └── test_validation.py
├── data/
│ ├── input.example.json
│ └── sample_output.json
├── scripts/
│ ├── local_run.sh
│ └── export_dataset.py
├── .gitignore
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md
- Compliance teams use it to monitor broker/advisor status and disclosures, so they can reduce regulatory risk and keep records audit-ready.
- Financial institutions use it to verify credentials during hiring or onboarding, so they can make faster, safer personnel decisions.
- Investment firms use it to run due diligence on counterparties and affiliates, so they can avoid reputational exposure and strengthen governance.
- Legal professionals use it to collect profile evidence and report links, so they can support investigations and litigation with traceable sources.
- Data analysts use it to build structured datasets of industry professionals, so they can analyze trends, coverage, and compliance signals at scale.
Q: What search types are supported, and how do I choose?
A: Use searchType: "individual" when you’re searching for brokers or investment advisors by name or CRD number. Use searchType: "firm" when you’re searching for organizations by firm name or identifier number. The output schema is optimized per mode while keeping shared compliance fields consistent.
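As a rough illustration of choosing a mode, the sketch below builds and validates an input object. The keys `searchType`, `maxItems`, and `includePrevious` come from this FAQ. The `query` key, the default of 50 items, and the `build_input` helper are assumptions made for the example, so check `data/input.example.json` for the real field names.

```python
ALLOWED_SEARCH_TYPES = {"individual", "firm"}


def build_input(search_type, query, max_items=50, include_previous=False):
    """Return a validated input dict; 'query' is a hypothetical key name."""
    if search_type not in ALLOWED_SEARCH_TYPES:
        raise ValueError(f"searchType must be one of {sorted(ALLOWED_SEARCH_TYPES)}")
    if not query or not query.strip():
        raise ValueError("query must be a non-empty name or identifier")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    return {
        "searchType": search_type,
        "query": query.strip(),  # hypothetical field, not confirmed by the project
        "maxItems": max_items,
        "includePrevious": include_previous,
    }


individual_input = build_input("individual", "4876562", max_items=10)
firm_input = build_input("firm", "NYLIFE SECURITIES LLC", include_previous=True)
print(individual_input)
print(firm_input)
```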
Q: How do I limit the number of results and avoid duplicates?
A: Set maxItems to cap the number of unique results collected per run. The pipeline de-duplicates by stable identifiers (such as CRD/profile IDs) so repeated appearances across search pages won’t inflate your dataset.
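The de-duplication and cap described above can be sketched as follows. This is an illustration rather than the project's actual pipeline code: it assumes `crdNumber` (falling back to `brokerId`) is the stable identifier, and `collect_unique` is a hypothetical helper name.

```python
def collect_unique(pages, max_items):
    """Collect up to max_items records, keeping the first occurrence of each ID."""
    seen = set()
    unique = []
    for page in pages:
        for record in page:
            # Assumed stable key: CRD number, falling back to the profile ID.
            key = record.get("crdNumber") or record.get("brokerId")
            if key is None or key in seen:
                # Records without an identifier are skipped in this sketch.
                continue
            seen.add(key)
            unique.append(record)
            if len(unique) >= max_items:
                return unique
    return unique


# Two simulated search pages where CRD "2" appears on both.
pages = [
    [{"crdNumber": "1"}, {"crdNumber": "2"}],
    [{"crdNumber": "2"}, {"crdNumber": "3"}, {"crdNumber": "4"}],
]
result = collect_unique(pages, max_items=3)
print([r["crdNumber"] for r in result])
```

Because the cap counts unique records rather than raw search hits, a repeated profile does not use up part of the `maxItems` budget.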
Q: Can I include inactive or previously registered professionals?
A: Yes. Enable includePrevious: true to include previously registered profiles. This is useful for historical due diligence, offboarding reviews, and compliance investigations that require full registration timelines.
Q: What if some fields are missing (like current firm name or PDF link)?
A: Some profiles may not have certain attributes available (e.g., no current affiliation, no disclosures, or report links not present). The scraper outputs null or empty values consistently so your downstream workflows don’t break and you can filter reliably.
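Since missing attributes arrive as null or empty values, downstream filters can test for them directly. The sketch below uses field names from the output schema; the helpers `missing_fields` and `with_disclosures` and the second record are made up for illustration.

```python
def missing_fields(record, fields=("currentFirmName", "pdfReportUrl")):
    """List the named fields that are absent, null, or an empty string."""
    return [name for name in fields if record.get(name) in (None, "")]


def with_disclosures(records):
    """Keep only records whose hasDisclosures flag is explicitly True."""
    return [r for r in records if r.get("hasDisclosures") is True]


records = [
    {   # Values taken from the sample record in this README.
        "crdNumber": "4876562",
        "currentFirmName": None,
        "hasDisclosures": False,
        "pdfReportUrl": "https://files.brokercheck.finra.org/individual/individual_4876562.pdf",
    },
    {   # Invented record, used only to exercise the filters.
        "crdNumber": "0000001",
        "currentFirmName": "EXAMPLE FIRM",
        "hasDisclosures": True,
        "pdfReportUrl": "",
    },
]

gaps = {r["crdNumber"]: missing_fields(r) for r in records}
flagged = with_disclosures(records)
print(gaps)
print([r["crdNumber"] for r in flagged])
```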
Primary Metric: Averages 1.2–2.0 seconds per profile record on typical runs, depending on whether profile-level enrichment is enabled and result set size.
Reliability Metric: 98%+ successful record yield under stable network conditions, with automatic retries and conservative request pacing to reduce transient failures.
Efficiency Metric: Sustains ~30–60 records/minute in mixed search workloads (individual + firm) while keeping memory usage under 300MB for most runs.
Quality Metric: 95%+ field completeness for core identity, registration scope, exams, and disclosure flags; employment summaries and report links populate when present on the source profile.
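The automatic retries mentioned under the reliability metric could look roughly like the sketch below, which uses exponential backoff with a little random jitter. The attempt count, delays, and the `fetch_with_retries` name are assumptions for the example, not the project's actual settings, and pacing between successful requests is not shown. The `sleep` function is injectable so the demo runs without waiting.

```python
import random
import time


def fetch_with_retries(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            # Jitter spreads out retries so they do not land in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Demo: a fake fetch that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"status": "ok"}


recorded_sleeps = []  # Capture delays instead of actually sleeping.
result = fetch_with_retries(flaky_fetch, sleep=recorded_sleeps.append)
print(result, len(recorded_sleeps))
```

Only connection and timeout errors are retried here; other exceptions propagate immediately, which keeps genuine bugs from being hidden behind retries.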
