5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: BLSloadR
Type: Package
Title: Download Time Series Data from the U.S. Bureau of Labor Statistics
Version: 0.5.1
Version: 0.5.2
Authors@R: c(
person(
given = "Nevada Department of Employment, Training, and Rehabilitation",
@@ -48,7 +48,8 @@ Suggests:
rmarkdown,
R.utils,
testthat (>= 3.0.0),
tidyr
tidyr,
usethis
VignetteBuilder: knitr
URL: https://schmidtdetr.github.io/BLSloadR/
Config/Needs/website: rmarkdown
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,3 +1,6 @@
# BLSloadR 0.5.2 patch notes

This patch includes a critical fix for rate limit issues when downloading data from the BLS. It adds a `BLS_USER_AGENT` environment variable, whose value is sent as the User-Agent header on file download requests to the BLS. Users who encounter a 403 error on most requests should set this environment variable to restore reliable downloads. Additional documentation and warning messages will be implemented in a future patch.

# BLSloadR 0.5.1 patch notes

27 changes: 23 additions & 4 deletions R/download_helpers.R
@@ -1,12 +1,31 @@
#' Get standard BLS HTTP headers
#'
#' Generate headers for BLS requests
#'
#' Returns a named character vector of HTTP headers required for BLS API requests.
#' These headers mimic a standard browser to ensure compatibility with BLS servers.
#'
#'
#' @param host The host to use in the Host header (default: "download.bls.gov")
#' @return A named character vector of HTTP headers
#' @keywords internal
get_bls_headers <- function(host = "download.bls.gov") {
  # 1. Check for a user-supplied environment variable first.
  # This lets users set their email/identity via .Renviron or Sys.setenv().
  ua <- Sys.getenv("BLS_USER_AGENT")

  # 2. If the variable is unset or empty, fall back to one of several plausible User-Agent strings
  if (ua == "") {
    plausible_agents <- c(
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
      "Mozilla/5.0 (R; BLSloadR Package)"
    )
    # Select one at random on each call
    ua <- sample(plausible_agents, 1)
  }

  # 3. Build the header vector using the selected User-Agent
  c(
    "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding" = "gzip, deflate, br",
@@ -22,7 +41,7 @@ get_bls_headers <- function(host = "download.bls.gov") {
"Sec-Fetch-Site" = "same-origin",
"Sec-Fetch-User" = "?1",
"Upgrade-Insecure-Requests" = "1",
"User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
"User-Agent" = ua
)
}

12 changes: 12 additions & 0 deletions README.md
@@ -24,6 +24,18 @@ The primary functions in this package all begin with get_ and are listed below:
-`get_qcew()` - This accesses data from the Quarterly Census of Employment and Wages (QCEW). This is a very large data set, so access is filtered by area or industry. This function iteratively requests single-quarter files via the BLS QCEW Data Slices tool at https://www.bls.gov/cew/additional-resources/open-data/csv-data-slices.htm. This function was included beginning in version 0.3.1.

-`get_cps_subset()` - This accesses data from the National Current Population Survey (CPS) which determines the national unemployment rate. Several demographic details are available here which are not available at the state or local levels. This is the "LN" database. This function was introduced in BLSloadR version 0.5.

# Configuring Your User Profile
BLSloadR will typically work by default without any customization. However, some options may improve your experience. These options are managed with *environment variables* in your R session:

-`BLS_USER_AGENT` - Setting this environment variable to a character string, such as your e-mail address, sends that string to the BLS as the User-Agent HTTP header when downloading data. If your downloads fail, this may help the BLS identify you as an individual user.

-`USE_BLS_CACHE` - Setting this environment variable to "TRUE" enables a local file cache of your BLS downloads. With the cache enabled, supported functions download new files only when the underlying data has changed.

-`BLS_CACHE_DIR` - If you want to use the file cache, you may wish to specify a location. Setting this environment variable will specify a different path for the file cache than the default.

To set these environment variables permanently, edit your .Renviron file (for example with `usethis::edit_r_environ()`). To set them for a single session only, use `Sys.setenv()`, for example `Sys.setenv(USE_BLS_CACHE="TRUE")`.
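
As a minimal sketch of the single-session approach, the snippet below sets all three variables with base R and then confirms they are visible to the session. The e-mail address and the cache path are placeholders chosen for illustration, not values the package requires.

```r
# Placeholder values for illustration only; substitute your own.
Sys.setenv(
  BLS_USER_AGENT = "jane.doe@example.com",
  USE_BLS_CACHE  = "TRUE",
  BLS_CACHE_DIR  = file.path(tempdir(), "bls_cache")
)

# Sys.getenv() returns "" for an unset variable, so nzchar()
# reports whether each variable currently has a value.
vars <- c("BLS_USER_AGENT", "USE_BLS_CACHE", "BLS_CACHE_DIR")
is_set <- nzchar(Sys.getenv(vars))
names(is_set) <- vars
print(is_set)
```

Values set this way last only until the R session ends. For a permanent setup, the same names would go in .Renviron as `NAME=value` lines, one per line.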

# Enhanced CES Filtering for Performance

The `get_ces()` and `get_national_ces()` functions now include powerful filtering options that significantly improve performance:
2 changes: 1 addition & 1 deletion man/get_bls_headers.Rd