From 19efc8f0c8bb46dc25f05c70b70d846c139fa4c2 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 5 Jan 2026 11:56:37 +0000 Subject: [PATCH 01/39] start the 'vignette-first' development workflow for the Asynchronous APIs --- vignettes/sync_async.Rmd | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 vignettes/sync_async.Rmd diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd new file mode 100644 index 0000000..a89fc87 --- /dev/null +++ b/vignettes/sync_async.Rmd @@ -0,0 +1,31 @@ +--- +title: "Synchronous vs Asynchronous APIs" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Synchronous vs Asynchronous APIs} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup} +library(EndpointR) +``` + +# Introduction + +Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response. + +However, most Generative AI providers also offer lower-cost, asynchronous APIs. The providers usually offer a guarantee of the results within a time frame, and an estimate of the average time to return the results. For example, they may guarantee results within 24 hours, but expect them within 1-3 hours. + +# When to choose Synchronous vs Asynchronous + +For us as consumers, the decision is a trade-off between time and money. If we are serving Generative AI to other consumers, e.g. in an application, we will usually favour a Synchronous API because users expect instant results. Alternatively, if we are running analyses over large datasets, or repeated batch-inference, we can usually afford to wait longer for the results so we may favour the Asynchronous APIs. + +Synchronous APIs are also very useful when we are still in the experimental step of an analysis and need quick feedback- i.e. when we're testing our prompts, and developing our [schemas](https://json-schema.org/) for [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), we don't want to wait 24 hours to check whether our schema works. Instead, we want to send a request and receive a response within seconds. That way we can iteratively fix/develop our schemas/prompts and get to a better outcome quicker. From 10449bf71529e43f25351240ca1ee6d00e399146 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 5 Jan 2026 12:36:41 +0000 Subject: [PATCH 02/39] Continue outline of batch workflow in the sync_async vignette --- vignettes/sync_async.Rmd | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd index a89fc87..9e37a9c 100644 --- a/vignettes/sync_async.Rmd +++ b/vignettes/sync_async.Rmd @@ -26,6 +26,34 @@ However, most Generative AI providers also offer lower-cost, asynchronous APIs. # When to choose Synchronous vs Asynchronous -For us as consumers, the decision is a trade-off between time and money. If we are serving Generative AI to other consumers, e.g. in an application, we will usually favour a Synchronous API because users expect instant results. 
Alternatively, if we are running analyses over large datasets, or repeated batch-inference, we can usually afford to wait longer for the results so we may favour the Asynchronous APIs.
+
+> For a more comprehensive treatment and motivating examples, [OpenAI's official documentation/guide](https://platform.openai.com/docs/guides/batch) is a good place to start.
+
+As consumers, the decision represents a trade-off between time and money. If we are serving Generative AI to other consumers, e.g. in an application, we will usually favour a Synchronous API because users expect instant results. Alternatively, if we are running analyses over large datasets, or repeated batch-inference, we can usually afford to wait longer for the results, so we may favour the Asynchronous APIs.
+
+Synchronous APIs are also very useful when we are still in the experimental step of an analysis and need quick feedback: when we're testing our prompts and developing our [schemas](https://json-schema.org/) for [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), we don't want to wait 24 hours to find out we made an error in our schema. Instead, we want to send a request and receive a response within seconds. That way we can iteratively fix and develop our schemas and prompts, and get to a better outcome quicker.
+
+At the time of writing, [OpenAI's batch API](https://platform.openai.com/docs/guides/batch) offers a 50% discount on the Completions API, as well as higher rate limits:
+ +Learn how to use OpenAI's Batch API to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The service is ideal for processing jobs that don't require immediate responses. + +Batch processing jobs are often helpful in use cases like: + +1. Running evaluations +2. Classifying large datasets +3. Embedding content repositories + +
+ +# EndpointR Implementation of OpenAI Batch API + +Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/serve than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. + +\`oai_batch_create + +`oai_prepare_batch()` `oai_prepare_batches()` + +`oai_create_batch()` `oai_create_batches()` + +`oai_check_batch_jobs()` From fe17415662e6fee79e01cbc0075db87f33e5bb7e Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 5 Jan 2026 14:50:37 +0000 Subject: [PATCH 03/39] add the 50k limit explanation and then start on the code for creating a batch of requests and managing them --- vignettes/sync_async.Rmd | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd index 9e37a9c..f7f3057 100644 --- a/vignettes/sync_async.Rmd +++ b/vignettes/sync_async.Rmd @@ -18,6 +18,10 @@ knitr::opts_chunk$set( library(EndpointR) ``` +# Quickstart + +TODO: Code samples when the functions etc. are up and running. + # Introduction Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response. @@ -48,12 +52,13 @@ Batch processing jobs are often helpful in use cases like: # EndpointR Implementation of OpenAI Batch API -Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/serve than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. +Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/server than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. As such, different functions and workflows are needed. -\`oai_batch_create +You will likely want to use the Batch API for both embeddings and completions, so we have a separate function to prepare batches for each one: -`oai_prepare_batch()` `oai_prepare_batches()` +- `oai_batch_prepare_embeddings()` +- `oai_batch_prepare_completions()` -`oai_create_batch()` `oai_create_batches()` +Each function expects a data frame as input: `oai_batch_prepare_embeddings()` will accept the relevant arguments from `oai_embed_df()`, `oai_batch_prepare_completions` will accept the relevant arguments from `oai_complete_df()`. Each row in the input data frame is converted first to a http request via {httr2}, and then to a line in a .jsonl file. The OpenAI Batch API expects a single .jsonl file of up 50,000 rows or 200 MB in size. If we want to perform the operation on a 150,000 row data frame, we need to create and manage 3 separate batches. -`oai_check_batch_jobs()` +> NOTE: For structured outputs the Batch API requires us to send the JSON schema with each request. Complex schemas will quickly lead to large file size, perhaps eclipsing the 200 MB limit. 
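+
+To make the 50,000-request arithmetic concrete, here is a minimal, illustration-only sketch in base R of splitting a data frame into Batch-API-sized chunks. `split_into_batches()` is a hypothetical helper, not part of EndpointR; the package's own `chunk_dataframe()`, which appears in the dev notes, covers the same idea.
+
+```r
+# hypothetical helper, for illustration: split a data frame into chunks
+# that respect the Batch API's 50,000-requests-per-file ceiling
+split_into_batches <- function(df, batch_size = 50000) {
+  n_batches <- ceiling(nrow(df) / batch_size)
+  batch_index <- rep(seq_len(n_batches), each = batch_size, length.out = nrow(df))
+  split(df, batch_index)
+}
+
+# a 150,000-row data frame becomes 3 batches of 50,000 requests each
+batches <- split_into_batches(data.frame(id = 1:150000, text = "hello"))
+vapply(batches, nrow, integer(1))
+#>     1     2     3
+#> 50000 50000 50000
+```
+
+Each chunk can then be prepared, uploaded, and triggered as its own batch job.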
From f739ed2906a949585ae24cce94b359af761eaf20 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 5 Jan 2026 16:52:50 +0000 Subject: [PATCH 04/39] bump vers add vignette to pkgdown (funcs later) start batch api implementation add tests for oai_batch_build_embed_row --- DESCRIPTION | 4 ++-- NEWS.md | 4 ++++ R/oai_batch_api.R | 33 +++++++++++++++++++++++++++++ _pkgdown.yml | 3 +++ tests/testthat/test-oai_batch_api.R | 27 +++++++++++++++++++++++ todos.qmd | 10 +++++++-- 6 files changed, 77 insertions(+), 4 deletions(-) create mode 100644 R/oai_batch_api.R create mode 100644 tests/testthat/test-oai_batch_api.R diff --git a/DESCRIPTION b/DESCRIPTION index 356eafe..e52f29b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,13 +1,13 @@ Package: EndpointR Title: Connects to various Machine Learning inference providers -Version: 0.2 +Version: 0.2.1 Authors@R: person("Jack", "Penzer", , "Jack.penzer@sharecreative.com", role = c("aut", "cre")) Description: EndpointR is a 'batteries included', open-source R package for connecting to various APIs for Machine Learning model predictions. EndpointR is built for company-specific use cases, so may not be useful to a wide audience. License: MIT + file LICENSE Encoding: UTF-8 Roxygen: list(markdown = TRUE) -RoxygenNote: 7.3.2 +RoxygenNote: 7.3.3 Suggests: spelling, broom, diff --git a/NEWS.md b/NEWS.md index 23bd895..6944178 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,7 @@ +# EndpointR 0.2.1 + +- OpenAI Batch API for Embeddings and Completions + # EndpointR 0.2 - error message and status propagation improvement. Now writes .error, .error_msg (standardised across package), and .status. Main change is preventing httr2 eating the errors before we can deal with them diff --git a/R/oai_batch_api.R b/R/oai_batch_api.R new file mode 100644 index 0000000..4bce249 --- /dev/null +++ b/R/oai_batch_api.R @@ -0,0 +1,33 @@ +# the batch + +oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { + + + body <- purrr::compact( + # use compact so that if dimensions is NULL it gets dropped from the req + list( + input = input, + model = model, + dimensions = dimensions, + encoding_format = encoding_format + )) + + embed_row <- list( + custom_id = id, + method = method, + url = endpoint, + body = body + ) + + embed_row_json <- jsonlite::toJSON(embed_row, + auto_unbox = TRUE) + + return(embed_row_json) +} + + + +oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions, key_name = "OPENAI_API_KEY", endpoint_url = "https://api.openai.com/v1/embeddings") { + +} + diff --git a/_pkgdown.yml b/_pkgdown.yml index 99b766b..551ee6f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -48,6 +48,9 @@ navbar: - text: Advanced Topics - text: Improving Performance href: articles/improving_performance.html + - text: Synchronous vs Asynchronous APIs + href: articles/sync_async.html + reference: - title: "Getting Started" diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R new file mode 100644 index 0000000..7c88d53 --- /dev/null +++ b/tests/testthat/test-oai_batch_api.R @@ -0,0 +1,27 @@ +test_that("oai_batch_build_embed_row creates a row of JSON and responds to its input arguments", { + no_dims <- expect_no_error( + oai_batch_build_embed_row( + "hello", + "1234" + ) + ) + + no_dims_str <- jsonlite::fromJSON(no_dims) + + with_dims <- expect_no_error( + oai_batch_build_embed_row( + "hello", + 
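      # ids are supplied as strings: the Batch API rejects non-string custom_id values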
"134", + dimensions = 124 + ) + ) + + with_dims_str <- jsonlite::fromJSON(with_dims) + + expect_equal(with_dims_str$body$dimensions, 124) + expect_setequal(names(no_dims_str), names(with_dims_str)) + + expect_true(no_dims_str$method == "POST") + expect_equal(no_dims_str$url, "/v1/embeddings") + expect_equal(no_dims_str$body$model, "text-embedding-3-small") +}) diff --git a/todos.qmd b/todos.qmd index 65db2ae..d7d710e 100644 --- a/todos.qmd +++ b/todos.qmd @@ -2,13 +2,19 @@ # Versions +## 0.2.1 + +- [ ] OpenAI Batch API + - [ ] Embeddings + - [ ] Completions + ## 0.2 - [ ] Support for Anthropic API - [ ] Batches - - [ ] Messages (Completions) + - [x] Messages (Completions) - [x] Structured Outputs -- [ ] Support for Gemini API +- [ ] Support for Gemini API (moving to later release) - [ ] Embeddings - [ ] Completions - [ ] Structured Outputs From ec5b376dff3514574b0d3d8504136122c18074c4 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 6 Jan 2026 17:06:28 +0000 Subject: [PATCH 05/39] remove badges from README text continue vignette add dev_doc for batch API add chunk for progress for batches and files --- R/oai_batch_api.R | 25 +++++++++++++++++++++++-- README.Rmd | 6 +----- README.md | 6 +----- vignettes/sync_async.Rmd | 33 +++++++++++++++++++++++++++++++-- 4 files changed, 56 insertions(+), 14 deletions(-) diff --git a/R/oai_batch_api.R b/R/oai_batch_api.R index 4bce249..cbf7ef7 100644 --- a/R/oai_batch_api.R +++ b/R/oai_batch_api.R @@ -25,9 +25,30 @@ oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small return(embed_row_json) } +oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions = NULL) { -oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions, key_name = "OPENAI_API_KEY", endpoint_url = "https://api.openai.com/v1/embeddings") { - } +oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + tmp <- tempfile(fileext = ".jsonl") + on.exit(unlink(tmp)) # if session crashes we drop the file from mem safely + writeLines(jsonl_rows, tmp) # send the content to the temp file for uploading to OAI + # question here is whether to also save this somewhere by force... + # once OAI have the file it's backed up for 30 days. + + httr2::request(base_url = "https://api.openai.com/v1/files") |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_body_multipart(file = curl::form_file(tmp), + purpose = "batch") |> + httr2::req_perform() |> + httr2::resp_body_json() + + + + + +} diff --git a/README.Rmd b/README.Rmd index 31989d6..ea55d42 100644 --- a/README.Rmd +++ b/README.Rmd @@ -13,10 +13,6 @@ knitr::opts_chunk$set( # EndpointR - - - - EndpointR is a 'batteries included', open-source R package for connecting to various Application Programming Interfaces ([APIs](https://en.wikipedia.org/wiki/API){target="_blank"}) for Machine Learning model predictions. 
> **TIP:** If you are an experienced programmer, or have experience with hitting APIs, consider going directly to [httr2](https://httr2.r-lib.org/reference/index.html) @@ -274,7 +270,7 @@ metadata$endpoint_url Read the [LLM Providers Vignette](articles/llm_providers.html), and the [Structured Outputs Vignette](articles/structured_outputs_json_schema.html) for more information on common workflows with the OpenAI Chat Completions API [^1] -[^1]: Content pending implementation for Anthroic Messages API, Gemini API, and OpenAI Responses API +[^1]: Content pending implementation for Anthropic Messages API, Gemini API, and OpenAI Responses API # API Key Security diff --git a/README.md b/README.md index 10c035d..fb69873 100644 --- a/README.md +++ b/README.md @@ -3,10 +3,6 @@ # EndpointR - - - - EndpointR is a ‘batteries included’, open-source R package for connecting to various Application Programming Interfaces (APIs) @@ -295,5 +291,5 @@ information on common workflows with the OpenAI Chat Completions API information on which API keys you need for wach endpoint we support, and how to securely import those API keys into your .Renvironfile. -[^1]: Content pending implementation for Anthroic Messages API, Gemini +[^1]: Content pending implementation for Anthropic Messages API, Gemini API, and OpenAI Responses API diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd index f7f3057..40c25d6 100644 --- a/vignettes/sync_async.Rmd +++ b/vignettes/sync_async.Rmd @@ -59,6 +59,35 @@ You will likely want to use the Batch API for both embeddings and completions, s - `oai_batch_prepare_embeddings()` - `oai_batch_prepare_completions()` -Each function expects a data frame as input: `oai_batch_prepare_embeddings()` will accept the relevant arguments from `oai_embed_df()`, `oai_batch_prepare_completions` will accept the relevant arguments from `oai_complete_df()`. Each row in the input data frame is converted first to a http request via {httr2}, and then to a line in a .jsonl file. The OpenAI Batch API expects a single .jsonl file of up 50,000 rows or 200 MB in size. If we want to perform the operation on a 150,000 row data frame, we need to create and manage 3 separate batches. +Each function expects a data frame as input: `oai_batch_prepare_embeddings()` will accept the relevant arguments from `oai_embed_df()`, `oai_batch_prepare_completions` will accept the relevant arguments from `oai_complete_df()`. The OpenAI Batch API expects a single .jsonl file of up 50,000 rows or 200 MB in size. If we want to perform the operation on a 150,000 row data frame, we need to create and manage 3 separate batches. -> NOTE: For structured outputs the Batch API requires us to send the JSON schema with each request. Complex schemas will quickly lead to large file size, perhaps eclipsing the 200 MB limit. +> **NOTE:** For structured outputs the Batch API requires us to send the JSON schema with each request. Complex schemas will quickly lead to large file size, perhaps eclipsing the 200 MB limit. + +EndpointR prepares each batch, writes it to a file in temporary storage, and then sends the file to the OpenAI Files API. Once in the Files API, EndpointR can trigger the batch to run. + +Each line of of the .jsonl file should form a self-contained request. And rather than routing to the endpoint's URL we route to a stub. For reference, the entire batch gets sent to its own, full URL. 
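+
+Concretely, and assuming OpenAI's current hosts, the routing happens at two levels: each .jsonl line carries only a relative stub, while the file upload and the batch trigger hit absolute URLs.
+
+```
+# inside each .jsonl line: a relative stub
+"url": "/v1/embeddings"
+
+# the file upload and the batch trigger use the full URLs
+POST https://api.openai.com/v1/files
+POST https://api.openai.com/v1/batches
+```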
+ +Example for embeddings (no structured output!): + +Row version: + +``` +"{\"custom_id\":1,\"method\":\"POST\",\"url\":\"/v1/embeddings\",\"body\":{\"input\":\"hello\",\"model\":\"text-embedding-3-small\",\"encoding_format\":\"float\"}}" +``` + +Prettify'd version: + +``` +{ + "custom_id": 1, + "method": "POST", + "url": "/v1/embeddings", + "body": { + "input": "hello", + "model": "text-embedding-3-small", + "encoding_format": "float" + } +} +``` + +> **NOTE:** The Embeddings API expects the input in an 'input' field rather than 'messages' as in the Completions API, and the batch requests must adhere to this. From 40baf0c28d19d20b21279cb6c0e45b1756a05109 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 6 Jan 2026 17:17:07 +0000 Subject: [PATCH 06/39] add func for preparing a batch of json rows for embeddings --- R/oai_batch_api.R | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/R/oai_batch_api.R b/R/oai_batch_api.R index cbf7ef7..8fd47bd 100644 --- a/R/oai_batch_api.R +++ b/R/oai_batch_api.R @@ -25,9 +25,29 @@ oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small return(embed_row_json) } -oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions = NULL) { +oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { + text_sym <- rlang::ensym(text_var) + id_sym <- rlang::ensym(id_var) + .texts <- dplyr::pull(df, !!text_sym) + .ids <- dplyr::pull(df, !!id_sym) + + reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { + oai_batch_build_embed_req( + input = x, + id = y, + model = model, + dimensions = dimensions, + method = method, + encoding_format = encoding_format, + endpoint = endpoint + ) + }) + + reqs <- paste0(reqs, collapse = "\n") + + return(reqs) } oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY") { From 58667743ee20221172e1751bde1d91cd88e1e80f Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 6 Jan 2026 17:17:41 +0000 Subject: [PATCH 07/39] add conveniece helpers for oai files API --- R/oai_batch_api.R | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/R/oai_batch_api.R b/R/oai_batch_api.R index 8fd47bd..2d4c568 100644 --- a/R/oai_batch_api.R +++ b/R/oai_batch_api.R @@ -67,8 +67,29 @@ oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY") { httr2::req_perform() |> httr2::resp_body_json() +oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { + api_key <- get_api_key(key_name) + + httr2::request("https://api.openai.com/v1/files") |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_url_query(purpose = purpose) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() + +} +oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { + api_key <- get_api_key(key_name) + + httr2::request(paste0("https://api.openai.com/v1/files/", file_id)) |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_method("DELETE") |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() } + From 0a1eebbe7c583bc6291a69f51e4afff0adddc14d Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 6 Jan 2026 17:17:54 +0000 Subject: [PATCH 08/39] add oai_batch_file_upload but may revise this to oai_file_upload --- R/oai_batch_api.R | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 
deletions(-) diff --git a/R/oai_batch_api.R b/R/oai_batch_api.R index 2d4c568..bd5c048 100644 --- a/R/oai_batch_api.R +++ b/R/oai_batch_api.R @@ -50,23 +50,26 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb return(reqs) } -oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY") { +oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { api_key <- get_api_key(key_name) - tmp <- tempfile(fileext = ".jsonl") - on.exit(unlink(tmp)) # if session crashes we drop the file from mem safely - writeLines(jsonl_rows, tmp) # send the content to the temp file for uploading to OAI + .tmp <- tempfile(fileext = ".jsonl") + on.exit(unlink(.tmp)) # if session crashes we drop the file from mem safely + writeLines(jsonl_rows, .tmp) # send the content to the temp file for uploading to OAI # question here is whether to also save this somewhere by force... # once OAI have the file it's backed up for 30 days. httr2::request(base_url = "https://api.openai.com/v1/files") |> httr2::req_auth_bearer_token(api_key) |> - httr2::req_body_multipart(file = curl::form_file(tmp), - purpose = "batch") |> + httr2::req_body_multipart(file = curl::form_file(.tmp), + purpose = purpose) |> + httr2::req_error(is_error = ~ FALSE) |> httr2::req_perform() |> httr2::resp_body_json() +} + oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) From 5330bac1c0b4f9335994bd19bf6097bbc48a7fce Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Wed, 7 Jan 2026 10:18:06 +0000 Subject: [PATCH 09/39] split files api and batch api into separate docs in package (for clarity/potential future development) add dev_docs for batch api --- R/{oai_batch_api.R => openai_batch_api.R} | 23 ----- R/openai_files_api.R | 24 +++++ dev_docs/openai_batch_api.qmd | 104 ++++++++++++++++++++++ 3 files changed, 128 insertions(+), 23 deletions(-) rename R/{oai_batch_api.R => openai_batch_api.R} (74%) create mode 100644 R/openai_files_api.R create mode 100644 dev_docs/openai_batch_api.qmd diff --git a/R/oai_batch_api.R b/R/openai_batch_api.R similarity index 74% rename from R/oai_batch_api.R rename to R/openai_batch_api.R index bd5c048..dca3c19 100644 --- a/R/oai_batch_api.R +++ b/R/openai_batch_api.R @@ -70,29 +70,6 @@ oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpo } -oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { - - api_key <- get_api_key(key_name) - - httr2::request("https://api.openai.com/v1/files") |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_url_query(purpose = purpose) |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() - -} -oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { - - api_key <- get_api_key(key_name) - - httr2::request(paste0("https://api.openai.com/v1/files/", file_id)) |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_method("DELETE") |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() -} diff --git a/R/openai_files_api.R b/R/openai_files_api.R new file mode 100644 index 0000000..4f2ec57 --- /dev/null +++ b/R/openai_files_api.R @@ -0,0 +1,24 @@ +oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + httr2::request("https://api.openai.com/v1/files") |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_url_query(purpose = purpose) |> + 
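+    # is_error = FALSE keeps non-2xx responses as data, so callers can inspect the error body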
+    httr2::req_error(is_error = ~ FALSE) |>
+    httr2::req_perform() |>
+    httr2::resp_body_json()
+
+}
+
+oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") {
+
+  api_key <- get_api_key(key_name)
+
+  httr2::request(paste0("https://api.openai.com/v1/files/", file_id)) |>
+    httr2::req_auth_bearer_token(api_key) |>
+    httr2::req_method("DELETE") |>
+    httr2::req_error(is_error = ~ FALSE) |>
+    httr2::req_perform() |>
+    httr2::resp_body_json()
+}
diff --git a/dev_docs/openai_batch_api.qmd b/dev_docs/openai_batch_api.qmd
new file mode 100644
index 0000000..2f6c3a9
--- /dev/null
+++ b/dev_docs/openai_batch_api.qmd
@@ -0,0 +1,104 @@
+---
+title: "openai_batch_api"
+format: html
+---
+
+So... we could actually re-use some of the logic from oai_embed.R / oai_classify.R. There are some small differences, e.g. we feed a stub of the endpoint URL to each request, and then we upload the batch of inputs as a file directly to the files API, then create the batch.
+
+```{r}
+embed_req <- oai_build_embedding_request("xx", dimensions = 324)
+
+body <- embed_req$body$data
+
+row <- list(
+  custom_id = "xx",
+  method = "POST",
+  url = "/v1/embeddings",
+  body = embed_req$body$data
+)
+```
+
+The initial thought was to just use `stream_out()`, but `stream_out()` expects a data frame as input, which we can use further down the line, but not here.
+
+```{r}
+jsonlite::stream_out(row, con = stdout())
+
+tib_w_row <- tibble::tibble(rows = list(row))
+
+jsonlite::stream_out(tib_w_row, con = file("test_dir/jsonl_outputs/batch_api_test.jsonl"))
+
+read_in <- jsonlite::stream_in(con = file("test_dir/jsonl_outputs/batch_api_test.jsonl")) |> jsonlite::toJSON()
+read_in["rows"]
+
+tibble::tibble(x = 1:10^5) |>
+  chunk_dataframe(chunk_size = 80000)
+```
+
+So instead, we want to take the row, convert it to JSON with `auto_unbox = TRUE` and then `writeLines()` it to a `.jsonl` file. Recall that `auto_unbox` just stops each k:v pair's value being treated as a list.
+
+```{r}
+# stream_in / stream_out is not the right way here
+jsonlite::toJSON(row, auto_unbox = TRUE) |>
+  writeLines("test_dir/jsonl_outputs/batch_api_test_write_lines.jsonl")
+
+x <- readLines("test_dir/jsonl_outputs/batch_api_test_write_lines.jsonl")
+jsonlite::toJSON(x, auto_unbox = TRUE)
+```
+
+We said we could re-use some of the logic, but looking at it we don't really benefit from using httr2 for each request: it's unnecessary overhead. So we just create lists for now. We may use httr2 for the actual batch request (but maybe not!).
+
+```{r}
+single_batch_row <- oai_batch_build_embed_req("hello", "1")
+
+list_rows <- purrr::map(1:10, \(x) oai_batch_build_embed_req("hello", x))
+```
+
+Then we can write them to a file as follows, and send it to OpenAI as a batch job. 
+ +```{r} +writeLines( + unlist(list_rows), + "test_dir/jsonl_outputs/write_ten_lines.jsonl") + + +readLines( + "test_dir/jsonl_outputs/write_ten_lines.jsonl", + n = 2 +) +``` + +```{r} +test_df <- tibble::tibble( + x = letters, + y = 1:length(letters) +) + +test_df |> + mutate( + reqs = map2_chr(x, y, \(text, id) { + oai_batch_build_embed_req( + text, + id, + dimensions = 324 + ) + }) + ) + +xx <- test_df |> + oai_batch_prepare_embeddings( + x, + y + ) + +oai_batch_file_upload( + xx +) + +batch_job_data <- oai_batch_file_list() +temp_id <- batch_job_data$data[[1]]$id + +oai_batch_file_delete(temp_id) + +oai_batch_file_list() +``` From 6e0a34c717940ed83ff4d0f52f9b65bc8d05c8f2 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 11:28:36 +0000 Subject: [PATCH 10/39] add oai_file_content to retrieve a file's contents here main intention is file = batch job output, but it doesn't have to be that way. --- R/openai_files_api.R | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/R/openai_files_api.R b/R/openai_files_api.R index 4f2ec57..41dd2fd 100644 --- a/R/openai_files_api.R +++ b/R/openai_files_api.R @@ -22,3 +22,15 @@ oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { httr2::req_perform() |> httr2::resp_body_json() } + +oai_file_content <- function(file_id, key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + resp <- httr2::request(paste0("https://api.openai.com/v1/files/", file_id, "/content")) |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() + + httr2::resp_body_string(resp) +} From a71bd2ef3490dfa4700bd1f3088817ced087f982 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 11:34:43 +0000 Subject: [PATCH 11/39] fix batch upload file/creation function --- R/openai_batch_api.R | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index dca3c19..722429a 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -60,11 +60,25 @@ oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpo # question here is whether to also save this somewhere by force... # once OAI have the file it's backed up for 30 days. 
- httr2::request(base_url = "https://api.openai.com/v1/files") |> + resp <- httr2::request(base_url = "https://api.openai.com/v1/files") |> httr2::req_auth_bearer_token(api_key) |> httr2::req_body_multipart(file = curl::form_file(.tmp), purpose = purpose) |> httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() + + result <- httr2::resp_body_json(resp) + + if (httr2::resp_status(resp) >= 400) { + error_msg <- result$error$message %||% "Unknown error" + cli::cli_abort(c( + "Failed to upload file to OpenAI Files API", + "x" = error_msg + )) + } + + return(result) +} httr2::req_perform() |> httr2::resp_body_json() From 295cce93252977d2cc9a31d69442b36c5c14a1d0 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 11:35:17 +0000 Subject: [PATCH 12/39] Get list of all batches in the API under our org --- R/openai_batch_api.R | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 722429a..9840f87 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -79,6 +79,25 @@ oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpo return(result) } + +oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + req <- httr2::request("https://api.openai.com/v1/batches") |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_url_query(limit = limit) + + if (!is.null(after)) { + req <- httr2::req_url_query(req, after = after) + } + + req |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() +} + httr2::req_perform() |> httr2::resp_body_json() From 5a1a684fd8693cb24574db6592d2be9d428a097c Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 11:36:57 +0000 Subject: [PATCH 13/39] make sure id is a string in oai_batch_prepare_embeddings, otherwise API rejects the requests --- R/openai_batch_api.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 9840f87..37b426b 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -36,7 +36,7 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { oai_batch_build_embed_req( input = x, - id = y, + id = as.character(y), model = model, dimensions = dimensions, method = method, From 0a71bc4333c0b389a550ac0ed44af23ba0c95068 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 11:37:46 +0000 Subject: [PATCH 14/39] commit validation func and add to main body --- R/openai_batch_api.R | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 37b426b..843ba86 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -1,5 +1,4 @@ # the batch - oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { @@ -33,6 +32,10 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb .texts <- dplyr::pull(df, !!text_sym) .ids <- dplyr::pull(df, !!id_sym) + if (!.validate_batch_inputs(.ids, .texts)) { + return("") + } + reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { oai_batch_build_embed_req( input = x, @@ -106,3 +109,34 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY +# internal/helper +.validate_batch_inputs <- 
function(.ids, .texts, max_requests = 50000) { + n_requests <- length(.texts) + + if (n_requests == 0) { + cli::cli_warn("Input is empty. Returning empty JSONL string.") + return(FALSE) + } + + if (anyDuplicated(.ids)) { + duplicated_ids <- unique(.ids[duplicated(.ids)]) + cli::cli_abort(c( + "custom_id values must be unique within a batch", + "x" = "Found {length(duplicated_ids)} duplicate ID{?s}: {.val {head(duplicated_ids, 3)}}" + )) + } + + if (n_requests > max_requests) { + cli::cli_abort(c( + "OpenAI Batch API supports maximum {max_requests} requests per batch", + "x" = "Attempting to create {n_requests} requests", + "i" = "Consider splitting your data into multiple batches" + )) + } + + if (n_requests > 10000) { + cli::cli_alert_info("Large batch with {n_requests} requests - processing may take significant time") + } + + return(TRUE) +} From ad0e3e2413b9f5f50465732c55e1df71a9a953aa Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 12:21:41 +0000 Subject: [PATCH 15/39] add batch job creation trigger func once file is uploaded --- R/openai_batch_api.R | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 843ba86..9c8a728 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -83,6 +83,33 @@ oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpo return(result) } +# batch job management ---- +oai_batch_create <- function(file_id, + endpoint = c("/v1/embeddings", "/v1/chat/completions"), + completion_window = "24h", + metadata = NULL, + key_name = "OPENAI_API_KEY") { + + endpoint <- match.arg(endpoint) + api_key <- get_api_key(key_name) + + body <- list( + input_file_id = file_id, + endpoint = endpoint, + completion_window = completion_window + ) + + if (!is.null(metadata)) { + body$metadata <- metadata + } + + httr2::request("https://api.openai.com/v1/batches") |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_body_json(body) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() +} oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) From e2f86142459f8a315a8f4b42e64513681a8ccff8 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Fri, 9 Jan 2026 12:22:58 +0000 Subject: [PATCH 16/39] parse embedding reuslts function --- R/openai_batch_api.R | 69 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 9c8a728..7ed9289 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -130,7 +130,76 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY httr2::req_perform() |> httr2::resp_body_json() +oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NULL) { + lines <- strsplit(content, "\n")[[1]] + lines <- lines[nchar(lines) > 0] + + if (length(lines) == 0) { + return(tibble::tibble( + custom_id = character(), + .error = logical(), + .error_msg = character() + )) + } + + parsed <- purrr::imap(lines, \(line, idx) { + tryCatch( + jsonlite::fromJSON(line, simplifyVector = FALSE), + error = function(e) { + list( + custom_id = paste0("__PARSE_ERROR_LINE_", idx), + error = list(message = paste("Failed to parse JSONL line", idx, ":", conditionMessage(e))) + ) + } + ) + }) + + results <- purrr::map(parsed, function(item) { + custom_id <- item$custom_id + + if (!is.null(item$error)) { + return(tibble::tibble( + custom_id = 
custom_id, + .error = TRUE, + .error_msg = item$error$message %||% "Unknown error" + )) + } + + embedding <- purrr::pluck(item, "response", "body", "data", 1, "embedding", + .default = NULL) + + if (is.null(embedding)) { + return(tibble::tibble( + custom_id = custom_id, + .error = TRUE, + .error_msg = "No embedding found in response" + )) + } + + embed_tibble <- embedding |> + as.list() |> + stats::setNames(paste0("V", seq_along(embedding))) |> + tibble::as_tibble() + + tibble::tibble( + custom_id = custom_id, + .error = FALSE, + .error_msg = NA_character_ + ) |> + dplyr::bind_cols(embed_tibble) + }) + + result <- purrr::list_rbind(results) + + if (!is.null(original_df) && !is.null(id_var)) { + id_sym <- rlang::ensym(id_var) + id_col_name <- rlang::as_name(id_sym) + result <- result |> + dplyr::rename(!!id_col_name := custom_id) + } + + return(result) } From 62a9db346c1bd351d5a4f6c2ba05adacf0a9fc40 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 12 Jan 2026 10:01:06 +0000 Subject: [PATCH 17/39] Add status function for batches add cancel function for batches --- R/openai_batch_api.R | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 7ed9289..ae1ec08 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -110,6 +110,18 @@ oai_batch_create <- function(file_id, httr2::req_perform() |> httr2::resp_body_json() } + +oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + httr2::request(paste0("https://api.openai.com/v1/batches/", batch_id)) |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() +} + oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) @@ -128,8 +140,19 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY httr2::resp_body_json() } +oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") { + + api_key <- get_api_key(key_name) + + httr2::request(paste0("https://api.openai.com/v1/batches/", batch_id, "/cancel")) |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_method("POST") |> + httr2::req_error(is_error = ~ FALSE) |> httr2::req_perform() |> httr2::resp_body_json() +} + + oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NULL) { lines <- strsplit(content, "\n")[[1]] From f90e4d9a864a602c530598630efbf762c1b1d295 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Mon, 12 Jan 2026 16:42:30 +0000 Subject: [PATCH 18/39] Add section headers for navigation prior to documenting and building completions Add start of Roxygen2 docs for Embeddings + some of the batch handlers (will add examples when happy/code is final) --- R/openai_batch_api.R | 33 ++++++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index ae1ec08..fbbc662 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -1,4 +1,20 @@ -# the batch +# embed request building ---- +#' @description Create a single OpenAI Batch API - Embedding request +#' +#' This function prepares a single row of data for the OpenAI Batch/Files APIs, where each row should be valid JSON. The APIs do not guarantee the results will be in the same order, so we need to provide an ID with each request. 
+#' +#' @param input Text input you wish to embed +#' @param id A custom, unique Row ID +#' @param model The embedding model to use +#' @param dimensions Number of embedding dimensions to return +#' @param method The http request type, usually 'POST' +#' @param encoding_format Data type of the embedding values +#' @param endpoint The internal suffix of the endpoint's url e.g. /v1/embeddings +#' +#' @returns a row of JSON +#' +#' @export +#' @examples oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { @@ -110,7 +126,17 @@ oai_batch_create <- function(file_id, httr2::req_perform() |> httr2::resp_body_json() } - +#' Check the status of a batch job on the OpenAI Batch API +#' +#' +#' +#' @param batch_id Batch Identifier, should start with 'batch_' and is returned by the `oai_create_batch` function +#' @param key_name Name of the API key, usually OPENAI_API_KEY +#' +#' @returns Metadata about an OpenAI Batch API Job, including status, error_file_id, output_file_id, input_file_id etc. +#' +#' @export +#' @examples oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) @@ -153,6 +179,7 @@ oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") { } +# results parsing ---- oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NULL) { lines <- strsplit(content, "\n")[[1]] @@ -228,7 +255,7 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL -# internal/helper +# internal/helpers ---- .validate_batch_inputs <- function(.ids, .texts, max_requests = 50000) { n_requests <- length(.texts) From f0cc9834ed796a381e5f1ee4b74fd90a11ff7cce Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:22:04 +0000 Subject: [PATCH 19/39] add build completionn req func mainbody --- R/openai_batch_api.R | 45 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index fbbc662..7e6a966 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -68,7 +68,50 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb return(reqs) } - +oai_batch_build_completion_req <- function( + input, + id, + model = "gpt-4o-mini", + system_prompt = NULL, + temperature = 0, + max_tokens = 500L, + schema = NULL, + method = "POST", + endpoint = "/v1/chat/completions") { + + messages <- list() + + if (!is.null(system_prompt)) { + messages <- append(messages, list(list(role = "system", content = system_prompt))) + } + + messages <- append(messages, list(list(role = "user", content = input))) + + body <- list( + model = model, + messages = messages, + temperature = temperature, + max_tokens = max_tokens + ) + + if (!is.null(schema)) { + if (inherits(schema, "json_schema")) { + body$response_format <- json_dump(schema) + } else if (is.list(schema)) { + body$response_format <- schema + } + } + + req_row <- list( + custom_id = as.character(id), + method = method, + url = endpoint, + body = body + ) + + jsonlite::toJSON(req_row, auto_unbox = TRUE) +} + oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { api_key <- get_api_key(key_name) From 9395c0ef8bb2715fbb453e8676f9c60577c4884b Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:22:19 +0000 Subject: [PATCH 20/39] add batch_prepare completions main body --- 
R/openai_batch_api.R | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 7e6a966..f52c6f7 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -112,6 +112,49 @@ oai_batch_build_completion_req <- function( jsonlite::toJSON(req_row, auto_unbox = TRUE) } + oai_batch_prepare_completions <- function( + df, + text_var, + id_var, + model = "gpt-4o-mini", + system_prompt = NULL, + temperature = 0, + max_tokens = 500L, + schema = NULL, + method = "POST", + endpoint = "/v1/chat/completions") { + + text_sym <- rlang::ensym(text_var) + id_sym <- rlang::ensym(id_var) + + .texts <- dplyr::pull(df, !!text_sym) + .ids <- dplyr::pull(df, !!id_sym) + + if (!.validate_batch_inputs(.ids, .texts)) { + return("") + } + + ## pre-process schema once if S7 object to avoid repeated json_dump() calls + if (!is.null(schema) && inherits(schema, "json_schema")) { + schema <- json_dump(schema) + } + + reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { + oai_batch_build_completion_req( + input = x, + id = as.character(y), + model = model, + system_prompt = system_prompt, + temperature = temperature, + max_tokens = max_tokens, + schema = schema, + method = method, + endpoint = endpoint + ) + }) + + return(paste0(reqs, collapse = "\n")) +} oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { api_key <- get_api_key(key_name) From 9c442ef4bf8a84275c02c99a51f4fed328b23e09 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:22:42 +0000 Subject: [PATCH 21/39] add batch_parse_completions main body --- R/openai_batch_api.R | 66 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index f52c6f7..e58679c 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -337,10 +337,68 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL return(result) } - - - - +oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NULL) { + + lines <- strsplit(content, "\n")[[1]] + lines <- lines[nchar(lines) > 0] + + if (length(lines) == 0) { + return(tibble::tibble( + custom_id = character(), + content = character(), + .error = logical(), + .error_msg = character() + )) + } + + parsed <- purrr::imap(lines, \(line, idx) { + tryCatch( + jsonlite::fromJSON(line, simplifyVector = FALSE), + error = function(e) { + list( + custom_id = paste0("__PARSE_ERROR_LINE_", idx), + error = list(message = paste("Failed to parse JSONL line", idx, ":", conditionMessage(e))) + ) + } + ) + }) + + results <- purrr::map(parsed, function(item) { + custom_id <- item$custom_id + + if (!is.null(item$error)) { + return(tibble::tibble( + custom_id = custom_id, + content = NA_character_, + .error = TRUE, + .error_msg = item$error$message %||% "Unknown error" + )) + } + + response_content <- purrr::pluck( + item, "response", "body", "choices", 1, "message", "content", + .default = NA_character_ + ) + + tibble::tibble( + custom_id = custom_id, + content = response_content, + .error = FALSE, + .error_msg = NA_character_ + ) + }) + + result <- purrr::list_rbind(results) + + if (!is.null(original_df) && !is.null(id_var)) { + id_sym <- rlang::ensym(id_var) + id_col_name <- rlang::as_name(id_sym) + result <- result |> + dplyr::rename(!!id_col_name := custom_id) + } + + return(result) +} # internal/helpers ---- .validate_batch_inputs <- function(.ids, .texts, max_requests = 
50000) { n_requests <- length(.texts) From 1736d1310ed8cb325cc7ae0d4ef5c7dbd4f3457d Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:23:03 +0000 Subject: [PATCH 22/39] add roxygen2 outlines and some of the completed docs for batch functions --- R/openai_batch_api.R | 314 ++++++++++++++++++++++++++++++------------- 1 file changed, 219 insertions(+), 95 deletions(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index e58679c..0706201 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -17,7 +17,6 @@ #' @examples oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { - body <- purrr::compact( # use compact so that if dimensions is NULL it gets dropped from the req list( @@ -25,33 +24,54 @@ oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small model = model, dimensions = dimensions, encoding_format = encoding_format - )) - + )) + embed_row <- list( custom_id = id, method = method, url = endpoint, body = body ) - + embed_row_json <- jsonlite::toJSON(embed_row, - auto_unbox = TRUE) - + auto_unbox = TRUE) + return(embed_row_json) } - + +#' Prepare a Data Frame for the OpenAI Batch API - Embeddings +#' +#' @details Take an enitre data frame and turn each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger. +#' +#' Each request must have its own ID, as the Batch API makes no guarantees about the order the results will be returned in. +#' +#' To reduce the overall size, and the explanatory power of the Embeddings, you can set dimensions to lower than the default (which vary based on model). +#' +#' @param df A data frame containing texts to embed +#' @param text_var Name of the column containing text to embed +#' @param id_var Name of the column to use as ID +#' @param model OpenAI embedding model to use (default: "text-embedding-3-small") +#' @param dimensions Number of embedding dimensions (NULL uses model default) +#' @param method The http request type, usually 'POST' +#' @param encoding_format Data type of the embedding values +#' @param endpoint The internal suffix of the endpoint's url e.g. 
/v1/embeddings +#' +#' @returns A list of JSON requests +#' +#' @export +#' @examples oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { - + text_sym <- rlang::ensym(text_var) id_sym <- rlang::ensym(id_var) - + .texts <- dplyr::pull(df, !!text_sym) .ids <- dplyr::pull(df, !!id_sym) - + if (!.validate_batch_inputs(.ids, .texts)) { return("") } - + reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { oai_batch_build_embed_req( input = x, @@ -63,11 +83,29 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb endpoint = endpoint ) }) - + reqs <- paste0(reqs, collapse = "\n") - + return(reqs) } + +# completions request building ---- +#' Title +#' +#' @param input +#' @param id +#' @param model +#' @param system_prompt +#' @param temperature +#' @param max_tokens +#' @param schema +#' @param method +#' @param endpoint +#' +#' @returns +#' +#' @export +#' @examples oai_batch_build_completion_req <- function( input, id, @@ -112,6 +150,23 @@ oai_batch_build_completion_req <- function( jsonlite::toJSON(req_row, auto_unbox = TRUE) } + #' Title + #' + #' @param df + #' @param text_var + #' @param id_var + #' @param model + #' @param system_prompt + #' @param temperature + #' @param max_tokens + #' @param schema + #' @param method + #' @param endpoint + #' + #' @returns + #' + #' @export + #' @examples oai_batch_prepare_completions <- function( df, text_var, @@ -155,63 +210,97 @@ oai_batch_build_completion_req <- function( return(paste0(reqs, collapse = "\n")) } -oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { + - api_key <- get_api_key(key_name) +#' Prepare and upload a file to be uploaded to the OpenAI Batch API +#' +#' +#' +#' @param jsonl_rows Rows of valid JSON, output of a oai_batch_prepare* function +#' @param key_name Name of the API key, usually OPENAI_API_KEY +#' @param purpose Tag, e.g. 'classification', 'batch', 'fine-tuning' +#' +#' @returns Metadata for an upload to the OpenAI Files API +#' +#' @export +#' @seealso `openai_files_api.R` +#' @examples +oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { + +api_key <- get_api_key(key_name) - .tmp <- tempfile(fileext = ".jsonl") - on.exit(unlink(.tmp)) # if session crashes we drop the file from mem safely - writeLines(jsonl_rows, .tmp) # send the content to the temp file for uploading to OAI - # question here is whether to also save this somewhere by force... - # once OAI have the file it's backed up for 30 days. +.tmp <- tempfile(fileext = ".jsonl") +on.exit(unlink(.tmp)) # if session crashes we drop the file from mem safely +writeLines(jsonl_rows, .tmp) # send the content to the temp file for uploading to OAI +# question here is whether to also save this somewhere by force... +# once OAI have the file it's backed up for 30 days. 
- resp <- httr2::request(base_url = "https://api.openai.com/v1/files") |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_body_multipart(file = curl::form_file(.tmp), - purpose = purpose) |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() +resp <- httr2::request(base_url = "https://api.openai.com/v1/files") |> +httr2::req_auth_bearer_token(api_key) |> +httr2::req_body_multipart(file = curl::form_file(.tmp), +purpose = purpose) |> +httr2::req_error(is_error = ~ FALSE) |> +httr2::req_perform() - result <- httr2::resp_body_json(resp) +result <- httr2::resp_body_json(resp) - if (httr2::resp_status(resp) >= 400) { +if (httr2::resp_status(resp) >= 400) { error_msg <- result$error$message %||% "Unknown error" cli::cli_abort(c( "Failed to upload file to OpenAI Files API", "x" = error_msg )) } - + return(result) } - + # batch job management ---- +#' Trigger a batch job to run on an uploaded file +#' +#' @details Once a file has been uploaded to the OpenAI Files API it's necessary to trigger the batch job. This will ensure that your file is processed, and processing is finalised within the 24 hour guarantee. +#' +#' It's important to choose the right endpoint. If processing should be done by the Completions API, be sure to route to v1/chat/completions, and this must match each row in your uploaded file. +#' +#' Batch Job Ids start with "batch_", you'll receive a warning if you try to check batch status on a Files API file (the Files/Batch API set up is a lil bit clumsy for me) +#' +#' @param file_id Pointer to a file uploaded to the OpenAI API +#' @param endpoint The internal suffix of the endpoint's url e.g. /v1/embeddings +#' @param completion_window Time until the batch should be returned, NOTE: OpenAI makes 24 hour guarantees only. 
+#' @param metadata Any additional metadata you want to tag the batch with +#' @param key_name Name of the API key, usually OPENAI_API_KEY +#' +#' @returns Metadata about an OpenAI Batch Job Including the batch ID +#' +#' @export +#' @examples oai_batch_create <- function(file_id, - endpoint = c("/v1/embeddings", "/v1/chat/completions"), - completion_window = "24h", - metadata = NULL, - key_name = "OPENAI_API_KEY") { - + endpoint = c("/v1/embeddings", "/v1/chat/completions"), + completion_window = "24h", + metadata = NULL, + key_name = "OPENAI_API_KEY") { + endpoint <- match.arg(endpoint) api_key <- get_api_key(key_name) - + body <- list( input_file_id = file_id, endpoint = endpoint, completion_window = completion_window ) - + if (!is.null(metadata)) { body$metadata <- metadata } - + httr2::request("https://api.openai.com/v1/batches") |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_body_json(body) |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() + httr2::req_auth_bearer_token(api_key) |> + httr2::req_body_json(body) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() } + #' Check the status of a batch job on the OpenAI Batch API #' #' @@ -224,53 +313,73 @@ oai_batch_create <- function(file_id, #' @export #' @examples oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") { - + api_key <- get_api_key(key_name) - + httr2::request(paste0("https://api.openai.com/v1/batches/", batch_id)) |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() + httr2::req_auth_bearer_token(api_key) |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() } - + +#' Title +#' +#' @param limit +#' @param after +#' @param key_name +#' +#' @returns +#' +#' @export +#' @examples oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") { - + api_key <- get_api_key(key_name) - + req <- httr2::request("https://api.openai.com/v1/batches") |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_url_query(limit = limit) - + httr2::req_auth_bearer_token(api_key) |> + httr2::req_url_query(limit = limit) + if (!is.null(after)) { req <- httr2::req_url_query(req, after = after) } - + req |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() } - + oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") { - + api_key <- get_api_key(key_name) - + httr2::request(paste0("https://api.openai.com/v1/batches/", batch_id, "/cancel")) |> - httr2::req_auth_bearer_token(api_key) |> - httr2::req_method("POST") |> - httr2::req_error(is_error = ~ FALSE) |> - httr2::req_perform() |> - httr2::resp_body_json() + httr2::req_auth_bearer_token(api_key) |> + httr2::req_method("POST") |> + httr2::req_error(is_error = ~ FALSE) |> + httr2::req_perform() |> + httr2::resp_body_json() } - - + + # results parsing ---- +#' Title +#' +#' @param content +#' @param original_df +#' @param id_var +#' +#' @returns +#' +#' @export +#' @examples oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NULL) { - + lines <- strsplit(content, "\n")[[1]] lines <- lines[nchar(lines) > 0] - + if (length(lines) == 0) { return(tibble::tibble( custom_id = character(), @@ -278,7 +387,7 @@ oai_batch_parse_embeddings <- function(content, 
original_df = NULL, id_var = NUL .error_msg = character() )) } - + parsed <- purrr::imap(lines, \(line, idx) { tryCatch( jsonlite::fromJSON(line, simplifyVector = FALSE), @@ -290,10 +399,10 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL } ) }) - + results <- purrr::map(parsed, function(item) { custom_id <- item$custom_id - + if (!is.null(item$error)) { return(tibble::tibble( custom_id = custom_id, @@ -301,10 +410,9 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL .error_msg = item$error$message %||% "Unknown error" )) } - - embedding <- purrr::pluck(item, "response", "body", "data", 1, "embedding", - .default = NULL) - + + embedding <- purrr::pluck(item, "response", "body", "data", 1, "embedding",.default = NULL) + if (is.null(embedding)) { return(tibble::tibble( custom_id = custom_id, @@ -312,31 +420,42 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL .error_msg = "No embedding found in response" )) } - + embed_tibble <- embedding |> - as.list() |> - stats::setNames(paste0("V", seq_along(embedding))) |> - tibble::as_tibble() - + as.list() |> + stats::setNames(paste0("V", seq_along(embedding))) |> + tibble::as_tibble() + tibble::tibble( custom_id = custom_id, .error = FALSE, .error_msg = NA_character_ ) |> - dplyr::bind_cols(embed_tibble) + dplyr::bind_cols(embed_tibble) }) - + result <- purrr::list_rbind(results) - + if (!is.null(original_df) && !is.null(id_var)) { id_sym <- rlang::ensym(id_var) id_col_name <- rlang::as_name(id_sym) result <- result |> - dplyr::rename(!!id_col_name := custom_id) + dplyr::rename(!!id_col_name := custom_id) } - + return(result) } + +#' Title +#' +#' @param content +#' @param original_df +#' @param id_var +#' +#' @returns +#' +#' @export +#' @examples oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NULL) { lines <- strsplit(content, "\n")[[1]] @@ -399,15 +518,20 @@ oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NU return(result) } + + # internal/helpers ---- -.validate_batch_inputs <- function(.ids, .texts, max_requests = 50000) { +#' @keywords internal +.validate_batch_inputs <- function(.ids, + .texts, + max_requests = 50000) { n_requests <- length(.texts) - + if (n_requests == 0) { cli::cli_warn("Input is empty. 
Returning empty JSONL string.") return(FALSE) } - + if (anyDuplicated(.ids)) { duplicated_ids <- unique(.ids[duplicated(.ids)]) cli::cli_abort(c( @@ -415,7 +539,7 @@ oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NU "x" = "Found {length(duplicated_ids)} duplicate ID{?s}: {.val {head(duplicated_ids, 3)}}" )) } - + if (n_requests > max_requests) { cli::cli_abort(c( "OpenAI Batch API supports maximum {max_requests} requests per batch", @@ -423,10 +547,10 @@ oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NU "i" = "Consider splitting your data into multiple batches" )) } - + if (n_requests > 10000) { cli::cli_alert_info("Large batch with {n_requests} requests - processing may take significant time") } - + return(TRUE) -} +} \ No newline at end of file From c543453c845432b9bd93b548b771595e245df9bf Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:24:30 +0000 Subject: [PATCH 23/39] add start of tests for batch API --- tests/testthat/test-oai_batch_api.R | 63 +++++++++++++++++++++++++++-- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R index 7c88d53..a17e386 100644 --- a/tests/testthat/test-oai_batch_api.R +++ b/tests/testthat/test-oai_batch_api.R @@ -1,6 +1,6 @@ -test_that("oai_batch_build_embed_row creates a row of JSON and responds to its input arguments", { +test_that("oai_batch_build_embed_req creates a row of JSON and responds to its input arguments", { no_dims <- expect_no_error( - oai_batch_build_embed_row( + oai_batch_build_embed_req( "hello", "1234" ) @@ -9,7 +9,7 @@ test_that("oai_batch_build_embed_row creates a row of JSON and responds to its i no_dims_str <- jsonlite::fromJSON(no_dims) with_dims <- expect_no_error( - oai_batch_build_embed_row( + oai_batch_build_embed_req( "hello", "134", dimensions = 124 @@ -25,3 +25,60 @@ test_that("oai_batch_build_embed_row creates a row of JSON and responds to its i expect_equal(no_dims_str$url, "/v1/embeddings") expect_equal(no_dims_str$body$model, "text-embedding-3-small") }) + +test_that("oai_batch_prepare_completions creates valid JSONL", { + test_df <- tibble::tibble( + id = c("a", "b"), + text = c("Hello", "World") + ) + + result <- oai_batch_prepare_completions( + df = test_df, + text_var = text, + id_var = id + ) + + lines <- strsplit(result, "\n")[[1]] + expect_equal(length(lines), 2) + + parsed <- purrr::map(lines, \(x) jsonlite::fromJSON(x, simplifyVector = FALSE)) + expect_equal(parsed[[1]]$custom_id, "a") + expect_equal(parsed[[2]]$custom_id, "b") + expect_equal(parsed[[1]]$body$messages[[1]]$content, "Hello") +}) +test_that("oai_batch_parse_embeddings handles success response", { + mock_content <- '{"custom_id":"1","response":{"body":{"data":[{"embedding":[0.1,0.2,0.3]}]}},"error":null}' + + result <- oai_batch_parse_embeddings(mock_content) + + expect_equal(nrow(result), 1) + expect_equal(result$custom_id, "1") + expect_false(result$.error) + expect_true("V1" %in% names(result)) + expect_equal(result$V1, 0.1) + expect_equal(result$V2, 0.2) + expect_equal(result$V3, 0.3) +}) +test_that("oai_batch_parse_embeddings handles error response", { + mock_content <- '{"custom_id":"1","response":null,"error":{"message":"Rate limit exceeded"}}' + + result <- oai_batch_parse_embeddings(mock_content) + + expect_equal(nrow(result), 1) + expect_true(result$.error) + expect_equal(result$.error_msg, "Rate limit exceeded") +}) + +test_that("oai_batch_parse_embeddings handles multiple 
rows", { + mock_content <- paste0( + '{"custom_id":"1","response":{"body":{"data":[{"embedding":[0.1,0.2]}]}},"error":null}', + '\n', + '{"custom_id":"2","response":{"body":{"data":[{"embedding":[0.3,0.4]}]}},"error":null}' + ) + + result <- oai_batch_parse_embeddings(mock_content) + + expect_equal(nrow(result), 2) + expect_equal(result$custom_id, c("1", "2")) + expect_equal(result$V1, c(0.1, 0.3)) +}) From ab826ddc823702c0874f4aeae227a0243dcb21f5 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 11:25:27 +0000 Subject: [PATCH 24/39] continue batch tests - end embeddings & start completions --- tests/testthat/test-oai_batch_api.R | 70 +++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R index a17e386..429d3be 100644 --- a/tests/testthat/test-oai_batch_api.R +++ b/tests/testthat/test-oai_batch_api.R @@ -26,6 +26,59 @@ test_that("oai_batch_build_embed_req creates a row of JSON and responds to its i expect_equal(no_dims_str$body$model, "text-embedding-3-small") }) +test_that("oai_batch_build_completion_req creates valid JSON structure", { + result <- oai_batch_build_completion_req( + input = "Hello", + id = "test_1", + model = "gpt-4o-mini" + ) + + parsed <- jsonlite::fromJSON(result, simplifyVector = FALSE) + + expect_equal(parsed$custom_id, "test_1") + expect_equal(parsed$method, "POST") + expect_equal(parsed$url, "/v1/chat/completions") + expect_equal(parsed$body$model, "gpt-4o-mini") + expect_equal(length(parsed$body$messages), 1) + expect_equal(parsed$body$messages[[1]]$role, "user") + expect_equal(parsed$body$messages[[1]]$content, "Hello") +}) +test_that("oai_batch_build_completion_req handles system_prompt", { + result <- oai_batch_build_completion_req( + input = "Hello", + id = "test_2", + system_prompt = "You are helpful" + ) + + parsed <- jsonlite::fromJSON(result, simplifyVector = FALSE) + + expect_equal(length(parsed$body$messages), 2) + expect_equal(parsed$body$messages[[1]]$role, "system") + expect_equal(parsed$body$messages[[1]]$content, "You are helpful") + expect_equal(parsed$body$messages[[2]]$role, "user") +}) + +test_that("oai_batch_build_completion_req handles schema as list", { + test_schema <- list( + type = "json_schema", + json_schema = list( + name = "test", + schema = list(type = "object", properties = list(sentiment = list(type = "string"))) + ) + ) + + result <- oai_batch_build_completion_req( + input = "Hello", + id = "test_3", + schema = test_schema + ) + + parsed <- jsonlite::fromJSON(result) + + expect_true("response_format" %in% names(parsed$body)) + expect_equal(parsed$body$response_format$type, "json_schema") +}) + test_that("oai_batch_prepare_completions creates valid JSONL", { test_df <- tibble::tibble( id = c("a", "b"), @@ -82,3 +135,20 @@ test_that("oai_batch_parse_embeddings handles multiple rows", { expect_equal(result$custom_id, c("1", "2")) expect_equal(result$V1, c(0.1, 0.3)) }) +test_that("oai_batch_prepare_embeddings rejects duplicate IDs", { + test_df <- tibble::tibble( + id = c("a", "a", "b"), + text = c("Text 1", "Text 2", "Text 3") + ) + + expect_error( + oai_batch_prepare_embeddings(test_df, text, id), + "custom_id values must be unique" + ) +}) +test_that("oai_batch_parse_embeddings handles empty input", { + result <- oai_batch_parse_embeddings("") + expect_equal(nrow(result), 0) + expect_true("custom_id" %in% names(result)) + expect_true(".error" %in% names(result)) +}) From e60f0da14d24c947a3f8f33547ffdcac00f0e445 Mon Sep 17 
00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 14:40:09 +0000 Subject: [PATCH 25/39] add roxygen2 skeletons to files API functions export batch and files functions update sync v async vignette --- NAMESPACE | 13 ++++++++ R/openai_files_api.R | 30 ++++++++++++++++- man/oai_batch_build_completion_req.Rd | 24 ++++++++++++++ man/oai_batch_create.Rd | 38 ++++++++++++++++++++++ man/oai_batch_file_upload.Rd | 28 ++++++++++++++++ man/oai_batch_list.Rd | 14 ++++++++ man/oai_batch_parse_completions.Rd | 14 ++++++++ man/oai_batch_parse_embeddings.Rd | 14 ++++++++ man/oai_batch_prepare_completions.Rd | 25 ++++++++++++++ man/oai_batch_prepare_embeddings.Rd | 47 +++++++++++++++++++++++++++ man/oai_batch_status.Rd | 19 +++++++++++ man/oai_file_content.Rd | 16 +++++++++ man/oai_file_delete.Rd | 16 +++++++++ man/oai_file_list.Rd | 19 +++++++++++ vignettes/sync_async.Rmd | 34 +++++++++++++++---- 15 files changed, 344 insertions(+), 7 deletions(-) create mode 100644 man/oai_batch_build_completion_req.Rd create mode 100644 man/oai_batch_create.Rd create mode 100644 man/oai_batch_file_upload.Rd create mode 100644 man/oai_batch_list.Rd create mode 100644 man/oai_batch_parse_completions.Rd create mode 100644 man/oai_batch_parse_embeddings.Rd create mode 100644 man/oai_batch_prepare_completions.Rd create mode 100644 man/oai_batch_prepare_embeddings.Rd create mode 100644 man/oai_batch_status.Rd create mode 100644 man/oai_file_content.Rd create mode 100644 man/oai_file_delete.Rd create mode 100644 man/oai_file_list.Rd diff --git a/NAMESPACE b/NAMESPACE index 92850c2..9f3f525 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -22,6 +22,16 @@ export(hf_get_model_max_length) export(hf_perform_request) export(json_dump) export(json_schema) +export(oai_batch_build_completion_req) +export(oai_batch_build_embed_req) +export(oai_batch_create) +export(oai_batch_file_upload) +export(oai_batch_list) +export(oai_batch_parse_completions) +export(oai_batch_parse_embeddings) +export(oai_batch_prepare_completions) +export(oai_batch_prepare_embeddings) +export(oai_batch_status) export(oai_build_completions_request) export(oai_build_completions_request_list) export(oai_build_embedding_request) @@ -32,6 +42,9 @@ export(oai_embed_batch) export(oai_embed_chunks) export(oai_embed_df) export(oai_embed_text) +export(oai_file_content) +export(oai_file_delete) +export(oai_file_list) export(perform_requests_with_strategy) export(process_response) export(safely_from_json) diff --git a/R/openai_files_api.R b/R/openai_files_api.R index 41dd2fd..06ff3bc 100644 --- a/R/openai_files_api.R +++ b/R/openai_files_api.R @@ -1,5 +1,15 @@ -oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { +#' List files available in the OpenAI Files API +#' +#' @param purpose The intended purpose of the uploaded file, one of "batch", "fine-tune", "assistants", "vision", "user_data", "evals" +#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' +#' @returns +#' +#' @export +#' @examples +oai_file_list <- function(purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), key_name = "OPENAI_API_KEY") { + purpose <- match.arg(purpose) api_key <- get_api_key(key_name) httr2::request("https://api.openai.com/v1/files") |> @@ -11,6 +21,15 @@ oai_file_list <- function(purpose = "batch", key_name = "OPENAI_API_KEY") { } +#' Delete a file from the OpenAI Files API +#' +#' @param file_id ID of the file given by OpenAI +#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' +#' @returns +#' +#' 
@export +#' @examples oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) @@ -23,6 +42,15 @@ oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { httr2::resp_body_json() } +#' Retrieve content from a file on the OpenAI Files API +#' +#' @param file_id ID of the file given by OpenAI +#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' +#' @returns +#' +#' @export +#' @examples oai_file_content <- function(file_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) diff --git a/man/oai_batch_build_completion_req.Rd b/man/oai_batch_build_completion_req.Rd new file mode 100644 index 0000000..c9c772c --- /dev/null +++ b/man/oai_batch_build_completion_req.Rd @@ -0,0 +1,24 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_build_completion_req} +\alias{oai_batch_build_completion_req} +\title{Title} +\usage{ +oai_batch_build_completion_req( + input, + id, + model = "gpt-4o-mini", + system_prompt = NULL, + temperature = 0, + max_tokens = 500L, + schema = NULL, + method = "POST", + endpoint = "/v1/chat/completions" +) +} +\arguments{ +\item{endpoint}{} +} +\description{ +Title +} diff --git a/man/oai_batch_create.Rd b/man/oai_batch_create.Rd new file mode 100644 index 0000000..c0ea276 --- /dev/null +++ b/man/oai_batch_create.Rd @@ -0,0 +1,38 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_create} +\alias{oai_batch_create} +\title{Trigger a batch job to run on an uploaded file} +\usage{ +oai_batch_create( + file_id, + endpoint = c("/v1/embeddings", "/v1/chat/completions"), + completion_window = "24h", + metadata = NULL, + key_name = "OPENAI_API_KEY" +) +} +\arguments{ +\item{file_id}{Pointer to a file uploaded to the OpenAI API} + +\item{endpoint}{The internal suffix of the endpoint's url e.g. /v1/embeddings} + +\item{completion_window}{Time until the batch should be returned, NOTE: OpenAI makes 24 hour guarantees only.} + +\item{metadata}{Any additional metadata you want to tag the batch with} + +\item{key_name}{Name of the API key, usually OPENAI_API_KEY} +} +\value{ +Metadata about an OpenAI Batch Job Including the batch ID +} +\description{ +Trigger a batch job to run on an uploaded file +} +\details{ +Once a file has been uploaded to the OpenAI Files API it's necessary to trigger the batch job. This will ensure that your file is processed, and processing is finalised within the 24 hour guarantee. + +It's important to choose the right endpoint. If processing should be done by the Completions API, be sure to route to v1/chat/completions, and this must match each row in your uploaded file. 
+ +Batch Job Ids start with "batch_", you'll receive a warning if you try to check batch status on a Files API file (the Files/Batch API set up is a lil bit clumsy for me) +} diff --git a/man/oai_batch_file_upload.Rd b/man/oai_batch_file_upload.Rd new file mode 100644 index 0000000..d6481dc --- /dev/null +++ b/man/oai_batch_file_upload.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_file_upload} +\alias{oai_batch_file_upload} +\title{Prepare and upload a file to be uploaded to the OpenAI Batch API} +\usage{ +oai_batch_file_upload( + jsonl_rows, + key_name = "OPENAI_API_KEY", + purpose = "batch" +) +} +\arguments{ +\item{jsonl_rows}{Rows of valid JSON, output of a oai_batch_prepare* function} + +\item{key_name}{Name of the API key, usually OPENAI_API_KEY} + +\item{purpose}{Tag, e.g. 'classification', 'batch', 'fine-tuning'} +} +\value{ +Metadata for an upload to the OpenAI Files API +} +\description{ +Prepare and upload a file to be uploaded to the OpenAI Batch API +} +\seealso{ +\code{openai_files_api.R} +} diff --git a/man/oai_batch_list.Rd b/man/oai_batch_list.Rd new file mode 100644 index 0000000..a75e16d --- /dev/null +++ b/man/oai_batch_list.Rd @@ -0,0 +1,14 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_list} +\alias{oai_batch_list} +\title{Title} +\usage{ +oai_batch_list(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") +} +\arguments{ +\item{key_name}{} +} +\description{ +Title +} diff --git a/man/oai_batch_parse_completions.Rd b/man/oai_batch_parse_completions.Rd new file mode 100644 index 0000000..f845482 --- /dev/null +++ b/man/oai_batch_parse_completions.Rd @@ -0,0 +1,14 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_parse_completions} +\alias{oai_batch_parse_completions} +\title{Title} +\usage{ +oai_batch_parse_completions(content, original_df = NULL, id_var = NULL) +} +\arguments{ +\item{id_var}{} +} +\description{ +Title +} diff --git a/man/oai_batch_parse_embeddings.Rd b/man/oai_batch_parse_embeddings.Rd new file mode 100644 index 0000000..80fb96b --- /dev/null +++ b/man/oai_batch_parse_embeddings.Rd @@ -0,0 +1,14 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_parse_embeddings} +\alias{oai_batch_parse_embeddings} +\title{Title} +\usage{ +oai_batch_parse_embeddings(content, original_df = NULL, id_var = NULL) +} +\arguments{ +\item{id_var}{} +} +\description{ +Title +} diff --git a/man/oai_batch_prepare_completions.Rd b/man/oai_batch_prepare_completions.Rd new file mode 100644 index 0000000..dd428c1 --- /dev/null +++ b/man/oai_batch_prepare_completions.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_prepare_completions} +\alias{oai_batch_prepare_completions} +\title{Title} +\usage{ +oai_batch_prepare_completions( + df, + text_var, + id_var, + model = "gpt-4o-mini", + system_prompt = NULL, + temperature = 0, + max_tokens = 500L, + schema = NULL, + method = "POST", + endpoint = "/v1/chat/completions" +) +} +\arguments{ +\item{endpoint}{} +} +\description{ +Title +} diff --git a/man/oai_batch_prepare_embeddings.Rd b/man/oai_batch_prepare_embeddings.Rd new file mode 100644 index 0000000..132e4ac --- /dev/null +++ b/man/oai_batch_prepare_embeddings.Rd @@ -0,0 +1,47 @@ +% Generated by 
roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_prepare_embeddings} +\alias{oai_batch_prepare_embeddings} +\title{Prepare a Data Frame for the OpenAI Batch API - Embeddings} +\usage{ +oai_batch_prepare_embeddings( + df, + text_var, + id_var, + model = "text-embedding-3-small", + dimensions = NULL, + method = "POST", + encoding_format = "float", + endpoint = "/v1/embeddings" +) +} +\arguments{ +\item{df}{A data frame containing texts to embed} + +\item{text_var}{Name of the column containing text to embed} + +\item{id_var}{Name of the column to use as ID} + +\item{model}{OpenAI embedding model to use (default: "text-embedding-3-small")} + +\item{dimensions}{Number of embedding dimensions (NULL uses model default)} + +\item{method}{The http request type, usually 'POST'} + +\item{encoding_format}{Data type of the embedding values} + +\item{endpoint}{The internal suffix of the endpoint's url e.g. /v1/embeddings} +} +\value{ +A list of JSON requests +} +\description{ +Prepare a Data Frame for the OpenAI Batch API - Embeddings +} +\details{ +Take an enitre data frame and turn each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger. + +Each request must have its own ID, as the Batch API makes no guarantees about the order the results will be returned in. + +To reduce the overall size, and the explanatory power of the Embeddings, you can set dimensions to lower than the default (which vary based on model). +} diff --git a/man/oai_batch_status.Rd b/man/oai_batch_status.Rd new file mode 100644 index 0000000..3490b9a --- /dev/null +++ b/man/oai_batch_status.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_status} +\alias{oai_batch_status} +\title{Check the status of a batch job on the OpenAI Batch API} +\usage{ +oai_batch_status(batch_id, key_name = "OPENAI_API_KEY") +} +\arguments{ +\item{batch_id}{Batch Identifier, should start with 'batch_' and is returned by the \code{oai_create_batch} function} + +\item{key_name}{Name of the API key, usually OPENAI_API_KEY} +} +\value{ +Metadata about an OpenAI Batch API Job, including status, error_file_id, output_file_id, input_file_id etc. 
+} +\description{ +Check the status of a batch job on the OpenAI Batch API +} diff --git a/man/oai_file_content.Rd b/man/oai_file_content.Rd new file mode 100644 index 0000000..8d7239e --- /dev/null +++ b/man/oai_file_content.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_files_api.R +\name{oai_file_content} +\alias{oai_file_content} +\title{Retrieve content from a file on the OpenAI Files API} +\usage{ +oai_file_content(file_id, key_name = "OPENAI_API_KEY") +} +\arguments{ +\item{file_id}{ID of the file given by OpenAI} + +\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +} +\description{ +Retrieve content from a file on the OpenAI Files API +} diff --git a/man/oai_file_delete.Rd b/man/oai_file_delete.Rd new file mode 100644 index 0000000..3e3234c --- /dev/null +++ b/man/oai_file_delete.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_files_api.R +\name{oai_file_delete} +\alias{oai_file_delete} +\title{Delete a file from the OpenAI Files API} +\usage{ +oai_file_delete(file_id, key_name = "OPENAI_API_KEY") +} +\arguments{ +\item{file_id}{ID of the file given by OpenAI} + +\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +} +\description{ +Delete a file from the OpenAI Files API +} diff --git a/man/oai_file_list.Rd b/man/oai_file_list.Rd new file mode 100644 index 0000000..56b5128 --- /dev/null +++ b/man/oai_file_list.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_files_api.R +\name{oai_file_list} +\alias{oai_file_list} +\title{List files available in the OpenAI Files API} +\usage{ +oai_file_list( + purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), + key_name = "OPENAI_API_KEY" +) +} +\arguments{ +\item{purpose}{The intended purpose of the uploaded file, one of "batch", "fine-tune", "assistants", "vision", "user_data", "evals"} + +\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +} +\description{ +List files available in the OpenAI Files API +} diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd index 40c25d6..ac8f191 100644 --- a/vignettes/sync_async.Rmd +++ b/vignettes/sync_async.Rmd @@ -14,6 +14,14 @@ knitr::opts_chunk$set( ) ``` +TODOS: + +- [ ] Quick start code when it's all ready +- [ ] OpenAI Batch API & Files API +- [ ] Batch API for embeddings +- [ ] Batch API for completions + + ```{r setup} library(EndpointR) ``` @@ -24,9 +32,12 @@ TODO: Code samples when the functions etc. are up and running. # Introduction -Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response. +Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response. + +For many use-cases the synchronous APIs work just fine. 
But often as data scientists we need to do the same thing to thousands or millions of rows of data. Hammering the provider's servers in such cases is inefficient for us and them. Plus, we probably don't want to sit around with a locked R session for 5 hours as our results get returned to us.
+
+Most Generative AI providers also offer lower-cost, asynchronous APIs. The providers usually offer a guarantee of the results within a time frame, and an estimate of the average time to return the results. For example, they may guarantee results within 24 hours, but expect them within 1-3 hours.
-However, most Generative AI providers also offer lower-cost, asynchronous APIs. The providers usually offer a guarantee of the results within a time frame, and an estimate of the average time to return the results. For example, they may guarantee results within 24 hours, but expect them within 1-3 hours.
+
 
 # When to choose Synchronous vs Asynchronous
 
@@ -50,11 +61,21 @@ Batch processing jobs are often helpful in use cases like:
 
 
 
+# Files API
+
+In order to use OpenAI's Batch API, we need to upload files to the Files API. Luckily this process is quite simple, but do keep in mind that to successfully run a batch job of embeddings, you will need to work with three APIs:
+
+- Embeddings API
+- Batch API
+- Files API
+
+Fortunately, the Batch and Files APIs dovetail quite well, and the same mental models will be useful for both.
+
 # EndpointR Implementation of OpenAI Batch API
 
 Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/server than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. As such, different functions and workflows are needed.
 
-You will likely want to use the Batch API for both embeddings and completions, so we have a separate function to prepare batches for each one:
+You will likely want to use the Batch API for both embeddings and completions at separate times and with distinct arguments, so we have a separate function to prepare batches for each one:
 
 - `oai_batch_prepare_embeddings()`
 - `oai_batch_prepare_completions()`
@@ -63,9 +84,9 @@ Each function expects a data frame as input: `oai_batch_prepare_embeddings()` wi
 
 > **NOTE:** For structured outputs the Batch API requires us to send the JSON schema with each request. Complex schemas will quickly lead to large file size, perhaps eclipsing the 200 MB limit.
 
-EndpointR prepares each batch, writes it to a file in temporary storage, and then sends the file to the OpenAI Files API. Once in the Files API, EndpointR can trigger the batch to run.
+EndpointR prepares each batch, writes it to a file in temporary storage, and then sends the file to the OpenAI Files API. Once in the Files API, it will receive a file ID and some other metadata. EndpointR can then pass the file ID to the Batch API and trigger a batch job to run. Once running, the batch job's status can be checked, and in the end we'll receive information on where to find the results in the **Files API**.
 
-Each line of of the .jsonl file should form a self-contained request. And rather than routing to the endpoint's URL we route to a stub. For reference, the entire batch gets sent to its own, full URL.
+Whether using the Batch API for embeddings or chat completions, each line of the .jsonl file must form a self-contained request with a unique identifier.
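+
+To see how the pieces fit together, here is a minimal sketch of the intended embeddings workflow, assuming the functions described above and that the batch completes within the completion window; the data frame and IDs are illustrative:
+
+```{r, eval=FALSE}
+df <- tibble::tibble(
+  id = c("doc_1", "doc_2"),
+  text = c("Hello world", "Goodbye world")
+)
+
+# turn each row into a line of JSON, then upload to the Files API
+jsonl_rows <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id)
+file_info <- oai_batch_file_upload(jsonl_rows)
+
+# trigger the batch job against the uploaded file
+batch_job <- oai_batch_create(file_info$id, endpoint = "/v1/embeddings")
+
+# check on the job - expect minutes to hours before completion
+status <- oai_batch_status(batch_job$id)
+
+# once complete, fetch the output file and parse it into a data frame
+if (status$status == "completed") {
+  content <- oai_file_content(status$output_file_id)
+  embeddings <- oai_batch_parse_embeddings(content, original_df = df, id_var = id)
+}
+```
+
+The completions workflow has the same shape: swap in `oai_batch_prepare_completions()` and route to `/v1/chat/completions`.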
Example for embeddings (no structured output!): @@ -90,4 +111,5 @@ Prettify'd version: } ``` -> **NOTE:** The Embeddings API expects the input in an 'input' field rather than 'messages' as in the Completions API, and the batch requests must adhere to this. +> **NOTE:** The Embeddings API expects the input in an 'input' field rather than 'messages' as in the Completions API, and batch requests must adhere to this. + From 09a7e50aae187c28955430ff95e0dfdc2d027c86 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 14:46:43 +0000 Subject: [PATCH 26/39] add batch and file functions to _pkgdown.yml in sections --- NAMESPACE | 3 +- R/openai_batch_api.R | 50 +++++++------------ _pkgdown.yml | 23 +++++++++ ....Rd => oai_batch_build_completions_req.Rd} | 6 +-- man/oai_batch_build_embed_req.Rd | 37 ++++++++++++++ man/oai_batch_cancel.Rd | 14 ++++++ man/oai_batch_parse_completions.Rd | 4 +- man/oai_batch_parse_embeddings.Rd | 4 +- 8 files changed, 100 insertions(+), 41 deletions(-) rename man/{oai_batch_build_completion_req.Rd => oai_batch_build_completions_req.Rd} (75%) create mode 100644 man/oai_batch_build_embed_req.Rd create mode 100644 man/oai_batch_cancel.Rd diff --git a/NAMESPACE b/NAMESPACE index 9f3f525..69a710f 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -22,8 +22,9 @@ export(hf_get_model_max_length) export(hf_perform_request) export(json_dump) export(json_schema) -export(oai_batch_build_completion_req) +export(oai_batch_build_completions_req) export(oai_batch_build_embed_req) +export(oai_batch_cancel) export(oai_batch_create) export(oai_batch_file_upload) export(oai_batch_list) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index 0706201..5d3a276 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -1,5 +1,5 @@ # embed request building ---- -#' @description Create a single OpenAI Batch API - Embedding request +#' Create a single OpenAI Batch API - Embedding request #' #' This function prepares a single row of data for the OpenAI Batch/Files APIs, where each row should be valid JSON. The APIs do not guarantee the results will be in the same order, so we need to provide an ID with each request. 
#' @@ -89,7 +89,7 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb return(reqs) } -# completions request building ---- + #' Title #' #' @param input @@ -106,16 +106,7 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb #' #' @export #' @examples -oai_batch_build_completion_req <- function( - input, - id, - model = "gpt-4o-mini", - system_prompt = NULL, - temperature = 0, - max_tokens = 500L, - schema = NULL, - method = "POST", - endpoint = "/v1/chat/completions") { +oai_batch_build_completions_req <- function(input, id, model = "gpt-4o-mini", system_prompt = NULL, temperature = 0, max_tokens = 500L, schema = NULL, method = "POST", endpoint = "/v1/chat/completions") { messages <- list() @@ -167,17 +158,7 @@ oai_batch_build_completion_req <- function( #' #' @export #' @examples - oai_batch_prepare_completions <- function( - df, - text_var, - id_var, - model = "gpt-4o-mini", - system_prompt = NULL, - temperature = 0, - max_tokens = 500L, - schema = NULL, - method = "POST", - endpoint = "/v1/chat/completions") { + oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o-mini", system_prompt = NULL, temperature = 0, max_tokens = 500L, schema = NULL, method = "POST", endpoint = "/v1/chat/completions") { text_sym <- rlang::ensym(text_var) id_sym <- rlang::ensym(id_var) @@ -274,11 +255,7 @@ if (httr2::resp_status(resp) >= 400) { #' #' @export #' @examples -oai_batch_create <- function(file_id, - endpoint = c("/v1/embeddings", "/v1/chat/completions"), - completion_window = "24h", - metadata = NULL, - key_name = "OPENAI_API_KEY") { +oai_batch_create <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/completions"), completion_window = "24h", metadata = NULL, key_name = "OPENAI_API_KEY") { endpoint <- match.arg(endpoint) api_key <- get_api_key(key_name) @@ -351,6 +328,15 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY httr2::resp_body_json() } +#' Cancel a running batch job on the OpenAI Batch API +#' +#' @param batch_id +#' @param key_name +#' +#' @returns +#' +#' @export +#' @examples oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) @@ -365,7 +351,7 @@ oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") { # results parsing ---- -#' Title +#' Parse an embeddings batch job into a data frame #' #' @param content #' @param original_df @@ -446,7 +432,7 @@ oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NUL return(result) } -#' Title +#' Parse a completions batch job into a data frame #' #' @param content #' @param original_df @@ -522,9 +508,7 @@ oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NU # internal/helpers ---- #' @keywords internal -.validate_batch_inputs <- function(.ids, - .texts, - max_requests = 50000) { +.validate_batch_inputs <- function(.ids, .texts, max_requests = 50000) { n_requests <- length(.texts) if (n_requests == 0) { diff --git a/_pkgdown.yml b/_pkgdown.yml index 551ee6f..65fc5fa 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -126,6 +126,29 @@ reference: - json_dump - validate_response +- title: "OpenAI Files API" + desc: "Functions for uploading and managing files on OpenAI's Files API" + contents: + - oai_file_list + - oai_file_delete + - oai_file_content + +- title: "OpenAI Batch API" + desc: "Functions for managing batches on OpenAI's Batch API" + contents: + - oai_batch_file_upload + - oai_batch_create + - 
oai_batch_status + - oai_batch_list + - oai_batch_cancel + - oai_batch_build_embed_req + - oai_batch_prepare_embeddings + - oai_batch_parse_embeddings + - oai_batch_build_completions_req + - oai_batch_prepare_completions + - oai_batch_parse_completions + + - title: "Schema Builders" desc: "Helper functions for creating different types of JSON schema properties" contents: diff --git a/man/oai_batch_build_completion_req.Rd b/man/oai_batch_build_completions_req.Rd similarity index 75% rename from man/oai_batch_build_completion_req.Rd rename to man/oai_batch_build_completions_req.Rd index c9c772c..1c3a40a 100644 --- a/man/oai_batch_build_completion_req.Rd +++ b/man/oai_batch_build_completions_req.Rd @@ -1,10 +1,10 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/openai_batch_api.R -\name{oai_batch_build_completion_req} -\alias{oai_batch_build_completion_req} +\name{oai_batch_build_completions_req} +\alias{oai_batch_build_completions_req} \title{Title} \usage{ -oai_batch_build_completion_req( +oai_batch_build_completions_req( input, id, model = "gpt-4o-mini", diff --git a/man/oai_batch_build_embed_req.Rd b/man/oai_batch_build_embed_req.Rd new file mode 100644 index 0000000..6d34d6f --- /dev/null +++ b/man/oai_batch_build_embed_req.Rd @@ -0,0 +1,37 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_build_embed_req} +\alias{oai_batch_build_embed_req} +\title{Create a single OpenAI Batch API - Embedding request} +\usage{ +oai_batch_build_embed_req( + input, + id, + model = "text-embedding-3-small", + dimensions = NULL, + method = "POST", + encoding_format = "float", + endpoint = "/v1/embeddings" +) +} +\arguments{ +\item{input}{Text input you wish to embed} + +\item{id}{A custom, unique Row ID} + +\item{model}{The embedding model to use} + +\item{dimensions}{Number of embedding dimensions to return} + +\item{method}{The http request type, usually 'POST'} + +\item{encoding_format}{Data type of the embedding values} + +\item{endpoint}{The internal suffix of the endpoint's url e.g. /v1/embeddings} +} +\value{ +a row of JSON +} +\description{ +This function prepares a single row of data for the OpenAI Batch/Files APIs, where each row should be valid JSON. The APIs do not guarantee the results will be in the same order, so we need to provide an ID with each request. 
+} diff --git a/man/oai_batch_cancel.Rd b/man/oai_batch_cancel.Rd new file mode 100644 index 0000000..a4b7da1 --- /dev/null +++ b/man/oai_batch_cancel.Rd @@ -0,0 +1,14 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_batch_api.R +\name{oai_batch_cancel} +\alias{oai_batch_cancel} +\title{Cancel a running batch job on the OpenAI Batch API} +\usage{ +oai_batch_cancel(batch_id, key_name = "OPENAI_API_KEY") +} +\arguments{ +\item{key_name}{} +} +\description{ +Cancel a running batch job on the OpenAI Batch API +} diff --git a/man/oai_batch_parse_completions.Rd b/man/oai_batch_parse_completions.Rd index f845482..a2ba7d8 100644 --- a/man/oai_batch_parse_completions.Rd +++ b/man/oai_batch_parse_completions.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_parse_completions} \alias{oai_batch_parse_completions} -\title{Title} +\title{Parse a completions batch job into a data frame} \usage{ oai_batch_parse_completions(content, original_df = NULL, id_var = NULL) } @@ -10,5 +10,5 @@ oai_batch_parse_completions(content, original_df = NULL, id_var = NULL) \item{id_var}{} } \description{ -Title +Parse a completions batch job into a data frame } diff --git a/man/oai_batch_parse_embeddings.Rd b/man/oai_batch_parse_embeddings.Rd index 80fb96b..2d55a65 100644 --- a/man/oai_batch_parse_embeddings.Rd +++ b/man/oai_batch_parse_embeddings.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_parse_embeddings} \alias{oai_batch_parse_embeddings} -\title{Title} +\title{Parse an embeddings batch job into a data frame} \usage{ oai_batch_parse_embeddings(content, original_df = NULL, id_var = NULL) } @@ -10,5 +10,5 @@ oai_batch_parse_embeddings(content, original_df = NULL, id_var = NULL) \item{id_var}{} } \description{ -Title +Parse an embeddings batch job into a data frame } From 7680953ec2c7d84452feeea0d439c7fe4aee4a55 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 15:12:28 +0000 Subject: [PATCH 27/39] fix broken link in version releases for news/pkgdown.yml --- NEWS.md | 2 +- _pkgdown.yml | 8 ++++++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/NEWS.md b/NEWS.md index 6944178..999bb8f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,7 +2,7 @@ - OpenAI Batch API for Embeddings and Completions -# EndpointR 0.2 +# EndpointR 0.2.0 - error message and status propagation improvement. Now writes .error, .error_msg (standardised across package), and .status. 
Main change is preventing httr2 eating the errors before we can deal with them
- adds parquet writing to oai_complete_df and oai_embed_df

diff --git a/_pkgdown.yml b/_pkgdown.yml
index 65fc5fa..9add69b 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -184,6 +184,8 @@ reference:
 authors:
   Jack Penzer:
     href: https://github.com/jpcompartir
+  Claude:
+    href: https://claude.ai
 
 repo:
   url:
@@ -195,8 +197,10 @@ development:
 
 news:
   releases:
-  - text: "Version 0.2"
-    href: nrews/index.html#endpointr-012
+  - text: "Version 0.2.1"
+    href: news/index.html#endpointr-021
+  - text: "Version 0.2.0"
+    href: news/index.html#endpointr-020
   - text: "Version 0.1.2"
     href: news/index.html#endpointr-012
   - text: "Version 0.1.1"

From 69bd2adf24b62cb641089547cfde2281125d731c Mon Sep 17 00:00:00 2001
From: jpcompartir
Date: Tue, 13 Jan 2026 15:51:45 +0000
Subject: [PATCH 28/39] standardise param names and descriptions in
 openai_batch_api.R

lean on @inheritParams for brevity/reduce duplication
---
 R/openai_batch_api.R                   | 253 +++++++++++++++++--------
 man/oai_batch_build_completions_req.Rd |  37 +++-
 man/oai_batch_build_embed_req.Rd       |  17 +-
 man/oai_batch_cancel.Rd                |  19 +-
 man/oai_batch_create.Rd                |  20 +-
 man/oai_batch_file_upload.Rd           |  19 +-
 man/oai_batch_list.Rd                  |  22 ++-
 man/oai_batch_parse_completions.Rd     |  28 ++-
 man/oai_batch_parse_embeddings.Rd      |  29 ++-
 man/oai_batch_prepare_completions.Rd   |  47 ++++-
 man/oai_batch_prepare_embeddings.Rd    |  23 ++-
 man/oai_batch_status.Rd                |  15 +-
 12 files changed, 409 insertions(+), 120 deletions(-)

diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R
index 5d3a276..fd27502 100644
--- a/R/openai_batch_api.R
+++ b/R/openai_batch_api.R
@@ -3,18 +3,23 @@
 #'
 #' This function prepares a single row of data for the OpenAI Batch/Files APIs, where each row should be valid JSON. The APIs do not guarantee the results will be in the same order, so we need to provide an ID with each request.
 #'
-#' @param input Text input you wish to embed
-#' @param id A custom, unique Row ID
+#' @param input Text input to embed
+#' @param id A custom, unique row ID
 #' @param model The embedding model to use
-#' @param dimensions Number of embedding dimensions to return
-#' @param method The http request type, usually 'POST'
+#' @param dimensions Number of embedding dimensions (NULL uses model default)
+#' @param method The HTTP request type, usually 'POST'
 #' @param encoding_format Data type of the embedding values
-#' @param endpoint The internal suffix of the endpoint's url e.g. /v1/embeddings
+#' @param endpoint The API endpoint path, e.g. /v1/embeddings
 #'
 #' @returns a row of JSON
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' text <- "embed_me"
+#' id <- "id_1"
+#' batch_req <- oai_batch_build_embed_req(text, id)
+#' }
 oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") {
 
   body <- purrr::compact(
@@ -41,25 +46,28 @@ oai_batch_build_embed_req <- function(input, id, model = "text-embedding-3-small
 
 #' Prepare a Data Frame for the OpenAI Batch API - Embeddings
 #'
-#' @details Take an enitre data frame and turn each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger.
+#' @details Takes an entire data frame and turns each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger.
#' #' Each request must have its own ID, as the Batch API makes no guarantees about the order the results will be returned in. #' #' To reduce the overall size, and the explanatory power of the Embeddings, you can set dimensions to lower than the default (which vary based on model). #' -#' @param df A data frame containing texts to embed -#' @param text_var Name of the column containing text to embed -#' @param id_var Name of the column to use as ID -#' @param model OpenAI embedding model to use (default: "text-embedding-3-small") -#' @param dimensions Number of embedding dimensions (NULL uses model default) -#' @param method The http request type, usually 'POST' -#' @param encoding_format Data type of the embedding values -#' @param endpoint The internal suffix of the endpoint's url e.g. /v1/embeddings +#' @param df A data frame containing text to process +#' @param text_var Name of the column containing input text +#' @param id_var Name of the column to use as row ID +#' @inheritParams oai_batch_build_embed_req #' #' @returns A list of JSON requests #' #' @export #' @examples +#' \dontrun{ +#' df <- data.frame( +#' id = c("doc_1", "doc_2", "doc_3"), +#' text = c("Hello world", "Embedding text", "Another document") +#' ) +#' jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id) +#' } oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-embedding-3-small", dimensions = NULL, method = "POST", encoding_format = "float", endpoint = "/v1/embeddings") { text_sym <- rlang::ensym(text_var) @@ -90,22 +98,34 @@ oai_batch_prepare_embeddings <- function(df, text_var, id_var, model = "text-emb } -#' Title +#' Create a Single OpenAI Batch API - Chat Completions Request #' -#' @param input -#' @param id -#' @param model -#' @param system_prompt -#' @param temperature -#' @param max_tokens -#' @param schema -#' @param method -#' @param endpoint +#' This function prepares a single row of data for the OpenAI Batch/Files APIs, +#' where each row should be valid JSON. The APIs do not guarantee the results +#' will be in the same order, so we need to provide an ID with each request. #' -#' @returns +#' @param input Text input (user message) for the completion +#' @param id A custom, unique row ID +#' @param model The chat completion model to use +#' @param system_prompt Optional system prompt to guide the model's behaviour +#' @param temperature Sampling temperature (0 = deterministic, higher = more random) +#' @param max_tokens Maximum number of tokens to generate +#' @param schema Optional JSON schema for structured output (json_schema object or list) +#' @param method The HTTP request type, usually 'POST' +#' @param endpoint The API endpoint path, e.g. 
/v1/chat/completions +#' +#' @returns A row of JSON suitable for the Batch API #' #' @export #' @examples +#' \dontrun{ +#' req <- oai_batch_build_completions_req( +#' input = "What is the capital of France?", +#' id = "query_1", +#' model = "gpt-4o-mini", +#' temperature = 0 +#' ) +#' } oai_batch_build_completions_req <- function(input, id, model = "gpt-4o-mini", system_prompt = NULL, temperature = 0, max_tokens = 500L, schema = NULL, method = "POST", endpoint = "/v1/chat/completions") { messages <- list() @@ -140,25 +160,38 @@ oai_batch_build_completions_req <- function(input, id, model = "gpt-4o-mini", sy jsonlite::toJSON(req_row, auto_unbox = TRUE) } - - #' Title - #' - #' @param df - #' @param text_var - #' @param id_var - #' @param model - #' @param system_prompt - #' @param temperature - #' @param max_tokens - #' @param schema - #' @param method - #' @param endpoint - #' - #' @returns - #' - #' @export - #' @examples - oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o-mini", system_prompt = NULL, temperature = 0, max_tokens = 500L, schema = NULL, method = "POST", endpoint = "/v1/chat/completions") { + +#' Prepare a Data Frame for the OpenAI Batch API - Chat Completions +#' +#' @description Takes an entire data frame and turns each row into a valid line +#' of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API +#' job trigger. +#' +#' @details Each request must have its own ID, as the Batch API makes no +#' guarantees about the order the results will be returned in. +#' +#' @param df A data frame containing text to process +#' @param text_var Name of the column containing input text +#' @param id_var Name of the column to use as row ID +#' @inheritParams oai_batch_build_completions_req +#' +#' @returns A character string of newline-separated JSON requests +#' +#' @export +#' @examples +#' \dontrun{ +#' df <- data.frame( +#' id = c("q1", "q2"), +#' prompt = c("What is 2+2?", "Explain gravity briefly.") +#' ) +#' jsonl_content <- oai_batch_prepare_completions( +#' df, +#' text_var = prompt, +#' id_var = id, +#' system_prompt = "You are a helpful assistant." +#' ) +#' } +oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o-mini", system_prompt = NULL, temperature = 0, max_tokens = 500L, schema = NULL, method = "POST", endpoint = "/v1/chat/completions") { text_sym <- rlang::ensym(text_var) id_sym <- rlang::ensym(id_var) @@ -197,15 +230,24 @@ oai_batch_build_completions_req <- function(input, id, model = "gpt-4o-mini", sy #' #' #' -#' @param jsonl_rows Rows of valid JSON, output of a oai_batch_prepare* function -#' @param key_name Name of the API key, usually OPENAI_API_KEY -#' @param purpose Tag, e.g. 'classification', 'batch', 'fine-tuning' +#' @param jsonl_rows Rows of valid JSON, output of an oai_batch_prepare* function +#' @param key_name Name of the environment variable containing your API key +#' @param purpose File purpose tag, e.g. 
'batch', 'fine-tune'
 #'
 #' @returns Metadata for an upload to the OpenAI Files API
 #'
 #' @export
-#' @seealso `openai_files_api.R`
+#' @seealso `oai_file_list()`, `oai_file_content()`
 #' @examples
+#' \dontrun{
+#' df <- data.frame(
+#'   id = c("doc_1", "doc_2"),
+#'   text = c("Hello world", "Goodbye world")
+#' )
+#' jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id)
+#' file_info <- oai_batch_file_upload(jsonl_content)
+#' file_info$id # Use this ID to create a batch job
+#' }
 oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") {
 
   api_key <- get_api_key(key_name)
@@ -245,16 +287,24 @@ if (httr2::resp_status(resp) >= 400) {
 #'
 #' Batch Job Ids start with "batch_", you'll receive a warning if you try to check batch status on a Files API file (the Files/Batch API set up is a lil bit clumsy for me)
 #'
-#' @param file_id Pointer to a file uploaded to the OpenAI API
-#' @param endpoint The internal suffix of the endpoint's url e.g. /v1/embeddings
-#' @param completion_window Time until the batch should be returned, NOTE: OpenAI makes 24 hour guarantees only.
-#' @param metadata Any additional metadata you want to tag the batch with
-#' @param key_name Name of the API key, usually OPENAI_API_KEY
+#' @param file_id File ID returned by oai_batch_file_upload()
+#' @param endpoint The API endpoint path, e.g. /v1/embeddings
+#' @param completion_window Time window for batch completion (OpenAI guarantees 24h only)
+#' @param metadata Optional list of metadata to tag the batch with
+#' @inheritParams oai_batch_file_upload
 #'
 #' @returns Metadata about an OpenAI Batch Job Including the batch ID
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' # After uploading a file with oai_batch_file_upload()
+#' batch_job <- oai_batch_create(
+#'   file_id = "file-abc123",
+#'   endpoint = "/v1/embeddings"
+#' )
+#' batch_job$id # Use this to check status later
+#' }
 oai_batch_create <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/completions"), completion_window = "24h", metadata = NULL, key_name = "OPENAI_API_KEY") {
 
   endpoint <- match.arg(endpoint)
@@ -278,17 +328,20 @@ oai_batch_create <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/c
   httr2::resp_body_json()
 }
 
-#' Check the status of a batch job on the OpenAI Batch API
-#'
-#'
+#' Check the Status of a Batch Job on the OpenAI Batch API
 #'
-#' @param batch_id Batch Identifier, should start with 'batch_' and is returned by the `oai_create_batch` function
-#' @param key_name Name of the API key, usually OPENAI_API_KEY
+#' @param batch_id Batch identifier (starts with 'batch_'), returned by oai_batch_create()
+#' @inheritParams oai_batch_file_upload
 #'
 #' @returns Metadata about an OpenAI Batch API Job, including status, error_file_id, output_file_id, input_file_id etc.
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' status <- oai_batch_status("batch_abc123")
+#' status$status # e.g., "completed", "in_progress", "failed"
+#' status$output_file_id # File ID for results when completed
+#' }
 oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") {
 
   api_key <- get_api_key(key_name)
@@ -300,16 +353,25 @@ oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") {
   httr2::resp_body_json()
 }
 
-#' Title
+#' List Batch Jobs on the OpenAI Batch API
+#'
+#' Retrieve a paginated list of batch jobs associated with your API key.
#'
-#' @param limit
-#' @param after
-#' @param key_name
+#' @param limit Maximum number of batch jobs to return
+#' @param after Cursor for pagination; batch ID to start after
+#' @inheritParams oai_batch_file_upload
 #'
-#' @returns
+#' @returns A list containing batch job metadata and pagination information
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' # List recent batch jobs
+#' batches <- oai_batch_list(limit = 10)
+#'
+#' # Paginate through results
+#' next_page <- oai_batch_list(after = batches$last_id)
+#' }
 oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") {
 
   api_key <- get_api_key(key_name)
@@ -328,15 +390,23 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY
   httr2::resp_body_json()
 }
 
-#' Cancel a running batch job on the OpenAI Batch API
+#' Cancel a Running Batch Job on the OpenAI Batch API
 #'
-#' @param batch_id
-#' @param key_name
+#' Cancels an in-progress batch job. The batch will stop processing new
+#' requests, but requests already being processed may still complete.
 #'
-#' @returns
+#' @inheritParams oai_batch_status
+#' @inheritParams oai_batch_file_upload
+#'
+#' @returns Metadata about the cancelled batch job
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' # Cancel a batch job that's taking too long
+#' cancelled <- oai_batch_cancel("batch_abc123")
+#' cancelled$status # Will be "cancelling" or "cancelled"
+#' }
 oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") {
 
   api_key <- get_api_key(key_name)
@@ -351,16 +421,32 @@ oai_batch_cancel <- function(batch_id, key_name = "OPENAI_API_KEY") {
 
 # results parsing ----
-#' Parse an embeddings batch job into a data frame
+#' Parse an Embeddings Batch Job into a Data Frame
+#'
+#' Parses the JSONL content returned from a completed embeddings batch job
+#' and converts it into a tidy data frame with one row per embedding.
 #'
-#' @param content
-#' @param original_df
-#' @param id_var
+#' @param content Character string of JSONL content from the batch output file
+#' @param original_df Optional original data frame to rename custom_id column
+#' @param id_var If original_df provided, the column name to rename custom_id to
 #'
-#' @returns
+#' @returns A tibble with custom_id (or renamed), .error, .error_msg, and
+#'   embedding dimensions (V1, V2, ..., Vn)
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' # After downloading batch results with oai_file_content()
+#' content <- oai_file_content(status$output_file_id)
+#' embeddings_df <- oai_batch_parse_embeddings(content)
+#'
+#' # Optionally rename the ID column to match original data
+#' embeddings_df <- oai_batch_parse_embeddings(
+#'   content,
+#'   original_df = my_df,
+#'   id_var = doc_id
+#' )
+#' }
 oai_batch_parse_embeddings <- function(content, original_df = NULL, id_var = NULL) {
 
   lines <- strsplit(content, "\n")[[1]]
@@ -432,7 +518,29 @@ oai_batch_parse_embeddings <- function(content, original_df = NUL
   return(result)
 }
 
-#' Parse a completions batch job into a data frame
+#' Parse a Completions Batch Job into a Data Frame
+#'
+#' Parses the JSONL content returned from a completed chat completions batch
+#' job and converts it into a tidy data frame with one row per response.
#'
-#' @param content
-#' @param original_df
-#' @param id_var
+#' @inheritParams oai_batch_parse_embeddings
 #'
-#' @returns
+#' @returns A tibble with custom_id (or renamed), content, .error, and .error_msg
 #'
 #' @export
 #' @examples
+#' \dontrun{
+#' # After downloading batch results with oai_file_content()
+#' content <- oai_file_content(status$output_file_id)
+#' completions_df <- oai_batch_parse_completions(content)
+#'
+#' # Optionally rename the ID column to match original data
+#' completions_df <- oai_batch_parse_completions(
+#'   content,
+#'   original_df = my_df,
+#'   id_var = query_id
+#' )
+#' }
 oai_batch_parse_completions <- function(content, original_df = NULL, id_var = NULL) {
 
   lines <- strsplit(content, "\n")[[1]]
diff --git a/man/oai_batch_build_completions_req.Rd b/man/oai_batch_build_completions_req.Rd
index 1c3a40a..15bb799 100644
--- a/man/oai_batch_build_completions_req.Rd
+++ b/man/oai_batch_build_completions_req.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/openai_batch_api.R
 \name{oai_batch_build_completions_req}
 \alias{oai_batch_build_completions_req}
-\title{Title}
+\title{Create a Single OpenAI Batch API - Chat Completions Request}
 \usage{
 oai_batch_build_completions_req(
   input,
   id,
   model = "gpt-4o-mini",
   system_prompt = NULL,
   temperature = 0,
   max_tokens = 500L,
   schema = NULL,
   method = "POST",
   endpoint = "/v1/chat/completions"
 )
 }
 \arguments{
-\item{endpoint}{}
+\item{input}{Text input (user message) for the completion}
+
+\item{id}{A custom, unique row ID}
+
+\item{model}{The chat completion model to use}
+
+\item{system_prompt}{Optional system prompt to guide the model's behaviour}
+
+\item{temperature}{Sampling temperature (0 = deterministic, higher = more random)}
+
+\item{max_tokens}{Maximum number of tokens to generate}
+
+\item{schema}{Optional JSON schema for structured output (json_schema object or list)}
+
+\item{method}{The HTTP request type, usually 'POST'}
+
+\item{endpoint}{The API endpoint path, e.g. /v1/chat/completions}
+}
+\value{
+A row of JSON suitable for the Batch API
 }
 \description{
-Title
+This function prepares a single row of data for the OpenAI Batch/Files APIs,
+where each row should be valid JSON. The APIs do not guarantee the results
+will be in the same order, so we need to provide an ID with each request.
} +\examples{ +\dontrun{ +text <- "embed_me" +id <- "id_1" +batch_req <- oai_batch_build_embed_req(text, id) +} +} diff --git a/man/oai_batch_cancel.Rd b/man/oai_batch_cancel.Rd index a4b7da1..20b0a7b 100644 --- a/man/oai_batch_cancel.Rd +++ b/man/oai_batch_cancel.Rd @@ -2,13 +2,26 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_cancel} \alias{oai_batch_cancel} -\title{Cancel a running batch job on the OpenAI Batch API} +\title{Cancel a Running Batch Job on the OpenAI Batch API} \usage{ oai_batch_cancel(batch_id, key_name = "OPENAI_API_KEY") } \arguments{ -\item{key_name}{} +\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_create()} + +\item{key_name}{Name of the environment variable containing your API key} +} +\value{ +Metadata about the cancelled batch job } \description{ -Cancel a running batch job on the OpenAI Batch API +Cancels an in-progress batch job. The batch will stop processing new +requests, but requests already being processed may still complete. +} +\examples{ +\dontrun{ +# Cancel a batch job that's taking too long +cancelled <- oai_batch_cancel("batch_abc123") +cancelled$status # Will be "cancelling" or "cancelled" +} } diff --git a/man/oai_batch_create.Rd b/man/oai_batch_create.Rd index c0ea276..3061b3b 100644 --- a/man/oai_batch_create.Rd +++ b/man/oai_batch_create.Rd @@ -13,15 +13,15 @@ oai_batch_create( ) } \arguments{ -\item{file_id}{Pointer to a file uploaded to the OpenAI API} +\item{file_id}{File ID returned by oai_batch_file_upload()} -\item{endpoint}{The internal suffix of the endpoint's url e.g. /v1/embeddings} +\item{endpoint}{The API endpoint path, e.g. /v1/embeddings} -\item{completion_window}{Time until the batch should be returned, NOTE: OpenAI makes 24 hour guarantees only.} +\item{completion_window}{Time window for batch completion (OpenAI guarantees 24h only)} -\item{metadata}{Any additional metadata you want to tag the batch with} +\item{metadata}{Optional list of metadata to tag the batch with} -\item{key_name}{Name of the API key, usually OPENAI_API_KEY} +\item{key_name}{Name of the environment variable containing your API key} } \value{ Metadata about an OpenAI Batch Job Including the batch ID @@ -36,3 +36,13 @@ It's important to choose the right endpoint. If processing should be done by the Batch Job Ids start with "batch_", you'll receive a warning if you try to check batch status on a Files API file (the Files/Batch API set up is a lil bit clumsy for me) } +\examples{ +\dontrun{ +# After uploading a file with oai_batch_file_upload() +batch_job <- oai_batch_create( + file_id = "file-abc123", + endpoint = "/v1/embeddings" +) +batch_job$id # Use this to check status later +} +} diff --git a/man/oai_batch_file_upload.Rd b/man/oai_batch_file_upload.Rd index d6481dc..221c0f6 100644 --- a/man/oai_batch_file_upload.Rd +++ b/man/oai_batch_file_upload.Rd @@ -11,11 +11,11 @@ oai_batch_file_upload( ) } \arguments{ -\item{jsonl_rows}{Rows of valid JSON, output of a oai_batch_prepare* function} +\item{jsonl_rows}{Rows of valid JSON, output of an oai_batch_prepare* function} -\item{key_name}{Name of the API key, usually OPENAI_API_KEY} +\item{key_name}{Name of the environment variable containing your API key} -\item{purpose}{Tag, e.g. 'classification', 'batch', 'fine-tuning'} +\item{purpose}{File purpose tag, e.g. 
'batch', 'fine-tune'} } \value{ Metadata for an upload to the OpenAI Files API @@ -23,6 +23,17 @@ Metadata for an upload to the OpenAI Files API \description{ Prepare and upload a file to be uploaded to the OpenAI Batch API } +\examples{ +\dontrun{ +df <- data.frame( + id = c("doc_1", "doc_2"), + text = c("Hello world", "Goodbye world") +) +jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id) +file_info <- oai_batch_file_upload(jsonl_content) +file_info$id # Use this ID to create a batch job +} +} \seealso{ -\code{openai_files_api.R} +\code{oai_files_upload()}, \code{oai_files_list()} } diff --git a/man/oai_batch_list.Rd b/man/oai_batch_list.Rd index a75e16d..38f87e7 100644 --- a/man/oai_batch_list.Rd +++ b/man/oai_batch_list.Rd @@ -2,13 +2,29 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_list} \alias{oai_batch_list} -\title{Title} +\title{List Batch Jobs on the OpenAI Batch API} \usage{ oai_batch_list(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY") } \arguments{ -\item{key_name}{} +\item{limit}{Maximum number of batch jobs to return} + +\item{after}{Cursor for pagination; batch ID to start after} + +\item{key_name}{Name of the environment variable containing your API key} +} +\value{ +A list containing batch job metadata and pagination information } \description{ -Title +Retrieve a paginated list of batch jobs associated with your API key. +} +\examples{ +\dontrun{ +# List recent batch jobs +batches <- oai_batch_list(limit = 10) + +# Paginate through results +next_page <- oai_batch_list(after = batches$last_id) +} } diff --git a/man/oai_batch_parse_completions.Rd b/man/oai_batch_parse_completions.Rd index a2ba7d8..e4aa101 100644 --- a/man/oai_batch_parse_completions.Rd +++ b/man/oai_batch_parse_completions.Rd @@ -2,13 +2,35 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_parse_completions} \alias{oai_batch_parse_completions} -\title{Parse a completions batch job into a data frame} +\title{Parse a Completions Batch Job into a Data Frame} \usage{ oai_batch_parse_completions(content, original_df = NULL, id_var = NULL) } \arguments{ -\item{id_var}{} +\item{content}{Character string of JSONL content from the batch output file} + +\item{original_df}{Optional original data frame to rename custom_id column} + +\item{id_var}{If original_df provided, the column name to rename custom_id to} +} +\value{ +A tibble with custom_id (or renamed), content, .error, and .error_msg } \description{ -Parse a completions batch job into a data frame +Parses the JSONL content returned from a completed chat completions batch +job and converts it into a tidy data frame with one row per response. 
+} +\examples{ +\dontrun{ +# After downloading batch results with oai_files_content() +content <- oai_files_content(status$output_file_id) +completions_df <- oai_batch_parse_completions(content) + +# Optionally rename the ID column to match original data +completions_df <- oai_batch_parse_completions( + content, + original_df = my_df, + id_var = query_id +) +} } diff --git a/man/oai_batch_parse_embeddings.Rd b/man/oai_batch_parse_embeddings.Rd index 2d55a65..a977c88 100644 --- a/man/oai_batch_parse_embeddings.Rd +++ b/man/oai_batch_parse_embeddings.Rd @@ -2,13 +2,36 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_parse_embeddings} \alias{oai_batch_parse_embeddings} -\title{Parse an embeddings batch job into a data frame} +\title{Parse an Embeddings Batch Job into a Data Frame} \usage{ oai_batch_parse_embeddings(content, original_df = NULL, id_var = NULL) } \arguments{ -\item{id_var}{} +\item{content}{Character string of JSONL content from the batch output file} + +\item{original_df}{Optional original data frame to rename custom_id column} + +\item{id_var}{If original_df provided, the column name to rename custom_id to} +} +\value{ +A tibble with custom_id (or renamed), .error, .error_msg, and +embedding dimensions (V1, V2, ..., Vn) } \description{ -Parse an embeddings batch job into a data frame +Parses the JSONL content returned from a completed embeddings batch job +and converts it into a tidy data frame with one row per embedding. +} +\examples{ +\dontrun{ +# After downloading batch results with oai_files_content() +content <- oai_files_content(status$output_file_id) +embeddings_df <- oai_batch_parse_embeddings(content) + +# Optionally rename the ID column to match original data +embeddings_df <- oai_batch_parse_embeddings( + content, + original_df = my_df, + id_var = doc_id +) +} } diff --git a/man/oai_batch_prepare_completions.Rd b/man/oai_batch_prepare_completions.Rd index dd428c1..98bc0fd 100644 --- a/man/oai_batch_prepare_completions.Rd +++ b/man/oai_batch_prepare_completions.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_prepare_completions} \alias{oai_batch_prepare_completions} -\title{Title} +\title{Prepare a Data Frame for the OpenAI Batch API - Chat Completions} \usage{ oai_batch_prepare_completions( df, @@ -18,8 +18,49 @@ oai_batch_prepare_completions( ) } \arguments{ -\item{endpoint}{} +\item{df}{A data frame containing text to process} + +\item{text_var}{Name of the column containing input text} + +\item{id_var}{Name of the column to use as row ID} + +\item{model}{The chat completion model to use} + +\item{system_prompt}{Optional system prompt to guide the model's behaviour} + +\item{temperature}{Sampling temperature (0 = deterministic, higher = more random)} + +\item{max_tokens}{Maximum number of tokens to generate} + +\item{schema}{Optional JSON schema for structured output (json_schema object or list)} + +\item{method}{The HTTP request type, usually 'POST'} + +\item{endpoint}{The API endpoint path, e.g. /v1/chat/completions} +} +\value{ +A character string of newline-separated JSON requests } \description{ -Title +Takes an entire data frame and turns each row into a valid line +of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API +job trigger. +} +\details{ +Each request must have its own ID, as the Batch API makes no +guarantees about the order the results will be returned in. 
+} +\examples{ +\dontrun{ +df <- data.frame( + id = c("q1", "q2"), + prompt = c("What is 2+2?", "Explain gravity briefly.") +) +jsonl_content <- oai_batch_prepare_completions( + df, + text_var = prompt, + id_var = id, + system_prompt = "You are a helpful assistant." +) +} } diff --git a/man/oai_batch_prepare_embeddings.Rd b/man/oai_batch_prepare_embeddings.Rd index 132e4ac..045768b 100644 --- a/man/oai_batch_prepare_embeddings.Rd +++ b/man/oai_batch_prepare_embeddings.Rd @@ -16,21 +16,21 @@ oai_batch_prepare_embeddings( ) } \arguments{ -\item{df}{A data frame containing texts to embed} +\item{df}{A data frame containing text to process} -\item{text_var}{Name of the column containing text to embed} +\item{text_var}{Name of the column containing input text} -\item{id_var}{Name of the column to use as ID} +\item{id_var}{Name of the column to use as row ID} -\item{model}{OpenAI embedding model to use (default: "text-embedding-3-small")} +\item{model}{The embedding model to use} \item{dimensions}{Number of embedding dimensions (NULL uses model default)} -\item{method}{The http request type, usually 'POST'} +\item{method}{The HTTP request type, usually 'POST'} \item{encoding_format}{Data type of the embedding values} -\item{endpoint}{The internal suffix of the endpoint's url e.g. /v1/embeddings} +\item{endpoint}{The API endpoint path, e.g. /v1/embeddings} } \value{ A list of JSON requests @@ -39,9 +39,18 @@ A list of JSON requests Prepare a Data Frame for the OpenAI Batch API - Embeddings } \details{ -Take an enitre data frame and turn each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger. +Takes an entire data frame and turns each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger. Each request must have its own ID, as the Batch API makes no guarantees about the order the results will be returned in. To reduce the overall size, and the explanatory power of the Embeddings, you can set dimensions to lower than the default (which vary based on model). } +\examples{ +\dontrun{ +df <- data.frame( + id = c("doc_1", "doc_2", "doc_3"), + text = c("Hello world", "Embedding text", "Another document") +) +jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id) +} +} diff --git a/man/oai_batch_status.Rd b/man/oai_batch_status.Rd index 3490b9a..0988a21 100644 --- a/man/oai_batch_status.Rd +++ b/man/oai_batch_status.Rd @@ -2,18 +2,25 @@ % Please edit documentation in R/openai_batch_api.R \name{oai_batch_status} \alias{oai_batch_status} -\title{Check the status of a batch job on the OpenAI Batch API} +\title{Check the Status of a Batch Job on the OpenAI Batch API} \usage{ oai_batch_status(batch_id, key_name = "OPENAI_API_KEY") } \arguments{ -\item{batch_id}{Batch Identifier, should start with 'batch_' and is returned by the \code{oai_create_batch} function} +\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_create()} -\item{key_name}{Name of the API key, usually OPENAI_API_KEY} +\item{key_name}{Name of the environment variable containing your API key} } \value{ Metadata about an OpenAI Batch API Job, including status, error_file_id, output_file_id, input_file_id etc. 
} \description{ -Check the status of a batch job on the OpenAI Batch API +Check the Status of a Batch Job on the OpenAI Batch API +} +\examples{ +\dontrun{ +status <- oai_batch_status("batch_abc123") +status$status # e.g., "completed", "in_progress", "failed" +status$output_file_id # File ID for results when completed +} } From e18e5adbe3be386c67229e5cc565e0bf4758211f Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 16:48:21 +0000 Subject: [PATCH 29/39] add roxygen2 docs and examples to openai_files_api docs export relevant functions --- R/openai_files_api.R | 74 ++++++++++++++++++++++++++++++++++------- man/oai_file_content.Rd | 31 ++++++++++++++--- man/oai_file_delete.Rd | 27 ++++++++++++--- man/oai_file_list.Rd | 31 ++++++++++++++--- 4 files changed, 139 insertions(+), 24 deletions(-) diff --git a/R/openai_files_api.R b/R/openai_files_api.R index 06ff3bc..afe7953 100644 --- a/R/openai_files_api.R +++ b/R/openai_files_api.R @@ -1,12 +1,30 @@ -#' List files available in the OpenAI Files API +#' List Files on the OpenAI Files API #' -#' @param purpose The intended purpose of the uploaded file, one of "batch", "fine-tune", "assistants", "vision", "user_data", "evals" -#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' Retrieve a list of files that have been uploaded to the OpenAI Files API, +#' filtered by purpose. Files are retained for 30 days after upload. #' -#' @returns +#' @param purpose The intended purpose of the uploaded file. Must be one of +#' "batch", "fine-tune", "assistants", "vision", "user_data", or "evals". +#' @param key_name Name of the environment variable containing your API key +#' +#' @returns A list containing file metadata and pagination information. Each +#' file entry includes id, filename, purpose, bytes, created_at, and status. #' #' @export +#' @seealso [oai_file_content()] to retrieve file contents, +#' [oai_file_delete()] to remove files, +#' [oai_batch_file_upload()] to upload batch files #' @examples +#' \dontrun{ +#' # List all batch files +#' batch_files <- oai_file_list(purpose = "batch") +#' +#' # List fine-tuning files +#' ft_files <- oai_file_list(purpose = "fine-tune") +#' +#' # Access file IDs +#' file_ids <- purrr::map_chr(batch_files$data, "id") +#' } oai_file_list <- function(purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), key_name = "OPENAI_API_KEY") { purpose <- match.arg(purpose) @@ -21,15 +39,29 @@ oai_file_list <- function(purpose = c("batch", "fine-tune", "assistants", "visio } -#' Delete a file from the OpenAI Files API +#' Delete a File from the OpenAI Files API +#' +#' Permanently deletes a file from the OpenAI Files API. This action cannot +#' be undone. Note that files associated with active batch jobs cannot be +#' deleted until the job completes. 
#' -#' @param file_id ID of the file given by OpenAI -#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' @param file_id File identifier (starts with 'file-'), returned by +#' [oai_batch_file_upload()] or [oai_file_list()] +#' @param key_name Name of the environment variable containing your API key #' -#' @returns +#' @returns A list containing the file id, object type, and deletion status +#' (deleted = TRUE/FALSE) #' #' @export +#' @seealso [oai_file_list()] to find file IDs, +#' [oai_file_content()] to retrieve file contents before deletion #' @examples +#' \dontrun{ +#' # Delete a specific file +#' result <- oai_file_delete("file-abc123") +#' result$deleted # TRUE if successful +#' +#' } oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) @@ -42,15 +74,33 @@ oai_file_delete <- function(file_id, key_name = "OPENAI_API_KEY") { httr2::resp_body_json() } -#' Retrieve content from a file on the OpenAI Files API +#' Retrieve Content from a File on the OpenAI Files API +#' +#' Downloads and returns the content of a file stored on the OpenAI Files API. +#' For batch job outputs, this returns JSONL content that can be parsed with +#' [oai_batch_parse_embeddings()] or [oai_batch_parse_completions()]. #' -#' @param file_id ID of the file given by OpenAI -#' @param key_name The name of your API key, usually "OPENAI_API_KEY" +#' @param file_id File identifier (starts with 'file-'), typically the +#' output_file_id from [oai_batch_status()] +#' @param key_name Name of the environment variable containing your API key #' -#' @returns +#' @returns A character string containing the file contents. For batch outputs, +#' this is JSONL format (one JSON object per line). #' #' @export +#' @seealso [oai_batch_status()] to get output_file_id from completed batches, +#' [oai_batch_parse_embeddings()] and [oai_batch_parse_completions()] to +#' parse batch results #' @examples +#' \dontrun{ +#' # Get batch job status and download results +#' status <- oai_batch_status("batch_abc123") +#' +#' if (status$status == "completed") { +#' content <- oai_file_content(status$output_file_id) +#' results <- oai_batch_parse_embeddings(content) +#' } +#' } oai_file_content <- function(file_id, key_name = "OPENAI_API_KEY") { api_key <- get_api_key(key_name) diff --git a/man/oai_file_content.Rd b/man/oai_file_content.Rd index 8d7239e..471c658 100644 --- a/man/oai_file_content.Rd +++ b/man/oai_file_content.Rd @@ -2,15 +2,38 @@ % Please edit documentation in R/openai_files_api.R \name{oai_file_content} \alias{oai_file_content} -\title{Retrieve content from a file on the OpenAI Files API} +\title{Retrieve Content from a File on the OpenAI Files API} \usage{ oai_file_content(file_id, key_name = "OPENAI_API_KEY") } \arguments{ -\item{file_id}{ID of the file given by OpenAI} +\item{file_id}{File identifier (starts with 'file-'), typically the +output_file_id from \code{\link[=oai_batch_status]{oai_batch_status()}}} -\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +\item{key_name}{Name of the environment variable containing your API key} +} +\value{ +A character string containing the file contents. For batch outputs, +this is JSONL format (one JSON object per line). } \description{ -Retrieve content from a file on the OpenAI Files API +Downloads and returns the content of a file stored on the OpenAI Files API. 
+For batch job outputs, this returns JSONL content that can be parsed with +\code{\link[=oai_batch_parse_embeddings]{oai_batch_parse_embeddings()}} or \code{\link[=oai_batch_parse_completions]{oai_batch_parse_completions()}}. +} +\examples{ +\dontrun{ +# Get batch job status and download results +status <- oai_batch_status("batch_abc123") + +if (status$status == "completed") { + content <- oai_file_content(status$output_file_id) + results <- oai_batch_parse_embeddings(content) +} +} +} +\seealso{ +\code{\link[=oai_batch_status]{oai_batch_status()}} to get output_file_id from completed batches, +\code{\link[=oai_batch_parse_embeddings]{oai_batch_parse_embeddings()}} and \code{\link[=oai_batch_parse_completions]{oai_batch_parse_completions()}} to +parse batch results } diff --git a/man/oai_file_delete.Rd b/man/oai_file_delete.Rd index 3e3234c..87b8d49 100644 --- a/man/oai_file_delete.Rd +++ b/man/oai_file_delete.Rd @@ -2,15 +2,34 @@ % Please edit documentation in R/openai_files_api.R \name{oai_file_delete} \alias{oai_file_delete} -\title{Delete a file from the OpenAI Files API} +\title{Delete a File from the OpenAI Files API} \usage{ oai_file_delete(file_id, key_name = "OPENAI_API_KEY") } \arguments{ -\item{file_id}{ID of the file given by OpenAI} +\item{file_id}{File identifier (starts with 'file-'), returned by +\code{\link[=oai_batch_file_upload]{oai_batch_file_upload()}} or \code{\link[=oai_file_list]{oai_file_list()}}} -\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +\item{key_name}{Name of the environment variable containing your API key} +} +\value{ +A list containing the file id, object type, and deletion status +(deleted = TRUE/FALSE) } \description{ -Delete a file from the OpenAI Files API +Permanently deletes a file from the OpenAI Files API. This action cannot +be undone. Note that files associated with active batch jobs cannot be +deleted until the job completes. +} +\examples{ +\dontrun{ +# Delete a specific file +result <- oai_file_delete("file-abc123") +result$deleted # TRUE if successful + +} +} +\seealso{ +\code{\link[=oai_file_list]{oai_file_list()}} to find file IDs, +\code{\link[=oai_file_content]{oai_file_content()}} to retrieve file contents before deletion } diff --git a/man/oai_file_list.Rd b/man/oai_file_list.Rd index 56b5128..1fcdf97 100644 --- a/man/oai_file_list.Rd +++ b/man/oai_file_list.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/openai_files_api.R \name{oai_file_list} \alias{oai_file_list} -\title{List files available in the OpenAI Files API} +\title{List Files on the OpenAI Files API} \usage{ oai_file_list( purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), @@ -10,10 +10,33 @@ oai_file_list( ) } \arguments{ -\item{purpose}{The intended purpose of the uploaded file, one of "batch", "fine-tune", "assistants", "vision", "user_data", "evals"} +\item{purpose}{The intended purpose of the uploaded file. Must be one of +"batch", "fine-tune", "assistants", "vision", "user_data", or "evals".} -\item{key_name}{The name of your API key, usually "OPENAI_API_KEY"} +\item{key_name}{Name of the environment variable containing your API key} +} +\value{ +A list containing file metadata and pagination information. Each +file entry includes id, filename, purpose, bytes, created_at, and status. } \description{ -List files available in the OpenAI Files API +Retrieve a list of files that have been uploaded to the OpenAI Files API, +filtered by purpose. Files are retained for 30 days after upload. 
+}
+\examples{
+\dontrun{
+# List all batch files
+batch_files <- oai_file_list(purpose = "batch")
+
+# List fine-tuning files
+ft_files <- oai_file_list(purpose = "fine-tune")
+
+# Access file IDs
+file_ids <- purrr::map_chr(batch_files$data, "id")
+}
+}
+\seealso{
+\code{\link[=oai_file_content]{oai_file_content()}} to retrieve file contents,
+\code{\link[=oai_file_delete]{oai_file_delete()}} to remove files,
+\code{\link[=oai_batch_file_upload]{oai_batch_file_upload()}} to upload batch files
+}

From b7b762f73885da1fc2d95e736ec0bbf9d939e15a Mon Sep 17 00:00:00 2001
From: jpcompartir
Date: Tue, 13 Jan 2026 16:49:34 +0000
Subject: [PATCH 30/39] save vignette progress and dev docs

---
 dev_docs/01_integrations.qmd  |   4 +
 dev_docs/initial_release.qmd  |   1 +
 dev_docs/openai_batch_api.qmd | 178 ++++++++++++++++++++++++++++++++++
 vignettes/sync_async.Rmd      |   8 +-
 4 files changed, 187 insertions(+), 4 deletions(-)

diff --git a/dev_docs/01_integrations.qmd b/dev_docs/01_integrations.qmd
index f85a047..bbcb769 100644
--- a/dev_docs/01_integrations.qmd
+++ b/dev_docs/01_integrations.qmd
@@ -103,6 +103,10 @@ oai_complete_good_auth <- oai_complete_df(
 oai_complete_good_auth
 ```
 
+## Batch API
+
+TODO:
+
 # Hugging Face
 
 ## hf embed
diff --git a/dev_docs/initial_release.qmd b/dev_docs/initial_release.qmd
index 38752e9..105a717 100644
--- a/dev_docs/initial_release.qmd
+++ b/dev_docs/initial_release.qmd
@@ -495,3 +495,4 @@ How much information to print? From {httr}'s docs:
 
 \> This is a wrapper around req_verbose() that uses an integer to control verbosity: - 0: no output - 1: show headers - 2: show headers and bodies - 3: show headers, bodies, and curl status messages
 You can also pass in a value for 'path', which will save the response to a file, we'll look more at how to manage this later.
+
diff --git a/dev_docs/openai_batch_api.qmd b/dev_docs/openai_batch_api.qmd
index 2f6c3a9..02b2926 100644
--- a/dev_docs/openai_batch_api.qmd
+++ b/dev_docs/openai_batch_api.qmd
@@ -102,3 +102,181 @@ oai_batch_file_delete(temp_id)
 
 oai_batch_file_list()
 ```
+
+# Testing
+
+## Embeddings
+
+```{r}
+uploaded_files <- oai_file_list(
+  purpose = "batch"
+)
+assertthat::validate_that(length(uploaded_files$data) == 0)
+```
+
+First batch failed due to the ID being an integer, so the ID has to be a string...
+
+```{r}
+embedding_rows <- test_df |>
+  oai_batch_prepare_embeddings(
+    x,
+    y
+  )
+
+embedding_file <- oai_batch_file_upload(embedding_rows)
+
+embedding_batch <- oai_batch_create(embedding_file$id,
+                                    endpoint = "/v1/embeddings")
+
+batch_jobs <- oai_batch_list()
+batch_jobs
+oai_batch_status(embedding_batch$id)
+```
+
+And then this time we got a different error, so we need to fix how the tmpfile is created and prepend the ID with 'batch':
+
+```
+$error$message
+[1] "Invalid 'batch_id': 'file-K6vaHgwcJsE5z1MMFvVMix'. Expected an ID that begins with 'batch'."
+```
+
+Once we've made sure the ID is a string, we create the file, upload it, start the batch, check the status, and download the results.
+
+I would say that this still feels quite janky. There are a few different file IDs in play, and it seems we need to handle any errors separately.
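
One way to take some of the jank out is a small polling helper. This is a hypothetical sketch (not in the package), built only on `oai_batch_status()` and `oai_file_content()`, and it assumes OpenAI's documented terminal statuses:

```{r}
# hypothetical helper: poll a batch job until it reaches a terminal status,
# then return the raw JSONL output ready for the oai_batch_parse_* functions
oai_batch_await <- function(batch_id, interval = 60, key_name = "OPENAI_API_KEY") {
  terminal <- c("completed", "failed", "expired", "cancelled")
  repeat {
    status <- oai_batch_status(batch_id, key_name = key_name)
    if (status$status %in% terminal) break
    Sys.sleep(interval)
  }
  if (status$status != "completed") {
    cli::cli_abort("Batch {batch_id} finished with status '{status$status}'")
  }
  oai_file_content(status$output_file_id, key_name = key_name)
}
```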
```{r}
embedding_rows_id_string <- test_df |>
  oai_batch_prepare_embeddings(
    x,
    y
  )

embedding_file_id_string <- oai_batch_file_upload(embedding_rows_id_string)

embedding_batch_id_string <- oai_batch_create(embedding_file_id_string$id,
                                              endpoint = "/v1/embeddings")

embedding_batch_metadata <- oai_batch_status(
  embedding_batch_id_string$id # "batch_6960e0b48bf481909751c76756ac9fec"
)

output_file_contents <- oai_file_content(
  embedding_batch_metadata$output_file_id
)


oai_batch_parse_embeddings(output_file_contents, original_df = NULL)
```

## Low dimensions

Looks good, 327 columns = 324 + 3 (id, .error, .error_msg)

```{r}
ld_embed_rows <- test_df |>
  oai_batch_prepare_embeddings(
    x,
    y,
    dimensions = 324
  )

ld_file <- oai_batch_file_upload(
  jsonl_rows = ld_embed_rows,
  purpose = "batch"
)

ld_batch_job <- oai_batch_create(ld_file$id, endpoint = "/v1/embeddings")

oai_batch_status(ld_batch_job$id)[["status"]]
ld_results <- oai_file_content(oai_batch_status(ld_batch_job$id)[["output_file_id"]])

oai_batch_parse_embeddings(ld_results)

```

Delete/clean up

We can delete the input/output files, but it doesn't seem like we can actually delete batches.

```{r}
oai_batch_list()[["data"]] |>
  map(\(batch) pluck(batch, "id")) |>
  unlist()

oai_batch_status("batch_6960ed03eea8819080aaa69a8982de66")
oai_file_delete("file-4f8STaon74XE5yP6M7mWmH")
oai_file_delete("file-FteMgeWV4mK85kdcntGU29")

oai_batch_status("batch_6960dfe7d8ac8190acf682a59e844b71")
oai_file_delete("file-HRFKx63PimYZYqsraQHR6T")
oai_file_delete("file-HRFKx63PimYZYqsraQHR6T")
oai_batch_status("batch_6960e0b48bf481909751c76756ac9fec")
oai_file_delete('file-TcV6dGkFEsrzGTNxkPKYCb') # output file
oai_file_delete('file-K6vaHgwcJsE5z1MMFvVMix') # input file

```

## Completions

Testing the funcs for Completions: we need to get an input, make a file, start the batch, check the status, and retrieve the content.
```{r}
completions_req <- oai_batch_build_completions_req(
  input = "Tell me a joke about my country, the United Kingdom",
  id = "id_1"
)

completions_file <- oai_batch_file_upload(
  completions_req
)
```

We do need to remember to fill in endpoint = "/v1/chat/completions" here instead of the default arg for embeddings.
```{r}
completions_batch <- oai_batch_create(
  completions_file$id,
  endpoint = "/v1/chat/completions"
)

completions_status <- oai_batch_status(
  completions_batch$id
)

completions_status$status
completions_status$output_file_id # file-Q8TaFRoCYGHRZJKiYQKhx9

output <- oai_file_content(completions_status$output_file_id)

oai_batch_parse_completions(output) |>
  purrr::pluck("content", 1)
```

### With Schema

```{r}
joke_schema <- create_json_schema(
  name = "joke_schema",
  description = "A set up and a punchline",
  schema = schema_object(
    setup = schema_string("The set up for the joke"),
    punchline = schema_string("The punchline of the joke, make it pop"),
    required = c("setup", "punchline")
  )
)

completions_req_w_schema <- oai_batch_build_completions_req(
  input = "Tell me a joke about my country, the United Kingdom",
  id = "id_1",
  schema = joke_schema,
  temperature = 1
)

.file <- oai_batch_file_upload(
  completions_req_w_schema
)

.batch <- oai_batch_create(.file$id, endpoint = "/v1/chat/completions")

oai_batch_status(.batch$id)[["status"]]
```


diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd
index ac8f191..f7cfd24 100644
--- a/vignettes/sync_async.Rmd
+++ b/vignettes/sync_async.Rmd
@@ -34,7 +34,7 @@ TODO: Code samples when the functions etc. are up and running.
 
 Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response.
 
-For many use-cases the synchronous APIs work just fine. But often as data scientists we need to do the same thing to thousands, or millions of rows of data. Hammering the provider's servers in such cases is inefficient for us and them. Plus, we probably don't want to sit around with a locked R session for 5 hours as our results get returned to us.
+For many use-cases the synchronous APIs work just fine. But often as data scientists we need to do the same thing to thousands, or millions of rows of data. Hammering the provider's servers with thousands/millions of requests is inefficient for us and them. Plus, we don't want to sit around with a blocked R session for 5 hours as our results get returned to us.
 
 Most Generative AI providers also offer lower-cost, asynchronous APIs. The providers usually offer a guarantee of the results within a time frame, and an estimate of the average time to return the results. For example, they may guarantee results within 24 hours, but expect them within 1-3 hours.
 
@@ -69,13 +69,13 @@ In order to use OpenAI's Batch API, we need to upload files to the Files API. Lu
 - Batch API
 - Files API
 
-Fortunately, the Batch and Files API dovetail quite well, and the same mental models will be useful for both.
+Fortunately, the same mental models will be useful for both the Batch and the Files APIs.
 
 # EndpointR Implementation of OpenAI Batch API
 
 Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/server than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. As such, different functions and workflows are needed.
 
-You will likely want to use the Batch API for both embeddings and completions at separate times and with distinct arguments, so we have a separate function to prepare batches for each one:
+The two main functions for **preparing the requests** are:
 
 - `oai_batch_prepare_embeddings()`
 - `oai_batch_prepare_completions()`
@@ -88,7 +88,7 @@ EndpointR prepares each batch, writes it to a file in temporary storage, and the
 
 Whether using the Batch API for embeddings or chat completions, each line of the .jsonl file must form a self-contained request with a unique identifier.
 
-Example for embeddings (no structured output!):
+E.g. 
for an embeddings task on the Batch API, the request in each row should look something like: Row version: From 69c0e8fee66d239b793e3388bf423cb88d34aead Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 16:50:09 +0000 Subject: [PATCH 31/39] add tests for completions in batch API --- tests/testthat/test-oai_batch_api.R | 128 ++++++++++++++++++++++++++++ 1 file changed, 128 insertions(+) diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R index 429d3be..7202119 100644 --- a/tests/testthat/test-oai_batch_api.R +++ b/tests/testthat/test-oai_batch_api.R @@ -43,6 +43,7 @@ test_that("oai_batch_build_completion_req creates valid JSON structure", { expect_equal(parsed$body$messages[[1]]$role, "user") expect_equal(parsed$body$messages[[1]]$content, "Hello") }) + test_that("oai_batch_build_completion_req handles system_prompt", { result <- oai_batch_build_completion_req( input = "Hello", @@ -79,6 +80,20 @@ test_that("oai_batch_build_completion_req handles schema as list", { expect_equal(parsed$body$response_format$type, "json_schema") }) +test_that("oai_batch_build_completion_req respects temperature and max_tokens", { + result <- oai_batch_build_completion_req( + input = "Hello", + id = "test_4", + temperature = 0.7, + max_tokens = 1000L + ) + + parsed <- jsonlite::fromJSON(result) + + expect_equal(parsed$body$temperature, 0.7) + expect_equal(parsed$body$max_tokens, 1000) +}) + test_that("oai_batch_prepare_completions creates valid JSONL", { test_df <- tibble::tibble( id = c("a", "b"), @@ -99,6 +114,28 @@ test_that("oai_batch_prepare_completions creates valid JSONL", { expect_equal(parsed[[2]]$custom_id, "b") expect_equal(parsed[[1]]$body$messages[[1]]$content, "Hello") }) + +test_that("oai_batch_prepare_completions handles system_prompt across all rows", { + test_df <- tibble::tibble( + id = c("a", "b"), + text = c("Hello", "World") + ) + + result <- oai_batch_prepare_completions( + df = test_df, + text_var = text, + id_var = id, + system_prompt = "Be brief" + ) + + lines <- strsplit(result, "\n")[[1]] + parsed <- purrr::map(lines, \(x) jsonlite::fromJSON(x, simplifyVector = FALSE)) + + expect_equal(parsed[[1]]$body$messages[[1]]$role, "system") + expect_equal(parsed[[2]]$body$messages[[1]]$role, "system") +}) + + test_that("oai_batch_parse_embeddings handles success response", { mock_content <- '{"custom_id":"1","response":{"body":{"data":[{"embedding":[0.1,0.2,0.3]}]}},"error":null}' @@ -112,6 +149,7 @@ test_that("oai_batch_parse_embeddings handles success response", { expect_equal(result$V2, 0.2) expect_equal(result$V3, 0.3) }) + test_that("oai_batch_parse_embeddings handles error response", { mock_content <- '{"custom_id":"1","response":null,"error":{"message":"Rate limit exceeded"}}' @@ -135,6 +173,55 @@ test_that("oai_batch_parse_embeddings handles multiple rows", { expect_equal(result$custom_id, c("1", "2")) expect_equal(result$V1, c(0.1, 0.3)) }) + + +test_that("oai_batch_parse_completions handles success response", { + mock_content <- '{"custom_id":"1","response":{"body":{"choices":[{"message":{"content":"Hello back"}}]}},"error":null}' + + result <- oai_batch_parse_completions(mock_content) + + expect_equal(nrow(result), 1) + expect_equal(result$custom_id, "1") + expect_equal(result$content, "Hello back") + expect_false(result$.error) +}) + +test_that("oai_batch_parse_completions handles error response", { + mock_content <- '{"custom_id":"1","response":null,"error":{"message":"API error"}}' + + result <- 
oai_batch_parse_completions(mock_content) + + expect_equal(nrow(result), 1) + expect_true(result$.error) + expect_equal(result$.error_msg, "API error") + expect_true(is.na(result$content)) +}) + +test_that("oai_batch_parse_completions handles JSON schema content", { + mock_content <- '{"custom_id":"1","response":{"body":{"choices":[{"message":{"content":"{\\"sentiment\\":\\"positive\\"}"}}]}},"error":null}' + + result <- oai_batch_parse_completions(mock_content) + + expect_equal(result$content, '{"sentiment":"positive"}') + parsed_content <- jsonlite::fromJSON(result$content) + expect_equal(parsed_content$sentiment, "positive") +}) + +test_that("oai_batch_parse_completions renames id column when original_df provided", { + mock_content <- '{"custom_id":"doc_1","response":{"body":{"choices":[{"message":{"content":"test"}}]}},"error":null}' + + original_df <- tibble::tibble( + my_id = "doc_1", + text = "Hello" + ) + + result <- oai_batch_parse_completions(mock_content, original_df, id_var = "my_id") + + expect_true("my_id" %in% names(result)) + expect_false("custom_id" %in% names(result)) + expect_equal(result$my_id, "doc_1") +}) + test_that("oai_batch_prepare_embeddings rejects duplicate IDs", { test_df <- tibble::tibble( id = c("a", "a", "b"), @@ -146,9 +233,50 @@ test_that("oai_batch_prepare_embeddings rejects duplicate IDs", { "custom_id values must be unique" ) }) + +test_that("oai_batch_prepare_completions rejects duplicate IDs", { + test_df <- tibble::tibble( + id = c("x", "y", "x"), + text = c("Hello", "World", "Again") + ) + + expect_error( + oai_batch_prepare_completions(test_df, text, id), + "custom_id values must be unique" + ) +}) + +test_that("oai_batch_prepare_embeddings handles empty dataframe with warning", { + test_df <- tibble::tibble(id = character(), text = character()) + + expect_warning( + result <- oai_batch_prepare_embeddings(test_df, text, id), + "Input is empty" + ) + expect_equal(result, "") +}) + +test_that("oai_batch_prepare_completions handles empty dataframe with warning", { + test_df <- tibble::tibble(id = character(), text = character()) + + expect_warning( + result <- oai_batch_prepare_completions(test_df, text, id), + "Input is empty" + ) + expect_equal(result, "") +}) + test_that("oai_batch_parse_embeddings handles empty input", { result <- oai_batch_parse_embeddings("") expect_equal(nrow(result), 0) expect_true("custom_id" %in% names(result)) expect_true(".error" %in% names(result)) }) + +test_that("oai_batch_parse_completions handles empty input", { + result <- oai_batch_parse_completions("") + + expect_equal(nrow(result), 0) + expect_true("custom_id" %in% names(result)) + expect_true("content" %in% names(result)) +}) From 35ea08f762af3cbe0bb618d567a4bd80dc6b8b8a Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 17:11:46 +0000 Subject: [PATCH 32/39] Make sure we call completions_req after namechange update NEWS.md to summarise batch/files implementation --- NEWS.md | 30 ++++++++++++++++++++++++++++- R/openai_batch_api.R | 2 +- tests/testthat/test-oai_batch_api.R | 16 +++++++-------- 3 files changed, 38 insertions(+), 10 deletions(-) diff --git a/NEWS.md b/NEWS.md index 999bb8f..4a983a8 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,6 +1,34 @@ # EndpointR 0.2.1 -- OpenAI Batch API for Embeddings and Completions +## OpenAI Batch API + +Adds support for OpenAI's asynchronous Batch API, offering 50% cost savings and higher rate limits compared to synchronous endpoints. Ideal for large-scale embeddings, classifications, and batch inference tasks. 
+ +**Request preparation:** + +- `oai_batch_build_embed_req()` - Build a single embedding request row +- `oai_batch_prepare_embeddings()` - Prepare an entire data frame for batch embeddings +- `oai_batch_build_completions_req()` - Build a single chat completions request row +- `oai_batch_prepare_completions()` - Prepare an entire data frame for batch completions (supports structured outputs via JSON schema) + +**Job management:** + +- `oai_batch_file_upload()` - Upload prepared JSONL to OpenAI Files API +- `oai_batch_create()` - Trigger a batch job on an uploaded file +- `oai_batch_status()` - Check the status of a running batch job +- `oai_batch_list()` - List all batch jobs associated with your API key +- `oai_batch_cancel()` - Cancel an in-progress batch job + +**Results parsing:** + +- `oai_batch_parse_embeddings()` - Parse batch embedding results into a tidy data frame +- `oai_batch_parse_completions()` - Parse batch completion results into a tidy data frame + +## OpenAI Files API + +- `oai_file_list()` - List files uploaded to the OpenAI Files API +- `oai_file_content()` - Retrieve the content of a file (e.g., batch results) +- `oai_file_delete()` - Delete a file from the Files API # EndpointR 0.2.0 diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index fd27502..b8ffcba 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -209,7 +209,7 @@ oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o- } reqs <- purrr::map2_chr(.texts, .ids, \(x, y) { - oai_batch_build_completion_req( + oai_batch_build_completions_req( input = x, id = as.character(y), model = model, diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R index 7202119..82a2bed 100644 --- a/tests/testthat/test-oai_batch_api.R +++ b/tests/testthat/test-oai_batch_api.R @@ -26,8 +26,8 @@ test_that("oai_batch_build_embed_req creates a row of JSON and responds to its i expect_equal(no_dims_str$body$model, "text-embedding-3-small") }) -test_that("oai_batch_build_completion_req creates valid JSON structure", { - result <- oai_batch_build_completion_req( +test_that("oai_batch_build_completions_req creates valid JSON structure", { + result <- oai_batch_build_completions_req( input = "Hello", id = "test_1", model = "gpt-4o-mini" @@ -44,8 +44,8 @@ test_that("oai_batch_build_completion_req creates valid JSON structure", { expect_equal(parsed$body$messages[[1]]$content, "Hello") }) -test_that("oai_batch_build_completion_req handles system_prompt", { - result <- oai_batch_build_completion_req( +test_that("oai_batch_build_completions_req handles system_prompt", { + result <- oai_batch_build_completions_req( input = "Hello", id = "test_2", system_prompt = "You are helpful" @@ -59,7 +59,7 @@ test_that("oai_batch_build_completion_req handles system_prompt", { expect_equal(parsed$body$messages[[2]]$role, "user") }) -test_that("oai_batch_build_completion_req handles schema as list", { +test_that("oai_batch_build_completions_req handles schema as list", { test_schema <- list( type = "json_schema", json_schema = list( @@ -68,7 +68,7 @@ test_that("oai_batch_build_completion_req handles schema as list", { ) ) - result <- oai_batch_build_completion_req( + result <- oai_batch_build_completions_req( input = "Hello", id = "test_3", schema = test_schema @@ -80,8 +80,8 @@ test_that("oai_batch_build_completion_req handles schema as list", { expect_equal(parsed$body$response_format$type, "json_schema") }) -test_that("oai_batch_build_completion_req respects temperature and max_tokens", 
{
+  result <- oai_batch_build_completions_req(
     input = "Hello",
     id = "test_4",
     temperature = 0.7,

From 6679438dd76728721cdd24f5ab7faafcaabc7dac Mon Sep 17 00:00:00 2001
From: jpcompartir
Date: Tue, 13 Jan 2026 17:15:39 +0000
Subject: [PATCH 33/39] Update vignette with code for quickstarts, just need to
 finish the prose

---
 vignettes/sync_async.Rmd | 183 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 174 insertions(+), 9 deletions(-)

diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd
index f7cfd24..f67fd09 100644
--- a/vignettes/sync_async.Rmd
+++ b/vignettes/sync_async.Rmd
@@ -14,21 +14,186 @@ knitr::opts_chunk$set(
 )
 ```
 
-TODOS:
-
-- [ ] Quick start code when it's all ready
-- [ ] OpenAI Batch API & Files API
-- [ ] Batch API for embeddings
-- [ ] Batch API for completions
-
-
 ```{r setup}
 library(EndpointR)
 ```
 
 # Quickstart
 
-TODO: Code samples when the functions etc. are up and running.
+The Batch API workflow follows three stages: **prepare**, **submit**, and **retrieve**. Below are complete examples for embeddings and completions.
+
+## Batch Embeddings
+
+```{r batch-embeddings, eval = FALSE}
+# 1. Prepare your data
+df <- data.frame(
+  id = c("doc_1", "doc_2", "doc_3"),
+  text = c(
+    "The quick brown fox jumps over the lazy dog",
+    "Machine learning is transforming data science",
+    "R is a powerful language for statistical computing"
+  )
+)
+
+# 2. Prepare requests for the Batch API
+jsonl_content <- oai_batch_prepare_embeddings(
+  df,
+  text_var = text,
+  id_var = id,
+  model = "text-embedding-3-small",
+  dimensions = 256
+)
+
+# 3. Upload to the Files API
+file_info <- oai_batch_file_upload(jsonl_content)
+file_info$id
+#> "file-abc123..."
+
+# 4. Trigger the batch job
+batch_job <- oai_batch_create(
+  file_id = file_info$id,
+  endpoint = "/v1/embeddings"
+)
+batch_job$id
+#> "batch_xyz789..."
+
+# 5. Check status (repeat until completed)
+status <- oai_batch_status(batch_job$id)
+status$status
+#> "in_progress" ... later ... "completed"
+
+# 6. Download and parse results
+content <- oai_file_content(status$output_file_id)
+embeddings_df <- oai_batch_parse_embeddings(content)
+
+# Result: tidy data frame with id and embedding dimensions (V1, V2, ..., V256)
+embeddings_df
+#> # A tibble
+#> custom_id .error .error_msg     V1     V2    V3 ...
+#> ...
+#> 1 doc_1    FALSE  NA          0.023 -0.041 0.018 ...
+#> 2 doc_2    FALSE  NA         -0.015  0.032 0.044 ...
+#> 3 doc_3    FALSE  NA          0.008 -0.027 0.031 ...
+```
+
+## Batch Completions
+
+```{r batch-completions, eval = FALSE}
+# 1. Prepare your data
+df <- data.frame(
+  id = c("q1", "q2", "q3"),
+  prompt = c(
+    "What is the capital of France?",
+    "Explain photosynthesis in one sentence.",
+    "What is 2 + 2?"
+  )
+)
+
+# 2. Prepare requests
+jsonl_content <- oai_batch_prepare_completions(
+  df,
+  text_var = prompt,
+  id_var = id,
+  model = "gpt-4o-mini",
+  system_prompt = "You are a helpful assistant. Be concise.",
+  temperature = 0,
+  max_tokens = 100
+)
+
+# 3. Upload and trigger batch job
+file_info <- oai_batch_file_upload(jsonl_content)
+batch_job <- oai_batch_create(
+  file_id = file_info$id,
+  endpoint = "/v1/chat/completions"
+)
+
+# 4. Check status and retrieve results
+status <- oai_batch_status(batch_job$id)
+# ... wait for status$status == "completed" ...
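+# a sketch of one way to wait without guessing (this blocks the R session;
+# "validating", "in_progress" and "finalizing" are the pre-terminal statuses):
+# while (!oai_batch_status(batch_job$id)$status %in%
+#          c("completed", "failed", "expired", "cancelled")) {
+#   Sys.sleep(30)
+# }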
+ +content <- oai_file_content(status$output_file_id) +completions_df <- oai_batch_parse_completions(content) + +completions_df +#> # A tibble +#> custom_id content .error .error_msg +#> +#> 1 q1 The capital of France is Paris. FALSE NA +#> 2 q2 Photosynthesis converts sunlight into energy FALSE NA +#> 3 q3 2 + 2 equals 4. FALSE NA +``` + +## Batch Completions with Structured Output + +For classification tasks or when you need structured data back, combine the Batch API with JSON schemas: + +```{r batch-completions-schema, eval = FALSE} +# 1. Define a schema for sentiment classification +sentiment_schema <- create_json_schema( + name = "sentiment_analysis", + schema_object( + sentiment = schema_enum( + c("positive", "negative", "neutral"), + description = "The sentiment of the text" + ), + confidence = schema_number( + description = "Confidence score between 0 and 1" + ) + ) +) + +# 2. Prepare data +df <- data.frame( + id = c("review_1", "review_2", "review_3"), + text = c( + "This product is absolutely fantastic! Best purchase ever.", + "Terrible quality, broke after one day. Complete waste of money.", + "It's okay, nothing special but does the job." + ) +) + +# 3. Prepare requests with schema +jsonl_content <- oai_batch_prepare_completions( + df, + text_var = text, + id_var = id, + model = "gpt-4o-mini", + system_prompt = "Analyse the sentiment of the following text.", + schema = sentiment_schema, + temperature = 0 +) + +# 4. Upload and trigger batch job +file_info <- oai_batch_file_upload(jsonl_content) +batch_job <- oai_batch_create( + file_id = file_info$id, + endpoint = "/v1/chat/completions" +) + +# 5. Retrieve and parse results +status <- oai_batch_status(batch_job$id) +content <- oai_file_content(status$output_file_id) +results_df <- oai_batch_parse_completions(content) + +# The content column contains JSON that can be parsed +results_df$content +#> [1] "{\"sentiment\":\"positive\",\"confidence\":0.95}" +#> [2] "{\"sentiment\":\"negative\",\"confidence\":0.92}" +#> [3] "{\"sentiment\":\"neutral\",\"confidence\":0.78}" + +# Parse the JSON content into columns +results_df |> + dplyr::mutate( + parsed = purrr::map(content, jsonlite::fromJSON) + ) |> + tidyr::unnest_wider(parsed) +#> # A tibble +#> custom_id sentiment confidence .error .error_msg +#> +#> 1 review_1 positive 0.95 FALSE NA +#> 2 review_2 negative 0.92 FALSE NA +#> 3 review_3 neutral 0.78 FALSE NA +``` # Introduction From f80e4aa9df98c469c533a8f327c3ebf423f69c02 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Tue, 13 Jan 2026 21:44:56 +0000 Subject: [PATCH 34/39] Patch the inherits (need to S7::S7_inherits, been stung by this before) in openai_batch_api.R in prep and req funcs for Completions API Add test case --- R/openai_batch_api.R | 4 ++-- tests/testthat/test-oai_batch_api.R | 24 ++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index b8ffcba..fda1d96 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -144,7 +144,7 @@ oai_batch_build_completions_req <- function(input, id, model = "gpt-4o-mini", sy ) if (!is.null(schema)) { - if (inherits(schema, "json_schema")) { + if (S7::S7_inherits(schema, json_schema)) { body$response_format <- json_dump(schema) } else if (is.list(schema)) { body$response_format <- schema @@ -204,7 +204,7 @@ oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o- } ## pre-process schema once if S7 object to avoid repeated json_dump() calls - if (!is.null(schema) && 
inherits(schema, "json_schema")) { + if (!is.null(schema) && S7::S7_inherits(schema, json_schema)) { schema <- json_dump(schema) } diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-oai_batch_api.R index 82a2bed..e1e368f 100644 --- a/tests/testthat/test-oai_batch_api.R +++ b/tests/testthat/test-oai_batch_api.R @@ -80,6 +80,30 @@ test_that("oai_batch_build_completions_req handles schema as list", { expect_equal(parsed$body$response_format$type, "json_schema") }) +test_that("oai_batch_build_completions_req handles json_schema S7 object", { + test_schema <- create_json_schema( + name = "sentiment_schema", + description = "Sentiment analysis result", + schema = schema_object( + sentiment = schema_string("The sentiment", enum = c("positive", "negative", "neutral")), + required = c("sentiment") + ) + ) + + result <- oai_batch_build_completions_req( + input = "Analyse the sentiment of this text", + id = "test_s7_schema", + schema = test_schema + ) + + parsed <- jsonlite::fromJSON(result, simplifyVector = FALSE) + + expect_true("response_format" %in% names(parsed$body)) + expect_equal(parsed$body$response_format$type, "json_schema") + expect_equal(parsed$body$response_format$json_schema$name, "sentiment_schema") + expect_equal(parsed$body$response_format$json_schema$strict, TRUE) +}) + test_that("oai_batch_build_completions_req respects temperature and max_tokens", { result <- oai_batch_build_completions_req( input = "Hello", From 58615cc27414f2b490de6ee5d0eec51eb0af28ff Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Wed, 14 Jan 2026 10:32:52 +0000 Subject: [PATCH 35/39] Add Claudy to authors Link to sync v a sync vignette on README build readme.md() --- DESCRIPTION | 6 ++++-- README.Rmd | 4 ++++ README.md | 5 +++++ man/EndpointR-package.Rd | 5 +++++ 4 files changed, 18 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index e52f29b..d8147be 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,10 @@ Package: EndpointR Title: Connects to various Machine Learning inference providers Version: 0.2.1 -Authors@R: - person("Jack", "Penzer", , "Jack.penzer@sharecreative.com", role = c("aut", "cre")) +Authors@R: c( + person("Jack", "Penzer", , "Jack.penzer@sharecreative.com", role = c("aut", "cre")), + person("Claude", "AI", role = "aut") + ) Description: EndpointR is a 'batteries included', open-source R package for connecting to various APIs for Machine Learning model predictions. EndpointR is built for company-specific use cases, so may not be useful to a wide audience. License: MIT + file LICENSE Encoding: UTF-8 diff --git a/README.Rmd b/README.Rmd index ea55d42..3ad13c6 100644 --- a/README.Rmd +++ b/README.Rmd @@ -277,3 +277,7 @@ Read the [LLM Providers Vignette](articles/llm_providers.html), and the [Structu - Read the [httr2 vignette](https://httr2.r-lib.org/articles/wrapping-apis.html#basics){target="_blank"} on managing your API keys securely and encrypting them. - Read the [EndpointR API Keys](articles/api_keys.html) vignette for information on which API keys you need for wach endpoint we support, and how to securely import those API keys into your .Renvironfile. 
+ +# Batch Jobs + +- Read the [EndpointR vignette](articles/sync_async.html) on Synchronous vs Asynchronous APIs \ No newline at end of file diff --git a/README.md b/README.md index fb69873..7feefb4 100644 --- a/README.md +++ b/README.md @@ -291,5 +291,10 @@ information on common workflows with the OpenAI Chat Completions API information on which API keys you need for wach endpoint we support, and how to securely import those API keys into your .Renvironfile. +# Batch Jobs + +- Read the [EndpointR vignette](articles/sync_async.html) on Synchronous + vs Asynchronous APIs + [^1]: Content pending implementation for Anthropic Messages API, Gemini API, and OpenAI Responses API diff --git a/man/EndpointR-package.Rd b/man/EndpointR-package.Rd index 1935951..de47fdd 100644 --- a/man/EndpointR-package.Rd +++ b/man/EndpointR-package.Rd @@ -18,5 +18,10 @@ Useful links: \author{ \strong{Maintainer}: Jack Penzer \email{Jack.penzer@sharecreative.com} +Authors: +\itemize{ + \item Claude AI +} + } \keyword{internal} From e0b5e8a0c6e74a5fa25b2474470311d851ec2121 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Wed, 14 Jan 2026 12:02:22 +0000 Subject: [PATCH 36/39] rename oai_batch_file_upload -> oai_batch_upload rename oai_batch_create -> oai_batch_start create oai_file_upload func, and then call it in oai_batch_upload add tests for input args rename test file for openai_batch_api.R add endpoint_url argument to file/batch funcs (future proofing) make sure purpose has match.args where it's called --- NAMESPACE | 4 +- NEWS.md | 4 +- R/openai_batch_api.R | 60 ++++++++--------- R/openai_files_api.R | 26 ++++++++ _pkgdown.yml | 4 +- dev_docs/openai_batch_api.qmd | 65 +++++++++++++++---- man/oai_batch_cancel.Rd | 2 +- ...oai_batch_create.Rd => oai_batch_start.Rd} | 12 ++-- man/oai_batch_status.Rd | 2 +- ...tch_file_upload.Rd => oai_batch_upload.Rd} | 15 +++-- ...ai_batch_api.R => test-openai_batch_api.R} | 0 tests/testthat/test-openai_files_api.R | 15 +++++ vignettes/sync_async.Rmd | 8 +-- 13 files changed, 148 insertions(+), 69 deletions(-) rename man/{oai_batch_create.Rd => oai_batch_start.Rd} (87%) rename man/{oai_batch_file_upload.Rd => oai_batch_upload.Rd} (73%) rename tests/testthat/{test-oai_batch_api.R => test-openai_batch_api.R} (100%) create mode 100644 tests/testthat/test-openai_files_api.R diff --git a/NAMESPACE b/NAMESPACE index 69a710f..60d99e3 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -25,14 +25,14 @@ export(json_schema) export(oai_batch_build_completions_req) export(oai_batch_build_embed_req) export(oai_batch_cancel) -export(oai_batch_create) -export(oai_batch_file_upload) export(oai_batch_list) export(oai_batch_parse_completions) export(oai_batch_parse_embeddings) export(oai_batch_prepare_completions) export(oai_batch_prepare_embeddings) +export(oai_batch_start) export(oai_batch_status) +export(oai_batch_upload) export(oai_build_completions_request) export(oai_build_completions_request_list) export(oai_build_embedding_request) diff --git a/NEWS.md b/NEWS.md index 4a983a8..865a503 100644 --- a/NEWS.md +++ b/NEWS.md @@ -13,8 +13,8 @@ Adds support for OpenAI's asynchronous Batch API, offering 50% cost savings and **Job management:** -- `oai_batch_file_upload()` - Upload prepared JSONL to OpenAI Files API -- `oai_batch_create()` - Trigger a batch job on an uploaded file +- `oai_batch_upload()` - Upload prepared JSONL to OpenAI Files API +- `oai_batch_start()` - Trigger a batch job on an uploaded file - `oai_batch_status()` - Check the status of a running batch job - `oai_batch_list()` - List all 
batch jobs associated with your API key - `oai_batch_cancel()` - Cancel an in-progress batch job diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R index fda1d96..5ea0b3f 100644 --- a/R/openai_batch_api.R +++ b/R/openai_batch_api.R @@ -231,8 +231,9 @@ oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o- #' #' #' @param jsonl_rows Rows of valid JSON, output of an oai_batch_prepare* function -#' @param key_name Name of the environment variable containing your API key #' @param purpose File purpose tag, e.g. 'batch', 'fine-tune' +#' @param key_name Name of the environment variable containing your API key +#' @param endpoint_url OpenAI API endpoint URL (default: OpenAI's Files API V1) #' #' @returns Metadata for an upload to the OpenAI Files API #' @@ -245,36 +246,27 @@ oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o- #' text = c("Hello world", "Goodbye world") #' ) #' jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id) -#' file_info <- oai_batch_file_upload(jsonl_content) +#' file_info <- oai_batch_upload(jsonl_content) #' file_info$id # Use this ID to create a batch job #' } -oai_batch_file_upload <- function(jsonl_rows, key_name = "OPENAI_API_KEY", purpose = "batch") { +oai_batch_upload <- function(jsonl_rows, purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), key_name = "OPENAI_API_KEY", endpoint_url = "https://api.openai.com/v1/files") { -api_key <- get_api_key(key_name) - -.tmp <- tempfile(fileext = ".jsonl") -on.exit(unlink(.tmp)) # if session crashes we drop the file from mem safely -writeLines(jsonl_rows, .tmp) # send the content to the temp file for uploading to OAI -# question here is whether to also save this somewhere by force... -# once OAI have the file it's backed up for 30 days. + purpose <- match.arg(purpose) -resp <- httr2::request(base_url = "https://api.openai.com/v1/files") |> -httr2::req_auth_bearer_token(api_key) |> -httr2::req_body_multipart(file = curl::form_file(.tmp), -purpose = purpose) |> -httr2::req_error(is_error = ~ FALSE) |> -httr2::req_perform() + .tmp <- tempfile(fileext = ".jsonl") + on.exit(unlink(.tmp)) # if session crashes we drop the file from mem safely + writeLines(jsonl_rows, .tmp) # send the content to the temp file for uploading to OAI + # question here is whether to also save this somewhere by force... + # once OAI have the file it's backed up for 30 days. -result <- httr2::resp_body_json(resp) + result <- oai_file_upload( + file = .tmp, + purpose = purpose, + key_name = key_name, + endpoint_url = endpoint_url + ) -if (httr2::resp_status(resp) >= 400) { - error_msg <- result$error$message %||% "Unknown error" - cli::cli_abort(c( - "Failed to upload file to OpenAI Files API", - "x" = error_msg - )) - } - + return(result) } @@ -287,25 +279,25 @@ if (httr2::resp_status(resp) >= 400) { #' #' Batch Job Ids start with "batch_", you'll receive a warning if you try to check batch status on a Files API file (the Files/Batch API set up is a lil bit clumsy for me) #' -#' @param file_id File ID returned by oai_batch_file_upload() +#' @param file_id File ID returned by oai_batch_upload() #' @param endpoint The API endpoint path, e.g. 
/v1/embeddings #' @param completion_window Time window for batch completion (OpenAI guarantees 24h only) #' @param metadata Optional list of metadata to tag the batch with -#' @inheritParams oai_batch_file_upload +#' @inheritParams oai_batch_upload #' #' @returns Metadata about an OpenAI Batch Job Including the batch ID #' #' @export #' @examples #' \dontrun{ -#' # After uploading a file with oai_batch_file_upload() -#' batch_job <- oai_batch_create( +#' # After uploading a file with oai_batch_upload() +#' batch_job <- oai_batch_start( #' file_id = "file-abc123", #' endpoint = "/v1/embeddings" #' ) #' batch_job$id # Use this to check status later #' } -oai_batch_create <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/completions"), completion_window = "24h", metadata = NULL, key_name = "OPENAI_API_KEY") { +oai_batch_start <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/completions"), completion_window = "24h", metadata = NULL, key_name = "OPENAI_API_KEY") { endpoint <- match.arg(endpoint) api_key <- get_api_key(key_name) @@ -330,8 +322,8 @@ oai_batch_create <- function(file_id, endpoint = c("/v1/embeddings", "/v1/chat/c #' Check the Status of a Batch Job on the OpenAI Batch API #' -#' @param batch_id Batch identifier (starts with 'batch_'), returned by oai_batch_create() -#' @inheritParams oai_batch_file_upload +#' @param batch_id Batch identifier (starts with 'batch_'), returned by oai_batch_start() +#' @inheritParams oai_batch_upload #' #' @returns Metadata about an OpenAI Batch API Job, including status, error_file_id, output_file_id, input_file_id etc. #' @@ -359,7 +351,7 @@ oai_batch_status <- function(batch_id, key_name = "OPENAI_API_KEY") { #' #' @param limit Maximum number of batch jobs to return #' @param after Cursor for pagination; batch ID to start after -#' @inheritParams oai_batch_file_upload +#' @inheritParams oai_batch_upload #' #' @returns A list containing batch job metadata and pagination information #' @@ -396,7 +388,7 @@ oai_batch_list <- function(limit = 20L, after = NULL, key_name = "OPENAI_API_KEY #' requests, but requests already being processed may still complete. #' #' @inheritParams oai_batch_status -#' @inheritParams oai_batch_file_upload +#' @inheritParams oai_batch_upload #' #' @returns Metadata about the cancelled batch job #' diff --git a/R/openai_files_api.R b/R/openai_files_api.R index afe7953..5e48999 100644 --- a/R/openai_files_api.R +++ b/R/openai_files_api.R @@ -39,6 +39,32 @@ oai_file_list <- function(purpose = c("batch", "fine-tune", "assistants", "visio } +oai_file_upload <- function(file, purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), key_name = "OPENAI_API_KEY", endpoint_url = "https://api.openai.com/v1/files") { + + api_key <- get_api_key(key_name) + purpose <- match.arg(purpose) + stopifnot("`file` must be a file object" = is.character(file) && file.exists(file)) + + resp <- httr2::request(base_url = endpoint_url) |> + httr2::req_auth_bearer_token(api_key) |> + httr2::req_body_multipart(file = curl::form_file(file), purpose = purpose) |> # use `req_body_multipart` instead of `req_body_file` to send 'purpose' with file + httr2::req_error(is_error = ~ FALSE) |> # let errors from providers surface rather than be caught by httr2. 
v.helpful for developing prompts/schemas and debugging APIs + httr2::req_perform() + + result <- httr2::resp_body_json(resp) + + if(httr2::resp_status(resp) >= 400) { + error_msg <- result$error$message %||% "Unknown error" + cli::cli_abort(c( + "Failed to upload file to OpenAI Files API", + "x" = error_msg + )) + } + + return(result) +} + + #' Delete a File from the OpenAI Files API #' #' Permanently deletes a file from the OpenAI Files API. This action cannot diff --git a/_pkgdown.yml b/_pkgdown.yml index 9add69b..e782087 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -136,8 +136,8 @@ reference: - title: "OpenAI Batch API" desc: "Functions for managing batches on OpenAI's Batch API" contents: - - oai_batch_file_upload - - oai_batch_create + - oai_batch_upload + - oai_batch_start - oai_batch_status - oai_batch_list - oai_batch_cancel diff --git a/dev_docs/openai_batch_api.qmd b/dev_docs/openai_batch_api.qmd index 02b2926..8d1a733 100644 --- a/dev_docs/openai_batch_api.qmd +++ b/dev_docs/openai_batch_api.qmd @@ -91,7 +91,7 @@ xx <- test_df |> y ) -oai_batch_file_upload( +oai_batch_upload( xx ) @@ -123,9 +123,9 @@ embedding_rows <- test_df |> y ) -embedding_file <- oai_batch_file_upload(embedding_rows) +embedding_file <- oai_batch_upload(embedding_rows) -embedding_batch <- oai_batch_create(embedding_file$id, +embedding_batch <- oai_batch_start(embedding_file$id, endpoint = "/v1/embeddings") batch_jobs <- oai_batch_list() @@ -151,9 +151,9 @@ embedding_rows_id_string <- test_df |> y ) -embedding_file_id_string <- oai_batch_file_upload(embedding_rows_id_string) +embedding_file_id_string <- oai_batch_upload(embedding_rows_id_string) -embedding_batch_id_string <- oai_batch_create(embedding_file_id_string$id, +embedding_batch_id_string <- oai_batch_start(embedding_file_id_string$id, endpoint = "/v1/embeddings") embedding_batch_metadata <- oai_batch_status( @@ -180,12 +180,12 @@ ld_embed_rows <- test_df |> dimensions = 324 ) -ld_file <- oai_batch_file_upload( +ld_file <- oai_batch_upload( jsonl_rows = ld_embed_rows, purpose = "batch" ) -ld_batch_job <- oai_batch_create(ld_file$id, endpoint = "/v1/embeddings") +ld_batch_job <- oai_batch_start(ld_file$id, endpoint = "/v1/embeddings") oai_batch_status(ld_batch_job$id)[["status"]] ld_results <- oai_file_content(oai_batch_status(ld_batch_job$id)[["output_file_id"]]) @@ -225,14 +225,14 @@ completions_req <- oai_batch_build_completions_req( id = "id_1" ) -completions_file <- oai_batch_file_upload( +completions_file <- oai_batch_upload( completions_req ) ``` Do need to remember to fill in endpoint = "/v1/chat/completions" here instead of default arg for embeddings ```{r} -completions_batch <- oai_batch_create( +completions_batch <- oai_batch_start( completions_file$id, endpoint = "/v1/chat/completions" ) @@ -263,6 +263,8 @@ joke_schema <- create_json_schema( ) ) +joke_schema + completions_req_w_schema <- oai_batch_build_completions_req( input = "Tell me a joke about my country, the United Kingdom", id = "id_1", @@ -270,13 +272,54 @@ completions_req_w_schema <- oai_batch_build_completions_req( temperature = 1 ) -.file <- oai_batch_file_upload( +.file <- oai_batch_upload( completions_req_w_schema ) -.batch <- oai_batch_create(.file$id, endpoint = "/v1/chat/completions") +.batch <- oai_batch_start(.file$id, endpoint = "/v1/chat/completions") oai_batch_status(.batch$id)[["status"]] + +oai_batch_status(.batch$id)[["output_file_id"]] |> + oai_file_content() +oai_batch_status(.batch$id)[[""]] +``` + +To deal with a batch output that had strutured outputs, we 
need to first parse the batch, and then use the same approach we use elsewhere in Endpoint - may work on this as it still feels quite clunky +```{r} +.content <- oai_batch_status(.batch$id)[["output_file_id"]] |> + oai_file_content() + +parsed_batch <- oai_batch_parse_completions(.content) # custom_id, content, .error, .error_msg + +parsed_batch |> + dplyr::mutate( + parsed = purrr::map(content, + \(x) safely_from_json(x)) + ) |> + tidyr::unnest_wider(parsed) +``` + + +# Files API + +Write the generic oai_file_upload func and then call that it in oai_batch_upload +Rename oai_batch_file_upload --> oai_batch_upload(?) + +```{r} +tmp <- tempfile(fileext = ".jsonl") +writeLines("Hello!", tmp) +readLines(tmp) +file.path(tmp) + +test_upload <- oai_file_upload( + file = tmp, + purpose = "user_data" +) # file must be a file object + +file(tmp) ``` +```{r} +``` \ No newline at end of file diff --git a/man/oai_batch_cancel.Rd b/man/oai_batch_cancel.Rd index 20b0a7b..9340942 100644 --- a/man/oai_batch_cancel.Rd +++ b/man/oai_batch_cancel.Rd @@ -7,7 +7,7 @@ oai_batch_cancel(batch_id, key_name = "OPENAI_API_KEY") } \arguments{ -\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_create()} +\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_start()} \item{key_name}{Name of the environment variable containing your API key} } diff --git a/man/oai_batch_create.Rd b/man/oai_batch_start.Rd similarity index 87% rename from man/oai_batch_create.Rd rename to man/oai_batch_start.Rd index 3061b3b..592b67a 100644 --- a/man/oai_batch_create.Rd +++ b/man/oai_batch_start.Rd @@ -1,10 +1,10 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/openai_batch_api.R -\name{oai_batch_create} -\alias{oai_batch_create} +\name{oai_batch_start} +\alias{oai_batch_start} \title{Trigger a batch job to run on an uploaded file} \usage{ -oai_batch_create( +oai_batch_start( file_id, endpoint = c("/v1/embeddings", "/v1/chat/completions"), completion_window = "24h", @@ -13,7 +13,7 @@ oai_batch_create( ) } \arguments{ -\item{file_id}{File ID returned by oai_batch_file_upload()} +\item{file_id}{File ID returned by oai_batch_upload()} \item{endpoint}{The API endpoint path, e.g. 
/v1/embeddings}

@@ -38,8 +38,8 @@ Batch Job Ids start with "batch_", you'll receive a warning if you try to check
 }
 \examples{
 \dontrun{
-# After uploading a file with oai_batch_file_upload()
-batch_job <- oai_batch_create(
+# After uploading a file with oai_batch_upload()
+batch_job <- oai_batch_start(
   file_id = "file-abc123",
   endpoint = "/v1/embeddings"
 )
diff --git a/man/oai_batch_status.Rd b/man/oai_batch_status.Rd
index 0988a21..635b467 100644
--- a/man/oai_batch_status.Rd
+++ b/man/oai_batch_status.Rd
@@ -7,7 +7,7 @@
 oai_batch_status(batch_id, key_name = "OPENAI_API_KEY")
 }
 \arguments{
-\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_create()}
+\item{batch_id}{Batch identifier (starts with 'batch_'), returned by oai_batch_start()}

 \item{key_name}{Name of the environment variable containing your API key}
 }
diff --git a/man/oai_batch_file_upload.Rd b/man/oai_batch_upload.Rd
similarity index 73%
rename from man/oai_batch_file_upload.Rd
rename to man/oai_batch_upload.Rd
index 221c0f6..ba5e941 100644
--- a/man/oai_batch_file_upload.Rd
+++ b/man/oai_batch_upload.Rd
@@ -1,21 +1,24 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/openai_batch_api.R
-\name{oai_batch_file_upload}
-\alias{oai_batch_file_upload}
+\name{oai_batch_upload}
+\alias{oai_batch_upload}
 \title{Prepare and upload a file to be uploaded to the OpenAI Batch API}
 \usage{
-oai_batch_file_upload(
+oai_batch_upload(
   jsonl_rows,
+  purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"),
   key_name = "OPENAI_API_KEY",
-  purpose = "batch"
+  endpoint_url = "https://api.openai.com/v1/files"
 )
 }
 \arguments{
 \item{jsonl_rows}{Rows of valid JSON, output of an oai_batch_prepare* function}

+\item{purpose}{File purpose tag, e.g. 'batch', 'fine-tune'}
+
 \item{key_name}{Name of the environment variable containing your API key}

-\item{purpose}{File purpose tag, e.g. 'batch', 'fine-tune'}
+\item{endpoint_url}{OpenAI API endpoint URL (default: OpenAI's Files API V1)}
 }
 \value{
 Metadata for an upload to the OpenAI Files API
@@ -30,7 +33,7 @@ df <- data.frame(
   text = c("Hello world", "Goodbye world")
 )
 jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id)
-file_info <- oai_batch_file_upload(jsonl_content)
+file_info <- oai_batch_upload(jsonl_content)
 file_info$id # Use this ID to create a batch job
 }
 }
diff --git a/tests/testthat/test-oai_batch_api.R b/tests/testthat/test-openai_batch_api.R
similarity index 100%
rename from tests/testthat/test-oai_batch_api.R
rename to tests/testthat/test-openai_batch_api.R
diff --git a/tests/testthat/test-openai_files_api.R b/tests/testthat/test-openai_files_api.R
new file mode 100644
index 0000000..8ce10c1
--- /dev/null
+++ b/tests/testthat/test-openai_files_api.R
@@ -0,0 +1,15 @@
+test_that("oai_file_upload errors when given inappropriate inputs", {
+  expect_error(
+    oai_file_upload("tmp"),
+    "must be a file"
+  )
+
+  .tmp <- tempfile()
+  writeLines("Hello!", .tmp)
+
+  expect_error(
+    oai_file_upload(.tmp, purpose = "life"),
+    "should be one of"
+  )
+
+})
diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd
index f67fd09..2e76dbe 100644
--- a/vignettes/sync_async.Rmd
+++ b/vignettes/sync_async.Rmd
@@ -20,7 +20,7 @@ library(EndpointR)

 # Quickstart

-The Batch API workflow follows three stages: **prepare**, **submit**, and **retrieve**. Below are complete examples for embeddings and completions.
+The OpenAI Batch API workflow follows three stages: **prepare**, **submit**, and **retrieve**. 
Below are complete examples for embeddings and completions.

 ## Batch Embeddings

@@ -50,7 +50,7 @@ file_info$id
 #> "file-abc123..."

 # 4. Trigger the batch job
-batch_job <- oai_batch_create(
+batch_job <- oai_batch_start(
   file_id = file_info$id,
   endpoint = "/v1/embeddings"
 )
@@ -102,7 +102,7 @@ jsonl_content <- oai_batch_prepare_completions(

 # 3. Upload and trigger batch job
 file_info <- oai_batch_file_upload(jsonl_content)
-batch_job <- oai_batch_create(
+batch_job <- oai_batch_start(
   file_id = file_info$id,
   endpoint = "/v1/chat/completions"
 )
@@ -165,7 +165,7 @@ jsonl_content <- oai_batch_prepare_completions(

 # 4. Upload and trigger batch job
 file_info <- oai_batch_file_upload(jsonl_content)
-batch_job <- oai_batch_create(
+batch_job <- oai_batch_start(
   file_id = file_info$id,
   endpoint = "/v1/chat/completions"
 )
From 14fddf600b2952f70b802008c029cc25c1a8b661 Mon Sep 17 00:00:00 2001
From: jpcompartir
Date: Wed, 14 Jan 2026 12:23:06 +0000
Subject: [PATCH 37/39] change oai_batch_file_upload pointers in docs to
 oai_batch_upload export oai_batch_upload add examples and return for
 oai_file_upload

---
 NAMESPACE               |  1 +
 R/openai_batch_api.R    |  3 ++-
 R/openai_files_api.R    | 26 +++++++++++++++++++++++--
 man/oai_batch_upload.Rd |  3 ++-
 man/oai_file_delete.Rd  |  2 +-
 man/oai_file_list.Rd    |  2 +-
 man/oai_file_upload.Rd  | 43 +++++++++++++++++++++++++++++++++++++++++
 7 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/NAMESPACE b/NAMESPACE
index 60d99e3..4485b08 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -46,6 +46,7 @@ export(oai_embed_text)
 export(oai_file_content)
 export(oai_file_delete)
 export(oai_file_list)
+export(oai_file_upload)
 export(perform_requests_with_strategy)
 export(process_response)
 export(safely_from_json)
diff --git a/R/openai_batch_api.R b/R/openai_batch_api.R
index 5ea0b3f..246e8cf 100644
--- a/R/openai_batch_api.R
+++ b/R/openai_batch_api.R
@@ -231,7 +231,8 @@ oai_batch_prepare_completions <- function(df, text_var, id_var, model = "gpt-4o-
 #'
 #'
 #' @param jsonl_rows Rows of valid JSON, output of an oai_batch_prepare* function
-#' @param purpose File purpose tag, e.g. 'batch', 'fine-tune'
+#' @param purpose The intended purpose of the uploaded file. Must be one of
+#' "batch", "fine-tune", "assistants", "vision", "user_data", or "evals".
 #' @param key_name Name of the environment variable containing your API key
 #' @param endpoint_url OpenAI API endpoint URL (default: OpenAI's Files API V1)
 #'
diff --git a/R/openai_files_api.R b/R/openai_files_api.R
index 5e48999..fb03bb8 100644
--- a/R/openai_files_api.R
+++ b/R/openai_files_api.R
@@ -13,7 +13,7 @@
 #' @export
 #' @seealso [oai_file_content()] to retrieve file contents,
 #'   [oai_file_delete()] to remove files,
-#'   [oai_batch_file_upload()] to upload batch files
+#'   [oai_batch_upload()] to upload batch files
 #' @examples
 #' \dontrun{
 #' # List all batch files
@@ -39,6 +39,28 @@ oai_file_list <- function(purpose = c("batch", "fine-tune", "assistants", "visio
 }

+
+#' Upload a file to the OpenAI Files API
+#'
+#' @param file File object you wish to upload
+#' @param purpose The intended purpose of the uploaded file. Must be one of
+#' "batch", "fine-tune", "assistants", "vision", "user_data", or "evals".
+#' @param key_name Name of the environment variable containing your API key
+#' @param endpoint_url OpenAI API endpoint URL (default: OpenAI's Files API V1)
+#'
+#' @returns File upload status and metadata including id, purpose, filename, created_at etc. 
+#' @seealso \url{https://platform.openai.com/docs/api-reference/files?lang=curl} +#' @export +#' @examples +#' \dontrun{ +#' tmp <- tempfile(fileext = ".jsonl") +#' writeLines("Hello!", tmp) +#' oai_file_upload( +#' file = tmp, +#' purpose = "user_data" +#' ) +#' +#' } oai_file_upload <- function(file, purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), key_name = "OPENAI_API_KEY", endpoint_url = "https://api.openai.com/v1/files") { api_key <- get_api_key(key_name) @@ -72,7 +94,7 @@ oai_file_upload <- function(file, purpose = c("batch", "fine-tune", "assistants" #' deleted until the job completes. #' #' @param file_id File identifier (starts with 'file-'), returned by -#' [oai_batch_file_upload()] or [oai_file_list()] +#' [oai_batch_upload()] or [oai_file_list()] #' @param key_name Name of the environment variable containing your API key #' #' @returns A list containing the file id, object type, and deletion status diff --git a/man/oai_batch_upload.Rd b/man/oai_batch_upload.Rd index ba5e941..220348f 100644 --- a/man/oai_batch_upload.Rd +++ b/man/oai_batch_upload.Rd @@ -14,7 +14,8 @@ oai_batch_upload( \arguments{ \item{jsonl_rows}{Rows of valid JSON, output of an oai_batch_prepare* function} -\item{purpose}{File purpose tag, e.g. 'batch', 'fine-tune'} +\item{purpose}{The intended purpose of the uploaded file. Must be one of +"batch", "fine-tune", "assistants", "vision", "user_data", or "evals".} \item{key_name}{Name of the environment variable containing your API key} diff --git a/man/oai_file_delete.Rd b/man/oai_file_delete.Rd index 87b8d49..b552eba 100644 --- a/man/oai_file_delete.Rd +++ b/man/oai_file_delete.Rd @@ -8,7 +8,7 @@ oai_file_delete(file_id, key_name = "OPENAI_API_KEY") } \arguments{ \item{file_id}{File identifier (starts with 'file-'), returned by -\code{\link[=oai_batch_file_upload]{oai_batch_file_upload()}} or \code{\link[=oai_file_list]{oai_file_list()}}} +\code{\link[=oai_batch_upload]{oai_batch_upload()}} or \code{\link[=oai_file_list]{oai_file_list()}}} \item{key_name}{Name of the environment variable containing your API key} } diff --git a/man/oai_file_list.Rd b/man/oai_file_list.Rd index 1fcdf97..55c25d4 100644 --- a/man/oai_file_list.Rd +++ b/man/oai_file_list.Rd @@ -38,5 +38,5 @@ file_ids <- purrr::map_chr(batch_files$data, "id") \seealso{ \code{\link[=oai_file_content]{oai_file_content()}} to retrieve file contents, \code{\link[=oai_file_delete]{oai_file_delete()}} to remove files, -\code{\link[=oai_batch_file_upload]{oai_batch_file_upload()}} to upload batch files +\code{\link[=oai_batch_upload]{oai_batch_upload()}} to upload batch files } diff --git a/man/oai_file_upload.Rd b/man/oai_file_upload.Rd new file mode 100644 index 0000000..9386629 --- /dev/null +++ b/man/oai_file_upload.Rd @@ -0,0 +1,43 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/openai_files_api.R +\name{oai_file_upload} +\alias{oai_file_upload} +\title{Upload a file to the OpenAI Files API} +\usage{ +oai_file_upload( + file, + purpose = c("batch", "fine-tune", "assistants", "vision", "user_data", "evals"), + key_name = "OPENAI_API_KEY", + endpoint_url = "https://api.openai.com/v1/files" +) +} +\arguments{ +\item{file}{File object you wish to upload} + +\item{purpose}{The intended purpose of the uploaded file. 
Must be one of +"batch", "fine-tune", "assistants", "vision", "user_data", or "evals".} + +\item{key_name}{Name of the environment variable containing your API key} + +\item{endpoint_url}{OpenAI API endpoint URL (default: OpenAI's Files API V1)} +} +\value{ +File upload status and metadata inlcuding id, purpose, filename, created_at etc. +} +\description{ +Upload a file to the OpenAI Files API +} +\examples{ +\dontrun{ + tmp <- tempfile(fileext = ".jsonl") + writeLines("Hello!", tmp) + oai_file_upload( + file = tmp, + purpose = "user_data" +) + +} +} +\seealso{ +\url{https://platform.openai.com/docs/api-reference/files?lang=curl} +} From 95b4127bfa0b478d23c1488f6ca6368861fcbb8d Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Wed, 14 Jan 2026 12:31:28 +0000 Subject: [PATCH 38/39] add curl to dependencies (for file upload) catch a 'watch' -> 'each' typo in README add custom_id to globals fix error_message -> error_msg in HF vignette change .error_message in df_sentiment_classification_example to .error_msg and then overwrite package data and in df_embeddings_hf data add oai_file_upload to _pkgdown.yml --- DESCRIPTION | 3 ++- R/zzz.R | 2 +- README.Rmd | 2 +- README.md | 2 +- _pkgdown.yml | 1 + data/df_embeddings_hf.rda | Bin 11534 -> 11527 bytes data/df_sentiment_classification_example.rda | Bin 410 -> 408 bytes vignettes/hugging_face_inference.Rmd | 4 ++-- 8 files changed, 8 insertions(+), 6 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index d8147be..7916adc 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -35,7 +35,8 @@ Imports: tibble, S7, jsonvalidate, - arrow + arrow, + curl VignetteBuilder: knitr Depends: R (>= 3.5) diff --git a/R/zzz.R b/R/zzz.R index a925ff3..e6dfe96 100644 --- a/R/zzz.R +++ b/R/zzz.R @@ -1,5 +1,5 @@ utils::globalVariables(c(".embeddings", ".request", ".response", ".row_num", ".data", ".error", - ".error_msg", ".status", "original_index", "text", ":=", ".row_id", "id", "label", "score", "verbose")) + ".error_msg", ".status", "original_index", "text", ":=", ".row_id", "id", "label", "score", "verbose", "custom_id")) .onLoad <- function(...) { S7::methods_register() diff --git a/README.Rmd b/README.Rmd index 3ad13c6..554a507 100644 --- a/README.Rmd +++ b/README.Rmd @@ -276,7 +276,7 @@ Read the [LLM Providers Vignette](articles/llm_providers.html), and the [Structu - Read the [httr2 vignette](https://httr2.r-lib.org/articles/wrapping-apis.html#basics){target="_blank"} on managing your API keys securely and encrypting them. -- Read the [EndpointR API Keys](articles/api_keys.html) vignette for information on which API keys you need for wach endpoint we support, and how to securely import those API keys into your .Renvironfile. +- Read the [EndpointR API Keys](articles/api_keys.html) vignette for information on which API keys you need for each endpoint we support, and how to securely import those API keys into your .Renvironfile. # Batch Jobs diff --git a/README.md b/README.md index 7feefb4..3dc38b0 100644 --- a/README.md +++ b/README.md @@ -288,7 +288,7 @@ information on common workflows with the OpenAI Chat Completions API and encrypting them. - Read the [EndpointR API Keys](articles/api_keys.html) vignette for - information on which API keys you need for wach endpoint we support, + information on which API keys you need for each endpoint we support, and how to securely import those API keys into your .Renvironfile. 
# Batch Jobs diff --git a/_pkgdown.yml b/_pkgdown.yml index e782087..eca46bd 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -130,6 +130,7 @@ reference: desc: "Functions for uploading and managing files on OpenAI's Files API" contents: - oai_file_list + - oai_file_upload - oai_file_delete - oai_file_content diff --git a/data/df_embeddings_hf.rda b/data/df_embeddings_hf.rda index 66a4a80752245b30313066cd79e806c471afb234..6cfcdd59fdba650491daca48c497a6154ca10390 100644 GIT binary patch literal 11527 zcmb_>Wpo|8mgX@tGh@um5HmZbV`g?tF~`hopO~4MnVI1@W@bvvOfgd?_r7<#*YxyE z|Cw)Tl~h~Ox1qLFT2 z9%fk@`@;_bIsgDdun$1w6{4pP)xMfkWfuTt?Jf^010b{!15k^~#{mF?@5RkgunlUc z75;@ZC0u|UF`g1GbA-r{HQGFyoFMDHIeckuu6c|A074T&$k5Q7moen$L>dS#uS~gs z(m*!o+YB3TQAK6w;g0|Y0%ZwF^U?T{66iG!fWi>6bTFkI_s5bPa&COJp*s#c2xWwT zboTN>`s2tFHDpElXJsr*9vDs7%R{)3Q^3djqdoJMku z07`NmYvGo*JWv^nCZ_}iQZdMiYqWeP2YsHwFPbnXqqOLaHo=Z2!$etP41dAKq$#Jg z!cIt`G{G;vM3Egfox|$88GYfLGNoNnSSUl%?C*CCkv9!BAm;}W=@Of0*Z?ko6_j)U zY=(6n05F3BfKmj^0Js1se`EkISn@xSB?MRz0zkPxjLFP5xMfZ651xUfvE6(s?SBYV_fEUBI+x$w+lBkcGM!WIi+P@8CG73 zMR$;cla>KG!84yP(~Co%cDJbLEz{*W@`k2-r@PH!X{&sdc>)D4lDN+{>Skl&Ox)MI z;x|=sQh9k$*!|^1FwG6CqN8?q#19I@Llt1}j3<(-C&;mMs=>ImEs>e^e2y@(H!?WSp}wuYRdb?dtUKZ&IB?) zhlUq^ptboyP~}Y2uK1O2qoCU!td!(d=8O@QpxNyA_0sAERrJ8n($bPJ-EX}$CAV(% zI9mD}XzQD*k`Bb0ixXldlMR!T4{~j!*6=R)1Ey&{{rpNFkWgUt;^3TjQMD=1Bd+KS znL?m{D6(ruFV+8y2Rnr+!7=oG<^aP%Yx4Ge91kUgH^XMrN8HdT0SgRbZg77?@j-Vp z9%L_CX+ubN)UIL-jBiD#Dxf6blZk>YOCc{IE;6Zsy~2oq0&qu}g~_K;oA3vsQ<8Ry84J=D-;jVSI^!O_B~ z2?U}y1uGz0AyPQiUndMAu@6b_reyJL>GgGfHVxi=*z=I4_lMid80kvv8W@LD5jteBTqme zhnAy;S5*il%#@%1b+@jE72rr1wajbp08FUdS{kT#@hHKeAez> z)}9w{Njb&gvZAN*;Hs8ASE7+eu@PX)gNe4~QTc}FL^m|wsT64AoFTqQwN^AtgDF9G zl1j?ibQ&~pExJw+kA9FfLr@B%h98}74-tmBzuv)hkYptFy%OliSMvC%F|X zKKyP)v>N7yoq+mx4io976y(lu`WQ1p@Gb=3txhA|W}WsD^GwcUe}Mk78OYdKElT7RzdhbtSVWHXp1`qM~PBCc#Yec>sRcnJ!bc^fKbL zfoaBhGe)N^vR*hjq&&HK^@v!$f&^q77dc%xZe@q-ItsY$BP=lUn^Jtcj{SO~!AhS1 zPG5Su49XC(ni=b-`g8s|rWhJ`=WL#N19wB11YN5BHbN_uO(qg|43BH@5*u-4P=P%O z(|KNJ+0nS+uI|m4DZc#xmKo5L4#fdhQ@b+>Lw>kfmb&Wr$7d!_*ukb?Zj@BlpP~XK zhJ^0HF~l&XCsA-<__-Y#IK+8 z9FM?PMWWo)acM+6-_0gxR$^?v*LA|LQE+eQbbc91UJPV!!s%!s%5?2s5RI`{YY>cT zPXQkxN~0FnVo*WbG2@v+Q#TQ0OwLp<{O@+AVD`lCWOI>lf3UJh7WjHRU-Z|m{5PPq)c%3WKeGRSwYMJ>$*I<+ zLylVZJ@YR^7yXGMa&h8gNC-@c1GWS?G%VNU_tdSq$JOeoiHv=x9=IR`@)eO0=&fdqhU{Ld|%i|Z;jcCMr*C*rcKlH);4Fs%A;S8 z3IYQAqrfCJHHUwvF0wxvQRJZc06-yD>c|aF>;7669idq9?f8vV%`Wq_i|qBg7!are zzyY*gZWE!`%-&=l51CUxavD?UK||UHh+Jlv5k<^G!gx9C;TR-2y@)`)Mc4J9>*gp$ zH7Z?LzTzpCMioOp`_|#$!m7Q7XMI=aVCsAMspHsrwv}~60T<3-#I}Ab27m@YAbw^P zoKTgVc}O)Jms<$n>MhCZ&S-ZIlNy%n_P0kQ^S7j|>Jmm;9-y|4JeR91Ldx+DU$CBAB327-}FUL!5R#WxQ;i4swB$P1 zio6mR8ppT{Kpya?Kub`J)^e!H7{z9SHN&rpF0s0;1KZY!CS+%I9ug~mXxzLDo8|#b z&~HBVpcFJt-n%P{b4U_{@sk3yW@T1n3V*0?f#;)6Bp8TvqqNeC%FVG!s@ zITlycSqbKxcKQI%WX`f7Yt$qA-@KvF{g|p*Km?>V@~g?0YlsqkRe1mNJ_QQyqwOK@|Q5jo2RV=nFD% z4gG7(hU*q@-GEGttD1aoJwI+YQQdyIPa=$2!Q0F<0jeyz$_H_14J%bK`hBhrM)pUA zko4W-@#3lTOZ%c-`hG%w3$dkWn*D>~Wq7^r zFGj6R-<~z%aEfZS{z%;T_SWI6@^I0}H=6Bc*d`e3vfg6|k!Gr$V&yQ_qR2v5u?eS%1M@g-7zNlcQX=dS8 zMuC&vUBejuv2xzD0`>1mG?8kNcr4|rUh^x#Fx|mg!U(wI0tb@#6Nd^!%-^E(MI*sT zVQi;AtU)_EhTu1f!TL)WL2`3G!S$x^i2Biv(xL&s{7S1`pH|e6q0#2&>hZqrd28S& z$ylrRW!zNXTBRL*$96##=T99d1Vz-Oo?33C4~D=q6^{^KUW z+wLV`mnV17cA_hld9}@t_u$m_Z`eP0*ZrFUN>z?}p?!+EW!)*ub+aUzkjrYe;E=Cn z-EPXsu4O%IXWg!qfB26Umv4408;7{-cJ1H)r3?~~R+?lu?3eh{axP-k`oa0>`N@B3 zbaV6iZ`|G8Nnrz}{!!yuvTir;lBH+Y^8Y5E!u;EpKtL7tydu}UBDpB*mpu+P14l>m zEMH;PO`cral(W%98ijNY>&AwE;^6S=@4WAAxc&EAZ~b3p{^isEtgvm0Z;RD)-L1!{ zz-23E!=9HY8ocY&xp~p|Z*GJYxAhu9-}%x%-0uHvJpONr2u7?AHbf{oezX}fhCHoh 
zbo`A@BBFn+hQnCr>G%`xLg}UdAt?WwD`2M=H;)*s9*3O7sjKegl#{Iy+xCqMcfstb zLo+kuf7pLL+qG=+{+DO>-`EC(ZIeRzx%4g)mpf`(h?akW@Xt(!b z+vsn&c5ONQ!;OY^%12U%#XpoF$C)KZo%nw?2>x{aohOc)cx@N07jQu*WQXy4d_uot zw|&4pf1!Kz5B^~^+FOEF6}azNB>gliJT=z-)h)DBC;Dacmq@pBE*>Ug;)GG=Q z87c~fanz+`^H1-vb+qerzR#*)W8;@KAhK&0Vp398!h}cyDH|9fDk}OlqJZeYNE|5@ zNgWJTCJhjiDrA4e5DtF9Fbq%~mktS9yJ8pv^d|%`=Gsi42oRfgB?c~{R~3E4Fo;1I zOO`ztKd-E;-juvT7?0DI5k|v|98?kX4cmA~1E|Un6bF<7qDvX)QRWTdM+|950W_qf zpn>q6i>kpe+8{9u4Rh@%NnE-*(_zypZMq>0P?a>bB90miw8`a62^}b)l9CQo4=pMR z4Fh428|U6u;xdMUx2t#fP$d$(WM*8QA|rjeUnN6^l4;nBI%YP13o`?J_%9L%0sllpw$lFhE`YurHz;!Xn!m0N7N>5Tiy|2M;7N z*>ZFEMd-v5ve`m*@J-LPK+3K`d zii2iNN8Yh%r1nq`p-|h^ESaA!=d0C)h8(`T*^p?yXGH#xS{2j`^3eLF${Hz3K&RVYIy$7}?L|b3mxxMS_7X`&#-*NhK3vBFmoQ&M zHMYF#a?hT|FZ{DJ7aJl050{1qtD~fI<}XWXS-E!J@mNrtsy zG8T)}*Va!3+50$m6I5z+JcXDXkU#W4q$zIE2D*~(A{hh+HoT>PLpZ>WT0JC_1OLev%l7yrx;dKF9S-x_6H5MDOabfFGg$c z?%P7&KF8@N(*<|LDewY8Xgfu{A}Q-#Gh2)Z`S#6VZse<}C--7ipha#{EdJz>gm&^E+{@EkN49nx0iT!?;STJ zmf)k(a+`Ja>+Y$)n1?S?3RSzj8uSv;)ympB{o z*B9q94}Mb}ReX>r1gQ(C4XB2`XTju=%Tzk=?AbIb^WC=5J#N-)sAmf7J0TnGN2yOu ze{}EgLZf{+EFk(wjDU+)mY;Rez&H#gTTMfXJuEj9F!-a&uR&k;M7%>C@M*T8iMp3M zu$)x5Z_zMUS}gcZP6Ik`l&b!If=pBTy?1-m*Yiug0+Z&csUN%>?6&<8J=IobW2obR z27*fVb-K&m_vQzePDmy;@k)t)_>5xsWHVc_IS4vuZNC74@11by=Oaeb;{w*b#{ew1 z$WGm(zNnWSi9F7ek8^KbS)rZF?@~fvGiJ)T`Uew`@oDCj$pdh=%1oFc+M5PmiHPiX zf|^Kvhuuf}nHrYE6>(X11?e|^ZVr7G6m}L&7%TKdBG$rpAFKWFs=wKCgP|P&br>4| zn1w5_KP{T+vDwWLe$tQ5lT6zI43wn6SWSgQeNDLg6LpBn6j4v z^M}2P2B(lQseE-PlJD%NRWSn(7_DY8=Q#2|$c$7irSw^in6-^Rh9Pcz$PCXw3Y#eb zUX7}qHhRE@L`0zskh^|^%*0#(M|fH9^J4tZa^}bv!a# z;3njb<=g{^wEcNim@x=y?Sc2Cb~fR?2DM2UTCxt~BYOQpyjwhSpKekZk&)Ho0kBlLTLykgatwa^rtwpzN#OfDSLubj4GsoJ_2M_PbZxysv}>0qny=6c9K2v zuFUD7D{YE2@05>Ywp*(Nicq2Wx+e*wD2wwbfaBL{ojxH`((NZke( z+)phIlN%1@ZBijFpeaERp(_;OzXDlP&*TaQ3#B*ua+)NJ*hg69FzsG6KeQcDH6(a9 zf8X3$;{3VDh)Ov?p_b4G8)lm{wvUcAUCzIGky@QqcGY54eOGCt?w|kq8v2I(IbppD zPD8uODO($QE{jHQcdNtN@#ZsuK16M4tPIp*Ph5uKPt>h8_kpELpD#y^7ZG^jh3+I# zbw^9W`>RJiHxo=DjW2wtkuOO=1f6QeH>29ZdjwNLq2$k{8n=U3u&mvyH~p8Y1+1K) zaO-_TG$;vXYxYyI%$EcTv}qh4Qb^3yWA7!F@4>NQF0nbzur!11 z_DRE|kiNXmcVl8ikS=4AtZh;Z{_&#Bzo7=zvzpxZ-P)^2-BW8wSj#HLoAB?_>yKSh zuerFvzFzEEzrD>m9R~Q3tiPy4oDQ}2vWvcmp$3E+ss#8Vd{kEezW6!^N<*|z1oTtT z$G!TFFGe?|Zw0OA62ed=##XsdsNfKaHIrHyI&(BJ<4r z+Dm^!%StLkP#7$>?8`LA*4s0fHh2;j)L%|>sdorv@AS;4$!$KjxfBC_NqEEsEBTYF zG7*;?R8cjJdY|{;Q{MHU=%_n$GdO;ATcUqoOQP#}2Td%LQDzWsRRMe6#&KUXB(QGj zO1ue^b55BNuYSKko_5VXbWfjr=6J+k)32^V+pJLoY8gV7*%bR}hM@(_+zU12EthAG)iFWS$2V^EM$IV%rXoic_><jcTp%U^hc>!e9fsIM6{mIju~rYNt==sYK$!6|$qR&5YQO@4bt!5u#NdQPMv? 
zXG@syyM=n4YX9`g5&`ccvSOX;j*jiLq9hkm{cz?-sxh++*9qyv??m&|#4s#qNAmI; zd(w1BGdFbG+tst}V)C}qM=lYn2D@ptjI_11+^*IrRr**<-R1Cl6z=l5%4*&jB!2dp zPSnh%>8IiemqM`wM*@jEOw&a zTl%t$LThS-En;0)7I9m(qx*_@cWp5Dbm#Rp36UyY0iamYh?M8F4X~#=ydywM6zN`2EURXnS<%H7m2Esh&7ij&7;fA2wEhd0ixn=2fr?QaSGq_yEo z*nn78C8mg?*?Q}y_5EkjWsQz3h~394Ira4ZW9S6M+S^)#^RuAh)@>cOk8Y{?OveM& z(5y#?d_E3wH%VB_v-AmBrnzDfxM)MU6Xk)tYg#Z(Vn6)#p_&7g$aiPpjWjJ-O_^ggyS{M~_=xh8`RFF4UjMvN=$zo*f~;>3 zOX9xzi%m^r&GOg_?)k<6y;|4eN{jN(#bjX`ua)B;G!Ua&6gHc8>e`ZUvGU?&%58aa zPW?`=Q!_tN={|bW!pArDO?1*Y5?tO7c+m_BE3Clx3+Z%P+W0PNCEIDLs~wJ+DNj8$KB?;JV7&QAN$A144IF-wP}_O za-N--bCiB-R$xytWaJR7OHT*{IPcNWbCZe%?%H@b+^|1*)g}X6u0pqHElW+elv~9lp+L(EWIB?_5#JE^QiHY+O8Q!S>l$ zv?eJq>w04tIRFFC`Ehm zU=1)V&Da1MfBQboh46MF*lQN^3_Dk-Vg>u7EYuTYo;T%-qz87xYy>A-V(wR*Y5a^N zP3Y$NowX`Vrjsraq%nh%X+Z}8HMRDqiiprpF0=kcN5Uf<^nkAg&{OCEkbEMw05D43 zm|Bz#wFamnuO6R{$uuttI|#%8V!|=yU|{$op{$ceWHQ6X0dY`$ob3LD7(`wWR0S&d z9>ES`3dV>i0Oi2Ragp{g&L1;i{`dhz$G{MU7PT|LNlfgWY)Y=QuAY|~wOO#_9_Cud)ceifH_2iMZJ>Ps=s;WJKZ-$mi9A@cygSn!SZ7dY=j}gB^K86DSfgo2@a|ATP8nZ#dj3jDSjsD3XzNg`xYNrxa4Dj z=u9IM9Mtv0O`JUIvVGTMRb-PUVfo$oATlO=L0mfP8wP^IP*2Y|amDPQEFx5(mx5Le zGWyl`Q3Bq8VhLsiG=&!>y-;iKySvnZ)=!vLnysja1njDmdwf%jKdQYZI=Glq!5Z{p zaPF2sD8gn<84l0It7WUHV%3=&2#KL*1fk=EP!1h9UAUV1QL={`6@KgY@&>C1m9= z!$)AX+7|EAEaOVaehT7z;C>j;h^=FxNM<5}nFoI|md)O8M8Cq+MAdwItCu-V$sObQ zEa%;wi(5Qdmq-|Tvp@F10jym6+@jkgt5nqAYa9Kg_`=$Kb)%(*$w^5jIu zgxh~PFn4G3lyTp;xdippO!rp~O)#+m>?mIa#VXpH7JBTPp9sCVx|G_$M|sqGWEPL0 z5{(4irs>@BnpH^pujL;HKg{=*0S=)WL>%aXlaneX<99>@x%1pTe5CRCeMe< z+s#U+Gjj;;O`9h-K#$~4B6s63p;;-&w}dq**Vc2dE47}+r0m6GT=UBVv0kL=IiXWw z8hgj=bz*Gmi1aicWZ8-3p%z$`A?m4fB+#_$ndx2YIu}cNi=<)^V^)h*YQO{I zL9{Y**K{q&D|<4j_&|t=tjVuZKeBFs-eyKqmTM~`LHk!Z7~&v)_%n2+h6>YCG-$UX z$TK^U+n|ZF)o}Y(iMz;tC&|tNeum&Bc{s4>ce0y-nehD~$j-VW2H0VAj9sv$lC%;S zy*m$VbKKNfEJCOUZ>=y%G~Mtl)jkMS#Gj*nLeKsNQ3%6Rql>1$b$X|xRy@h#Km2p- zoYFXMv%4ebd(xH2vU@|=>w1&BUx(+jmj`EXI<3#OdAsL;id3V{l#PCuvUGZ=dfb?OPF5S5jJ^DSbzj~cXx;2?hq`vyF1@`&wcNApPHKL z?y0AzyQ_cnOoU#n9-6qG4Ii7BF15O@w1lQvJQ@S|A_D*bAa4RNE&^I`N;W@vqe1>L zcy<7wnoE=_UC{dXxIloyGB5fv0DyYa1rQ+k%eMjmWjNIk+ieFfS! 
z1Z>8Ng^d(ylV)bh2}qX~1%_BmV9okkDCPNJ^+RU?6Yb1J?Xs{GmDp*14iyRsVMzyn zFSt!jq{t4Yo{9gtVrP3qBAQMu?+jWolU@*!kft34s)%7oD{*piVtoqY;!sygaHPp) zD?Oy7t3`!R;}TS#zn!B_oRgfuk}2~e2j{ER*<}$>5~$B`535MX&e^F<+bEKms)*(I za?Y>FZO1eeDIsan<$t`EfuGBY@~q?oV-yz%U_@J!OQZ0XhqxZWGB$aW);LV7&N{T7A}6WjAhZza{l#^F_D%QT|!SqseP$`O{c(@pi7EPu)IN$O25y zdxwoDf3=Cgu;pFWhc&EFd-FO~IZ=jSY;i;Fmj#Q8JQ83U*t%^R>&}b&9i$m(k?bqe zPn-r_4fqV2JQf^e=V%gKb0Y9v#}*y-PV@D(8K8MxDI;~Z=}QiiT*pBvg>>ed z=`IVgt7t$Crr;=;Bwc#;ltIRvVMmu?ULuH@t`TP#Iqw3#K6W{6Nu-`w5gaERM$57) zgh?GMDEjA@9ncu7RL}4r3_jJYh_TQ;DflVCsbz$Apm^=pZQTwBwx)G+C`w28q{?Sw z)8FG3Tt^(~*RFUusYP}xah{aDta3a%_v&-r_WIGXpx$_Jv^11^=;HC?-q;Se-s@xi z4gq>U2O~EULkt!J8Z!p~oMsIsNA$)dB|Q-_2PX9dqlJbh;004*iXtY+Vicszf4yf2 z-#=cjyds1@D;hsKE1t8Ue`u_!P!dYf2(?!LQ`;_659Hg%1J1`Gop5c{Ti>9Yzp zC!(NFw806@#PDHkC`h8yL3u=aRYyw>JOrDU_|i^H{L*`t?M_Nw=72+OXe&!19t>Bd z=uYOLhW(y8-qiI`_f;j2b#ELwn>hGQ+Z-PE8jm%pogqSm?18f-GB!38^dE77rJ>m1fd{f9DhS#RKLO z#7!Jl$>Vq=%=pSF`c5frLYG{@R%i8?+FG!K(^|*NPL7KYYS4VG%ERaE`}%}Aba?zp z9g@+&h6ck(jD%?`d-g8WJ#VM_Jg1#Octi~a4Nbh?GdyrJIVP0cv@XCzOBTj(w$SAV zw1PEwyx;n1+?KozPW$5$$fgDtGEuRxpU{z+BoSQ@DJ9+4T0%GVk(SotiRz;!o;K1sj^Weq2xY2_^1w5iGE?4ZhT|PZ9kT zciAc2aV)$@6%0s|TsKNwyca}S(yn_=nTCSJ0AMg)RBcEn*gvV>P_`chObCT41P6_^ z@GwlI55a(PLz);?2nG|R2#ofbtCp$f9A%Up>I2WMPjS+Xf%N>J8ETLMY9n>gTSI{n z9Kp7*9KgctsZ12S*E{tMz4o9;GC&Z3Tzr3P*G&#fLU;MfsAv{2?PFI{YB__VkHiO7 z!GtCZ)+Z~d(f6A2JWMadv3P4h`baOEpua91X(cAn{Z5S`Yr*@F)0R-Lg!Zg9)q}zc zVC_l7{v-rWkW&mPrC`+&GpV(RtRTx9WS1f<$AQH*!GJ*lps+&oWt@Ke^fiyz#OT

IJ7dEpUj>$Cl?$M}LnSqp!bVGv;8$ZFF27Wd}|Ny6vdxG!f*YPv02y_F|? zdv_@I1b!78cRtv-r+*x;KZM5&pP%Lh9Blc^0%-}sC1!BthN;0}eS|2!781TgM*^n{ z)Ll38Y-N*2C>FZ9IoXGHNTdxV0f{1KR^d2Zvop^#SZryMv-O?@GU{ zl$V(Ff(|PWFR;-sr#2Jp+g%#+OPtc@NM+-ucu2xeI++^ofP)rBmC%ilRSwwC0b z?NH3Gd_%3`)tW0kw@G=_BCeO^Ad{ficf@VE7W@WfPeFRRGF&6|oQZkXQnvcZH!;O& zqy=wJ=|wp`@yETiWAz5cmg%PqB8EnKiUdy{a2E~Fok~Q;wq&Yu&g%L*HLhkDkN0SJ ztpUZ=_Wel)^zJ-LqYKUNxgo$mVPQ^&z_{!uC&YpdPxHlx_!^Ow|FY?G&JIkW#O^xp zj5x$}2Bh{&zEr$)jlM~|dj7?Q5Px_2QOo_but$Yn=!I4)XsC&9O@g0TjBDRncA+N+ zo~3b($OL!{FPe1N%^@&7#A;-QDPWy8^an;i;Tp02$lvqU2Ivg_-KnZV-@~*dfuxN> z&b4o9nat`#OFVCJohzf=nU&2GzmjyxyP7fmxZjBD<@`dxW~d;?_N3+XIgz2qX&{LG zOE~r0Xt{3S`qj4(qq$O&SeUImFYavtP#j7)ZT)egzxCHB?!tt1->%S!*$F=ISOy*}jC8fm)y&8Ynf+O3A;1WZE*+NM1XONwW!3q*!7?IJdxQ zXza;*$x{mNqR*{PaW*@$YC*OEQm+d2uEWz1XRTT|ZXO+F&dt)CKr-z%WNINvFv4Em zXSyvEnL1f-E$9-4K=48>y4Vg*{IY^+l>vh)=sE_le1j~+GZ(kV1mY#i8|8R~@crEGul9Csgz{#k*d7zH-At zK6XVM7yIP9@-J39H%|Z8$9#eu7qeTGW7C>7v9oQybPNHfO`;MS3BM}rzh22iv5>4a zo$PMb;og#P_?bEy_9UAjm3B+X|Dw6M;b^W{`M=&!_tPDll@ib?rO5s#Bt$scBE?j8 zi;)Ca80(_A%k^=W^A)x;^>@@cQ+I=8W`owRd#eHhWdG&Ee+;`CG~2HJ$7QB~U!o}* zRw5ocdZGau5;Oh~@%BSzwf&wi4b)tE%;1h|%bF99U+40FxljV|s~$raZSMaeza(y< zQgJdht-59WQLPuk2 znG`Xa0`@&*-pu%P5w?DP(_T9B63Co9aBE#Vb#VA!CW)6)(`r%T3mS!}Ylb6t(yGIx zD^-G`HK{_|^#)AJ?jzWA)Z^k{<8e~t{p{{vJm^pCU;Z!uGOIjz6^bK6jw&&+79B@~ z?H}pi+R>oTcvmjI^;`ysyj~l`ed@j@yhvL57wrQ7S2E<)ZtLaUmdvh9e)pr*kwi5N z5ZFa))p7gQ;VHoD(;V3O{$A&Z=s{n%@*i~mZEm0b@6`y=R~<&H|H<01;J*#@f4%ch z3g^=#}-?q(|s?6vQZb|fnBH<@(XBx4_#S~N@)DVCZ;M0s`NJ@-Gy z?avO2mX%;4U!GYxF;R`mB~0M5XPn;q*}aw>eLYFZY_(#W6WxjNvq++#?k8hWj{@yc zBBU>@FlrxaJv>`7Mu>R+{PM0*4`vR--kI?6Js7~?d=$Wl*cqN6rxtw(8HB!uL$Fgf zWRT55N(}@D(lDciu;H))Bl|UsvPv~2L7JLU5HNiSBwk|m$fmB=q&_w;hnAL2%>rar zD@`RSrG<7WQlH%-69tGQs4pvGt-i!@HC$BdO8&YeM zk0DK8ix^opnC(L+~6=F#`Vii*O*L1_w*H1&mR!*aCzeCZ}ux!H59{L=n= zx%R0v{^Aw+?@I?{aE-`WF!?tt;(R|i3fFn?+jaHWjS6!P-Fg*7#K&SF=|oq1STgdR zLt~ye;4Jpr3%d+giF;lXib#RQcWaBnk+S%#;UG?@Zv+7XB3crZ0=N(k-~n`;V5DaI zDV?pqW55u%Z+{p2FZL7NIBSa^t)(Jul`^F_4;hgewN8?&Y+bJ{<8(!hTLY4X-^Un0whxGfry<;*t)G!Rm>O8}@I$Zax2GR-BuV^bVP# zM2?kUjqym_`?6R7m+<*=8hNA~tXroOQ1_S>62Va)X)KO>308N#Fjzf&T*eKWQ}b43 z;D8}K@P>+X=Ykv6ej@Ywjud?Lou5;uxk+AL-GUU2|BuO=etESa&-Zz8kR%(NGrBp~ zn-Rny#@B*P#%h&z`sAh*KG577@<}aj@0ReaE=cbzbEh?9HrW9rW|ut! 
zPd>I)v4yE>yjxN4ekXyiy};HYqMgd^Zfy(*8<~D9dHEH8RhRi$(10KR#JeGnw?=U% z?iV8dkw#*}3!>G#NXU>nxGvWgiYO^(a5>+-E+j~z=(9f*)z#W<95hiQ<4XVg%t1bF z>h#aAsU7UB1}{jZ&lF&$Xrt>c8n*&%9_g1S9cf3i76K`ufgf57@|z6h7|`U&DgN=~ zGY~m7qU5}MjIlX*@*Nm-minqsKKdH(1MjiV2F)n5oK*g&SARy8jHz47)q4V;h#frT z`9`iHjvl%n=t!Ati^wzv4j`z6KG|M?dA4U<%>0{ zhEqY^#^kr29R{61A@GX#NRb-#jY+?RoD|kiUOTo9dS5@eY6W9*^~<8Q`Gsx01>4L_ zW01XH|0jK)YW+ef+{|I>l7_CRRyWIEmOvqWDHM;tsIbq0H_;L=Ms-f-#O@9+zlbc7 zCnikERSXNcotb(LaxoO?n|?-Y*X8@v_?iy9xw#8d8%9`PHi}tzYy%{p8RyCaC;p<_a+8#T|AUkc`aknqN`p7W?}rLgeV# z?E1%Kd6>vs%}V&5EEq zJ>wfV#>xB^=g=jp3T9~YIk%!DaH!8Z-Kn&P#@+k6W;cIIWD z_I{7dQ`uJRJ0u;xu2HJFx=eaK~erzWsw^CqLRH;1vcd>6L1Dn47r`F9jUa+qY zh{-rakCx$o*OaYN1qN6`lXR&cdQ3*DoXp7xE_R62cp}=fFc14FExh^m&}&tj z)We;)R8eW~$osj@HUn>&QDY`3F@4gELJlm|*V~ncN`2;fJ$Cl#n!K>>zf7q9tm-G< zHs1w@qQ7`WMw2sE%)8Z!IONH+ks#z;#ua0+$bM_X-}%!+?~3yra?jc8k}}@Z*T#Ix2bX46EAiX?_<~ zGs#Y)b*+%ju&Q5WFl81qQjnu$?vAM0)g?=Zm?j`wh@PWCNsLypl=Icl}dG+a=YwUHh1e{=#+`5r&SLV=NXKK=xdwdPPDv^(o{ccB9ZgQj?P3d)C zT(RrhQF>(I*}f14Cr%&6Jv?BEBx?+Y0&G)Y-*{s?)W5wUbJG@8sf*kVgD|2_iQ4_+ zaWkFC*~0?k)mY^mytDZfIl?vKj5$Qcw4<{l^F+=+TKn&h z@4@oYv^Fg28SjUqL&3$3fmH$HB|iq8@Z0xv!~3SkPKvt@IX>$0&3w#R#e09h1Y!j~ zACxdRxHEQq%e;I!h*HhlZ~2(CER*VF|IInEh2UgoP~ zUkCKe(CK4k)PFyW=k92=` z?z01yu2r24s@1v7NMS&)R>mIZ?QhNF^ZQIZs1cT73tkXmxbj^Zqogh4!nS+gapedR zX-2;NS`u43%3+5uSgT+fru;?t%X19Fyoe~``M!nn5)4_mA$P^C@4J*SrJZvKwDJ_2 z63D_Tu$@0E<(E6eL^c4ZY+?HwDfa1Z)nU{JRIk#(^v8lJ9a9&lJdp2EK>^UBiF7;F=3B_(kpZ7M}t z&Gu*Mucpt|CrwCd$=c)KkZ!Gg_1A)fF{0bZtt5jwjIJg&_G^(&N_IL7{>FU&wiuvP zc+uwitYVD<0(XPK`6N#a*v)%3d^Ei0lcIa}{(24R_L2CZV+s#HL%66;enuKxA!%%2 z2HCwLgv}5&kc$6NE~@d#Qf`6mN-Jlawzg+8U)*QevSEn0Dj9HFUyt4Ey5G^4GyApB z^ftvjNJ=i)f8e>o`mivyZ8Wv^E`a03GX4C(?BdV!a^M1f^2v^HByr1Vnx}0q+S9c< z8oOyMJF2POf6&p}rYQbazU^RmmN@U<(EdoeJnNm>iNmrryU0V>tM$@tVoe%zuz%%r zE)ag<*LkMiWL+NH=_D}F8gaa((-^PdMZiFqB$3@j@8&%~(L3Tu-zPv6apG{jn77DVu*Yi2 zozUOkV|z7h?x_Mgq~fi_T!vnDxn;keR`7&&B$h4d&5SzI=+^uU&i1Be^J0xU0~c?i zI%&*hkj>)pYCSurbj^JwU?PtB6xOT?>1GyCHNlNwd_qDet*SV{tAcuIUh`n6lCM+g zEsHUO+?!UnoD{R=&s+5EM!l}?`metSv?I)l7PkCeHk_1vK3$(&h$p5@NmS)m+ z?XwQSUx^D3Wz=$Ijnx@VmJ+eU}QGiUw$}479oItKSy*o=@29 zj?)F1ZvCQ}F4R57-52>c5QJjTVg!)4&Hl2S-1})>!48zTouvv9WHOlYafkV~>-^<+ z5I!3P86mGUCMtY)e4&vR@fk8V{QCULpFs@(51aB(lNinWg%BTz*8yprb`JMshu+hhyo#q z&HF5Ax{Vw_CEG7ou=KO_S{{U$5hS{G(Tq9)_+0R$2vX- zF339@j|55h9)5`xwSr7;u0}L(2jktd4q{`B>dXWKR3{=aek`AUh!vX9HqarBdp{dI zkG_AUKD-^*&ommp#%1Qu%qfXuTA$e+D$6qp{Suc*jr?a27|>Lt5ZZ}s4EDK>cs%Aa zaGqjem*R~piY~lXi=<*n{czMw6s$J`3pF7zTLpfLPI6nvbDcK0O%ur)W`miVayPm%|*M zRLL@l+*kOgxMFnwrM?)K=(BXx$Eyn(4B1yXkd+i|tK5U#DSTRXtJfI4m0)-TFYnK+ zG0FrP9{$no3ZaQ_+jGXX3S@gh2=LOFI#y`(z%bl!Kj2p_Rlzc)uQZco*DLO*`#wM8 z%Goe2+1-PParSN(1A~ZxC*a9)Jf}Nox%n8Z5~i6!j?tLo7I8YmXb(a_u+pI5Z|R@% zae*%iVZPC@KV%SCow39cNN=bCi{PKNS1jTJGf?H%@#j|5-m%3G*v?E-Y2-7BkCAAL z-EK~GG_UQFN6WDG3K`XesB&kEl4irWu+zaT0XnodKS>aZrTRHXh8-@fys`K7o13&T z>%7H_X3I;i8y=E6iD$LTH3FSgHe6eX+syc@Wjf9-dvs3=3D*NzHm_>HZFE@s9Svrm#pERUMlGY6Wq0?j_BW$7s|=XApM|eO}lPw zLw|oxKjfq@6w$^cg_dpW&iHPK{?+0Y8Cj4yhfYQ~Qnb<~PUhO7&5{m7VpT}2UTn8! 
z0HFu^Oc_Zg7f!*RU0oyiZ{5^JX^myP+lTf2DK6n9*z7^dkV9s16NFLZ9Sx1r#6?%7#pQazAY1U^`Cbw0{H^9cOhwJ(Hz z!(MlxS<%|fPZQT)oi@v2x@6(tjz?&xzp(3Q)|zf5(8e)7A`$=|R6@u~O8g^s)lIll zO~L!N>Fv`sSnJAstO_GA^~5a6F^!U%$#^U2-IW(b`yriG8E5P{;94~la9lHK;&eK8 z?@c&2hiLdtCiy6=9kO}zF57Ot@O+TVFN?7+=+H_uhcI*=o z0{SN85)h!JL5MJD;wG5oCMNDH#L0*eqGk+8;k>{5m-;)cUe@$ejmk0mvXRNn)*^)7 zo05<05}B&YUEJ0FTGQ)$FZ)!Urx6qqky>;4eVXOGPvr+U9T=6H%1n$e#w=f~f}%kN zbs%_amwo4KtF=;u22tM7FbySW88~}LSL7pK#5-HUU#u_+nTMPW`y?Np0c-QL;Z?nq z^+L-z@N0LYgR@uXm(`s#LKWluKAm9!Ld>;LjE6`uHsPR`gL-1IY80J}tcAQ(Mf6JX>s+jo9VMz?#DqWk-X$-BgsY}d1 zj)|2+EKe&l3iS77&r0eAtu}IOnpq?LQ2=+cdU)VJ>Pi=6u0K*Y1>$yqkRsCM^rk=$ zB{7~~>VAnUWz%DPP6A5%2l_KLsyj@Cd+7*G3J_6ZKrq#?C=}t;Lm$_K8=W$7NxT+) zSxCQ0PAw#a^$QYh8EuVbEz(GZ0h?!M@G$9ar2w8K#4pD)VDl3oo^6cDdBGO|_*2$6ydik1Rm zhe%TbfgvTe(qwgvYS__QNd0P9?Y2;0pjjzMCl8VrQVNQq2r5ZmRfLPegU~Y|mLVbb zYQm{WNdZIR%|V*~#DR@RTO$c8Dk%j3LM4Q^eovq1Zt%T0xLQ;}eKEF@(T?t3e&lES zsq=dGWt9cz?~&(R?iJNj#nxVXL)=<#z$wa(m^0(h$Q5t7L`tT-?Y^H$#hAg((oBYj;%@79_O>&5yH5d zoceG3AFlX3(wK2ltHi|Q_iSLfeSa6aJ=LKFNSdm-EQjNKDvZ79DN8n@R6yTO?CgzY z@WI^O`3;`RgRoZTGbcLzuU=VpvLk}e;8C1{FAa(hvGTLRq&e@r3i5q*``IjALbF0k z?j!4%F;XseO8WJ=(WSUy;tR^6ibfnOC=Xr;d5kaUC4C8*u)d#F{_=sRwPAhUlHL0F zM;XyRT|bNku|1|{9VMNCst_~+U^CzLXTPp&kL1U>=!(7IUfY8z!%3{C1TAFf;Pu-K zN%Vh6`oKnUA14Mz1ctRjkZV{MI|-7~`2x3c96drw8b3hQGj-`36}gq!C-6zia~lKQ z_r*F2wrz1gQL)DA5*xweJcf@Ob;*Hp4q%T_Ow_!kffHJb-F;Hm*Y4oh!e%G-A!ucz z;Niro@oTSDq>R8tuv=2t@vh?zAH&*Sj?==z1k+%=oRa7K8hh#F*yqhviF_f9>d&%S zU*SiaVxIX1q*=z2Q(Fn5F9$R(aonHoxeq%j$lU=;J8w z;rL)A$e0S0^nb4e>zQC+U*eiBP|-FnCVS9gc5m~2{1x?e z>^6{cTv*H4WceDr?$m?@%=CHgYdt*nNWvv47K+iU#+8iFW@iUdi*^_92x~ifUk&vS z=i{MOqm2{PQH37A)g&yj79hvPL4(;lR02NBqoF{J0lR-g;mAm5o-2+@8Vd zn62Y?+Tc$25{|%E`TXR1Jr^AR9_z794_V?%r}j#F@tOjsG93W@%8SfQ_8duA2t51A zSq$DwQ~yxQpEle1SN%?%XG><6;!UxjE3$S z(M7bsNC4{}0x^-e&if|y?D~fV@b1?RMy4Z@d)ZYO!tX*pk|ZnwKaA^K_SmphFS3e| zRYPo~?>@jgC(4GxE5hYn(P*kp-vaYu%9MPGTDR1b0;wY|y9*}Ygj=lm)*2SjCh$4y z81!ia>rHXrA8oYuCuH&R>G*Y?{5D7|f*%^SSqe&-ygxcerDD~Khk=SY$W<{??TTJW z+?}1=GY9nMbv|2v8-J^kr`qjy)@k>(pIj>%BDR1rC)mpn((=gtSoSa!&v6of8UG;k#AvRi%m<1 z^Gq}MZJ?xM9H4(E{=Le1q6a~7>|ofh3l_tQ;PN_0;~*=O-xT))oLt15l(Rakxb~GKRJwh zrGh)1`dUGLAmu;xjIrOn3;V<|wWTrT z9fhuLRKnh@&D@r-St!~`Vc#hjT_16x)R_F8a2QVOVHGvKclIaK%6fFV1Vt>y# z12a}NimNpvA85TIl-xR@R;R8F1~mf!*nvus8s_hslWEriJjc{>he{J;_zP$2ga)!z zl!o%58?ur9^KH$G_`ao6np&Bpr6OHj<=^8rMnz<`&%1?%hbFqsACzbqYzK=+tued3 xZWz@I!FLa*#?w10kHWjtKEBS3UT#REf3{|Z0k{`GKDN(gi;*x9hwUmv{~rLaM@awx diff --git a/data/df_sentiment_classification_example.rda b/data/df_sentiment_classification_example.rda index 8bcb1a49257351edff5d7ac30ee94fa8cafce95a..2b533f62371cfc620ae994ba67eb0652853b8e9b 100644 GIT binary patch delta 381 zcmV-@0fPRT1DFF2LRx4!F+o`-Q&~p`+HR2!CV!@+@?>oyZAM0#FcU^5o|!|`+C3?N zdYT~BJv0$EiL@kon?W=*Fn}1E0LhaR5M+`x8UWA$0000Q02%-*_gU`@3;~H~s~I_D z@u|UL0x|+spbCiyc(XePxvq{z)N^Qe&vBh@f zhkxKKgo+>#gpz@0j=!?~uRW-{Juq{pTgNAJGqO#B^3EITl0zLE!#~|pf#dz~U z#SkPPd%-l54D75v=C+>BuyF&;Qw=}aUS^6wo@6})G;}d3$c3l+Fqp%lvprH%M$`!c zr6;DQdQ<>YSO~~PUx?d8L83|_HnH)gK}En#5r^Fp!G+tP@8v?MpyqKk>6)l?Ke}r| b|Jk}p(ok@7bsz!!4gN0Vig2MIjt{illi;Qn delta 383 zcmV-_0f7FP1DXR4LRx4!F+o`-Q(3ixV;(9QypsFcv6=nviDi>GibVI>@ zkHf(*z7B_mg;-J2%EGJswDyIAL@za9G-#@}=L$QW9)aGBxMo ziN?@{q`OVlfXBE+p(V>}eNqo6t4(ioWo;UosPa*9!9tpewP@nn#O d{=s`%ILXFPVA05c2W=nX?ntK!5*Dy*WPr?*qBsBm diff --git a/vignettes/hugging_face_inference.Rmd b/vignettes/hugging_face_inference.Rmd index f985801..17ae967 100644 --- a/vignettes/hugging_face_inference.Rmd +++ b/vignettes/hugging_face_inference.Rmd @@ -370,7 +370,7 @@ embedding_result |> count(.error) # View any failures 
(column names match your original data frame) failures <- embedding_result |> filter(.error == TRUE) |> - select(id, .error_message) + select(id, .error_msg) # Extract just the embeddings for successful rows embeddings_only <- embedding_result |> @@ -416,7 +416,7 @@ The result includes: - Your original ID and text columns (with their original names preserved) - Classification labels (e.g., POSITIVE, NEGATIVE) - Confidence scores -- Error tracking columns (`.error`, `.error_message`) +- Error tracking columns (`.error`, `.error_msg`) - Chunk tracking (`.chunk`) > **NOTE**: Classification labels are model and task specific. Check the model card on Hugging Face for label mappings. From 5f51275fd16d6aa254dc46907fdf6dac09991817 Mon Sep 17 00:00:00 2001 From: jpcompartir Date: Wed, 14 Jan 2026 14:38:00 +0000 Subject: [PATCH 39/39] re-work, re-organise and reformat the sync_async vignette add (batch) to vignette name --- _pkgdown.yml | 2 +- vignettes/sync_async.Rmd | 116 ++++++++++++++++++--------------------- 2 files changed, 53 insertions(+), 65 deletions(-) diff --git a/_pkgdown.yml b/_pkgdown.yml index eca46bd..a016c5d 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -48,7 +48,7 @@ navbar: - text: Advanced Topics - text: Improving Performance href: articles/improving_performance.html - - text: Synchronous vs Asynchronous APIs + - text: Synchronous vs Asynchronous (Batch) APIs href: articles/sync_async.html diff --git a/vignettes/sync_async.Rmd b/vignettes/sync_async.Rmd index 2e76dbe..7f4b2ac 100644 --- a/vignettes/sync_async.Rmd +++ b/vignettes/sync_async.Rmd @@ -1,5 +1,5 @@ --- -title: "Synchronous vs Asynchronous APIs" +title: "Synchronous vs Asynchronous/Batch APIs" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Synchronous vs Asynchronous APIs} @@ -14,10 +14,24 @@ knitr::opts_chunk$set( ) ``` -```{r setup} +```{r setup} library(EndpointR) ``` +# Introduction + +Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a HTTP request, wait a second or two and receive a response. + +However, data scientists often need to process an entire data frame, resulting in thousands or millions of HTTP requests. This is inefficient because: + +1. Cost - Providers don't offer discounts for these requests +2. Session Blocking - Our coding consoles get blocked for hours at a time +3. Rate Limits - Providers enforce stricter rate limits on these APIs + +A solution to these problems is to use providers' 'Batch APIs' which offer asynchronous results. These often come with a 50% discount and higher rate limits, with a guarantee of results within a time frame, e.g. 24 hours. + +> **TIP**: It's worth noting that the results are often ready much faster, consider checking in 1-2 hours after triggering the batch. + # Quickstart The OpenAI Batch API workflow follows three stages: **prepare**, **submit**, and **retrieve**. Below are complete examples for embeddings and completions. @@ -154,13 +168,13 @@ df <- data.frame( # 3. 
Prepare requests with schema jsonl_content <- oai_batch_prepare_completions( - df, - text_var = text, - id_var = id, - model = "gpt-4o-mini", - system_prompt = "Analyse the sentiment of the following text.", - schema = sentiment_schema, - temperature = 0 + df, + text_var = text, + id_var = id, + model = "gpt-4o-mini", + system_prompt = "Analyse the sentiment of the following text.", + schema = sentiment_schema, + temperature = 0 ) # 4. Upload and trigger batch job @@ -195,86 +209,60 @@ results_df |> #> 3 review_3 neutral 0.78 FALSE NA ``` -# Introduction - -Most of EndpointR's integrations are with synchronous APIs such as [Completions](https://platform.openai.com/docs/api-reference/completions) by OpenAI, Hugging Face's [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/en/index), and Messages by [Anthropic](https://platform.claude.com/docs/en/api/messages). When using these APIs, we send a request, wait a second or two and receive a response. - -For many use-cases the synchronous APIs work just fine. But often as data scientists we need to do the same thing to thousands, or millions of rows of data. Hammering the provider's servers with thousands/millions of requests is inefficient for us and them. Plus, we don't want to sit around with a blocked R session for 5 hours as our results get returned to us. - -Most Generative AI providers also offer lower-cost, asynchronous APIs. The providers usually offer a guarantee of the results within a time frame, and an estimate of the average time to return the results. For example, they may guarantee results within 24 hours, but expect them within 1-3 hours. - +> **Limits**: Each batch file can contain up to 50,000 requests or 200MB, whichever is reached first. For larger datasets, split into multiple batches. # When to choose Synchronous vs Asynchronous -> For a more comprehensive treatment, and motivating examples [OpenAI's offficial documentation/guide](https://platform.openai.com/docs/guides/batch) is a good place to start. - -As consumers, the decision represents a trade-off between time and money. If we are serving Generative AI to other consumers, e.g. in an application, we will usually favour a Synchronous API because users expect instant results. Alternatively, if we are running analyses over large datasets, or repeated batch-inference, we can usually afford to wait longer for the results so we may favour the Asynchronous APIs. - -Synchronous APIs are also very useful when we are still in the experimental step of an analysis and need quick feedback- i.e. when we're testing our prompts, and developing our [schemas](https://json-schema.org/) for [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), we don't want to wait 24 hours to find out we made an error in our schema. Instead, we want to send a request and receive a response within seconds. That way we can iteratively fix/develop our schemas/prompts and get to a better outcome quicker. - -At the time of writing, [OpenAI's batch API](https://platform.openai.com/docs/guides/batch) offers a 50% discount on the Completions API, as well as higher rate limits: - -

> Learn how to use OpenAI's Batch API to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The service is ideal for processing jobs that don't require immediate responses.
>
> Batch processing jobs are often helpful in use cases like:
>
> 1. Running evaluations
> 2. Classifying large datasets
> 3. Embedding content repositories
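In practice, that 24-hour window means we submit a job, carry on with other work, and poll for the results at a sensible interval rather than blocking the R session. Below is a minimal sketch of that pattern using the package's own `oai_batch_status()` and `oai_file_content()` helpers; here `batch_job` is assumed to be the object returned by `oai_batch_start()`, and the ten-minute interval is an illustrative choice rather than a recommendation:

```r
# poll until the job reaches a terminal state, then fetch the raw JSONL results;
# the status values follow OpenAI's batch lifecycle
status <- oai_batch_status(batch_job$id)

while (!status$status %in% c("completed", "failed", "expired", "cancelled")) {
  Sys.sleep(10 * 60) # results are guaranteed within 24h, but often arrive much sooner
  status <- oai_batch_status(batch_job$id)
}

if (identical(status$status, "completed")) {
  results_content <- oai_file_content(status$output_file_id)
}
```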
+> **Recommendation**: Use the Synchronous API when you need immediate feedback e.g. prompt or schema development, and for small datasets where cost savings are irrelevant. Once everything is figured out, move to the Batch API to save on cost. -# Files API +# Cleaning Up -In order to use OpenAI's Batch API, we need to upload files to the Files API. Luckily this process is quite simple, but do keep in mind that to successfully run a batch job of embeddings, you will need to work with three APIs: +Once the batch job has been completed, the associated files will live on the OpenAI API, inside the Files API. Your OpenAI account will be charged for storage, so it's best to download the results and save in your org's own cloud storage. -- Embeddings API -- Batch API -- Files API +```{r, eval = FALSE} +oai_file_delete(file_info$id) # delete the input file -Fortunately, the same mental models will be useful for both the Batch and the Files APIs. - -# EndpointR Implementation of OpenAI Batch API - -Due to inherent differences between Synchronous and Asynchronous APIs, the EndpointR implementation of the OpenAI Batch API will feel more like submitting jobs to a cluster/server than automagically working with an entire data frame as in `oai_complete_df()` and `oai_embed_df()`. As such, different functions and workflows are needed. +oai_file_delete(status$output_file_id) # delete the output file +oai_file_delete(status$error_file_id) # delete the error file +``` -The two main functions for **preparing the requests** are -- `oai_batch_prepare_embeddings()` -- `oai_batch_prepare_completions()` +> **NOTE**: At the time of writing, OpenAI save information in both the Batch API and the Files API, you need to delete your input, output, error files from the *Files API*, you cannot delete from the Batch API -Each function expects a data frame as input: `oai_batch_prepare_embeddings()` will accept the relevant arguments from `oai_embed_df()`, `oai_batch_prepare_completions` will accept the relevant arguments from `oai_complete_df()`. The OpenAI Batch API expects a single .jsonl file of up 50,000 rows or 200 MB in size. If we want to perform the operation on a 150,000 row data frame, we need to create and manage 3 separate batches. +# Technical Details -> **NOTE:** For structured outputs the Batch API requires us to send the JSON schema with each request. Complex schemas will quickly lead to large file size, perhaps eclipsing the 200 MB limit. +## Batch Limits -EndpointR prepares each batch, writes it to a file in temporary storage, and then sends the file to the OpenAI Files API. Once in the Files API where it will receive a file ID and some other metadata. EndpointR can pass the file ID to the Batch API and trigger a batch job to run. Once running, the batch job's status can be checked and in the end we'll receive information on where to find the results in the **Files API**. +The OpenAI Batch API enforces specific limits per batch file. If your data exceeds these, you must split it into multiple batch jobs. -Whether using the Batch API for embeddings or chat completions, each line of of the .jsonl file must form a self-contained request with a unique identifier. +- Max Requests per Batch: 50,000 +- Max File Size: 200 MB + + > **Warning**: When using Structured Outputs, the JSON schema is repeated for every single request in the batch file. For complex schemas, you may hit the 200 MB file size limit well before you reach the 50,000 row limit. -E.g. 
for an embeddings task on the Batch API, the request in each row should look something like: +## Underlying Request Format -Row version: +EndpointR handles the JSON formatting for you, but for debugging purposes, it is helpful to know what the API expects. Each line in the batch file is a JSON object containing a custom_id and the request body. -``` -"{\"custom_id\":1,\"method\":\"POST\",\"url\":\"/v1/embeddings\",\"body\":{\"input\":\"hello\",\"model\":\"text-embedding-3-small\",\"encoding_format\":\"float\"}}" -``` -Prettify'd version: - -``` +```{json} { - "custom_id": 1, + "custom_id": "doc_1", "method": "POST", "url": "/v1/embeddings", "body": { - "input": "hello", + "input": "The quick brown fox...", "model": "text-embedding-3-small", "encoding_format": "float" } } ``` - -> **NOTE:** The Embeddings API expects the input in an 'input' field rather than 'messages' as in the Completions API, and batch requests must adhere to this. -
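For comparison, a chat completions batch row nests the prompt under `body.messages` rather than `body.input`. A hand-written sketch of that shape is below; the model matches the default of `oai_batch_prepare_completions()`, while the message contents are illustrative:

```
{
  "custom_id": "id_1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "Analyse the sentiment of the following text."},
      {"role": "user", "content": "hello"}
    ]
  }
}
```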