
Texera R UDF

R language support for Apache Texera, enabling data-processing workflows written in R.

Installation

Prerequisites

  • R (version 4.5.2)
  • Required R packages (install these specific versions):

# Install tested versions
install.packages("remotes")
remotes::install_version("arrow", version = "22.0.0.1")
remotes::install_version("coro", version = "1.1.0")
remotes::install_version("aws.s3", version = "0.3.22")

Install Plugin

# Install from GitHub
pip install git+https://github.com/kunwp1/texera-rudf.git

# Development install
git clone https://github.com/kunwp1/texera-rudf.git
cd texera-rudf
pip install -e .

Usage

The plugin provides two APIs for processing data in Texera workflows:

Tuple API (Row-by-Row Processing)

Source Operator:

library(coro)
coro::generator(function() {
  yield(list(col1 = "Hello World!", col2 = 1.0, col3 = TRUE))
})

UDF Operator:

library(coro)
coro::generator(function(tuple, port) {
  tuple$col4 <- tuple$col2 * 2
  yield(tuple)
})
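The generator above can be exercised outside Texera with plain coro. This is a minimal sketch under the assumption that the runtime simply calls the generator with a tuple (as a named list) and a port number, then drains it; `udf`, the sample tuple, and `port = 0` are illustrative names, not part of the plugin's API:

```r
library(coro)

# Same logic as the UDF operator above; 'port' is unused in this sketch.
udf <- coro::generator(function(tuple, port) {
  tuple$col4 <- tuple$col2 * 2
  yield(tuple)
})

# Feed one tuple through and collect the yielded results.
tuple <- list(col1 = "Hello World!", col2 = 1.0, col3 = TRUE)
out <- coro::collect(udf(tuple, port = 0))
out[[1]]$col4  # 2
```

Because `coro::generator` returns a factory, each call (`udf(tuple, port = 0)`) produces a fresh iterator, which `coro::collect` drains into a list.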

Table API (Batch Processing)

Source Operator:

function() {
  df <- data.frame(
    col1 = "Hello World!",
    col2 = 1.0,
    col3 = TRUE
  )
  return(df)
}

UDF Operator:

function(table, port) {
  table$col4 <- table$col2 * 2
  return(table)
}
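Since the Table API operators are ordinary R functions over data frames, their logic can be checked locally without a Texera runtime. A minimal sketch (the function and variable names are illustrative, and `port = 0` is an assumed placeholder):

```r
# Same logic as the Table API UDF operator above.
udf <- function(table, port) {
  table$col4 <- table$col2 * 2
  return(table)
}

# Stand-in for an input table arriving from an upstream operator.
df <- data.frame(col1 = "Hello World!", col2 = 1.0, col3 = TRUE)

result <- udf(df, port = 0)
result$col4  # 2
```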

Large Binary Support

Handle large binary objects (images, files, etc.) via S3-compatible storage:

Writing Large Binary:

library(coro)
coro::generator(function() {
  # Create a new large binary object
  lb <- largebinary()
  
  # Write data to it
  stream <- LargeBinaryOutputStream(lb)
  stream$write(charToRaw("Hello, Large Binary World!"))
  stream$close()
  
  yield(list(file_content = lb))
})

Reading Large Binary:

library(coro)
coro::generator(function(tuple, port) {
  # Read from large binary object
  stream <- LargeBinaryInputStream(tuple$file_content)
  data <- stream$read()
  stream$close()
  
  # Convert raw bytes to string
  content <- rawToChar(data)
  
  tuple$content_text <- content
  yield(tuple)
})
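The byte-level conversion used by the reader is plain base R and can be tried on its own, independent of the `largebinary()` and stream objects the plugin provides:

```r
# Round-trip a string through raw bytes, as the reader above does.
bytes <- charToRaw("Hello, Large Binary World!")
content <- rawToChar(bytes)
content  # "Hello, Large Binary World!"
```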

Features

  • Tuple API: Row-by-row processing with R generators
  • Table API: Batch processing with R dataframes
  • Apache Arrow: Efficient data transfer between Python and R
  • Large Binary Support: Handle large objects via S3-compatible storage

Requirements

Tested Versions

This plugin has been tested and verified to work with the following versions:

Python Environment:

  • Python: 3.10, 3.11, 3.12
  • rpy2: 3.5.11
  • rpy2-arrow: 0.0.8

R Environment:

  • R: 4.5.2
  • arrow: 22.0.0.1
  • coro: 1.1.0
  • aws.s3: 0.3.22

Other versions may work but have not been tested and are not guaranteed to be compatible.

License

Licensed under the MIT License. See LICENSE for details.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
