Add and retrieve arbitrary metadata? #32

@jimjam-slam

Does this package support adding arbitrary metadata to a CSVY file and retrieving it again (even if it has to live under a specified key in the YAML)? If I add attributes to a data frame and write it out to a CSVY file, those attributes are included in the YAML front matter:

library(csvy)
library(tibble)  # provides tibble(), used below

test_path <- "test.csvy"
test <- tibble(x = 1:4, y = x^2)

# add attributes and confirm they're present
attr(test, "fruit") <- "banana"
attr(test, "vegetable") <- "broccoli"
test |> attributes()

write_csvy(test, test_path)

# confirm that metadata was written out to file
test_path |> readLines() |> paste(collapse = "\n") |> cat()
# #---
# #profile: tabular-data-package
# #name: test
# #fruit: banana
# #vegetable: broccoli
# #fields:
# #- name: x
# #  type: integer
# #- name: 'y'
# #  type: number
# #--- 
# x,y
# 1,1
# 2,4
# 3,9
# 4,16

But if I read that file back in, the extra attributes are dropped:

test_path |> read_csvy() |> attributes()
# $names
# [1] "x" "y"
# 
# $row.names
# [1] 1 2 3 4
# 
# $profile
# [1] "tabular-data-package"
# 
# $name
# [1] "test"
# 
# $class
# [1] "data.frame"

I'm reading up on the Tabular Data Package schema to see if there's a place reserved in it for arbitrary metadata, but I'm having some trouble understanding it. Is this package intended to allow users to retrieve arbitrary metadata?
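
As a stopgap, the extra keys can be recovered by parsing the YAML header directly rather than going through read_csvy(). The following is only a minimal sketch: read_csvy_header() is a hypothetical helper, not part of csvy, and it assumes the header is written with a leading "#" on each line as in the output above. It uses yaml::yaml.load() and base R's read.csv(); the file is recreated by hand so the example does not depend on write_csvy().

```r
library(yaml)

# Hypothetical helper (NOT part of csvy): parse the commented YAML header
# of a CSVY file and return it as a named list.
read_csvy_header <- function(path) {
  lines <- readLines(path)
  # Header delimiters look like "#---" or "---", possibly with trailing spaces
  delims <- grep("^#?---\\s*$", lines)
  stopifnot(length(delims) >= 2)
  header <- lines[(delims[1] + 1):(delims[2] - 1)]
  # Drop only the leading "#" so the YAML indentation is preserved
  header <- sub("^#", "", header)
  yaml.load(paste(header, collapse = "\n"))
}

# Recreate the file shown above so the sketch is self-contained
test_path <- tempfile(fileext = ".csvy")
writeLines(c(
  "#---",
  "#profile: tabular-data-package",
  "#name: test",
  "#fruit: banana",
  "#vegetable: broccoli",
  "#fields:",
  "#- name: x",
  "#  type: integer",
  "#- name: 'y'",
  "#  type: number",
  "#---",
  "x,y", "1,1", "2,4", "3,9", "4,16"
), test_path)

meta <- read_csvy_header(test_path)

# Keep only the keys that are not part of the standard header shown above
extra <- meta[setdiff(names(meta), c("profile", "name", "fields"))]

# Read the data with base R, skipping the commented header lines,
# then re-attach the extra metadata as attributes
df <- read.csv(test_path, comment.char = "#")
for (key in names(extra)) {
  attr(df, key) <- extra[[key]]
}

attributes(df)[names(extra)]
```

This sidesteps the question of what read_csvy() is meant to preserve; it just shows that the information is still in the file and can be re-attached by hand.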

Session info
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1      csvy_0.3.0      lubridate_1.9.3 forcats_1.0.0  
 [5] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
 [9] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.5       jsonlite_1.8.9     compiler_4.4.1     tidyselect_1.2.1  
 [5] scales_1.3.0       yaml_2.3.10        R6_2.6.1           generics_0.1.3    
 [9] rprojroot_2.0.4    munsell_0.5.1      pillar_1.10.1.9000 tzdb_0.4.0        
[13] rlang_1.1.5        utf8_1.2.4         stringi_1.8.4      timechange_0.3.0  
[17] cli_3.6.4          withr_3.0.2        magrittr_2.0.3     grid_4.4.1        
[21] hms_1.1.3          lifecycle_1.0.4    vctrs_0.6.5        glue_1.8.0        
[25] data.table_1.16.2  colorspace_2.1-1   tools_4.4.1        pkgconfig_2.0.3  

Thanks!
