Commit c6f8e48

behind the scenes
1 parent a492226 commit c6f8e48

1 file changed: 125 additions, 27 deletions

posts/2025-11-04-introducing-omicslog/index.qmd

````diff
@@ -49,7 +49,7 @@ data(airway, package="airway")
 result <-
 airway |>
 log_start() |> # Starting the logging operations
-filter(dex == "untrt") |>
+filter(dex == "trt") |>
 select(!albut) |>
 mutate(dex_upper = toupper(dex)) |>
 extract(col = dex,into = "treat") |>
````

````diff
@@ -62,23 +62,21 @@ result
 
 :::{.smaller}
 ```r
-#> # A SummarizedExperiment-tibble abstraction: 1 × 22
+#> # A SummarizedExperiment-tibble abstraction: 1 × 1
 #> # Features=1 | Samples=1 | Assays=counts
-#> .feature .sample counts SampleName cell treat Run avgLength
-#> <chr> <chr> <int> <fct> <fct> <chr> <chr> <int>
-#> 1 ENSG0000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120
-#> # ℹ 9 more variables: gene_name <chr>, entrezid <int>, gene_biotype <chr>,
-#> # gene_seq_start <int>, gene_seq_end <int>, seq_name <chr>, seq_strand <int>,
-#> # seq_coord_system <int>, symbol <chr>
+#> .feature .sample counts SampleName cell treat Run avgLength Experiment
+#> <chr> <chr> <int> <fct> <fct> <chr> <chr> <int> <fct>
+#> 1 ENSG00000000… SRR103… 1047 GSM1275871 N080… trt srr1… 126 SRX384354
+#> # ℹ 3 more variables: Sample <fct>, BioSample <fct>, dex_upper <chr>
 #>
 #> Operation log:
-#> [2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining
-#> [2025-06-09 18:34:15] select: removed 1 (11%), 8 column(s) remaining
-#> [2025-06-09 18:34:16] mutate: added 1 new column(s): dex_upper
-#> [2025-06-09 18:34:17] extract: extracted 'dex' into column: treat (original removed)
-#> [2025-06-09 18:34:17] mutate: modified column(s): Run
-#> [2025-06-09 18:34:17] filter: removed 63676 genes (100%), 1 genes remaining
-#> [2025-06-09 18:34:18] slice: Kept 1/4 rows (25.0%); removed 3 rows
+#> [2025-12-17 13:21:30] filter: removed 4 samples (50%), 4 samples remaining
+#> [2025-12-17 13:21:31] select: removed 1 (11%), 8 column(s) remaining
+#> [2025-12-17 13:21:31] mutate: added 1 new column(s): dex_upper
+#> [2025-12-17 13:21:31] extract: extracted 'dex' into column: treat (original removed)
+#> [2025-12-17 13:21:31] mutate: modified column(s): Run
+#> [2025-12-17 13:21:31] filter: removed 64101 genes (100%), 1 genes remaining
+#> [2025-12-17 13:21:31] slice: Kept 1/4 rows (25.0%); removed 3 rows
 ```
 :::
 
````

````diff
@@ -91,7 +89,7 @@ options(restore_SummarizedExperiment_show = TRUE)
 
 result_base <- log_start(airway) # Starting the logging operations
 
-result_base <- result_base[, colData(result_base)$dex == "untrt"]
+result_base <- result_base[, colData(result_base)$dex == "trt"]
 colData(result_base)$dex_upper <- toupper(colData(result_base)$dex)
 colData(result_base)$Run <- tolower(colData(result_base)$Run)
 result_base <- result_base[rownames(result_base) == "ENSG00000000003", ]
````

````diff
@@ -106,24 +104,124 @@ result_base
 #> metadata(1): ''
 #> assays(1): counts
 #> rownames(1): ENSG00000000003
-#> rowData names(10): gene_id gene_name ... seq_coord_system symbol
-#> colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520
+#> rowData names(0):
+#> colnames(4): SRR1039509 SRR1039513 SRR1039517 SRR1039521
 #> colData names(10): SampleName cell ... BioSample dex_upper
 #>
 #> Operation log:
-#> [2025-06-05 11:02:29] subset: removed 4 samples (50%), 4 samples remaining
-#> [2025-06-05 11:02:29] colData<-: added 1 new column(s): dex_upper
-#> [2025-06-05 11:02:29] colData<-: modified column 'Run'
-#> [2025-06-05 11:02:29] subset: removed 63676 genes (100%), 1 genes remaining
+#> [2025-12-17 13:22:58] subset: removed 4 samples (50%), 4 samples remaining
+#> [2025-12-17 13:22:58] colData<-: added 1 new column(s): dex_upper
+#> [2025-12-17 13:22:58] colData<-: modified column 'Run'
+#> [2025-12-17 13:22:58] subset: removed 64101 genes (100%), 1 genes remaining
 ```
 :::
 
-# We need your feedback!
+# Behind the scenes
+
+How does `omicslog` operate? In essence, for every function you apply to a `SummarizedExperiment` object, it tracks changes in rows and columns and records a message describing those changes in a dedicated logging structure stored in the object’s `metadata`.
+
+Let us suppose we want to filter the `airway` dataset to retain only samples treated with dexamethasone (`dex == "trt"`):
+
+```r
+result1 <- airway |> filter(dex == "trt")
+```
+
+How many samples did we keep? Let us find out:
+
+```r
+remaining_samples <- length(colData(result1)$Sample)
+remaining_samples
+```
+
+:::{.smaller}
+```r
+#> [1] 4
+```
+:::
+
+What about the removed data? How many samples were discarded?
 
-**Tell us your stories:**
+```r
+samples_removed <- length(colData(airway)$Sample) - length(colData(result1)$Sample)
+samples_removed
+```
+
+:::{.smaller}
+```r
+#> [1] 4
+```
+:::
+
+It is often useful to express this change as a percentage, since we may be discarding a substantial amount of information:
+
+```r
+percentage <- round(samples_removed / length(colData(airway)$Sample) * 100, 2)
+percentage
+```
+
+:::{.smaller}
+```r
+#> [1] 50
+```
+:::
+
+At this point, we have a clear idea of how much the dataset has been modified. However, in practice, we often need to retrieve this kind of information repeatedly. To avoid manual bookkeeping, we would like to store it directly in the object itself, using the `metadata` slot.
+
+Before doing so, we need some additional context, such as *when* the operation was executed and *which* function was used:
+
+```r
+time <- Sys.time()
+func <- "filter"
+```
+
+The most straightforward way to persist this information is to create a concise log message:
+
+```r
+result1@metadata$log_history <- paste(time, func,": removed", samples_removed, "samples", "(", percentage,"%)", remaining_samples, "samples remaining")
+result1@metadata$log_history
+```
+
+:::{.smaller}
+```r
+#> [1] "2025-12-17 13:27:34 filter : removed 4 samples ( 50 %) 4 samples remaining"
+```
+:::
+
+Column-related operations follow the same logic. For example, let us remove the `albut` column, since we are not interested in the albuterol-treatment annotation:
+
+```r
+result2 <- result1 |>
+select(!albut)
+```
+
+Even though we know that exactly one column was removed, it is still valuable to keep track of *how* the dataset was modified, *when* the change occurred, and *which* function was responsible:
+
+```r
+columns_removed <- ncol(colData(result1)) - ncol(colData(result2))
+columns_remaining <- ncol(colData(result2))
+percentage <- 100 - round(ncol(colData(result2)) / ncol(colData(result1)) * 100, 2)
+time <- Sys.time()
+func <- "select"
+
+result2@metadata$log_history <- c(result2@metadata$log_history,
+paste(time, func,": removed", columns_removed, "(", percentage,"%)", columns_remaining, "column(s) remaining")
+)
+result2@metadata$log_history
+```
+
+:::{.smaller}
+```r
+#> [1] "2025-12-17 13:27:34 filter : removed 4 samples ( 50 %) 4 samples remaining"
+#> [2] "2025-12-17 13:28:38 select : removed 1 ( 11.11 %) 8 column(s) remaining"
+```
+:::
+
+As shown above, we extract the same type of information and append a new log entry to the `metadata` slot, just as we did for the row-based operation.
+
+Too much work for a single data transformation? We agree. This is exactly where `omicslog` comes in: it handles all logging operations automatically, so you can focus on the analysis.
+
+# We need your feedback!
 
-* What is your experience working with omics-oriented objects?
-* What difficulties have you faced when tracing changes across different experiments?
-* What else can we do to make your research more comfortable and easier to track?
+Besides the messages shown above, what other operation details would you like to log in an omics-oriented project?
 
 Don’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog "logging capabilities for SummarizedExperiment objects") GitHub repo.
````
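In the "Behind the scenes" walkthrough in the diff above, each log message reports the share of a dimension that an operation removed. Written out with the counts that appear in the post's own logs (8 `airway` samples, of which the filter keeps 4; 9 `colData` columns, of which `select(!albut)` keeps 8), the calculation is:

$$
\begin{aligned}
p_{\text{removed}} &= \frac{n_{\text{before}} - n_{\text{after}}}{n_{\text{before}}} \times 100 \\
\text{samples (filter):}\quad p_{\text{removed}} &= \frac{8 - 4}{8} \times 100 = 50 \\
\text{columns (select):}\quad p_{\text{removed}} &= \frac{9 - 8}{9} \times 100 \approx 11.11
\end{aligned}
$$

The retained share is $100 - p_{\text{removed}}$. For the sample filter both values happen to be 50, so that case alone cannot tell a formula that reports the removed share from one that reports the retained share; the column case (11.11 removed versus 88.89 retained) can.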

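The column step of the "Behind the scenes" walkthrough counts how many `colData` columns `select(!albut)` removed. Comparing column *names* before and after also reveals which columns disappeared or appeared, the kind of detail the `omicslog` operation logs in the diff report (for example `mutate: added 1 new column(s): dex_upper`). The sketch below is a minimal, hypothetical helper, `describe_coldata_change()`, written for illustration only: it is not part of the `omicslog` API and is not how the package builds its messages. It assumes the Bioconductor package SummarizedExperiment is installed and uses a small made-up object instead of `airway`.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of omicslog): compare colData column names
# before and after an operation and return one message per kind of change.
describe_coldata_change <- function(before, after, func) {
  old_cols <- colnames(colData(before))
  new_cols <- colnames(colData(after))
  removed <- setdiff(old_cols, new_cols)  # present before, missing after
  added <- setdiff(new_cols, old_cols)    # missing before, present after
  msgs <- character(0)
  if (length(removed) > 0) {
    pct <- round(length(removed) / length(old_cols) * 100, 2)
    msgs <- c(msgs, sprintf("%s: removed %d (%s%%), %d column(s) remaining",
                            func, length(removed), format(pct), length(new_cols)))
  }
  if (length(added) > 0) {
    msgs <- c(msgs, sprintf("%s: added %d new column(s): %s",
                            func, length(added), paste(added, collapse = ", ")))
  }
  msgs
}

# Toy object: 2 features x 4 samples with made-up sample annotations.
se <- SummarizedExperiment(
  assays = list(counts = matrix(1:8, nrow = 2)),
  colData = DataFrame(dex = c("trt", "untrt", "trt", "untrt"),
                      albut = rep("untrt", 4),
                      Run = c("run1", "run2", "run3", "run4"))
)

# Drop a column (what select(!albut) does in the post).
se_sel <- se
colData(se_sel) <- colData(se)[, c("dex", "Run")]
describe_coldata_change(se, se_sel, "select")

# Add a column (what mutate(dex_upper = toupper(dex)) does in the post).
se_mut <- se_sel
se_mut$dex_upper <- toupper(se_mut$dex)
describe_coldata_change(se_sel, se_mut, "mutate")
```

Running `setdiff()` in both directions keeps the two cases separate, so an operation that both drops and adds columns yields two messages, and an operation that changes nothing yields none.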

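The walkthrough closes by calling the manual recipe "too much work for a single data transformation". To make that recipe concrete (not to describe how `omicslog` is implemented), the hypothetical helper `log_change()` below bundles the steps: it compares the number of rows (genes) and columns (samples) before and after an operation, builds a timestamped message, and appends it to `log_history` through the `metadata()` accessor instead of the `@metadata` slot. It assumes SummarizedExperiment is installed, assumes operations only ever remove rows or columns, and uses a toy object.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of omicslog): record how an operation changed
# the number of features (rows) and samples (columns) of a SummarizedExperiment.
# Assumes operations only remove rows/columns; a negative "removed" count
# would mean rows or columns were added.
log_change <- function(before, after, func) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  describe <- function(n_before, n_after, unit) {
    removed <- n_before - n_after
    pct <- if (n_before > 0) round(removed / n_before * 100, 2) else 0
    sprintf("[%s] %s: removed %d %s (%s%%), %d %s remaining",
            stamp, func, removed, unit, format(pct), n_after, unit)
  }
  msgs <- character(0)
  if (nrow(after) != nrow(before))
    msgs <- c(msgs, describe(nrow(before), nrow(after), "genes"))
  if (ncol(after) != ncol(before))
    msgs <- c(msgs, describe(ncol(before), ncol(after), "samples"))
  # Start from the history of `before` so existing entries are not duplicated.
  metadata(after)$log_history <- c(metadata(before)$log_history, msgs)
  after
}

# Toy object: 3 features x 8 samples, half of them labelled "trt".
se <- SummarizedExperiment(
  assays = list(counts = matrix(1:24, nrow = 3)),
  colData = DataFrame(dex = rep(c("untrt", "trt"), 4))
)

res <- log_change(se, se[, se$dex == "trt"], "subset")  # keep treated samples
res <- log_change(res, res[1, ], "subset")              # keep a single gene
metadata(res)$log_history
```

Even with a helper, every step still needs an explicit `log_change()` call and a copy of the object from before the operation. The point of the post is that `omicslog`, once `log_start()` has been called, takes care of this logging automatically.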