Description
Each individual glacier simulation stores its variables as 2-D arrays (glac x time or glac x year), where glac has length 1 but its value is the 0-based integer index of the given glacier in the model's main RGI glacier table (main_glac_rgi). This is ambiguous and should be made clearer. Options are (a) remove the glac coordinate altogether, since each simulation always contains exactly one glacier, or (b) set the index of the main RGI table to "RGIId" or "O1Index" so that the glac value in each output is meaningful. Option (a) probably makes the most sense, but would require more restructuring of run_simulation.py, output.py, and postproc_compile_simulations.py (as well as the demo notebooks).
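The difference between the current layout and the two options can be sketched with pandas, using a small DataFrame as a stand-in for one glacier's glac x year output. The values and years below are made up for illustration, and the real outputs written by output.py may be structured differently.

```python
import numpy as np
import pandas as pd

# Made-up values for one glacier over five years, stored as glac x year (shape (1, 5)).
years = np.arange(2000, 2005)
values = np.array([[-0.5, -0.7, -0.6, -0.9, -0.8]])

# Current layout: the single glac label is the glacier's 0-based row position in
# main_glac_rgi for this run (0 here), which says nothing about which glacier it is.
current = pd.DataFrame(values, index=pd.Index([0], name="glac"), columns=years)

# Option (a): drop the length-1 glac axis, leaving a 1-D series indexed by year.
option_a = current.iloc[0]

# Option (b): keep the glac axis but label it with the glacier's RGIId.
option_b = current.copy()
option_b.index = pd.Index(["RGI60-01.00570"], name="glac")
```

Option (a) changes the shape of every variable, which is why it touches more code; option (b) only changes the label.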
Hi @yelizy,
Great questions. The short answer is that this index is not used after the simulations are stored, and we should probably change this structure. Each individual simulation output has only a single glac index. In fact, we could likely remove the glac index from the individual outputs altogether, since there is always only one glacier. If you are trying to access the results from an individual glacier output, you can simply index into the 0th glac, as is done in the various example notebooks (e.g., simple_test.ipynb). When the simulations are merged by region, the RGIId is stored alongside the glacier index. @drounce can correct me if I'm wrong, but I believe the individual simulations were originally stored as 2-D arrays (e.g., glac x year) because that made them easier to stack regionally in post-processing.
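A minimal NumPy sketch of both points: reading the 0th glac from one output, and stacking several 2-D outputs into a regional array. The array contents and the rgiids list are hypothetical stand-ins; the actual post-processing in postproc_compile_simulations.py may work differently.

```python
import numpy as np

# Hypothetical per-glacier outputs, each stored as glac x year with a single glac row.
nyears = 4
rgiids = ["RGI60-01.00570", "RGI60-01.00571"]
individual_outputs = [
    np.full((1, nyears), -0.5),  # made-up annual values for RGI60-01.00570
    np.full((1, nyears), -0.8),  # made-up annual values for RGI60-01.00571
]

# Reading one individual output: index into the 0th (and only) glac row.
first_glacier_series = individual_outputs[0][0, :]

# Regional post-processing: because every output is already 2-D, the arrays can be
# concatenated along the glac axis directly, with the RGIIds kept alongside in order.
regional = np.concatenate(individual_outputs, axis=0)
```

Dropping the glac axis would not prevent regional stacking: calling np.stack(..., axis=0) on the 1-D per-glacier arrays produces the same (n_glaciers, n_years) shape.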
A bit more detail: the glac value may seem ambiguous because of a subtlety in how run_simulation.py indexes into the RGI glacier table while looping over the glaciers in a given run. Each iteration selects one row of the main_glac_rgi dataframe, and pandas' default behavior is to set the 'name' of the resulting Series to that row's index label in main_glac_rgi. For example, if I run 1.00570 and 1.00571 together:
run_simulation -rgi_glac_number 1.00570 1.00571 ....
My main_glac_rgi dataframe will look like so:
This study is focusing on 2 glaciers in region [1]
O1Index RGIId CenLon ... rgino_str RGIId_float CenLon_360
0 569 RGI60-01.00570 -145.427 ... 01.00570 1.00570 214.573
1 570 RGI60-01.00571 -145.449 ... 01.00571 1.00571 214.551
As we loop through the glaciers, the 'name' of each resulting Series is its index in main_glac_rgi (e.g., 0 for 1.00570 and 1 for 1.00571), and these are the values stored under the glac coordinate of the simulation output. So if you ran 200+ glaciers, as your post above indicates, the glac values across your outputs will span the range of glaciers in your run, but there should always be just one glac index per output.
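The pandas behavior described above can be reproduced with a small stand-in for main_glac_rgi from the two-glacier run (only a few of its columns). The row selection with .loc is an assumption about how the loop picks rows; it is meant to show where the Series 'name' comes from, and how indexing by RGIId (option (b) in the issue) would change it.

```python
import pandas as pd

# Minimal stand-in for main_glac_rgi from the two-glacier run above (subset of columns).
main_glac_rgi = pd.DataFrame(
    {
        "O1Index": [569, 570],
        "RGIId": ["RGI60-01.00570", "RGI60-01.00571"],
        "CenLon": [-145.427, -145.449],
    }
)

# Selecting one row returns a Series whose .name is that row's index label,
# i.e. the 0-based position in this run, not anything tied to the RGIId.
names_default = [main_glac_rgi.loc[label].name for label in main_glac_rgi.index]

# Option (b) from the issue: index the table by RGIId (keeping it as a column too),
# so each selected row's .name carries the RGIId instead of a run-specific position.
by_rgiid = main_glac_rgi.set_index("RGIId", drop=False)
names_by_rgiid = [by_rgiid.loc[label].name for label in by_rgiid.index]
```

Here names_default is [0, 1], while names_by_rgiid holds the two RGIIds, so any output labelled from .name would identify its glacier directly.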
If you run an entire region, the glac value should correspond to the glacier number in the RGIId minus 1. For instance, if we ran all of Alaska, the output file for 1.00570 would have glac.values=569. Sorry for the long-winded explanation, but does this make sense?
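As a quick check of the "RGIId minus 1" rule, the hypothetical helper below derives the expected glac value from an RGIId string. It assumes every glacier in the region is included in the run, in RGI order, with pandas' default 0-based index; if any glaciers are filtered out, the row positions, and therefore the glac values, would shift.

```python
def full_region_glac_index(rgiid: str) -> int:
    """Hypothetical helper: expected glac value for a glacier when the entire
    region is run in RGI order with pandas' default 0-based index."""
    # 'RGI60-01.00570' -> '00570' -> 570 -> 569
    glacier_number = int(rgiid.split(".")[-1])
    return glacier_number - 1
```

For RGI60-01.00570 and RGI60-01.00571 this gives 569 and 570, matching the O1Index column in the table above.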
In summary, the glac value does not matter, since you will only have one in each individual output, but the glac values can certainly be confusing and we should improve this.
Originally posted by @btobers in #150 (comment)