Extended Debugger reductions and fixed some bugs#523
Extended Debugger reductions and fixed some bugs#523NRauschmayr wants to merge 10 commits intoawslabs:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #523 +/- ##
==========================================
- Coverage 77.60% 71.24% -6.37%
==========================================
Files 127 117 -10
Lines 11111 10614 -497
==========================================
- Hits 8623 7562 -1061
- Misses 2488 3052 +564
Continue to review full report at Codecov.
|
| if subfolder == None: | ||
| subfolder = self.mode |
There was a problem hiding this comment.
Can we use subfolder = os.path.join(self.mode)?
It makes the intentions of this line much clearer to the reader.
There was a problem hiding this comment.
subfolder is just a string and not a filepath.
There was a problem hiding this comment.
os.path.join returns an object of type str. If I understand this part correctly, the subfolder variable contains the path to sub directory for tensorboard data?
There was a problem hiding this comment.
no subfolder is just the name of the reduction. each reduction will be its own subfolder in the tensorboard directory.
| if subfolder == None: | ||
| subfolder = self.mode | ||
|
|
||
| if subfolder in self.tb_writers: |
There was a problem hiding this comment.
We should rename the name of this map. Maybe something more explicit like self.tb_writer_to_dir_map ?
smdebug/core/hook.py
Outdated
|
|
||
| def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs): | ||
| return get_reduction_tensor_name(tensor_name, reduction_name, abs, remove_colon_index=True) | ||
| def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs, collection_name=""): |
There was a problem hiding this comment.
When do we accept an empty string collection name?
Should we use the DEFAULT collection key?
There was a problem hiding this comment.
I extended this part, which is needed for Tensorboard to group visualizations by Debugger collections. To keep it consistent with previous code, the collection name will be empty per default.
| reduction_name = "abs_" + reduction_name | ||
| tb_writer = self._maybe_get_tb_writer(subfolder=reduction_name) | ||
| if tb_writer: | ||
| scalar = self._make_numpy_array(tensor_data) |
There was a problem hiding this comment.
What is the value of scalar if tb_writer = None?
There was a problem hiding this comment.
Per default Debugger writes reductions like normal tensors into debug-output folder and users can retrieve the data via the smdebug API. I extended this part, so that reductions are also written in Tensorboard format (in case user provided a Tensorboard configuration)
| if hasattr(torch.Tensor, reduction_name): | ||
| f = getattr(torch.Tensor, reduction_name) | ||
| op = f(tensor_data.float()) | ||
| if reduction_name == "isnan" or reduction_name == "isinf": |
There was a problem hiding this comment.
Can we manage reduction_name values with Enums?
Description of changes:
Extended smdebug's reductions to check for nan- and inf-values and to compute quantiles for PT tensors. Tensors are now also written out in Tensorboard format such that users can display all reductions for a specific tensor within the same visualization and visualizations will be grouped by Debugger collections.
Here is an example visualization:
Style and formatting:
I have run
pre-commit install && pre-commit run --all-filesto ensure that auto-formatting happens with every commit.Issue number, if available
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.