Extended Debugger reductions and fixed some bugs by NRauschmayr · Pull Request #523 · awslabs/sagemaker-debugger

NRauschmayr · 2021-11-14T22:55:07Z

Description of changes:

Extended smdebug's reductions to check for NaN and inf values and to compute quantiles for PyTorch (PT) tensors. Tensors are now also written out in TensorBoard format, so users can display all reductions for a specific tensor within the same visualization, and visualizations are grouped by Debugger collections.
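As a rough illustration of what these reductions summarize, here is a minimal sketch in plain Python rather than PyTorch. The function name `summarize` and the choice to report counts and quartiles are assumptions made for the example; the PR does not state whether the reductions report counts or flags, or which quantiles are computed.

```python
import math
from statistics import quantiles


def summarize(values):
    """Hypothetical helper: NaN/inf counts plus quartiles of the finite values."""
    nan_count = sum(1 for v in values if math.isnan(v))
    inf_count = sum(1 for v in values if math.isinf(v))
    finite = [v for v in values if math.isfinite(v)]
    # statistics.quantiles needs at least two data points.
    quartiles = quantiles(finite, n=4) if len(finite) >= 2 else []
    return {"isnan": nan_count, "isinf": inf_count, "quartiles": quartiles}


summary = summarize([0.5, float("nan"), 2.0, float("inf"), -1.0, 3.5])
```

Filtering out non-finite values before computing quantiles is a design choice of this sketch, not something the PR confirms.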
Here is an example visualization:

Style and formatting:

I have run pre-commit install && pre-commit run --all-files to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter · 2021-11-14T23:07:52Z

Codecov Report

Merging #523 (66999aa) into master (b4dd4c1) will decrease coverage by 6.36%.
The diff coverage is 57.44%.

@@            Coverage Diff             @@
##           master     #523      +/-   ##
==========================================
- Coverage   77.60%   71.24%   -6.37%     
==========================================
  Files         127      117      -10     
  Lines       11111    10614     -497     
==========================================
- Hits         8623     7562    -1061     
- Misses       2488     3052     +564

Impacted Files	Coverage Δ
smdebug/pytorch/utils.py	`48.14% <23.52%> (-32.81%)`	⬇️
smdebug/core/locations.py	`85.71% <60.00%> (-5.96%)`	⬇️
smdebug/core/reduction_config.py	`86.58% <77.77%> (-9.52%)`	⬇️
smdebug/core/hook.py	`86.90% <81.25%> (-0.53%)`	⬇️
smdebug/mxnet/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/mxnet/singleton_utils.py	`0.00% <0.00%> (-100.00%)`	⬇️
...debug/profiler/analysis/notebook_utils/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/mxnet/hook.py	`0.00% <0.00%> (-84.85%)`	⬇️
smdebug/mxnet/utils.py	`0.00% <0.00%> (-78.13%)`	⬇️
smdebug/rules/action/message_action.py	`13.25% <0.00%> (-75.91%)`	⬇️
... and 61 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b4dd4c1...66999aa.

NihalHarish · 2021-11-14T23:52:27Z

smdebug/core/hook.py

+        if subfolder == None:
+            subfolder = self.mode


Can we use subfolder = os.path.join(self.mode)?
It makes the intention of this line much clearer to the reader.

NRauschmayr · 2021-11-15

subfolder is just a string, not a file path.

NihalHarish · 2021-11-15

os.path.join returns an object of type str. If I understand this part correctly, the subfolder variable contains the path to the subdirectory for TensorBoard data?

NRauschmayr · 2021-11-15

No, subfolder is just the name of the reduction. Each reduction gets its own subfolder in the TensorBoard directory.
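To make the point of this thread concrete, here is a small sketch. With a single argument, os.path.join returns that argument unchanged, so wrapping self.mode in it would not change the value. The helper `reduction_dir` and the directory name `tb_out` are hypothetical and only illustrate the idea of one subfolder per reduction.

```python
import os

mode = "train"  # stand-in for self.mode

# With one argument, os.path.join has nothing to join and returns it as-is.
assert os.path.join(mode) == mode


def reduction_dir(tensorboard_dir, reduction_name):
    """Hypothetical helper: each reduction name becomes its own subfolder."""
    return os.path.join(tensorboard_dir, reduction_name)


dirs = [reduction_dir("tb_out", name) for name in ("mean", "abs_mean", "isnan")]
```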

NihalHarish · 2021-11-14T23:53:50Z

smdebug/core/hook.py

+        if subfolder == None:
+            subfolder = self.mode
+
+        if subfolder in self.tb_writers:


We should rename this map. Maybe something more explicit, like self.tb_writer_to_dir_map?
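For context on what this map holds, the following is a minimal sketch of a writer cache keyed by subfolder name, based only on the diff lines quoted above. `FakeWriter` and `WriterCache` are hypothetical stand-ins and not smdebug classes; the real hook's behavior may differ.

```python
class FakeWriter:
    """Hypothetical stand-in for a TensorBoard file writer."""

    def __init__(self, logdir):
        self.logdir = logdir


class WriterCache:
    """Sketch of a lazily filled map: subfolder name -> writer."""

    def __init__(self, tensorboard_dir, mode):
        self.tensorboard_dir = tensorboard_dir
        self.mode = mode
        self.tb_writers = {}

    def maybe_get_tb_writer(self, subfolder=None):
        if self.tensorboard_dir is None:
            return None  # no TensorBoard output configured
        if subfolder is None:
            subfolder = self.mode
        if subfolder not in self.tb_writers:
            # Create the writer on first use and reuse it afterwards.
            logdir = f"{self.tensorboard_dir}/{subfolder}"
            self.tb_writers[subfolder] = FakeWriter(logdir)
        return self.tb_writers[subfolder]
```

In this sketch the key is the subfolder name and the value is the writer, which is worth keeping in mind when choosing a more explicit name for the map.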

NihalHarish · 2021-11-14T23:55:33Z

smdebug/core/hook.py


-    def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs):
-        return get_reduction_tensor_name(tensor_name, reduction_name, abs, remove_colon_index=True)
+    def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs, collection_name=""):


When do we accept an empty string collection name?
Should we use the DEFAULT collection key?

NRauschmayr · 2021-11-15

I extended this part, which is needed for TensorBoard to group visualizations by Debugger collections. To stay consistent with the previous code, the collection name is empty by default.
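A minimal sketch of how an optional collection prefix could work is shown below. The function `reduction_tensor_name` and the exact name layout (`collection/reduction/tensor`) are assumptions for illustration; only the "abs_" prefix and the empty default come from the diff. TensorBoard typically groups scalar charts by the part of the tag before the first slash, which is why a collection prefix would group charts by collection.

```python
def reduction_tensor_name(tensor_name, reduction_name, use_abs, collection_name=""):
    """Hypothetical naming helper; the real smdebug format may differ."""
    if use_abs:
        reduction_name = "abs_" + reduction_name
    name = f"{reduction_name}/{tensor_name}"
    if collection_name:
        # A non-empty collection becomes the top-level group.
        name = f"{collection_name}/{name}"
    return name


default_name = reduction_tensor_name("fc1.weight", "mean", False)
grouped_name = reduction_tensor_name("fc1.weight", "mean", True, "weights")
```

With the empty default, existing callers get the same name as before, which matches the stated goal of staying consistent with the previous code.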

NihalHarish · 2021-11-14T23:57:34Z

smdebug/core/hook.py

+                reduction_name = "abs_" + reduction_name
+            tb_writer = self._maybe_get_tb_writer(subfolder=reduction_name)
+            if tb_writer:
+                scalar = self._make_numpy_array(tensor_data)


What is the value of scalar if tb_writer = None?

NRauschmayr · 2021-11-15

By default, Debugger writes reductions like normal tensors into the debug-output folder, and users can retrieve the data via the smdebug API. I extended this part so that reductions are also written in TensorBoard format (if the user provided a TensorBoard configuration).
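The control flow described here can be sketched as follows: the reduction is always recorded in the regular output, and the scalar is only built when a TensorBoard writer exists. `save_reduction` is a hypothetical function, and a plain dict and list stand in for the debug-output folder and the writer.

```python
def save_reduction(name, value, debug_output, tb_writer=None):
    """Hypothetical sketch: always save, optionally mirror to TensorBoard."""
    debug_output[name] = value  # regular Debugger output, always written
    if tb_writer is not None:
        # The scalar is only created inside this branch, so it does not
        # exist at all when no TensorBoard writer is configured.
        scalar = float(value)
        tb_writer.append((name, scalar))


out, tb = {}, []
save_reduction("mean/x", 2, out)
save_reduction("mean/y", 3, out, tb)
```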

NihalHarish · 2021-11-15T00:01:49Z

smdebug/pytorch/utils.py

+        if hasattr(torch.Tensor, reduction_name):
+            f = getattr(torch.Tensor, reduction_name)
+            op = f(tensor_data.float())
+            if reduction_name == "isnan" or reduction_name == "isinf":


Can we manage reduction_name values with Enums?
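One way the suggestion could look is sketched below. The `Reduction` enum, its member list, and `is_boolean_reduction` are hypothetical; only the strings "isnan" and "isinf" appear in the diff, and the other members are placeholders.

```python
from enum import Enum


class Reduction(str, Enum):
    """Hypothetical enum of reduction names; the member list is illustrative."""

    ISNAN = "isnan"
    ISINF = "isinf"
    QUANTILE = "quantile"
    MEAN = "mean"


BOOLEAN_REDUCTIONS = {Reduction.ISNAN, Reduction.ISINF}


def is_boolean_reduction(reduction_name):
    """Replace the string comparison with an enum lookup."""
    try:
        return Reduction(reduction_name) in BOOLEAN_REDUCTIONS
    except ValueError:
        return False  # unknown reduction name


checks = [is_boolean_reduction(n) for n in ("isnan", "isinf", "mean", "typo")]
```

Deriving from str keeps the members usable wherever the plain reduction name is expected, for example in a getattr call.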

NRauschmayr added 3 commits November 10, 2021 09:13

update reductions

d9f9696

update tensor names

0fcb161

update reduction_config

c24fd2e

NihalHarish suggested changes Nov 15, 2021


NRauschmayr added 6 commits November 21, 2021 16:15

bugfix for tf reductions

ff9dd63

fixed tests

316087e

changed tensornames

ce6e880

minor bugfixes

3d52a9a

pre-commit

61e7537

updated tf testcases

66999aa

NihalHarish self-requested a review November 23, 2021 17:38

NihalHarish approved these changes Nov 23, 2021


Merge branch 'master' into update_reductions

3e25dd0

NRauschmayr wants to merge 10 commits into awslabs:master from NRauschmayr:update_reductions


4 participants
