Allow passing in cr/cl bounds and other settings #6
winston-zillow wants to merge 3 commits into 12wang3:main
Conversation
Allow CPU execution. Fix GPU support. Fix module loading.
````diff
 ```bash
 # trained on the tic-tac-toe data set with one GPU.
-python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i 0 -wd 1e-6 &
+python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i cuda:0 -wd 1e-6 &
````
Note: see review comment on args.py changes
```diff
 rrl_args.test_res = os.path.join(rrl_args.folder_path, 'test_res.txt')
-rrl_args.device_ids = list(map(int, rrl_args.device_ids.strip().split('@')))
+rrl_args.device_ids = list(map(lambda id: torch.device(id), rrl_args.device_ids.strip().split('@'))) \
+    if rrl_args.device_ids else [None]
```
Note: I found that passing in an integer device ID pins the tensors to GPU memory, but GPU compute utilization stays at 0%, as shown by nvidia-smi. After changing the device ID to the object returned by torch.device("cuda:0"), the GPU is fully utilized. I do not know why that is, since a simple test using a Python loop does drive GPU utilization.
Example run passing in an integer device ID:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    322MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27173      C   ...vs/pytorch_p37/bin/python      319MiB |
+-----------------------------------------------------------------------------+
```
Example run passing in `cuda:*`:

```
Sat Dec 4 01:31:31 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0   138W / 149W |   1739MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27346      C   ...vs/pytorch_p37/bin/python     1736MiB |
+-----------------------------------------------------------------------------+
```
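The fix described in the note (mapping the device argument through `torch.device` instead of `int`) can be sketched in isolation. The helper name `parse_device_ids` is illustrative, not from the PR:

```python
import torch

def parse_device_ids(spec):
    """Parse an '@'-separated device spec (e.g. 'cuda:0@cuda:1' or '')
    into a list of torch.device objects; an empty spec means CPU-only."""
    spec = spec.strip()
    if not spec:
        return [None]  # no devices given: fall back to CPU execution
    # torch.device('cuda:0') carries both the device type and index,
    # unlike a bare int, which is what made GPU utilization work here.
    return [torch.device(d) for d in spec.split('@')]

print(parse_device_ids('cuda:0@cuda:1'))  # [device(type='cuda', index=0), device(type='cuda', index=1)]
print(parse_device_ids(''))               # [None]
```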
```diff
+        # lower_bound: [continuous cols]
+        # upper_bound: [continuous cols]
+    }
+    return settings
```
Note: I added this new settings file so that the user can pass in CR/CL bounds as well as control normalization, one-hot encoding, etc. (those are currently hard-coded).
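As a rough idea of the shape of such a settings file: only the `lower_bound`/`upper_bound` keys appear in the diff above; the other keys and values below are hypothetical illustrations of the hard-coded behaviors the note mentions.

```python
def get_settings(dataset_name):
    """Return per-dataset preprocessing settings (illustrative sketch)."""
    settings = {
        'normalize': True,                # whether to scale continuous columns
        'one_hot_encode_features': True,  # whether to one-hot encode discrete columns
        'impute_continuous': True,        # whether to mean-impute missing values
        'lower_bound': [0.0, 0.0],        # CL lower bound per continuous column
        'upper_bound': [1.0, 10.0],       # CR upper bound per continuous column
    }
    return settings
```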
```diff
 if self.left is not None and self.right is not None:
+    if cl is not None and cr is not None:  # bounds are specified
+        cl = torch.tensor(cl).type(torch.float).t()
+        cr = torch.tensor(cr).type(torch.float).t()
```
Note: here we can pass in the cl/cr bounds directly.
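In isolation, the conversion above behaves like this; the example shapes are an assumption (one inner list per continuous column, transposed to `[n, n_columns]`):

```python
import torch

# Suppose two continuous columns and n=3 nodes: bounds are given
# column-major (one inner list per column) and transposed to [n, n_cols].
cl = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
cr = [[5.0, 5.0, 5.0], [9.0, 9.0, 9.0]]

cl_t = torch.tensor(cl).type(torch.float).t()
cr_t = torch.tensor(cr).type(torch.float).t()

print(cl_t.size())  # torch.Size([3, 2])
```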
```diff
 cl = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
 cr = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
+assert torch.Size([self.n, self.input_dim[1]]) == cl.size()
+assert torch.Size([self.n, self.input_dim[1]]) == cr.size()
```
Note: and verify the shapes are correct.
```diff
             estimated_grad=estimated_grad)

-        self.net.cuda(self.device_id)
+        if self.device_id and self.device_id.type == 'cuda':
+            self.net.cuda(self.device_id)
```
Note: the condition allows the program to run in CPU mode as well.
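The guard can be sketched as a standalone helper, assuming `device_id` is a `torch.device` or `None` (the helper name is illustrative):

```python
import torch
import torch.nn as nn

def place_model(net, device_id):
    """Move the model to GPU only when a CUDA device was requested;
    otherwise leave it on CPU, which is what the guard enables."""
    if device_id is not None and device_id.type == 'cuda':
        net.cuda(device_id)
    return net

net = place_model(nn.Linear(4, 2), None)  # no device given: stays on CPU
print(next(net.parameters()).device)      # cpu
```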
```diff
-self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop)
-self.imp = SimpleImputer(missing_values=np.nan, strategy='mean')
+self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) if one_hot_encode_features else None
+self.imp = SimpleImputer(missing_values=np.nan, strategy='mean') if impute_continuous else None
```
Note: for datasets that do not require one-hot encoding or imputation, or that already have them, those steps can now be skipped.
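Downstream code would then apply each transform only when it was created; a minimal sketch under that assumption (the class and method names are illustrative, not from the PR):

```python
import numpy as np
from sklearn import preprocessing
from sklearn.impute import SimpleImputer

class FeatureTransforms:
    """Illustrative wrapper: encoders are optional, mirroring the change."""
    def __init__(self, one_hot_encode_features=True, impute_continuous=True, drop=None):
        self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) \
            if one_hot_encode_features else None
        self.imp = SimpleImputer(missing_values=np.nan, strategy='mean') \
            if impute_continuous else None

    def transform_continuous(self, X):
        # Skip imputation entirely for datasets without missing values.
        return self.imp.fit_transform(X) if self.imp is not None else X

t = FeatureTransforms(one_hot_encode_features=False, impute_continuous=True)
X = np.array([[1.0, np.nan], [3.0, 4.0]])
print(t.transform_continuous(X))  # nan replaced by the column mean (4.0)
```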
Thank you very much for the PR. I am busy with other things right now and will check the code after Dec 9.

Fix execution on CPU and GPU. Fix model loading.