
Allow passing in cr/cl bounds and other settings#6

Open
winston-zillow wants to merge 3 commits into 12wang3:main from winston-zillow:main

Conversation

@winston-zillow

Fix execution on CPU and GPU. Fix model loading.

```bash
# trained on the tic-tac-toe data set with one GPU.
python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i 0 -wd 1e-6 &
python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i cuda:0 -wd 1e-6 &
```

Note: see review comment on args.py changes

```python
rrl_args.test_res = os.path.join(rrl_args.folder_path, 'test_res.txt')
rrl_args.device_ids = list(map(int, rrl_args.device_ids.strip().split('@')))
rrl_args.device_ids = list(map(lambda id: torch.device(id), rrl_args.device_ids.strip().split('@'))) \
    if rrl_args.device_ids else [None]
```
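The revised parsing can be sketched as a small helper (a minimal sketch; `parse_device_ids` is a hypothetical name, assuming the same "@"-separated `-i` spec used by experiment.py):

```python
import torch

def parse_device_ids(spec):
    # Hypothetical helper mirroring the args.py change: turn an "@"-separated
    # spec such as "cuda:0@cuda:1" into torch.device objects; an empty spec
    # falls back to [None], which keeps the program in CPU mode.
    if not spec:
        return [None]
    return [torch.device(d) for d in spec.strip().split('@')]
```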

Note: I found that passing in an integer device ID gets the tensors pinned to GPU memory, but the GPU compute utilization stays at 0%, as shown by nvidia-smi. After changing the device ID to the one returned by torch.device("cuda:0"), the GPU is fully utilized. I do not know why that is, since a simple test using a Python loop can drive GPU utilization.

Example run passing in integer device ID:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    322MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27173      C   ...vs/pytorch_p37/bin/python      319MiB |
+-----------------------------------------------------------------------------+
```

Example run passing in cuda:*:

```
Sat Dec  4 01:31:31 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0   138W / 149W |   1739MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27346      C   ...vs/pytorch_p37/bin/python     1736MiB |
+-----------------------------------------------------------------------------+
```

```python
# lower_bound: [continuous cols]
# upper_bound: [continuous cols]
}
return settings
```

Note: I added this new settings file so that the user can pass in CR/CL bounds as well as control normalization, one-hot encoding, etc. (those are currently hard-coded).
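For illustration, such a per-dataset settings dict might look like the following (the flag names here are hypothetical, matching the lower_bound/upper_bound comments above and the options mentioned in this PR):

```python
def get_settings():
    # Hypothetical example of the new per-dataset settings: CR/CL bounds for
    # the continuous columns plus flags for the preprocessing steps that used
    # to be hard-coded.
    settings = {
        'normalize': True,
        'one_hot_encode_features': False,  # dataset is already one-hot encoded
        'impute_continuous': True,
        'lower_bound': [0.0, -1.0],        # one entry per continuous column
        'upper_bound': [1.0, 1.0],
    }
    return settings
```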

```python
if self.left is not None and self.right is not None:
if cl is not None and cr is not None:  # bounds are specified
    cl = torch.tensor(cl).type(torch.float).t()
    cr = torch.tensor(cr).type(torch.float).t()
```

Note: here we can pass in the cl/cr bounds directly.
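A minimal sketch of that path, assuming the cl/cr bounds arrive as [features][nodes] lists so that the `.t()` transpose yields the expected `(n, input_dim[1])` shape (the concrete numbers are made up for illustration):

```python
import torch

# Hypothetical bounds for n = 3 nodes and input_dim[1] = 2 features,
# given as [features][nodes] so that .t() produces shape (n, input_dim[1]).
cl = [[-1.0, -2.0, -3.0], [-1.5, -2.5, -3.5]]
cr = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]

cl_t = torch.tensor(cl).type(torch.float).t()
cr_t = torch.tensor(cr).type(torch.float).t()
assert cl_t.size() == torch.Size([3, 2])  # (n, input_dim[1])
assert cr_t.size() == torch.Size([3, 2])
```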

```python
cl = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
cr = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
assert torch.Size([self.n, self.input_dim[1]]) == cl.size()
assert torch.Size([self.n, self.input_dim[1]]) == cr.size()
```

Note: and verify the shapes are correct.

```python
        estimated_grad=estimated_grad)

self.net.cuda(self.device_id)
if self.device_id and self.device_id.type == 'cuda':
```

Note: the condition allows the program to run in CPU mode as well.
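The guarded move can be sketched as follows (a minimal sketch; `maybe_cuda` is a hypothetical helper name, while the PR puts the same check inline around `self.net.cuda`):

```python
import torch
import torch.nn as nn

def maybe_cuda(net, device):
    # Move the model to the GPU only when a CUDA device was actually
    # requested; with device None (or a CPU device) the model stays on
    # the CPU, which is what lets the program run in CPU mode.
    if device is not None and device.type == 'cuda':
        net = net.cuda(device)
    return net

net = maybe_cuda(nn.Linear(4, 2), None)  # stays on CPU
```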

```python
self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop)
self.imp = SimpleImputer(missing_values=np.nan, strategy='mean')
self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) if one_hot_encode_features else None
self.imp = SimpleImputer(missing_values=np.nan, strategy='mean') if impute_continuous else None
```

Note: for datasets that do not require, or already have, one-hot encoding or imputation, those steps can now be skipped.
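The optional construction can be sketched as a free function (a minimal sketch assuming scikit-learn; `build_preprocessors` is a hypothetical name, while `one_hot_encode_features` and `impute_continuous` are the flags added in this PR):

```python
import numpy as np
from sklearn import preprocessing
from sklearn.impute import SimpleImputer

def build_preprocessors(one_hot_encode_features, impute_continuous, drop=None):
    # Construct each preprocessing step only when the dataset needs it;
    # None means "skip this step entirely".
    feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) \
        if one_hot_encode_features else None
    imp = SimpleImputer(missing_values=np.nan, strategy='mean') \
        if impute_continuous else None
    return feature_enc, imp

enc, imp = build_preprocessors(one_hot_encode_features=False,
                               impute_continuous=True)
```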

@12wang3
Owner

12wang3 commented Dec 7, 2021

Thank you very much for the PR. I am busy with other work right now and will check the code after Dec 9.

@ASan1527

ASan1527 commented Nov 8, 2022

[screenshot]
I can't catch the device_ids, and I only have a single GPU, so I don't know how to change the code. Could you please tell me how to solve it? Thank you!

@12wang3
Owner

12wang3 commented Nov 8, 2022

> [screenshot] I can't catch the device_ids, and I only have a single GPU, so I don't know how to change the code. Could you please tell me how to solve it? Thank you!

Could you please show the command you used? Have you set the "-i" argument? It seems you did not set device_ids, since your device_ids was None. If you only have a single GPU, you can use "-i 0" to set it. By the way, maybe we should use an issue rather than a PR to discuss questions.
