Fix slowdown issue and add functionality #6
Fix an issue in save_image where each image was opened in a plot that was never closed, which caused massive slowdowns. By creating a figure for each image and closing it immediately after saving, the program runs at full speed. Add a feature to save unlabeled images: previously only the 5,000 labeled training images were saved; the program can now also save the 100,000-image unlabeled dataset.
print(filename)
save_image(image, filename)
i = i+1
if labels:
This part has a lot of code duplication: the if and else blocks are practically identical, which violates DRY (https://pl.wikipedia.org/wiki/DRY).
It would be better to branch inside a single for loop, e.g. like this (not tested!):
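    for i, image in enumerate(images):
        # Assumption: labels is None for the unlabeled dataset. A plain
        # `if labels:` raises ValueError when labels is a NumPy array.
        if labels is not None:
            label = labels[i]
            directory = './img/' + str(label) + '/'
        else:
            directory = './unlabeled_img/'
        # exist_ok=True already tolerates an existing directory, so the
        # try/except around EEXIST is not needed.
        os.makedirs(directory, exist_ok=True)
        filename = directory + str(i)
        print(filename)
        save_image(image, filename)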
Sorry about that, I rushed writing the code a bit. Would you like me to update the code and submit another pull request? I'm a bit new to the pull request system, so I'm not really sure how this process works.
No problem :)
I'm also new to pull requests on GitHub, but I think you can just update the code in your repository and branch and push it, and I should then see the change in this pull request.
This code has not been tested since I edited it to fit @pnaszarkowski's suggestion. I will update with a comment once the code has been tested and reviewed.
Update save_images to fix main if statement. Still need to test fully.
This updated file resolves an issue where the save_image function caused massive slowdowns because each plotted image was left open. Using plt.figure() and plt.close(fig) allows the program to run at full speed. Before this change, saving the 5,000 labeled images took over 20 hours. After the change, it takes only a few minutes, depending on the computer.
This updated file also extends the save_images function so that it can save the 100,000-image unlabeled dataset.
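For readers following the thread, here is a minimal sketch of the two changes described above: closing each figure right after saving, and handling labeled and unlabeled images in one loop. It is not the code from this pull request. The `root` parameter, the `labels=None` default, the `.png` extension, and the `Agg` backend are assumptions made for illustration; only `save_image`, `save_images`, `plt.figure()`, `plt.close(fig)` and the `img` / `unlabeled_img` directory names come from the discussion.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np


def save_image(image, filename):
    """Write one image array to filename + '.png' and release its figure."""
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.imshow(image)
    fig.savefig(filename + ".png")
    # Without this call every saved image leaves a figure open in pyplot,
    # which is the kind of accumulation the PR describes as the slowdown.
    plt.close(fig)


def save_images(images, labels=None, root="."):
    """Save labeled images under root/img/<label>/ and unlabeled ones
    under root/unlabeled_img/ (root is a hypothetical parameter)."""
    for i, image in enumerate(images):
        if labels is not None:
            directory = os.path.join(root, "img", str(labels[i]))
        else:
            directory = os.path.join(root, "unlabeled_img")
        os.makedirs(directory, exist_ok=True)
        save_image(image, os.path.join(directory, str(i)))


if __name__ == "__main__":
    # Small random arrays stand in for the real dataset.
    demo = np.random.randint(0, 256, size=(2, 8, 8, 3), dtype=np.uint8)
    save_images(demo, labels=[3, 7], root="demo_out")
    print(plt.get_fignums())  # open figure numbers left after saving
```

Checking `plt.get_fignums()` after a run is a quick way to confirm that no figures are left open between images.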