Add 'Lost and Found' Road Hazard Dataset #950

tfds-copybara merged 12 commits into tensorflow:master
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here. The linked page covers what to do if you already signed the CLA, for both individual and corporate signers.

CLAs look good, thanks!

As a suggestion, it was not initially clear that I could sign individually even if my university/company did not sign a CLA, as long as I have general rights to open-source my work. The CLA behind the link made that clear, so I could sign now 👍
Quoted code:

    yield image_id, features


    # Helper functions
Please use pylint. For TF code style check this link
I assume you would like to reduce the 2 blank lines to 1?
I checked all files with pylint using the TF code style (following the provided script in oss_scripts/lint.sh); however, there seems to be no linter (neither pylint nor pycodestyle) that actually checks for that. Fixed it anyway.
Quoted code:

    tf.compat.v1.enable_eager_execution()


    # create fake files
    example_dir = ('tensorflow_datasets/testing/test_data/fake_examples/'
tfds enables this by default; you can delete it.
Quoted code:

    # create fake files
    example_dir = ('tensorflow_datasets/testing/test_data/fake_examples/'
                   'lost_and_found')
    testing.test_utils.remake_dir(example_dir)
You shouldn't create fake examples on every test run. You should create the fake examples once and add them to the repository.
@hermannsblum - Thanks for the PR.
The 120 MB is too much. One way to bring it down: make the fake images not noise but (for example) a simple pattern (even one-color).
Such files compress a lot better as PNG.
@cyfra Totally agree. I adjusted the random PNG generation in fake_data_utils.py to always generate PNGs from a 4x4 random pattern. However, this function is potentially also used by other scripts, so should I add an argument to switch this on and off for backwards compatibility? I don't think huge PNG files are the wanted behaviour in any case, but I don't know how you usually handle this here.
Yes, it'd be great to have such an argument (probably on by default?).
But please do it in a separate pull request.
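The low-entropy fake image idea discussed above can be sketched as follows. This is a hypothetical helper, not the actual fake_data_utils API: it tiles a tiny random patch across the full image, so PNG's zlib compression collapses the repetition and the file stays small.

```python
import numpy as np
from PIL import Image


def make_fake_png(path, height=1024, width=2048, channels=3, patch=4):
  """Writes a fake PNG built from a tiled 4x4 random patch.

  Hypothetical sketch of the idea above: a repeating pattern compresses
  far better than per-pixel noise, shrinking fake test data on disk.
  """
  rng = np.random.RandomState(0)
  # One tiny random patch (e.g. 4x4) instead of full-image noise.
  tile = rng.randint(0, 256, size=(patch, patch, channels), dtype=np.uint8)
  # Tile it to cover the image, then crop to the exact size.
  reps = (height // patch + 1, width // patch + 1, 1)
  image = np.tile(tile, reps)[:height, :width, :]
  Image.fromarray(image).save(path)
```

Because every row repeats with a 4-pixel period, the PNG encoder's filtering plus DEFLATE reduces the payload to a few kilobytes, versus roughly `height * width * channels` bytes for incompressible noise.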
Quoted code:

    download_urls['gt'] = base_url.format('gtCoarse')
    if 'disparity_map' in self.builder_config.features:
      download_urls['disparity_map'] = base_url.format('disparity')
    # split into two steps to save space for testing data
I didn't understand this part. Can you explain more why you decided not to use download_and_extract?
If I run download_and_extract, the test fails because it apparently skips the whole command during testing instead of extracting the zip files specified in DL_EXTRACT_RESULTS in lost_and_found_test.py.
Splitting the two steps correctly replaces the results of the download function with DL_EXTRACT_RESULTS and subsequently extracts the archives in the unit test. This means the test data can be stored in compressed zip archives, which saves space.
I will update the comment to explain this better.
Or is there another workaround for this issue?
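The two-step pattern described above can be illustrated with a minimal stand-in. Note that `FakeDownloadManager` is invented here purely for illustration and is not the real tfds DownloadManager; it only mimics the shape of the two calls being discussed.

```python
# Minimal stand-in illustrating the split download/extract pattern.
# FakeDownloadManager is hypothetical, NOT the tfds API.
class FakeDownloadManager:

  def download(self, urls):
    # Step 1: download only. In a tfds unit test, this result is what
    # gets replaced by DL_EXTRACT_RESULTS, pointing at small zipped
    # fake archives checked into the repository.
    return {name: '/downloads/%s.zip' % name for name in urls}

  def extract(self, paths):
    # Step 2: extract separately, so the (fake) archives still get
    # unzipped during the test and can be stored compressed on disk.
    return {name: path + '.extracted' for name, path in paths.items()}


dl_manager = FakeDownloadManager()
downloaded = dl_manager.download({'images': 'https://example.com/a.zip'})
extracted = dl_manager.extract(downloaded)
```

The point is that only the `download` result is mocked in tests, so calling `extract` on it still runs, letting fake data live as zip archives.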
Quoted code:

                           self.builder_config.right_image_string)
    if 'segmentation_label' in self.builder_config.features \
        or 'instance_id' in self.builder_config.features:
      download_urls['gt'] = base_url.format('gtCoarse')
AFAIK, you can pass the same URL multiple times to download_and_extract (and it will do the "smart" thing).
This might make your code simpler, as you can pass download_urls['segmentation_label'] and download_urls['instance_id'].
I will update this; your suggestion is much more elegant and also more readable, thanks!
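The suggestion above could look roughly like this. The `base_url` host is a placeholder, not the real download server; per the comment above, download_and_extract is expected to deduplicate identical URLs so the archive is fetched only once.

```python
# Hypothetical sketch: pass the same URL under two feature keys and let
# download_and_extract deduplicate the actual fetch (placeholder host).
base_url = 'https://example.com/{}.zip'

download_urls = {
    # Both features come from the same gtCoarse archive.
    'segmentation_label': base_url.format('gtCoarse'),
    'instance_id': base_url.format('gtCoarse'),
}
```

Each feature key then maps directly to its extracted path, instead of several features sharing one 'gt' entry.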
Force-pushed 06f7bca to 46fb524
@cyfra I resolved the Python 2 issue. The Python 3 issue appears to be unrelated to this code, so I merged the latest master version in. Please rerun CI.
PiperOrigin-RevId: 278492512
This PR adds the Lost and Found dataset.
I mostly followed the code from PR #225, as the datasets share the same structure, but added a script to generate fake data.
The gist of the dataset.info files is here. This is my first contribution, so please excuse me if I missed a step; I tried to follow the contributing guidelines closely.
I have so far not received any information about a CLA between ETH Zurich and Google. I am sure there is one, but my university's contact person will be on holiday for at least another week. If you can find out in the meantime whether there is an agreement and how I can add my email to it, that would be great!