Add flatten command #85

jeremy-wl · 2019-03-25T22:19:06Z

This PR adds a new operator-courier CLI command, flatten. Given a directory of operator bundles, it extracts the versioned CSVs and the latest version of each CRD, along with the package file, and creates a new flat directory of YAML files.
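The selection logic described above can be sketched as follows. This is a minimal illustration, not the code in this PR. It assumes a nested layout with numeric X.Y.Z version subdirectories and the package file at the top level. It matches CRDs across versions by file name, and it detects manifest kinds with a naive `kind:` line scan instead of real YAML parsing. `plan_flatten`, `version_key` and `naive_kind` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    # Assumption: version directories are named like "0.9.2" (numeric only).
    return tuple(int(part) for part in version.split('.'))


def naive_kind(path: str) -> str:
    # Naive stand-in for YAML parsing: return the value of a top-level
    # "kind:" line, or '' if there is none (e.g. the package file).
    with open(path, 'r') as f:
        for line in f:
            if line.startswith('kind:'):
                return line.split(':', 1)[1].strip()
    return ''


def plan_flatten(source_dir: str) -> List[Tuple[str, str]]:
    """Return (source_path, flat_file_name) pairs for a nested bundle dir."""
    plan: List[Tuple[str, str]] = []
    # CRD file name -> (version key, path) of the newest copy seen so far.
    latest_crds: Dict[str, Tuple[Tuple[int, ...], str]] = {}
    for entry in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, entry)
        if os.path.isfile(path):
            # Top-level file: assumed to be the package file, copied as-is.
            plan.append((path, entry))
            continue
        key = version_key(entry)
        for name in sorted(os.listdir(path)):
            file_path = os.path.join(path, name)
            kind = naive_kind(file_path)
            if kind == 'ClusterServiceVersion':
                # Every CSV is kept; the version suffix avoids name collisions.
                base, ext = os.path.splitext(name)
                plan.append((file_path, f'{base}-v{entry}{ext}'))
            elif kind == 'CustomResourceDefinition':
                # Only the CRD from the highest version directory is kept.
                if name not in latest_crds or key > latest_crds[name][0]:
                    latest_crds[name] = (key, file_path)
    for name, (_, file_path) in sorted(latest_crds.items()):
        plan.append((file_path, name))
    return plan
```

Copying each returned pair with `shutil.copyfile(src, os.path.join(dest_dir, name))` would then produce the flat directory.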

jeremy-wl · 2019-03-25T22:20:49Z

/cc @kevinrizza @SamiSousa


kevinrizza · 2019-03-26T12:33:12Z

Could we include a link to the nested bundle docs somewhere, at least in the help message?

https://github.com/operator-framework/operator-registry#manifest-format

MartinBasti · 2019-03-26T14:50:15Z

It would also be nice if, when the directory structure is already flat, flatten did not raise an error but only logged an INFO message.

csomh · 2019-03-26T16:39:41Z

It would also be nice if, when the directory structure is already flat, flatten did not raise an error but only logged an INFO message.

To clarify this a little bit: we discussed with @ralphbean that in order to avoid confusion among OMPS users, the service should be able to handle both flattened and nested content, using the same endpoint. To be able to do this, OMPS either

should be able to tell the difference between the two, probably relying on Operator Courier to get this information. This would require an is_flat API or something similar.

OR

flatten should do nothing if called on a "flat" tree.

jeremy-wl · 2019-03-26T22:03:24Z

@MartinBasti @csomh @kevinrizza Thank you all very much for the useful comments. I have addressed all of them; please review my changes when you are free.


csomh · 2019-03-27T09:33:58Z

operatorcourier/cli.py

        api.nest(args.source_dir, args.registry_dir)
+
+    # Parse the flatten command
+    def flatten(self):


I think something is missing from the usage string of flatten -h:

    $ operator-courier flatten -h
    usage: operator-courier [-h] source_dir dest_dir

@csomh This is strange... I am able to see the description on my side.

    (operator-courier) 10:42 > operator-courier git:(add-flatten-command) ✗ operator-courier flatten -h
    usage: operator-courier [-h] source_dir dest_dir

    Given a directory with different versions of operator bundles (CRD, CSV, package), this command extracts versioned CSVs and the latest version of each CRD along with the package file and creates a new flat directory of yaml files. See https://github.com/operator-framework/operator-registry#manifest-format to find out more about how nested bundles should be structured.

    positional arguments:
      source_dir  Path of the source directory that contains different versions of operator bundles (CRD, CSV, package)
      dest_dir    The new flat directory that contains extracted bundle files

    optional arguments:
      -h, --help  show this help message and exit

@JEREMYLINLIN, it's not about the description, it's the usage string. For operator-courier flatten -h my expectation would be:

    operator-courier flatten [-h] source_dir dest_dir

but it's:

    operator-courier [-h] source_dir dest_dir

@csomh Gotcha. But it seems to be a general issue affecting all the other subcommands too. I fiddled around for some time and wasn't able to find a quick fix. Maybe we should open an issue and make a separate PR for this? @kevinrizza

It's a good point, but let's not check in more bad code. Let's do this one right and fix the others in a separate PR.

@kevinrizza May I hard-code the correct usage string for flatten this time?

    def flatten(self):
        parser = argparse.ArgumentParser(
            usage='operator-courier flatten [-h] source_dir dest_dir',
            description='Given a directory with ...')

After reading some docs about subcommands and doing some experiments, it seems the current subcommand implementation is not the standard way of doing it, and it would be very hard to make the flatten help message work the "right way" without breaking the others.

I could also make a PR to fix the CLI subcommands before finishing this one, but that may take a while to get through, and the current PR is more urgent.
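For comparison, this is roughly what the standard argparse subcommand setup looks like. Each subcommand is registered through add_subparsers(), and argparse derives the subparser's prog from the parent parser, so `flatten -h` prints `operator-courier flatten [-h] source_dir dest_dir` without a hard-coded usage string. It is a sketch, not the project's cli.py: `build_parser` is a hypothetical name, and the help strings are taken from the help output quoted earlier.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='operator-courier')
    # dest='command' records which subcommand was chosen.
    subparsers = parser.add_subparsers(dest='command')

    # The subparser's prog becomes "operator-courier flatten" automatically.
    flatten = subparsers.add_parser(
        'flatten',
        description='Given a directory with different versions of operator '
                    'bundles (CRD, CSV, package), this command extracts '
                    'versioned CSVs and the latest version of each CRD along '
                    'with the package file and creates a new flat directory '
                    'of yaml files.')
    flatten.add_argument(
        'source_dir',
        help='Path of the source directory that contains different versions '
             'of operator bundles (CRD, CSV, package)')
    flatten.add_argument(
        'dest_dir',
        help='The new flat directory that contains extracted bundle files')
    return parser
```

With this setup, `build_parser().parse_args(['flatten', 'orig', 'dest'])` yields a Namespace with command='flatten', source_dir='orig' and dest_dir='dest'. Migrating the other subcommands the same way is what would make their usage strings correct too.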


kevinrizza · 2019-03-27T13:04:34Z

It would also be nice if, when the directory structure is already flat, flatten did not raise an error but only logged an INFO message.

To clarify this a little bit: we discussed with @ralphbean that in order to avoid confusion among OMPS users, the service should be able to handle both flattened and nested content, using the same endpoint. To be able to do this, OMPS either

should be able to tell the difference between the two, probably relying on Operator Courier to get this information. This would require an is_flat API or something similar.

OR

flatten should do nothing if called on a "flat" tree.

I would very strongly prefer the latter. If we aren't going to successfully create the new directory, we should just do nothing and log something indicating that happened.
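The preferred behaviour (do nothing on an already-flat tree, and log that this happened) could look roughly like this. The sketch assumes "flat" simply means the source directory contains no subdirectories. `is_flat` and `flatten_if_nested` are hypothetical names, and the actual nested-to-flat copy is passed in as a callable.

```python
import logging
import os
from typing import Callable

logger = logging.getLogger(__name__)


def is_flat(source_dir: str) -> bool:
    # Assumed heuristic: a flat tree keeps every manifest directly in
    # source_dir, while a nested tree has one subdirectory per version.
    return not any(os.path.isdir(os.path.join(source_dir, entry))
                   for entry in os.listdir(source_dir))


def flatten_if_nested(source_dir: str, dest_dir: str,
                      do_flatten: Callable[[str, str], None]) -> bool:
    """Run do_flatten unless source_dir is already flat; report whether it ran."""
    if is_flat(source_dir):
        logger.info('%s is already flat; skipping flatten.', source_dir)
        return False
    do_flatten(source_dir, dest_dir)
    return True
```

Returning a boolean instead of raising lets a caller such as OMPS accept both layouts on the same endpoint and still know whether anything was written.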

jeremy-wl · 2019-03-27T16:10:37Z

/cc @csomh @MartinBasti @kevinrizza

openshift-ci-robot · 2019-03-27T16:10:40Z

@JEREMYLINLIN: GitHub didn't allow me to request PR reviews from the following users: csomh.

Note that only operator-framework members and repo collaborators can review this PR, and authors cannot review their own PRs.


SamiSousa · 2019-03-27T17:47:07Z

operatorcourier/flatten.py

+    for csv_file_name, csv_entries in create_csv_dict(csv_paths).items():
+        for (version, csv_path) in csv_entries:
+            basename, ext = os.path.splitext(csv_file_name)
+            file_paths_to_copy.append((csv_path, f'{basename}-v{version}{ext}'))


So here we're renaming the CSV files to avoid naming overlap, is that right? In that case, this filename format is nice.
Will the resulting file extension be of the format .clusterserviceversion.yaml?

So here we're renaming the CSV files to avoid naming overlap, is that right?

Yes.

Will the resulting file ext be of the format .clusterserviceversion.yaml?

It appends a version string before the final extension. You can check the test cases to get a better sense.
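To make the naming concrete: os.path.splitext() splits off only the last suffix, so the f-string in the diff inserts the version just before `.yaml`. Assuming csv_file_name is a bare file name, a conventionally named CSV comes out as shown below. `versioned_name` is a hypothetical wrapper around the expression from the diff.

```python
import os


def versioned_name(csv_file_name: str, version: str) -> str:
    # Same expression as in the diff; splitext() returns
    # ('etcd-operator.v0.9.2.clusterserviceversion', '.yaml') for the name below.
    basename, ext = os.path.splitext(csv_file_name)
    return f'{basename}-v{version}{ext}'


print(versioned_name('etcd-operator.v0.9.2.clusterserviceversion.yaml', '0.9.2'))
# etcd-operator.v0.9.2.clusterserviceversion-v0.9.2.yaml
```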

Why add the version in two spots of the file name? Is it because the first version was part of the file name?
To keep with conventions for yaml naming, the file format for CSVs should look something like:

{basename}{separator}v{version}.clusterserviceversion.yaml

For example:

etcd-operator.v0.9.2.clusterserviceversion.yaml

Yeah, but that's not guaranteed, @SamiSousa; it's just a convention. We don't want this utility to fail just because someone broke a convention.

@SamiSousa See this conversation and this.

@kevinrizza Where would this fail based on a convention? It seems like we're renaming the files anyways, so in a way aren't we just setting our own convention?
It looks like we're pulling the basename from the original filename, so that would explain why the version pops up twice. I don't mind if we have the version back to back like so:

etcd-operator.v0.9.2-v0.9.2.clusterserviceversion.yaml
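A hypothetical middle ground between the two positions, which is not what the PR implements: insert the version before a `.clusterserviceversion.yaml` suffix when the file follows that convention. Otherwise, fall back to the plain splitext() rename, so an unconventional name never causes a failure. `conventional_versioned_name` and `CSV_SUFFIX` are illustrative names.

```python
import os

# Conventional CSV suffix; files are not guaranteed to use it.
CSV_SUFFIX = '.clusterserviceversion.yaml'


def conventional_versioned_name(csv_file_name: str, version: str) -> str:
    if csv_file_name.endswith(CSV_SUFFIX):
        # Keep the conventional suffix intact and put the version before it.
        stem = csv_file_name[:-len(CSV_SUFFIX)]
        return f'{stem}-v{version}{CSV_SUFFIX}'
    # Unconventional name: split off only the last extension, as the diff does.
    basename, ext = os.path.splitext(csv_file_name)
    return f'{basename}-v{version}{ext}'
```

For example, `conventional_versioned_name('etcd-operator.v0.9.2.clusterserviceversion.yaml', '0.9.2')` gives the name proposed above, while `'my-csv.yaml'` still renames cleanly to `'my-csv-v0.9.2.yaml'`.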

The reason I'm insisting is that other tools may naively depend on this file-naming convention to discern which files are CSVs, which are CRDs, etc.

But there's nothing stopping anyone from naming their files whatever they want. This command literally just renames them so we don't get collisions. No other tool besides courier should ever touch these temp files; if one does, that is a mistake and should be corrected.


csomh · 2019-03-28T08:05:28Z

lgtm, thank you @JEREMYLINLIN 👍

jeremy-wl · 2019-03-28T13:38:10Z

@csomh No problem. Thank you so much for reviewing my PR and providing great feedback.

MartinBasti · 2019-03-29T10:53:07Z

operatorcourier/flatten.py

+        if not os.path.isfile(item_path):
+            logger.warning('Ignoring %s as it is not a regular file.', item)
+
+        with open(item_path, 'r') as f:


I tested this PR and got the following error (because I accidentally had a binary (zip) file in that directory):

    $ operator-courier flatten orig dest
    'utf-8' codec can't decode byte 0xd8 in position 14: invalid continuation byte

There is no traceback, but I assume this line is the source of that error.

The push and verify commands read only files with the *.yaml and *.yml extensions; IMHO flatten should work consistently with the other commands.

+1, this is valuable feedback. I think there should be two takeaways from this: we should certainly limit our parsing and copying in this command to just YAML files, and we should also explicitly log and return an error if the folder contains files of any other type.
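A minimal sketch of those two takeaways for a single directory follows. It assumes only the .yaml/.yml extensions that push and verify accept count as manifests. `list_yaml_files` and `NonYamlFileError` are hypothetical names. The sketch also `continue`s after the "not a regular file" warning, because the hunk as quoted appears to fall through to open() after logging it.

```python
import logging
import os
from typing import List

logger = logging.getLogger(__name__)

# Extensions accepted by this sketch (mirrors what push/verify read).
YAML_EXTENSIONS = ('.yaml', '.yml')


class NonYamlFileError(Exception):
    """Hypothetical error raised when a directory holds non-yaml files."""


def list_yaml_files(source_dir: str) -> List[str]:
    yaml_files: List[str] = []
    rejected: List[str] = []
    for item in sorted(os.listdir(source_dir)):
        item_path = os.path.join(source_dir, item)
        if not os.path.isfile(item_path):
            logger.warning('Ignoring %s as it is not a regular file.', item)
            continue  # never fall through to open() for directories etc.
        if item.endswith(YAML_EXTENSIONS):
            yaml_files.append(item_path)
        else:
            rejected.append(item)
    if rejected:
        msg = f'{source_dir} contains non-yaml files: {", ".join(rejected)}'
        logger.error(msg)
        raise NonYamlFileError(msg)
    return yaml_files
```

Rejecting the whole directory up front means a stray zip file produces one explicit error naming the file, instead of a UnicodeDecodeError from deep inside the YAML reading code.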

@MartinBasti Thanks for the feedback! I've updated the code to handle that situation.

MartinBasti

Thank you!

MartinBasti · 2019-04-01T12:07:10Z

Actually, test coverage could be improved (https://coveralls.io/builds/22511114/source?filename=operatorcourier/flatten.py), but that could be done in a new PR so as not to block this feature.

MartinBasti · 2019-04-01T12:35:31Z

operatorcourier/flatten.py

+from typing import Dict, Tuple
+from shutil import copyfile
+import semver
+import operatorcourier.identify as identify


How about this?

    from operatorcourier import identify
    from operatorcourier import errors

Thanks @MartinBasti. Updated.

kevinrizza · 2019-04-01T13:55:31Z

/lgtm

openshift-ci-robot requested review from MartinBasti and awgreene March 25, 2019 22:19

openshift-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) Mar 25, 2019

openshift-ci-robot requested review from SamiSousa and kevinrizza March 25, 2019 22:20

MartinBasti reviewed Mar 26, 2019
