Skip to content

Conversation

@jeremy-wl
Copy link
Member

A new operator-courier CLI command called flatten which, given a directory of operator bundles, it will extract the versioned csvs and latest version of each crd along with the package file and create a new flat directory of yaml files.

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 25, 2019
@jeremy-wl
Copy link
Member Author

/cc @kevinrizza @SamiSousa

@kevinrizza
Copy link
Member

Could we include a link to the nested bundle docs somewhere, at least in the help message?

https://github.com/operator-framework/operator-registry#manifest-format

@MartinBasti
Copy link
Contributor

Also would be nice if dir structure is already flat to not raise an error but log only INFO message

@csomh
Copy link
Contributor

csomh commented Mar 26, 2019

Also would be nice if dir structure is already flat to not raise an error but log only INFO message

To clarify this a little bit: we discussed with @ralphbean that in order to avoid confusion among OMPS users, the service should be able to handle both flattened and nested content, using the same endpoint. To be able to do this, OMPS either

  • should be able to tell the difference between the two, and probably should rely on Operator Courier to get this information. This would require an is_flat or something similar API.

OR

  • flatten should do nothing if called on a "flat" tree.

@jeremy-wl jeremy-wl force-pushed the add-flatten-command branch from 86f89bb to 2a9a333 Compare March 26, 2019 21:58
@jeremy-wl
Copy link
Member Author

@MartinBasti @csomh @kevinrizza Thank you guys very much for the very useful comments. I have addressed all of them and please review my changes when you are free.

api.nest(args.source_dir, args.registry_dir)

# Parse the flatten command
def flatten(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something is missing from the usage string of flatten -h:

$ operator-courier flatten -h
usage: operator-courier [-h] source_dir dest_dir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csomh This is strange... I am able to see the description on my side.

(operator-courier) 10:42 > operator-courier git:(add-flatten-command) ✗ operator-courier flatten -h
usage: operator-courier [-h] source_dir dest_dir

Given a directory with different versions of operator bundles (CRD, CSV,
package), this command extracts versioned CSVs and the latest version of each
CRD along with the package file and creates a new flat directory of yaml
files. See https://github.com/operator-framework/operator-registry#manifest-
format to find out more about how nested bundles should be structured.

positional arguments:
  source_dir  Path of the source directory that contains different versions of
              operator bundles (CRD, CSV, package)
  dest_dir    The new flat directory that contains extracted bundle files

optional arguments:
  -h, --help  show this help message and exit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JEREMYLINLIN, it's not about the description, it's the usage string. For operator-courier flatten -h my expectation would be

operator-courier flatten [-h] source_dir dest_dir

but it's:

operator-courier [-h] source_dir dest_dir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csomh Gotcha. But it seems to be a general issue for all other subcommands. I fiddled around for some time and wasn't able to find a quick fix. Maybe we should open up an issue and make a separate PR for this? @kevinrizza

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point, but let's not check in more bad code. Let's do this one right and fix the others in a separate PR.

Copy link
Member Author

@jeremy-wl jeremy-wl Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinrizza May I hard code the right usage for flatten for this time?

    def flatten(self):
        parser = argparse.ArgumentParser(
            usage='operator-courier flatten [-h] source_dir dest_dir',
            description='Given a directory with ...')

Because after reading some docs about subcommands and doing some experiments, it seems the current subcommands implementation is not the standard way of doing it. And it will be very hard to make the flatten help message work using the "right way" without breaking others.

I can also make a PR to fix the CLI subcommands before fixing this PR, but this may take a while to get through while the current PR is more urgent.

@kevinrizza
Copy link
Member

Also would be nice if dir structure is already flat to not raise an error but log only INFO message

To clarify this a little bit: we discussed with @ralphbean that in order to avoid confusion among OMPS users, the service should be able to handle both flattened and nested content, using the same endpoint. To be able to do this, OMPS either

  • should be able to tell the difference between the two, and probably should rely on Operator Courier to get this information. This would require an is_flat or something similar API.

OR

  • flatten should do nothing if called on a "flat" tree.

I would very strongly prefer the latter. If we aren't going to successfully create the new directory, we should just do nothing and log something indicating that happened.

@jeremy-wl jeremy-wl force-pushed the add-flatten-command branch from 2a9a333 to 64095c9 Compare March 27, 2019 15:53
@jeremy-wl
Copy link
Member Author

/cc @csomh @MartinBasti @kevinrizza

@openshift-ci-robot
Copy link

@JEREMYLINLIN: GitHub didn't allow me to request PR reviews from the following users: csomh.

Note that only operator-framework members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @csomh @MartinBasti @kevinrizza

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

for csv_file_name, csv_entries in create_csv_dict(csv_paths).items():
for (version, csv_path) in csv_entries:
basename, ext = os.path.splitext(csv_file_name)
file_paths_to_copy.append((csv_path, f'{basename}-v{version}{ext}'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we're renaming the CSV files to avoid naming overlap, is that right? In that case, this format for filename is nice.
Will the resulting file ext be of the format .clusterserviceversion.yaml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we're renaming the CSV files to avoid naming overlap, is that right?

Yes.

Will the resulting file ext be of the format .clusterserviceversion.yaml?

It will append a version string before the extension. You can check the test cases to get a better sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add the version in two spots of the file name? Is it because the first version was part of the file name?
To keep with conventions for yaml naming, the file format for CSVs should look something like:

{basename}{separator}v{version}.clusterserviceversion.yaml

For example:

etcd-operator.v0.9.2.clusterserviceversion.yaml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but that's not guaranteed @SamiSousa , it's just a convention. we don't want this utility to fail just because someone broke a convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SamiSousa See this conversation and this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinrizza Where would this fail based on a convention? It seems like we're renaming the files anyways, so in a way aren't we just setting our own convention?
It looks like we're pulling the basename from the original filename, so that would explain why the version pops up twice. I don't mind if we have the version back to back like so:

etcd-operator.v0.9.2-v0.9.2.clusterserviceversion.yaml

The reason I'm insisting is because other tools may naively be depending on this filenaming convention to discern which files are CSVs, which are CRDs, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's nothing stopping anyone from naming their files whatever they want. This command literally just renames them so we don't get collisions. No other tool besides courier should ever touch these temp files -- if they do that is a mistake and should be corrected.

@jeremy-wl jeremy-wl force-pushed the add-flatten-command branch from 64095c9 to 4e330fb Compare March 27, 2019 20:21
@csomh
Copy link
Contributor

csomh commented Mar 28, 2019

lgtm, thank you @JEREMYLINLIN 👍

@jeremy-wl
Copy link
Member Author

@csomh No problem. Thank you so much for reviewing my PR and providing great feedback.

if not os.path.isfile(item_path):
logger.warning('Ignoring %s as it is not a regular file.', item)

with open(item_path, 'r') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this PR and I got following error (because I had binary (zip) file in that directory by accident):

$ operator-courier flatten orig dest
'utf-8' codec can't decode byte 0xd8 in position 14: invalid continuation byte

There is no traceback, but I assume that this line the source of that error.

push and verify commands reads only files with *.yaml and *.yml extensions, IMHO flatten should work consistently with other commands.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 This is valuable feedback. I think there should be two takeways from this: we certainly limit our parsing and copying in this command to just yaml files. And we should also explicitly log and return an error if the folder contains files which are not of that type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinBasti Thanks for the feedback! I've updated the code to handle that situation.

Copy link
Contributor

@MartinBasti MartinBasti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@MartinBasti
Copy link
Contributor

Actually coverage could be updated, https://coveralls.io/builds/22511114/source?filename=operatorcourier/flatten.py but it could be a new PR to not block this feature

from typing import Dict, Tuple
from shutil import copyfile
import semver
import operatorcourier.identify as identify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

from operatorcourier import identify
from operatorcourier import errors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MartinBasti. Updated.

@jeremy-wl jeremy-wl force-pushed the add-flatten-command branch from 574777d to 7768ed6 Compare April 1, 2019 12:44
@kevinrizza
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2019
@kevinrizza kevinrizza merged commit 9c2fcb9 into operator-framework:master Apr 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants