The code for the HPSS Integrity Crawler (HPSSIC) resides in a git repo in the ORNL-TechInt github organization:
https://github.com/ORNL-TechInt/hpss-crawler
A white paper motivating the HPSSIC is available in Google documents here:
The high level design for the HPSSIC is documented in a Google document here:
These documents are owned by Tom Barron. Please contact him for access if needed.
$ py.test --all
Test results are written to stdout and also logged in crawl_test.log so a history is available.
The tests for a specific component can be run like this:
$ py.test -k test_
To see a list of testable components:
$ ls -al hpssic/test/test_*.py
The git repo has the following branches, used for the purposes indicated:
<feature>
Current development
master
Last stable release
prehistory
This branch contains the history of the work from before the
github repo was established. It is fully incorporated into
devel; however, deleting this branch would lose the steps from
project inception to the point where the github repo was set up.
$ git clone https://github.com/ORNL-TechInt/hpss-crawler.git hpssic
$ cd hpssic
Two hook scripts are provided in the githooks directory. If you will be pushing to the repository on github or gerrit, please install the hook scripts after cloning the repository (see above) as follows:
$ ln -s `pwd`/githooks/pre-commit .git/hooks/pre-commit
$ ln -s `pwd`/githooks/commit-msg .git/hooks/commit-msg
The commit-msg hook adds "Change-Id" values to commit messages as required by gerrit.
The pre-commit hook checks the output of 'git describe' against the system version in hpssic/version.py to ensure they are aligned before allowing the commit to go forward.
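The kind of check the pre-commit hook performs can be sketched roughly as follows. This is not the script shipped in githooks: the function names `git_describe` and `versions_aligned` are hypothetical, and the alignment rule (the 'git describe' output either equals the version string or extends it with the usual `-N-gHASH` suffix) is an assumption made only for illustration.

```python
import subprocess


def git_describe():
    """Return the output of 'git describe' for the current repository."""
    return subprocess.check_output(["git", "describe"]).decode().strip()


def versions_aligned(described, version):
    """
    Assumed alignment rule (hypothetical): the describe output is either
    exactly the version string, or the version string followed by the
    '-<commits>-g<hash>' suffix that 'git describe' appends when HEAD is
    past the most recent tag.
    """
    return described == version or described.startswith(version + "-")
```

A hook built on this would call `versions_aligned(git_describe(), version)` with the value read from hpssic/version.py and exit with a non-zero status when the result is false, which makes git abort the commit.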
The configuration file and log path can each be specified in several ways. The following table shows the precedence of the various mechanisms. Items higher in the table supersede those further down. So, for example, if $CRAWL_LOG is set and --logpath is specified on the command line, log messages will be written to the file specified on the command line.
| | Configuration File | Log Path |
|:------------------------:|:------------------:|:------------------:|
| Command line option | -c/--cfg | -l/--logpath |
| Configuration file entry | | crawler/logpath |
| Environment variable | $CRAWL_CONF | $CRAWL_LOG |
| Default value | crawl.cfg | /var/log/crawl.log |
| | | or /tmp/crawl.log |
Note that it makes no sense to specify the location of the configuration file in the configuration file itself, so that is not supported.
If the crawler is run by the root user, it will be able to write in /var/log and will use /var/log/crawl.log as the default log file. If /var/log is not writable (e.g., if the crawler is run by some other user), /tmp/crawl.log will be used as the default log file.
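As a rough illustration of the log path lookup, the sketch below walks the mechanisms in the order of the table rows and then applies the /var/log fallback. It is not the crawler's actual code: the function `resolve_log_path` and its parameters are hypothetical names chosen for this example.

```python
import os


def resolve_log_path(cli_logpath=None, cfg_logpath=None, environ=None,
                     var_log="/var/log"):
    """
    Pick the log path using the row order of the precedence table.

    cli_logpath : value of -l/--logpath, if given (hypothetical parameter)
    cfg_logpath : value of crawler/logpath from the configuration file
    environ     : mapping to read $CRAWL_LOG from (defaults to os.environ)
    var_log     : directory tried first for the default log file
    """
    environ = os.environ if environ is None else environ

    # 1. Command line option
    if cli_logpath:
        return cli_logpath
    # 2. Configuration file entry
    if cfg_logpath:
        return cfg_logpath
    # 3. Environment variable
    env_path = environ.get("CRAWL_LOG")
    if env_path:
        return env_path
    # 4. Default: /var/log/crawl.log if writable, else /tmp/crawl.log
    if os.access(var_log, os.W_OK):
        return os.path.join(var_log, "crawl.log")
    return "/tmp/crawl.log"
```

For example, `resolve_log_path(cli_logpath="/home/me/run.log", environ={"CRAWL_LOG": "/home/me/env.log"})` returns the command line value, matching the example given with the table.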
- CRAWL - Where the crawler is deployed. This directory should contain the subdirectories hpssic, githooks, dist, etc.
- CRAWL_CONF - Full path for the configuration file to use. This environment variable overrides the default and is in turn overridden by any value specified on the command line.
- CRAWL_LOG - Full path for the log file to use. This environment variable overrides the default and any value specified in a configuration file, and is in turn overridden by any value specified on the command line.
- HPSS_ROOT - Where to find an instance of the HPSS source code if needed.
- KEEPFILES - If set to something other than '' or '0', indicates that files generated by running the tests should be preserved for examination after the test run is complete. Otherwise, the test files will be removed after the test run completes.
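The KEEPFILES rule amounts to a simple truthiness test on the environment variable. A minimal sketch follows; the helper name `keepfiles` is hypothetical and is not taken from the crawler's test code.

```python
import os


def keepfiles(environ=None):
    """
    Return True if test-generated files should be preserved, i.e. if
    KEEPFILES is set to anything other than '' or '0'.
    """
    environ = os.environ if environ is None else environ
    # An unset variable is treated the same as an empty one.
    return environ.get("KEEPFILES", "") not in ("", "0")
```

Under this reading, running the tests with `KEEPFILES=1` in the environment preserves the generated files, while leaving the variable unset or setting it to `0` lets the test run clean up after itself.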