Skip to content

ORNL-TechInt/esg-octopus

Repository files navigation

ESG Octopus Development Notes

What's here:
   README
      This file. It is in git.

   esg_octopus.py
      The base script that implements the octopus beak. To create the
      "esg_octopus" symlink, run "./esg_octopus.py --makelink".
      esg_octopus.py is in git. The symlink (esg_octopus) is not.

   esg_octopus.cfg
      The configuration file for esg_octopus. Not in git.

   esg_octopus.cfg.example
      An example configuration file that can be copied to
      esg_octopus.cfg. This is in git.

   data/
      A copy of the THREDDS catalog from ORNL's production data node
      esg2-sdn1. This is not checked into git.

   java/
      The .jar files needed for plugins and plugouts. These are not in
      git. The setup/install script (not yet written) will download
      these at install time.

   plugins
      Where input format plugins reside. Each plugin has the structure
      described later in this file. The netcdf example plugin is in git.

   plugouts
      Where output format plugins reside. Each plugout has the
      structure described later in this file. The thredds example
      plugout is in git.

What Needs to be Done

 - Finish characterizing the contents of our current THREDDS catalog
   for guidance on what needs to go into THREDDS catalogs used in ESG.
   This will likely need to be broadened in the future. We have to
   start somewhere.

 - Figure out how to build a catalog in memory using the
   ThreddsXmlWriter and related classes.

 - Identify the calls the thredds plugout will make to gather the
   information it needs to construct the catalog. The thredds plugout
   will call up to the beak. The beak will forward these calls to the
   netcdf plugin.

 - Write the routines in the beak layer that will receive the calls
   from the thredds plugout.

 - Work out how to incorporate the cdms2 python module into
   esg_octopus. Is this another piece the install script has to
   download at install time? This will be needed to complete the
   netcdf routines.

 - Write the routines in the netcdf plugin that will receive the calls
   forwarded by the beak.

Plugin/Plugout Layout

   Each plugin or plugout has its own subdirectory in the respective
   PluginDir or PlugoutDir, as defined in the configuration file.

   At a minimum, the plug{in,out} directory contains a file named
   __init__.py which is called when the plug{in,out} is imported.

   The subdirectory name is the string the user will enter on the
   command line as the argument for options --input_format or
   --output_format to use the named plug{in,out}.

The current publication sequence:

   esgscan_directory --dataset <dataset-name> --project <project-name> \
      <pathtofiles> > <mapfile>
      # <mapfile> contains "dataset map" where each line contains:

         dataset_id | absolute_file_path | size

      # <dataset-name> and <project-name> have to be defined in esg.ini
        
   esgpublish --map <mapfile> --project <project-name>

   esgpublish --use-existing <dataset-name> --noscan --thredds

   esgpublish --use-existing <dataset-name> --noscan --publish
      (requires globus certificate)


Current THREDDS Catalog Characterizationn

   Looking at THREDDS xml files copied from esg2-sdn1.

   All files begin with the line

      <?xml version='1.0' encoding='UTF-8'?>

   This is followed in each file by a single <catalog> ... </catalog>
   element which occupies the rest of the file. It appears that the
   <catalog ...> tag always contains the same attributes (xmlns:xsi,
   xmlns:xlink, xmlns, name, xsi:schemaLocation). Only the value of
   the name attribute varies. For the catalog.xml file at the top
   level, name is "ORNL Earth System Grid catalog". For files one
   layer down, it's always "TDS configuration file".

   catalog.xml: Inside the <catalog> ... </catalog> element appear:
    - datasetRoot
       > path
       > location
    - catalogRef (repeated 46 times)
       > name
       > xlink:title
       > xlink: href

   The following files are missing from catalog.xml:

      ornl.arm.data.mytest.v1.xml
      ornl.arm.data.rgmtest.v1.xml
      ornl.ccsm.b30.041g.proc.tseries.monthly.v1.xml
      ornl.cdiac.ameriflux_L2.AMF_USARb.v1.xml
      ornl.cdiac.ameriflux_L2.AMF_USARc.v1.xml
      ornl.cdiac.ameriflux_L2.AMF_USARM.v1.xml
      ornl.cdiac.ameriflux_L2.AMF_USAud.v1.xml
      ornl.cdiac.ameriflux_L2.AMF_USSP1.v1.xml
      pcmdi.ornl.test.mytest.v1.xml
      pcmdi.test.mytest.v1.xml

   In the lower level files, <catalog> ... </catalog> contains:
    - service
       > OpenDAP
       > HTTPServer
       > SRM
    - property
       > catalog_version
    - dataset (may be nested)
       > property (multiple)
       > metadata
          > variables
             > variable name="..."
       > metadata inherited="true"
       > nested datasets

   In the thredds catalog, we have the following xml files. Under each
   .xml files is listed the netcdf files it represents

   obs4cmip5.observations.cmbe.ARM.Barrow.in-situ_stations.atmos.1hr.v1.xml
      PRJ/obs4cmip5/observations/atmos/ps/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/ps_arm_barrow_cmbe_v1p1_20010101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/sfc/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/sfcWind_arm_barrow_cmbe_v1p1_20010101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/tas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/tas_arm_barrow_cmbe_v1p1_20010101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/uas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/uas_arm_barrow_cmbe_v1p1_20010101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/vas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/vas_arm_barrow_cmbe_v1p1_20010101-20101231.nc

   obs4cmip5.observations.cmbe.ARM.Darwin.in-situ_stations.atmos.1hr.v1.xml
      PRJ/obs4cmip5/observations/atmos/ps/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/ps_arm_darwin_cmbe_v1p1_20020101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/sfc/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/sfcWind_arm_darwin_cmbe_v1p1_20020101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/tas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/tas_arm_darwin_cmbe_v1p1_20020101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/uas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/uas_arm_darwin_cmbe_v1p1_20020101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/vas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/vas_arm_darwin_cmbe_v1p1_20020101-20101231.nc

   obs4cmip5.observations.cmbe.ARM.Manus.in-situ_stations.atmos.1hr.v1.xml
      PRJ/obs4cmip5/observations/atmos/ps/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/ps_arm_manus_cmbe_v1p1_19980101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/sfc/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/sfcWind_arm_manus_cmbe_v1p1_19980101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/tas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/tas_arm_manus_cmbe_v1p1_19980101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/uas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/uas_arm_manus_cmbe_v1p1_19980101-20101231.nc
      PRJ/obs4cmip5/observations/atmos/vas/1hr/in_situ_sites/usdoe/arm/cmbe/
         v120110314/vas_arm_manus_cmbe_v1p1_19980101-20101231.nc

   obs4cmip5.observations.cmbe.ARM.Nauru.in-situ_stations.atmos.1hr.v1.xml
   ornl.arm.cmbe-atm.nsac1.v1.xml
   ornl.arm.cmbe-atm.sgpc1.v1.xml
   ornl.arm.cmbe-atm.twpc1.v1.xml
   ornl.arm.cmbe-atm.twpc2.v1.xml
   ornl.arm.cmbe-atm.twpc3.v1.xml
   ornl.arm.cmbe-cldrad.nsac1.v1.xml
   ornl.arm.cmbe-cldrad.sgpc1.v1.xml
   ornl.arm.cmbe-cldrad.twpc1.v1.xml
   ornl.arm.cmbe-cldrad.twpc2.v1.xml
   ornl.arm.cmbe-cldrad.twpc3.v1.xml
   ornl.arm.data.mytest.v1.xml
   ornl.arm.data.offline.v1.xml
   ornl.arm.data.rgmtest.v1.xml
   ornl.ccsm.b30.041g.proc.tseries.monthly.v1.xml
   ornl.cdiac.ameriflux_L2.AMF_USARb.v1.xml
   ornl.cdiac.ameriflux_L2.AMF_USARc.v1.xml
   ornl.cdiac.ameriflux_L2.AMF_USARM.v1.xml
   ornl.cdiac.ameriflux_L2.AMF_USAud.v1.xml
   ornl.cdiac.ameriflux_L2.AMF_USSP1.v1.xml
   ornl.c-lamp.exp1.clm3_casa.exp1_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_casa.exp1_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_casa.exp1_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_casa.exp1_6.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_casa.exp1_7.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_cn.exp1_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_cn.exp1_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_cn.exp1_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_cn.exp1_6.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1.clm3_cn.exp1_7.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_casa.exp1_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_casa.exp1_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_casa.exp1_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_casa.exp1_6.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_casa.exp1_7.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_cn.exp1_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_cn.exp1_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_cn.exp1_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_cn.exp1_6.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp1_fix.clm3_cn.exp1_7.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_casa.exp2_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_casa.exp2_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_casa.exp2_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_casa.exp2_6.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_cn.exp2_2.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_cn.exp2_3.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_cn.exp2_4.run1.monthly_mean.v1.xml
   ornl.c-lamp.exp2.clm3_cn.exp2_6.run1.monthly_mean.v1.xml
   ornl.ultrahighres.CESM1.T85F09_12YRS.F1850.v1.xml
   ornl.ultrahighres.CESM1.T85F09_5YRS.B1850.v1.xml
   ornl.ultrahighres.CESM1.T85F09.B1850.v1.xml
   pcmdi.ornl.test.mytest.v1.xml
   pcmdi.test.mytest.v1.xml

About

Generalized metadata extractor

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages