diff --git a/conf.py b/conf.py index ad9231c..c1eb849 100644 --- a/conf.py +++ b/conf.py @@ -4,3 +4,18 @@ # Remove this later after we fix documenteer mermaid_version = "11.9.0" + +# Ignore links + +linkcheck_ignore = [ + r'https://grafana.slac.stanford.edu.+', + r'https://.+.slac.stanford.edu.+', + r'https://confluence.lsstcorp.org.+', + r'https://github.com/.+', + r'https://docs.redhat.com.+', + r'https://slactraining.skillport.com.+', +] + +# disable TLS verification + +tls_verify = False diff --git a/docs/developers/kubernetes.rst b/docs/developers/kubernetes.rst index e69d796..3a259d5 100644 --- a/docs/developers/kubernetes.rst +++ b/docs/developers/kubernetes.rst @@ -4,7 +4,7 @@ SLAC Kubernetes Overview Operations activities will be carried out at the SLAC US Data Facility (USDF). Where possible, all USDF services will reside on top of a kubernetes platform. -SLAC operates a single large kubernetes cluster. The benefits of this are with increased scale (sharing of resources) and reduced management overhead. We run 'vanilla' kubernetes, deployed via `kubeadm `__. On top of this, to provide segregation and project control we use `loft.sh's vcluster `__. The latter provides a virtual kubernetes cluster from which we can provide a similar experience to `openshift's projects `__ or `GKE's projects and folders `__. +SLAC operates a single large kubernetes cluster. The benefits of this are with increased scale (sharing of resources) and reduced management overhead. We run 'vanilla' kubernetes, deployed via `kubeadm `__. On top of this, to provide segregation and project control we use `loft.sh's vcluster `__. The latter provides a virtual kubernetes cluster from which we can provide a similar experience to `openshift's projects `__ or `GKE's projects and folders `__. 
SLAC Virtual Clusters, aka "Projects" @@ -61,7 +61,7 @@ Connecting and Authenticating Generically: - Determine the 'project' that you wish to access, eg usdf-butler -- Go to https://k8s.slac.stanford.edu/ +- Go to https://k8s.slac.stanford.edu/usdf-butler - Click 'Sign-In' to begin the authentication procedure - Enter your SLAC credentials into the login page, and possibly your Duo 2Factor if requested. This step may automatically skip if you already have valid single sign on credentials in place already. - Click on 'Grant Access' to agree to register @@ -74,6 +74,6 @@ We currently provide kubernetes API access without the need for VPNs etc. i.e. y Miscellaneous ============= -- if you encounter an error like "Unable to connect to the server: No valid id-token, and cannot refresh without refresh-token" when running your kubectl, you will need to log back in via https://k8s.slac.stanford.edu/, re-executing the commands in the second box. This is because our OIDC (dex) implementation does not and cannot generate refresh tokens from our SAML2 (windows ADFS) backend. (Actually, only the ``set-credentials`` command is needed, but it doesn't hurt to execute them all.) +- if you encounter an error like "Unable to connect to the server: No valid id-token, and cannot refresh without refresh-token" when running your kubectl, you will need to log back in via ``https://k8s.slac.stanford.edu/``, re-executing the commands in the second box. This is because our OIDC (dex) implementation does not and cannot generate refresh tokens from our SAML2 (windows ADFS) backend. (Actually, only the ``set-credentials`` command is needed, but it doesn't hurt to execute them all.) Kubernetes secrets are usually held in Vault (vault.slac.stanford.edu). The vault command is available on USDF interactive nodes. You may need to activate it with ``module load vault``. 
Then login using the commands ``export VAULT_ADDR=https://vault.slac.stanford.edu; vault login -method=ldap`` with your SLAC Windows password. You can then use ``vault kv list -mount=secret rubin[/PATH]`` and ``vault kv get -mount=secret PATH/TO/SECRET`` to access secrets for which you have permission. diff --git a/docs/usdf-applications/curation/embargo-dataflow/index.rst b/docs/usdf-applications/curation/embargo-dataflow/index.rst index 53917fc..73a2b98 100644 --- a/docs/usdf-applications/curation/embargo-dataflow/index.rst +++ b/docs/usdf-applications/curation/embargo-dataflow/index.rst @@ -56,7 +56,7 @@ image data and how and where the automated process runs. The ``transfer_raw_zip`` Tool ----------------------------- -The ``transfer_raw_zip.py`` in https://github.comr/lsst-dm/transfer_embargo (branch tickets/DM-51619) is +The ``transfer_raw_zip.py`` in https://github.com/lsst-dm/transfer_embargo (branch tickets/DM-51619) is used to unembargo raw image data. This tool will unembargo the data: - Transfer from ``embargo`` S3 storage's ``rubin-summit`` bucket to the destination directory diff --git a/docs/usdf-applications/curation/fix-metadata-in-raw.rst b/docs/usdf-applications/curation/fix-metadata-in-raw.rst new file mode 100644 index 0000000..2d10d47 --- /dev/null +++ b/docs/usdf-applications/curation/fix-metadata-in-raw.rst @@ -0,0 +1,171 @@ +############################# +Fixing Metadata in Raw Images +############################# + +**Caution:** The fix described here is applied in the butler repos that hold the problematic +raw images in their datastores. The raw images themselves are never altered. + +From time to time, there may be errors within obs headers. Typical issues are: + +* Faults or bugs that cause images to be taken with ``can_see_sky=True`` or ``can_see_sky=None`` + when in fact it was not possible to see the sky.
These images would normally have to wait for + the full embargo period before being copied out of embargo, which can significantly hamper the + Calibrations Team. + +* Wrong filter information in the image headers + +This results in wrong dimension records being ingested into the embargo butler repo. The process below +will allow the dimension records in butler to be fixed. The procedure to report and obtain a fix +for these issues is as follows: + +#. An issue is usually identified and reported by someone who notices the problem. The Data Curation + team should file a JIRA ticket to the OBS team to report the problem, and a JIRA ticket to + the DM Data Engineering team to obtain a fix in ``obs_lsst`` (usually in ``lsstCam.py``). + Information in those tickets usually includes the observation ids (``day_obs`` and ``seq_num``) of + the images that were taken with incorrect metadata. + +#. The Data Engineering team usually confirms this issue and provides a fix. In some simple cases, + the Data Curation team can also fix those issues by **checking** what the header should be + changed to (it is best to check with experts). For ``can_see_sky``, + check that the images were in fact taken inside the dome, which may require experts spot-checking + images in RubinTV to ensure that they don't have astronomical objects. + +#. **Skip this step if the Data Engineering team will provide a fix in** ``obs_lsst``: Add the + ``day_obs`` and ``seq_num`` ranges to the ``fix_ranges`` dictionary in the ``obs_lsst`` + `translator code `__. + Get this reviewed and merged to the ``main`` branch. + +The following uses butler repo ``embargo`` as an example, but the same process applies to any +butler repo that has the affected images in its datastore. + +#. Set up the local version of ``obs_lsst`` + + ..
code:: bash + + # update environment + source /cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2026_17/loadLSST.sh + setup lsst_distrib + + # Add obs_lsst to eups environment + git clone https://github.com/lsst/obs_lsst.git && cd obs_lsst + git checkout main + + scons + + # add to eups + setup -r . + + # verify local obs_lsst, should say LOCAL:{path} + eups list --setup obs_lsst + +#. Ensure s3 and database credentials are set + + .. code:: bash + + # s3 credentials are located in $HOME/.lsst/aws-credentials.ini + export AWS_PROFILE=embargo_rw + + # postgres credentials + export PGPASSFILE=$HOME/.lsst/postgres-credentials.txt + export PGUSER=rubin + + +#. Update the butler dimension records (``echo`` is for a dry-run) + + .. code:: bash + + # Make sure SHELL is bash + # Remove echo when ready + day_obs=20250715 + REPO=embargo + for seq_num in $(seq -w 000205 001218); do + # Old command + #echo butler ingest-raws $REPO --regex '[W\d]\d.fits$' -t direct -j 10 --output-run LSSTCam/raw/all --update-records s3://embargo@rubin-summit/LSSTCam/${day_obs}/MC_O_${day_obs}_${seq_num}/; + # New command + echo butler update-exposures-from-raws $REPO LSSTCam --where "exposure=${day_obs}${seq_num: -5}" + done + + # Verifying the ingest + # Exposure sequence numbers (XXXXX below) are 5 digits and need to be padded with zeroes + butler query-datasets --collections LSSTCam/raw/all $REPO --where "instrument='LSSTCam' and exposure=YYYYMMDDXXXXX" + + + * ``--regex``: Exclude guiders that cannot be ingested using ``ingest-raws`` + + * ``-t direct``: Mandatory because raws do not live in the butler repo's datastore + + * ``-j 10``: Speed things up a bit + + * ``--output-run``: the default, but it's there for safety + + * ``--update-records``: is required to fix headers + +If the raw images haven't been unembargoed, the following process can manually unembargo the +data after the metadata is fixed. + +#. 
Log in to the ``usdf-embargo-dmz`` vcluster using ``https://k8s.slac.stanford.edu/usdf-embargo-dmz`` and execute the ``catchup-raw.sh`` script from ``slaclab/usdf-embargo-deploy/kubernetes/overlays/transfer`` with the date. This script deploys a Kubernetes Job to the cluster that re-scans the exposure dimension and unembargoes the data. + + .. code:: bash + + git clone https://github.com/slaclab/usdf-embargo-deploy && cd usdf-embargo-deploy + + # script is in this branch + git checkout tickets/DM-51916 + + cd kubernetes/overlays/transfer + + bash ./catchup-raw.sh + +If the raw images have been unembargoed, then the downstream butler repos (USDF ``main`` butler +and FrDF and UKDF butlers) will need those updated butler dimension records as well. + +Each exposure has a corresponding Rucio dataset (e.g. ``raw:Dataset/LSSTCam/raw/Obs/20250715/MC_O_20250715_000205``). +This dataset has a .zip file that contains the raw .fits files and json files, and a _dimensions.yaml +file that contains the (incorrect) dimension records for that exposure. We will need to create +a new _dimensions.1.yaml file with the correct dimension records, and add it to the same Rucio dataset. +To do so: + +#. Download and run `create_rawdata_dimensions_yaml.py `_ + + .. code:: bash + + git clone https://github.com/lsst-dm/data-curation-tools.git && cd data-curation-tools/bin.src + + day_obs=20250715 + seq_num=000205 + + python create_rawdata_dimensions_yaml.py ${day_obs}${seq_num: -5} + # This will create a file named MC_O_20250715_000205_dimensions.1.yaml in the current directory. + +#. Upload the new dimensions yaml file to Rucio + + .. code:: bash + + day_obs=20250715 + seq_num=000205 + + # instrCode="MC" + # controller="O" + # obsId=${instrCode}_${controller} + obsId="MC_O" # !!! sometimes this could be "MC_C". Check what was created above.
+ + newDimensionsYaml="${obsId}_${day_obs}_${seq_num}_dimensions.1.yaml" + didName="LSSTCam/${day_obs}/${newDimensionsYaml}" + obsDataset="raw:Dataset/LSSTCam/raw/Obs/${day_obs}/${obsId}_${day_obs}_${seq_num}" + + # Note "rucio upload" (below) will not work unless you log in to `rubinmgr` and temporarily change + # the permission of /sdf/data/rubin/lsstdata/offline/instrument/20250715 to world-writeable (777). + # Remember to change the permission back after running the rucio upload command in the next step. + + echo rucio upload --rse SLAC_RAW_DISK --scope raw --dataset $didName $newDimensionsYaml + + echo rucio did update --open $obsDataset + echo rucio did content add --to-did $obsDataset $lfn # NOTE: $lfn is not set in this snippet; define it (presumably the DID of the file uploaded above) before running + echo rucio did metadata set --key SafeCopies --value "" $obsDataset + echo rucio did metadata set --key arcBackup --value SLAC_RAW_DISK_BKUP:need $obsDataset + echo rucio did update --close $obsDataset + +The presence of this new _dimensions.1.yaml file indicates that the butler dimension records have been +updated. If a DF hasn't ingested the raw data yet, it can directly use the new _dimensions.1.yaml to +ingest the data. If the raw data has been ingested, then the DF will need to update the dimension +records in its butler repo using a process similar to the one described above for the embargo butler repo. diff --git a/docs/usdf-applications/curation/manual-unembargo.rst b/docs/usdf-applications/curation/manual-unembargo.rst deleted file mode 100644 index 9269f00..0000000 --- a/docs/usdf-applications/curation/manual-unembargo.rst +++ /dev/null @@ -1,80 +0,0 @@ -################ -Manual Unembargo -################ - -From time to time, there may be errors within obs headers, such as faults or bugs that cause images to be taken with ``can_see_sky=True`` or ``can_see_sky=None`` when in fact it was not possible to see the sky.
-These images would normally have to wait for the full embargo period before being copied out of embargo, which can significantly hamper the Calibrations Team. -The process below will allow these images to be unembargoed manually. - -#. Ensure that a DM Jira ticket has been filed with the observation ids (``day_obs`` and ``seq_num``) of the images that were taken with incorrect metadata. - -#. **Check** what the header should be changed to. This may involve certification from experts. For ``can_see_sky``, check that the images were in fact taken inside the dome, which may require experts spot-checking images in RubinTV to ensure that they don't have astronomical objects. - -#. Add the ``day_obs`` and ``seq_num`` ranges to the ``fix_ranges`` dictionary in the ``obs_lsst`` `translator code `__. Get this reviewed and merged to ``main``. - -#. Setup the local version of ``obs_lsst`` - - .. code:: bash - - # update environment - source /cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2025_29/loadLSST.sh - setup lsst_distrib - - # Add obs_lsst to eups environment - git clone https://github.com/lsst/obs_lsst.git && cd obs_lsst - git checkout main - - scons - - # add to eups - setup -r . - - # verify local obs_lsst, should say LOCAL:{path} - eups list --setup obs_lsst - -#. Ensure s3 and database credentials are set - - .. code:: bash - - # s3 credentials are located in $HOME/.lsst/aws-credentials.ini - export AWS_PROFILE=embargo_rw - - # postgres credentials - export PGPASSFILE=$HOME/.lsst/postgres-credentials.txt - export PGUSER=rubin - - -#. Reingest the raw images (``echo`` is for a dry-run) - - .. 
code:: bash - - # Remove echo when ready - day_obs=20250715; for seq_num in $(seq -w 000205 001218); do echo butler ingest-raws embargo --regex '[W\d]\d.fits$' -t direct -j 10 --output-run LSSTCam/raw/all --update-records s3://embargo@rubin-summit/LSSTCam/${day_obs}/MC_O_${day_obs}_${seq_num}/; done - - # Verifying the ingest - # Exposure sequence numbers are 5 digits and need to be padded with zeroes - butler query-datasets --collections LSSTCam/raw/all embargo --where "instrument='LSSTCam' and exposure=YYYYMMDDXXXXX" - - - * ``--regex``: Exclude guiders that cannot be ingested using ``ingest-raws`` - - * ``-t direct``: Mandatory because raws do not live in the butler repo's datastore - - * ``-j 10``: Speed things up a bit - - * ``--output-run``: the default, but it's there for safety - - * ``--update-records``: is required to fix headers - -#. Login to the ``usdf-embargo-dmz`` vcluster using ``https://k8s.slac.stanford.edu/usdf-embargo-dmz`` and execute the ``catchup-raw.sh`` script from ``slaclab/usdf-embargo-deploy/kubernetes/overlays/transfer`` with the date. This script deploys a Kubernetes Job to the cluster that re-scans the exposure dimension and unembargo the data. - - .. code:: bash - - git clone https://github.com/slaclab/usdf-embargo-deploy - - # script is in this branch - git checkout tickets/DM-51916 - - cd kubernetes/overlays/transfer - - bash ./catchup-raw.sh diff --git a/docs/usdf-applications/data-curation.rst b/docs/usdf-applications/data-curation.rst index fdfa217..ac9408d 100644 --- a/docs/usdf-applications/data-curation.rst +++ b/docs/usdf-applications/data-curation.rst @@ -11,6 +11,6 @@ Data Curation curation/embargo-dataflow/index curation/embargo-transfer/index curation/lfa-replication/index - curation/manual-unembargo.rst + curation/fix-metadata-in-raw.rst curation/prompt-output-unembargo/index curation/s3-file-notifications/index