Merged
15 changes: 15 additions & 0 deletions conf.py
Original file line number Diff line number Diff line change
@@ -4,3 +4,18 @@

# Remove this later after we fix documenteer
mermaid_version = "11.9.0"

# Ignore links

linkcheck_ignore = [
r'https://grafana\.slac\.stanford\.edu.+',
r'https://.+\.slac\.stanford\.edu.+',
r'https://confluence\.lsstcorp\.org.+',
r'https://github\.com/.+',
r'https://docs\.redhat\.com.+',
r'https://slactraining\.skillport\.com.+',
]

# disable TLS verification

tls_verify = False
6 changes: 3 additions & 3 deletions docs/developers/kubernetes.rst
@@ -4,7 +4,7 @@ SLAC Kubernetes Overview

Operations activities will be carried out at the SLAC US Data Facility (USDF). Where possible, all USDF services will reside on top of a kubernetes platform.

SLAC operates a single large kubernetes cluster. The benefits of this are with increased scale (sharing of resources) and reduced management overhead. We run 'vanilla' kubernetes, deployed via `kubeadm <https://github.com/kubernetes/kubeadm>`__. On top of this, to provide segregation and project control we use `loft.sh's vcluster <https://github.com/loft-sh/vcluster>`__. The latter provides a virtual kubernetes cluster from which we can provide a similar experience to `openshift's projects <https://docs.openshift.com/container-platform/4.6/applications/projects/working-with-projects.html>`__ or `GKE's projects and folders <https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy>`__.
SLAC operates a single large kubernetes cluster. The benefits of this are with increased scale (sharing of resources) and reduced management overhead. We run 'vanilla' kubernetes, deployed via `kubeadm <https://github.com/kubernetes/kubeadm>`__. On top of this, to provide segregation and project control we use `loft.sh's vcluster <https://github.com/loft-sh/vcluster>`__. The latter provides a virtual kubernetes cluster from which we can provide a similar experience to `openshift's projects <https://docs.redhat.com/en/documentation/openshift_container_platform/4.6/html/applications/projects#working-with-projects>`__ or `GKE's projects and folders <https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy>`__.


SLAC Virtual Clusters, aka "Projects"
@@ -61,7 +61,7 @@ Connecting and Authenticating
Generically:

- Determine the 'project' that you wish to access, eg usdf-butler
- Go to https://k8s.slac.stanford.edu/<project>
- Go to https://k8s.slac.stanford.edu/usdf-butler
- Click 'Sign-In' to begin the authentication procedure
- Enter your SLAC credentials into the login page, and possibly your Duo 2Factor if requested. This step may be skipped automatically if you already have valid single sign-on credentials in place.
- Click on 'Grant Access' to agree to register
@@ -74,6 +74,6 @@ We currently provide kubernetes API access without the need for VPNs etc. i.e. y
Miscellaneous
=============

- if you encounter an error like "Unable to connect to the server: No valid id-token, and cannot refresh without refresh-token" when running your kubectl, you will need to log back in via https://k8s.slac.stanford.edu/<project>, re-executing the commands in the second box. This is because our OIDC (dex) implementation does not and cannot generate refresh tokens from our SAML2 (windows ADFS) backend. (Actually, only the ``set-credentials`` command is needed, but it doesn't hurt to execute them all.)
- if you encounter an error like "Unable to connect to the server: No valid id-token, and cannot refresh without refresh-token" when running your kubectl, you will need to log back in via ``https://k8s.slac.stanford.edu/<project>``, re-executing the commands in the second box. This is because our OIDC (dex) implementation does not and cannot generate refresh tokens from our SAML2 (windows ADFS) backend. (Actually, only the ``set-credentials`` command is needed, but it doesn't hurt to execute them all.)

Kubernetes secrets are usually held in Vault (vault.slac.stanford.edu). The vault command is available on USDF interactive nodes. You may need to activate it with ``module load vault``. Then login using the commands ``export VAULT_ADDR=https://vault.slac.stanford.edu; vault login -method=ldap`` with your SLAC Windows password. You can then use ``vault kv list -mount=secret rubin[/PATH]`` and ``vault kv get -mount=secret PATH/TO/SECRET`` to access secrets for which you have permission.
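
As a rough sketch of the recovery step described in the first bullet above, a small wrapper can detect the expired-token error and print the login page to revisit. The function name ``relogin_hint`` and the exact string it matches are illustrative assumptions, not part of any USDF tooling.

```shell
# Hypothetical helper: given a project name and the error text from kubectl,
# print the login URL to revisit when the id-token has expired.
relogin_hint() {
    local project="$1" err="$2"
    case "$err" in
        *"No valid id-token"*)
            # Matches the error quoted above; log back in for this project.
            echo "https://k8s.slac.stanford.edu/${project}"
            ;;
        *)
            # Some other error: nothing to suggest.
            return 1
            ;;
    esac
}
```

It could be used as ``err=$(kubectl get pods 2>&1 >/dev/null) || relogin_hint usdf-butler "$err"``, where ``usdf-butler`` is the example project from this page.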
2 changes: 1 addition & 1 deletion docs/usdf-applications/curation/embargo-dataflow/index.rst
@@ -56,7 +56,7 @@ image data and how and where the automated process runs.
The ``transfer_raw_zip`` Tool
-----------------------------

The ``transfer_raw_zip.py`` in https://github.comr/lsst-dm/transfer_embargo (branch tickets/DM-51619) is
The ``transfer_raw_zip.py`` in https://github.com/lsst-dm/transfer_embargo (branch tickets/DM-51619) is
used to unembargo raw image data. This tool will unembargo the data:

- Transfer from ``embargo`` S3 storage's ``rubin-summit`` bucket to the destination directory
171 changes: 171 additions & 0 deletions docs/usdf-applications/curation/fix-metadata-in-raw.rst
@@ -0,0 +1,171 @@
#############################
Fixing Metadata in Raw Images
#############################

**Caution:** The fix described here is applied in the butler repos that hold the problematic
raw images in their datastores. The raw images themselves are never altered.

From time to time there may be errors in the observation headers. Typical issues are:

* Faults or bugs that cause images to be taken with ``can_see_sky=True`` or ``can_see_sky=None``
when in fact it was not possible to see the sky. These images would normally have to wait for
the full embargo period before being copied out of embargo, which can significantly hamper the
Calibrations Team.

* Wrong filter information in the image headers

This results in wrong dimension records being ingested into the embargo butler repo. The process
below allows the dimension records in butler to be fixed. The procedure to report and obtain a fix
for these issues is as follows:

#. An issue is usually identified and reported by someone who notices the problem. The Data Curation
team should file a JIRA ticket with the OBS team to report the problem, and a JIRA ticket with
the DM Data Engineering team to obtain a fix in ``obs_lsst`` (usually in ``lsstCam.py``).
These tickets usually include the observation ids (``day_obs`` and ``seq_num``) of
the images that were taken with incorrect metadata.

#. The Data Engineering team usually confirms the issue and provides a fix. In some simple cases,
the Data Curation team can also fix the issue after **checking** what the header should be
changed to (it is best to check with experts). For ``can_see_sky``,
check that the images were in fact taken inside the dome, which may require experts to spot-check
images in RubinTV to ensure that they don't contain astronomical objects.

#. **Skip this step if the Data Engineering team will provide a fix in** ``obs_lsst``: Add the
``day_obs`` and ``seq_num`` ranges to the ``fix_ranges`` dictionary in the ``obs_lsst``
`translator code <https://github.com/lsst/obs_lsst/blob/main/python/lsst/obs/lsst/translators/lsstCam.py#L164-L171>`__.
Get this reviewed and merged to the ``main`` branch.

The following uses butler repo ``embargo`` as an example, but the same process applies to any
butler repo that has the affected images in its datastore.

#. Set up the local version of ``obs_lsst``

.. code:: bash

# update environment
source /cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2026_17/loadLSST.sh
setup lsst_distrib

# Add obs_lsst to eups environment
git clone https://github.com/lsst/obs_lsst.git && cd obs_lsst
git checkout main

scons

# add to eups
setup -r .

# verify local obs_lsst, should say LOCAL:{path}
eups list --setup obs_lsst

#. Ensure s3 and database credentials are set

.. code:: bash

# s3 credentials are located in $HOME/.lsst/aws-credentials.ini
export AWS_PROFILE=embargo_rw

# postgres credentials
export PGPASSFILE=$HOME/.lsst/postgres-credentials.txt
export PGUSER=rubin


#. Update the butler dimension records (``echo`` is for a dry-run)

.. code:: bash

# Make sure SHELL is bash
# Remove echo when ready
day_obs=20250715
REPO=embargo
for seq_num in $(seq -w 000205 001218); do
# Old command
#echo butler ingest-raws $REPO --regex '[W\d]\d.fits$' -t direct -j 10 --output-run LSSTCam/raw/all --update-records s3://embargo@rubin-summit/LSSTCam/${day_obs}/MC_O_${day_obs}_${seq_num}/;
# New command
echo butler update-exposures-from-raws $REPO LSSTCam --where "exposure=${day_obs}${seq_num: -5}"
done

# Verifying the ingest
# Exposure sequence numbers (XXXXX below) are 5 digits and need to be padded with zeroes
butler query-datasets --collections LSSTCam/raw/all $REPO --where "instrument='LSSTCam' and exposure=YYYYMMDDXXXXX"


The options in the old ``ingest-raws`` command above are:

* ``--regex``: Exclude guiders that cannot be ingested using ``ingest-raws``

* ``-t direct``: Mandatory because raws do not live in the butler repo's datastore

* ``-j 10``: Speed things up a bit

* ``--output-run``: the default, but it's there for safety

* ``--update-records``: is required to fix headers
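
The exposure ID used in the ``--where`` clauses above is the 8-digit ``day_obs`` followed by the sequence number zero-padded to 5 digits. A minimal sketch of that construction follows. The function name ``exposure_id`` is a hypothetical helper, not part of the butler CLI, and the loop only echoes the commands as a dry run.

```shell
# Hypothetical helper: build the exposure ID (YYYYMMDDXXXXX) from day_obs and seq_num.
exposure_id() {
    local day_obs="$1" seq_num="$2"
    # Strip leading zeros so printf does not read the value as octal.
    seq_num=$(echo "$seq_num" | sed 's/^0*//')
    # Pad the sequence number back to exactly 5 digits.
    printf '%s%05d\n' "$day_obs" "${seq_num:-0}"
}

# Dry run over a short range: prints the commands instead of running them.
for seq_num in 205 206; do
    echo butler update-exposures-from-raws embargo LSSTCam --where "exposure=$(exposure_id 20250715 "$seq_num")"
done
```

For example, ``exposure_id 20250715 000205`` prints ``2025071500205``, which is the value the loop in the step above builds with ``${seq_num: -5}``.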

If the raw images haven't been unembargoed, the following process can manually unembargo the
data after the metadata is fixed.

#. Log in to the ``usdf-embargo-dmz`` vcluster using ``https://k8s.slac.stanford.edu/usdf-embargo-dmz`` and execute the ``catchup-raw.sh`` script from ``slaclab/usdf-embargo-deploy/kubernetes/overlays/transfer`` with the date. This script deploys a Kubernetes Job to the cluster that re-scans the exposure dimension and unembargoes the data.

.. code:: bash

git clone https://github.com/slaclab/usdf-embargo-deploy
cd usdf-embargo-deploy

# script is in this branch
git checkout tickets/DM-51916

cd kubernetes/overlays/transfer

bash ./catchup-raw.sh <YYYY-MM-DD>
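
``catchup-raw.sh`` takes a dashed date, while the rest of this page uses the compact ``day_obs`` form. Assuming the script's date argument is simply the ``day_obs`` written as ``YYYY-MM-DD`` (an assumption worth confirming against the script itself), the conversion can be sketched as follows. ``day_obs_to_date`` is a hypothetical helper name.

```shell
# Hypothetical helper: convert a compact day_obs (YYYYMMDD) to YYYY-MM-DD.
day_obs_to_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # exactly 8 digits
        *) echo "expected YYYYMMDD, got: $1" >&2; return 1 ;;
    esac
    # Insert dashes after the year and the month.
    echo "$1" | sed -E 's/^(.{4})(.{2})(.{2})$/\1-\2-\3/'
}
```

With this in place, the last command above could be written as ``bash ./catchup-raw.sh "$(day_obs_to_date 20250715)"``.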

If the raw images have been unembargoed, then the downstream butler repos (the USDF ``main`` butler
and the FrDF and UKDF butlers) will need the updated butler dimension records as well.

Each exposure has a corresponding Rucio dataset (e.g. ``raw:Dataset/LSSTCam/raw/Obs/20250715/MC_O_20250715_000205``).
This dataset has a .zip file that contains the raw .fits files and json files, and a _dimensions.yaml
file that contains the (incorrect) dimension records for that exposure. We will need to create
a new _dimensions.1.yaml file with the correct dimension records, and add it to the same Rucio dataset.
To do so:

#. Download and run `create_rawdata_dimensions_yaml.py <https://github.com/lsst-dm/data-curation-tools/blob/main/bin.src/create_rawdata_dimensions_yaml.py>`_

.. code:: bash

git clone https://github.com/lsst-dm/data-curation-tools.git && cd data-curation-tools/bin.src

day_obs=20250715
seq_num=000205

python create_rawdata_dimensions_yaml.py ${day_obs}${seq_num: -5}
# This will create a file named MC_O_20250715_000205_dimensions.1.yaml in the current directory.

#. Upload the new dimensions yaml file to Rucio

.. code:: bash

day_obs=20250715
seq_num=000205

# instrCode="MC"
# controller="O"
# obsId=${instrCode}_${controller}
obsId="MC_O" # !!! sometimes this could be "MC_C". Check what was created above.

newDimensionsYaml="${obsId}_${day_obs}_${seq_num}_dimensions.1.yaml"
didName="LSSTCam/${day_obs}/${newDimensionsYaml}"
obsDataset="raw:Dataset/LSSTCam/raw/Obs/${day_obs}/${obsId}_${day_obs}_${seq_num}"

# Note "rucio upload" (below) will not work unless you log in to `rubinmgr` and temporarily change
# the permission of /sdf/data/rubin/lsstdata/offline/instrument/20250715 to world-writable (777).
# Remember to change the permission back after running the rucio upload command below.

echo rucio upload --rse SLAC_RAW_DISK --scope raw --dataset $didName $newDimensionsYaml

echo rucio did update --open $obsDataset
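# NOTE: $lfn is not defined in this snippet; set it to the DID of the file uploaded above before running.
echo rucio did content add --to-did $obsDataset $lfn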
echo rucio did metadata set --key SafeCopies --value "" $obsDataset
echo rucio did metadata set --key arcBackup --value SLAC_RAW_DISK_BKUP:need $obsDataset
echo rucio did update --close $obsDataset
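
The names used in the upload step all derive from ``obsId``, ``day_obs``, and ``seq_num``. The sketch below shows one way to recover ``obsId`` from the yaml file produced in the previous step, rather than hard-coding ``MC_O``, and to build the dataset name. Both function names are hypothetical helpers, and the prefix extraction assumes that ``obsId`` never contains an underscore followed by a digit.

```shell
# Hypothetical helper: recover the obsId prefix (e.g. MC_O or MC_C)
# from the generated yaml file name.
obs_id_from_yaml() {
    local base
    base=$(basename "$1")
    # Drop everything from the first "_<digit>" onward, i.e. from "_<day_obs>".
    echo "${base%%_[0-9]*}"
}

# Hypothetical helper: name of the Rucio dataset that holds one exposure.
obs_dataset() {
    local obs_id="$1" day_obs="$2" seq_num="$3"
    echo "raw:Dataset/LSSTCam/raw/Obs/${day_obs}/${obs_id}_${day_obs}_${seq_num}"
}
```

For example, ``obs_id_from_yaml MC_C_20250715_000205_dimensions.1.yaml`` prints ``MC_C``, and ``obs_dataset MC_O 20250715 000205`` prints the example dataset name given earlier on this page.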

The presence of this new _dimensions.1.yaml file indicates that the butler dimension records have been
updated. If a DF hasn't ingested the raw data yet, it can use the new _dimensions.1.yaml directly to
ingest the data. If the raw data has already been ingested, the DF will need to update the dimension
records in its butler repos using a process similar to the one described above for the embargo butler repo.
80 changes: 0 additions & 80 deletions docs/usdf-applications/curation/manual-unembargo.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/usdf-applications/data-curation.rst
@@ -11,6 +11,6 @@ Data Curation
curation/embargo-dataflow/index
curation/embargo-transfer/index
curation/lfa-replication/index
curation/manual-unembargo.rst
curation/fix-metadata-in-raw.rst
curation/prompt-output-unembargo/index
curation/s3-file-notifications/index