
Add oco3 SAM COGs dataset config #333

Closed: abarciauskas-bgse wants to merge 12 commits into main from add-oco3-cogs

Conversation

@abarciauskas-bgse (Contributor)

No description provided.

github-actions (Bot) commented Mar 14, 2025

Workflow Status

Starting workflow... View action run

Collection Publication Status

➡️ oco3-co2-sam-l3-cogs: Successfully published ✅

abarciauskas-bgse (Contributor, Author) commented Mar 14, 2025

@botanical @anayeaye the action to publish the collection to staging appears to have succeeded, but I get a Not Found response for https://staging.openveda.cloud/api/stac/collections/oco3-sam-cogs. Is it possible that something is failing silently in the DAG, or that it publishes to a different STAC endpoint? Where can I troubleshoot the Airflow DAG workflow?
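A minimal sketch for checking whether a collection is visible on the staging STAC API, assuming the standard STAC API route `GET /collections/{collectionId}` returns 404 for an unknown id. The helper names are hypothetical. Note that the publication status above reports the id oco3-co2-sam-l3-cogs, so it may be worth checking both ids.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the staging STAC API root used in this thread.
STAC_ROOT = "https://staging.openveda.cloud/api/stac"


def collection_url(root: str, collection_id: str) -> str:
    """Build the URL for a single collection (GET /collections/{id})."""
    cid = urllib.parse.quote(collection_id, safe="")
    return f"{root.rstrip('/')}/collections/{cid}"


def collection_exists(root: str, collection_id: str, timeout: float = 10.0) -> bool:
    """Return True if the collection endpoint answers 200, False on 404.

    Any other HTTP error is re-raised so it is not mistaken for "missing".
    """
    url = collection_url(root, collection_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

For example, `collection_exists(STAC_ROOT, "oco3-co2-sam-l3-cogs")` and `collection_exists(STAC_ROOT, "oco3-sam-cogs")` would show which id the publish step actually created.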

@botanical (Member)

@abarciauskas-bgse (Contributor, Author)

I can see and (I think) understand the error, thank you @botanical!

smohiudd (Contributor) commented Mar 14, 2025

The 422 error is: 'type': 'missing', 'loc': ('body', 'assets'), 'msg': 'Field required'

I believe this has to do with how we're validating the collection schema here. This could be a pydantic v2 behavior change, where we need to declare assets: Optional[Dict] = None so that it isn't treated as a required field.
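A minimal sketch of the pydantic v2 behavior described above, using hypothetical model names rather than the actual collection schema. In pydantic v2, `Optional[X]` only allows `None` as a value; without an explicit default the field is still required, which produces the `'type': 'missing'` error (FastAPI-style request validation typically prefixes `loc` with `body`).

```python
from typing import Dict, Optional

from pydantic import BaseModel, ValidationError


class RequiredAssets(BaseModel):
    """Hypothetical model: `assets` may be None but must be present."""
    id: str
    assets: Optional[Dict]  # no default, so pydantic v2 treats it as required


class OptionalAssets(BaseModel):
    """Hypothetical model: `assets` can be omitted entirely."""
    id: str
    assets: Optional[Dict] = None  # explicit default makes it truly optional


def missing_assets_errors() -> list:
    """Return the validation errors for a payload that omits 'assets'."""
    try:
        RequiredAssets(id="oco3-co2-sam-l3-cogs")
    except ValidationError as exc:
        return exc.errors()
    return []
```

Under pydantic v1, `Optional[Dict]` alone implied a default of `None`, which is why this kind of error can appear only after an upgrade.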

@abarciauskas-bgse (Contributor, Author)

I added more files (100 of 4990) to the staging bucket, and some items are failing (at least one, with the error "index -9223372036854775808 is out of bounds for axis 0 with size 11"). I'll look into this more when I have the chance.
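The out-of-bounds index in that error, -9223372036854775808, is exactly the minimum signed 64-bit integer (-2**63). One common source of that value, unconfirmed here, is casting a NaN or missing value to int64 in NumPy on x86-64 platforms, so a bounds check that calls out this sentinel might help narrow down the failing file. A minimal pure-Python sketch; `checked_index` is hypothetical and not part of the pipeline.

```python
INT64_MIN = -(2 ** 63)  # -9223372036854775808


def checked_index(index: int, size: int) -> int:
    """Validate an index before using it on an axis of length `size`.

    Only non-negative indices are accepted (stricter than NumPy, which
    also allows -size <= index < 0). The int64 sentinel gets its own
    message because it often means a NaN was cast to an integer.
    """
    if index == INT64_MIN:
        raise IndexError(
            f"index {index} is INT64_MIN; an upstream NaN or missing "
            "value may have been cast to int64"
        )
    if not 0 <= index < size:
        raise IndexError(f"index {index} is out of bounds for size {size}")
    return index
```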

It took me a while to work out how to get the logs for the build_stac_item task, but I finally figured it out by reverse-engineering the link Jennifer shared (https://sm2a.staging.openveda.cloud/dags/veda_dataset_pipeline/grid?run_id=veda_dataset_pipeline-193bce76-2d7e-42d5-94f0-fc162e5bfc53&tab=logs&dag_run_id=veda_dataset_pipeline-193bce76-2d7e-42d5-94f0-fc162e5bfc53&task_id=build_stac_task&map_index=0) and then downloading the logs referenced by this message: "ValueError: Some items failed to be processed. Failures logged here: s3://veda-tf-state-shared-smce/events/oco3-sam-cogs/dead_letter_events/build_stac_failed_c4de815c-6757-4f0a-9fdf-f33c16b7d169.json".

Could we include a link to the run (e.g. https://sm2a.staging.openveda.cloud/dags/veda_dataset_pipeline/grid?run_id=veda_dataset_pipeline-193bce76-2d7e-42d5-94f0-fc162e5bfc53) in the CI output? Or is it already there and I could not find it?

I'm also curious how to navigate to the logs for a specific task; I built the URL above by inspecting the one Jennifer shared.
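A minimal sketch of building run and per-task log links like the ones above, which CI could print after triggering a run. It assumes the query parameters seen in the shared link (run_id, tab, dag_run_id, task_id, map_index) are what the Airflow grid view expects; this is reverse-engineered from that URL rather than a documented contract, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the staging Airflow UI host and DAG id used in this thread.
AIRFLOW_UI = "https://sm2a.staging.openveda.cloud"
DAG_ID = "veda_dataset_pipeline"


def run_url(run_id: str, dag_id: str = DAG_ID, base: str = AIRFLOW_UI) -> str:
    """Link to the grid view for a single DAG run."""
    return f"{base}/dags/{dag_id}/grid?{urlencode({'run_id': run_id})}"


def task_log_url(run_id: str, task_id: str, map_index: int = 0,
                 dag_id: str = DAG_ID, base: str = AIRFLOW_UI) -> str:
    """Link to the Logs tab of one (mapped) task instance in that run.

    Parameter names mirror the URL shared in this thread.
    """
    params = {
        "run_id": run_id,
        "tab": "logs",
        "dag_run_id": run_id,
        "task_id": task_id,
        "map_index": map_index,  # urlencode converts the int to "0", "1", ...
    }
    return f"{base}/dags/{dag_id}/grid?{urlencode(params)}"
```

For a mapped task, each map_index points at the logs of one parallel instance, so a failing item may only show up under one index.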

@abarciauskas-bgse (Contributor, Author)

I think only one file is failing, but that failure is preventing the other 99 from being published. Is there a way to report failures but still publish the rest of the items?

(I'm going to try to remove just that one file for now)
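One way to report failures without blocking the rest is to build each item independently, collect successes and failures separately, then publish the successes while recording the failures (for example, in a dead-letter file like the one linked earlier). A minimal sketch with hypothetical names (`process_batch`, `build_item`); it is not the pipeline's actual build_stac task.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BatchResult:
    """Outcome of a batch: built items plus a per-file error message."""
    succeeded: List[dict] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def process_batch(paths: List[str], build_item: Callable[[str], dict]) -> BatchResult:
    """Build one item per path, isolating failures instead of aborting.

    An exception for one file is recorded and the loop continues, so the
    remaining items can still be published.
    """
    result = BatchResult()
    for path in paths:
        try:
            result.succeeded.append(build_item(path))
        except Exception as exc:  # record any per-item error and move on
            result.failed[path] = f"{type(exc).__name__}: {exc}"
    return result
```

Whether a partial publish is acceptable, and whether the run should still be marked as failed when `failed` is non-empty, is a policy choice for the pipeline.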


Labels

None yet

Projects

None yet


3 participants