Merged
32 commits
b4ec234
Add ability to open ee.Image objects
tylere Nov 16, 2024
afaa9c5
End docstring with punctuation
tylere Nov 18, 2024
c2e2a9f
Expose __version__ as module attribute
tylere Jan 2, 2025
fd9e20a
Add install docs
jdbcode Jan 7, 2025
5fac43a
Account for mask byte in chunk size calculation
jdbcode Jan 7, 2025
3c34816
ignore temp directory
tylere Feb 3, 2025
4ba0259
Merge branch 'main' into simplify_proj_params
tylere Feb 3, 2025
c107bca
Add pixi config files
tylere Feb 3, 2025
0dbd250
limit to python<3.13; add proj, gdal as dependencies
tylere Feb 3, 2025
fd2b255
Use grid params (crs, crs_transform, shape_2d)
tylere Feb 26, 2025
2cda1b9
Merge branch 'simplify_proj_params' into simplify_proj_params
tylere Feb 26, 2025
dab7297
Remove helper placeholders
tylere Feb 28, 2025
68d7a46
Update README code and add tests
tylere Mar 1, 2025
97a3814
Double to single quotes.
tylere Mar 11, 2025
c4da199
Remove unnecessary import and print
tylere Mar 11, 2025
4566716
Clean up TransformType type definition
tylere Mar 11, 2025
5b5a75f
Add type hints.
tylere Mar 11, 2025
6419d93
Revert .gitattributes change
tylere Mar 11, 2025
b52f630
Revert .gitignore changes
tylere Mar 11, 2025
82407de
Revert .vscode/settings.json
tylere Mar 11, 2025
3044fce
Remove pixi config
tylere Mar 17, 2025
0228871
Remove extra print statement
tylere Mar 18, 2025
9770fae
Change strings from double to single quotes
tylere Mar 18, 2025
68c1905
Switch back to double quotes to enclose single quote
tylere Mar 18, 2025
d661309
Remove match/case syntax
tylere Mar 18, 2025
addc823
refactor: Use `affine` directly instead of `rasterio.transform.Affine`
jdbcode Sep 25, 2025
4c0a799
refactor: Make `shapely` a require dependency
jdbcode Sep 25, 2025
e32843e
refactor: Add support for accepting `affine.Affine` object as `crs_tr…
jdbcode Sep 26, 2025
1d866b8
refactor: Make `crs_transform` an attribute of `self` for reuse
jdbcode Sep 26, 2025
cf48661
fix: Handle negative and positive y scale in `fit_geometry`
jdbcode Sep 26, 2025
0a54b45
fix: Fix tests so that tuple is used (requried type for crs_transform…
jdbcode Sep 26, 2025
c215fac
refactor: Update readme to use tuple for shape and transform, add det…
jdbcode Sep 29, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -130,4 +130,4 @@ cython_debug/
.DS_Store

# pixi environments
.pixi
.pixi
205 changes: 164 additions & 41 deletions README.md
@@ -4,6 +4,8 @@

_An Xarray extension for Google Earth Engine._

Xee bridges the gap between Google Earth Engine's massive data catalog and the scientific Python ecosystem. It provides a custom Xarray backend that allows you to open any `ee.ImageCollection` as if it were a local `xarray.Dataset`. Data is loaded lazily and in parallel, enabling you to work with petabyte-scale archives of satellite and climate data using the power and flexibility of Xarray and its integrations with libraries like Dask.

[![image](https://img.shields.io/pypi/v/xee.svg)](https://pypi.python.org/pypi/xee)
[![image](https://static.pepy.tech/badge/xee)](https://pepy.tech/project/xee)
[![Conda
@@ -32,85 +34,206 @@ Then, authenticate Earth Engine:
earthengine authenticate --quiet
```

Now, in your Python environment, make the following imports:
Now, in your Python environment, make the following imports and initialize the Earth Engine client with your project ID. Using the high-volume API endpoint is recommended.

```python
import ee
import xarray
import xarray as xr
from xee import helpers
import shapely

ee.Initialize(
    project='PROJECT-ID',  # Replace with your project ID
    opt_url='https://earthengine-highvolume.googleapis.com'
)
```

Next, specify your EE-registered cloud project ID and initialize the EE client
with the high volume API:
### Specifying the Output Grid

To open a dataset, you must specify the desired output pixel grid. The `xee.helpers` module simplifies this process by providing several convenient workflows, summarized below.

| Goal | Method | When to Use |
| :--- | :--- | :--- |
| **Match Source Grid** | Use `helpers.extract_grid_params()` to get the parameters from an EE object. | When you want the data in its original, default projection and scale. |
| **Fit Area to a Shape** | Use `helpers.fit_geometry()` with the `geometry` and `grid_shape` arguments. | When you need a consistent output array size (e.g., for ML models) and the exact pixel size is less important. |
| **Fit Area to a Scale** | Use `helpers.fit_geometry()` with the `geometry` and `grid_scale` arguments. | When the specific resolution (e.g., 30 meters, 0.01 degrees) is critical for your analysis. |
| **Manual Override** | Pass `crs`, `crs_transform`, and `shape_2d` directly to `xr.open_dataset`. | For advanced cases where you already have an exact grid definition. |

> **Important Note on Units:** All grid parameter values must be in the units of the specified Coordinate Reference System (`crs`).
> * For a geographic CRS like `'EPSG:4326'`, the units are in **degrees**.
> * For a projected CRS like `'EPSG:32610'` (UTM), the units are in **meters**.
> This applies to the translation values in `crs_transform` and the pixel sizes in `grid_scale`.
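
As a rough illustration of the unit rule, the sketch below applies a six-value `crs_transform` to pixel indices in plain Python. It assumes the coefficient order `(x_scale, x_shear, x_translation, y_shear, y_scale, y_translation)`, which matches the tuples used in this README. The function name `pixel_to_xy` and the meter-based sample values are hypothetical and are not part of `xee`.

```python
def pixel_to_xy(crs_transform, col, row):
    """Map a pixel index (col, row) to the CRS coordinates of its grid corner."""
    # Assumed order: x_scale, x_shear, x_translation, y_shear, y_scale, y_translation
    x_scale, x_shear, x_trans, y_shear, y_scale, y_trans = crs_transform
    x = x_scale * col + x_shear * row + x_trans
    y = y_shear * col + y_scale * row + y_trans
    return x, y

# Geographic CRS (EPSG:4326): scales and translations are in degrees.
degrees_transform = (0.1, 0, -180.05, 0, -0.1, 90.05)
origin_deg = pixel_to_xy(degrees_transform, 0, 0)

# Projected CRS (for example a UTM zone): the same six slots hold meters.
# These numbers are made up for illustration.
meters_transform = (30, 0, 500000, 0, -30, 4200000)
corner_m = pixel_to_xy(meters_transform, 10, 20)
```

Pixel `(0, 0)` maps to the translation values themselves, which is why those values must be expressed in the units of the CRS.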

### Usage Examples

Here are common workflows for opening datasets with `xee`, corresponding to the methods in the table above.

#### Match Source Grid

This is the simplest case, using `helpers.extract_grid_params` to match the dataset's default grid.

```python
ee.Initialize(
project='my-project-id'
opt_url='https://earthengine-highvolume.googleapis.com')
ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
grid_params = helpers.extract_grid_params(ic)
ds = xr.open_dataset(ic, engine='ee', **grid_params)
```

Open any Earth Engine ImageCollection by specifying the Xarray engine as `'ee'`:
#### Fit Area to a Shape

Define a grid over an area of interest by specifying the number of pixels. `helpers.fit_geometry` will calculate the correct `crs_transform`.

```python
ds = xarray.open_dataset('ee://ECMWF/ERA5_LAND/HOURLY', engine='ee')
aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia
grid_params = helpers.fit_geometry(
    geometry=aoi,
    grid_crs='EPSG:4326',
    grid_shape=(256, 256)
)

ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params)
```
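
To make the relationship between a bounding box, a `grid_shape`, and the resulting `crs_transform` concrete, here is a simplified sketch of the arithmetic. It is not the `xee` implementation. The function `transform_from_bounds_and_shape` is hypothetical, and it assumes that the bounds are already in the grid CRS, that the shape is ordered `(width, height)`, and that the grid is north-up (negative y-scale, top-left origin).

```python
def transform_from_bounds_and_shape(bounds, grid_shape):
    """Derive a north-up transform that stretches a pixel grid over the bounds."""
    min_x, min_y, max_x, max_y = bounds
    width, height = grid_shape  # assumed order: (width, height)
    x_scale = (max_x - min_x) / width
    y_scale = -(max_y - min_y) / height  # negative: rows run from top to bottom
    # The origin is the top-left corner of the bounding box.
    return (x_scale, 0, min_x, 0, y_scale, max_y)

# Bounding box of the Australia example, in degrees (EPSG:4326).
australia = (113.33, -43.63, 153.56, -10.66)
transform = transform_from_bounds_and_shape(australia, (256, 256))
```

Because the pixel count is fixed, the pixel size falls out of the calculation, and the x and y pixel sizes differ whenever the box is not square.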

Open all bands in a specific projection (not the Xee default):
#### Fit Area to a Scale (Resolution)

> **A Note on `grid_scale` and Y-Scale Orientation**
> When using `fit_geometry` with `grid_scale`, you are defining both the pixel size and the grid's orientation via the sign of the y-scale.
> * A **negative `y_scale`** (e.g., `(10000, -10000)`) is the standard for "north-up" satellite and aerial imagery, creating a grid with a **top-left** origin.
> * A **positive `y_scale`** (e.g., `(10000, 10000)`) is used by some datasets and creates a grid with a **bottom-left** origin.
> You may need to inspect your source dataset's projection information to determine the correct sign to use. If you use `grid_shape`, a standard negative y-scale is assumed.

The following example defines a grid over an area by specifying the pixel size in meters. `fit_geometry` will reproject the geometry and calculate the correct `shape_2d`.

```python
ds = xarray.open_dataset('ee://ECMWF/ERA5_LAND/HOURLY', engine='ee',
crs='EPSG:4326', scale=0.25)
aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia
grid_params = helpers.fit_geometry(
    geometry=aoi,
    geometry_crs='EPSG:4326',  # CRS of the input geometry
    grid_crs='EPSG:32662',  # Target CRS in meters (Plate Carrée)
    grid_scale=(10000, -10000)  # Define a 10km pixel size
)

ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params)
```
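
The sketch below shows, under simplifying assumptions, how a pixel size can determine the grid shape and how the sign of the y-scale selects the origin. It is an illustration, not the `xee` implementation: `grid_from_bounds_and_scale` is a hypothetical name, the bounds are assumed to be already reprojected into the grid CRS, and rounding the pixel count up so the grid covers the whole box is an assumption.

```python
import math

def grid_from_bounds_and_scale(bounds, grid_scale):
    """Derive a transform and (width, height) shape from bounds and pixel size."""
    min_x, min_y, max_x, max_y = bounds
    x_scale, y_scale = grid_scale
    # Round up so the grid covers the full extent (assumed behavior).
    width = math.ceil((max_x - min_x) / abs(x_scale))
    height = math.ceil((max_y - min_y) / abs(y_scale))
    # Negative y-scale: rows count down from the top edge (top-left origin).
    # Positive y-scale: rows count up from the bottom edge (bottom-left origin).
    y_origin = max_y if y_scale < 0 else min_y
    return {
        'crs_transform': (x_scale, 0, min_x, 0, y_scale, y_origin),
        'shape_2d': (width, height),
    }

# A made-up 1,000 km x 500 km box in a meter-based CRS with 10 km pixels.
bounds_m = (0, 0, 1_000_000, 500_000)
north_up = grid_from_bounds_and_scale(bounds_m, (10000, -10000))
south_up = grid_from_bounds_and_scale(bounds_m, (10000, 10000))
```

Both calls yield the same shape; only the y-translation and the direction in which rows advance change.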

Open an ImageCollection (maybe, with EE-side filtering or processing):
#### Open a Custom Region at Source Resolution

This workflow is ideal for analyzing a specific area while maintaining the dataset's original resolution.

```python
ic = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY').filterDate(
'1992-10-05', '1993-03-31')
ds = xarray.open_dataset(ic, engine='ee', crs='EPSG:4326', scale=0.25)
# 1. Get the original grid parameters from the target ImageCollection
ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
source_params = helpers.extract_grid_params(ic)

# 2. Extract the source CRS and scale
source_crs = source_params['crs']
source_transform = source_params['crs_transform']
source_scale = (source_transform[0], source_transform[4]) # (x_scale, y_scale)

# 3. Use the source parameters to fit the grid to a specific geometry
aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia
final_grid_params = helpers.fit_geometry(
    geometry=aoi,
    geometry_crs='EPSG:4326',
    grid_crs=source_crs,
    grid_scale=source_scale
)

# 4. Open the dataset with the final, combined parameters
ds = xr.open_dataset(ic, engine='ee', **final_grid_params)
```

Open an ImageCollection with a specific EE projection or geometry:
#### Manual Override

For use cases where you know the exact grid parameters, you can provide them directly.

```python
ic = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY').filterDate(
'1992-10-05', '1993-03-31')
leg1 = ee.Geometry.Rectangle(113.33, -43.63, 153.56, -10.66)
ds = xarray.open_dataset(
ic,
# Manually define a 512x512 pixel grid with 0.1-degree pixels in EPSG:4326
manual_crs = 'EPSG:4326'
manual_transform = (0.1, 0, -180.05, 0, -0.1, 90.05) # Values are in degrees
manual_shape = (512, 512)

ds = xr.open_dataset(
    'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR',
    engine='ee',
    projection=ic.first().select(0).projection(),
    geometry=leg1
    crs=manual_crs,
    crs_transform=manual_transform,
    shape_2d=manual_shape,
)
```

Open multiple ImageCollections into one `xarray.Dataset`, all with the same
projection:
#### Open a Pre-Processed ImageCollection

A key feature of Xee is its ability to open a computed `ee.ImageCollection`. This allows you to leverage Earth Engine's powerful server-side processing for tasks like filtering, band selection, and calculations before loading the data into Xarray.

```python
ds = xarray.open_mfdataset(
['ee://ECMWF/ERA5_LAND/HOURLY', 'ee://NASA/GDDP-CMIP6'],
engine='ee', crs='EPSG:4326', scale=0.25)
# Define an AOI as a shapely object for the helper function
sf_aoi_shapely = shapely.geometry.Point(-122.4, 37.7).buffer(0.2)
# Create an ee.Geometry from the shapely object for server-side filtering
coords = list(sf_aoi_shapely.exterior.coords)
sf_aoi_ee = ee.Geometry.Polygon(coords)

# Define a function to calculate NDVI and add it as a band
def add_ndvi(image):
    # Landsat 9 SR bands: NIR = B5, Red = B4
    ndvi = image.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
    return image.addBands(ndvi)

# Build the pre-processed collection
processed_collection = (ee.ImageCollection('LANDSAT/LC09/C02/T1_L2')
                        .filterDate('2024-06-01', '2024-09-01')
                        .filterBounds(sf_aoi_ee)
                        .map(add_ndvi)
                        .select(['NDVI']))

# Define the output grid using a helper
grid_params = helpers.fit_geometry(
    geometry=sf_aoi_shapely,
    grid_crs='EPSG:32610',  # Target CRS in meters (UTM Zone 10N)
    grid_scale=(30, -30)  # Use Landsat's 30m resolution
)

# Open the fully processed collection
ds = xr.open_dataset(processed_collection, engine='ee', **grid_params)
```
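
For reference, `normalizedDifference` computes `(first - second) / (first + second)` for the two named bands, so the NDVI band above is `(NIR - Red) / (NIR + Red)`. The plain-Python sketch below reproduces that arithmetic on made-up reflectance values. The function name `ndvi`, the sample numbers, and the choice to return `None` when the denominator is zero are all illustrative assumptions.

```python
def ndvi(nir, red):
    """Normalized difference of two reflectance values: (nir - red) / (nir + red)."""
    total = nir + red
    if total == 0:
        return None  # undefined when both values are zero (illustrative choice)
    return (nir - red) / total

# Made-up reflectance values, for illustration only.
dense_vegetation = ndvi(nir=0.5, red=0.1)  # strongly positive
bare_surface = ndvi(nir=0.3, red=0.3)      # zero: NIR and Red are equal
```

Values fall between -1 and 1 for non-negative inputs, with higher values indicating stronger near-infrared reflectance relative to red.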

Open a single Image by passing it to an ImageCollection:
#### Open a single Image

The `helpers` work the same way for a single `ee.Image`.

```python
i = ee.ImageCollection(ee.Image('LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318'))
ds = xarray.open_dataset(i, engine='ee')
img = ee.Image('ECMWF/ERA5_LAND/MONTHLY_AGGR/202501')
grid_params = helpers.extract_grid_params(img)
ds = xr.open_dataset(img, engine='ee', **grid_params)
```

#### Visualize a Single Time Slice

Once you have your `xarray.Dataset`, you can visualize a single time slice of a variable to verify the results. This requires the `matplotlib` library, which is an optional dependency.

If you don't have it installed, you can add it with pip:

```shell
pip install matplotlib
```

Open any Earth Engine ImageCollection to match an existing transform:
Xarray's plotting functions expect dimensions in `(y, x)` order for 2D plots. Since the data is in `(x, y)` order, we use `.transpose()` to swap the axes for correct visualization.

```python
raster = rioxarray.open_rasterio(...) # assume crs + transform is set
ds = xr.open_dataset(
'ee://ECMWF/ERA5_LAND/HOURLY',
engine='ee',
geometry=tuple(raster.rio.bounds()), # must be in EPSG:4326
projection=ee.Projection(
crs=str(raster.rio.crs), transform=raster.rio.transform()[:6]
),

# First, open a dataset using one of the methods above
aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia
grid_params = helpers.fit_geometry(
    geometry=aoi,
    grid_crs='EPSG:4326',
    grid_shape=(256, 256)
)
ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params)

# Select the 2m air temperature for the first time step
temp_slice = ds['temperature_2m'].isel(time=0)

# Transpose from (x, y) to (y, x) for correct plotting orientation and plot
temp_slice.transpose('y', 'x').plot()
```

See [examples](https://github.com/google/Xee/tree/main/examples) or
6 changes: 5 additions & 1 deletion pyproject.toml
@@ -3,7 +3,7 @@ name = "xee"
dynamic = ["version"]
description = "A Google Earth Engine extension for Xarray."
readme = "README.md"
requires-python = ">=3.8"
requires-python = ">=3.8,<3.13"
license = {text = "Apache-2.0"}
authors = [
{name = "Google LLC", email = "noreply@google.com"},
@@ -28,6 +28,7 @@ dependencies = [
"earthengine-api>=0.1.374",
"pyproj",
"affine",
"shapely",
]

[project.entry-points."xarray.backends"]
@@ -65,5 +66,8 @@ preview = true
pyink-indentation = 2
pyink-use-majority-quotes = true

[tool.setuptools]
packages = ["xee"]

[tool.setuptools_scm]
fallback_version = "9999"