Icechunk is coming - versioned cloud-native Zarr, GDAL, and R

code
news
Author

Michael Sumner

Published

June 26, 2026

Icechunk is a transactional storage engine for Zarr and provides git-style versioning for large scientific datasets. This post is about Icechunk from a GDAL perspective and some impact that will have on the R community and its interactions with Zarr.

What is Icechunk?

Zarr is a chunked array format designed for cloud object storage. Icechunk is a Rust library that sits on top of Zarr and adds git-like snapshots combined with ACID transactions - (which means reliable and guaranteed read and write).

(From their docs) this provides:

  • Safety: Recover from corrupted or accidentally deleted data using version history
  • Consistency: Ensure readers always see complete, valid snapshots - never partial writes
  • Reproducibility: Reference any version of your data permanently via commits or tags

Typical usage involves the Python packages xarray and icechunk, but Rust Icechunk can be used directly, in combination with zarrs_icechunk, and now in an upcoming GDAL release 3.14 via the read-only Icechunk driver.

Please be aware that Icechunk in GDAL is very new, and not yet in an officially supported release but if you are interested to get involved it can be used today. (GDAL 3.14 is milestoned for sometime in November 2026).

What works in GDAL and R today

GDAL 3.14 ships a /vsiicechunk/ virtual filesystem and Icechunk driver, and will be available in gdalcubes, gdalraster, sf, terra, and vapour in various forms once that support lands in R packages.

The following examples can be run in this Linux docker container, for example:

docker pull ghcr.io/hypertidy/gdal-r-full:dev
docker run --rm -ti  ghcr.io/hypertidy/gdal-r-full:dev

(If that doesn’t work, or you want help with docker/GDAL/R just contact me directly)

Here is a single cli call that gives a GDAL multidimensional summary of an Icechunk store.

gdal mdim info \
  /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk \
  --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2
Click here for the full mdim info output
Driver: Icechunk

Dimensions:
  Name (path)  Size      Type      Direction
  -----------  ----  ------------  ---------
  /init_time   3264
  /latitude     721  HORIZONTAL_Y
  /lead_time     61
  /longitude   1440  HORIZONTAL_X

Coordinates (indexing variables):
  Name (path)   Dimension    Type              Unit
  -----------  -----------  -------  ------------------------
  /init_time   (init_time)  Int64    seconds since 1970-01-01
  /latitude    (latitude)   Float64  degree_north
  /lead_time   (lead_time)  Float64  seconds
  /longitude   (longitude)  Float64  degree_east

Data variables:
                  Name (path)                   Type              Unit                    Shape             Chunk size
  -------------------------------------------  -------  ------------------------  ---------------------  -----------------

 (/init_time):
  /expected_forecast_length                    Float64  seconds                   [3264]                 [23360]
  /ingested_forecast_length                    Float64  seconds                   [3264]                 [23360]

 (/init_time, /lead_time):
  /valid_time                                  Int64    seconds since 1970-01-01  [3264, 61]             [23360, 61]

 (/init_time, /lead_time, /latitude, /longitude):
  /dew_point_temperature_2m                    Float32  degree_Celsius            [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /downward_long_wave_radiation_flux_surface   Float32  W m-2                     [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /downward_short_wave_radiation_flux_surface  Float32  W m-2                     [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /geopotential_height_500hpa                  Float32  m                         [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /geopotential_height_850hpa                  Float32  m                         [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /geopotential_height_925hpa                  Float32  m                         [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /precipitation_surface                       Float32  kg m-2 s-1                [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /pressure_reduced_to_mean_sea_level          Float32  Pa                        [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /pressure_surface                            Float32  Pa                        [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /temperature_2m                              Float32  degree_Celsius            [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /temperature_850hpa                          Float32  degree_Celsius            [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /temperature_925hpa                          Float32  degree_Celsius            [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /total_cloud_cover_atmosphere                Float32  percent                   [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /wind_u_100m                                 Float32  m s-1                     [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /wind_u_10m                                  Float32  m s-1                     [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /wind_v_100m                                 Float32  m s-1                     [3264, 61, 721, 1440]  [1, 61, 241, 240]
  /wind_v_10m                                  Float32  m s-1                     [3264, 61, 721, 1440]  [1, 61, 241, 240]

Scalar arrays:
  Name (path)   Type   Unit
  ------------  -----  ----
  /spatial_ref  Int64

Attributes:
         Name           Type                                      Value
  -------------------  ------  ----------------------------------------------------------------------------
  attribution          String  "ECMWF AIFS Single forecast data processed by dynamical.org from ECMWF Open
                               Data."
  dataset_id           String  "ecmwf-aifs-single-forecast"
  dataset_version      String  "0.1.0"
  description          String  "Weather forecasts from the ECMWF Artificial Intelligence Forecasting System
                               (AIFS) Single model."
  forecast_domain      String  "Forecast lead time 0-360 hours (0-15 days) ahead"
  forecast_resolution  String  "6 hourly"
  license              String  "CC-BY-4.0"
  name                 String  "ECMWF AIFS Single forecast"
  spatial_domain       String  "Global"
  spatial_resolution   String  "0.25 degrees (~20km)"
  time_domain          String  "Forecasts initialized 2024-04-01 00:00:00 UTC to Present"
  time_resolution      String  "Forecasts initialized every 6 hours"

Arrays:

  - /init_time:
      Dimensions:    (/init_time)
      Shape:         [3264]
      Chunk size:    [23360]
      Type:          Int64
      Unit:          seconds since 1970-01-01
      Nodata value:  0

      Attributes:
                 Name            Type                       Value
        ----------------------  ------  ---------------------------------------------
        calendar                String  "proleptic_gregorian"
        long_name               String  "Forecast initialization time"
        standard_name           String  "forecast_reference_time"
        statistics_approximate  String  {"min":"2024-04-01T00:00:00","max":"Present"}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /latitude:
      Dimensions:    (/latitude)
      Shape:         [721]
      Chunk size:    [721]
      Type:          Float64
      Unit:          degree_north
      Nodata value:  "NaN"

      Attributes:
                 Name            Type          Value
        ----------------------  ------  --------------------
        _FillValue              String  "AAAAAAAA+H8="
        axis                    String  "Y"
        long_name               String  "Latitude"
        statistics_approximate  String  {"min":-90,"max":90}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /lead_time:
      Dimensions:    (/lead_time)
      Shape:         [61]
      Chunk size:    [61]
      Type:          Float64
      Unit:          seconds
      Nodata value:  "NaN"

      Attributes:
                 Name            Type                         Value
        ----------------------  ------  --------------------------------------------------
        _FillValue              String  "AAAAAAAA+H8="
        dtype                   String  "timedelta64[us]"
        long_name               String  "Forecast lead time"
        standard_name           String  "forecast_period"
        statistics_approximate  String  {"min":"0 days 00:00:00","max":"15 days 00:00:00"}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /longitude:
      Dimensions:    (/longitude)
      Shape:         [1440]
      Chunk size:    [1440]
      Type:          Float64
      Unit:          degree_east
      Nodata value:  "NaN"

      Attributes:
                 Name            Type             Value
        ----------------------  ------  -------------------------
        _FillValue              String  "AAAAAAAA+H8="
        axis                    String  "X"
        long_name               String  "Longitude"
        statistics_approximate  String  {"min":-180,"max":179.75}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /dew_point_temperature_2m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          degree_Celsius
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "2 metre dewpoint temperature"
        short_name     String  "2d"
        standard_name  String  "dew_point_temperature"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /downward_long_wave_radiation_flux_surface:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          W m-2
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Surface downward long-wave radiation flux"
        short_name     String  "sdlwrf"
        standard_name  String  "surface_downwelling_longwave_flux_in_air"
        step_type      String  "avg"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /downward_short_wave_radiation_flux_surface:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          W m-2
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Surface downward short-wave radiation flux"
        short_name     String  "sdswrf"
        standard_name  String  "surface_downwelling_shortwave_flux_in_air"
        step_type      String  "avg"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /expected_forecast_length:
      Dimensions:    (/init_time)
      Shape:         [3264]
      Chunk size:    [23360]
      Type:          Float64
      Unit:          seconds
      Nodata value:  "NaN"

      Attributes:
                 Name            Type                         Value
        ----------------------  ------  --------------------------------------------------
        _FillValue              String  "AAAAAAAA+H8="
        dtype                   String  "timedelta64[us]"
        long_name               String  "Expected forecast length"
        statistics_approximate  String  {"min":"0 days 00:00:00","max":"15 days 00:00:00"}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /geopotential_height_500hpa:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Geopotential height"
        short_name     String  "gh"
        standard_name  String  "geopotential_height"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /geopotential_height_850hpa:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Geopotential height"
        short_name     String  "gh"
        standard_name  String  "geopotential_height"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /geopotential_height_925hpa:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Geopotential height"
        short_name     String  "gh"
        standard_name  String  "geopotential_height"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /ingested_forecast_length:
      Dimensions:    (/init_time)
      Shape:         [3264]
      Chunk size:    [23360]
      Type:          Float64
      Unit:          seconds
      Nodata value:  "NaN"

      Attributes:
                 Name            Type                         Value
        ----------------------  ------  --------------------------------------------------
        _FillValue              String  "AAAAAAAA+H8="
        dtype                   String  "timedelta64[us]"
        long_name               String  "Ingested forecast length"
        statistics_approximate  String  {"min":"0 days 00:00:00","max":"15 days 00:00:00"}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /precipitation_surface:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          kg m-2 s-1
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                       Value
        -------------  ------  ------------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        comment        String  "Average precipitation rate since the previous forecast step. Units equivalent
                               to mm/s."
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Precipitation rate"
        short_name     String  "prate"
        standard_name  String  "precipitation_flux"
        step_type      String  "avg"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /pressure_reduced_to_mean_sea_level:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          Pa
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Pressure reduced to MSL"
        short_name     String  "prmsl"
        standard_name  String  "air_pressure_at_mean_sea_level"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /pressure_surface:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          Pa
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Surface pressure"
        short_name     String  "sp"
        standard_name  String  "surface_air_pressure"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /spatial_ref:
      Dimensions:    ()
      Shape:         []
      Type:          Int64
      Nodata value:  0

      Attributes:
                   Name               Type                                        Value
        ---------------------------  -------  -----------------------------------------------------------------------------
        comment                      String   "This coordinate reference system matches the source data which follows WMO
                                              conventions of assuming the earth is a perfect sphere with a radius of 6,371,
                                              229m. It is similar to EPSG:4326, but EPSG:4326 uses a more accurate
                                              representation of the earth's shape."
        crs_wkt                      String   "GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"unknown\",6371229,0]],
                                              PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",
                                              0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Longitude\",EAST],
                                              AXIS[\"Latitude\",NORTH]]"
        geographic_crs_name          String   "unknown"
        grid_mapping_name            String   "latitude_longitude"
        horizontal_datum_name        String   "unknown"
        inverse_flattening           Float64  0
        longitude_of_prime_meridian  Float64  0
        prime_meridian_name          String   "Greenwich"
        reference_ellipsoid_name     String   "unknown"
        semi_major_axis              Float64  6371229
        semi_minor_axis              Float64  6371229
        spatial_ref                  String   "GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"unknown\",6371229,0]],
                                              PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",
                                              0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Longitude\",EAST],
                                              AXIS[\"Latitude\",NORTH]]"

      Structural metadata:
        COMPRESSOR  { "name": "zstd", "configuration": { "level": 0, "checksum": false } }

  - /temperature_2m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          degree_Celsius
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "2 metre temperature"
        short_name     String  "2t"
        standard_name  String  "air_temperature"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /temperature_850hpa:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          degree_Celsius
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Temperature"
        short_name     String  "t"
        standard_name  String  "air_temperature"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /temperature_925hpa:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          degree_Celsius
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Temperature"
        short_name     String  "t"
        standard_name  String  "air_temperature"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /total_cloud_cover_atmosphere:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          percent
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "Total cloud cover"
        short_name     String  "tcc"
        standard_name  String  "cloud_area_fraction"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /valid_time:
      Dimensions:    (/init_time, /lead_time)
      Shape:         [3264, 61]
      Chunk size:    [23360, 61]
      Type:          Int64
      Unit:          seconds since 1970-01-01
      Nodata value:  0

      Attributes:
                 Name            Type                       Value
        ----------------------  ------  ---------------------------------------------
        calendar                String  "proleptic_gregorian"
        long_name               String  "Valid time"
        standard_name           String  "time"
        statistics_approximate  String  {"min":"2024-04-01T00:00:00","max":"Present"}

      Structural metadata:
        COMPRESSOR  { "name": "blosc", "configuration": { "typesize": 8, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } }

  - /wind_u_100m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m s-1
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "100 metre U wind component"
        short_name     String  "100u"
        standard_name  String  "eastward_wind"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /wind_u_10m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m s-1
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "10 metre U wind component"
        short_name     String  "10u"
        standard_name  String  "eastward_wind"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /wind_v_100m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m s-1
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "100 metre V wind component"
        short_name     String  "100v"
        standard_name  String  "northward_wind"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

  - /wind_v_10m:
      Dimensions:    (/init_time, /lead_time, /latitude, /longitude)
      Shape:         [3264, 61, 721, 1440]
      Chunk size:    [1, 61, 241, 240]
      Type:          Float32
      Unit:          m s-1
      Nodata value:  "NaN"

      Attributes:
            Name        Type                                     Value
        -------------  ------  --------------------------------------------------------------------------
        _FillValue     String  "AAAAAAAA+H8="
        coordinates    String  "expected_forecast_length ingested_forecast_length spatial_ref valid_time"
        long_name      String  "10 metre V wind component"
        short_name     String  "10v"
        standard_name  String  "northward_wind"
        step_type      String  "instant"

      Structural metadata:
        COMPRESSOR  { "name": "sharding_indexed", "configuration": { "chunk_shape": [ 1, 61, 241, 240 ], "codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "blosc", "configuration": { "typesize": 4, "cname": "zstd", "clevel": 3, "shuffle": "shuffle", "blocksize": 0 } } ], "index_codecs": [ { "name": "bytes", "configuration": { "endian": "little" } }, { "name": "crc32c" } ], "index_location": "end" } }

It’s a lot of output! For comparison let’s see what xarray will show as equivalent summary.

<xarray.Dataset> Size: 14TB
Dimensions:                                     (init_time: 3264,
                                                 latitude: 721, lead_time: 61,
                                                 longitude: 1440)
Coordinates:
  * init_time                                   (init_time) datetime64[ns] 26kB ...
  * latitude                                    (latitude) float64 6kB 90.0 ....
  * lead_time                                   (lead_time) float64 488B 0.0 ...
  * longitude                                   (longitude) float64 12kB -180...
Data variables: (12/21)
    dew_point_temperature_2m                    (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    downward_long_wave_radiation_flux_surface   (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    downward_short_wave_radiation_flux_surface  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    expected_forecast_length                    (init_time) float64 26kB dask.array<chunksize=(3264,), meta=np.ndarray>
    geopotential_height_500hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    geopotential_height_850hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    ...                                          ...
    total_cloud_cover_atmosphere                (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    valid_time                                  (init_time, lead_time) float32 796kB dask.array<chunksize=(3264, 61), meta=np.ndarray>
    wind_u_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    wind_u_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    wind_v_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
    wind_v_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3264, 61, 721, 1440), meta=np.ndarray>
Attributes:
    attribution:          ECMWF AIFS Single forecast data processed by dynami...
    dataset_id:           ecmwf-aifs-single-forecast
    dataset_version:      0.1.0
    description:          Weather forecasts from the ECMWF Artificial Intelli...
    forecast_domain:      Forecast lead time 0-360 hours (0-15 days) ahead
    forecast_resolution:  6 hourly
    license:              CC-BY-4.0
    name:                 ECMWF AIFS Single forecast
    spatial_domain:       Global
    spatial_resolution:   0.25 degrees (~20km)
    time_domain:          Forecasts initialized 2024-04-01 00:00:00 UTC to Pr...
    time_resolution:      Forecasts initialized every 6 hours

That’s a lot denser, but the important thing is that GDAL has access to the same rich underlying Zarr data and note that xarray has clipped the full list of variables in the summary.

There are a few things to unpack in this cli call.

GDAL multidimensional

gdal mdim provides a family of apps for the GDAL multidimensional raster model - this means that array data is presented in its native form with potentially multiple dimensions, and multiple variables like the NetCDF4 or HDF5 file type, whereas the classic 2D raster model (a 2D grid with one or more bands) can only present one variable and must unroll higher dimension out as bands.

The xarray output and the gdal mdim info output above is human readable, for a programmatic summary we can call

gdal mdim info --format json \
  /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk \
  --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2

and in R we can do

gdalraster::set_config_option("AWS_NO_SIGN_REQUEST", "YES")
gdalraster::set_config_option("AWS_REGION", "us-west-2")
json <- gdalraster::mdim_info("/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk", cout = FALSE)
cat(substr(json, 1, 1000))
# {
#   "type": "group",
#   "driver": "Icechunk",
#   "name": "/",
#   "attributes": {
#     "attribution": "ECMWF AIFS Single forecast data processed by dynamical.org from ECMWF Open Data.",
#     "dataset_id": "ecmwf-aifs-single-forecast",
#     "dataset_version": "0.1.0",
#     "description": "Weather forecasts from the ECMWF Artificial Intelligence Forecasting System (AIFS) Single model.",
#     "forecast_domain": "Forecast lead time 0-360 hours (0-15 days) ahead",
#     "forecast_resolution": "6 hourly",
#     "license": "CC-BY-4.0",
#     "name": "ECMWF AIFS Single forecast",
#     "spatial_domain": "Global",
#     "spatial_resolution": "0.25 degrees (~20km)",
#     "time_domain": "Forecasts initialized 2024-04-01 00:00:00 UTC to Present",
#     "time_resolution": "Forecasts initialized every 6 hours"
#   },
#   "dimensions": [
#     {
#       "name": "init_time",
#       "full_name": "/init_time",
#       "size": 3264,
#       "indexing_variable": "/init_time"
#     },
#     {
#       "name": "latitude",
#       "full_name": "/latit...  

or with terra (in-dev, only on github atm)

terra::setGDALconfig("AWS_NO_SIGN_REQUEST", "YES")
terra::setGDALconfig("AWS_REGION", "us-west-2")
terra::rast("/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk")
# class       : SpatRaster
# size        : 721, 1440, 3384768  (nrow, ncol, nlyr)
# resolution  : 0.25, 0.25  (x, y)
# extent      : -180.125, 179.875, -90.125, 90.125  (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
# sources     : v0.1.0.icechunk
# varnames    : dew_point_temperature_2m (2 metre dewpoint temperature)
#               downward_long_wave_radiation_flux_surface (Surface downward long-wave radiation flux)
#               downward_short_wave_radiation_flux_surface (Surface downward short-wave radiation flux)
#               geopotential_height_500hpa (Geopotential height)
#               geopotential_height_850hpa (Geopotential height)
#               ...
# names       :    dew_p~600_1,    dew_p~600_2,    dew_p~600_3,    dew_p~600_4,    dew_p~600_5,    dew_p~600_6, ...
# unit        : degree_Celsius, degree_Celsius, degree_Celsius, degree_Celsius, degree_Celsius, degree_Celsius, ...
# depth       : 1.71193e+09 to 1.78241e+09 (: 3264 steps)
# time        : 0
# Warning message:
# [rast] skipped array (different geometry): /valid_time

or sf/stars

Sys.setenv("AWS_NO_SIGN_REQUEST" = "YES")
Sys.setenv("AWS_REGION" = "us-west-2")
s <- sf::gdal_read_mdim("/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk", proxy = TRUE)
str(s)
List of 4
 $ array_list:List of 17
  ..$ dew_point_temperature_2m                  : NULL
  ..$ downward_long_wave_radiation_flux_surface : NULL
  ..$ downward_short_wave_radiation_flux_surface: NULL
...
Click here for the full sf structure info output
List of 4
 $ array_list:List of 17
  ..$ dew_point_temperature_2m                  : NULL
  ..$ downward_long_wave_radiation_flux_surface : NULL
  ..$ downward_short_wave_radiation_flux_surface: NULL
  ..$ geopotential_height_500hpa                : NULL
  ..$ geopotential_height_850hpa                : NULL
  ..$ geopotential_height_925hpa                : NULL
  ..$ precipitation_surface                     : NULL
  ..$ pressure_reduced_to_mean_sea_level        : NULL
  ..$ pressure_surface                          : NULL
  ..$ temperature_2m                            : NULL
  ..$ temperature_850hpa                        : NULL
  ..$ temperature_925hpa                        : NULL
  ..$ total_cloud_cover_atmosphere              : NULL
  ..$ wind_u_100m                               : NULL
  ..$ wind_u_10m                                : NULL
  ..$ wind_v_100m                               : NULL
  ..$ wind_v_10m                                : NULL
 $ dimensions:List of 4
  ..$ init_time:List of 5
  .. ..$ from     : int 1
  .. ..$ to       : int 3264
  .. ..$ values   :List of 1
  .. .. ..$ : num [1:3264(1d)] 1.71e+09 1.71e+09 1.71e+09 1.71e+09 1.71e+09 ...
  .. .. .. ..- attr(*, "units")= chr "seconds since 1970-01-01"
  .. .. .. ..- attr(*, "d_names")= chr "init_time"
  .. .. .. ..- attr(*, "attributes")= Named chr [1:4] "proleptic_gregorian" "Forecast initialization time" "forecast_reference_time" "{ \"min\": \"2024-04-01T00:00:00\", \"max\": \"Present\" }"
  .. .. .. .. ..- attr(*, "names")= chr [1:4] "calendar" "long_name" "standard_name" "statistics_approximate"
  .. ..$ type     : chr ""
  .. ..$ direction: chr ""
  ..$ lead_time:List of 5
  .. ..$ from     : int 1
  .. ..$ to       : int 61
  .. ..$ values   :List of 1
  .. .. ..$ : num [1:61(1d)] 0 21600 43200 64800 86400 ...
  .. .. .. ..- attr(*, "units")= chr "seconds"
  .. .. .. ..- attr(*, "d_names")= chr "lead_time"
  .. .. .. ..- attr(*, "attributes")= Named chr [1:5] "AAAAAAAA+H8=" "timedelta64[us]" "Forecast lead time" "forecast_period" ...
  .. .. .. .. ..- attr(*, "names")= chr [1:5] "_FillValue" "dtype" "long_name" "standard_name" ...
  .. ..$ type     : chr ""
  .. ..$ direction: chr ""
  ..$ latitude :List of 5
  .. ..$ from     : int 1
  .. ..$ to       : int 721
  .. ..$ values   :List of 1
  .. .. ..$ : num [1:721(1d)] 90 89.8 89.5 89.2 89 ...
  .. .. .. ..- attr(*, "units")= chr "degree_north"
  .. .. .. ..- attr(*, "d_names")= chr "latitude"
  .. .. .. ..- attr(*, "attributes")= Named chr [1:4] "AAAAAAAA+H8=" "Y" "Latitude" "{ \"min\": -90.0, \"max\": 90.0 }"
  .. .. .. .. ..- attr(*, "names")= chr [1:4] "_FillValue" "axis" "long_name" "statistics_approximate"
  .. ..$ type     : chr "HORIZONTAL_Y"
  .. ..$ direction: chr ""
  ..$ longitude:List of 5
  .. ..$ from     : int 1
  .. ..$ to       : int 1440
  .. ..$ values   :List of 1
  .. .. ..$ : num [1:1440(1d)] -180 -180 -180 -179 -179 ...
  .. .. .. ..- attr(*, "units")= chr "degree_east"
  .. .. .. ..- attr(*, "d_names")= chr "longitude"
  .. .. .. ..- attr(*, "attributes")= Named chr [1:4] "AAAAAAAA+H8=" "X" "Longitude" "{ \"min\": -180.0, \"max\": 179.75 }"
  .. .. .. .. ..- attr(*, "names")= chr [1:4] "_FillValue" "axis" "long_name" "statistics_approximate"
  .. ..$ type     : chr "HORIZONTAL_X"
  .. ..$ direction: chr ""
 $ srs       : chr NA
 $ geometry  : list()

That is leveraged by stars to give

Sys.setenv("AWS_NO_SIGN_REQUEST" = "YES")
Sys.setenv("AWS_REGION" = "us-west-2")
stars::read_mdim("/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk", proxy = TRUE)

# stars_proxy object with 17 attributes in 17 file(s):
# $dew_point_temperature_2m
# [1] "[...]/v0.1.0.icechunk"
# 
# $downward_long_wave_radiation_flux_surface
# [1] "[...]/v0.1.0.icechunk"
# 
# ...
# 
# $temperature_925hpa
# [1] "[...]/v0.1.0.icechunk"
# 
# $total_cloud_cover_atmosphere
# [1] "[...]/v0.1.0.icechunk"
# 
# $wind_u_100m
# [1] "[...]/v0.1.0.icechunk"
# 
# $wind_u_10m
# [1] "[...]/v0.1.0.icechunk"
# 
# $wind_v_100m
# [1] "[...]/v0.1.0.icechunk"
# 
# $wind_v_10m
# [1] "[...]/v0.1.0.icechunk"
# 
# dimension(s):
#           from   to         offset     delta  refsys x/y
# longitude    1 1440         -180.1      0.25      NA [x]
# latitude     1  721          90.12     -0.25      NA [y]
# lead_time    1   61          0 [s] 21600 [s] udunits    
# init_time    1 3264 2024-04-01 UTC   6 hours POSIXct    

Please note that this store contains pretty simple regular array data, and so the relationship of data arrays to their coordinates is simple, the map in longitude/latitude is a regular grid and so slices of that can correspond to the normal raster model where a bounding box (or corner coordinates + resolution) is sufficient to georeference the data. Multidimensional data can be more general and more complex than that.

This post won’t explore any more detail of the various ways R packages support mdim.

Multidimensional?

We can treat this store as raster data in the normal 2D way, but note that there are caveats and real performance implications depending on the way data is laid out in a given store.

gdalinfo /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2
Driver: Icechunk/Icechunk
Files: /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk
Size is 512, 512
Subdatasets:
  SUBDATASET_1_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/dew_point_temperature_2m
  SUBDATASET_1_DESC=[3264x61x721x1440] /dew_point_temperature_2m (Float32)
  SUBDATASET_2_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/downward_long_wave_radiation_flux_surface
  SUBDATASET_2_DESC=[3264x61x721x1440] /downward_long_wave_radiation_flux_surface (Float32)
  SUBDATASET_3_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/downward_short_wave_radiation_flux_surface
  SUBDATASET_3_DESC=[3264x61x721x1440] /downward_short_wave_radiation_flux_surface (Float32)
  SUBDATASET_4_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/geopotential_height_500hpa
  SUBDATASET_4_DESC=[3264x61x721x1440] /geopotential_height_500hpa (Float32)
  SUBDATASET_5_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/geopotential_height_850hpa
  SUBDATASET_5_DESC=[3264x61x721x1440] /geopotential_height_850hpa (Float32)
  SUBDATASET_6_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/geopotential_height_925hpa
  SUBDATASET_6_DESC=[3264x61x721x1440] /geopotential_height_925hpa (Float32)
  SUBDATASET_7_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/precipitation_surface
  SUBDATASET_7_DESC=[3264x61x721x1440] /precipitation_surface (Float32)
  SUBDATASET_8_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/pressure_reduced_to_mean_sea_level
  SUBDATASET_8_DESC=[3264x61x721x1440] /pressure_reduced_to_mean_sea_level (Float32)
  SUBDATASET_9_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/pressure_surface
  SUBDATASET_9_DESC=[3264x61x721x1440] /pressure_surface (Float32)
  SUBDATASET_10_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/temperature_2m
  SUBDATASET_10_DESC=[3264x61x721x1440] /temperature_2m (Float32)
  SUBDATASET_11_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/temperature_850hpa
  SUBDATASET_11_DESC=[3264x61x721x1440] /temperature_850hpa (Float32)
  SUBDATASET_12_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/temperature_925hpa
  SUBDATASET_12_DESC=[3264x61x721x1440] /temperature_925hpa (Float32)
  SUBDATASET_13_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/total_cloud_cover_atmosphere
  SUBDATASET_13_DESC=[3264x61x721x1440] /total_cloud_cover_atmosphere (Float32)
  SUBDATASET_14_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/valid_time
  SUBDATASET_14_DESC=[3264x61] /valid_time (Int64)
  SUBDATASET_15_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/wind_u_100m
  SUBDATASET_15_DESC=[3264x61x721x1440] /wind_u_100m (Float32)
  SUBDATASET_16_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/wind_u_10m
  SUBDATASET_16_DESC=[3264x61x721x1440] /wind_u_10m (Float32)
  SUBDATASET_17_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/wind_v_100m
  SUBDATASET_17_DESC=[3264x61x721x1440] /wind_v_100m (Float32)
  SUBDATASET_18_NAME=ZARR:"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}":/wind_v_10m
  SUBDATASET_18_DESC=[3264x61x721x1440] /wind_v_10m (Float32)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

Those strings like "ZARR:\"/vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk}\":/dew_point_temperature_2m" (note the embedded escaped quotes) can be used by the more tradition GDAL classic raster model.

We’ll come back to the /vsiicechunk and {/vsis3} parts of this string a bit later.

Setting config options and environment variables for GDAL

GDAL config options can be set through a package wrapper - gdalraster::set_config_option("AWS_NO_SIGN_REQUEST", "YES") or terra::setGDALconfig("AWS_NO_SIGN_REQUEST", "YES") - or as process environment variables (Sys.setenv(AWS_NO_SIGN_REQUEST = "YES"), or KEY=value in the shell). Both routes are global and sticky: once set they can affect every other step, spooky-action-at-a-distance that we want to avoid.

That’s why setting config inline - --config KEY=VALUE on the gdal mdim info call is a clean option: it lives and dies in one line, a bit like with in Python or with() in R. The R equivalent for env vars and options is withr::with_envvar() / withr::with_options(), true scoped context managers.

With online storage it can be bewildering what exact configuration you need and what you have, but it does come down to a fairly comprehensible set of ideas. This post is not a guide to online storage access configuration, or a guide to GDAL configuration options but there’s a finite set of concepts in this jungle so don’t be put off by the complexity.

Auth - keyed vs anonymous. If a dataset is not public, you will need something like AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. If it is public, you will need something like AWS_NO_SIGN_REQUEST=YES (or --no-sign-request).

Addressing. Everything is a key inside a bucket. You reference an object as s3://bucket/key (Amazon), gs://bucket/key (Google Storage), abfss://… (Azure storage) - and that’s the whole model. (Azure bends it slightly: a storage account holds containers holding blobs, and the scheme carries the account, abfss://container@account.dfs.core.windows.net/key.)

Endpoint, region, hosting-style. That object is hosted on an endpoint - s3:// is shorthand for s3.<region>.amazonaws.com, gs:// for storage.googleapis.com. A custom endpoint means pointing at your own host - AWS_S3_ENDPOINT=your.endpoint.net - so that s3 in s3://bucket/key resolves to your host and not Amazon’s. Virtual hosting just means the bucket goes before the endpoint in the final url https://bucket.s3.<region>.amazonaws.com/key vs https://my.custom.endpoint.net/bucket/key. It’s confusing, until you know what it does and when you might need it. For non-AWS S3 implementations (MinIO, Ceph) this usually goes together with path-style addressing, AWS_VIRTUAL_HOSTING=NO, since most of them don’t do virtual-host style. AWS_REGION is sometimes needed but probably shouldn’t matter - set it to ““.

GDAL and the virtual filesystem

Everything above is generic - true of any R tool talking to object storage - and GDAL addresses remote objects through its own virtual filesystem (VSI) rather than the s3://-style schemes. Again, bucket, key but with a different prefix:

cloud-native GDAL VSI
s3://bucket/key /vsis3/bucket/key
gs://bucket/key /vsigs/bucket/key
abfss://container/key /vsiaz/container/key

This is the most common first stumble: s3://… handed to terra::rast() or vapour::gdal_raster_data() won’t open; the /vsis3/… form will. Be aware that some packages in R and other langs will do their own heuristics to wrap different protocols: ‘https://’, ‘s3://’, ‘virtualizarr://’ etc - this can be helpful but also very confusing, and this post cannot summarize that landscape. When GDAL is in use my preference is to use GDAL protocols without wrapping. The config options from above (AWS_NO_SIGN_REQUEST, AWS_S3_ENDPOINT, and friends) all still apply unchanged - VSI only changes how you name the object, it doesn’t affect how or if it’s accessible.

The same /vsi* mechanism abstracts over a whole zoo of access methods, all of which GDAL then reads as if they were ordinary local files:

prefix what it opens
/vsicurl/ any file over plain HTTP/HTTPS (range-requested, not downloaded whole)
/vsizip/ a file inside a ZIP archive
/vsitar/, /vsigzip/ likewise for TAR / gzip
/vsimem/ an in-memory buffer
/vsiswift/, /vsioss/ OpenStack Swift, Alibaba OSS, …
/vsistdout/ stream output straight to standard out

The real payoff is that these compose - you chain prefixes right-to-left to stack the abstractions. To read a single GeoTIFF sitting inside a ZIP that lives in an S3 bucket, none of which you download in full:

/vsizip//vsis3/bucket/archive.zip/inside.tif

GDAL signs the S3 request, range-reads the ZIP’s central directory, pulls just the bytes for inside.tif, and hands you a raster. That’s the whole world the VSI layer opens up: object storage, archives, compression, and HTTP are all just prefixes over the same “name a thing, read its bytes” idea.

Naming a thing inside a thing

That earlier chain glued its prefixes by plain concatenation - /vsizip/ meets /vsis3/, and since both carry a leading slash you get the doubled //. GDAL finds the archive boundary inside that run by heuristic: it spots the .zip. The brace form, /vsizip/{…}, makes the boundary explicit rather than guessed - and once the inner path carries its own punctuation (a URL with a query string, say) explicit is the only thing that works.

GEBCO 2026 ships its global grid as a single zip behind a download URL ending …gebco_2026_geotiff.zip?download=1. To reach into it without pulling the whole archive, chain /vsizip/ over /vsicurl/, braces around the inner path because of that ?download=1 which masks the fact that it’s a zip file:

/vsizip/{/vsicurl/https://dap.ceda.ac.uk/bodc/gebco/global/gebco_2026/ice_surface_elevation/geotiff/gebco_2026_geotiff.zip?download=1}

If we point GDAL at that archive it can’t return a single raster, so it tells you what’s in there instead:

ERROR 4: ... not recognized as being in a supported file format.
The archive contains several files:
  …}/gebco_2026_n0.0_s-90.0_w0.0_e90.0_geotiff.tif
  …}/gebco_2026_n0.0_s-90.0_w-180.0_e-90.0_geotiff.tif
     (six more files)
  …}/GEBCO_Grid_documentation.pdf
  …}/GEBCO_Grid_terms_of_use.pdf

That listing is a logical filesystem - real files physically inside the zip in an opaque form, enumerated out of a remote archive. Append one of them after the closing brace and you’ve named a single item; GDAL opens it.

Compare that to the subdataset syntax, multiple subdatasets of classic 2D raster we saw in the Icechunk store above:

ZARR:"/vsiicechunk/{/vsis3/.../v0.1.0.icechunk}":/dew_point_temperature_2m

Again we have bracing of an internal /vsi* item but the thing named after it, /dew_point_temperature_2m, isn’t a file but a variable in a logical hierarchy, a node in the Zarr data model. /vsizip/ descends into the structural contents of an archive, ZARR: over /vsiicechunk/ into the logical contents of a data model - that’s why we see different forms of addressing in stores.

GDAL can list things, here we use gdal vsi list to see the structural contents:

gdal vsi list   /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk \
  --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2
repo
chunks
manifests
overwritten
snapshots
transactions

ok we think, those look like directories so let’s go deeper (eek!):

gdal vsi list   /vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk/chunks \
 --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2
0003DSF0M0N9B5MP3R9G
0010VXY96ABV0RP45T30
0015TP2QVGT990JE9PM0
...

yes it’s valid content but not something we can understand - we are seeing the internal structure of a complex sophisticated cloud format. We know it’s Icechunk so lets wield the Icechunk virtualization tool

gdal vsi list /vsiicechunk/{/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk} \
 --config AWS_NO_SIGN_REQUEST YES --config AWS_REGION us-west-2
zarr.json
dew_point_temperature_2m
downward_long_wave_radiation_flux_surface
downward_short_wave_radiation_flux_surface
expected_forecast_length
geopotential_height_500hpa
geopotential_height_850hpa
geopotential_height_925hpa
ingested_forecast_length
init_time
latitude
lead_time
longitude
precipitation_surface
pressure_reduced_to_mean_sea_level
pressure_surface
spatial_ref
temperature_2m
temperature_850hpa
temperature_925hpa
total_cloud_cover_atmosphere
valid_time
wind_u_100m
wind_u_10m
wind_v_100m
wind_v_10m

and now we are back in logical ground, this is the interpretation that gdal mdim info gave us above, not raw listing of the structural contents.

That shows some of the depth of GDAL abstraction. “Name a thing, read its bytes” it’s not just bytes on a disk. A thing can be an object in a bucket, a member of an archive, a variable in a chunked store, etc - and the path is the recipe for access.

The distinction between material structure (thousands of small manifest files on S3) and logical/virtual structure (the clean array hierarchy you see via /vsiicechunk/) is a key icechunk concept. GDAL’s /vsi* filesystem stack handles this transparently - the same way /vsicurl/, /vsis3/, /vsigs/ abstract over HTTP and object stores.

What about python?

Above we saw how GDAL and some R packages interact with the Icechunk store, how do we do that in Python?

This time we don’t have a single description of the store, we have to use infrastructure to encode the details.

import icechunk
import xarray 

storage = icechunk.s3_storage(
    bucket="dynamical-ecmwf-aifs-single",
    prefix="ecmwf-aifs-single-forecast/v0.1.0.icechunk",
    region="us-west-2",
    anonymous=True,           # the AWS_NO_SIGN_REQUEST=YES equivalent
)
repo = icechunk.Repository.open(storage)
session = repo.readonly_session("main")
ds = xarray.open_zarr(session.store, consolidated=False)
ds
# <xarray.Dataset> Size: 14TB
# Dimensions:                                     (init_time: 3265,
#                                                  lead_time: 61, latitude: 721,
#                                                  longitude: 1440)
# Coordinates:
#   * init_time                                   (init_time) datetime64[ns] 26kB ...
#     expected_forecast_length                    (init_time) timedelta64[us] 26kB dask.array<chunksize=(3265,), meta=np.ndarray>
#     ingested_forecast_length                    (init_time) timedelta64[us] 26kB dask.array<chunksize=(3265,), meta=np.ndarray>
#   * lead_time                                   (lead_time) timedelta64[us] 488B ...
#     valid_time                                  (init_time, lead_time) datetime64[ns] 2MB dask.array<chunksize=(3265, 61), meta=np.ndarray>
#   * latitude                                    (latitude) float64 6kB 90.0 ....
#   * longitude                                   (longitude) float64 12kB -180...
#     spatial_ref                                 int64 8B ...
# Data variables: (12/17)
#     dew_point_temperature_2m                    (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     downward_long_wave_radiation_flux_surface   (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     geopotential_height_500hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     geopotential_height_850hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     precipitation_surface                       (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     downward_short_wave_radiation_flux_surface  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     ...                                          ...
#     wind_u_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     temperature_925hpa                          (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     total_cloud_cover_atmosphere                (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     wind_u_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     wind_v_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
#     wind_v_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(1, 61, 241, 240), meta=np.ndarray>
# Attributes:
#     dataset_id:           ecmwf-aifs-single-forecast
#     dataset_version:      0.1.0
#     name:                 ECMWF AIFS Single forecast
#     description:          Weather forecasts from the ECMWF Artificial Intelli...
#     attribution:          ECMWF AIFS Single forecast data processed by dynami...
#     license:              CC-BY-4.0
#     spatial_domain:       Global
#     spatial_resolution:   0.25 degrees (~20km)
#     time_domain:          Forecasts initialized 2024-04-01 00:00:00 UTC to Pr...
#     time_resolution:      Forecasts initialized every 6 hours
#     forecast_domain:      Forecast lead time 0-360 hours (0-15 days) ahead
#     forecast_resolution:  6 hourly

We looked above at this xarray representation but up there we didn’t use the same infrastructure or show the code. The one above was made with

import os
import xarray

os.environ["AWS_NO_SIGN_REQUEST"] = "YES"
os.environ["AWS_REGION"] = "us-west-2"

ds = xarray.open_dataset("/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk", engine = "gdalxarray", chunks = {})
# <xarray.Dataset> Size: 14TB
# Dimensions:                                     (init_time: 3265,
#                                                  latitude: 721, lead_time: 61,
#                                                  longitude: 1440)
# Coordinates:
#   * init_time                                   (init_time) datetime64[ns] 26kB ...
#     expected_forecast_length                    (init_time) float64 26kB dask.array<chunksize=(3265,), meta=np.ndarray>
#     ingested_forecast_length                    (init_time) float64 26kB dask.array<chunksize=(3265,), meta=np.ndarray>
#   * latitude                                    (latitude) float64 6kB 90.0 ....
#   * lead_time                                   (lead_time) float64 488B 0.0 ...
#     valid_time                                  (init_time, lead_time) float32 797kB dask.array<chunksize=(3265, 61), meta=np.ndarray>
#   * longitude                                   (longitude) float64 12kB -180...
#     spatial_ref                                 float32 4B ...
# Data variables: (12/17)
#     dew_point_temperature_2m                    (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     downward_long_wave_radiation_flux_surface   (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     downward_short_wave_radiation_flux_surface  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     geopotential_height_500hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     geopotential_height_850hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     geopotential_height_925hpa                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     ...                                          ...
#     temperature_925hpa                          (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     total_cloud_cover_atmosphere                (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     wind_u_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     wind_u_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     wind_v_100m                                 (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
#     wind_v_10m                                  (init_time, lead_time, latitude, longitude) float32 827GB dask.array<chunksize=(3265, 61, 721, 1440), meta=np.ndarray>
# Attributes:
#     attribution:          ECMWF AIFS Single forecast data processed by dynami...
#     dataset_id:           ecmwf-aifs-single-forecast
#     dataset_version:      0.1.0
#     description:          Weather forecasts from the ECMWF Artificial Intelli...
#     forecast_domain:      Forecast lead time 0-360 hours (0-15 days) ahead
#     forecast_resolution:  6 hourly
#     license:              CC-BY-4.0
#     name:                 ECMWF AIFS Single forecast
#     spatial_domain:       Global
#     spatial_resolution:   0.25 degrees (~20km)
#     time_domain:          Forecasts initialized 2024-04-01 00:00:00 UTC to Pr...
#     time_resolution:      Forecasts initialized every 6 hours

That seems very similar, and hopefully for the in-dev GDAL xarray backend gdalxarray it’s an equivalent connection (I’m not going to validate any further here in this post).

What this illustrates is while the configuration set up can be confusing, these are the very same generic store-auth settings cast different ways. Is it desirable to consolidate these? Is one necessarily better or more capable? Icechunk-Python auth and addressing through typed constructor arguments, GDAL through the VSI path plus config.

What about not python?

I don’t have a strong opinion on what’s the right moves, I do have a strong opinion though that this not just a Python community. It was hard looking over the fence at xarray and Zarr and going ‘uhoh, this looks like I can’t ignore it’. The landscape and GDAL’s particular take on it is something I had the privilege to take a long deep dive on. I can’t ever remember the typed constructor incantation above - but I also web search EMPTY_DIR to find GDAL_DISABLE_READDIR_ON_OPEN which I just can’t recall though I reach for it often. This for me is about opening up to audiences that are not already in the tent, and maybe some who won’t ever use Python.

Should we have easier wrappers? Write the helper that sniffs an s3:// address and sets the right path-style / anonymous / region knobs? How do we make that cross-language? I love everything out in the open, protocol://path/to/data, a few env vars a few function arguments, but env vars and function calls are hard won expertise.

Is consolidating these worth doing? Is there any scope for that or prior art?

Other R Zarr readers

Beyond GDAL, there is growing native R support for Zarr:

  • pizzarr (CRAN) - pure R Zarr V2 reader; r-universe builds include an optional high-performance Rust backend via the zarrs crate
  • zarr (CRAN) - a native implementation in R with support for all required features of Zarr version 3.
  • zaro (r-universe) - Low-level Zarr interface (V2 and V3) built on Arrow filesystem and codec infrastructure

Work is ongoing on multidim API coverage in the R ecosystem, I particularly want tight api bindings to the GDAL API with gdalraster but that will be a while still in the making.

The GDAL multidim API (used by /vsiicechunk/) is distinct from the classic 2D raster driver - terra now auto-detects which to use for the multidim drivers which are VRT/HDF5/NetCDF/Zarr/GRIB. If terra gives unexpected results, check whether it’s related to recent changes in mdim support.

For some time stars via sf has support for accessing Zarr via the classic or multidimensional raster model of GDAL.

For Icechunk specifically see icechunk.io and see the GDAL icechunk driver docs at gdal.org icechunk and ZARR driver at gdal.org zarr.