Raster Join
Description
A common operation for raster data is reprojecting or warping the data to a different CRS with a specific transform. In many use cases, the particulars of the warp operation depend on another set of raster data. Furthermore, the warp is done to put both sets of raster data to a common set of grid to enable manipulation of the datasets together.
In RasterFrames, you can perform a Raster Join on two DataFrames containing raster data. The operation will perform a spatial join based on the CRS and extent data in each DataFrame. By default it is a left join and uses an intersection operator. For each candidate row, all tile columns on the right hand side are warped to match the left hand side’s CRS, extent, and dimensions. Warping relies on GeoTrellis library code. You can specify the resampling method to be applied as one of: nearest_neighbor, bilinear, cubic_convolution, cubic_spline, lanczos, average, mode, median, max, min, or sum. The operation is also an aggregate, with multiple intersecting right-hand side tiles merge
d into the result. There is no guarantee about the ordering of tiles used to select cell values in the case of overlapping tiles. When using the raster
DataSource you will automatically get the CRS and extent information needed to do this operation.
Example Code
Because the raster join is a distributed spatial join, indexing of both DataFrames using the spatial index is crucial for performance.
# Southern Mozambique December 29, 2016
modis = spark.read.raster('s3://astraea-opendata/MCD43A4.006/21/11/2016297/MCD43A4.A2016297.h21v11.006.2016306075821_B01.TIF',
spatial_index_partitions=True) \
.withColumnRenamed('proj_raster', 'modis')
landsat8 = spark.read.raster('https://landsat-pds.s3.us-west-2.amazonaws.com/c1/L8/167/077/LC08_L1TP_167077_20161015_20170319_01_T1/LC08_L1TP_167077_20161015_20170319_01_T1_B4.TIF',
spatial_index_partitions=True) \
.withColumnRenamed('proj_raster', 'landsat')
rj = landsat8.raster_join(modis, resampling_method="cubic_convolution")
# Show some non-empty tiles
rj.select('landsat', 'modis', 'crs', 'extent') \
.filter(rf_data_cells('modis') > 0) \
.filter(rf_tile_max('landsat') > 0)
Showing only top 5 rows.
landsat | modis | crs | extent |
---|---|---|---|
[+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs ] | [525945.0, -2813925.0, 533625.0, -2806245.0] | ||
[+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs ] | [495225.0, -2652645.0, 502905.0, -2644965.0] | ||
[+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs ] | [641145.0, -2675685.0, 648825.0, -2668005.0] | ||
[+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs ] | [525945.0, -2798565.0, 533625.0, -2790885.0] | ||
[+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs ] | [541305.0, -2798565.0, 548985.0, -2790885.0] |
Additional Options
The following optional arguments are allowed:
left_extent
- the column on the left-hand DataFrame giving the extent of the tile columnsleft_crs
- the column on the left-hand DataFrame giving the CRS of the tile columnsright_extent
- the column on the right-hand DataFrame giving the extent of the tile columnsright_crs
- the column on the right-hand DataFrame giving the CRS of the tile columnsjoin_exprs
- a single column expression as would be used in theon
parameter ofjoin
resampling_method
- resampling algorithm to use in reprojection of right-hand tile column
Note that the join_exprs
will override the join behavior described above. By default the expression is equivalent to:
st_intersects(
st_geometry(left[left_extent]),
st_reproject(st_geometry(right[right_extent]), right[right_crs], left[left_crs])
)
Resampling method to use can be specified by passing one of the following strings into resampling_method
parameter. The point resampling methods are: "nearest_neighbor"
, "bilinear"
, "cubic_convolution"
, "cubic_spline"
, and "lanczos"
. The aggregating resampling methods are: "average"
, "mode"
, "median"
, "max"
, “min
”, or "sum"
. Note the aggregating methods are intended for downsampling. For example a 0.25 factor and max
method returns the maximum value in a 4x4 neighborhood.