Raster Statistics

RasterFrames has a number of extension methods and columnar functions for performing analysis on tiles.

Tile Statistics

Tile Dimensions

Get the dimensions of each tile. Depending on how the raster was tiled, tiles along the edges may differ from the nominal size.

scala> rf.select(rf.spatialKeyColumn, tileDimensions($"tile")).show(3)
+-----------+---------------+
|spatial_key|dimension(tile)|
+-----------+---------------+
|      [6,3]|      [128,128]|
|      [4,0]|      [128,128]|
|      [0,0]|      [128,128]|
+-----------+---------------+
only showing top 3 rows
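The plain-Scala sketch below (not part of RasterFrames) shows why edge tiles can differ: it computes the dimensions of the tile at a given grid position. It assumes edge tiles are cropped to the raster extent. A tiling that pads edge tiles with NoData keeps every tile at the nominal size instead. The helper name tileDimsAt is hypothetical.

```scala
// Hypothetical helper: dimensions (cols, rows) of the tile at grid position (col, row),
// assuming edge tiles are cropped to the raster extent rather than padded.
def tileDimsAt(rasterCols: Int, rasterRows: Int,
               tileCols: Int, tileRows: Int,
               col: Int, row: Int): (Int, Int) = {
  // Cells remaining from this tile's origin to the raster edge, capped at the nominal size
  val w = math.min(tileCols, rasterCols - col * tileCols)
  val h = math.min(tileRows, rasterRows - row * tileRows)
  require(w > 0 && h > 0, s"grid position ($col, $row) is outside the raster")
  (w, h)
}

// Example: a 300 x 200 raster cut into nominal 128 x 128 tiles
val interior = tileDimsAt(300, 200, 128, 128, 0, 0) // full-size tile
val corner   = tileDimsAt(300, 200, 128, 128, 2, 1) // bottom-right edge tile
```

Here the corner tile holds only the 44 x 72 cells left over after two full tile widths and one full tile height.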

Descriptive Statistics

NoData Counts

Count the number of NoData and non-NoData cells in each tile.

scala> rf.select(rf.spatialKeyColumn, noDataCells($"tile"), dataCells($"tile")).show(3)
+-----------+-----------------+---------------+
|spatial_key|noDataCells(tile)|dataCells(tile)|
+-----------+-----------------+---------------+
|      [6,3]|            15688|            696|
|      [4,0]|                0|          16384|
|      [0,0]|                0|          16384|
+-----------+-----------------+---------------+
only showing top 3 rows
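The counting itself is simple. The plain-Scala sketch below treats a tile as a flattened Array[Int]. It assumes the integer NoData sentinel is Int.MinValue, the GeoTrellis default for int32 cells. Other cell types use other sentinels. The helper countCells is hypothetical.

```scala
// Assumption: NoData is encoded as Int.MinValue (the GeoTrellis default for int32 cells);
// other cell types use different sentinels.
val NoData: Int = Int.MinValue

// Returns (noDataCells, dataCells) for a tile flattened into an array of cell values
def countCells(cells: Array[Int]): (Int, Int) = {
  val noData = cells.count(_ == NoData)
  (noData, cells.length - noData)
}

// A tiny 2 x 3 "tile" with two NoData cells
val cells = Array(10, NoData, 20, NoData, 30, 40)
val (noDataCount, dataCount) = countCells(cells)
```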

Tile Mean

Compute the mean value of the data cells in each tile; NoData cells are not included. Use tileMean for integral cell types and tileMeanDouble for floating-point cell types.

scala> rf.select(rf.spatialKeyColumn, tileMean($"tile")).show(3)
+-----------+------------------+
|spatial_key|    tileMean(tile)|
+-----------+------------------+
|      [6,3]|10757.254310344828|
|      [4,0]| 9883.589050292969|
|      [0,0]|10338.119995117188|
+-----------+------------------+
only showing top 3 rows
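A likely reason for separate integral and floating-point variants is that the two cell types encode NoData differently. The plain-Scala sketch below approximates the semantics and is not the RasterFrames implementation. It assumes integral NoData is Int.MinValue and floating-point NoData is NaN. The helper names meanInt and meanDouble are hypothetical.

```scala
// Assumption: integral NoData is Int.MinValue and floating-point NoData is NaN,
// mirroring GeoTrellis defaults; a cell type may define another sentinel.

// Mean of the data cells in an integral tile; NaN if every cell is NoData
def meanInt(cells: Array[Int]): Double = {
  val data = cells.filter(_ != Int.MinValue)
  if (data.isEmpty) Double.NaN else data.map(_.toDouble).sum / data.length
}

// Mean of the data cells in a floating-point tile; NaN if every cell is NoData
def meanDouble(cells: Array[Double]): Double = {
  val data = cells.filterNot(_.isNaN)
  if (data.isEmpty) Double.NaN else data.sum / data.length
}

val m1 = meanInt(Array(10, 20, Int.MinValue, 30)) // the NoData cell is skipped
val m2 = meanDouble(Array(1.5, Double.NaN, 2.5))  // the NaN cell is skipped
```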

Tile Summary Statistics

Compute a suite of summary statistics for each tile. Use tileStats for integral cell types and tileStatsDouble for floating-point cell types.

scala> rf.withColumn("stats", tileStats($"tile")).select(rf.spatialKeyColumn, $"stats.*").show(3)
+-----------+---------+-----------+------+-------+------------------+------------------+
|spatial_key|dataCells|noDataCells|   min|    max|              mean|          variance|
+-----------+---------+-----------+------+-------+------------------+------------------+
|      [6,3]|      696|         -1|7604.0|16143.0|10757.254310344822| 3271125.902280271|
|      [4,0]|    16384|         -1|7678.0|16464.0| 9883.589050292961|2163148.3790329304|
|      [0,0]|    16384|         -1|7291.0|23077.0| 10338.11999511721|3386469.0957086035|
+-----------+---------+-----------+------+-------+------------------+------------------+
only showing top 3 rows
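For reference, the plain-Scala sketch below computes a row like the ones above from a single tile's cells. The CellStats case class and the tileStatsSketch function are hypothetical. The sketch assumes Int.MinValue marks NoData and uses the population variance (dividing by n), which may not match the convention RasterFrames uses.

```scala
// Hypothetical result type mirroring the columns shown above
case class CellStats(dataCells: Long, noDataCells: Long,
                     min: Double, max: Double, mean: Double, variance: Double)

// Assumptions: NoData is Int.MinValue; variance is the population variance (divide by n)
def tileStatsSketch(cells: Array[Int]): CellStats = {
  val data = cells.filter(_ != Int.MinValue).map(_.toDouble)
  val n = data.length
  if (n == 0) CellStats(0, cells.length, Double.NaN, Double.NaN, Double.NaN, Double.NaN)
  else {
    val mean = data.sum / n
    val variance = data.map(x => (x - mean) * (x - mean)).sum / n
    CellStats(n, cells.length - n, data.min, data.max, mean, variance)
  }
}

// Eight data cells and one NoData cell
val s = tileStatsSketch(Array(2, 4, Int.MinValue, 4, 4, 5, 5, 7, 9))
```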

Histogram

The tileHistogram function computes a histogram over the data cells in each tile. See the GeoTrellis Histogram documentation for details on what’s available in the resulting data structure. Use this version for integral cell types and tileHistogramDouble for floating-point cell types.

In this example, we compute five quantile breaks from each tile's histogram.

scala> rf.select(tileHistogram($"tile")).map(_.quantileBreaks(5)).show(5, false)
+---------------------------------------------------------------------------------------------------+
|value                                                                                              |
+---------------------------------------------------------------------------------------------------+
|[8809.728925619835, 9867.17899408284, 10610.464285714286, 11537.7625, 12449.983431952664]          |
|[8092.536291685227, 8799.830256846086, 9883.927927555094, 10663.851206181313, 11410.889115337006]  |
|[7873.758444506966, 8966.896173598834, 10637.314862591527, 11377.284237089707, 12150.871174122809] |
|[9191.866744203247, 10249.717974547506, 10929.598137035868, 11454.682468942548, 12075.199752189847]|
|[9153.271659984886, 9687.683229942197, 10026.3593930339, 10411.367118562586, 10952.631743056714]   |
+---------------------------------------------------------------------------------------------------+
only showing top 5 rows
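The fractional break values above suggest the breaks come from an approximate, binned histogram rather than from the raw cell values. For comparison, the plain-Scala sketch below computes exact quantile breaks from raw values using the nearest-rank method. It is not the GeoTrellis algorithm, and its convention (the last break equals the maximum) may differ from quantileBreaks. The helper name quantileBreaksExact is hypothetical.

```scala
// Exact quantile breaks by the nearest-rank method: the i-th break is the smallest value
// with at least i/numBreaks of the data at or below it, so the last break is the maximum.
def quantileBreaksExact(values: Seq[Double], numBreaks: Int): Seq[Double] = {
  require(numBreaks > 0 && values.nonEmpty, "need at least one break and one value")
  val sorted = values.sorted
  val n = sorted.length
  (1 to numBreaks).map { i =>
    val rank = math.ceil(i.toDouble * n / numBreaks).toInt // 1-based rank
    sorted(rank - 1)
  }
}

val breaks = quantileBreaksExact((1 to 10).map(_.toDouble), 5)
```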

Aggregate Statistics

The aggStats function computes the same summary statistics as tileStats, but aggregates them over the whole RasterFrame.

scala> rf.select(aggStats($"tile")).show()
+---------+-----------+------+-------+-----------------+------------------+
|dataCells|noDataCells|   min|    max|             mean|          variance|
+---------+-----------+------+-------+-----------------+------------------+
|   387000|      71752|7209.0|39217.0|10160.48549870801|3315238.5311127007|
+---------+-----------+------+-------+-----------------+------------------+
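Aggregate statistics cannot be obtained by simply averaging per-tile results, because tiles hold different numbers of data cells (696 versus 16384 in the tile statistics above). One standard approach is to keep a mergeable summary per tile and combine summaries pairwise using the update formulas for mean and variance from Chan et al. The plain-Scala sketch below illustrates that technique. It does not describe how aggStats works internally. Partial, fromTile and merge are hypothetical names, and the population variance is an assumption.

```scala
// Hypothetical mergeable summary of one tile's data cells:
// count, min, max, mean and m2 (sum of squared deviations from the mean)
case class Partial(n: Long, min: Double, max: Double, mean: Double, m2: Double) {
  // Population variance (assumption); NaN when there are no data cells
  def variance: Double = if (n == 0) Double.NaN else m2 / n
}

// Identity element: merging with it leaves the other summary unchanged
val empty = Partial(0L, Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0.0)

// Summarize one tile's data cells (NoData assumed already removed)
def fromTile(values: Seq[Double]): Partial = {
  require(values.nonEmpty, "tile has no data cells")
  val mean = values.sum / values.length
  val m2 = values.map(v => (v - mean) * (v - mean)).sum
  Partial(values.length, values.min, values.max, mean, m2)
}

// Pairwise combination of two summaries (Chan et al. update for mean and m2)
def merge(a: Partial, b: Partial): Partial =
  if (a.n == 0) b
  else if (b.n == 0) a
  else {
    val n = a.n + b.n
    val delta = b.mean - a.mean
    Partial(
      n,
      math.min(a.min, b.min),
      math.max(a.max, b.max),
      a.mean + delta * b.n / n,
      a.m2 + b.m2 + delta * delta * a.n * b.n / n
    )
  }

// Two tiles with different numbers of data cells, folded into one aggregate
val tiles = Seq(Seq(2.0, 4.0, 4.0), Seq(4.0, 5.0, 5.0, 7.0, 9.0))
val total = tiles.map(fromTile).foldLeft(empty)(merge)
```

The merged mean here is 5.0, whereas averaging the two per-tile means would give about 4.67, because that shortcut ignores the different cell counts.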

A more involved example: compute a single histogram over all tiles with aggHistogram, then extract its bin counts and sort them in descending order.

scala> rf.select(aggHistogram($"tile")).
     |   map(h => for(v <- h.labels) yield(v, h.itemCount(v))).
     |   select(explode($"value") as "counts").
     |   select("counts._1", "counts._2").
     |   toDF("value", "count").
     |   orderBy(desc("count")).
     |   show(10)
+------------------+-----+
|             value|count|
+------------------+-----+
| 7905.780878889613|59871|
|  9693.36822122893|37138|
|10731.770891323657|33770|
| 10076.43293835417|27512|
| 8365.393423741412|26915|
|11646.288154754428|23883|
| 11084.84999789323|23733|
| 9021.338606741572|22250|
|10385.199022093442|22088|
|11359.327558440293|20491|
+------------------+-----+
only showing top 10 rows