# Raster Statistics

RasterFrames has a number of extension methods and columnar functions for performing analysis on tiles.

## Tile Statistics

### Tile Dimensions

Get the nominal tile dimensions. Depending on the tiling, tiles along the edges of the raster may differ in size from the rest.

``````scala
scala> rf.select(rf.spatialKeyColumn, tileDimensions($"tile")).show(3)
+-----------+---------------+
|spatial_key|dimension(tile)|
+-----------+---------------+
|      [6,3]|      [128,128]|
|      [4,0]|      [128,128]|
|      [0,0]|      [128,128]|
+-----------+---------------+
only showing top 3 rows

``````
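
To make the edge effect concrete, the following sketch computes tile sizes for a layout that crops edge tiles rather than padding them. It is plain Scala with no RasterFrames or Spark dependency. `TileLayoutSketch`, `axisSizes`, and `tileDims` are hypothetical names, and cropping is an assumption about one possible tiling scheme, not a description of how RasterFrames lays out tiles.

```scala
// Illustrative only: not part of the RasterFrames API.
object TileLayoutSketch {
  /** Sizes of consecutive tiles along one axis when `extent` cells are cut
    * into chunks of at most `tileSize` cells; the last chunk may be shorter. */
  def axisSizes(extent: Int, tileSize: Int): Seq[Int] = {
    require(extent > 0 && tileSize > 0, "extent and tileSize must be positive")
    (0 until extent by tileSize).map(start => math.min(tileSize, extent - start))
  }

  /** (cols, rows) of every tile in a `cols` x `rows` raster, in row-major order. */
  def tileDims(cols: Int, rows: Int, tileSize: Int): Seq[(Int, Int)] =
    for {
      h <- axisSizes(rows, tileSize)
      w <- axisSizes(cols, tileSize)
    } yield (w, h)
}
```

For example, `TileLayoutSketch.axisSizes(300, 128)` yields two full tiles of 128 cells followed by one edge tile of 44 cells.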

### Descriptive Statistics

#### NoData Counts

Count the number of `NoData` and non-`NoData` cells in each tile.

``````scala
scala> rf.select(rf.spatialKeyColumn, noDataCells($"tile"), dataCells($"tile")).show(3)
+-----------+-----------------+---------------+
|spatial_key|noDataCells(tile)|dataCells(tile)|
+-----------+-----------------+---------------+
|      [6,3]|            15688|            696|
|      [4,0]|                0|          16384|
|      [0,0]|                0|          16384|
+-----------+-----------------+---------------+
only showing top 3 rows

``````
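
In the output above, the two counts in each row sum to 16384, the 128 × 128 cells of a tile. The sketch below shows the same bookkeeping on a plain array. It assumes `NoData` is marked by a sentinel value, here `Int.MinValue`; the actual marker depends on the tile's cell type, and `NoDataCountSketch` and `counts` are hypothetical names, not RasterFrames functions.

```scala
// Illustrative only: not part of the RasterFrames API.
object NoDataCountSketch {
  // Assumption: NoData is encoded as this sentinel value.
  val NoData: Int = Int.MinValue

  /** Returns (noDataCells, dataCells) for a flat array of integer cells. */
  def counts(cells: Array[Int]): (Long, Long) = {
    val noData = cells.count(_ == NoData).toLong
    (noData, cells.length.toLong - noData)
  }
}
```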

#### Tile Mean

Compute the mean cell value of each tile. Use `tileMean` for integral cell types, and `tileMeanDouble` for floating-point cell types.

``````scala
scala> rf.select(rf.spatialKeyColumn, tileMean($"tile")).show(3)
+-----------+------------------+
|spatial_key|    tileMean(tile)|
+-----------+------------------+
|      [6,3]|10757.254310344828|
|      [4,0]| 9883.589050292969|
|      [0,0]|10338.119995117188|
+-----------+------------------+
only showing top 3 rows

``````
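
One reason to keep separate integral and floating-point variants is that the two kinds of cell typically mark `NoData` differently. The sketch below is a minimal illustration under two assumptions: integer cells use a sentinel (`Int.MinValue` by default here), floating-point cells use `NaN`, and `NoData` cells are excluded from the mean. `TileMeanSketch`, `meanInt`, and `meanDouble` are hypothetical names and do not reproduce the RasterFrames implementation.

```scala
// Illustrative only: not part of the RasterFrames API.
object TileMeanSketch {
  /** Mean of integer cells, skipping the sentinel; None when no data cells remain. */
  def meanInt(cells: Array[Int], noData: Int = Int.MinValue): Option[Double] = {
    val data = cells.filter(_ != noData)
    if (data.isEmpty) None else Some(data.map(_.toDouble).sum / data.length)
  }

  /** Mean of floating-point cells, treating NaN as NoData (assumption). */
  def meanDouble(cells: Array[Double]): Option[Double] = {
    val data = cells.filterNot(_.isNaN)
    if (data.isEmpty) None else Some(data.sum / data.length)
  }
}
```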

#### Tile Summary Statistics

Compute a suite of summary statistics for each tile. Use `tileStats` for integral cell types, and `tileStatsDouble` for floating-point cell types.

``````scala
scala> rf.withColumn("stats", tileStats($"tile")).select(rf.spatialKeyColumn, $"stats.*").show(3)
+-----------+---------+-----------+------+-------+------------------+------------------+
|spatial_key|dataCells|noDataCells|   min|    max|              mean|          variance|
+-----------+---------+-----------+------+-------+------------------+------------------+
|      [6,3]|      696|         -1|7604.0|16143.0|10757.254310344822| 3271125.902280271|
|      [4,0]|    16384|         -1|7678.0|16464.0| 9883.589050292961|2163148.3790329304|
|      [0,0]|    16384|         -1|7291.0|23077.0| 10338.11999511721|3386469.0957086035|
+-----------+---------+-----------+------+-------+------------------+------------------+
only showing top 3 rows

``````
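
For readers who want to see how such a summary can be accumulated, here is a minimal sketch using Welford's running update for mean and variance. It assumes `NaN` marks `NoData` and that the reported variance is the population variance (divide by `n`); neither assumption is confirmed by this page. `TileStatsSketch`, `Stats`, and `stats` are hypothetical names.

```scala
// Illustrative only: not the RasterFrames implementation.
object TileStatsSketch {
  final case class Stats(dataCells: Long, min: Double, max: Double, mean: Double, variance: Double)

  /** Summary statistics over the data cells; None when every cell is NoData. */
  def stats(cells: Seq[Double]): Option[Stats] = {
    val data = cells.filterNot(_.isNaN) // assumption: NaN marks NoData
    if (data.isEmpty) None
    else {
      var n = 0L
      var mean = 0.0
      var m2 = 0.0 // running sum of squared deviations from the mean
      var lo = Double.PositiveInfinity
      var hi = Double.NegativeInfinity
      data.foreach { x =>
        n += 1
        val delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        lo = math.min(lo, x)
        hi = math.max(hi, x)
      }
      Some(Stats(n, lo, hi, mean, m2 / n)) // population variance (assumption)
    }
  }
}
```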

### Histogram

The `tileHistogram` function computes a histogram over the data in each tile. See the GeoTrellis `Histogram` documentation for details on what the resulting data structure provides. Use `tileHistogram` for integral cell types, and `tileHistogramDouble` for floating-point cell types.

In this example we compute quantile breaks.

``````scala
scala> rf.select(tileHistogram($"tile")).map(_.quantileBreaks(5)).show(5, false)
+---------------------------------------------------------------------------------------------------+
|value                                                                                              |
+---------------------------------------------------------------------------------------------------+
|[8809.728925619835, 9867.17899408284, 10610.464285714286, 11537.7625, 12449.983431952664]          |
|[8092.536291685227, 8799.830256846086, 9883.927927555094, 10663.851206181313, 11410.889115337006]  |
|[7873.758444506966, 8966.896173598834, 10637.314862591527, 11377.284237089707, 12150.871174122809] |
|[9191.866744203247, 10249.717974547506, 10929.598137035868, 11454.682468942548, 12075.199752189847]|
|[9153.271659984886, 9687.683229942197, 10026.3593930339, 10411.367118562586, 10952.631743056714]   |
+---------------------------------------------------------------------------------------------------+
only showing top 5 rows

``````
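
As a rough guide to what quantile breaks represent, the sketch below picks break values from sorted raw data using the nearest-rank method. This is a different, exact technique swapped in for illustration: a histogram works from binned counts and may interpolate, so its breaks (like the fractional values above) will generally not match. The caller supplies the probabilities explicitly because this page does not state which ones `quantileBreaks(5)` uses. `QuantileSketch` and `percentileBreaks` are hypothetical names.

```scala
// Illustrative only: exact nearest-rank percentiles, not a histogram estimate.
object QuantileSketch {
  /** For each probability q in (0, 1], returns the smallest value such that
    * at least a fraction q of the data is less than or equal to it. */
  def percentileBreaks(values: Seq[Double], qs: Seq[Double]): Seq[Double] = {
    require(qs.forall(q => q > 0.0 && q <= 1.0), "probabilities must lie in (0, 1]")
    val sorted = values.filterNot(_.isNaN).sortWith(_ < _)
    if (sorted.isEmpty) Seq.empty
    else qs.map { q =>
      val rank = math.max(1, math.ceil(q * sorted.length).toInt) // 1-based rank
      sorted(rank - 1)
    }
  }
}
```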

## Aggregate Statistics

The `aggStats` function computes the same summary statistics as `tileStats`, but aggregates them over the whole RasterFrame.

``````scala
scala> rf.select(aggStats($"tile")).show()
+---------+-----------+------+-------+-----------------+------------------+
|dataCells|noDataCells|   min|    max|             mean|          variance|
+---------+-----------+------+-------+-----------------+------------------+
|   387000|      71752|7209.0|39217.0|10160.48549870801|3315238.5311127007|
+---------+-----------+------+-------+-----------------+------------------+

``````
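
Aggregating over a distributed dataset means per-tile results have to be combined without revisiting every cell. One standard way to do that, sketched below, is the pairwise merge of count, mean, and sum of squared deviations. Whether `aggStats` works this way is an assumption; `AggStatsSketch`, `Partial`, `of`, and `merge` are hypothetical names, and the variance here is the population variance.

```scala
// Illustrative only: not the RasterFrames implementation.
object AggStatsSketch {
  /** Partial statistics for one group of cells; m2 is the sum of squared deviations. */
  final case class Partial(n: Long, min: Double, max: Double, mean: Double, m2: Double) {
    def variance: Double = if (n == 0) Double.NaN else m2 / n

    /** Combines two partial results as if their cells had been pooled. */
    def merge(o: Partial): Partial =
      if (n == 0) o
      else if (o.n == 0) this
      else {
        val total = n + o.n
        val delta = o.mean - mean
        Partial(
          total,
          math.min(min, o.min),
          math.max(max, o.max),
          mean + delta * o.n / total,
          m2 + o.m2 + delta * delta * n * o.n / total
        )
      }
  }

  /** Partial statistics for one tile's data cells (NoData already removed). */
  def of(data: Seq[Double]): Partial =
    if (data.isEmpty) Partial(0L, Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0.0)
    else {
      val mean = data.sum / data.length
      val m2 = data.map(x => (x - mean) * (x - mean)).sum
      Partial(
        data.length.toLong,
        data.reduce((a, b) => math.min(a, b)),
        data.reduce((a, b) => math.max(a, b)),
        mean,
        m2
      )
    }
}
```

Because `merge` is associative, partial results can be folded in any order, for example `tiles.map(AggStatsSketch.of).reduce(_ merge _)` for a hypothetical `tiles: Seq[Seq[Double]]`.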

A more involved example: extract bin counts from a computed `Histogram`.

``````scala
scala> rf.select(aggHistogram($"tile")).
     |   map(h => for(v <- h.labels) yield(v, h.itemCount(v))).
     |   select(explode($"value") as "counts").
     |   select("counts._1", "counts._2").
     |   toDF("value", "count").
     |   orderBy(desc("count")).
     |   show(10)
+------------------+-----+
|             value|count|
+------------------+-----+
| 7905.780878889613|59871|
|  9693.36822122893|37138|
|10731.770891323657|33770|
| 10076.43293835417|27512|
| 8365.393423741412|26915|
|11646.288154754428|23883|
| 11084.84999789323|23733|
| 9021.338606741572|22250|
|10385.199022093442|22088|
|11359.327558440293|20491|
+------------------+-----+
only showing top 10 rows

``````
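
The same "label and count, sorted by count" shape can be reproduced on plain data. The sketch below uses equal-width bins, which is a simplification chosen for illustration: the unevenly spaced labels in the output above suggest the library's histogram does not bin this way. `BinCountSketch` and `topBins` are hypothetical names.

```scala
// Illustrative only: equal-width binning, not the library's histogram.
object BinCountSketch {
  /** Counts values into `bins` equal-width buckets over [min, max] and returns
    * (bucket centre, count) pairs, most populated first. Empty buckets are omitted. */
  def topBins(values: Seq[Double], bins: Int): Seq[(Double, Long)] = {
    require(bins > 0, "bins must be positive")
    val data = values.filterNot(_.isNaN)
    if (data.isEmpty) Seq.empty
    else {
      val lo = data.reduce((a, b) => math.min(a, b))
      val hi = data.reduce((a, b) => math.max(a, b))
      val width = if (hi > lo) (hi - lo) / bins else 1.0
      data
        .groupBy(x => math.min(bins - 1, ((x - lo) / width).toInt)) // clamp max into last bin
        .toSeq
        .map { case (i, xs) => (lo + (i + 0.5) * width, xs.size.toLong) }
        .sortBy { case (_, count) => -count }
    }
  }
}
```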