Function Reference
RasterFrames provides a rich set of columnar functions for processing geospatial raster data. In Spark SQL, the functions are already registered in the SQL engine; they are usually prefixed with rf_. In Python, they are available in the pyrasterframes.rasterfunctions module.
The convention in this document will be to define the function signature as below, with its return type, the function name, and named arguments with their types.
ReturnDataType function_name(InputDataType argument1, InputDataType argument2)
For the Scala documentation on these functions, see RasterFunctions
. The full Scala API documentation can be found here.
import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import *
from IPython.display import display
import os.path
spark = create_rf_spark_session()
List of Available SQL and Python Functions
- Vector Operations
- Tile Metadata and Mutation
- Tile Creation
- Masking and NoData
- Local Map Algebra
- rf_local_add
- rf_local_subtract
- rf_local_multiply
- rf_local_divide
- rf_normalized_difference
- rf_local_less
- rf_local_less_equal
- rf_local_greater
- rf_local_greater_equal
- rf_local_equal
- rf_local_unequal
- rf_local_is_in
- rf_local_extract_bits
- rf_local_min
- rf_local_max
- rf_local_clamp
- rf_where
- rf_rescale
- rf_standardize
- rf_round
- rf_abs
- rf_exp
- rf_exp10
- rf_exp2
- rf_expm1
- rf_log
- rf_log10
- rf_log2
- rf_log1p
- rf_sqrt
- Tile Statistics
- Aggregate Tile Statistics
- Tile Local Aggregate Statistics
- Converting Tiles
To import RasterFrames functions into the environment, import from pyrasterframes.rasterfunctions
.
from pyrasterframes.rasterfunctions import *
Functions starting with rf_
, which are for raster, and st_
, which are for vector geometry, become available for use with DataFrames. You can view all of the available functions with the following.
[fn for fn in dir() if fn.startswith('rf_') or fn.startswith('st_')]
Vector Operations
Various LocationTech GeoMesa user-defined functions (UDFs) dealing with geometry
type columns are provided in the SQL engine and within the pyrasterframes.rasterfunctions
Python module. These are documented in the LocationTech GeoMesa Spark SQL documentation. These functions are all prefixed with st_
.
RasterFrames provides some additional functions for vector geometry operations.
st_reproject
Geometry st_reproject(Geometry geom, String origin_crs, String destination_crs)
Reproject the vector geom
from origin_crs
to destination_crs
. Both _crs
arguments are either proj4 strings, EPSG codes or OGC WKT for coordinate reference systems.
st_extent
Struct[Double xmin, Double xmax, Double ymin, Double ymax] st_extent(Geometry geom)
Extracts the bounding box (extent/envelope) of the geometry.
See also GeoMesa st_envelope which returns a Geometry type.
st_geometry
Geometry st_geometry(Struct[Double xmin, Double xmax, Double ymin, Double ymax] extent)
Convert an extent to a Geometry. The extent likely comes from st_extent
or rf_extent
.
rf_xz2_index
Long rf_xz2_index(Geometry geom, CRS crs)
Long rf_xz2_index(Extent extent, CRS crs)
Long rf_xz2_index(ProjectedRasterTile proj_raster)
Constructs an XZ2 index in WGS84/EPSG:4326 from a Geometry, Extent, or ProjectedRasterTile and its CRS. This function is useful for range partitioning.
rf_z2_index
Long rf_z2_index(Geometry geom, CRS crs)
Long rf_z2_index(Extent extent, CRS crs)
Long rf_z2_index(ProjectedRasterTile proj_raster)
Constructs a Z2 index in WGS84/EPSG:4326 from a Geometry, Extent, or ProjectedRasterTile and its CRS. First the native extent is extracted or computed, and then its center is used as the indexing location. This function is useful for range partitioning. See the Reading Raster Data section for details on how to have an index automatically added when reading raster data.
Tile Metadata and Mutation
Functions to access and change the particulars of a tile
: its shape and the data type of its cells. See section on “NoData” handling for additional discussion of cell types.
rf_dimensions
Struct[Int, Int] rf_dimensions(Tile tile)
Get number of columns and rows in the tile
, as a Struct of cols
and rows
.
rf_cell_type
Struct[String] rf_cell_type(Tile tile)
Get the cell type of the tile
. The cell type can be changed with rf_convert_cell_type.
rf_tile
Tile rf_tile(ProjectedRasterTile proj_raster)
Get the fully realized (non-lazy) tile
from a ProjectedRasterTile
struct column.
rf_extent
Struct[Double xmin, Double xmax, Double ymin, Double ymax] rf_extent(ProjectedRasterTile proj_raster)
Struct[Double xmin, Double xmax, Double ymin, Double ymax] rf_extent(RasterSource proj_raster)
Fetches the extent (bounding box or envelope) of a ProjectedRasterTile or RasterSource type tile column.
rf_crs
Struct rf_crs(ProjectedRasterTile proj_raster)
Struct rf_crs(RasterSource proj_raster)
Struct rf_crs(String crs_spec)
Fetch the CRS structure representing the coordinate reference system of a ProjectedRasterTile or RasterSource type tile column, or from a column of strings in the form supported by rf_mk_crs.
rf_proj_raster
ProjectedRasterTile rf_proj_raster(Tile tile, Extent extent, CRS crs)
Construct a proj_raster
structure from individual Tile, Extent, and CRS columns.
rf_mk_crs
Struct rf_mk_crs(String crsText)
Construct a CRS structure from one of its string representations. Three forms are supported:
- EPSG code:
EPSG:<integer>
- Proj4 string:
+proj <proj4 parameters>
- WKT String with embedded EPSG code:
GEOGCS["<name>", <datum>, <prime meridian>, <angular unit> {,<twin axes>} {,<authority>}]
Example: SELECT rf_mk_crs('EPSG:4326')
rf_convert_cell_type
Tile rf_convert_cell_type(Tile tile_col, CellType cell_type)
Tile rf_convert_cell_type(Tile tile_col, String cell_type)
Convert tile_col
to a different cell type. In Python you can pass a CellType object to cell_type
.
rf_interpret_cell_type_as
Tile rf_interpret_cell_type_as(Tile tile_col, CellType cell_type)
Tile rf_interpret_cell_type_as(Tile tile_col, String cell_type)
Change the interpretation of the tile_col
’s cell values according to specified cell_type
. In Python you can pass a CellType object to cell_type
.
rf_resample
Tile rf_resample(Tile tile, Double factor, [String method])
Tile rf_resample(Tile tile, Int factor, [String method])
Tile rf_resample(Tile tile, Tile shape_tile, [String method])
In SQL, three parameters are required for rf_resample:
Tile rf_resample(Tile tile, Double factor, String method)
Tile rf_resample(Tile tile, Int factor, String method)
Tile rf_resample(Tile tile, Tile shape_tile, String method)
Tile rf_resample_nearest(Tile tile, Double factor)
Tile rf_resample_nearest(Tile tile, Int factor)
Tile rf_resample_nearest(Tile tile, Tile shape_tile)
Change the tile dimension by upsampling or downsampling. Passing a numeric factor
will scale the number of columns and rows in the tile: 1.0 keeps the same number of columns and rows; less than one downsamples the tile; and greater than one upsamples the tile. Passing a tile as the second argument resamples such that the output has the same dimension (number of columns and rows) as shape_tile
.
There are two categories: point resampling methods and aggregating resampling methods. The resampling method can be specified by one of the following strings, possibly in a column. The point resampling methods are: "nearest_neighbor"
, "bilinear"
, "cubic_convolution"
, "cubic_spline"
, and "lanczos"
. The aggregating resampling methods are: "average"
, "mode"
, "median"
, "max"
, "min", or "sum".
Note that the aggregating methods are intended for downsampling. For example, a 0.25 factor and max
method returns the maximum value in a 4x4 neighborhood.
If tile
has an integer CellType
, the returned tile will be coerced to a floating point with the following methods: bilinear, cubic_convolution, cubic_spline, lanczos, average, and median.
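As a rough illustration of the aggregating case, the pure-Python sketch below downsamples a small grid by taking the maximum of each 4x4 block, which is how a 0.25 factor with the max method is described above. It only models the semantics and is not the RasterFrames implementation; downsample_max and the nested-list tile are hypothetical stand-ins.

```python
def downsample_max(tile, block):
    """Aggregate non-overlapping block x block neighborhoods with max.

    `tile` is a list of rows; `block` is the neighborhood edge length
    (4 for a 0.25 factor). Hypothetical helper, not a RasterFrames API.
    """
    rows, cols = len(tile), len(tile[0])
    return [
        [
            max(
                tile[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            for c0 in range(0, cols, block)
        ]
        for r0 in range(0, rows, block)
    ]


# An 8x8 grid holding the values 0..63 in row-major order.
tile = [[r * 8 + c for c in range(8)] for r in range(8)]
small = downsample_max(tile, block=4)
print(small)  # [[27, 31], [59, 63]]
```

The 8x8 input becomes a 2x2 output, each cell holding the largest value of the 16 input cells it covers.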
Tile Creation
Functions to create a new Tile column, either from scratch or from existing data not yet in a tile
.
rf_make_zeros_tile
Tile rf_make_zeros_tile(Int tile_columns, Int tile_rows, [CellType cell_type])
Tile rf_make_zeros_tile(Int tile_columns, Int tile_rows, [String cell_type_name])
Create a tile
of shape tile_columns
by tile_rows
full of zeros, with the optional cell type; default is float64. See this discussion on cell types for info on the cell_type
argument. All arguments are literal values and not column expressions.
rf_make_ones_tile
Tile rf_make_ones_tile(Int tile_columns, Int tile_rows, [CellType cell_type])
Tile rf_make_ones_tile(Int tile_columns, Int tile_rows, [String cell_type_name])
Create a tile
of shape tile_columns
by tile_rows
full of ones, with the optional cell type; default is float64. See this discussion on cell types for info on the cell_type
argument. All arguments are literal values and not column expressions.
rf_make_constant_tile
Tile rf_make_constant_tile(Numeric constant, Int tile_columns, Int tile_rows, [CellType cell_type])
Tile rf_make_constant_tile(Numeric constant, Int tile_columns, Int tile_rows, [String cell_type_name])
Create a tile
of shape tile_columns
by tile_rows
full of constant
, with the optional cell type; default is float64. See this discussion on cell types for info on the cell_type
argument. All arguments are literal values and not column expressions.
rf_rasterize
Tile rf_rasterize(Geometry geom, Geometry tile_bounds, Int value, Int tile_columns, Int tile_rows)
Convert a vector Geometry geom
into a Tile representation. The value
will be “burned-in” to the returned tile
where the geom
intersects the tile_bounds
. Returned tile
will have shape tile_columns
by tile_rows
. Values outside the geom
will be assigned a NoData value. Returned tile
has cell type int32
, note that value
is of type Int.
Parameters tile_columns
and tile_rows
are literals, not column expressions. The others are column expressions.
rf_array_to_tile
Tile rf_array_to_tile(Array arrayCol, Int numCols, Int numRows)
Python only. Create a tile
from a Spark SQL Array, filling values in row-major order.
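To make "row-major order" concrete, here is a small pure-Python sketch that lays a flat list out into rows the same way. The helper array_to_rows is hypothetical and only mirrors the ordering; it does not call RasterFrames.

```python
def array_to_rows(values, num_cols, num_rows):
    """Lay out a flat list as num_rows rows of num_cols cells, row by row."""
    if len(values) != num_cols * num_rows:
        raise ValueError("array length must equal num_cols * num_rows")
    return [
        values[r * num_cols:(r + 1) * num_cols]
        for r in range(num_rows)
    ]


grid = array_to_rows([1, 2, 3, 4, 5, 6], num_cols=3, num_rows=2)
print(grid)  # [[1, 2, 3], [4, 5, 6]]
```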
rf_assemble_tile
Tile rf_assemble_tile(Int colIndex, Int rowIndex, Numeric cellData, Int numCols, Int numRows, [CellType cell_type])
Tile rf_assemble_tile(Int colIndex, Int rowIndex, Numeric cellData, Int numCols, Int numRows, [String cell_type_name])
SQL: Tile rf_assemble_tile(Int colIndex, Int rowIndex, Numeric cellData, Int numCols, Int numRows)
Create tile
s of dimension numCols
by numRows
from a column of cell data with location indices. This function is the inverse of rf_explode_tiles
. Intended use is with a groupby
, producing one row with a new tile
per group. In Python, the numCols
, numRows
and cell_type
arguments are literal values, others are column expressions. See this discussion on cell types for info on the optional cell_type
argument. The default is float64. SQL implementation does not accept a cell_type argument. It returns a float64 cell type tile
by default.
Masking and NoData
See the masking page for conceptual discussion of masking operations.
Statistical functions count the data and NoData cells, either per tile or aggregated over a tile column: rf_data_cells, rf_no_data_cells, rf_agg_data_cells, and rf_agg_no_data_cells.
Masking is a raster operation that sets specific cells to NoData based on the values in another raster.
rf_mask
Tile rf_mask(Tile tile, Tile mask, bool inverse)
Where the mask
contains NoData, replace values in the tile
with NoData.
Returned tile
cell type will be coerced to one supporting NoData if it does not already.
inverse
is a literal not a Column. If inverse
is true, return the tile
with NoData in locations where the mask
does not contain NoData. Equivalent to rf_inverse_mask
.
See also rf_rasterize
.
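The masking rule and its inverse can be sketched in plain Python. This is a model of the behaviour described above, with None standing in for NoData; the mask function here is a hypothetical stand-in rather than the library call.

```python
NODATA = None  # stand-in for a NoData cell


def mask(tile, mask_tile, inverse=False):
    """Set cells of `tile` to NoData wherever `mask_tile` is NoData.

    With inverse=True the test flips: cells become NoData wherever
    `mask_tile` holds a data value.
    """
    result = []
    for tile_row, mask_row in zip(tile, mask_tile):
        out_row = []
        for cell, mask_cell in zip(tile_row, mask_row):
            mask_is_nodata = mask_cell is NODATA
            drop = (not mask_is_nodata) if inverse else mask_is_nodata
            out_row.append(NODATA if drop else cell)
        result.append(out_row)
    return result


tile = [[1, 2], [3, 4]]
mask_tile = [[1, NODATA], [NODATA, 1]]
masked = mask(tile, mask_tile)
inverse_masked = mask(tile, mask_tile, inverse=True)
print(masked)          # [[1, None], [None, 4]]
print(inverse_masked)  # [[None, 2], [3, None]]
```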
rf_mask_by_value
Tile rf_mask_by_value(Tile data_tile, Tile mask_tile, Int mask_value, bool inverse)
Generate a tile
with the values from data_tile
, with NoData in cells where the mask_tile
is equal to mask_value
.
inverse
is a literal not a Column. If inverse
is true, return the data_tile
with NoData in locations where the mask_tile
value is not equal to mask_value
. Equivalent to rf_inverse_mask_by_value
.
rf_mask_by_values
Tile rf_mask_by_values(Tile data_tile, Tile mask_tile, Array mask_values)
Tile rf_mask_by_values(Tile data_tile, Tile mask_tile, list mask_values)
Generate a tile
with the values from data_tile
, with NoData in cells where the mask_tile
is in the mask_values
Array or list. mask_values
can be a pyspark.sql.ArrayType
or a list
.
rf_mask_by_bit
Tile rf_mask_by_bit(Tile tile, Tile mask_tile, Int bit_position, Bool mask_value)
Applies a mask using bit values in the mask_tile
. Working from the right, the bit at bit_position
is extracted from cell values of the mask_tile
. In all locations where these are equal to the mask_value
, the returned tile is set to NoData; otherwise the original tile
cell value is returned.
This is a single-bit version of rf_mask_by_bits
.
rf_mask_by_bits
Tile rf_mask_by_bits(Tile tile, Tile mask_tile, Int start_bit, Int num_bits, Array mask_values)
Tile rf_mask_by_bits(Tile tile, Tile mask_tile, Int start_bit, Int num_bits, list mask_values)
Applies a mask from blacklisted bit values in the mask_tile
. Working from the right, the bits from start_bit
to start_bit + num_bits
are extracted from cell values of the mask_tile
. In all locations where these are in the mask_values
, the returned tile is set to NoData; otherwise the original tile
cell value is returned.
This function is not available in the SQL API. The below is equivalent:
SELECT rf_mask_by_values(
tile,
rf_local_extract_bits(mask_tile, start_bit, num_bits),
mask_values
),
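The same two-step idea (extract a bit field, then mask where it matches a listed value) can be sketched in pure Python. The sketch assumes that num_bits bits are taken starting at the zero-indexed start_bit and uses None for NoData; mask_by_bits below is a hypothetical helper, not the RasterFrames function.

```python
NODATA = None  # stand-in for a NoData cell


def mask_by_bits(tile, mask_tile, start_bit, num_bits, mask_values):
    """Return `tile` with NoData where the extracted bit field is listed."""
    bit_mask = (1 << num_bits) - 1      # e.g. num_bits=2 -> 0b11
    blocked = set(mask_values)
    return [
        [
            NODATA if ((m >> start_bit) & bit_mask) in blocked else t
            for t, m in zip(tile_row, mask_row)
        ]
        for tile_row, mask_row in zip(tile, mask_tile)
    ]


tile = [[10, 20, 30, 40]]
# Bits 2-3 of these quality values are 0, 1, 2 and 3 respectively.
qa = [[0b0000, 0b0100, 0b1000, 0b1100]]
result = mask_by_bits(tile, qa, start_bit=2, num_bits=2, mask_values=[1, 3])
print(result)  # [[10, None, 30, None]]
```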
rf_inverse_mask
Tile rf_inverse_mask(Tile tile, Tile mask)
Where the mask
does not contain NoData, replace values in tile
with NoData.
rf_inverse_mask_by_value
Tile rf_inverse_mask_by_value(Tile data_tile, Tile mask_tile, Int mask_value)
Generate a tile
with the values from data_tile
, with NoData in cells where the mask_tile
is not equal to mask_value
. In other words, only keep data_tile
cells in locations where the mask_tile
is equal to mask_value
.
rf_is_no_data_tile
Boolean rf_is_no_data_tile(Tile tile)
Returns true if tile
contains only NoData. By definition returns false if cell type does not support NoData. To count NoData cells or data cells, see rf_no_data_cells
, rf_data_cells
, rf_agg_no_data_cells
, rf_agg_data_cells
, rf_agg_local_no_data_cells
, and rf_agg_local_data_cells
. This function is distinguished from rf_for_all
, which tests that values are not NoData and nonzero.
rf_local_no_data
Tile rf_local_no_data(Tile tile)
Returns a tile with values of 1 in each cell where the input tile contains NoData. Otherwise values are 0.
rf_local_data
Tile rf_local_data(Tile tile)
Returns a tile with values of 0 in each cell where the input tile contains NoData. Otherwise values are 1.
rf_with_no_data
Tile rf_with_no_data(Tile tile, Double no_data_value)
Python only. Return a tile
column marking as NoData all cells equal to no_data_value
.
The no_data_value
argument is a literal Double, not a Column expression.
If input tile
had a NoData value already, the behaviour depends on whether its cell type is floating point or not. For floating point cell type tile
, NoData values on the input tile
remain NoData values on the output. For integral cell type tile
s, the previous NoData values become literal values.
Local Map Algebra
Local map algebra raster operations are element-wise operations on a single tile (unary), between a tile
and a scalar, between two tile
s, or across many tile
s.
When these operations encounter a NoData value in either operand, the cell in the resulting tile
will have a NoData.
The binary local map algebra functions have similar variations in the Python API depending on the right-hand side type:
- rf_local_op: applies op to two columns; the right-hand side can be a tile or a numeric column.
- rf_local_op_double: applies op to a tile and a literal scalar, coercing the tile to a floating point type.
- rf_local_op_int: applies op to a tile and a literal scalar, without coercing the tile to a floating point type.
The SQL API does not require the rf_local_op_double
or rf_local_op_int
forms (just rf_local_op
).
Local map algebra operations for more than two tile
s are implemented to work across rows in the DataFrame. As such, they are aggregate functions.
rf_local_add
Tile rf_local_add(Tile tile1, Tile rhs)
Tile rf_local_add_int(Tile tile1, Int rhs)
Tile rf_local_add_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise sum of tile1
and rhs
.
rf_local_subtract
Tile rf_local_subtract(Tile tile1, Tile rhs)
Tile rf_local_subtract_int(Tile tile1, Int rhs)
Tile rf_local_subtract_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise difference of tile1
and rhs
.
rf_local_multiply
Tile rf_local_multiply(Tile tile1, Tile rhs)
Tile rf_local_multiply_int(Tile tile1, Int rhs)
Tile rf_local_multiply_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise product of tile1
and rhs
. This is not the matrix multiplication of tile1
and rhs
.
rf_local_divide
Tile rf_local_divide(Tile tile1, Tile rhs)
Tile rf_local_divide_int(Tile tile1, Int rhs)
Tile rf_local_divide_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise quotient of tile1
and rhs
.
rf_normalized_difference
Tile rf_normalized_difference(Tile tile1, Tile tile2)
Compute the normalized difference of the two tile
s: (tile1 - tile2) / (tile1 + tile2)
. Result is always floating point cell type. This function has no scalar variant.
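A minimal cell-wise sketch of the formula follows, using plain Python lists and None for NoData. Two things are assumptions of the sketch rather than documented behaviour: NoData in either input yields NoData (consistent with the local map algebra note above), and a zero denominator is also reported as NoData.

```python
NODATA = None  # stand-in for a NoData cell


def normalized_difference(tile1, tile2):
    """Cell-wise (tile1 - tile2) / (tile1 + tile2) as floating point."""
    def cell(a, b):
        if a is NODATA or b is NODATA:
            return NODATA
        total = a + b
        # Assumption: a zero denominator is treated as NoData in this sketch.
        return NODATA if total == 0 else (a - b) / total

    return [
        [cell(a, b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(tile1, tile2)
    ]


# Hypothetical near-infrared and red band cells, as for an NDVI.
nir = [[6, 2, 0]]
red = [[2, 2, 0]]
ndvi = normalized_difference(nir, red)
print(ndvi)  # [[0.5, 0.0, None]]
```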
rf_local_less
Tile rf_local_less(Tile tile1, Tile rhs)
Tile rf_local_less_int(Tile tile1, Int rhs)
Tile rf_local_less_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise evaluation of tile1
is less than rhs
.
rf_local_less_equal
Tile rf_local_less_equal(Tile tile1, Tile rhs)
Tile rf_local_less_equal_int(Tile tile1, Int rhs)
Tile rf_local_less_equal_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise evaluation of tile1
is less than or equal to rhs
.
rf_local_greater
Tile rf_local_greater(Tile tile1, Tile rhs)
Tile rf_local_greater_int(Tile tile1, Int rhs)
Tile rf_local_greater_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise evaluation of tile1
is greater than rhs
.
rf_local_greater_equal
Tile rf_local_greater_equal(Tile tile1, Tile rhs)
Tile rf_local_greater_equal_int(Tile tile1, Int rhs)
Tile rf_local_greater_equal_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise evaluation of tile1
is greater than or equal to rhs
.
rf_local_equal
Tile rf_local_equal(Tile tile1, Tile rhs)
Tile rf_local_equal_int(Tile tile1, Int rhs)
Tile rf_local_equal_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise equality of tile1
and rhs
.
rf_local_unequal
Tile rf_local_unequal(Tile tile1, Tile rhs)
Tile rf_local_unequal_int(Tile tile1, Int rhs)
Tile rf_local_unequal_double(Tile tile1, Double rhs)
Returns a tile
column containing the element-wise inequality of tile1
and rhs
.
rf_local_is_in
Tile rf_local_is_in(Tile tile, Array array)
Tile rf_local_is_in(Tile tile, list l)
Returns a tile
column with cell values of 1 where the tile
cell value is in the provided array or list. The array
is a Spark SQL Array. A python list
of numeric values can also be passed.
rf_local_extract_bits
Tile rf_local_extract_bits(Tile tile, Int start_bit, Int num_bits)
Tile rf_local_extract_bits(Tile tile, Int start_bit)
Extract value from specified bits of the cells’ underlying binary data. Working from the right, the bits from start_bit
to start_bit + num_bits
are extracted from cell values of the tile
. The start_bit
is zero indexed. If num_bits
is not provided, a single bit is extracted.
A common use case for this function is covered by rf_mask_by_bits
.
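The bit arithmetic can be sketched on a single cell value in plain Python. The sketch assumes that num_bits bits are taken starting at the zero-indexed start_bit, counting from the right; extract_bits is a hypothetical helper that mirrors the description, not the library function.

```python
def extract_bits(value, start_bit, num_bits=1):
    """Shift the wanted bits to the right edge, then keep num_bits of them."""
    return (value >> start_bit) & ((1 << num_bits) - 1)


cell = 0b10110100  # 180

low_two = extract_bits(cell, 0, 2)    # bits 0-1 are 00
mid_three = extract_bits(cell, 2, 3)  # bits 2-4 are 101
top_bit = extract_bits(cell, 7)       # num_bits omitted: a single bit

print(low_two, mid_three, top_bit)  # 0 5 1
```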
rf_local_min
Tile rf_local_min(Tile tile, Tile min)
Tile rf_local_min(Tile tile, Numeric min)
Performs cell-wise minimum of two tiles or a tile and a scalar.
rf_local_max
Tile rf_local_max(Tile tile, Tile max)
Tile rf_local_max(Tile tile, Numeric max)
Performs cell-wise maximum of two tiles or a tile and a scalar.
rf_local_clamp
Tile rf_local_clamp(Tile tile, Tile min, Tile max)
Tile rf_local_clamp(Tile tile, Numeric min, Tile max)
Tile rf_local_clamp(Tile tile, Tile min, Numeric max)
Tile rf_local_clamp(Tile tile, Numeric min, Numeric max)
Return the tile with its values limited to a range defined by min and max, inclusive.
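For scalar bounds, clamping reduces to a cell-wise min of a max. The pure-Python sketch below shows this on data cells only; clamp is a hypothetical helper and NoData handling is left out.

```python
def clamp(tile, lo, hi):
    """Limit every cell to the inclusive range [lo, hi]."""
    return [[min(max(cell, lo), hi) for cell in row] for row in tile]


clamped = clamp([[-5, 0, 7], [12, 3, 10]], lo=0, hi=10)
print(clamped)  # [[0, 0, 7], [10, 3, 10]]
```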
rf_where
Tile rf_where(Tile condition, Tile x, Tile y)
Return a tile with cell values chosen from x
or y
depending on condition
. Operates cell-wise in a similar fashion to Spark SQL when
and otherwise
.
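A cell-wise sketch of the selection in plain Python is shown below. It assumes that a non-zero condition cell selects from x and a zero cell selects from y; where is a hypothetical stand-in and NoData handling is omitted.

```python
def where(condition, x, y):
    """Pick x's cell where condition is non-zero, otherwise y's cell."""
    return [
        [x_cell if cond else y_cell
         for cond, x_cell, y_cell in zip(cond_row, x_row, y_row)]
        for cond_row, x_row, y_row in zip(condition, x, y)
    ]


condition = [[1, 0], [0, 1]]
x = [[10, 20], [30, 40]]
y = [[-1, -2], [-3, -4]]
chosen = where(condition, x, y)
print(chosen)  # [[10, -2], [-3, 40]]
```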
rf_rescale
Tile rf_rescale(Tile tile)
Tile rf_rescale(Tile tile, Double min, Double max)
Rescale cell values such that the minimum is zero and the maximum is one. Other values will be linearly interpolated into the range. If specified, the min
parameter will become the zero value and the max
parameter will become 1. See rf_agg_stats
. Values outside the range will be set to 0 or 1. If min
and max
are not specified, the tile-wise minimum and maximum are used; this can result in inconsistent values across rows in a tile column.
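The difference between tile-wise and shared bounds can be seen in a small pure-Python sketch. The rescale helper here is hypothetical and works on data cells only; returning 0.0 for a constant tile is an assumption of the sketch.

```python
def rescale(tile, lo=None, hi=None):
    """Linearly map [lo, hi] onto [0, 1], clipping values outside the range.

    When lo or hi is omitted, the tile's own minimum or maximum is used.
    """
    cells = [cell for row in tile for cell in row]
    lo = min(cells) if lo is None else lo
    hi = max(cells) if hi is None else hi
    span = hi - lo

    def scale(cell):
        if span == 0:
            return 0.0  # assumption: a constant tile maps to zero
        return min(max((cell - lo) / span, 0.0), 1.0)

    return [[scale(cell) for cell in row] for row in tile]


tile_a = [[0, 5, 10]]
tile_b = [[0, 50, 100]]

# Tile-wise bounds: both tiles look identical after rescaling.
per_tile_a = rescale(tile_a)
per_tile_b = rescale(tile_b)

# Shared bounds keep the two tiles comparable.
shared_a = rescale(tile_a, lo=0, hi=100)
shared_b = rescale(tile_b, lo=0, hi=100)
```

With tile-wise bounds both rows rescale to the same values even though tile_b holds values ten times larger; passing the same min and max to every row (for example taken from rf_agg_stats) avoids that.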
rf_standardize
Tile rf_standardize(Tile tile)
Tile rf_standardize(Tile tile, Double mean, Double stddev)
Standardize cell values such that the mean is zero and the standard deviation is one. If specified, the mean
and stddev
are applied to all tiles in the column. See rf_agg_stats
. If not specified, each tile will be standardized according to the statistics of its cell values; this can result in inconsistent values across rows in a tile column.
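Standardization is the usual z-score, (cell - mean) / stddev. The sketch below uses the Python standard library and assumes the population standard deviation when none is supplied; standardize is a hypothetical helper on data cells only and does not guard against a zero standard deviation.

```python
import statistics


def standardize(tile, mean=None, stddev=None):
    """Return (cell - mean) / stddev for every cell of the tile.

    When mean or stddev is omitted, the tile's own statistics are used.
    """
    cells = [cell for row in tile for cell in row]
    mean = statistics.fmean(cells) if mean is None else mean
    # Assumption: population (not sample) standard deviation.
    stddev = statistics.pstdev(cells) if stddev is None else stddev
    return [[(cell - mean) / stddev for cell in row] for row in tile]


tile = [[2, 4, 4, 4], [5, 5, 7, 9]]   # mean 5, population stddev 2
z = standardize(tile)
z_shared = standardize(tile, mean=0.0, stddev=1.0)  # caller-supplied statistics
```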
rf_round
Tile rf_round(Tile tile)
Round cell values to the nearest integer without changing the cell type.
rf_abs
Tile rf_abs(Tile tile)
Compute the absolute value of each cell value.
rf_exp
Tile rf_exp(Tile tile)
Performs cell-wise exponential.
rf_exp10
Tile rf_exp10(Tile tile)
Compute 10 to the power of cell values.
rf_exp2
Tile rf_exp2(Tile tile)
Compute 2 to the power of cell values.
rf_expm1
Tile rf_expm1(Tile tile)
Performs cell-wise exponential, then subtracts one. Inverse of rf_log1p.
rf_log
Tile rf_log(Tile tile)
Performs cell-wise natural logarithm.
rf_log10
Tile rf_log10(Tile tile)
Performs cell-wise logarithm with base 10.
rf_log2
Tile rf_log2(Tile tile)
Performs cell-wise logarithm with base 2.
rf_log1p
Tile rf_log1p(Tile tile)
Performs natural logarithm of cell values plus one. Inverse of rf_expm1
.
rf_sqrt
Tile rf_sqrt(Tile tile)
Perform cell-wise square root.
Tile Statistics
The following functions compute a statistical summary per row of a tile
column. The statistics are computed across the cells of a single tile
, within each DataFrame Row.
rf_tile_sum
Double rf_tile_sum(Tile tile)
Computes the sum of cells in each row of column tile
, ignoring NoData values.
rf_tile_mean
Double rf_tile_mean(Tile tile)
Computes the mean of cells in each row of column tile
, ignoring NoData values.
rf_tile_min
Double rf_tile_min(Tile tile)
Computes the min of cells in each row of column tile
, ignoring NoData values.
rf_tile_max
Double rf_tile_max(Tile tile)
Computes the max of cells in each row of column tile
, ignoring NoData values.
rf_no_data_cells
Long rf_no_data_cells(Tile tile)
Return the count of NoData cells in the tile
.
rf_data_cells
Long rf_data_cells(Tile tile)
Return the count of data cells in the tile
.
rf_exists
Boolean rf_exists(Tile tile)
Returns true if any cells in the tile are true (non-zero and not NoData).
rf_for_all
Boolean rf_for_all(Tile tile)
Returns true if all cells in the tile are true (non-zero and not NoData). See also rf_is_no_data_tile, which tests that all cells are NoData.
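The difference between the "any", "all" and "only NoData" tests can be sketched in plain Python, with None standing in for NoData. The three helpers are hypothetical models of the descriptions above, not library calls.

```python
NODATA = None  # stand-in for a NoData cell


def is_true(cell):
    """A cell counts as true when it is data and non-zero."""
    return cell is not NODATA and cell != 0


def exists(tile):
    return any(is_true(cell) for row in tile for cell in row)


def for_all(tile):
    return all(is_true(cell) for row in tile for cell in row)


def is_no_data_tile(tile):
    return all(cell is NODATA for row in tile for cell in row)


mixed = [[1, 0], [NODATA, 5]]
print(exists(mixed), for_all(mixed), is_no_data_tile(mixed))
# True False False
```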
rf_tile_stats
Struct[Long, Long, Double, Double, Double, Double] rf_tile_stats(Tile tile)
Computes the following statistics of cells in each row of column tile
: data cell count, NoData cell count, minimum, maximum, mean, and variance. The minimum, maximum, mean, and variance are computed ignoring NoData values. Resulting column has the below schema.
spark.sql("SELECT rf_tile_stats(rf_make_ones_tile(5, 5, 'float32')) as tile_stats").printSchema()
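For intuition about what the struct holds, the statistics can be modelled on a nested list with None as NoData. This is only a sketch: the dictionary keys are illustrative names, the variance is assumed to be the population variance, and the helper expects at least one data cell.

```python
import statistics

NODATA = None  # stand-in for a NoData cell


def tile_stats(tile):
    """Summarize one tile, ignoring NoData for min, max, mean and variance."""
    cells = [cell for row in tile for cell in row]
    data = [cell for cell in cells if cell is not NODATA]
    return {
        "data_cells": len(data),
        "no_data_cells": len(cells) - len(data),
        "min": min(data),
        "max": max(data),
        "mean": statistics.fmean(data),
        # Assumption: population variance.
        "variance": statistics.pvariance(data),
    }


stats = tile_stats([[1.0, 2.0], [3.0, NODATA]])
```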
rf_tile_histogram
Struct[Array[Struct[Double, Long]]] rf_tile_histogram(Tile tile)
Computes a count of cell values within each row of tile
. The bins
array is of tuples of histogram values and counts. Typically values are plotted on the x-axis and counts on the y-axis. Resulting column has the below schema. Related is the rf_agg_approx_histogram
which computes the statistics across all rows in a group.
spark.sql("SELECT rf_tile_histogram(rf_make_ones_tile(5, 5, 'float32')) as tile_histogram").printSchema()
Aggregate Tile Statistics
These functions compute statistical summaries over all of the cell values and across all the rows in the DataFrame or group.
rf_agg_mean
Double rf_agg_mean(Tile tile)
SQL: rf_agg_stats
(tile).mean
Aggregates over the tile
and returns the mean of cell values, ignoring NoData. Equivalent to rf_agg_stats
.mean
.
rf_agg_data_cells
Long rf_agg_data_cells(Tile tile)
SQL: rf_agg_stats
(tile).data_cells
Aggregates over the tile
and returns the count of data cells. Equivalent to rf_agg_stats
.data_cells
.
rf_agg_no_data_cells
Long rf_agg_no_data_cells(Tile tile)
SQL: rf_agg_stats
(tile).no_data_cells
Aggregates over the tile
and returns the count of NoData cells. Equivalent to rf_agg_stats
.no_data_cells
. Compare rf_no_data_cells
, a row-wise count of NoData cells.
rf_agg_stats
Struct[Long, Long, Double, Double, Double, Double] rf_agg_stats(Tile tile)
Aggregates over the tile
and returns statistical summaries of cell values: number of data cells, number of NoData cells, minimum, maximum, mean, and variance. The minimum, maximum, mean, and variance ignore the presence of NoData.
rf_agg_approx_histogram
Struct[Array[Struct[Double, Long]]] rf_agg_approx_histogram(Tile tile)
Aggregates over all of the rows in a DataFrame of tile
and returns a count of each cell value to create a histogram, with values plotted on the x-axis and counts on the y-axis. Related is the rf_tile_histogram
function, which operates on a single row at a time.
rf_agg_approx_quantiles
Array[Double] rf_agg_approx_quantiles(Tile tile, List[float] probabilities, float relative_error)
Not supported in SQL.
Calculates the approximate quantiles of a tile column of a DataFrame. probabilities
is a list of float values at which to compute the quantiles. These must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum. Returns an array of values approximately at the specified probabilities
.
rf_agg_extent
Extent rf_agg_extent(Extent extent)
Compute the naive aggregate extent over a column. Assumes CRS homogeneity. With mixed CRS in the column, or if you are unsure, use rf_agg_reprojected_extent
.
rf_agg_reprojected_extent
Extent rf_agg_reprojected_extent(Extent extent, CRS source_crs, String dest_crs)
Compute the aggregate extent over the extent
and source_crs
columns. The dest_crs
is given as a string. Each row’s extent will be reprojected to the dest_crs
before aggregating.
Tile Local Aggregate Statistics
Local statistics compute the element-wise statistics across a DataFrame or group of tile
s, resulting in a tile
that has the same dimension.
When these functions encounter NoData in a cell location, it will be ignored.
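A cell-local aggregate can be pictured as stacking the tiles and reducing each cell position independently. The pure-Python sketch below does this for the mean, skipping NoData (None). Returning NoData where every input is NoData is an assumption of the sketch, and agg_local_mean is a hypothetical helper.

```python
NODATA = None  # stand-in for a NoData cell


def agg_local_mean(tiles):
    """Mean at each cell position across same-shaped tiles, ignoring NoData."""
    num_rows, num_cols = len(tiles[0]), len(tiles[0][0])
    result = []
    for r in range(num_rows):
        out_row = []
        for c in range(num_cols):
            data = [t[r][c] for t in tiles if t[r][c] is not NODATA]
            # Assumption: no data at this position in any tile -> NoData.
            out_row.append(sum(data) / len(data) if data else NODATA)
        result.append(out_row)
    return result


tiles = [
    [[1.0, NODATA, NODATA]],
    [[3.0, NODATA, NODATA]],
    [[NODATA, 4.0, NODATA]],
]
local_mean = agg_local_mean(tiles)
print(local_mean)  # [[2.0, 4.0, None]]
```

The output has the same shape as each input tile; only the number of rows in the DataFrame (here, three tiles) is reduced.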
rf_agg_local_max
Tile rf_agg_local_max(Tile tile)
Compute the cell-local maximum operation over tile
s in a column.
rf_agg_local_min
Tile rf_agg_local_min(Tile tile)
Compute the cell-local minimum operation over tile
s in a column.
rf_agg_local_mean
Tile rf_agg_local_mean(Tile tile)
Compute the cell-local mean operation over tile
s in a column.
rf_agg_local_data_cells
Tile rf_agg_local_data_cells(Tile tile)
Compute the cell-local count of data cells over tile
s in a column. Returned tile
has a cell type of int32
.
rf_agg_local_no_data_cells
Tile rf_agg_local_no_data_cells(Tile tile)
Compute the cell-local count of NoData cells over tile
s in a column. Returned tile
has a cell type of int32
.
rf_agg_local_stats
Struct[Tile, Tile, Tile, Tile, Tile] rf_agg_local_stats(Tile tile)
Compute cell-local aggregate count, minimum, maximum, mean, and variance for a column of tile
s. Returns a struct of five tile
s.
Converting Tiles
RasterFrames provides several ways to convert a tile
into other data structures. See also functions for creating tiles.
rf_explode_tiles
Int, Int, Numeric* rf_explode_tiles(Tile* tile)
Create a row for each cell in tile
columns. Many tile
columns can be passed in, and the returned DataFrame will have one numeric column per input. There will also be columns for column_index
and row_index
. Inverse of rf_assemble_tile
. When using this function, be sure to have a unique identifier for rows in order to successfully invert the operation.
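The round trip, and why a unique row identifier matters, can be sketched in plain Python. Each exploded record carries a tile identifier plus column and row indices, and the assembly step groups on that identifier. The explode and assemble helpers and the tuple layout are hypothetical, not the DataFrame API.

```python
def explode(tile_id, tile):
    """One (tile_id, column_index, row_index, value) record per cell."""
    return [
        (tile_id, col, row, value)
        for row, cells in enumerate(tile)
        for col, value in enumerate(cells)
    ]


def assemble(records, num_cols, num_rows):
    """Group records by tile_id and place each value at its indices."""
    tiles = {}
    for tile_id, col, row, value in records:
        grid = tiles.setdefault(
            tile_id, [[None] * num_cols for _ in range(num_rows)]
        )
        grid[row][col] = value
    return tiles


tiles = {"a": [[1, 2], [3, 4]], "b": [[5, 6], [7, 8]]}
records = [rec for tid, tile in tiles.items() for rec in explode(tid, tile)]
rebuilt = assemble(records, num_cols=2, num_rows=2)
print(rebuilt == tiles)  # True
```

Without the identifier, cells from tiles "a" and "b" that share the same indices could not be told apart when regrouping.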
rf_explode_tiles_sample
Int, Int, Numeric* rf_explode_tiles_sample(Double sample_frac, Long seed, Tile* tile)
Python only. As with rf_explode_tiles, but taking a randomly sampled subset of cells. Parameter sample_frac should be between 0.0 and 1.0.
rf_tile_to_array_int
Array rf_tile_to_array_int(Tile tile)
Convert Tile column to Spark SQL Array, in row-major order. Float cell types will be coerced to integral type by flooring.
rf_tile_to_array_double
Array rf_tile_to_array_double(Tile tile)
Convert tile column to Spark Array, in row-major order. Integral cell types will be coerced to floats.
rf_render_ascii
String rf_render_ascii(Tile tile)
Pretty print the tile values as plain text.
rf_render_matrix
String rf_render_matrix(Tile tile)
Render Tile cell values as a string of numeric values, for debugging purposes.
rf_render_png
Array rf_render_png(Tile red, Tile green, Tile blue)
Converts three tile columns to a three-channel PNG-encoded image bytearray
. First evaluates rf_rgb_composite
on the given tile columns, and then encodes the result. For more about rendering these in a Jupyter or IPython environment, see Writing Raster Data.
rf_render_color_ramp_png
Array rf_render_color_ramp_png(Tile tile, String color_ramp_name)
Converts given tile into a PNG image, using a color ramp of the given name to convert cells into pixels. color_ramp_name
can be one of the following:
- "BlueToOrange"
- "LightYellowToOrange"
- "BlueToRed"
- "GreenToRedOrange"
- "LightToDarkSunset"
- "LightToDarkGreen"
- "HeatmapYellowToRed"
- "HeatmapBlueToYellowToRedSpectrum"
- "HeatmapDarkRedToYellowWhite"
- "HeatmapLightPurpleToDarkPurpleToWhite"
- "ClassificationBoldLandUse"
- "ClassificationMutedTerrain"
- "Magma"
- "Inferno"
- "Plasma"
- "Viridis"
- "Greyscale2"
- "Greyscale8"
- "Greyscale32"
- "Greyscale64"
- "Greyscale128"
- "Greyscale256"
Further descriptions of these color ramps can be found in the GeoTrellis documentation. For more about rendering these in a Jupyter or IPython environment, see Writing Raster Data.
rf_agg_overview_raster
Tile rf_agg_overview_raster(Tile proj_raster_col, int cols, int rows, Extent aoi)
Tile rf_agg_overview_raster(Tile tile_col, int cols, int rows, Extent aoi, Extent tile_extent_col, CRS tile_crs_col)
Construct an overview tile of size cols
by rows
. Data is filtered to the specified aoi
which is given in Web Mercator. Uses the bilinear sampling method. The tile_extent_col
and tile_crs_col
arguments are optional if the first argument has its Extent and CRS embedded.
rf_rgb_composite
Tile rf_rgb_composite(Tile red, Tile green, Tile blue)
Merges three bands into a single byte-packed RGB composite. It first scales each cell to fit into an unsigned byte, in the range 0-255, and then merges all three channels to fit into a 32-bit unsigned integer. This is useful when you want an RGB tile to render or to process with other color imagery tools.