Writing Raster Data

RasterFrames is oriented toward large-scale analyses of spatial data. The primary output of such an analysis is typically a statistical summary, a machine learning model, or some other result that is much smaller than the input dataset.

However, there are times in any analysis where writing a representative sample of the work in progress provides valuable feedback on the current state of the process and results.

Tile Samples

RasterFrames provides convenience methods for quickly visualizing tiles when inspecting a subset of the data in a notebook (see the discussion of the RasterFrame schema for an orientation to the tile concept).

In an IPython or Jupyter interpreter, a Tile object will be displayed as an image with limited metadata.

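import pyrasterframes.rf_ipython  # enables rich display of tiles and DataFrames
from pyrasterframes.rasterfunctions import *
from pyrasterframes.utils import create_rf_spark_session

spark = create_rf_spark_session()

def scene(band):
    b = str(band).zfill(2)  # converts int 2 to '02'
    # The file name (which embeds b) is cut off in the source; '...' marks the gap.
    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
           '...'

spark_df = spark.read.raster(scene(2), tile_dimensions=(256, 256))
tile = spark_df.select(rf_tile('proj_raster').alias('tile')).first()['tile']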
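
As background for the image display described above, IPython can render any object that supplies PNG bytes through its rich display protocol, for example via a `_repr_png_` method. The following is a minimal, generic sketch of that mechanism using only the standard library. `DemoTile` and `encode_gray_png` are hypothetical names invented for illustration; this is not the RasterFrames `Tile` class, and how `rf_ipython` actually hooks into IPython is not shown in this section.

```python
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    body = kind + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack('>I', len(data)) + body + struct.pack('>I', crc)


def encode_gray_png(rows):
    """Encode rows of 0..255 integers as an 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b''.join(b'\x00' + bytes(row) for row in rows)
    return (b'\x89PNG\r\n\x1a\n'
            + _chunk(b'IHDR', ihdr)
            + _chunk(b'IDAT', zlib.compress(raw))
            + _chunk(b'IEND', b''))


class DemoTile:
    """Hypothetical stand-in for a tile; not the RasterFrames Tile class."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_png_(self):
        # IPython calls this hook, when present, to obtain PNG bytes to display.
        return encode_gray_png(self.rows)


demo = DemoTile([[0, 85], [170, 255]])
png = demo._repr_png_()
```

Evaluating `demo` as the last expression of a notebook cell would show a small 2 x 2 grayscale image instead of a text representation.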

DataFrame Samples

Within an IPython or Jupyter interpreter, Spark and pandas DataFrames that contain a column of tiles are rendered with the tile samples discussed above. Simply import the rf_ipython submodule to enable enhanced HTML rendering of these DataFrame types.

samples = spark_df \
    .select(
        rf_extent('proj_raster').alias('extent'),
        rf_tile('proj_raster').alias('tile'),
    ) \
    .select('tile', 'extent.*')
samples

Showing only top 5 rows.

tile     xmin                 ymin               xmax                 ymax
(image)  -7783653.637667      993342.4642358534  -7665045.582235853   1111950.519667
(image)  -7665045.582235853   993342.4642358534  -7546437.526804706   1111950.519667
(image)  -7546437.526804707   993342.4642358534  -7427829.47137356    1111950.519667
(image)  -7427829.47137356    993342.4642358534  -7309221.415942413   1111950.519667
(image)  -7309221.415942414   993342.4642358534  -7190613.360511267   1111950.519667
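
The extent columns can be sanity-checked with a little arithmetic. The sketch below derives the ground size of one cell from a tile's extent and confirms that neighboring tiles share an edge. It assumes that the rows shown are full 256 x 256 tiles, as requested by tile_dimensions; the helper names `cell_size` and `shares_edge` are hypothetical and not part of RasterFrames.

```python
# Extents copied from the first two sample rows: (xmin, ymin, xmax, ymax).
extents = [
    (-7783653.637667, 993342.4642358534, -7665045.582235853, 1111950.519667),
    (-7665045.582235853, 993342.4642358534, -7546437.526804706, 1111950.519667),
]

TILE_COLS, TILE_ROWS = 256, 256  # assumption: both rows are full-size tiles


def cell_size(extent, cols=TILE_COLS, rows=TILE_ROWS):
    """Return the (x, y) ground size of one cell in CRS units."""
    xmin, ymin, xmax, ymax = extent
    return (xmax - xmin) / cols, (ymax - ymin) / rows


def shares_edge(left, right, tolerance=1e-6):
    """True when the right edge of `left` coincides with the left edge of `right`."""
    return abs(left[2] - right[0]) <= tolerance


for extent in extents:
    print(cell_size(extent))
print(shares_edge(extents[0], extents[1]))
```

Both tiles come out at roughly 463.3 CRS units per cell in each direction, and the two extents meet along a shared edge.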

Rendering Samples with Color

By default, the IPython visualizations apply the Viridis color map to each single-channel tile. Other ways of applying color to the results are described below.
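
To make the idea of a color map concrete, the following sketch rescales cell values to the range 0..1 and looks each one up along a ramp. It is an illustration only: the five RGB stops are approximate anchor colors along Viridis rather than the full table, the linear rescaling is an assumption, and `normalize` and `viridis_like` are hypothetical helpers, not RasterFrames functions.

```python
# Approximate anchor colors along Viridis (dark purple to yellow).
# Assumption: five stops with linear interpolation, not the full color table.
VIRIDIS_STOPS = [
    (68, 1, 84),
    (59, 82, 139),
    (33, 145, 140),
    (94, 201, 98),
    (253, 231, 37),
]


def normalize(values):
    """Rescale a flat list of cell values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def viridis_like(t):
    """Map t in 0..1 to an RGB triple by interpolating between the stops."""
    t = min(max(t, 0.0), 1.0)
    position = t * (len(VIRIDIS_STOPS) - 1)
    index = min(int(position), len(VIRIDIS_STOPS) - 2)
    fraction = position - index
    start, end = VIRIDIS_STOPS[index], VIRIDIS_STOPS[index + 1]
    return tuple(round(s + (e - s) * fraction) for s, e in zip(start, end))


cells = [120, 480, 900, 1500, 3000]  # made-up cell values for illustration
colors = [viridis_like(t) for t in normalize(cells)]
```

The lowest cell value maps to the dark purple end of the ramp and the highest to the yellow end.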

Color Composites

Rendering three different bands of imagery together is called a color composite. The bands selected are mapped to the red, green, and blue channels of the resulting display. If the bands chosen are red, green, and blue, the composite is called a true-color composite. Otherwise it is a false-color composite.
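
The band-to-channel mapping can be sketched in plain Python. The example below stretches three small made-up bands to 0..255 and zips them into (R, G, B) pixels. Stretching each band independently is an assumption made here for simplicity, and `stretch` and `composite` are hypothetical helpers rather than RasterFrames functions.

```python
def stretch(band):
    """Linearly rescale a 2-D band (a list of rows) to integers in 0..255."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero for a constant band
    return [[round(255 * (v - lo) / span) for v in row] for row in band]


def composite(red, green, blue):
    """Combine three equally sized bands into rows of (R, G, B) pixels."""
    r, g, b = stretch(red), stretch(green), stretch(blue)
    return [list(zip(*rows)) for rows in zip(r, g, b)]


# Made-up 2 x 2 bands standing in for the red, green, and blue tiles.
red = [[100, 200], [300, 400]]
green = [[10, 20], [30, 40]]
blue = [[0, 0], [50, 100]]

pixels = composite(red, green, blue)
```

Passing the bands in a different order, or substituting a non-visible band for one of the channels, would produce a false-color composite from the same function.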

Using the rf_render_png function, we will compute a three-band PNG image as a bytearray. The resulting bytearray is displayed as an image in either a Spark or pandas DataFrame display if rf_ipython has been imported.

# Select red, green, and blue, respectively
composite_df = spark.read.raster([[scene(1), scene(4), scene(3)]],
                                 tile_dimensions=(256, 256))
composite_df = composite_df.withColumn('png',
                    rf_render_png('proj_raster_0', 'proj_raster_1', 'proj_raster_2'))

Showing only top 5 rows.
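
Because the png column holds ordinary PNG bytes, a collected value can also be written straight to disk for inspection outside the notebook. The sketch below checks the standard eight-byte PNG signature before writing. `save_png` and the output file name are hypothetical, and the commented usage assumes a running Spark session and the composite_df built above.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # the first eight bytes of every PNG file


def save_png(png_bytes, path):
    """Write PNG bytes to `path` after checking the PNG signature.

    Returns the number of bytes written.
    """
    data = bytes(png_bytes)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError('data does not start with the PNG signature')
    with open(path, 'wb') as f:
        return f.write(data)


# Hypothetical usage with the DataFrame built above (requires a Spark session):
# row = composite_df.select('png').first()
# save_png(row['png'], 'composite_sample.png')
```

Checking the signature first guards against accidentally writing a non-image column, such as a serialized tile, to a file with a .png extension.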