IPython/Jupyter Extensions

The pyrasterframes.rf_ipython module injects a number of visualization extensions into the IPython environment, enhancing visualization of DataFrames and Tiles.

By default, the result of the last expression in an IPython cell is passed to the IPython.display.display function. This function in turn uses IPython's DisplayFormatter, which looks up the formatting functions associated with the object's type and converts the instance into display-appropriate representations keyed by MIME type. For example, a type may provide a text/plain version for the IPython shell and a text/html version for a Jupyter Notebook.
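The following is a minimal sketch of this protocol. It is not the code rf_ipython uses. The hypothetical `Swatch` class supplies two representations: `__repr__` for text/plain and `_repr_html_` for text/html. IPython's formatters check for special methods such as `_repr_html_`. For a type you cannot modify, you can instead register a function with a formatter's `for_type` method. The hypothetical `register_html` helper wraps that call, and it only has an effect when running under IPython.

```python
class Swatch:
    """Hypothetical value type offering two MIME representations."""

    def __init__(self, color):
        self.color = color

    def __repr__(self):
        # Used for the text/plain representation, e.g. in the IPython shell
        return "Swatch({!r})".format(self.color)

    def _repr_html_(self):
        # Used for the text/html representation, e.g. in a Jupyter Notebook
        return '<div style="width:20px;height:20px;background:{}"></div>'.format(self.color)


def register_html(typ, func):
    """Hypothetical helper: register func as the text/html formatter for typ.

    Returns False when IPython is not installed or no IPython shell is running.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:
        return False
    # display_formatter.formatters is a dict of formatters keyed by MIME type
    ip.display_formatter.formatters["text/html"].for_type(typ, func)
    return True
```

In a notebook, evaluating `Swatch("red")` as the last expression of a cell would render the colored box, while the shell would print `Swatch('red')`.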

This will be our setup for the following examples:

from pyrasterframes import *
from pyrasterframes.rasterfunctions import *
from pyrasterframes.utils import create_rf_spark_session
import pyrasterframes.rf_ipython
from IPython.display import display
import os.path
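spark = create_rf_spark_session()  # SparkSession with RasterFrames enabled
def scene(band):
    """Return the URL of the MODIS MCD43A4 GeoTIFF for the given band number."""
    b = str(band).zfill(2) # converts int 2 to '02'
    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
             'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(b)
# Read band 2 as a RasterFrame of 256 x 256 tiles
rf = spark.read.raster(scene(2), tile_dimensions=(256, 256))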

Tile Samples

We have some convenience methods for quickly visualizing tiles (see the discussion of the RasterFrame schema for an orientation to the concept) when inspecting a subset of the data in a Notebook.

In an IPython or Jupyter interpreter, a Tile object will be displayed as an image with limited metadata.

sample_tile = rf.select(rf_tile('proj_raster').alias('tile')).first()['tile']
sample_tile # or `display(sample_tile)`
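rf_ipython's own rendering code is not shown here. The following is a rough, standard-library-only sketch of the general idea. The hypothetical `to_grayscale_png` and `tile_html` helpers stretch a small 2-D grid of cell values to 8-bit grayscale, encode the grid as a PNG, and embed it in an `<img>` tag as a base64 data URI. This is one way a text/html representation can carry an image. The sketch ignores NoData cells and color maps.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(kind, data):
    # Each PNG chunk is: length, type, data, then a CRC over type + data
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def to_grayscale_png(rows):
    """Encode a 2-D list of numbers as an 8-bit grayscale PNG (min..max -> 0..255)."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for constant tiles
    height, width = len(rows), len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        raw.extend(int(round((v - lo) * 255 / span)) for v in row)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE
            + _png_chunk(b"IHDR", ihdr)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))


def tile_html(rows):
    """Wrap the PNG in an <img> tag with a base64 data URI for text/html output."""
    data = base64.b64encode(to_grayscale_png(rows)).decode("ascii")
    return '<img src="data:image/png;base64,{}"/>'.format(data)
```

For example, `tile_html([[0, 1], [2, 3]])` yields a 2 x 2 image running from black to white.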

DataFrame Samples

Within an IPython or Jupyter interpreter, Spark and Pandas DataFrames containing a column of tiles will be rendered with each tile shown as an image sample, as discussed above. Simply import the rf_ipython submodule to enable enhanced HTML rendering of these DataFrame types.

rf # or `display(rf)`, or `rf.display()`

Showing only top 5 rows.

proj_raster_path | proj_raster
https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | [tile image]
https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | [tile image]
https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | [tile image]
https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | [tile image]
https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | [tile image]

Changing Number of Rows

By default, the RasterFrame sample display renders 5 rows. The IPython.display.display function doesn't pass parameters to the underlying rendering functions, so we need another way to pass parameters to the rendering code. Pandas' approach is to use global settings via set_option/get_option. We take a more functional approach and have the user invoke an explicit display method:

rf.display(num_rows=1, truncate=True)

Showing only top 1 rows.

proj_raster_path | proj_raster
https://modis-pds.s3.amazonaws.com/MCD43... | [tile image]
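The design trade-off can be sketched with a hypothetical `SampleTable` class, which is not part of pyrasterframes. IPython calls `__repr__` with no arguments, so it falls back to the defaults. The explicit `display` method instead accepts rendering parameters such as `num_rows` and `truncate` directly, rather than reading global settings.

```python
class SampleTable:
    """Hypothetical stand-in for a DataFrame that offers an explicit display() method."""

    def __init__(self, rows):
        self.rows = rows

    def render(self, num_rows=5, truncate=False, width=40):
        # Build a text sample of at most num_rows rows, optionally truncating long values
        lines = ["Showing only top {} rows.".format(num_rows)]
        for row in self.rows[:num_rows]:
            text = str(row)
            if truncate and len(text) > width:
                text = text[:width] + "..."
            lines.append(text)
        return "\n".join(lines)

    def __repr__(self):
        # IPython passes no arguments here, so only the defaults can apply
        return self.render()

    def display(self, num_rows=5, truncate=False):
        # Explicit entry point: the caller chooses the rendering parameters
        print(self.render(num_rows=num_rows, truncate=truncate))
```

Calling `SampleTable(urls).display(num_rows=1, truncate=True)` prints a single shortened row, mirroring the `rf.display(...)` call above, without touching any shared state.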

Pandas

The rf_ipython module also injects similar rendering support into Pandas, for Pandas DataFrames containing Tiles:

# Limit copy of data from Spark to a few tiles.
pandas_df = rf.select(rf_tile('proj_raster'), rf_extent('proj_raster')).limit(4).toPandas()
pandas_df # or `display(pandas_df)`
  | rf_tile(proj_raster) | rf_extent(proj_raster)
0 | [tile image] | (-7072005.3050801195, 993342.4642358534, -6953397.249648972, 1111950.519667)
1 | [tile image] | (-7546437.526804707, 163086.07621782666, -7427829.47137356, 281694.1316489733)
2 | [tile image] | (-6834789.194217826, 281694.1316489733, -6716181.138786679, 400302.18708011997)
3 | [tile image] |
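Pandas itself supports per-column formatting through the `formatters` argument of `DataFrame.to_html`, used together with `escape=False`. A standard-library-only sketch of the same idea uses a hypothetical `html_table` function. It renders a tile column through a custom formatter and escapes every other cell as plain text. It is an illustration only, not how rf_ipython is implemented.

```python
import html


def html_table(columns, rows, formatters=None):
    """Render rows as an HTML table.

    formatters maps a column name to a function that returns raw (trusted) HTML.
    """
    formatters = formatters or {}
    head = "".join("<th>{}</th>".format(html.escape(c)) for c in columns)
    body = []
    for row in rows:
        cells = []
        for col, value in zip(columns, row):
            fmt = formatters.get(col)
            # Formatted cells are emitted as-is; all other values are escaped text
            cells.append(fmt(value) if fmt else html.escape(str(value)))
        body.append("<tr>" + "".join("<td>{}</td>".format(c) for c in cells) + "</tr>")
    return "<table><tr>{}</tr>{}</table>".format(head, "".join(body))
```

Passing something like `{"rf_tile(proj_raster)": tile_to_img}` as `formatters` would turn each tile into an inline image, where `tile_to_img` is a hypothetical function returning an `<img>` tag.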