IPython/Jupyter Extensions
The pyrasterframes.rf_ipython module injects a number of visualization extensions into the IPython environment, enhancing visualization of DataFrames and Tiles.
By default, the result of the last expression in an IPython cell is passed to the IPython.display.display function. This function looks for a DisplayFormatter associated with the type, which converts the instance to a display-appropriate representation based on MIME type. For example, a DisplayFormatter may provide a text/plain version for the IPython shell and a text/html version for a Jupyter Notebook.
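To make the MIME-type idea concrete, here is a minimal sketch of one way a type can offer both representations: IPython falls back to `__repr__` for text/plain and looks for a `_repr_html_` method for text/html. The `Swatch` class is hypothetical and exists only for illustration; this is not necessarily how rf_ipython registers its own formatters.

```python
from html import escape


class Swatch:
    """Hypothetical example type; not part of pyrasterframes or IPython."""

    def __init__(self, name, hex_color):
        self.name = name
        self.hex_color = hex_color

    def __repr__(self):
        # Used for the text/plain representation (terminal IPython shell).
        return "Swatch(name={!r}, hex_color={!r})".format(self.name, self.hex_color)

    def _repr_html_(self):
        # Used for the text/html representation (Jupyter Notebook).
        return '<span style="background:{}">{}</span>'.format(
            escape(self.hex_color, quote=True), escape(self.name))


swatch = Swatch("ocean", "#1f77b4")
plain_text = repr(swatch)          # what a plain-text frontend would show
html_text = swatch._repr_html_()   # what an HTML-capable frontend would show
```

In a notebook, evaluating `swatch` as the last expression of a cell would show the HTML form, while the terminal shell would show the plain-text form.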
This will be our setup for the following examples:
from pyrasterframes import *
from pyrasterframes.rasterfunctions import *
from pyrasterframes.utils import create_rf_spark_session
import pyrasterframes.rf_ipython
from IPython.display import display
import os.path
spark = create_rf_spark_session()
def scene(band):
    b = str(band).zfill(2)  # converts int 2 to '02'
    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
           'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(b)
rf = spark.read.raster(scene(2), tile_dimensions=(256, 256))
Tile Samples
We have some convenience methods to quickly visualize tiles (see discussion of the RasterFrame schema for orientation to the concept) when inspecting a subset of the data in a Notebook.
In an IPython or Jupyter interpreter, a Tile object will be displayed as an image with limited metadata.
sample_tile = rf.select(rf_tile('proj_raster').alias('tile')).first()['tile']
sample_tile # or `display(sample_tile)`
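As a rough illustration of how an object can present itself as an image, the sketch below gives a hypothetical `ToyTile` class a `_repr_png_` method, the hook IPython checks for an image/png representation. The class, the `grayscale_png` helper, and the min-max scaling are assumptions made for this example; the actual Tile rendering in pyrasterframes may work quite differently.

```python
import struct
import zlib


def _png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def grayscale_png(cells):
    """Encode a 2D list of numbers as an 8-bit grayscale PNG, min-max scaled."""
    height, width = len(cells), len(cells[0])
    flat = [v for row in cells for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for constant tiles
    raw = bytearray()
    for row in cells:
        raw.append(0)  # filter type 0 (none) starts every scanline
        raw.extend(int(round(255 * (v - lo) / span)) for v in row)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), defaults.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _png_chunk(b"IHDR", header)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))


class ToyTile:
    """Hypothetical stand-in for a Tile; not the pyrasterframes class."""

    def __init__(self, cells):
        self.cells = cells

    def __repr__(self):
        return "ToyTile({}x{})".format(len(self.cells[0]), len(self.cells))

    def _repr_png_(self):
        # IPython calls this hook to obtain an image/png representation.
        return grayscale_png(self.cells)


tile = ToyTile([[0, 64], [128, 255]])
png_bytes = tile._repr_png_()
```

Evaluating `tile` in a notebook cell would then show the image, while a plain-text shell would fall back to the `__repr__` string.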
DataFrame Samples
Within an IPython or Jupyter interpreter, Spark and Pandas DataFrames containing a column of tiles will be rendered with the tile samples discussed above. Simply import the rf_ipython submodule to enable enhanced HTML rendering of these DataFrame types.
rf # or `display(rf)`, or `rf.display()`
Showing only top 5 rows.
Changing Number of Rows
By default, the RasterFrame sample display renders 5 rows. Because the IPython.display.display function doesn't pass parameters to the underlying rendering functions, we have to provide a different means of passing parameters to the rendering code. Pandas' approach is to use global settings via set_option/get_option. We take a more functional approach and have the user invoke an explicit display method:
rf.display(num_rows=1, truncate=True)
Showing only top 1 rows.
proj_raster_path | proj_raster |
---|---|
https://modis-pds.s3.amazonaws.com/MCD43... | (tile image) |
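The design choice described above can be sketched in a few lines. The automatic hook (`_repr_html_`) takes no arguments, so it can only use defaults, whereas an explicit method can accept rendering parameters. `SampleTable`, `to_html`, and `max_chars` are hypothetical names for this sketch and are not the pyrasterframes implementation; only the `num_rows` and `truncate` parameter names come from the call shown above.

```python
from html import escape


class SampleTable:
    """Hypothetical stand-in for a DataFrame; not a pyrasterframes class."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(row) for row in rows]

    def to_html(self, num_rows=5, truncate=False, max_chars=40):
        def cell(value):
            text = str(value)
            if truncate and len(text) > max_chars:
                text = text[:max_chars] + "..."
            return "<td>{}</td>".format(escape(text))

        header = "".join("<th>{}</th>".format(escape(c)) for c in self.columns)
        body = "".join(
            "<tr>{}</tr>".format("".join(cell(v) for v in row))
            for row in self.rows[:num_rows]
        )
        note = ""
        if len(self.rows) > num_rows:
            note = "<p>Showing only top {} rows.</p>".format(num_rows)
        return "{}<table><tr>{}</tr>{}</table>".format(note, header, body)

    def _repr_html_(self):
        # The automatic hook receives no arguments, so defaults apply.
        return self.to_html()

    def display(self, num_rows=5, truncate=False):
        # Imported lazily so the class can be defined without IPython installed.
        from IPython.display import HTML, display
        display(HTML(self.to_html(num_rows=num_rows, truncate=truncate)))


table = SampleTable(["path"], [("x" * 50,), ("short",)])
one_row = table.to_html(num_rows=1, truncate=True)  # explicit parameters
default_html = table._repr_html_()                  # defaults only
```

Compared with global options, the explicit method keeps each call self-describing and avoids state that silently changes how later cells render.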
Pandas
The rf_ipython module injects similar rendering support into Pandas, for Pandas DataFrames that contain Tiles:
# Limit copy of data from Spark to a few tiles.
pandas_df = rf.select(rf_tile('proj_raster'), rf_extent('proj_raster')).limit(4).toPandas()
pandas_df # or `display(pandas_df)`
 | rf_tile(proj_raster) | rf_extent(proj_raster) |
---|---|---|
0 | (tile image) | (-7072005.3050801195, 993342.4642358534, -6953397.249648972, 1111950.519667) |
1 | (tile image) | (-7546437.526804707, 163086.07621782666, -7427829.47137356, 281694.1316489733) |
2 | (tile image) | (-6834789.194217826, 281694.1316489733, -6716181.138786679, 400302.18708011997) |
3 |