Getting Started
If you are new to Earth-observing imagery, you might consider looking at the Concepts section first.
RasterFrames® is a geospatial raster processing library for Python, Scala and SQL, available through several mechanisms.
The simplest way to get started with RasterFrames is via the Docker image or from the Python shell. To get started with the Python shell you will need:
- Python installed. Version 3.6 or greater is recommended.
- pip installed. If you are using Python 3, pip may already be installed.
- Java JDK 8 installed on your system, and either java on your system PATH or JAVA_HOME pointing to a Java installation.
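If you are unsure whether these prerequisites are in place, the following standard-library sketch reports on each of them. It is not part of pyrasterframes, the function name check_prerequisites is hypothetical, and it only checks that java can be found, not that it is JDK 8.

```python
import importlib.util
import os
import shutil
import sys


def check_prerequisites():
    """Return a dict saying whether each Python-shell prerequisite looks satisfied."""
    java_home = os.environ.get("JAVA_HOME")
    return {
        # Python 3.6 or greater is recommended.
        "python": sys.version_info >= (3, 6),
        # pip counts as installed if this interpreter can import it.
        "pip": importlib.util.find_spec("pip") is not None,
        # java must be on PATH, or JAVA_HOME must point at an existing directory.
        # Note: this does not check that the Java version is JDK 8.
        "java": shutil.which("java") is not None
        or (java_home is not None and os.path.isdir(java_home)),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'NOT OK'}")
```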
pip install pyrasterframes
$ python3 -m pip install pyrasterframes
Then, in a Python interpreter of your choice, you can get a pyspark SparkSession using the local[*] master.
import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session
spark = create_rf_spark_session()
Then, you can read a raster and work with it in a Spark DataFrame.
from pyrasterframes.rasterfunctions import *
from pyspark.sql.functions import lit
# Read a MODIS surface reflectance granule
df = spark.read.raster('https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF')
# Add 3 element-wise, show some rows of the DataFrame
sample = df.withColumn('added', rf_local_add(df.proj_raster, lit(3))) \
    .select(rf_crs('added'), rf_extent('added'), rf_tile('added'))
sample
Showing only top 5 rows.
rf_crs(added) | rf_extent(added) | rf_tile(added) |
---|---|---|
[+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ] | [-7072005.3050801195, 993342.4642358534, -6953397.249648972, 1111950.519667] | |
[+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ] | [-7546437.526804707, 163086.07621782666, -7427829.47137356, 281694.1316489733] | |
[+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ] | [-6834789.194217826, 281694.1316489733, -6716181.138786679, 400302.18708011997] | |
[+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ] | [-7427829.47137356, 163086.07621782666, -7309221.415942413, 281694.1316489733] | |
[+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ] | [-6953397.249648973, 874734.4088047068, -6834789.194217826, 993342.4642358534] | |
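The extents in this output are enough to work out the tile geometry. The sketch below takes the first two extents from the table, computes each tile's width and height in metres, and divides by 256 to get a cell size, assuming each tile is 256 by 256 cells. Under that assumption the cell size comes out at roughly 463.3 m, consistent with the nominal 500 m grid of the MCD43A4 product.

```python
# First two extents from the table above, as (xmin, ymin, xmax, ymax)
# in the sinusoidal CRS, whose units are metres (+units=m).
extents = [
    (-7072005.3050801195, 993342.4642358534, -6953397.249648972, 1111950.519667),
    (-7546437.526804707, 163086.07621782666, -7427829.47137356, 281694.1316489733),
]

# Assumption: every tile in this output is 256 x 256 cells.
TILE_CELLS = 256


def tile_geometry(extent, tile_cells=TILE_CELLS):
    """Return (width, height, cell_size) in metres for one tile extent."""
    xmin, ymin, xmax, ymax = extent
    width, height = xmax - xmin, ymax - ymin
    return width, height, width / tile_cells


for extent in extents:
    width, height, cell = tile_geometry(extent)
    print(f"{width:.1f} m x {height:.1f} m -> {cell:.4f} m per cell")
```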
This example is extended in the getting started Jupyter notebook.
Next Steps
To understand more about how and why RasterFrames represents Earth observation in DataFrames, read about the core concepts and the project description. For more hands-on examples, see the chapters about reading and processing with RasterFrames.
Other Options
You can also use RasterFrames in the following environments:
- Jupyter Notebook
- pyspark shell
Using Jupyter Notebook
RasterFrames provides a Docker image for a Jupyter notebook server whose default kernel is already set up for running RasterFrames. To use it:
- Install Docker
- Pull the image:
docker pull s22s/rasterframes-notebook
- Run a container with the image, for example:
docker run -p 8808:8888 -p 44040:4040 -v /path/to/notebooks:/home/jovyan/work s22s/rasterframes-notebook:latest
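- In a browser, open http://localhost:8808, the host port that the example above maps to the notebook server's port 8888. The second mapping exposes the Spark UI (port 4040 in the container) on host port 44040.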
See the RasterFrames Notebook README for instructions on building the Docker image for this Jupyter notebook server.
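If the browser cannot connect, a quick standard-library check can tell you whether anything is listening on the mapped host ports. This is a sketch; port_is_open is a hypothetical helper, and the ports are taken from the docker run example above. Port 44040 only answers while a SparkSession in the notebook is serving the Spark UI.

```python
import socket


def port_is_open(host="localhost", port=8808, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Host ports from the docker run example: 8808 -> Jupyter, 44040 -> Spark UI.
    for port in (8808, 44040):
        print(port, "open" if port_is_open(port=port) else "not reachable")
```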
Using pyspark shell
You can use RasterFrames in a pyspark shell. To set up the pyspark environment, prepare your call with the appropriate --master and other --conf arguments for your cluster manager and environment. For RasterFrames support, you need to pass arguments pointing to the various Java dependencies. You will also need the Python source zip, even if you have installed the package with pip. You can download the source zip here, replacing ${VERSION} with the RasterFrames version you are using: https://repo1.maven.org/maven2/org/locationtech/rasterframes/pyrasterframes_2.11/${VERSION}/pyrasterframes_2.11-${VERSION}-python.zip.
The pyspark shell command will look something like this:
# The spark.serializer and spark.kryo* settings improve serialization performance.
pyspark \
  --master "local[*]" \
  --py-files pyrasterframes_2.11-${VERSION}-python.zip \
  --packages org.locationtech.rasterframes:rasterframes_2.11:${VERSION},org.locationtech.rasterframes:pyrasterframes_2.11:${VERSION},org.locationtech.rasterframes:rasterframes-datasource_2.11:${VERSION} \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator \
  --conf spark.kryoserializer.buffer.max=500m
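If you launch pyspark from a script, it can help to assemble these arguments programmatically so the version only has to be set once. The sketch below rebuilds the command above from a RasterFrames version string; the helper names are hypothetical, and the Scala suffix 2.11 is taken from the artifact names in the command. It does not download anything or start Spark.

```python
import shlex

SCALA = "2.11"  # Scala suffix used by the artifact names in the command above
GROUP = "org.locationtech.rasterframes"

# Serialization settings passed with --conf in the command above.
RF_SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.locationtech.rasterframes.util.RFKryoRegistrator",
    "spark.kryoserializer.buffer.max": "500m",
}


def rf_packages(version):
    """Comma-separated Maven coordinates for --packages."""
    artifacts = ("rasterframes", "pyrasterframes", "rasterframes-datasource")
    return ",".join(f"{GROUP}:{name}_{SCALA}:{version}" for name in artifacts)


def rf_python_zip_url(version):
    """Download URL of the Python source zip used with --py-files."""
    return ("https://repo1.maven.org/maven2/org/locationtech/rasterframes/"
            f"pyrasterframes_{SCALA}/{version}/"
            f"pyrasterframes_{SCALA}-{version}-python.zip")


def pyspark_args(version, master="local[*]"):
    """Argument list equivalent to the pyspark command above."""
    args = ["pyspark", "--master", master,
            "--py-files", f"pyrasterframes_{SCALA}-{version}-python.zip",
            "--packages", rf_packages(version)]
    for key, value in RF_SPARK_CONF.items():
        args += ["--conf", f"{key}={value}"]
    return args


def shell_command(version, master="local[*]"):
    """The same command as one shell-quoted string ('local[*]' gets quoted)."""
    return " ".join(shlex.quote(arg) for arg in pyspark_args(version, master))
```

Call shell_command with the RasterFrames version you are using to get a command you can paste into a terminal. When you create a SparkSession in code instead, the same coordinates can go into SparkSession.builder.config("spark.jars.packages", rf_packages(version)), since spark.jars.packages is Spark's configuration equivalent of --packages.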
Then, in the pyspark shell, import the module and call withRasterFrames on the SparkSession.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/
Using Python version 3.7.3 (default, Mar 27 2019 15:43:19)
SparkSession available as 'spark'.
>>> import pyrasterframes
>>> spark = spark.withRasterFrames()
>>> df = spark.read.raster('https://landsat-pds.s3.amazonaws.com/c1/L8/158/072/LC08_L1TP_158072_20180515_20180604_01_T1/LC08_L1TP_158072_20180515_20180604_01_T1_B5.TIF')
Now you have the configured SparkSession with RasterFrames enabled.
Scala Development
There is first-class support for Scala in RasterFrames. See the Scala and SQL page for an example application, and the Scala API Documentation for function details.
If you would like to use RasterFrames in Scala, you’ll need to add the following resolvers and dependencies to your sbt project:
resolvers ++= Seq(
"Azavea Public Builds" at "https://dl.bintray.com/azavea/geotrellis",
"locationtech-releases" at "https://repo.locationtech.org/content/groups/releases"
)
libraryDependencies ++= Seq(
  "org.locationtech.rasterframes" %% "rasterframes" % "${VERSION}",
  "org.locationtech.rasterframes" %% "rasterframes-datasource" % "${VERSION}",
  // This is optional. Provides access to AWS PDS catalogs.
  "org.locationtech.rasterframes" %% "rasterframes-experimental" % "${VERSION}"
)
RasterFrames is compatible with Spark 2.4.x.
Installing GDAL Support
GDAL provides a wide variety of drivers to read data from many different raster formats. If GDAL is installed in the environment, RasterFrames will be able to read those formats. If you are using the Jupyter Notebook image, GDAL is already installed for you. Otherwise, follow the instructions below. GDAL version 2.4.1 or greater is required.
Installing on macOS
Using Homebrew:
brew install gdal
Installing on Linux
Using apt-get:
sudo apt-get update
sudo apt-get install gdal-bin
Testing for GDAL
gdalinfo --formats
To support GeoTIFF and JPEG2000 formats, look for the following drivers in the output of the command above:
GTiff -raster- (rw+vs): GeoTIFF
JPEG2000 -raster,vector- (rwv): JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library
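To run this check from Python instead of reading the output by eye, the sketch below calls gdalinfo --formats and looks for the two drivers listed above. The helper names are hypothetical, and the parser assumes driver lines have the "NAME -type- (flags): description" shape shown above. If your GDAL build reads JPEG 2000 through a differently named driver, adjust REQUIRED_DRIVERS.

```python
import shutil
import subprocess

# Drivers named in the gdalinfo output above.
REQUIRED_DRIVERS = ("GTiff", "JPEG2000")


def parse_driver_names(formats_output):
    """Collect driver short names from `gdalinfo --formats` output.

    Driver lines look like 'GTiff -raster- (rw+vs): GeoTIFF', so the short name
    is the first token of every line containing '):'.
    """
    names = set()
    for line in formats_output.splitlines():
        if "):" in line:
            names.add(line.split()[0])
    return names


def missing_drivers(required=REQUIRED_DRIVERS):
    """Return the required drivers that gdalinfo does not list.

    If gdalinfo is not on PATH, every required driver is reported missing.
    """
    if shutil.which("gdalinfo") is None:
        return list(required)
    result = subprocess.run(["gdalinfo", "--formats"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    found = parse_driver_names(result.stdout)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    missing = missing_drivers()
    print("all required drivers found" if not missing else f"missing: {missing}")
```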
Do the following to see if RasterFrames was able to find GDAL:
from pyrasterframes.utils import gdal_version
print(gdal_version())
This will print out something like “GDAL x.y.z, released 20yy/mm/dd”. If it reports “not available”, then GDAL is not installed in a place where the RasterFrames runtime was able to find it. Please file an issue to get help resolving this problem.
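To turn that output into a pass/fail check against the 2.4.1 minimum, a sketch like the following parses the version number. It assumes gdal_version() returns text in the format described above; gdal_meets_minimum is a hypothetical helper, not part of pyrasterframes.

```python
import re

MIN_GDAL = (2, 4, 1)  # minimum GDAL version stated in the section above


def gdal_meets_minimum(version_text, minimum=MIN_GDAL):
    """Return True if text like 'GDAL x.y.z, released ...' reports version >= minimum.

    Text without a version number (for example a 'not available' report)
    returns False.
    """
    match = re.search(r"GDAL (\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return False
    # Compare (major, minor, patch) tuples element by element.
    return tuple(int(part) for part in match.groups()) >= minimum


# Usage, assuming pyrasterframes is installed and gdal_version() returns
# text in the format described above:
# from pyrasterframes.utils import gdal_version
# print(gdal_meets_minimum(str(gdal_version())))
```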