Linfiniti Geo Blog

GIS for Open Source People

Browsing Posts published in September, 2009

So you have moved your spatial data into PostGIS and everything is great….until you start wanting to clip it out again. Luckily clipping data within the postgres environment is trivial with the little help of SQL. However I wanted something a little more fancy – a way to iterate through all my spatial tables and clip out an arbitary polygon, saving the result to a shapefile. To achieve this I wrote a little bash script that does just that. The script will also optionally make sql dumps of the data and clean up after itself when it is done.

The SUFFIX option is used to append to the newly created shapefile / table / sqldump file.
The CLIPTABLE is the table containing polygons that should be used to intersect each table.
The CLIPFIELD is a field used to subset the cliptable e.g. to select a single polygon only.
The CLIPGEOMFIELD is the name of the geometry field in the CLIPTABLE.
The CLIPFIELDVALUE is the clipfield ‘where’ clause e.g. “Eastern Cape”
The DB is the database containing tables to be intersected.

If you set DELETETABLE to 0, it will create clip tables in the database with the suffix you specify.
If you set DELETEGEOM to 1 it will delete the original geometry, leaving only the intersected geometry.
If you set SHPDUMP to 1 it iwll create a shapefile for each intersected table.
If you set SQLDUMP to 1 it will make a sql dump file of each intersected table.

Note that for large dataset, clipping with an arbitary polygon can take a long time! Here is the script:

#!/bin/bash
SUFFIX="ecape"
CLIPTABLE="ecape"
CLIPFIELD="provname"
CLIPGEOMFIELD="the_geom"
CLIPFIELDVALUE="Eastern Cape"
DB="cdsm50k"
DELETETABLE=1
DELETEGEOM=1
SHPDUMP=1
SQLDUMP=0
pushd .
cd /tmp

for TABLE in `echo "\dt" | psql $DB \
    | awk '{print $3}' | grep "^[a-zA-Z]"`

do
  GEOMFIELD=`echo "\d $TABLE" | psql $DB | grep "geometry" \
  | grep -v "geometrytype" |awk '{print $1}'`
  if [ "$GEOMFIELD" == "" ]
  then
    echo "$TABLE has no geometry column, skipping"
  else
    echo "$TABLE -> $GEOMFIELD"
    echo "drop table \"${TABLE}_${SUFFIX}\";" | psql $DB
    # Note we use the && bounding box query first and
    # then the intersects query to avoid unneeded comparison of
    # complex geometries.
    SQL="CREATE TABLE \"${TABLE}_${SUFFIX}\" AS \
            SELECT \
            ST_Intersection(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AS intersection_geom, \
            v.*, \
            m.$CLIPFIELD \
            FROM \
              \"$TABLE\" v, \
              $CLIPTABLE m \
            WHERE \
              ST_Intersects(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AND \
              $CLIPFIELD='$CLIPFIELDVALUE';"

    echo $SQL  | psql $DB

    if [ $DELETEGEOM -eq 1 ]
    then
      echo "alter table \"${TABLE}_${SUFFIX}\" drop column $GEOMFIELD;" | psql $DB
    fi

    if [ $SHPDUMP -eq 1 ]
    then
      pgsql2shp -f ${TABLE}.shp -g "intersection_geom" $DB ${TABLE}_${SUFFIX}
    fi

    if [ $SQLDUMP -eq 1 ]
    then
      pg_dump -D $DB -t ${TABLE}_${SUFFIX} > ${TABLE}_${SUFFIX}.sql
    fi  

    if [ $DELETETABLE -eq 1 ]
    then
      echo "drop table ${TABLE}_${SUFFIX};" | psql $DB
    fi
  fi
done
echo "vacuum analyze;" | psql $DB
popd

Introduction

PyWPS is a great project by Jachym Cepicky and Intevation to provide an open (Open Source and Open Standards) implementation of the OGC Web Processing Service spec. In this article, I will give a quick run through of getting up and running enough to have the obligatory “Hello World” service running.

Check out and install PyWps

The source code for PyWPS can be checked out here (I’m using trunk here which is perhaps not a good idea in a production environment):

svn co https://svn.wald.intevation.org/svn/pywps/trunk pywps-trunk

Also we need to install a couple of dependencies:

sudo apt-get install libapache2-mod-python python-htmltmpl python-magic

Update 6 Jan 2009: python-magic package required as of pywps V3 r878

Right now we can go on to build and install pywps:

sudo python setup.py install

Next we copy or symlink the pywps into your cgi-bin directory:

cd /usr/lib/cgi-bin/
sudo ln -s /usr/local/bin/wps.py .

Apache Configuration

Now you need to configure your apache to set up the correct environment for PyWPS:

sudo vim /etc/apache2/sites-enabled/000-default

For clarity I have added comments to the apache virtual host configuration. Note that on your system you will need to adjust some of the paths shown here to suite.

  ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
  <Directory "/usr/lib/cgi-bin">
    #Next two lines added by Tim for PyWPS
    SetEnv PYWPS_CFG /etc/pywps.cfg
    SetEnv PYWPS_PROCESSES /home/timlinux/dev/python/wps-processes/clip
    PythonPath "['/home/timlinux/dev/python/',
                 '/home/timlinux/dev/python/wps-processes/clip']
                 + sys.path"
    AllowOverride None
    #Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    #changed from above for pywps
    Options +ExecCGI -MultiViews +FollowSymLinks
    Order allow,deny
    Allow from all
  </Directory>
  #Alias and dir below added for pywps
  Alias /wps_outputs/ "/tmp/wps_outputs"
  <Directory "/tmp/wps_outputs/">
      Options Indexes MultiViews FollowSymLinks
      AllowOverride None
  </Directory>

The important part here is that you are setting your PYWPS_CFG path and you python path, and relaxing the cgi-bin permissions a little to allow following symlinks.

PyWps Settings Configuration

As you may have noticed above, pywps uses a file in /etc/pywps.cfg to store all of its global settings. Adapting the settings I have used below should get you started quickly:

[wps]
encoding=utf-8
title=PyWPS Development Server
version=1.0.0
abstract=Dev version of PyWPS. See http://pywps.wald.intevation.org
fees=None
constraints=none
serveraddress=http://localhost/cgi-bin/wps.py
keywords=GRASS,GIS,WPS
lang=eng,ger

[provider]
providerName=Linfiniti Consulting CC.
individualName=Tim Sutton
positionName=GIS Consultant
role=Developer
deliveryPoint=Lanseria
city=Johannesburg
postalCode=000 00
country=za
electronicMailAddress=tim@linfiniti.com
providerSite=http://linfiniti.com
phoneVoice=False
phoneFacsimile=False
administrativeArea=False

[server]
maxoperations=3
maxinputparamlength=1024
maxfilesize=3mb
tempPath=/tmp
# Dont specify this if you have set it up in the apache config
#processesPath=
outputUrl=http://linfiniti.com/wps/wps_outputs
outputPath=/tmp/wps_outputs
debug=true

[grass]
path=/usr/lib/grass/bin/:/usr/lib/grass/scripts/
addonPath=
version=6.2.1
gui=text
gisbase=/usr/lib/grass/
ldLibraryPath=/usr/lib/grass/lib
gisdbase=/home/user/grassdata
#home=/var/www/

Note that the grass section is present, although the use of grass within pywps is entirely optional.

Implementing your own processes module

The last thing we need to do before being able to see pywps in action is to write our processess module. The processes module is basically just a directory filled with python classes, each of which can expose a service endpoint.

First lets set up some structure:

mkdir -p /home/timlinux/dev/python/wps-processes
cd /home/timlinux/dev/python/wps-processes

This is our top level processes directory. Under that we will now create a module:

mkdir -p clip
cd clip

I called mine ‘clip’ because after we are done, I plan to deploy an image and vector clipping service so I am planning for the future.

echo '__all__ = ["helloWorld"]' > __init__.py

Note that if you want to deploy more than one process, add the additional ones (separated by commas) after the “helloWorld” process.

Next create your process class (mine is unsurprisingly called helloWorld.py):

from pywps.Process.Process import WPSProcess
from types import *

class Process(WPSProcess):
  """Main process class"""
  def __init__(self):
    """Process initialization"""

    # init process
    WPSProcess.__init__(self,
      identifier = "helloWorld",
      title="Hello World",
      version = "0.1",
      storeSupported = "false",
      statusSupported = "false",
      abstract="Pass me a message, I'll pass one back to you",
      grassLocation = False)

    # process inputs
    #simple input
    self.message = self.addLiteralInput(identifier = "message",
      title = "Your message",
      type = StringType)

    # literal output
    self.textOut = self.addLiteralOutput(identifier="text",
        title="Just some literal output",
        type = StringType)

  def execute(self):
    """Execute process."""
    myReply = "Well hello there : " + str(self.message.value)
    self.textOut.setValue(myReply)

This is probably one of the most simple process implementations you could write. It takes a single parameter and returns a string.

Testing

Make sure to reload your apache instance before testing e.g.:

sudo /etc/init.d/apache reload

Now you can test.I had to chop these into two lines to fit into my blog page, but you should use a single line url for the purpose:


http://localhost/cgi-bin/wps.py?Service=WPS&

Version=1.0.0&Request=GetCapabilities

The above performs a basic capabilities test for your server. You should get a response that looks something like:


<?xml version="1.0" encoding="utf-8"?>
<wps:Capabilities service="WPS" version="1.0.0" xml:lang="eng"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:wps="http://www.opengis.net/wps/1.0.0"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://schemas.opengis.net/wps/1.0.0/wpsGetCapabilities_response.xsd"
updateSequence="1">
<ows:ServiceIdentification>
<ows:Title>PyWPS Development Server</ows:Title>
<ows:Abstract>Development version of PyWPS. See http://pywps.wald.intevation.org</ows:Abstract>
<ows:Keywords>
<ows:Keyword>GRASS</ows:Keyword>
<ows:Keyword>GIS</ows:Keyword>
<ows:Keyword>WPS</ows:Keyword>
</ows:Keywords>
<ows:ServiceType>WPS</ows:ServiceType>
<ows:ServiceTypeVersion>1.0.0</ows:ServiceTypeVersion>
<ows:Fees>None</ows:Fees>
<ows:AccessConstraints>none</ows:AccessConstraints>
</ows:ServiceIdentification>
<ows:ServiceProvider>
<ows:ProviderName>Linfiniti Consulting CC, South Africa</ows:ProviderName>
<ows:ProviderSite xlink:href="http://linfiniti.com"/>
<ows:ServiceContact>
<ows:IndividualName>Tim Sutton</ows:IndividualName>
<ows:PositionName>GIS Consultant</ows:PositionName>
<ows:ContactInfo>
<ows:Address>
<ows:DeliveryPoint>Lanseria</ows:DeliveryPoint>
<ows:City>Johannesburg</ows:City>
<ows:PostalCode>000 00</ows:PostalCode>
<ows:Country>za</ows:Country>
<ows:ElectronicMailAddress>tim@linfiniti.com</ows:ElectronicMailAddress>
</ows:Address>
</ows:ContactInfo>
</ows:ServiceContact>
</ows:ServiceProvider>
<ows:OperationsMetadata>
<ows:Operation name="GetCapabilities">
<ows:DCP>
<ows:HTTP>
<ows:Get xlink:href="http://localhost/cgi-bin/wps.py?"/>
<ows:Post xlink:href="http://localhost/cgi-bin/wps.py"/>
</ows:HTTP>
</ows:DCP>
</ows:Operation>
<ows:Operation name="DescribeProcess">
<ows:DCP>
<ows:HTTP>
<ows:Get xlink:href="http://localhost/cgi-bin/wps.py?"/>
<ows:Post xlink:href="http://localhost/cgi-bin/wps.py"/>
</ows:HTTP>
</ows:DCP>
</ows:Operation>
<ows:Operation name="Execute">
<ows:DCP>
<ows:HTTP>
<ows:Get xlink:href="http://localhost/cgi-bin/wps.py?"/>
<ows:Post xlink:href="http://localhost/cgi-bin/wps.py"/>
</ows:HTTP>
</ows:DCP>
</ows:Operation>
</ows:OperationsMetadata>
<wps:ProcessOfferings>
<wps:Process wps:processVersion="0.1">
<ows:Identifier>helloWorld</ows:Identifier>
<ows:Title>Hello World</ows:Title>
<ows:Abstract>Pass me a message, I'll pass one back to you</ows:Abstract>
</wps:Process>
</wps:ProcessOfferings>
<wps:Languages>
<wps:Default>
<ows:Language>eng</ows:Language>
</wps:Default>
<wps:Supported>
<ows:Language>eng</ows:Language>
<ows:Language>ger</ows:Language>
</wps:Supported>
</wps:Languages>
<wps:WSDL xlink:href="http://localhost/cgi-bin/wps.py?WSDL"/>
</wps:Capabilities>


http://localhost/cgi-bin/wps.py?Service=WPS&request=DescribeProcess

&version=1.0.0&identifier=helloWorld

This will result in something like this which describes our hello world service:


<?xml version="1.0" encoding="utf-8"?>
<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://schemas.opengis.net/wps/1.0.0/wpsDescribeProcess_response.xsd"
service="WPS" version="1.0.0" xml:lang="eng">
<ProcessDescription wps:processVersion="0.1" storeSupported="False" statusSupported="False">
<ows:Identifier>helloWorld</ows:Identifier>
<ows:Title>Hello World</ows:Title>
<ows:Abstract>Pass me a message, I'll pass one back to you</ows:Abstract>
<DataInputs>
<Input minOccurs="1" maxOccurs="1">
<ows:Identifier>message</ows:Identifier>
<ows:Title>Your message</ows:Title>
<LiteralData>
<ows:DataType
ows:reference="http://www.w3.org/TR/xmlschema-2/#string">string</ows:DataType>
<ows:AnyValue />
</LiteralData>
</Input>
</DataInputs>
<ProcessOutputs>
<Output>
<ows:Identifier>text</ows:Identifier>
<ows:Title>Just some literal output</ows:Title>
<LiteralOutput>
<ows:DataType
ows:reference="http://www.w3.org/TR/xmlschema-2/#string">string</ows:DataType>
</LiteralOutput>
</Output>
</ProcessOutputs>
</ProcessDescription>
</wps:ProcessDescriptions>

From the process description we can see that our service takes a single text input and returns a single text output.

Finally we can try to invoke the service itself:


http://localhost/cgi-bin/wps.py?Service=WPS&request=execute&

version=1.0.0&identifier=helloWorld&DataInputs=[message=Tim]

Which returns:


<?xml version="1.0" encoding="utf-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://schemas.opengis.net/wps/1.0.0/wpsGetCapabilities_response.xsd"
service="WPS" version="1.0.0" xml:lang="eng"
serviceInstance="http://localhost/cgi-bin/wps.py?service=WPS&request=GetCapabilities&version=1.0.0"
statusLocation="http://localhost/wps/wps_outputs/pywps-125414601825.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>helloWorld</ows:Identifier>
<ows:Title>Hello World</ows:Title>
<ows:Abstract>Pass me a message, I'll pass one back to you</ows:Abstract>
</wps:Process>
<wps:Status creationTime="Mon Sep 28 15:53:38 2009">
<wps:ProcessSucceeded>PyWPS Process helloWorld successfully
calculated</wps:ProcessSucceeded>
</wps:Status>
<wps:ProcessOutputs>
<wps:Output>
<ows:Identifier>text</ows:Identifier>
<ows:Title>Just some literal output</ows:Title>
<wps:Data>
<wps:LiteralData dataType="string">
Well hello there : Tim
</wps:LiteralData>
</wps:Data>
</wps:Output>
</wps:ProcessOutputs>
</wps:ExecuteResponse>

You can see in the above output that the string I passed the service was returned to me in the form of a nice greeting.

Conclusion

In the course of an hour or two, you can have a simple PyWPS service running. The creation of new services should be trivial from this point on. I will no doubt post a follow up showing PyWPS doing something more substantial in the future. If you are thirsty for more, do check out the intevation PyWPS documentation pages.

If you are a keen GDAL user, you have probably noticed by now that you can extract a rectangular portion of an image into a new image. Sometimes you want to go a little further and actually extract only the cells within a polygonal area from a shapefile or similar vector file. In this article I will show you how.

Overview

There are basically four steps to follow:

- create a shpfile containing only the features you wish to use as your clip mask
- determine the extent of your clipmask
- perform a rectangular clip on the image
- perform a polygonal clip on the image

Heres a little preview of what we will be able to produce:

Hillshade clipped to a shapefile

A quick diversion to add support for MrSid

If my previous article on mosaicking I illustrated how GDAL can can be built with ECW support. Before I get started with my clipping, I am going to add MrSid support since some of the rasters I plan to clip are in MrSid format. To do this you need to download the MrSid geo express sdk from their website (requires registration). Once you have it downloaded, unpack it into /usr/local/GeoExpressSDK and then follow the gdal build guide from my mosaicking article while adding a couple of options to the configure line:

./configure --with-libtiff=internal --with-geotiff=internal \
--with-ecw=/usr/local --with-mrsid=/usr/local/GeoExpressSDK --without-jp2mrsid
make
sudo make install

Once installed, you can check if your gdal supports MrSid by doing:

/usr/local/bin/gdalinfo --formats | grep SID

Which should result in a line like this being displayed:

  MrSID (ro): Multi-resolution Seamless Image Database (MrSID)

Create your mask shapefile

For my purposes, I want to clip data out for the Eastern Cape Province of South Africa. So I loaded a national dataset of SA Provinces in QGIS, selected the Eastern Cape province and the exported that to its own shapefile by right clicking on its legend entry and choosing the ‘Save selection as shapefile’ option. It is important that your shapefile contain only the features needed for clipping as we will be using its bounding box as part of the clip process.

Determine the extent of your clip mask

With a few lines of bash and the ogrinfo programme we can determine the bounding box of a shapefile e.g.:

#!/bin/bash
SHPFILE=$1
BASE=`basename $SHPFILE .shp`
EXTENT=`ogrinfo -so $SHPFILE $BASE | grep Extent \
| sed 's/Extent: //g' | sed 's/(//g' | sed 's/)//g' \
| sed 's/ - /, /g'`
EXTENT=`echo $EXTENT | awk -F ',' '{print $1 " " $4 " " $3 " " $2}'`
echo $EXTENT

If we run the script it produces output like this:

22.735740 -30.001800 30.194720 -34.213860

This gives you the extents in the order and format expected by gdal_translate which we need for our clip operation.

Clip the raster to the bounding box of your shapefile

To do the clip to bounding box, we can simply add a few more lines to our script:

RASTERFILE=$2
gdal_translate -projwin $EXTENT -of GTiff $RASTERFILE /tmp/`basename $RASTERFILE .sid`_bbclip.tif

Ok so that should have clipped out your raster to the bounding box of your shapefile.

Perform a polygonal clip on the image

Last on our list is to clip out just the polygonal area from the clipped-to-bounding-box version of our image. To do this we use gdalwarp e.g.:

gdalwarp -co COMPRESS=DEFLATE -co TILED=YES -of GTiff \
-r lanczos -cutline $SHPFILE \
/tmp/`basename $RASTERFILE .sid`_bbclip.tif /tmp/`basename $RASTERFILE .sid`_shpclip.tif

Tying it all together

Lets see what our final script looks like then:

#!/bin/bash
SHPFILE=$1
BASE=`basename $SHPFILE .shp`
EXTENT=`ogrinfo -so $SHPFILE $BASE | grep Extent \
| sed 's/Extent: //g' | sed 's/(//g' | sed 's/)//g' \
| sed 's/ - /, /g'`
EXTENT=`echo $EXTENT | awk -F ',' '{print $1 " " $4 " " $3 " " $2}'`
echo "Clipping to $EXTENT"
RASTERFILE=$2
gdal_translate -projwin $EXTENT -of GTiff $RASTERFILE /tmp/`basename $RASTERFILE .sid`_bbclip.tif
gdalwarp -co COMPRESS=DEFLATE -co TILED=YES -of GTiff \
-r lanczos -cutline $SHPFILE \
/tmp/`basename $RASTERFILE .sid`_bbclip.tif /tmp/`basename $RASTERFILE .sid`_shpclip.tif

Save the script and make it executable. I saved mine to mine as ~/bin/clip.sh and then made it executable using

chmod +x ~/bin/clip.sh

To invoke the script you would do something like:

~/bin/clip myshapefile.shp mybigraster.sid

Of course there are some things that can be improved in the script – for example its hard coded to assume that you are working with MrSid data. However the script should provide you with a starting point to clipping out your own raster data using arbitary polygons.

We just released QGIS 1.3 ‘Mimas’. Download your copy http://qgis.org/en/download/current-software.html and read the official announcement over on the QGIS blog here

I did some revisions to the Windows all in one installer for this release and was able to shrink it down in size quite a bit – always good news for those of us living in hte ‘internet badlands’.

The vector editing features in QGIS are now becoming a serious contender with commercial applications in my opinion, the new goodies in QGIS 1.3 just go even further to qualify that statement.

Hope you all enjoy using the work of a very dedicated QGIS developer team (these guys always blow my mind with their work!).

I will probably plan the next release for after the QGIS Hackfest in Vienna in November. Really looking forward to that!

Every day being a QGIS user and developer brings a new surprise. Even more so since the python language bindings were introduced to allow people to write plugins easily. Today Andreas Plesch popped a note to the dev list saying he had created a plugin for hill shading. I often like to use a hill shaded relief map as a backdrop to provide context for map data. Now it is easy to do in QGIS. Here is a little example:

QGIS hillshading plugin

The plugin is available from the python plugin manager…here’s looking forward making lots of beautiful relief maps!

Intro

A client recently asked me to prepare some aeronautical data for use with QGIS. The data was supplied in about twenty 8 bit paletted GeoTiff images. The client wanted to produce a seamless mosaic so that operators only needed to open a single file to view the rasters. Also he wanted good read access times for the data (hey who doesn’t :-) ).

Trying things out…

Compressed Tiffs

The first thing I tried was simply resaving the imagery as compressed tiffs, and then building a virtual raster dataset using gdalbuildvrt. Compressing the images was easy with a few lines of bash:

#!/bin/bash

for FILE in *.tif
do
  BASE=`basename $FILE .tif`
  NEWFILE=${BASE}_c.tif
  gdal_translate -of GTiff -co COMPRESS=DEFLATE -co TILED=YES $FILE $NEWFILE
done

The script creates a new version of each tif file with a _c.tif suffix. One caveat is that last time I checked, ESRI GIS products didn’t support compressed tiff images (though it was a while ago that I last checked). The compressed tiff format works especially well on imagery with large areas of contiguous colour.

Creating a mosaic

More recent versions of gdal include the excellent gdalbuildvrt application that will build a seamless mosaic from a collection of images. The output is a ‘virtual raster’ – the original files are left untouched and the virtual raster contains a catalogue of the original files. This .vrt file can then be treated as a single file from within your GIS application. Here is an example:

gdalbuildvrt mosaic.vrt *_c.tif

Pop around to QGIS and use the add raster function to add mosaic.vrt and you should see a nice looking mosaic. However, there was one big problem with this approach – using gdalbuildvrt on paletted imagery will result in one master palette of 256 colours being assigned to all images. This results in the images having a speckled, unusable appearance.

So the next thing I did was to use gdal to expand the imagery out to 24bit rgba imagery. Doing this loses most of the gains we made by compressing the tifs, but resolves the palette incompatibility issue.

First I removed all the _c.tif files:

rm *_c.tif

Next I adapted my script to look like this:


#!/bin/bash

for FILE in *.tif
do
  BASE=`basename $FILE .tif`
  NEWFILE=${BASE}_c.tif
  gdal_translate -expand rgb -of GTiff -co COMPRESS=DEFLATE -co TILED=YES $FILE $NEWFILE
done

The noteworthy bit here is the -expand rgb option I have added which will convert the image from a paletted image to a true colour 24 bit image.

Now we can rerun our mosaic build to create a true-colour virtual raster:

gdalbuildvrt mosaic.vrt *_c.tif

Once again you can open the dataset in QGIS to test it. There are some things to bear in mind when building image mosaics. Firstly the imagery should all be in the same Coordinate Reference System. Secondly, they should all have the same spatial resolution. Lastly the band configuration should be the same for all images.

Building pyramids

If you have a particularly large collection of images, you may still find access times a little slow. You can speed things up (at the expense of some disk space) by building pyramids (also know as overviews). Pyramids presample the imagery at various scales so that it does not to resample as much data as you zoom and out of the image. Building pyramids is easy using the gdaladdo command (there is also an option within the QGIS raster properties dialog to do this).

gdaladdo -ro --config INTERLEAVE_OVERVIEW PIXEL --config \
           COMPRESS_OVERVIEW JPEG mosaic.vrt 2 4 8 16

Performance optimisation – Take 1

Because of the nature of my dataset, even with the above tweaking, data access was still too slow for me to be comfortable with passing it along to my client as a solution. To for my next party trick, I tried chopping up the mosaic into little 1000×1000 pixel tiles. I wrote a simple script to do it:

#!/bin/bash
mkdir tiles
XDIM=79011
YDIM=88977
BLOCKSIZE=1000
XPOS=0
YPOS=0
BLOCKNO=0
while [ $YPOS -le $YDIM ]
do
  while [ $XPOS -le $XDIM ]
  do
    echo "$XPOS $YPOS : ${BLOCKNO}.tif"
    gdal_translate -of GTiff -srcwin $XPOS $YPOS $BLOCKSIZE $BLOCKSIZE mosaic.vrt \
       tiles/${BLOCKNO}.tif
    BLOCKNO=`echo "$BLOCKNO + 1" | bc`
    XPOS=`echo "$XPOS + $BLOCKSIZE" | bc`
  done
  YPOS=`echo "$YPOS + $BLOCKSIZE" | bc`
  XPOS=0
done

This is just a quick and dirty approach to creating tiles and I didn’t spend more than about ten minutes writing the script, so it no doubt can be improved in all sorts of ways. One thing you should note is that I hard-coded the image dimensions.

Next I opened the newly created tiles directory and manually removed all unneeded tiles (e.g. all black ones on the periphery of the map.

Once again I built a virtual raster from this dataset:

cd tiles
gdalbuildvrt mosaic_for_tiles.vrt *_c.tif

I also made a pyramid. However performance was still too slow for my liking so I thought I would try the same approach with bigger tiles and compressing the tiles as ECW.

Performance optimisation – Take 2 (ecw)

GDAL includes support for the ECW wavelet compression format. There is a limitation in that if you don’t possess a commercial license, you may not compress a dataset larger than 500mb. Ubuntu does not ship with ecw support so you need to manually build gdal with it enabled. Here is how I built gdal from the source trunk. First I installed all of the build dependencies with some extra packages so that I could get hdf support:

sudo apt-get install build-essential libjpeg62-dev libtiff4-dev libhdf5-serial-dev \
libhdf5-serial-1.6.6-0 libhdf4g-dev libhdf4g libhdf4g-doc

Next I go hold of a copy of the ecw developer kit. Go to the erdas ecw sdk web page (link below) or just search their site for ‘ecw sdk’.

http://www.erdas.com/Products/ERDASProductInformation/tabid/84/CurrentID/1142/Default.aspx

Get the ECW JPEG2000 Codec SDK Source Code (second item listed – first if win version).

After the download you should have:

ecw_jpeg_2000_sdk_3_3_source.zip

Now unzip it:

cd /tmp
unzip ecw_jpeg_2000_sdk_3_3_source.zip
unzip ImageCompressionSDKSourceCode3.3Setup_20070509.zip
sudo mv libecwj2-3.3 /usr/local/src/
cd /usr/local/src/
sudo chown -R timlinux libecwj2-3.3/
cd libecwj2-3.3

Now build the ecw lib:

./configure
make
sudo make install

Next up we will build gdal. First install the subversion client if you don’t already have it:

sudo apt-get install subversion

Nowcheck it out gdal from svn (perhaps you prefer to use one of the stable release versions – I like to live on the edge…).

cd ~
mkdir -p dev/cpp
cd dev/cpp
svn co https://svn.osgeo.org/gdal/trunk/gdal gdal_svn
cd gdal_svn

Now configure and make gdal. Our build will go into /usr/local by default so it can exist side by side with any package installed version you may have.

make clean
./configure --with-ecw=/usr/local
make -j8
sudo make install
sudo echo "/usr/local/lib" | /etc/ld.so.conf
sudo ldconfig

It might take a little while to build but when it is done you can verify that you now have gdal built with ecw support like this:

/usr/local/bin/gdalinfo --formats | grep ECW

The response should look something like this:

  ECW (rw): ERMapper Compressed Wavelets
  JP2ECW (rw+): ERMapper JPEG2000

Ok now we can make ecw compressed images! Lets head back to our data directory and rechop up our mosaic into larger, ecw compressed pieces. My client has an ermapper license so I believe it would be ok to create a single large image for them using gdal, but for our purposes its better to stick under the 500mb limit. Here is a revision of my bash script from earlier:

#!/bin/bash
mkdir ecwtiles
XDIM=79011
YDIM=88977
#Notice I used a bigger block size now
BLOCKSIZE=10000
XPOS=0
YPOS=0
BLOCKNO=0
while [ $YPOS -le $YDIM ]
do
  while [ $XPOS -le $XDIM ]
  do
    echo "$XPOS $YPOS : ${BLOCKNO}.tif"
   # Notice I am using my hand built gdal now!
    /usr/local/bin/gdal_translate -of ECW -co LARGE_OK=NO -srcwin $XPOS $YPOS \
      $BLOCKSIZE $BLOCKSIZE mosaic.vrt ecwtiles/${BLOCKNO}.ecw
    BLOCKNO=`echo "$BLOCKNO + 1" | bc`
    XPOS=`echo "$XPOS + $BLOCKSIZE" | bc`
  done
  YPOS=`echo "$YPOS + $BLOCKSIZE" | bc`
  XPOS=0
done

When I ran that script, it created a bunch of image tiles for me in ecwtiles/ directory (it took a while!). Adding individual tiles to a QGIS project I found they loaded very quickly and pan and zoom operations were satisfyingly responsive. So how would this work out for a mosaic comprised of these tiles? Its easy to find out:

cd ecwtiles
gdalbuildvrt mosaic_for_tiles.vrt *.ecw

Afterwards I was able to open my mosaic in QGIS. It loaded in a few seconds and pan and soom operations take a second or two each. Quite a good result considering the dataset is quite large (~4GB).

Some reflections

GDAL and Linux provide an incredible toolbox to do geospatial dataprocessing. Its easy to experiment with the aid of writing bash scripts. Many people are command line averse, but I love the repeatability that using the command line gives you. Once I work out a procedure, it is easy to document and / or rerun on a different dataset. If I need to tweak it, I just create a copy of the script containing my tweaks. Compare this to a gui environment for dataprocessing where you typically have to click through sequences of dialogs in a very un-repeatable manner.

Sometimes you don’t know what will work best when you start out processing some data, but having a plethora of free tools to use makes it simple to mix and match, chop and change, until you find just the right combination that works for you.

An aside: dealing with libtiff incompatibilities

I did encounter one issue with building pyramids which (along with its solution) is described here:

http://trac.osgeo.org/gdal/ticket/3139

Here is the symptom:

gdaladdo --config COMPRESS_OVERVIEW JPEG --config INTERLEAVE_OVERVIEW PIXEL mosaic.vrt 2 4 8 16
0gdaladdo: tif_jpeg.c:1299: JPEGPreEncode: Assertion `!sp->cinfo.comm.is_decompressor' failed.
Aborted (core dumped)

At one of my clients we manage a petabyte heirachical storage system (mixed offline tape storage, near-line cache storage and online disk storage) and several high capacity (e.g. 15TB) SAN enclosures. These devices are not cheap.

However, all is not lost – there is a blog post from BackBlaze thats doing the rounds on the web that is particularly interesting. To summarise they show how to build high capacity storage devices on the cheap using off the shelf components. In the GIS world if you work with raster data a lot, or produce tile cached web mapping services, you can quickly run up a large tally of disk space. If you are a little bit technically inclined, building BackBlaze style storage could be a great business opportunity and be of great benifit to your GIS using clients. Here’s hoping that someone picks up on this opportunity here in South Africa….

Eric Raymond and others worked hard to produce and excellent guide on how to ask smart questions. I often give talks about QGIS or open source development in general and I usually end it with a slide or two about how to get help, in which I cover some of the same topics listed in the aforementioned article.

Unsurprisingly, I have been ‘victem’ to many of the scenarios they paint. I use ‘victem’ in a gentle sense as it’s often a source of amusement or triggering mild disinterest rather then getting my hackles up. Asking questions and asking for help is an art that many people just havent grasped. As an open source developer, our most important currency is time. Any question that wastes time is likely to have unsatisfactory outcome. Here then is my bullet list to problem resolution that will hopefully help you get a good result in trying to solve your problems.

  • Try every solution you can think of first
  • Google (and learn to use google advanced search) for a solution
  • Decide on the nature of your question – is it a general user question (how do I do xyz?) or a technical question (feature xyz should do this but does something else, how come?).
  • Post your question to an appropriate list based on the step above. Use your head and pick a list where the people most likely to be able to answer you are most active and responsive
  • Provide just enough detail to make the issue clear
  • Ask only one question at a time / in a thread
  • Dont hijack an existing thread for a new question
  • If someone offers you a suggestion, respond with the outcome of their suggestion when you tried it.
  • If you get the problem resolved, always write back to the list summarising what it was and how it was resolved
  • Always thank the person who helped you – it will make them more inclined to help you in the future
  • Sometimes people don’t know the answer to your question. If you get no answer do some more independent research, reformulate your question and try to repost it.
  • Dont take the fact that someone has tried to help you as an invitation to start an offlist conversation with them. Keeping your discourse on the list makes it searchable and the developer won’t have to answer the same thing another time for someone else.

Sometimes you just won’t get an answer to your question. Thats life, don’t take it personally – you could be asking when others are busy, occupied elsewhere having an actual real life etc. If it happens take another crack at it yourself , try and identify someone who can help you commercially, or start thinking out of the box to get it solved.

A common activity when building a web application is the ‘tweak code -> reload page -> view page source’ cycle. Obviously this gets a bit time consuming and boring after a while. Under Firefox you can use tools like firebug to dynamically inspect and alter the source for your web page, but it adds quite a bit of overhead to the load times of web pages.

Googles chromium (the open source, ubuntu incarnation of their Chrome web browser) is a nice addition to the browser line up on Linux. You can easily add it by dropping a couple of lines into your /etc/apt/sources.list. Visit the Ubuntu Chromium PPA for details.

One of the features I love about chromium is its view-source url prefix e.g.:


view-source:http://yourhost/somepage/?page=4

The above is displayed in the url bar of chromium when viewing the source of a page. The great part is for example above where I am working with a django paginated dataset, I can simply change the url and press enter e.g.:


view-source:http://yourhost/somepage/?page=5

When I do that the source is immediately fetched for the next page and loaded into the page source view. No need to wait for the page to render out and then switch to source view.

Chromium is great to try out and use for cross browser testing. Its very stable in my experience, but note that it has no flash support at this stage.

I was recently interviewed for an article on open access to geospatial data in the City of Johannesburg. The article has been published online now for all to enjoy…