Linfiniti Geo Blog

GIS for Open Source People

Browsing Posts in Postgres & PostGIS

I spent most of last week in Dar es Salaam, Tanzania. A lovely tropical country in the heart of Africa. I was there as part of a project I am working to create tools for Biodiversity Informatics practitioners. Of course the tools are based on Free Software: Quantum GIS and openModeller.

The attendees at the workshop were entertained by my talk about what FOSS is and why it is important, an introduction to QGIS slideshow (superbly presented by Marco Hugentobler), and ending with a tour of openModellerDesktop. We also did some live demonstrations of QGIS and openModeller, before going on to discuss details about how these tools can be used to support their Biodiversity Informatics workflows.

The meeting was funded by the Global Biodiversity Information Facility (GBIF) with Juan Bello as their representitive, and hosted by the Tanzanian Commission for Science and Technology (COSTECH).

In case you are unfamiliar with the aims of GBIF, they are facilitating the digitisation (or digitization for our american readers) of the worlds biodiversity records – herbarium records, museum collections and so on. COSTECH provides the local infrastructure and staff for the ‘TanBif’ node in Tanzania.

The meeting also included ‘in-country’ experts in the fields of GIS, Meteorology, Ecology, IT and so on. I think for all of the attendees, the concept of FOSS was a real eye-opener. African economies can’t compare with those in Europe and the USA and the capital outlay for proprietary software that presents an irritation in the Western world is a major burden in the third world. So just knowing that they could dive in and use QGIS was a great revelation.

We finished our workshop a little early on the Friday so Marco and I offered to go along to the COSTECH offices and geo-enable their PostgreSQL species occurrence database and install QGIS on their desktop PC’s running Windows XP. In the space of a couple of hours we were done – the major part of which was spent showing the TanBif staff members how to bring up the PostGIS layer in QGIS, perform simple queries and make maps. Having spent days in the past trying to get proprietary software like Oracle and Arc*** configured, optimised, licensed and generally usable, I was struck by just how easy and quick it is to get someone up and running with a robust enterprise ready PostGIS geospatial datastore and a user friendly Free Software desktop GIS like QGIS.

Thanks to the friendly Tanzanian folks for their hospitality – I look forward to my next visit! Here are some piccies from the trip…

Juan Bello telling us about the cool things you can do with a good Biodiversity Information repository.

The workshop attendees (Marco and Juan out of shot)


Marco showing Godfrey how to use QGIS to bring up their PostGIS Biodiversity dataset.

Godfrey proudly showing off his first map (made with QGIS)!

Marco killing a mosquito – he became something of an expert!

The problem

I’ve been having a little poke around with Mapnik today (awesome software!). One of the things on my todo list has been to sort out rendering issues with roads we have been having. Our last iteration described roads something like this:

A style…

    <Style name="Freeway30th_style">
        <Rule>
            <LineSymbolizer>
                <CssParameter name="stroke">rgb(169,170,153)</CssParameter>
                <CssParameter name="stroke-width">12.26</CssParameter>
                <CssParameter name="stroke-linejoin">bevel</CssParameter>
                <CssParameter name="stroke-linecap">round</CssParameter>
                <CssParameter name="stroke-opacity">1</CssParameter>
            </LineSymbolizer>
            <LineSymbolizer>
                <CssParameter name="stroke">rgb(255,172,88)</CssParameter>
                <CssParameter name="stroke-width">12.16</CssParameter>
                <CssParameter name="stroke-linejoin">miter</CssParameter>
                <CssParameter name="stroke-linecap">round</CssParameter>
            </LineSymbolizer>
        </Rule>
    </Style>

…and this layer definition…

        <Layer name="Freeway30th" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
        <StyleName>Freeway30th_style</StyleName>
        <Datasource>
            <Parameter name="dbname">&dbname;</Parameter>
            <Parameter name="estimate_extent">0</Parameter>
            <Parameter name="extent">&extent;</Parameter>
            <Parameter name="geometry_field">&geometry_field;</Parameter>
            <Parameter name="host">&host;</Parameter>
            <Parameter name="password">&password;</Parameter>
            <Parameter name="port">&port;</Parameter>
            <Parameter name="srid">&srid;</Parameter>
            <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
            'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
            <Parameter name="type">&datasourcetype;</Parameter>
            <Parameter name="user">&password;</Parameter>
        </Datasource>
    </Layer>

With the idea being to render freeways with a gray outline and orange center. Unfortunately, it doesnt produce good results:

The problem being those little line ends you see making gray splodges at the end of each segment.

The solution

Michael Migurski’s blog discusses this issue a little in this article but doesnt directly explain how to achieve the desired effect. So here is what you do:

First the styles are split into two…

     <Style name="Freeway30th_style-bottom">
        <Rule>
            <LineSymbolizer>
                <CssParameter name="stroke">rgb(169,170,153)</CssParameter>
                <CssParameter name="stroke-width">12.26</CssParameter>
                <CssParameter name="stroke-linejoin">bevel</CssParameter>
                <CssParameter name="stroke-linecap">round</CssParameter>
                <CssParameter name="stroke-opacity">1</CssParameter>
            </LineSymbolizer>
        </Rule>
    </Style>
    <Style name="Freeway30th_style-top">
        <Rule>
            <LineSymbolizer>
                <CssParameter name="stroke">rgb(255,172,88)</CssParameter>
                <CssParameter name="stroke-width">12.16</CssParameter>
                <CssParameter name="stroke-linejoin">miter</CssParameter>
                <CssParameter name="stroke-linecap">round</CssParameter>
            </LineSymbolizer>
        </Rule>
    </Style>

and then the layer is now rendered as two layers, the bottom layer first, then the top:

    <Layer name="Freeway30th-bottom" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
        <StyleName>Freeway30th_style-bottom</StyleName>
        <Datasource>
            <Parameter name="dbname">&dbname;</Parameter>
            <Parameter name="estimate_extent">0</Parameter>
            <Parameter name="extent">&extent;</Parameter>
            <Parameter name="geometry_field">&geometry_field;</Parameter>
            <Parameter name="host">&host;</Parameter>
            <Parameter name="password">&password;</Parameter>
            <Parameter name="port">&port;</Parameter>
            <Parameter name="srid">&srid;</Parameter>
            <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
            'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
            <Parameter name="type">&datasourcetype;</Parameter>
            <Parameter name="user">&password;</Parameter>
        </Datasource>
    </Layer>
    <Layer name="Freeway30th-top" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
        <StyleName>Freeway30th_style-top</StyleName>
        <Datasource>
            <Parameter name="dbname">&dbname;</Parameter>
            <Parameter name="estimate_extent">0</Parameter>
            <Parameter name="extent">&extent;</Parameter>
            <Parameter name="geometry_field">&geometry_field;</Parameter>
            <Parameter name="host">&host;</Parameter>
            <Parameter name="password">&password;</Parameter>
            <Parameter name="port">&port;</Parameter>
            <Parameter name="srid">&srid;</Parameter>
            <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
            'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
            <Parameter name="type">&datasourcetype;</Parameter>
            <Parameter name="user">&password;</Parameter>
         </Datasource>
    </Layer>

A much cleaner rendering!

Note

This approach consumes more cpu time and hits your database harder than the ‘messier’ approach shown first.

Also you can see in the example above, I have adopted Michaels approach of rendering long lines first.

Have fun with your mapnik maps!

Horst and I are spending the week up in Johannesburg at the Satellite Applications Center in Hartebeeshoek. We are doing yet another week long training course (I hope I’m not working the poor guy too hard :-P ). This time we are doing:

- Two days QGIS (with a little GRASS)
- One day PostGIS
- Two days geospatial programming with Bash, Python and QGIS

Tomorrow we start with the PostGIS component. Horst and I have been compiling some course notes for the PostGIS module which we are making available to the world as per usual. The pdf still has some rendering issues – we are aware of that. The document tries to walk the reader through the basics of using SQL and then some basic activities with PostGIS and working with geometries.

I hope some of you out there find it useful – let us know if you do! Also if you have any improvements to make, we’d love to hear from you.

Here is a quick pic or two from the course:




I have a cron job that backs up my PG databases and emails them to me everynight. Today I wanted to upgrade my PostgreSQL 8.3 databases to PG 8.4 so I made a few modifications to my script so that I could dump out my PG data, and then restore it under PG 8.4. In case you are wondering I am doing this because Ubuntu Lucid now ships with PG 8.4 (yay!). I also made the script generate another script to restore the databases. So basically the procedure is to run the script below on your 8.3 cluster, shut that down and bring up your 8.4 cluster, and then restore your databases into that. Here follows my little script:

MYDATE=`date +%d-%B-%Y`
MYBACKUPDIR=/home/timlinux/sql_backups
MYRECIPIENT=tim@linfiniti.com

DBLIST=`psql -l \
  | awk '{print $1}' | grep -v "+" | grep -v "Name" | \
  grep -v "List" | grep -v "(" | grep -v "template" | \
  grep -v "postgres"`
for DB in ${DBLIST}
do
  echo "Backing up $DB"
  FILENAME=${MYBACKUPDIR}/PG_${DB}.${MYDATE}.sql.tar.gz
  pg_dump -f ${FILENAME} -x -O -F tar ${DB}
  #If you want to email the database uncomment
  #below (will cause issues if backups are large)
  #mpack -s "Daily $DB PG backup for ${MYDATE}" $FILENAME $MYRECIPIENT
done

echo "Procedure to restore one of your backups:"
echo "createdb "
echo "pg_restore -F t .sql.tar.gz |psql "
echo "Or to restore in a batch make a script like this:"
echo "for FILE in /home/timlinux/sql_backups/*; do DB=\$(echo $FILE | \"
echo "  sed 's\/home\/timlinux\/sql_backups\/PG_//g' | sed 's/.${MYDATE}.sql.tar.gz//g'); "
echo " 'Restoring: \$DB'; createdb \$DB; pg_restore -F t \$FILE |psql \$DB; done"

Update 02 May 2010 Uncommented pg_dump line which was inadvertantly commented in my original post.

Hi, I’m Sam. I’ve been learning a lot here at Linfiniti (thanks to the brilliant team!) Just like to add a quick note on one of the tasks I learnt this week.

I was working on a shapefile of the suburbs in Cape Town. A client required the suburbs to be grouped by region. After the tedious part of manually grouping the suburbs (using a created field, REGION), the unioning (dissolving) of the suburbs proved to be quick and painless through PostgreSQL, using this SQL command that implements the geomunion function:

create table ct_regions as select geomunion(the_geom), "REGION" \
 from "ct_suburbs" group by "REGION";

“ct_suburbs” is the original shapefile that was loaded into the PostGIS database using the Quantum GIS ‘SPIT’ plugin. “REGION” is the class (attribute) that I wish to union by.  And ct_regions will be the output shapefile. See the result here:

Before dissolving (ct_suburbs shows suburbs)

Cape Town suburbs before dissolving

After dissolving by suburbs (ct_regions shows collections of suburbs that have been merged)

Cape Town suburbs after dissolving

Hopefully this will be of some use when it comes to your own mapping!

I’ve posted this before on my old blog so this is a repeat for those looking to get going with PostGIS in a hurry. This procedure should work on Ubuntu Jaunty or Ubuntu Karmic and possibly earlier versions.

sudo apt-get install postgresql-8.3-postgis
sudo su - postgres
createuser -s -d -r -l -P -E timlinux
createdb gis
createlang plpgsql gis
psql gis < /usr/share/postgresql-8.3-postgis/lwpostgis.sql
psql gis < /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql

Having done that, you should be able to connect to the database by creating a new postgis connection in QGIS. After that you can start to load data into your spatial database using SPIT (QGIS plugin to import shapefiles into QGIS) or the shp2psql command line tool.

So you have moved your spatial data into PostGIS and everything is great….until you start wanting to clip it out again. Luckily clipping data within the postgres environment is trivial with the little help of SQL. However I wanted something a little more fancy – a way to iterate through all my spatial tables and clip out an arbitary polygon, saving the result to a shapefile. To achieve this I wrote a little bash script that does just that. The script will also optionally make sql dumps of the data and clean up after itself when it is done.

The SUFFIX option is used to append to the newly created shapefile / table / sqldump file.
The CLIPTABLE is the table containing polygons that should be used to intersect each table.
The CLIPFIELD is a field used to subset the cliptable e.g. to select a single polygon only.
The CLIPGEOMFIELD is the name of the geometry field in the CLIPTABLE.
The CLIPFIELDVALUE is the clipfield ‘where’ clause e.g. “Eastern Cape”
The DB is the database containing tables to be intersected.

If you set DELETETABLE to 0, it will create clip tables in the database with the suffix you specify.
If you set DELETEGEOM to 1 it will delete the original geometry, leaving only the intersected geometry.
If you set SHPDUMP to 1 it iwll create a shapefile for each intersected table.
If you set SQLDUMP to 1 it will make a sql dump file of each intersected table.

Note that for large dataset, clipping with an arbitary polygon can take a long time! Here is the script:

#!/bin/bash
SUFFIX="ecape"
CLIPTABLE="ecape"
CLIPFIELD="provname"
CLIPGEOMFIELD="the_geom"
CLIPFIELDVALUE="Eastern Cape"
DB="cdsm50k"
DELETETABLE=1
DELETEGEOM=1
SHPDUMP=1
SQLDUMP=0
pushd .
cd /tmp

for TABLE in `echo "\dt" | psql $DB \
    | awk '{print $3}' | grep "^[a-zA-Z]"`

do
  GEOMFIELD=`echo "\d $TABLE" | psql $DB | grep "geometry" \
  | grep -v "geometrytype" |awk '{print $1}'`
  if [ "$GEOMFIELD" == "" ]
  then
    echo "$TABLE has no geometry column, skipping"
  else
    echo "$TABLE -> $GEOMFIELD"
    echo "drop table \"${TABLE}_${SUFFIX}\";" | psql $DB
    # Note we use the && bounding box query first and
    # then the intersects query to avoid unneeded comparison of
    # complex geometries.
    SQL="CREATE TABLE \"${TABLE}_${SUFFIX}\" AS \
            SELECT \
            ST_Intersection(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AS intersection_geom, \
            v.*, \
            m.$CLIPFIELD \
            FROM \
              \"$TABLE\" v, \
              $CLIPTABLE m \
            WHERE \
              ST_Intersects(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AND \
              $CLIPFIELD='$CLIPFIELDVALUE';"

    echo $SQL  | psql $DB

    if [ $DELETEGEOM -eq 1 ]
    then
      echo "alter table \"${TABLE}_${SUFFIX}\" drop column $GEOMFIELD;" | psql $DB
    fi

    if [ $SHPDUMP -eq 1 ]
    then
      pgsql2shp -f ${TABLE}.shp -g "intersection_geom" $DB ${TABLE}_${SUFFIX}
    fi

    if [ $SQLDUMP -eq 1 ]
    then
      pg_dump -D $DB -t ${TABLE}_${SUFFIX} > ${TABLE}_${SUFFIX}.sql
    fi  

    if [ $DELETETABLE -eq 1 ]
    then
      echo "drop table ${TABLE}_${SUFFIX};" | psql $DB
    fi
  fi
done
echo "vacuum analyze;" | psql $DB
popd

Sometimes just because a thing is possible doesn’t make it a good idea. In the last week I have been helping a client whose web developer has dissapeared. The developer implemented the database using a variation on the Entity Attribute Value (EAV) modelling paradigm. The database consists of only four tables: Entities, Attributes, Values, EntityTypes. So the entities in the database don’t match any real-world entities – there is no products table for products, no categories table for categories and so on. This generic modelling approach might sound good but in practice it’s a real pain. Look at this query diagram representing a simple query to find out what categories exist in the database:

EAVDBModel

The query consists of numerous joins and is completely opaque. The EAV wikipedia article I referenced above contains a list of downsides to the EAV approach, with which I can only concur. For a new developer approach an EAV implementation its an uphill battle to understand what is going on. Writing queries is a cumbersome task, and the logic such as foreign key constraints that would normally reside in the database layer now needs to be implemented in the application codebase.

For me EAV is something I will be avoiding in any new developments I embark on…

Someone asked on twitter it is possible to dump all the tables in Postgres to individual shp files. Some time ago I wrote a script to dump all tables as SQL dumps. The question prompted me to tweak that script to drop out shapefiles instead.

My original script looked like the listing below. The dump files contain data only (see the comments in the bash script below) because I use this script to create fixtures for my django projects.

#!/bin/bash

# A script to create sql formatted fixtures (serialised models)
# used to initialise the application if you install it to another
# machine. You should run this any time you change your models
# or when you need to make a backup of all your data.

# Tim Sutton 2009
mkdir bees/sql
for TABLE in `echo "\d" | psql sabio | grep -v seq | awk '{print $3}'`
do
  echo $TABLE
  # -a data only
  # -t table
  # -D dump as sql inserts
  pg_dump -a -t $TABLE -D sabio > bees/sql/${TABLE}.sql
  #bzip2 bees/sql/${TABLE}.sql
done

To make the script drop out shapefiles I modified it a bit as shown in the next listing. Obviously as we are dumping shapefiles, we should only bother dumping tables with geometry in them so I went the route of using the geometry_columns table to decide which tables to dump…

#!/bin/bash

# A script to dump shapefiles of all tables listed in geometry_columns
# Tim Sutton 2009
mkdir bees/sql
for TABLE in `echo "select f_table_name from geometry_columns;" | psql sabio \
  | head -n -2 | egrep -v "\-\-\-\-\-\-\-\-\-" | egrep -v "f_table_name"`
do
  echo $TABLE
  pgsql2shp sabio $TABLE
done

Hope this is useful to someone out there :-)

Ok so I have a few production databases that I need to back up regularly. The trick is I want to run the backup from a remote machine so that the backup lives on a separate server to the actual database system. You can run backups manually like this (assuming your database is called ‘postgis’):

pg_dump -h dbhost -f postgis_`date +%d%B%Y`.sql.tar.gz -x -O -F tar postgis

When you run the above command, you will be prompted for a password. After entering the password you will find a date stamped backup. Very nice, but you may have noted that pg_dump has no option for giving it the password on the command line – it expects you to do that interactively. So what do we do if we need to automate the packup using a cron job? The solution is to use either ~/.pgpass or the PGPASSWORD environment variable. So here is how I automated the backup by placing a script in /etc/cron.daily/

export PGPASSWORD=secret
pg_dump -h dbhost -f postgis_`date +%d%B%Y`.sql.tar.gz -x -O -F tar postgis