Posted by & filed under QGIS.

If you’re using QGIS and PostGIS, this article is for you. If not, it could still show you a useful approach, so don’t close the tab yet!

Consider this scenario: you have a large vector dataset which you have combined from several mapsheets (for example, the standard 1:50 000 topo sheets issued by the Surveyor-General). You have loaded it into a PostGIS database, some sort of mapserver is implemented, and all is well. You only need to set this up to render in a timely fashion when queried by a user.

Obviously, you will need to generalize the data to cut down on processing / data retrieval time. Imagine having land use data at 1:50 000 for the whole KZN province – if you host this on a server and someone decides to look at the whole lot at once, it will never render (and nothing else will happen on the server until that one user gets bored and quits).

But not to worry, I only need to generalize the data, right? For example:

ST_SimplifyPreserveTopology(the_geom,1000) AS the_geom, landuse
INTO landuse_generalized_1000 FROM "1_50000_landuse_Union";

Chances are, however, that this won’t work as well as it should (if at all) straight off the bat. In my case, I encountered three problems:

  1. The data had lots of tiny internal rings which would still get generalized, but would thereby become invisible polygons with two vertices and zero area (or very thin triangles, but either way). They obviously still need to be loaded, with all their attribute data, making the generalization less useful than it could have been.
  2. The join boundaries from the earlier union (which turned umpteen map sheets into a single dataset) were still around and would need to be dissolved away before generalization, otherwise there’d be a bunch of slivers / gaps in between polygons from different map tiles.
  3. ST_SimplifyPreserveTopology() only accepts Polygon geometries, but once you aggregate the polygons by some attribute to get rid of the boundaries between map tiles, you will have a Multipolygon geometry type.

Hunting around (mainly on for functions that could be useful, and with many suggestions from Tim (who owns this blog), I eventually came up with a nice procedure.

First apply a small amount of generalization (case-dependent – leave it out if you need the data at full detail), then remove the interior rings. This turns your polygons into Linestrings, but you can just re-polygon-ize them in the same step:

AS the_geom, "LAND_USE" AS usetype INTO landuse1
FROM "1_50000_landuse_Union";

This bit is fairly elementary: to get rid of the tile joins, run a Union, then dissolve by land cover type:

SELECT ST_Union(the_geom) AS the_geom, usetype
INTO landuse2 FROM landuse1
GROUP BY usetype;

(Note that this gets rid of all your other unmentioned attributes, so be sure to list them if you want to keep them!)

Last but not least, to get the data back into a type that can be accepted as input by ST_SimplifyPreserveTopology, you need to run a multipart to singlepart. Although PostGIS does not have a one-step, built-in Multipart to Singlepart function, here’s how to fake one:

SELECT (ST_Dump(the_geom)).geom AS the_geom, usetype
INTO landuse3 FROM landuse2;

It’s important to note that you have to specify (ST_Dump(the_geom)).geom AS the_geom, because the output data type of ST_Dump() is not a geometry. Instead, it’s a special data type called “geometry_dump” which is specfic to the “Dump” family of functions. It contains extra data which in this case we don’t need, but the content of “geom” (which is part of the geometry_dump data structure) is equivalent to plain old geometry, so just use that.

Now at last you’re free to generalize the data:

SELECT ST_SimplifyPreserveTopology(the_geom,250) AS the_geom, usetype
INTO landuse_generalized_250 FROM landuse3;

SELECT ST_SimplifyPreserveTopology(the_geom,500) AS the_geom, usetype
INTO landuse_generalized_500 FROM landuse_generalized_250;

SELECT ST_SimplifyPreserveTopology(the_geom,1000) AS the_geom, usetype
INTO landuse_generalized_1000 FROM landuse_generalized_500;

Now you’ve got four datasets: the slightly simplified one you’ve been working from, and the three you generalized. Come to think of it, “landuse3″ is a bad name, so just rename it to, say, “landuse_full”. Also, if your new datasets came out well, you can probably get rid of your intermediate tables at this point (landuse1 and landuse2, in this case).

image: generalized data compared to original

Compared to the original (green), the generalized version (orange outlines) is highly simplified ...

image: zoomed out

... but at the appropriate zoom level, the differences are minor.

You can now delete those tiny, invisible polygons, too. Keep in mind at what scale you’d like to display which layer and view that layer in QGIS at that scale. For safety’s sake, create a test dataset:

SELECT * INTO test1000 FROM landuse_generalized_1000;

Calculate the area in a new column created for the purpose:

ALTER TABLE test1000 ADD area float;
UPDATE test1000 SET area = ST_Area(the_geom);

And get rid of polygons by area:

DELETE FROM test1000 WHERE area < 10;

The maximum area of the polygons that you can remove without making a visual or practical difference to the final map is something you’ll need to determine by trial and error. In the case of my most generalized dataset, I could get rid of about 70 000 entries by throwing out every polygon with an area smaller than 50 000, but obviously this will be highly case-dependent. Still, needing to render about 120 000 rather than 190 000 entries for a visually identical result did speed things up a bit.

Once you’re sure, run those same Alter, Update and Delete statements on the real dataset and remove the test. Even if something horrible happens, you can rest assured knowing that everything was done with SQL statements and getting it all back is as simple as rerunning some commands.

image: deployed example

In its final form together with other map layers. Although the data is simplified, it's not noticeable.

As for the practical application of these various levels of generalized data, QGIS has a handy bit of functionality: scale dependent rendering. In your layer properties (once you’ve loaded all needed datasets into your map), go to the “General” tab and check the “Use scale dependent rendering” box, then set whatever scales seem appropriate. Have your layers switch out at the applicable scales for your map, and the user will never even realize that he’s looking at different datasets with different levels of detail. So in the end, the user gets to look at land use for the whole KZN, and your server gets to not crash. Everybody’s happy!

  • matt wilkie

    FYI and somewhat orthagonal, but using buffers can sometimes yield better generalisation than the “traditional” simplify tools:

    • Rudi Thiede

      Thanks Matt! That’s worth testing as well.

      In my case I wanted the absolute minimum of vertices I could get away with. By default, using ST_Buffer in PostGIS will give you rounded curves with a lot more vertices than you need, so adjusting the quad_segs parameter is essential. An added advantage of buffering is that if your original dataset contains thin gaps within a single feature, like fjords for example, a buffer could get rid of those gaps entirely (even after you do the inner buffer step), which may be what you want. I suppose you’d also have to check to see what buffering does to the topology, but it’s not as if ST_SimplifyPreserveTopology() is flawless in all cases either.

      So, which function you end up using probably depends on the type and degree of suckage of the dataset you have to work with ;)