Convex Hulls with ggplot

I found this code buried in an old google group discussion which I thought I would repost. As with everything ggplot wise hat tip to the incredible Hadley Wickham.

Often it’s nice to break down scatter plots by a third variable, especially if it’s categorical. However if you have lots of categories the space occupied by the different groups isn’t always straightforward to see:

no-convex-hulls

A bounding box around the outermost points in each group (a ‘convex hull’) makes it easier to see the groups. To make them appear in ggplot you can use the following code:

find_hull <- function(df) df[chull(df$X, df$Y), ]
library(plyr)
hulls <- ddply(somedata, “CategoryName”, find_hull)

Where X and Y are the names of your X and Y variables on the scatter plot. With ‘hulls’ in place you can simply add the following to your ggplot command (with a bit of alpha so you can still see the points):

geom_polygon(data = hulls, alpha = 0.2)

convex-hulls

Is it better? On the one hand you can clearly see the groups. On the other hand outliers may give a bit of a distorted impression of what is going on. Perhaps something which trimmed outliers would be a bit better.

No comments yet»

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: