Wednesday, 21 August 2013

Boosting ggplot2 performance

The ggplot2 package is easily the best plotting system I have ever worked
with, except that its performance is poor for larger datasets (~50k
points). I'm looking into providing web analyses through Shiny, using
ggplot2 as the plotting backend, but I'm not happy with the rendering
speed, especially in contrast with base graphics. My question is whether
there are any concrete ways to increase this performance.
The starting point is the following code example:
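library(ggplot2)
n = 86400 # a day in seconds
# One observation per second; sorted so the values increase monotonically
dat = data.frame(id = 1:n, val = sort(runif(n)))
dev.new()
# ggplot2 versions: points only, lines only, and both layers combined
gg_base = ggplot(dat, aes(x = id, y = val))
gg_point = gg_base + geom_point()
gg_line = gg_base + geom_line()
gg_both = gg_base + geom_point() + geom_line()
# Time the ggplot2 plots; print() is what triggers the actual rendering
system.time(print(gg_point))
system.time(print(gg_line))
system.time(print(gg_both))
# Time the base graphics equivalents for comparison
system.time(plot(dat))
system.time(plot(dat, type = 'l'))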
I get the following timings on my MacBook Pro Retina:
> system.time(print(gg_point))
   user  system elapsed
  2.523   0.110   2.752
> system.time(print(gg_line))
   user  system elapsed
  2.891   0.164   3.156
> system.time(print(gg_both))
   user  system elapsed
  4.671   0.316   5.121
> system.time(plot(dat))
   user  system elapsed
  1.133   0.004   1.138
> system.time(plot(dat, type = 'l'))
   user  system elapsed
  0.034   0.001   0.036
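The timings above measure print() as a whole, so they do not show which stage of ggplot2 is slow. A minimal sketch of how the time could be split up, assuming the ggplot_build() and ggplot_gtable() functions exported by ggplot2 and grid.draw() from the grid package; the variable names (built, gt, timings) are illustrative only, and no results are implied here:

```r
library(ggplot2)
library(grid)

n <- 86400  # a day in seconds, as in the example above
dat <- data.frame(id = 1:n, val = sort(runif(n)))
gg_point <- ggplot(dat, aes(x = id, y = val)) + geom_point()

# Stage 1: compute stats, train scales and assemble the layer data (no drawing)
t_build <- system.time(built <- ggplot_build(gg_point))

# Stage 2: turn the built plot into a table of grid graphical objects
t_gtable <- system.time(gt <- ggplot_gtable(built))

# Stage 3: render the graphical objects on the current graphics device
t_draw <- system.time({
  grid.newpage()
  grid.draw(gt)
})

# Elapsed seconds per stage
timings <- c(build  = t_build[["elapsed"]],
             gtable = t_gtable[["elapsed"]],
             draw   = t_draw[["elapsed"]])
print(timings)
```

If most of the time turns out to be spent in the drawing stage, the bottleneck would lie in grid and the graphics device rather than in ggplot2's own data processing, which would narrow down where an optimization could help.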
Some more info on my setup:
> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] MASS_7.3-23 RColorBrewer_1.0-5 colorspace_1.2-1 dichromat_2.0-0
[5] digest_0.6.3 grid_2.15.3 gtable_0.1.2 labeling_0.1
[9] munsell_0.4 plyr_1.8 proto_0.3-10 reshape2_1.2.2
[13] scales_0.2.3 stringr_0.6.2
