Dan Goldstein, 4 years ago
R Question

How to organize large R programs?

When I undertake an R project of any complexity, my scripts quickly get long and confusing.

What are some practices I can adopt so that my code will always be a pleasure to work with? I'm thinking about things like

  • Placement of functions in source files

  • When to break something out to another source file

  • What should be in the master file

  • Using functions as organizational units (whether this is worthwhile given that R makes it hard to access global state)

  • Indentation / line break practices.

    • Treat ( like {?

    • Put things like )} on 1 or 2 lines?

Basically, what are your rules of thumb for organizing large R scripts?

Answer

The standard answer is to use packages; see the Writing R Extensions manual as well as the various tutorials on the web.

A package gives you

  • a quasi-automatic way to organize your code by topic
  • a strong incentive to write help files, which makes you think about the interface
  • a lot of sanity checks via R CMD check
  • a chance to add regression tests
  • namespaces.
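To make the regression-test point concrete, here is a minimal sketch using only base R. The function clean_scores() and the file name tests/regression.R are hypothetical, chosen purely for illustration. In a real package the function would live under R/ and the test script would start with a library() call for the package; it is defined inline here so the snippet runs on its own.

```r
# Hypothetical helper; in a package this would sit in a file under R/.
clean_scores <- function(x) {
  # Drop missing values and keep everything else in its original order.
  x[!is.na(x)]
}

# Body of a hypothetical tests/regression.R. R CMD check runs the .R
# scripts it finds in tests/ and reports a failure if one of them stops
# with an error, so plain stopifnot() calls are enough for a first test.
stopifnot(
  identical(clean_scores(c(1, NA, 3)), c(1, 3)),
  length(clean_scores(c(NA_real_, NA_real_))) == 0L
)
```

If a later change breaks clean_scores(), the stopifnot() call errors and the check fails, which is the whole point of keeping such scripts next to the code.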

Just running source() over code works for really short snippets. Everything else should be in a package, even if you do not plan to publish it: you can write internal packages for internal repositories.
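As a rough sketch of that step from a source()-ed script to a package, base R's utils::package.skeleton() can generate the directory layout from existing code files. The package name scoretools, the file score_helpers.R, and the two functions in it are all hypothetical; the skeleton is written to a temporary directory so nothing in your working directory is touched.

```r
# Write a small script to a temporary location. It stands in for code
# you would otherwise load with source(). All names are made up.
src_dir <- tempfile("scripts")
dir.create(src_dir)
script <- file.path(src_dir, "score_helpers.R")
writeLines(c(
  "clean_scores <- function(x) x[!is.na(x)]",
  "summarise_scores <- function(x) c(mean = mean(x), sd = stats::sd(x))"
), script)

# Generate a package skeleton from that file.
pkg_root <- tempfile("pkgs")
dir.create(pkg_root)
utils::package.skeleton(
  name = "scoretools",   # hypothetical package name
  code_files = script,   # copied into the package's R/ directory
  path = pkg_root
)

# Inspect what was created: DESCRIPTION, NAMESPACE, R/ and man/ stubs.
pkg_dir <- file.path(pkg_root, "scoretools")
print(list.files(pkg_dir, recursive = TRUE))
```

The generated help-file stubs and DESCRIPTION still need editing by hand before R CMD check will pass; the skeleton only gives you a starting layout.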

As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use the defaults in Emacs' ESS mode.

Update 2008-Aug-13: David Smith just blogged about the Google R Style Guide.
