Note: These pages are obsolete! They have been superseded
for NPS & DOI users by new content at the I&M Inside NPS website.
If you are outside of DOI but use this content,
email Tom Philippi about
a public-facing mirror of that site.
Using R Statistical and Graphics Tools
for Natural Resource Stewardship Science
Please direct questions and comments about these pages, and about the use of R in NPS Natural Resource Stewardship Science, to Dr. Tom Philippi.
Announcement: Tom is presenting a webinar course on using R for Natural Resources in March and April, 2013. The course announcement is here
These pages seek to meet the needs of the National Park Service Inventory & Monitoring (I&M) program for specific guidance on using R for the reporting, analysis, and synthesis of Inventory and Monitoring data. Despite that narrow targeting, most of the material should be useful to other natural resource scientists and managers, especially those with general database skills and knowledge about their datasets.
The initial impetus for these pages was as a companion to the "Learn R" webinar courses led by Paul Geissler of USGS. Tom answered questions during Paul's presentations, and offered companion webinars that emphasized either laboratory or case studies or, for Paul's courses that used Rcmdr or another GUI, writing R code. These pages are an attempt to produce a resource that remains useful for I&M and other natural resource colleagues well after the courses end.
The content has to be strategic, not comprehensive. I'm generating these on the fly from the Learn R course and from my own work. My official workplan does not include these pages, so I can only justify topics that require only incremental work but at the same time will greatly reduce the number of calls for assistance. As any of you complete tasks or projects using R that would be useful to other I&M networks, I would welcome their addition to these pages. Eventually, these need to become community pages for users of R within NPS I&M and the broader natural resource community.
The left-hand navigation shows the topics covered on these pages. In general, there are two sets of pages. The topic-specific pages are meant to be quick references when you need to perform a specific task. The course-specific pages (mostly removed from this new server, but the 2012 R Topics course pages are under "Advanced R Topics") will be more direct companions to webinars or other training, and so will include less detail and tend to include data access and graphing in each page.
Why R for Natural Resource Stewardship Science?
R is an open-source implementation of the S language for statistical computing. For almost 20 years, applied statisticians have been submitting implementations of their new techniques to StatLib. When those implementations were written as libraries for the commercial S-plus implementation, statisticians were providing software for free, but users (including those same statisticians) had to pay a third party to be able to run the software. A very small group of statisticians took it upon themselves to write a complete open-source implementation of S that would run under most operating systems, which they called R. Since then, the vast majority of implementations of new statistical techniques have been made available as R packages, which include the code as a library of functions and at least some documentation.
Because R is very useful for "computing with data", experts in many fields use it for their work. Because R is open source, many of those experts make their field-specific code and functions freely available as packages. For example, climate researchers use R with netCDF files, so there are packages for reading and writing netCDF files (netCDF, ncdf4) as well as generating standard climate diagrams, imputing missing weather data, downscaling from coarse data, etc. (climtol, clim.pact, seas, anm, zyp). Phenology researchers provide packages bise and pheno; Jari Oksanen (with help from others) provides package vegan for vegetation analysis (ordination, classification, analysis of similarity); ecologists working on habitat analysis and spatial prediction provide adehabitat, grasp, BIOMOD, ModelMap; wildlife ecologists provide packages for estimating abundances and occupancy, including mra, Rcapture, secr, PresenceAbsence, trip, and tripEstimation. Social scientists provide a set of packages for using the 2000 census data. Bioconductor is a project with many packages for analyzing microarray data, DNA and protein sequence data, and other molecular biology bioinformatics. By learning how to use R, we can leverage their efforts and expertise, and not reinvent those wheels. [If we improve a wheel or write one for a different need, we can in turn make our improved wheel available to others as a new package, or work with the authors of the original package and let them incorporate our additions and improvements.]
There are two major reasons you may want to learn to write R code rather than use a GUI such as R Commander (Rcmdr). First, while more and more of the general statistical methods are being added to Rcmdr via plugins, almost all of the field-specific packages require R code to use. Packages are sets of one or more functions useful for a set of tasks. The advantage of functions in R is that we don't need to understand or modify anything inside the function in order to use the package (although the source code is available if we need to inspect it or improve it). We only need to know what parameters we need to pass to the function, and how to use the objects (figures, analysis results, or data objects) it returns. Therefore, the amount of coding required of the user is quite limited: mostly creating the data objects the functions require, then calling the functions in the desired order.
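As a minimal sketch of that pattern, the following uses only functions that ship with base R. The `mtcars` dataset and the choice of variables are illustrative assumptions, not something drawn from I&M data: we pass arguments to a function, then work with the object it returns.

```r
# Illustrative only: mtcars is a dataset bundled with R, used here as a
# stand-in for your own data.
data(mtcars)

# Call a function by passing the arguments it needs (a formula and a data frame).
fit <- lm(mpg ~ wt, data = mtcars)

# The function returns an object; we use it without reading lm()'s internals.
class(fit)                          # the kind of object returned
slope <- coef(fit)["wt"]            # extract one estimated coefficient
r_squared <- summary(fit)$r.squared # extract one component of the summary

# The source is still available for inspection: typing a function's name
# without parentheses prints its definition.
# lm
```

The same three steps (build the data object, call the function, pull what you need from the result) carry over to most field-specific packages, although the argument names and returned components differ by package.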
Second, scripts document the analysis and workflow in an unambiguous manner, and make the work reproducible. Most scientific work in ecology involves decisions about outliers and missing values, and many options during the statistical analysis: far too many decisions and options to be documented in a standard methods section of a paper. [Undocumented analyses can also be difficult to rerun 6 months later when editors and reviewers want one slight change, or a colleague needs to perform a similar analysis, or you have accumulated more data.] Because these details can greatly affect the results, some ecological journals and ecoinformatics groups are considering encouraging or requiring some form of documentation or journaling of the entire scientific workflow. R code (or SAS code or SPSS code) that includes the querying of the database, merging and cleansing data, generating the figures and tables, and performing the analyses themselves is one way to meet that requirement.
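A short script along these lines might look like the sketch below. It is hypothetical: the bundled `airquality` dataset stands in for a database query, and the outlier threshold is an arbitrary value chosen for illustration. The point is that each data-handling decision is recorded in the code itself.

```r
# Hypothetical workflow sketch; airquality (bundled with R) stands in for a
# table that would normally be queried from a database.
raw <- airquality

# Decision 1 (recorded in code): drop records with missing ozone values.
clean <- raw[!is.na(raw$Ozone), ]

# Decision 2: exclude ozone readings above a threshold as outliers.
# The value 150 is arbitrary and used only for illustration.
outlier_threshold <- 150
clean <- clean[clean$Ozone <= outlier_threshold, ]

# Table: mean ozone by month.
monthly <- aggregate(Ozone ~ Month, data = clean, FUN = mean)

# Analysis: ozone as a function of temperature.
fit <- lm(Ozone ~ Temp, data = clean)

# Figure, written to a file so the script can be rerun unattended.
fig_file <- file.path(tempdir(), "ozone_by_temp.pdf")
pdf(fig_file)
plot(Ozone ~ Temp, data = clean, xlab = "Temperature", ylab = "Ozone")
abline(fit)
dev.off()
```

Rerunning the script after changing the threshold, or after more data arrive, regenerates the table, model, and figure with no manual steps to remember.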
But I'm a Busy Resource Manager, Why Should I Care?
Perhaps you shouldn't care. But cleaning, analyzing, and reporting results require one-quarter to one-third of the total time and effort of both ecological science in general and NPS Inventory & Monitoring in particular. If you consider an average of 8-10 parks per I&M network and 10 vital signs per park, I&M networks simply cannot devote that much effort each year to generating annual reports. Routine reporting must occur, but as much of the repetitive work as possible must be automated, so that network folks can keep up with the workload and have time to devote to occasional larger syntheses. R code, and a bit of thought put into that R code, can make generation of tables, figures, and analyses for annual reports as simple as appending the current year's data onto the cumulative dataset in a database and running a script (from previous years) in R. Sweave and its MS Office cousins R2wd, odfWeave, and Sword have the potential to embed properly formatted tables, figures, and any other R object (e.g., years or dates from the database) in the correct places in a document template that has section headings and boilerplate text, allowing the author to focus on writing just the short interpretation and discussion of the results. Done right, the power of coding can speed the repetitive tasks and get us more time out in the field. [Alas, that hasn't actually happened for me; I just have more time for more tasks.]
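The "append this year's data, rerun last year's script" idea can be sketched in a few lines of base R. Everything named here is hypothetical: the column names (`park`, `year`, `count`), the data values, and the `make_annual_table()` helper are made up for illustration, and a real network would read the cumulative dataset from its database.

```r
# Hypothetical cumulative dataset from previous years.
cumulative <- data.frame(
  park  = c("A", "B", "A", "B"),
  year  = c(2011, 2011, 2012, 2012),
  count = c(12, 7, 15, 9)
)

# Hypothetical records for the current year.
current_year <- data.frame(
  park  = c("A", "B"),
  year  = c(2013, 2013),
  count = c(14, 11)
)

# Step 1: append the current year's records to the cumulative dataset.
cumulative <- rbind(cumulative, current_year)

# Step 2: rerun the same reporting code used in previous years.
make_annual_table <- function(dat) {
  # Cross-tabulate total counts: one row per park, one column per year.
  xtabs(count ~ park + year, data = dat)
}
annual_table <- make_annual_table(cumulative)
print(annual_table)
```

Because the reporting function never mentions a specific year, the same script produces next year's table once the new records are appended.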
Basic Resources for R
| Resource | Description |
| --- | --- |
| R project home | Core R project page. Main R site for all operating systems and resources. Use the next link to obtain the current Windows version of R. |
| R for Windows | Download the latest Windows version via this link. |
| Rstudio | An Integrated Development Environment for R, with some nice hooks for reproducible reporting via Sweave and knitr. |
| | R GUI and text editor from sciviews.org; rough equivalent to Rstudio, with the ability to submit R code embedded in Sweave. |
| Rcmdr package | R GUI developed by John Fox. Handles a variety of routine parametric and non-parametric tests, and facilitates production of a range of graphics. Makes R accessible to the casual user, and easily extensible via "plugins" for additional packages or for specific courses. |
Environmetrics Task View: R for Analysis of Ecological and Environmental Data
The R community's Environmetrics task view is a compendium of packages aimed at environmetrics or ecological analyses. It includes population estimation, species distribution modeling, ordination, time series, TREE, extreme value analysis, and related topics. An excellent resource. There is also a new R mailing list (R-sig-eco) specifically for ecological and environmental analysis.
Getting Help in R
Aside from the documentation available on the CRAN website, there are at least 2 tools that are better than a raw Google search. My first step often is to go to:
Second, I have Jonathan Baron's R search page bookmarked:
It searches most of the R online help, plus the archives of the R-help mailing lists.
Third, if I know what I want to do but don't know which package might do what I need, I sometimes go to the list of R packages
and search in the webpage for the text I want (e.g., Oracle or SAS).
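R itself also ships with search functions that complement the web tools above. The sketch below uses `help.search()`, `apropos()`, and `RSiteSearch()` from base R (functions not mentioned in the original pages); the search terms are arbitrary examples.

```r
# Search the help pages of installed packages by keyword
# (equivalent to typing ??"linear model" at the prompt).
hits <- help.search("linear model")

# List objects on the search path whose names match a regular expression.
readers <- apropos("^read\\.")

# Search online R documentation; this opens a web browser and needs an
# internet connection, so it is left commented out here.
# RSiteSearch("netCDF")
```

Note that `help.search()` and `apropos()` only see packages already installed or attached on your machine, so they do not replace browsing the list of R packages when you are looking for something new.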
Many universities offer courses on R; several offer courses on using R in natural resources.
The National Center for Ecological Analysis and Synthesis (nceas.ucsb.edu) offers no-nonsense training in R for ecologists with strong quantitative and programming backgrounds. Unfortunately, their course web pages, which were a wonderful resource, have moved again, so you may have to poke around a bit on their Scientific Computing website.
UCLA ATS has a growing set of R (and other statistical computing) pages at:
Tomislav Hengl's website on Spatial Analysis in R includes a list of training courses in R:
r-bloggers.com is a syndication of over 400 R bloggers, including several focused on applications in biology, ecology, climate, and other natural resource topics.
Some day I will compile a set of links to R resources, at least those I use enough to bookmark.