National Park Service

Inventory & Monitoring (I&M)

Topic 2(a): Spatial Data in R

Please direct questions and comments about these pages, and the R-project in general, to Dr. Tom Philippi.

Introduction

While today's topic is spatial data in R, this web page and the webinar can only be a guide to getting started. We can no more teach you everything about spatial data in R in 2 hours than we could teach you ERDAS/Imagine (or ARC/Gis) in 2 hours. In addition to the time constraints, we don't know all of R for spatial data, just like we don't know all of ERDAS or ARC. Therefore, our more modest goal for this session is to introduce you to the R tools we use with spatial data, provide examples of tasks we perform in our duties for NPS natural resources that might get you started on your needs, and give a few pointers toward what else is available and places to find more information. In terms of "learning outcome", at the end of this session you should know what R tools are available to read, write, and process spatial data, have example code for several natural resources applications, and know where to go to look for what you need for your own applications.


Why use R for spatial data?

"R is software for programming with data" provides a pretty good hint on when you might want to use R instead of a GIS tool such as ARC/Gis, Grass, or SAGA. {Note that mention of specific software and commercial products is not an endorsement of them by me, NPS, DOI, or anyone.} I use these spatial tools in R when:

  • I am performing substantial number crunching, such as testing for intersections between millions of polygons of species geographic ranges and hundreds of polygons of NPS and FWS units. R lets me custom-code shortcuts such as first testing for intersections of bounding boxes, then testing point in polygon, and only rarely needing to brute-force vertex by vertex polygon intersection tests, which can be much faster than iterating more general-purpose polygon tools.
  • I am working with very large rasters that cannot fit in memory.
  • I will be doing other computations on the data in R, so I might as well keep it all in one tool and script.
  • I need to do careful or fancy spatial prediction, and I need something more powerful and flexible than ordinary kriging. Many of the newest and most advanced spatial statistical tools for prediction (combining spatial neighbors and covariates), hotspot identification, point pattern analysis, etc., are only available in R. Also, R has a pretty complete suite of approaches to habitat suitability modeling.

Conversely, I would not use R for any but the most rudimentary mapping and not at all for serious cartography (if I had serious cartography skills).

An alternative for ARC/GIS folks is to use the Geospatial Modeling Environment (successor to Hawth's Tools), which provides some nice extensions to ARC analysis tools, and the ability to run arbitrary R code from within ARC/Map.


Before the Webinar

You can install all of the packages we will use this week with the following command:

install.packages(c("rgdal","raster","R.utils","maptools","spsurvey"),dep=TRUE)

If you wish to download the data files and R code to play along at home, I have posted them on the NPS R website:

CABR data (DEM, Boundary, Slope, Aspect, Vegetation Map Polygons)

PRISM data from Colorado (for John Gross' raster examples. Extract the .gz files into a "prism/t_min" directory for John's code.)

Tom's R code using gdal, overlay, etc.

John's R code using raster

Tom's kmlPolygons.R function to generate labelled kml polygons for Google Earth & Google Maps


Spatial Objects

Internally, the R representations of spatial data are the same as vector or raster GIS data; R just makes it easier to see the structure and manipulate individual components. If you have had an introductory course on GIS principles, these representations should all make sense to you.

Spatial data in R are handled in complex object classes. The sp package defines a pretty complete set of spatial classes, from points to lines to polygons (each possibly with attributes) to grids and pixels. More complex classes are built from or extend the simple classes, and functions for manipulating components have appropriate methods defined for all of these classes. The raster package extends the sp classes to include RasterLayer, RasterStack, and RasterBrick, and provides tools for automatic tiling of raster objects too large to fit into memory. The trip and tripEstimation packages define classes and tools for space-time data such as animal tracking.

All spatial objects have 2 slots or components in common: a bounding box (or extent in raster) and a coordinate system or CRS. All useful spatial classes have additional slots: lists of coordinates for points, coordinates of vertices for polygons, descriptions of dimensions and a matrix of values for rasters, etc. The str() function returns the structure of an object.

class(VegMap)
str(VegMap)

The bounding box of an object is usually computed on the fly (as are other slots such as centroids for label positioning). The coordinate reference system must either be read or set explicitly on input spatial objects, and defaults to NA. Then, because all spatial operations I can think of require objects in the same CRS, if the result of a spatial operation is a spatial object, it inherits the same CRS as the input objects.

CRS uses proj4 strings to define projections & coordinate reference systems. I know of no comprehensive list to crosswalk from ESRI projection names to proj4 strings. If you know what you are doing with the parameters, you can roll your own: the proj4string() function on the left of an assignment statement can be used to set the CRS of a spatial object. However, when the readOGR() and readGDAL() functions read a file with projection information (.prj for a shapefile, embedded in .img files, etc.), the resulting R object has the correct CRS proj4string. spTransform() reprojects spatial objects; the target CRS is usually the CRS of another R spatial object, and can be set accordingly.

#VegMap <- read.shapefile("Vegetation")
VegMap <- readOGR(".",layer="Vegetation")
proj4string(VegMap) # NAD83

AmphZone27 <- readOGR("Other_Layers","Amph_Eco_Zns")
proj4string(AmphZone27) # NAD27
AZone <- spTransform(AmphZone27,CRS(proj4string(VegMap))) # reproject to CRS of VegMap

Back to the VegMap SpatialPolygonsDataFrame object: it has a single data.frame in its @data slot, the attribute table, with one row for each of the 800 vegetation polygons.

The second slot is a list of the (800) features, each of class Polygons. Each of those objects of class Polygons holds a list of objects of class Polygon (in this file each feature is a single contiguous polygon without a hole). Each object of class Polygon has labpt (the coordinates of a centroid for plotting labels), an area, a logical hole flag for whether that ring is the outer boundary or delineates a hole, a ringDir, and a two-column numeric matrix with the coordinates of the ordered vertices. The objects of class Polygons also include a plotOrder, labpt coordinates, an ID, and an area.

The third slot of the SpatialPolygonsDataFrame object is an integer vector of the plotting order of the 800 Polygons.

The fourth slot of the SpatialPolygonsDataFrame object is a bbox: a 2x2 matrix with named dimensions giving the min & max of the X and Y coordinates.

Finally, the fifth slot is the proj4string, of class CRS.

One could think of a SpatialPolygonsDataFrame of US states: the attribute table has one row for each state or territory. Each state's geometry can consist of multiple polygons (think of the islands in Alaska or California, or the upper peninsula of Michigan). A state might even have a hole in it.

If necessary, you can access the innards of such complex objects, and extractor functions such as bbox() can be applied to the entire SpatialPolygonsDataFrame (where it simply returns the bbox slot), or to a Polygons (e.g., a state including islands) or to an individual polygon. Extra care is necessary to ensure that you're grabbing the component of the object at the right level of nesting. [I'm not sure what the bounding box is for a donut hole.]
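As a sketch of that nested access (using the VegMap object read above; the exact attribute columns will vary with your data):

```r
bbox(VegMap)                 # bbox of the whole SpatialPolygonsDataFrame
head(VegMap@data)            # the attribute table: one row per feature
p1 <- VegMap@polygons[[1]]   # first feature, an object of class Polygons
ring1 <- p1@Polygons[[1]]    # its first ring, an object of class Polygon
ring1@labpt                  # label point (centroid) coordinates
ring1@hole                   # TRUE if this ring delineates a hole
head(ring1@coords)           # ordered vertex coordinates (2 columns)
```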

I used this access to code my fast computation of which species geographic ranges (each species range was a shapefile with 1 to ~10,000 individual polygons) overlap which NPS units and FWS refuges (each of which was 1 or a reasonable number of simplified polygons). If the bounding box of a species range did not overlap CONUS, I knew that none of the NPS units within CONUS were overlapped, and could exclude them all at once. If the bounding box of the species overlapped CONUS, I tested it against the bbox of each NPS unit; again, if the bboxes didn't overlap I was done. If those bboxes did overlap, I could test the bboxes of each component polygon of the species range. If I had an overlap, I switched gears to use point in polygon, which is a very fast computation (the function I used in R drew a vector from the point to infinity and counted the number of crossings of the polygon boundary: odd is in, even is out). I had generated a list of one point within each NPS unit polygon (starting from the labpt coordinates, but adjusting for odd-shaped units like BLRI), which I used against each of the component polygons. If I had a hit, then I knew that species range overlapped that NPS unit, and I was done. Only if my selected point for each NPS unit was outside of every species range polygon did I need to use the brute-force polygon-polygon intersection test.
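A minimal sketch of that cascade for one species range and one unit (the function names and the fallback are hypothetical illustrations, not the production code; point.in.polygon() is the crossing-number test from package sp):

```r
library(sp)

# Do two bounding boxes (2x2 matrices as returned by bbox()) overlap?
bboxOverlap <- function(b1, b2) {
  !(b1[1, 2] < b2[1, 1] || b2[1, 2] < b1[1, 1] ||   # separated in x
    b1[2, 2] < b2[2, 1] || b2[2, 2] < b1[2, 1])     # separated in y
}

# rangePolys: SpatialPolygons for one species range
# unitPoly:   SpatialPolygons for one NPS unit
# unitPt:     c(x, y) for a point known to lie inside the unit
rangeHitsUnit <- function(rangePolys, unitPoly, unitPt) {
  if (!bboxOverlap(bbox(rangePolys), bbox(unitPoly))) return(FALSE)  # cheap reject
  for (p in rangePolys@polygons) {                   # each feature of the range
    crd <- p@Polygons[[1]]@coords
    # fast point-in-polygon (crossing-number) test
    if (point.in.polygon(unitPt[1], unitPt[2], crd[, 1], crd[, 2]) > 0)
      return(TRUE)
  }
  # only now fall back to brute-force polygon-polygon intersection,
  # e.g. rgeos::gIntersects(rangePolys, unitPoly)
  NA
}
```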


Reading Spatial Objects

The spatial task view on CRAN lists perhaps a dozen packages for reading and writing spatial data. In the past I have used specific functions in maptools, shapefiles, and PBSmapping. However, I strongly recommend using the rgdal package for pretty much everything, with the exception of kml files and possibly NetCDF. Package rgdal provides bindings to the open-source GDAL (Frank Warmerdam's Geospatial Data Abstraction Library, for raster data), proj4, and OGR (the complement to GDAL for vector data). readGDAL() and readOGR() provide seamless access to raster and vector file formats. The only wrinkle is that specifying directories versus file names can be tricky, as the GDAL & OGR drivers for different file formats interpret dsn and layer differently (sometimes as directory and file, sometimes as file and layer within file).

readOGR(dsn, layer, verbose = TRUE, p4s=NULL,
        stringsAsFactors=default.stringsAsFactors(),
        drop_unsupported_fields=FALSE, input_field_name_encoding=NULL,
        pointDropZ=FALSE, dropNULLGeometries=TRUE,
        useC=TRUE, disambiguateFIDs=FALSE)

VegMap <- readOGR(".", layer="CABR_VegMap_Draft_June_2011") # ESRI shapefile

readGDAL(fname, offset, region.dim, output.dim, band, p4s=NULL, ...,
         half.cell=c(0.5, 0.5), silent = FALSE)

Elev <- readGDAL("ned3m.img") # ERDAS Imagine .img file

The biggest shortcoming is that there is currently no tool in R for pulling data from an ESRI v10 geodatabase. As far as I know, the API for these geodatabases is still proprietary, so no one could write such a function. My workaround is to grab what I need from the geodatabase in ARC/Map, then export that feature to a shapefile. Yes, that's a pain when I need separate boundaries for each NPS unit, but I can export everything as a single shapefile, then use R to split it into separate objects.
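For the splitting step, one possible sketch (the shapefile name "NPS_Units" and the UNIT_CODE attribute are hypothetical; substitute your own layer and column names):

```r
library(rgdal)
# "NPS_Units" = a single shapefile exported from the geodatabase,
# with a UNIT_CODE attribute identifying each park
units <- readOGR(".", layer = "NPS_Units")
# split row indices by unit code, then subset: one
# SpatialPolygonsDataFrame per unit, in a named list
unitList <- lapply(split(seq_len(nrow(units)), units$UNIT_CODE),
                   function(i) units[i, ])
names(unitList)
```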

As a special case, there are several packages for working with either map tiles or kml/kmz files from Google maps and Google Earth: ggmap, googleVis, plotGoogleMaps, and RgoogleMaps that I know of.


Creating Spatial Objects

One can easily compute a set of x,y coordinates in R, or a matrix of values that corresponds to a grid of spatial points. The coordinates() function on the left side of an assignment sets the coordinates of an R data.frame object and converts it into a SpatialPointsDataFrame object. The gridded() function can convert a SpatialPointsDataFrame object into a SpatialPixelsDataFrame object.
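A small sketch of that promotion chain, using made-up values on a regular grid:

```r
library(sp)
# compute values on a regular grid of x,y coordinates
d <- expand.grid(x = seq(0, 1, by = 0.1), y = seq(0, 1, by = 0.1))
d$z <- sin(pi * d$x) * cos(pi * d$y)
coordinates(d) <- ~ x + y   # data.frame -> SpatialPointsDataFrame
class(d)
gridded(d) <- TRUE          # SpatialPointsDataFrame -> SpatialPixelsDataFrame
class(d)
```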


Writing Spatial Objects

I recommend the writeOGR(), writeGDAL(), and create2GDAL() functions in the rgdal package.

writeGDAL(dataset, fname, drivername = "GTiff", type = "Float32",
    mvFlag = NA, options=NULL, copy_drivername = "GTiff", 
    setStatistics=FALSE)  

create2GDAL(dataset, drivername = "GTiff", type = "Float32", mvFlag = NA,
    options=NULL, fname = NULL, setStatistics=FALSE)
            
writeOGR(obj, dsn, layer, driver, dataset_options = NULL,
    layer_options=NULL, verbose = FALSE, check_exists=NULL,
    overwrite_layer=FALSE, delete_dsn=FALSE)


Creating kml files for Google Earth / Google Maps

Note that writeOGR() can even write points, lines, and polygons to kml files:

VegMap.kml <- spTransform(VegMap, CRS("+proj=longlat +datum=WGS84"))
writeOGR(VegMap.kml["Assn_Code"],"CABR_VegMap.kml","Assn_Code","KML")

I have written a wrapper function (kmlPolygons.R) to simplify semi-transparent colors and labels on kml polygons.


Spatial Overlays and Extractions

Package sp includes a function over(x,y) for spatial overlays or extraction, with methods for most combinations of spatial classes of x and y; roughly, at each observation or row in x, grab information from y.

?over shows the following methods:

x = "SpatialPoints", y = "SpatialPolygons"

returns a numeric vector of length equal to the number of points; the number is the index (number) of the polygon of y in which a point falls; NA denotes the point does not fall in a polygon; if a point falls in multiple polygons, the last polygon is recorded.

x = "SpatialPointsDataFrame", y = "SpatialPolygons"

equal to the previous method, except that an argument fn=xxx is allowed, e.g. fn = mean which will then report a data.frame with the mean attribute values of the x points falling in each polygon (set) of y

x = "SpatialPoints", y = "SpatialPolygonsDataFrame"

returns a data.frame of the second argument with row entries corresponding to the first argument

x = "SpatialPolygons", y = "SpatialPoints"

returns the polygon index of points in y; if x is a SpatialPolygonsDataFrame, a data.frame with rows from x corresponding to points in y is returned.

x = "SpatialGridDataFrame", y = "SpatialPoints"

returns object of class SpatialPointsDataFrame with grid attribute values x at spatial point locations y; NA for NA grid cells or points outside grid, and NA values on NA grid cells.

x = "SpatialGrid", y = "SpatialPoints"

returns grid values x at spatial point locations y; NA for NA grid cells or points outside the grid

x = "SpatialPixelsDataFrame", y = "SpatialPoints"

returns grid values x at spatial point locations y; NA for NA grid cells or points outside the grid

x = "SpatialPixels", y = "SpatialPoints"

returns grid values x at spatial point locations y; NA for NA grid cells or points outside the grid

The following object combinations do not have methods defined for over:

x = "SpatialPoints", y = "SpatialGrid"

x = "SpatialPoints", y = "SpatialGridDataFrame"

x = "SpatialPoints", y = "SpatialPixels"

x = "SpatialPoints", y = "SpatialPixelsDataFrame"

x = "SpatialPolygons", y = "SpatialGridDataFrame"


Thus, if I have a set of spatial points (draw, class SpatialPointsDataFrame) and I want to know the value of a raster at each point or the label (actually, all attributes) of the polygon containing each point, I can simply use the over function:

# Vegetation & Amphibian Zone
AmphZone <- over(draw,AZone)$REGION
VegZone <- over(draw,VZone)$REGION
# Vegetation Alliance & Association
# This has all attributes from the Vegetation shapefile: Alliance Association
# be careful!!!! The merge() function reorders the data
# extract attributes from vegetation shapefile at each GRTS point
tempVeg <- over(draw,VegMap)
Alliance <- tempVeg$Alliance

Elev <- over(draw,DEM10)$band1
Aspect <- over(draw,aspect10)$band1
Slope <- over(draw,slope10)$band1
DistRoad <- over(draw,dRoads)$band1
DistTrail <- over(draw,dTrails)$band1
DistMin <- pmin(DistRoad,DistTrail)

That example is testing the properties of various GRTS draws for SAMO vegetation: does equal probability get what we need, or do we need to use strata or unequal inclusion probabilities? How much travel time is there, based on distances from roads and trails? More Thursday.

Example with CABR vegetation map, slope, elevation, and the locations of the herp monitoring arrays.


Raster Computations

Rasters are much more compact than vector points, because the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster has a CRS, an origin, a distance or cell size in each direction, a dimension in terms of numbers of cells, and an array of values. If necessary, the coordinates for any cell can be computed. Beyond this compactness, note that many useful computations can be performed on the matrix without reference to the coordinates: think NDVI or cloud masking or even edge detection.

The raster package includes object classes for simple raster layers, RasterStacks, and RasterBricks, functions for converting among these classes, and operators for computations on the raster data. [The data can be pulled out and operated on as a multidimensional array, then the resulting array converted back to a raster if the spatial dimensions remain the same.] In addition, the raster package provides tools for the usual utilities for handling rasters: merge, mosaic, crop, resample, and projectRaster.
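A brief sketch of a couple of those utilities on a toy raster (random values; CRS handling omitted for brevity):

```r
library(raster)
r <- raster(nrow = 100, ncol = 100, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- runif(ncell(r))
rc <- crop(r, extent(2, 8, 2, 8))         # clip to a sub-extent
ra <- aggregate(r, fact = 5, fun = mean)  # coarsen the resolution 5-fold
res(r); res(ra)
# projectRaster(r, crs = ...) would both reproject and resample the values
```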

?raster


Package raster documentation

Hijmans, R.J.  2011.  Introduction to package raster.  cran.r-project.org/web/packages/raster/vignettes/Raster.pdf (but it may be best to search for the most recent version).

Bivand, R.S., E.J. Pebesma and V. Gomez-Rubio, 2008. Applied Spatial Data Analysis with R. Springer. 378p.


Why use R raster?

  • Often much faster than ArcGIS
  • Can work with data sets/files too large to load into memory. 
  • Keep data in one program – i.e., summarize raster data to e.g. layer means, feed directly into further analyses
  • Doesn’t require a GIS
  • For R users, don’t need to learn another program or language

Who this intro is for:

  • Existing R users who typically struggle with e.g. data subsetting
  • Folks who routinely rely on printed and on-line help

Probably not useful for:

  • R users that find the documentation really helpful and complete
  • R users that write scripts that run the first time
  • People that really understand R objects and aren’t confused by S3 objects, S4 objects, lists, data frames etc.

If any of above three describes you, just download the package, read the Intro, and have at it.

Key raster concepts

Four data objects of primary interest (but there are many others):

  • Raster layer – single layer
  • Raster stack – more than one layer with same extent and resolution. Can be from different sources, with some in memory and others in file(s).
  • Raster brick – similar to stack, but only from one file. E.g., climate model results, multi-band satellite data
  • extent – xmin, xmax, ymin, ymax.

Basic raster functions example:

rm(list = ls())
windows(record = TRUE)

      # create three identical RasterLayer objects
r1 <- r2 <- r3 <- raster(nrow=10, ncol=10)

      # Assign random cell values
n <- ncell(r1)
values(r1) <- runif(n)
values(r2) <- c(rep(3, n/2), rep(5, n/2))
values(r3) <- runif(n)
r3[] <- runif(n)        # same as previous line

# combine three RasterLayer objects into a RasterStack
s <- stack(r1, r2, r3)
s
plot(s)

      # now change extent
      # also see setExtent() for more flexibility
e <- extent(s)
e
e <- extent(0,5,50,60)
extent(s) <- e
plot(s)

      # simple raster math
r1 <- r1+10
rsqrt <- sqrt(r1)
extent(r1)
extent(rsqrt)

s2 <- stack(r1,rsqrt)
plot(s2)

# extract a layer
lyr <- raster(s2, layer=2)
"plot layer")

# summary stats – “Summary methods” in documentation:  mean,
# median, min, max, range, prod, sum, any, all.  These always
# return a raster layer.

mean(s2)

#  cellStats returns single number for each layer.

cellStats(s2,"mean") 


Raster for climate data analyses

Example 1 - PRISM gridded.
Historical data for the lower 48 states are available on the web from 1895 to present at 4 km resolution.

  • Variables are tmin, tmax, precipitation, and tdmean (dew point)
  • Need to extract and process files. Code for this is at the end of this file. This takes a long time – we won’t do it in the Webinar

As example, read in one year of data – run code to “END OF YEAR DEMO”

What’s difference in tmin between 1900 and 2010?

Usually want to compare multi-year averages.  We’ll look at Colorado for circa 1900 and 2000.

Reading climate projection data from NetCDF files.

Not in this presentation, but worth knowing about:
Neighborhood (focal) functions (RasterLayer only)

  • focal – rectangle neighborhood
  • focalFilter – user defined weights (weight can include 0)
  • focalNA – calculate values only for NA (i.e., fill rasters)
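As a small sketch of a focal computation (note that the focal() interface has changed across raster versions; this matrix-of-weights form computes a 3x3 moving mean):

```r
library(raster)
r <- raster(nrow = 20, ncol = 20)
values(r) <- runif(ncell(r))
# 3x3 rectangular neighborhood; weights sum to 1, so this is a mean filter
rsmooth <- focal(r, w = matrix(1/9, nrow = 3, ncol = 3))
```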


Links and Extensions

The Spatial Task View on CRAN lists packages for spatial data, and provides links to the rspatial pages (documentation & graphics galleries) and spatial R wikis

Rgeos (now ASU GeoDa Center)

r-sig-geo mailing list archives

Bivand, R.S., E.J. Pebesma and V. Gomez-Rubio, 2008. Applied Spatial Data Analysis with R. Springer. 378p. Book website with code, data, errata

Hijmans, R.J.  2011.  Introduction to package raster.  (cran.r-project.org/web/packages/raster has 3 vignettes and a link to the r-forge page for raster).

My (incomplete) list of spatial and habitat packages


Last Updated: December 30, 2016