During my first project that involved manipulating big files containing spatial data (shapefiles, to be more precise), I couldn’t find a good tutorial to help me understand how to handle the structure of the data. It was overwhelming and frustrating, which is why I’m writing this tutorial explaining shapefiles and how to work with them in R using ggplot. Hopefully it helps others who are in the situation I was in.
Read data
Let’s start by reading the data. When you have a shapefile (or, more precisely, a group of files inside a folder that together represent spatial points or polygons), the easiest way to read it is with the rgdal package.
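    library(rgdal)
    # Read the shapefile: dsn is the folder that holds the files and layer is the
    # shapefile name without extension (both values below are placeholders)
    Map <- readOGR(dsn = 'path/to/folder', layer = 'layer_name')
When you are dealing with a big file this might take a while, which is obviously frustrating. Fortunately, R has a way to save the Map object in a format that makes loading the data much faster the next time.
    # Save the data as an R object
    saveRDS(Map, 'Map.RDS')
    # From now on you can load the shapefile using this line
    Map <- readRDS('Map.RDS')
If you want to reproduce the maps shown in this tutorial you can download the data here.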
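The save-once, load-fast pattern above can be wrapped in a single function so the slow read only ever happens the first time. This is a minimal sketch: read_cached is a hypothetical helper name, and in the usage below a tiny data frame stands in for the shapefile so the example runs without rgdal.

```r
# Hypothetical helper (not from any package): return the cached object if the
# .RDS file exists, otherwise call reader(), cache its result and return it.
read_cached <- function(cache_file, reader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))     # fast path: reuse the saved copy
  }
  obj <- reader()                   # slow path, e.g. reading the shapefile
  saveRDS(obj, cache_file)
  obj
}

# Stand-in reader that counts its calls; a small data frame replaces the shapefile
calls <- 0
slow_reader <- function() {
  calls <<- calls + 1
  data.frame(id = c("0", "1"), Population = c(1200, 5400))
}

cache_file <- file.path(tempdir(), "Map_example.RDS")
unlink(cache_file)                              # start without a cache
first  <- read_cached(cache_file, slow_reader)  # calls slow_reader()
second <- read_cached(cache_file, slow_reader)  # served from the .RDS file

# With the real data this would look like (folder and layer are placeholders):
# Map <- read_cached('Map.RDS', function() readOGR(dsn = 'path/to/folder', layer = 'layer_name'))
```

The second call never touches the reader, which is exactly the speed-up saveRDS/readRDS gives you; delete the .RDS file whenever the underlying shapefile changes.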
Once the data is loaded you might want to take a look at Map. Since the class of Map is SpatialPolygonsDataFrame, you will find that it contains 5 slots (data, polygons, bbox, plotOrder and proj4string). To access any of these slots you have to use @ instead of $. Here is an example:
    # Check the coordinate system used
    Map@proj4string
    # Check the data associated with each spatial element
    head(Map@data)
Transform data
Now that the data is loaded we can manipulate it to create a map using ggplot2.
    # Transform the data into the desired coordinate system
    # (longitude/latitude on the WGS84 datum, the coordinates Google Maps backgrounds use)
    Map <- spTransform(Map, CRS('+proj=longlat +datum=WGS84'))
    # Transform the spatial object into a data frame that ggplot2 can use
    library(ggplot2)
    Map_draw <- fortify(Map)
The first step is required if you want to combine the map with other spatial elements such as Google Maps; otherwise you can omit it. The second step transforms the spatial data of the object into a data frame so we can use it in ggplot, although it drops any additional information that might be valuable for the analysis.
Let’s join the lost data back. Most of the time this is the data you are actually interested in: it holds the metrics that will give real value to your map.
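To see what the conversion to a data frame and the join do without the real shapefile, here is a minimal sketch on a made-up SpatialPolygonsDataFrame with two squares, built with the sp package. The helper polygons_to_df is hypothetical: it only mimics the long, lat, order, hole, piece, id and group columns that fortify() produces by walking the polygons slot, and it is not ggplot2's actual implementation. The coordinates and populations are invented.

```r
library(sp)

# Made-up geometry: a unit square traced clockwise (sp's convention for outer rings)
square <- function(x0, y0) {
  cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(y0, y0 + 1, y0 + 1, y0, y0))
}
polys <- SpatialPolygons(list(
  Polygons(list(Polygon(square(0, 0), hole = FALSE)), ID = "0"),
  Polygons(list(Polygon(square(2, 0), hole = FALSE)), ID = "1")
))
# Row names of the attribute table must match the polygon IDs
Map <- SpatialPolygonsDataFrame(
  polys,
  data.frame(name = c("A", "B"), Population = c(1200, 5400), row.names = c("0", "1"))
)
slotNames(Map)  # the five slots, reached with @

# Hypothetical fortify()-like flattening: one row per vertex
polygons_to_df <- function(sp_obj) {
  pieces <- list()
  for (feature in sp_obj@polygons) {       # one Polygons object per feature
    rings <- feature@Polygons
    for (k in seq_along(rings)) {          # one Polygon (ring) per piece
      xy <- rings[[k]]@coords
      pieces[[length(pieces) + 1]] <- data.frame(
        long = xy[, 1], lat = xy[, 2],
        order = seq_len(nrow(xy)),         # vertex order within the ring
        hole = rings[[k]]@hole, piece = k,
        id = feature@ID, group = paste(feature@ID, k, sep = ".")
      )
    }
  }
  do.call(rbind, pieces)
}

Map_draw <- polygons_to_df(Map)            # attributes are gone at this point
# Join the attributes back through the shared id, then restore drawing order
Map@data$id <- rownames(Map@data)
Map_draw <- merge(Map_draw, Map@data, by = "id")
Map_draw <- Map_draw[order(Map_draw$group, Map_draw$order), ]
```

Every vertex of polygon "1" now carries Population 5400, which is what lets ggplot colour the whole polygon from one attribute.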
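    # Add Map@data to the Map_draw data frame; fortify() stores the polygon IDs,
    # which match the row names of Map@data, in the id column
    Map@data$id <- rownames(Map@data)
    Map_draw <- merge(Map_draw, Map@data, by = 'id')
    # merge() sorts the rows by id, so put each polygon's points back in drawing order
    Map_draw <- Map_draw[order(Map_draw$group, Map_draw$order), ]
Plot data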
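Before plotting the real data, a toy example helps show why the group aesthetic matters. This sketch uses invented coordinates and populations in a data frame shaped like Map_draw, and it swaps coord_map() for coord_equal() only so it runs without the mapproj package that coord_map() needs.

```r
library(ggplot2)

# Made-up data shaped like Map_draw: two squares, one polygon group each
toy <- data.frame(
  long  = c(0, 0, 1, 1, 0,   2, 2, 3, 3, 2),
  lat   = c(0, 1, 1, 0, 0,   0, 1, 1, 0, 0),
  group = rep(c("0.1", "1.1"), each = 5),
  Population = rep(c(1200, 5400), each = 5)
)

# group = group draws each square as its own polygon; without it, ggplot2
# would treat all ten points as one group and join the squares into one shape.
# fill = Population is mapped through a continuous gradient.
p <- ggplot() +
  geom_polygon(data = toy, aes(long, lat, group = group, fill = Population),
               color = 'gray') +
  scale_fill_gradient(low = 'lightblue', high = 'darkblue') +
  coord_equal()

# Inspect what ggplot2 will actually draw: two groups, two fill colours
built <- ggplot_build(p)$data[[1]]
```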
Now the data is ready to let ggplot do its magic: geom_polygon will bring to life the layer we want to draw.
    # coord_map() needs the mapproj package installed
    ggplot() +
      geom_polygon(data = Map_draw, aes(long, lat, group = group),
                   color = 'blue', fill = 'white') +
      coord_map()
So far the map doesn’t represent any data other than the shape. To add real value, let’s use the variable Population to fill each polygon, with scale_fill_gradient defining the colour scale.
    ggplot() +
      geom_polygon(data = Map_draw, aes(long, lat, group = group, fill = Population),
                   color = 'gray') +
      coord_map() +
      # Add the colour scale you want
      scale_fill_gradient(low = 'lightblue', high = 'darkblue')
Final details
Now the map shows where the most crowded areas are. To finish, let’s add some final details: we can add and remove a few elements to make the map much cleaner.
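- Delete the axes: in the context of a map, the axis labels are irrelevant.
    ditch_the_axes <- theme(
      axis.text = element_blank(),
      axis.line = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank(),
      panel.border = element_blank(),
      panel.grid = element_blank()
    )
- Download the map terrain and use it as the background instead of those ugly gray rectangles.
    library(ggmap)
    # Zoom should change depending on the range of your spatial data
    # (10 is only a placeholder; Google maps also require a key set with register_google())
    Background <- ggmap(get_map(location = c(lon = mean(Map_draw$long), lat = mean(Map_draw$lat)),
                                zoom = 10, maptype = 'terrain'))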
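The zoom comment only says the value depends on the range of your data. As a rough sketch, assuming the Web Mercator tiling used by Google Maps (a 256-pixel world map at zoom 0, doubling in width at each level) and an assumed 640-pixel-wide image, you can derive a starting zoom from the longitude span. padded_bbox and rough_zoom are hypothetical helpers, not ggmap functions, and the heuristic ignores how Mercator stretches latitudes; ggmap itself offers make_bbox() for the bounding-box part.

```r
# Hypothetical helper: bounding box of the data, padded by a fraction f per side
padded_bbox <- function(long, lat, f = 0.05) {
  dx <- diff(range(long))
  dy <- diff(range(lat))
  c(left = min(long) - f * dx, bottom = min(lat) - f * dy,
    right = max(long) + f * dx, top = max(lat) + f * dy)
}

# Hypothetical helper: largest zoom whose image still covers the longitude span.
# At zoom z an image width_px wide covers 360 * width_px / (256 * 2^z) degrees.
rough_zoom <- function(bbox, width_px = 640) {
  span <- bbox[["right"]] - bbox[["left"]]
  z <- floor(log2(360 * width_px / (256 * span)))
  as.integer(min(max(z, 3), 21))   # clamp to a sensible range of zoom levels
}

bb <- padded_bbox(c(0, 1), c(0, 1))  # toy data spanning one degree
rough_zoom(bb)                       # returns 9
# With the real data: rough_zoom(padded_bbox(Map_draw$long, Map_draw$lat))
```

Treat the result as a first guess; if the polygons look cut off, step the zoom down by one.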
Now everything is ready to create a nice plot combining all the previous elements.
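    Plot <- Background +
      geom_polygon(data = Map_draw, aes(long, lat, group = group, fill = Population),
                   color = 'gray') +
      scale_fill_gradient(low = 'lightblue', high = 'darkblue') +
      ditch_the_axes
    Plot
When we add the background the plotting area changes, but we can fix that by controlling the axes to get our final plot.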
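The negative expand values work because, for a continuous ggplot2 scale, expand = c(mult, add) moves each end of the range outward by mult times the range width plus add; a negative mult therefore crops instead of padding. The arithmetic can be checked by hand with expanded_limits, a hypothetical helper (the scales package exposes the same calculation as expand_range()); the ranges below are made-up examples.

```r
# Hypothetical helper reproducing a continuous scale's expand = c(mult, add)
expanded_limits <- function(limits, mult, add = 0) {
  width <- diff(limits)
  c(limits[1] - mult * width - add, limits[2] + mult * width + add)
}

# A 10-degree longitude window cropped by 20% of its width on each side
expanded_limits(c(-100, -90), mult = -0.2)   # -98 -92
# A 1-degree latitude window cropped by 24% on each side
expanded_limits(c(19, 20), mult = -0.24)     # 19.24 19.76
```

So expand = c(-.2, 0) keeps only the middle 60% of the x range, which is why small changes to these numbers zoom the final plot noticeably.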
    Plot +
      scale_x_continuous(expand = c(-.2, 0)) +
      scale_y_continuous(expand = c(-.24, 0))
I hope this tutorial was helpful. If you have any questions, leave them in the comments section.