During my first project that involved manipulating big files of spatial data (shapefiles, to be more precise), I couldn’t find a good tutorial that helped me understand how to handle the structure of the data. It was overwhelming and frustrating. That is why I’m writing this tutorial, which explains shapefiles and how to work with them in R using ggplot. Hopefully it helps others who are in the situation I was in.
Let’s start by reading the data. A shapefile is, more precisely, a group of files inside a folder that represent spatial points or polygons; reading it gives you the `Map` object used in the rest of this tutorial.
When you are dealing with a big file this might take a while, which is obviously frustrating. Fortunately, R can save the `Map` object in a format that makes loading the data much faster the next time.

    # Save the data as an R object
    saveRDS(Map, 'Map.RDS')
    # Now you can load the shapefile using this line
    Map <- readRDS('Map.RDS')

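The save-and-reload round trip can be tried without any spatial data. This is a minimal sketch using base R only; the toy list and the temporary file path are stand-ins for the real `Map` object and `'Map.RDS'`.

```r
# Toy object standing in for Map (hypothetical data)
obj <- list(name = "toy", values = 1:5)

# Write to a temporary file so nothing is left in the working directory
path <- tempfile(fileext = ".RDS")
saveRDS(obj, path)

# Read it back and check that nothing was lost in the round trip
restored <- readRDS(path)
identical(obj, restored)
```

Because `readRDS` returns the object instead of restoring it under its old name, you choose the variable name when you load it.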
If you want to reproduce the maps shown in this tutorial you can download the data here.
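If you don't have the data at hand, a stand-in shapefile is enough to follow along. The sketch below assumes the `sf` and `sp` packages are installed and uses the North Carolina sample shapefile that ships with `sf`. Reading through `sf::st_read` and converting with `as(x, "Spatial")` is a choice made for this sketch, not necessarily the call used for the maps in this tutorial, and the sample data has no `Population` column.

```r
library(sf)

# Path to the sample shapefile bundled with the sf package
path <- system.file("shape/nc.shp", package = "sf")

# Read the shapefile, then convert it to the sp class used in this tutorial
nc <- st_read(path, quiet = TRUE)
Map <- as(nc, "Spatial")

class(Map)
```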
Once the data is loaded you might want to take a look at it. Since it is a `SpatialPolygonsDataFrame`, you will find that each element of `Map` contains 5 slots. To access any of these slots you have to use `@` instead of `$`. Here is an example:

    # Check the coordinate system used
    Map@proj4string
    # Check the data associated with each spatial element
    head(Map@data)

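To see the slots without loading a large file, you can build a tiny object by hand. This is a sketch assuming the `sp` package is installed; the two unit squares, the `Toy` name and the `Population` values are made up, and no coordinate system is set.

```r
library(sp)

# Closed clockwise ring for a unit square starting at x0 (hypothetical helper)
square <- function(x0) {
  cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
}

# Two polygons with IDs "0" and "1"
polys <- SpatialPolygons(list(
  Polygons(list(Polygon(square(0))), ID = "0"),
  Polygons(list(Polygon(square(1))), ID = "1")
))

# Attribute table; row names must match the polygon IDs
attrs <- data.frame(Population = c(100, 250), row.names = c("0", "1"))
Toy <- SpatialPolygonsDataFrame(polys, attrs)

slotNames(Toy)                # slots of the whole object
Toy@data                      # attribute table, one row per polygon
slotNames(Toy@polygons[[1]])  # slots of a single polygon element
```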
Now that the data is loaded we can manipulate it so that we can create a map using `ggplot2`.

    # Transform the data into the desired coordinate system
    Map

The first step is required if you want to interact with other spatial elements such as Google Maps; otherwise you can omit it. The second step transforms the spatial data of the object into a data frame so that `ggplot` can use it, although it drops any additional information that might be valuable for the analysis.
Let’s join the lost data back. Most of the time this is the data you are actually interested in: it holds the metrics that will give real value to your map.

    # Add Map@data to the Map_draw data frame
    Map_draw$id

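One way to bring the attributes back is to look them up by polygon id. The sketch below uses base R only and made-up data: `Map_draw` is a hand-built stand-in for the data frame of vertices, and `attrs` stands in for `Map@data` with an `id` column added. It assumes the ids in both tables match, which you should check on your own data.

```r
# Stand-in for the vertex data frame: one row per vertex (hypothetical values)
Map_draw <- data.frame(
  long  = c(0, 0, 1, 1, 1, 1, 2, 2),
  lat   = c(0, 1, 1, 0, 0, 1, 1, 0),
  order = 1:8,
  id    = rep(c("0", "1"), each = 4),
  stringsAsFactors = FALSE
)
Map_draw$group <- paste0(Map_draw$id, ".1")

# Stand-in for Map@data with the polygon id as a column (hypothetical values)
attrs <- data.frame(
  id         = c("0", "1"),
  Population = c(100, 250),
  stringsAsFactors = FALSE
)

# Look up each vertex's polygon in the attribute table; row order is untouched
idx <- match(Map_draw$id, attrs$id)
Map_draw$Population <- attrs$Population[idx]

head(Map_draw)
```

A lookup with `match` is used here instead of `merge` because `merge` may reorder the rows, and the order of the vertices matters when the polygons are drawn.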
Now the data is ready to let ggplot do its magic: `geom_polygon` will bring to life the layer we want to draw.

    ggplot() +
      geom_polygon(data = Map_draw, aes(long, lat, group = group),
                   color = 'blue', fill = 'white') +
      coord_map()

So far the map doesn’t represent any data other than the shape. To add real value, let’s use the variable `Population` to fill each polygon, with `scale_fill_gradient` determining the colour scale.

    ggplot() +
      geom_polygon(data = Map_draw, aes(long, lat, group = group, fill = Population),
                   color = 'gray') +
      coord_map() +
      # Add the colour scale you want
      scale_fill_gradient(low = 'light blue', high = 'dark blue')

Now the map shows where the most crowded areas are. To finish, let’s add some final details. We can add and remove a few elements to make the map much cleaner.
- Delete the axes; in a map context the axis labels are irrelevant.
- Download the map terrain and use it as the background in place of those ugly gray rectangles.

    # Zoom should change depending on the range of your spatial data
    library(ggmap)
    Background

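The first item in the list, removing the axes, can be sketched with `theme()` from `ggplot2`. The single-square data frame `toy` is made up for illustration; on the real map you would add the same `theme()` call to your own plot.

```r
library(ggplot2)

# One square polygon (hypothetical data) so the sketch runs on its own
toy <- data.frame(long = c(0, 0, 1, 1), lat = c(0, 1, 1, 0), group = "a")

p <- ggplot() +
  geom_polygon(data = toy, aes(long, lat, group = group),
               color = "gray", fill = "white") +
  # Blank out axis titles, tick labels and tick marks
  theme(axis.title = element_blank(),
        axis.text  = element_blank(),
        axis.ticks = element_blank())

p
```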
Now everything is ready to create a nice plot combining all the previous elements.
When we add the background the plotting area changes, but we can fix that by controlling the axes to get our final plot.

    Plot +
      scale_x_continuous(expand = c(-.2, 0)) +
      scale_y_continuous(expand = c(-.24, 0))

I hope this tutorial was helpful. If you have any questions, leave them in the comments section.