Overview

In this short module I will provide an introduction to another mapping package: cartography. I prefer to work with tmap; however, the cartography package can be useful if you want to be able to produce a variety of thematic maps, including choropleth, proportional symbol, dot density, isoline, and flow maps. We will also work with data from Eurostat, which provides statistical information for European Union (EU) countries and other geographic units. These data can be accessed in R using the eurostate package.

In this first code block I am reading in the required packages.

library(eurostat)
library(cartography)
library(ggplot2)
library(dplyr)
library(RColorBrewer)
library(viridis)
library(sf)

Fetch Data

I would like to find data associated with renewable energy us in EU countries. The search_eurostat() function from the eurostate package allows you to search for data using keywords. Here, I am searching for data related to “renewable energy”. I am also specifying that I want data in a table format. I then print the results of the query.

query <- search_eurostat("renewable energy", type="table")
query$title [1] "Share of renewable energy in gross final energy consumption" [2] "Share of renewable energy in gross final energy consumption by sector" [3] "Share of renewable energy in gross final energy consumption" [4] "Share of renewable energy in gross final energy consumption" [5] "Share of renewable energy in gross final energy consumption by sector" [6] "Share of renewable energy in gross final energy consumption by sector" [7] "Share of renewable energy in gross final energy consumption by sector" query$code
[1] "t2020_31"    "sdg_07_40"   "t2020_31"    "t2020_rd330" "sdg_07_40"
[6] "sdg_07_40"   "sdg_07_40"  

I would like to access the data in the first table. This can be accomplished using the get_eurostat() function. I access the table using its unique ID. I also indicate that I want labels to be generated for each column as opposed to using the original codes so that the data are more easily interpreted. I also indicate that I want the data aggregated by year.

I use the glimpse() function to inspect the data. This is similar to summary(); however, it is generally used on tibbles within the tidyverse. The data contains 5 columns.

• unit: unit of measurement (percentages)
• indic_eu: what is being measured (percent renewable energy use)
• geo: the geographic units (countries)
• time: time period (year)
• values: the actual measures (Percent of energy consumption from renewable energy by country)
re_en <- get_eurostat(id="t2020_31", time_fomat="date", type="label", select_time="Y")
glimpse(re_en)
Observations: 517
Variables: 5
$unit <fct> Percentage, Percentage, Percentage, Percentage, Perce...$ indic_eu <fct> Share of renewables, Share of renewables, Share of re...
$geo <fct> Albania, Austria, Belgium, Bulgaria, Cyprus, Czechia,...$ time     <date> 2004-01-01, 2004-01-01, 2004-01-01, 2004-01-01, 2004...
$values <dbl> 29.620, 22.659, 1.891, 9.445, 3.069, 6.855, 6.181, 14... Next, I a generating a new column in which to extract just the year from time attribute. This is accomplished by reformatting the date information. For example, we are converting a date from 2017-01-01 to just 2017. re_en$year <- format(as.Date(re_en$time, format="%Y-%m-%d"),"%Y") Graph Data Before I map the data, I will explore it using ggplot2. In the next code block I am creating a box plot to show the distribution of renewable energy per year. I am also using the viridis package to define a color palette. Since we have already discussed ggplot2, I will not spend time here discussing the specifics. You should be familiar with the syntax. Feel free to manipulate the code to alter the figure. The graph suggests a slight rise in renewable energy use over time when measured at the country-level. Also, there appears to be a lot of variability in renewable energy use between countries. ggplot(re_en, aes(x=as.factor(year), y=values, fill=year))+ geom_boxplot(color= "#302f2d")+ ggtitle("Percent Renewable Energy by Year")+ labs(x="Year", y="Percent")+ scale_y_continuous(expand= c(0,0), breaks = c(10, 20, 30, 40, 50, 60, 70, 80), labels = c("10%", "20%", "30%", "40%", "50%", "60%", "70%", "80%"), limits = c(0, 80))+ scale_fill_viridis(discrete=TRUE)+ theme_classic()+ theme(legend.position="None") Let’s explore the trend for a single country. First, I use dplyr to extract all records for Hungary. hungary_data <- re_en %>% filter(geo=="Hungary") Next, I create a time series graph with lines and points to visualize the trend. This suggests a study increase until 2015 followed by a decrease. ggplot(hungary_data, aes(x=as.numeric(year), y=values))+ geom_line(color="#13502d", size=1)+ geom_point(color="#13502d", size=2.5)+ ggtitle("Percent Renewable Energy")+ labs(x="Year", y="Percent")+ scale_y_continuous(expand= c(0,0), breaks = c(4, 6, 8, 10, 12, 14, 16, 18), labels = c("4%", "6%", "8%", "10%", "12%", "14%", "16%", "18%"), limits = c(4, 18))+ scale_x_continuous(expand= c(.015,.015), breaks = c(2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018), labels = c("2004", "2006", "2008", "2010", "2012", "2014", "2016", "2018"), limits = c(2003, 2018))+ theme_classic() Let’s now compare multiple countries. First, I use dplyr to extract records associated with Hungary, France, or Finland. sub_data <- re_en %>% filter(geo=="Hungary" | geo == "France" | geo == "Finland") Graphing the data, we can see very similar trends and percentages for France and Hungary while Finland has a larger percentage of renewable energy use. ggplot(sub_data, aes(x=as.numeric(year), y=values, color=geo))+ geom_line(size=1)+ geom_point(size=2.5)+ ggtitle("Percent Renewable Energy")+ labs(x="Year", y="Percent", color="Country")+ scale_y_continuous(expand= c(0,0), breaks = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50), labels = c("5%", "10%", "15%", "20%", "25%", "30%", "35%", "40%", "45%", "50%"), limits = c(0, 50))+ scale_x_continuous(expand= c(.015,.015), breaks = c(2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018), labels = c("2004", "2006", "2008", "2010", "2012", "2014", "2016", "2018"), limits = c(2003, 2018))+ theme_classic() Map Data I will now move on to explore the data using maps. I will demonstrate the cartography package. However, similar visualizations can be created using tmap. I first need to prepare the data for use in map space. Here is the process. 1. Use the get_eurostate_geospatial() function to obtain the geographic data. Here, I am indicating that I want to return an sf object, that I want boundaries as of 2016, and that I want country boundaries (NUTS classification (Nomenclature of territorial units for statistics) level 0). The resolution argument is used to specify the desired simplification of the boundaries. 2. I then write the country codes to a new data frame. These codes are made available in the eurostate package as eu_countries. 3. I convert the codes to factors and store them in a new field called “geo”. 4. I use dplyr and the left_join() function to join the codes to the renewable energy data using the common “geo” field. 5. I filter out just data for 2015. 6. In the geographic data, I create a new field called “code”" in which to copy the “CNTR_CODE” data. This is so that I have a common field name on which to join the tabulated and spatial data. 7. I join the geographic data and tabulated data for 2015 using the common “code” field. 8. Lastly, I remove any records that have missing data in the values field. The data are now ready to be mapped. euro_sf <- get_eurostat_geospatial(output_class = "sf", nuts_level = "0", year="2016") c_codes <- eu_countries c_codes$geo <- as.factor(c_codes$label) c_data <- left_join(c_codes, re_en, by="geo") c_data_2015 <- c_data %>% filter(year==2015) euro_sf$code <- euro_sf\$CNTR_CODE
map_data <- left_join(euro_sf, c_data_2015, by="code")
map_data2 <- map_data %>% filter(!is.na(values))

Next, I us the st_crop() function from sf to extract out only geographic features that occur in a defined geographic extent as specified using latitude and longitude. This is because there are some islands in the data set that I would like to exclude from the analysis. We only want countries on the continent, British Isles, or in the Mediterranean.

map_data3 <- st_crop(map_data2, xmin=-12, xmax=50, ymin=33, ymax=73)

The first map represents percent renewable energy using color, so this is a choropleth map. This is accomplished using the choroLayer() function. I am using a quantile classification method with 5 bins and a green palette. Note that cartography provides a set of color palettes.

choroLayer(x=map_data3, var="values", method="quantile", nclass= 5, col=carto.pal(pal1 = "green.pal", n1 =5))

In the next example, I am using the Fisher-Jenks or natural breaks classification method, which uses an algorithm to search for breaks in the data distribution to define the bins.

choroLayer(x=map_data3, var="values", method="fisher-jenks", nclass= 5, col=carto.pal(pal1 = "green.pal", n1 =5))

Here is an example using the equal interval classification.

The cartography package provides many legend classification methods including: quantile, equal interval, Fisher-Jenks, mean-standard deviation, and geometric progression.

choroLayer(x=map_data3, var="values", method="equal", nclass= 5, col=carto.pal(pal1 = "green.pal", n1 =5))

You can also define your own breaks using a manual legend classification.

choroLayer(x=map_data3, var="values", breaks= c(5, 10, 20, 30, 40, 50, 60), col=carto.pal(pal1 = "green.pal", n1=6))

The propSymbolsLayer() function will produce a proportional symbol map where a continuous variable is mapped to the size of a symbol. First, I plot the country boundaries using the base R plot() function. I then plot the proportional symbols over the boundaries. The inches argument defines the size of the largest symbol in inches. You can experiment with this setting to alter the size distribution and alleviate crowding or sparse symbology.

plot(st_geometry(map_data3), col = "tan")
propSymbolsLayer(x=map_data3, var="values", inches = .2, col="darkgreen", border="gray")

The legend.style argument can be defined as “c” and “e”, or compact and extended. Here, I am using an extended legend. The default is compact, as used in the prior map.

plot(st_geometry(map_data3), col = "tan")
propSymbolsLayer(x=map_data3, var="values", inches = .2, col="darkgreen", border="gray", legend.style="e")

You can combine color and size symbology using the propSymbolsChoroLayer() function. I am mapping the percent unemployment to both color and size; however, you could map different variables to each graphic parameter. I am also using the legend.var.pos and legend.var2.pos arguments to indicate the position of the two legends.

plot(st_geometry(map_data3), col = "tan")
propSymbolsChoroLayer(x=map_data3, var="values",var2="values", inches=.2, method="fisher-jenks", nclass=6, col=carto.pal(pal1 = "green.pal", n1=6), legend.var.style="c", legend.var.pos="bottomright", legend.var2.pos="bottomleft")

The typoLayer() function is used to make a qualitative choropleth map. In the code block below I am mapping the country abbreviation to color.

typoLayer(x=map_data3, var="code")

It is also possible to combine proportional symbols and qualitative colors using propSymbolsTypoLayer(). A continuous variable is mapped to size while a categorical variable would be mapped to color.

plot(st_geometry(map_data3), col = "tan")
propSymbolsTypoLayer(x=map_data3, var="values",var2="CNTR_CODE", inches=.2, legend.var.pos="bottomright", legend.var2.pos="bottomleft")

Dot density maps use the density of dots to represent a continuous variable. In the example, I am using dotDensityLayer() to create this type of thematic map. Using the n argument, I indicate that one dot should represent 2%. you can change this setting to alleviate crowding or sparse symbology. I have also defined a color for the points, a point symbol, and a point size. Quick-R provides a well organized reference of available point and line symbols.

plot(st_geometry(map_data3), col = "tan")
dotDensityLayer(x=map_data3, var="values", n=2, col="red", pch=20, cex=1)

The labelLayer() function is used to plot text as labels. Here, I am creating a base map with the country boundaries in tan and the labels placed above. I have included a halo and defined the desired font size.

plot(st_geometry(map_data3), col = "tan")
labelLayer(x=map_data3, txt="CNTR_CODE", halo=TRUE, font=12)

In this last example, I am further refining a choropleth map. Here are a few notes relating to this map.

1. I define a legend title and position it within the choroLayer() function.
2. Using layoutLayer() I define a title, add a north arrow, exclude a scale bar, add author and source text, and set the title text color to black and the background to gray.
choroLayer(x=map_data3, var="values", method="fisher-jenks", nclass= 5, col=carto.pal(pal1 = "green.pal", n1 =5), legend.title.txt="% Renewable", legend.pos= "bottomright")
layoutLayer(title="% Renewable Energy for EU Countries", north=TRUE, scale=FALSE, author="Prof. Maxwell", sources="Data from Eurostats", frame=FALSE, col="gray", coltitle="black")

As mentioned above, I prefer tmap as opposed to cartography to make map layouts in R. tmap provides a lot of symbology and customization options. It also is fairly intuitive and works similar to ggplot2. However, cartography does allow for the production of a wide rage of thematic maps, so it is worth exploring. I also encourage you to further explore the eurostate package, as it provides a lot of great data for Europe over different time periods and geographic aggregations.

Cheat sheets for eurostat and cartography can be found here.

We are now ready to move on and discuss interactive mapping with Leaflet.

Back to Course Page

Back to WV View