The R environment offers many packages for drawing maps. I have lost count by now, but sp and ggmap certainly deserve consideration. Despite the wide availability of R functions dedicated to this topic, when I needed to draw a very basic map of Italy with regions filled in different colours (namely, a choropleth map), I ran into some difficulties.
I expected that building a choropleth map of Italy in R would be an extremely trivial procedure, but my experience was different. If the aim is to draw a map of the United States, most of the available functions are very easy to use. Drawing a map of Italy, however, is rather complicated compared with the simplicity of the chart (a good tutorial, in Italian, can be found here).
I wasn’t the only R user with this problem. Some time ago, in the Statistica@Ning community, Lorenzo di Blasio proposed a good solution using ggplot2. Building on Lorenzo’s code, I assembled a first function that creates a map quickly and easily. Finally, Nicola Sturaro of the MilanoR group greatly improved and completed the code and created a new package: mapIT.
Currently, the mapIT package is hosted in a repository on GitHub. To install it, you can use the devtools package, whose install_github() function installs a package directly from a GitHub repository.
The first time I used mapIT, I had to map the number of wineries considered in a study of Italian wine evaluations. I needed to visualize, for each region, the number of wineries whose wines were reviewed. The following code contains the data: for each Italian region (first column), it reports the number of wineries (second column).
wine <- data.frame(
  Region = c("Abruzzo", "Basilicata", "Calabria", "Campania",
             "Emilia-Romagna", "Friuli-Venezia Giulia", "Lazio",
             "Liguria", "Lombardia", "Marche", "Molise", "Piemonte",
             "Puglia", "Sardegna", "Sicilia", "Toscana",
             "Trentino-Alto Adige", "Umbria", "Valle d'Aosta", "Veneto"),
  Wineries = c(22, 8, 9, 35, 24, 74, 19, 8, 41, 29, 5, 191,
               22, 14, 40, 173, 57, 29, 6, 92)
)
Region names can be written in either lowercase or uppercase. Spaces and other non-alphabetic characters are ignored, so you can write ‘Trentino-Alto Adige’, ‘Trentino Alto Adige’ or ‘TrentinoAltoAdige’ interchangeably. For regions with a bilingual name, only the Italian wording is accepted.
To build the map, the mapIT package provides a function of the same name, mapIT(). The first argument is the numeric variable (Wineries) and the second is the variable identifying the Italian region (Region). A third argument, data, specifies the data frame from which to extract the variables.
In addition, some further arguments modify the graphic style. In the following example I used guide.label, which sets the title of the legend.
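The matching rule described above can be illustrated with a few lines of base R. This is only a sketch of the idea, not the code mapIT actually uses: normalize_region() is a hypothetical helper written for this example, and it assumes that "ignoring" characters means lowercasing the name and dropping everything that is not a letter from a to z.

```r
# Hypothetical helper (not part of mapIT): lowercase a region name
# and drop every character that is not a letter a-z.
normalize_region <- function(x) {
  gsub("[^a-z]", "", tolower(x))
}

variants <- c("Trentino-Alto Adige", "Trentino Alto Adige", "TrentinoAltoAdige")
normalized <- normalize_region(variants)

# All three spellings collapse to the same key
print(unique(normalized))
```

Under this assumption, the three spellings share a single key, which is why they can be used interchangeably.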
library(mapIT)
mapIT(Wineries, Region, data=wine, guide.label="Number of\nwineries")
Easy, right? All it took was loading the package and running a single short line of code!
The chart can be customized in several ways. The main argument for altering the graphic details is graphPar, a list with many possible elements (for details, see the function’s help page).
One of the first things we will want to change is surely the colours. To do so, specify in the graphPar list the colour for the minimum value (low) and for the maximum value (high):
gp <- list(low="#fff0f0", high="red3")
For convenience, I saved the list in the object gp. Note that colours can be specified using either a hexadecimal code or an R colour name.
mapIT(Wineries, Region, data=wine, guide.label="Number of\nwineries", graphPar=gp)
You can play with colours to find your preferred arrangement. A quick way to identify the hexadecimal code of a colour is to use a web application such as an RGB colour picker.
The low and high values of graphPar can also be used to render the chart in black and white. In this case, to make the chart a bit more pleasant, it’s possible to use ggplot2 themes. In the examples below, the first map (left panel) was built using theme_bw, while the second map (right panel) was built using theme_grey.
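As an alternative to a web colour picker, base R can convert between colour names and hexadecimal codes. The sketch below uses only functions from the grDevices package that ships with R (col2rgb(), rgb() and colorRampPalette()). It is independent of mapIT, and the five-step ramp is just a rough preview of a gradient between the low and high colours, not necessarily the exact scale the map draws.

```r
# Convert an R colour name to its red/green/blue components (0-255)
red3_rgb <- col2rgb("red3")

# Convert the components back to a hexadecimal code
red3_hex <- rgb(red3_rgb[1], red3_rgb[2], red3_rgb[3], maxColorValue = 255)
print(red3_hex)

# Preview a five-step gradient between the "low" and "high" colours
ramp <- colorRampPalette(c("#fff0f0", "red3"))(5)
print(ramp)
```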
library(ggplot2)

# Theme: black and white
gp <- list(low="white", high="gray20", theme=theme_bw())
mapIT(Wineries, Region, data=wine, guide.label="Number of\nwineries", graphPar=gp)

# Theme: grey
gp <- list(low="white", high="gray20", theme=theme_grey())
mapIT(Wineries, Region, data=wine, guide.label="Number of\nwineries", graphPar=gp)
There are still several features to implement and, in the future, some things may change. If you have ideas for improving mapIT, or you find a bug, you can open an issue on GitHub.
The last MilanoR meeting will be held on December 18, 2014. More than 35 R enthusiasts will join the meeting. Please leave your contribution: what are you looking for in this crazy community?
by Nicola Sturaro
Consultant at Quantide
Shine your Rdata: multi-source approach in media analysis for telco industry
by Giorgio Suighi (Head Of Analytics), Carlo Bonini (Data Scientist), Paolo Della Torre and Gianluca D’Innocenzo (ROI Managers), MEC
Think different your R data: dplyr
by Romain Francois, R Developer, co-author of dplyr
Contrary to general expectations, or at least to my expectations, the logical and analytical concepts behind statistics are rather difficult for engineers to understand. In general, despite their heavy background in maths and their above-average fluency in computer programming, there seems to be a broken bridge to the statistical world, and they often prefer to stay on the safe "math bank" of the river, even though the road to statistical fluency is shorter than they think.
For example, I noticed this issue when I heard engineers referring (inappropriately) to the Law of Large Numbers. Most of the time they were actually talking about the so-called "Gambler’s fallacy", without having a clear idea of either of the two.
Worse than the LLN, the t-test and the sampling distribution appear as inscrutable concepts, too heavily affected by “stochastic mechanisms” and too far removed from the rational, deterministic approach of engineering. I got definite confirmation of my thoughts when, scrolling through R-bloggers on a rainy Saturday afternoon, I stumbled upon the following video:
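The difference between the two ideas can be made concrete with a short simulation. The sketch below is only an illustration with simulated fair coin flips (the seed, the number of flips and the streak length of 5 are arbitrary choices for this example). The running proportion of heads settles near 0.5, which is what the Law of Large Numbers describes, while the proportion of heads right after a streak of five heads also stays near 0.5, which is what the Gambler’s fallacy denies.

```r
set.seed(42)                      # arbitrary seed, for reproducibility
n <- 100000                       # number of simulated fair coin flips
flips <- rbinom(n, size = 1, prob = 0.5)   # 1 = heads, 0 = tails

# Law of Large Numbers: the running proportion of heads approaches 0.5
running_mean <- cumsum(flips) / seq_len(n)
final_mean <- running_mean[n]

# Gambler's fallacy: look at the flip that follows a streak of k heads
k <- 5
cs <- cumsum(flips)
# window_sum[i] = number of heads among the k flips ending at position i
window_sum <- cs - c(rep(0, k), cs[seq_len(n - k)])
after_streak <- which(window_sum[-n] == k) + 1
p_heads_after_streak <- mean(flips[after_streak])

print(final_mean)
print(p_heads_after_streak)
```

Both printed values should be close to 0.5: the long-run average converges, but the coin has no memory and a streak does not make tails "due".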
In this video, for the first time, I saw the essence of R: a collaborative effort that brings together a huge amount of diverse knowledge and gives something back to everyone. If engineers can program a computer (as indeed they can), they have direct access to the deepest, most fundamental ideas in statistics (John Rauser, Data Scientist at Pinterest). And this is true in general, for anyone able or willing to program a computer.
Because R ultimately reduces its processing to very simple elementary operations, the programming and computational approach it enables makes it possible to fully unfold the statistical process hidden behind complex formulae, so that everyone can understand the deepest meaning of statistical science.
And at the end of the day, what really matters for R and its community is that the collaborative effort behind the development of the software "makes the complicated simple", so that an ever larger audience can understand its power and the science behind it.
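As one possible illustration of this computational approach, the sketch below replaces the formula of a two-sample t-test with a permutation test built from elementary operations: shuffle, split, average, compare. It is not taken from the video. The two groups are simulated data generated only for this example, and the seed, group sizes and number of shuffles are arbitrary.

```r
set.seed(1)                               # arbitrary seed
group_a <- rnorm(30, mean = 0,   sd = 1)  # simulated data, illustration only
group_b <- rnorm(30, mean = 1.5, sd = 1)  # simulated data, illustration only

# Observed difference between the two group means
observed <- mean(group_b) - mean(group_a)

# If the group labels did not matter, shuffling them should often
# produce differences as large as the observed one.
pooled <- c(group_a, group_b)
n_a <- length(group_a)
perm_diffs <- replicate(5000, {
  shuffled <- sample(pooled)
  mean(shuffled[-seq_len(n_a)]) - mean(shuffled[seq_len(n_a)])
})

# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(observed))
print(p_value)
```

The histogram of perm_diffs is an empirical sampling distribution under the null hypothesis, so the concept appears as something you can build and inspect rather than a formula to memorize.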
A tiny addition to the Manuale introduttivo all’uso di R (an introductory manual to using R), to which I have added an appendix on using the data.table package and the structure it introduces, namely the data.table itself.
A data.table is like a data.frame, but much, much faster, and with a more compact syntax that becomes simple after a little initial learning.
The original documentation is fragmented and not well organized, so I have tidied it up a bit, or at least I hope so.
As usual, you can find it here.
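To give a flavour of the more compact syntax, here is a minimal sketch that runs the same two operations on a data.frame and on a data.table. It assumes the data.table package is installed and reuses four rows of the winery counts shown earlier. It is not taken from the manual’s appendix.

```r
library(data.table)

# Four rows of the winery data used earlier in this text
wine_df <- data.frame(
  Region   = c("Lombardia", "Piemonte", "Toscana", "Veneto"),
  Wineries = c(41, 191, 173, 92)
)
wine_dt <- as.data.table(wine_df)

# data.frame: columns must be prefixed with the object name
big_df   <- wine_df[wine_df$Wineries > 100, ]
total_df <- sum(wine_df$Wineries)

# data.table: columns are referenced directly inside [ ]
big_dt   <- wine_dt[Wineries > 100]
total_dt <- wine_dt[, sum(Wineries)]

print(big_dt)
print(total_dt)
```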
Romain Francois is a 32-year-old R developer and consultant. He describes himself as a "Professional R Enthusiast", and r-enthusiasts.com is his website. He has co-authored several R packages, such as dplyr and Rcpp.
Romain Francois is the author of a world-famous blog about R and of the R Graph Gallery, which showcases hundreds of examples of data visualization with R.
I am honored to announce that Romain Francois is the special guest of the next MilanoR Meeting.