Great news for the MilanoR community: we are launching R-Lab, a monthly event for co-working with R on real data science projects.
Whether you are an R expert, a beginner, or just curious, you are welcome to join us! The first event will be on March 14th at Mikamai, Milano (more info below).
What is an R-Lab?
R-Labs are evening mini-hackathons where we work together with R on a real problem.
In short, this is the plan:
- A company/institution/university proposes a real problem, and teaches something about that issue
- We work together on the solution, possibly having fun
- Everything we do is released on GitHub
We hope to create a space where R users and institutions can meet and exchange their knowledge and needs.
Where can I join the group?
You can join the group on the Meetup platform here:
R-Lab #1 - March 14th, Milano
You can see the event and join from here: https://www.meetup.com/R-Lab-Milano/events/238081972/
For the first event, we are ready to start with the following agenda:
18:45 : Meeting at Mikamai
19:00 : Intro: The R-Lab format + Theme of the day: RStudio addins: a shortcut to your favourite functionalities
19:40 : Get some nice pizza (for those who want it)
19:45 : Coding together in small groups + eating the above pizza
22:00 : Comments and feedback
22:30 : Goodbye and see you soon!
A bit more about the theme of the day
Most of you are probably familiar with RStudio, but few of you may know the RStudio feature called Addins. Addins are a way of interactively executing R functions from the RStudio user interface, either through keyboard shortcuts or through a menu. In other words, you can call a function you use often with a simple shortcut. More than this, an addin can interact with RStudio itself (e.g. with the cursor position or the file content), to the point that addins can be built as local Shiny gadgets!
During this first R-Lab we are going to introduce Addins and how they work. Then we'll try to address some open problems by building ad hoc Addins, working together in small groups.
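To give an idea of what an addin looks like, here is a minimal sketch, not the material used at the R-Lab. The function names `wrap_in_system_time` and `time_selection_addin` are hypothetical; the `rstudioapi` calls are the ones an addin typically uses to read and modify the active document. The addin wraps the currently selected code in `system.time()`.

```r
# Pure helper (hypothetical name): wrap a piece of code in system.time().
wrap_in_system_time <- function(code) {
  paste0("system.time({\n", code, "\n})")
}

# Addin entry point (hypothetical name). RStudio calls it with no arguments.
time_selection_addin <- function() {
  if (!requireNamespace("rstudioapi", quietly = TRUE) ||
      !rstudioapi::isAvailable()) {
    stop("This addin must be run from within RStudio.")
  }
  # Ask RStudio for the active document and the first selection in it.
  context   <- rstudioapi::getActiveDocumentContext()
  selection <- context$selection[[1]]
  # Replace the selected range with the wrapped code.
  rstudioapi::insertText(
    location = selection$range,
    text     = wrap_in_system_time(selection$text),
    id       = context$id
  )
}

# To appear in the Addins menu, the function must live in a package that
# ships a file inst/rstudio/addins.dcf with an entry such as:
#   Name: Time selection
#   Description: Wraps the selected code in system.time().
#   Binding: time_selection_addin
#   Interactive: false
```

Keeping the text transformation in a separate plain function makes it easy to try outside RStudio, while the entry point only deals with the editor.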
Everything we build will be collected on GitHub and made available to everybody.
Admission is free!
H:18.45, March 14th
We will be hosted by Mikamai and LinkMe in their location: Via Giulio e Corrado Venini, 42 (very close to the Pasteur metro station).
What to bring
Be sure to bring your own computer, preferably with the latest version of RStudio installed.
For any additional info, please contact us via meetup: https://www.meetup.com/it-IT/R-Lab-Milano/
Looking forward to seeing you there!!!
I have just finished a new piece of work!
At the end of the introductory manual on using R (version 1.06) I have added a substantial appendix on using ggplot2 (latest version).
I hope, as usual, that it will prove useful.
You can find the PDF file here
R is a powerful statistical programming language. Despite its reputation for being hard to learn, it is increasingly used in different areas of research and has become an essential tool in oceanography and marine ecology. For instance, R is used to read, process and represent in situ oceanographic data through dedicated packages (e.g. oce) or, more generally, to manage satellite data in order to produce maps with high temporal and spatial resolution, useful for synoptically exploring and monitoring vast areas of the world's oceans.
In this post we briefly describe a practical use of R in conjunction with satellite data to identify marine bioregions of the Labrador Sea (an arm of the North Atlantic Ocean between the Labrador Peninsula and Greenland) with different patterns in the phytoplankton seasonal cycle (https://en.wikipedia.org/wiki/Phytoplankton). Phytoplankton are microscopic plants that occupy the lowest level of the marine food chain. Their presence in surface waters is revealed by their chlorophyll-a and other photosynthetic pigments, which change the color of ocean waters. Nowadays, satellite ocean color sensors are routinely used to estimate the concentrations of chlorophyll-a and other parameters in the surface waters of the oceans. All these data are freely available for research and educational purposes.
The identification of the bioregions is therefore based on the chlorophyll-a concentration, an index of phytoplankton biomass. The Globcolour project (http://www.globcolour.info), which combines data from several satellites to reduce spatial and temporal gaps, provides a set of satellite parameters including estimates of chlorophyll-a. The data are provided at several temporal (daily images, 8-day composite images and monthly averages) and spatial (1 km, 25 km, 100 km) resolutions and stored in NetCDF (https://en.wikipedia.org/wiki/NetCDF) files, a format that includes metadata in addition to the data sets. In our case, among other information, each file contains the latitude and longitude values that identify each pixel on the grid. For our purpose we downloaded 8-day composite images (about one image every week, from 1998 to 2015) with a spatial resolution of 25 km. To work with NetCDF files we used the R package ncdf4, which replaces the former ncdf package.
Once the time series had been downloaded and unzipped (.nc files), several steps were needed to reach our objective:
1. Using the functions nc_open and ncvar_get from the R package ncdf4, the .nc files were opened and the chlorophyll-a values (pixels) extracted together with the spatial coordinates and date.
2. A large data frame was then created by assigning to each pixel the corresponding latitude and longitude, an id-pixel (i.e. each pixel was numbered) and an id-date (i.e. year, month and day of the year). In this way each pixel was uniquely identified within the data frame.
3. An 8-day climatological time series of chlorophyll-a concentrations was created by averaging each pixel within the area of interest over the period 1998-2015 (i.e. averaging all the first weeks, all the second weeks, etc.).
4. The resulting time series was normalized (https://en.wikipedia.org/wiki/Feature_scaling) in order to scale values between 0 and 1.
5. On the normalized climatology obtained in this way (see points 3 and 4), a cluster analysis was carried out to identify marine regions of similarity (clusters).
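Steps 1 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the variable names `CHL1_mean`, `lat` and `lon` are assumptions about the file contents (inspect your own files with `print(nc)`), the data are synthetic, the period index `week` stands in for the id-date, and the normalization is assumed to be a min-max rescaling of each pixel's own climatological series.

```r
# Step 1 (defined but not run here): read one NetCDF file with ncdf4.
# "CHL1_mean", "lat" and "lon" are assumed variable names.
read_chl <- function(path, var = "CHL1_mean") {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))
  list(
    lon = ncdf4::ncvar_get(nc, "lon"),
    lat = ncdf4::ncvar_get(nc, "lat"),
    chl = ncdf4::ncvar_get(nc, var)
  )
}

# Step 2: one row per pixel and date. Synthetic values replace real data:
# 4 pixels, 46 eight-day periods per year, 3 years.
set.seed(1)
chl_df <- expand.grid(id_pixel = 1:4, week = 1:46, year = 1998:2000)
chl_df$chl <- runif(nrow(chl_df), min = 0.1, max = 2)

# Step 3: climatology = mean of each pixel and 8-day period across years.
# The result is a matrix with pixels in rows and periods in columns.
clim_mat <- tapply(chl_df$chl, list(chl_df$id_pixel, chl_df$week), mean)

# Step 4: min-max normalization of each pixel's series to the range [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
norm_mat <- t(apply(clim_mat, 1, rescale01))
```

The matrix `norm_mat` has one row per pixel, which is the layout `kmeans` expects for the clustering step.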
To perform the cluster analysis we used the function kmeans (package stats). The Calinski-Harabasz index was used to evaluate the optimal number of clusters. More detailed information about the procedure described above can be found in D'Ortenzio and Ribera d'Alcalà 2009 and Lacour et al. 2015.
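As a rough sketch of this step (again on synthetic data, and not necessarily how the index was computed in the original analysis), the Calinski-Harabasz index can be derived directly from the sums of squares returned by `kmeans`: it is the between-cluster sum of squares divided by k - 1, over the within-cluster sum of squares divided by n - k. The helper names `make_pixels` and `calinski_harabasz` are hypothetical.

```r
set.seed(42)

# Synthetic stand-in for the normalized climatology: 60 pixels, 46 periods,
# drawn from two different seasonal shapes plus noise.
periods <- 1:46
early <- exp(-((periods - 13)^2) / 30)
late  <- exp(-((periods - 25)^2) / 30)
make_pixels <- function(shape, n) {
  t(replicate(n, shape + rnorm(length(shape), sd = 0.05)))
}
x <- rbind(make_pixels(early, 30), make_pixels(late, 30))

# Calinski-Harabasz index from a kmeans fit on n observations.
calinski_harabasz <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Evaluate a range of candidate k and keep the one with the highest index.
ks <- 2:6
ch <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 20)
  calinski_harabasz(fit, nrow(x))
})
best_k <- ks[which.max(ch)]

# Final clustering: one cluster label per pixel.
clusters <- kmeans(x, centers = best_k, nstart = 20)$cluster
```

Using several random starts (`nstart`) reduces the chance that `kmeans` stops in a poor local optimum, which would otherwise distort the comparison between values of k.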
The final outcome of this analysis is shown in the figure below.
As we can see, two main areas were identified: bioregion 1 (the yellow area), located north of about 60°N, and bioregion 2 (the green area), located south of 60°N. The two bioregions show different climatological phytoplankton biomass cycles (blooms). In the northern part of the Labrador Sea (bioregion 1) the bloom starts earlier (around day 102, dashed line in the figure) and is more intense (more than 1.75 mg/m3). Conversely, in the southern part (bioregion 2) the bloom starts later (day 128) and is less intense (less than 1.75 mg/m3). Note that, for simplicity, the bloom onset (represented by the dashed line and usually used as a warning bell for possible changes in trophic interactions and biogeochemical processes) was identified as the time when the chlorophyll-a concentration rises to the threshold of 1.0 mg/m3. Finally, the figure was created using three R packages: rasterVis, ggplot2 and gridExtra.
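The threshold rule for the bloom onset can be sketched in a few lines. The function name `bloom_onset` and the synthetic seasonal cycle are assumptions made for illustration; only the 1.0 mg/m3 threshold comes from the text.

```r
# First day of the year on which chlorophyll-a reaches the threshold.
# Returns NA when the threshold is never reached.
bloom_onset <- function(day, chl, threshold = 1.0) {
  idx <- which(chl >= threshold)
  if (length(idx) == 0) return(NA_real_)
  day[min(idx)]
}

# Synthetic 8-day climatology: a single spring bloom peaking at day 140.
day <- seq(1, 361, by = 8)
chl <- 0.3 + 1.6 * exp(-((day - 140)^2) / (2 * 25^2))

onset <- bloom_onset(day, chl)
```

With 8-day composites the onset can only be resolved to the nearest 8-day step. Note also that this simple rule assumes concentrations are below the threshold at the start of the year.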
Overall, this simple example shows how combining statistical methods implemented in R with satellite data can help to characterize vast oceanic areas, and thus to better illustrate ecosystem functioning and possibly its response to environmental changes.
D'Ortenzio F., Ribera d'Alcalà M. (2009) On the trophic regimes of the Mediterranean Sea: a satellite analysis. Biogeosciences, 6, 139-148
Lacour L., Claustre H., Prieur L., D'Ortenzio F. (2015) Phytoplankton biomass cycles in the North Atlantic subpolar gyre: a similar mechanism for two different blooms in the Labrador Sea. Geophysical Research Letters, 42
The post Using R and satellite data to identify marine bioregions appeared first on MilanoR.
R users! Are you ready for some awesome news?
Microsoft, in collaboration with Quantide, is offering a one-day live course in Milano.
And, listen up: the course is free and open to everybody!
If you want to deepen your data analysis knowledge using the most modern R tools, or you want to find out whether R is the right solution for you, this is the right class!
The topics range from the first R session to data manipulation, visualization and discovery, from data import and export to data modelling and mining: a wide overview of R as a data science tool.
In particular, we will cover:
- A bit of R history and online resources
- R and RStudio installation and configuration
- Your first R session
- Data import from external sources
- Data manipulation and data discovery with R
- Data visualization with R
- Statistical models and Data Mining with R
The course will take place on Tuesday 13 December at the Microsoft Innovation Campus, Via Lombardia 2/a, Peschiera Borromeo, in the province of Milano, and will last a full working day, from 9:00 to 18:00. Don't worry about lunch, it is included. You only need to bring your laptop with the latest versions of R and RStudio.
We apologize to non-Italian speakers, but the course will be taught in Italian by Andrea Spanò, an RStudio certified instructor who has worked as an R trainer and consultant for over 20 years.
Beware: the course is open to a maximum of 30 participants, so if you wish to participate you have to reserve a seat here:
For more information, please contact us at training[at]quantide[dot]com
The 7th MilanoR Meeting went great: 45 of you attended, and we were glad of the turnout and the interest you showed in the two talks.
The MilanoR Facebook page was there with us, supporting the event with live streaming, and it's growing fast: check it out to stay updated on events, new articles and future meetings about R.
We have collected all the slides and materials from the meeting and they are now available online. You can find them below.
by Mariachiara Fortuna, Quantide
Talk 1: R and Big Data
Interactive big data analysis with R: SparkR and MongoDB: a friendly walkthrough
by Thimoty Barbieri, Marco Biglieri
Video presentation: Interactive Data Analysis with SparkR
Talk 2: R and Statistical Learning
Power consumption prediction based on statistical learning techniques
by Davide Pandini