I have just finished a new piece of work!
I have added a substantial appendix on using ggplot2 (latest version) to the end of the introductory manual on using R (version 1.06).
As usual, I hope it proves useful.
You can find the PDF file here
R is a powerful statistical and programming language. Despite its reputation for being hard to learn, it is increasingly used in different areas of research and has become an essential tool in oceanography and marine ecology. For instance, R is used to read, process and represent in situ oceanographic data through dedicated packages (e.g. oce) or, more generally, to manage satellite data in order to produce maps with high temporal and spatial resolution, useful to synoptically explore and monitor vast areas of the world's oceans.
In this post we briefly describe a practical use of R in conjunction with satellite data to identify marine bioregions of the Labrador Sea (an arm of the North Atlantic Ocean between the Labrador Peninsula and Greenland) with different patterns in the phytoplankton seasonal cycle (https://en.wikipedia.org/wiki/Phytoplankton). Phytoplankton are microscopic plants that occupy the lowest level of the marine food chain. Their presence in surface water is revealed by their chlorophyll-a and other photosynthetic pigments, which change the color of ocean waters. Nowadays, satellite ocean color sensors are routinely used to estimate the concentrations of chlorophyll-a and other parameters in the surface water of the oceans. All these data are freely available for research and educational purposes.
The approach used to identify the bioregions is therefore based on the chlorophyll-a concentration, an index of phytoplankton biomass. The Globcolour project (http://www.globcolour.info), which combines data from several satellites to reduce spatial and temporal gaps, provides a set of different satellite parameters including estimates of chlorophyll-a. The data are provided at several temporal (daily images, 8-day composite images and monthly averages) and spatial (1 km, 25 km, 100 km) resolutions and are stored in NetCDF (https://en.wikipedia.org/wiki/NetCDF) files, a format that includes metadata in addition to the data sets. In our case, among other information, each file contains latitude and longitude values that identify each pixel on the grid. For our purpose we downloaded 8-day composite images (about one image every week, from 1998 to 2015) with a spatial resolution of 25 km. To work with NetCDF files we used the R package ncdf4, which replaces the former ncdf package.
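As a rough illustration of how such a file can be read with ncdf4, the sketch below defines a small helper around nc_open and ncvar_get. The helper name read_chl, the variable name "CHL1_mean" and the dimension names "lon" and "lat" are assumptions made for this example and should be checked against the actual files (printing the object returned by nc_open lists what a file contains). To keep the example self-contained, it writes a tiny stand-in file instead of using a real download.

```r
# Hypothetical helper: read one chlorophyll-a grid and its coordinates.
# The variable and dimension names are assumptions, not taken from the post.
read_chl <- function(path, var = "CHL1_mean") {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))          # always release the file handle
  list(
    lon = ncdf4::ncvar_get(nc, "lon"),  # longitude of each grid column
    lat = ncdf4::ncvar_get(nc, "lat"),  # latitude of each grid row
    chl = ncdf4::ncvar_get(nc, var)     # chlorophyll-a values, lon x lat
  )
}

if (requireNamespace("ncdf4", quietly = TRUE)) {
  # Build a tiny stand-in file (3 longitudes x 2 latitudes) so the helper
  # can be tried without downloading real data
  lon_dim <- ncdf4::ncdim_def("lon", "degrees_east", c(-60, -55, -50))
  lat_dim <- ncdf4::ncdim_def("lat", "degrees_north", c(55, 60))
  chl_var <- ncdf4::ncvar_def("CHL1_mean", "mg/m3", list(lon_dim, lat_dim),
                              missval = -999)
  path <- tempfile(fileext = ".nc")
  nc <- ncdf4::nc_create(path, chl_var)
  ncdf4::ncvar_put(nc, chl_var,
                   matrix(c(0.4, 0.8, 1.2, 0.5, 0.9, 1.6), nrow = 3))
  ncdf4::nc_close(nc)

  grid <- read_chl(path)
  print(dim(grid$chl))  # dimensions are ordered lon, lat
}
```

In a real workflow the helper would be applied to every downloaded .nc file, for example with lapply over the output of list.files.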
Once the time series had been downloaded and unzipped (.nc files), several steps were needed to reach our objective:
1. Using the functions nc_open and ncvar_get from the R package ncdf4, the .nc files were opened and the chlorophyll-a values (pixels) were extracted together with the spatial coordinates and date.
2. A large data frame was then created by assigning to each pixel the corresponding latitude and longitude, an id-pixel (i.e. each pixel was numbered) and an id-date (i.e. year, month and day of the year). In this way each pixel was uniquely identified within the data frame.
3. An 8-day climatological time series of chlorophyll-a concentrations was created by averaging each pixel within the area of interest over the period 1998-2015 (i.e. averaging all the first weeks, all the second weeks, etc.).
4. The resulting time series was normalized (https://en.wikipedia.org/wiki/Feature_scaling) in order to scale values between 0 and 1.
5. A cluster analysis was carried out on the normalized climatology (see points 3 and 4) to identify marine regions of similarity (clusters).
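Steps 2 to 4 can be sketched in base R as follows. This is only an illustration on synthetic values: the toy grid, the column names and the choice of min-max scaling applied to each pixel separately are assumptions, not the original script.

```r
set.seed(1)

# Hypothetical toy grid: 3 longitudes x 2 latitudes = 6 numbered pixels
grid <- expand.grid(lon = c(-60, -55, -50), lat = c(55, 60))
grid$id_pixel <- seq_len(nrow(grid))

# Step 2: one row per pixel and date (46 eight-day periods per year)
dates <- expand.grid(id_pixel = grid$id_pixel, period = 1:46, year = 1998:2015)
df <- merge(grid, dates, by = "id_pixel")
df$chl <- rlnorm(nrow(df), meanlog = -0.5, sdlog = 0.5)  # synthetic mg/m3
df$chl[sample(nrow(df), 50)] <- NA                        # pretend data gaps

# Step 3: climatology, i.e. the mean over all years for each pixel and
# 8-day period (rows with missing chlorophyll are dropped by aggregate)
clim <- aggregate(chl ~ id_pixel + period, data = df, FUN = mean)

# Step 4: min-max scaling of each pixel's climatology to the range 0-1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
clim$chl_norm <- ave(clim$chl, clim$id_pixel, FUN = rescale01)

# One row per pixel, one column per period: the input for clustering
clim_mat <- tapply(clim$chl_norm, list(clim$id_pixel, clim$period), mean)
dim(clim_mat)
```

Keeping the data in long format (one row per pixel and date) makes the averaging a single aggregate call. The wide matrix is only built at the end because clustering functions expect one observation per row.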
To perform the cluster analysis we used the function kmeans (package stats). The Calinski-Harabasz index was used to evaluate the optimal number of clusters. More detailed information about the procedure described above can be found in D'Ortenzio and Ribera d'Alcalà 2009 and Lacour et al. 2015.
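The clustering step might look like the sketch below. It runs kmeans for several candidate numbers of clusters and computes the Calinski-Harabasz index from the between-cluster and within-cluster sums of squares that kmeans returns. The synthetic curves, the helper names and the candidate range 2 to 6 are assumptions for illustration only.

```r
set.seed(42)
periods <- 1:46

# Synthetic seasonal curve with a single bloom peaking at a given period
bloom_curve <- function(peak) exp(-(periods - peak)^2 / (2 * 3^2))

# n noisy copies of a curve, one row per pixel
make_group <- function(n, peak) {
  t(replicate(n, bloom_curve(peak) + rnorm(length(periods), sd = 0.02)))
}

# Two hypothetical groups of pixels with an early and a late bloom
clim_mat <- rbind(make_group(20, 12), make_group(20, 20))

# Scale each pixel's curve to the range 0-1
clim_mat <- t(apply(clim_mat, 1, function(x) (x - min(x)) / (max(x) - min(x))))

# Calinski-Harabasz index: between-cluster variance over within-cluster
# variance, each divided by its degrees of freedom
calinski_harabasz <- function(fit, n) {
  k <- nrow(fit$centers)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

ks <- 2:6
ch <- sapply(ks, function(k) {
  fit <- kmeans(clim_mat, centers = k, nstart = 20)
  calinski_harabasz(fit, nrow(clim_mat))
})

best_k <- ks[which.max(ch)]  # number of clusters with the highest index
bioregion <- kmeans(clim_mat, centers = best_k, nstart = 20)$cluster
table(bioregion)
```

Using several random starts (nstart) reduces the chance that kmeans settles on a poor local solution, which matters because the index is compared across different values of k.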
The final outcome of this analysis is shown in the figure below.
As we can see, two main areas were identified: bioregion 1 (the yellow area), located north of about 60°N, and bioregion 2 (the green area), located south of 60°N. The two bioregions show different climatological phytoplankton biomass cycles (blooms). In the northern part of the Labrador Sea (bioregion 1) the bloom starts earlier (around day 102, dashed line in the figure) and is more intense (more than 1.75 mg/m3). Conversely, in the southern part (bioregion 2) the bloom starts later (day 128) and is less intense (less than 1.75 mg/m3). Note that, for simplicity, the bloom onset (marked by the dashed line and often used as a warning bell for possible changes in trophic interactions and biogeochemical processes) was identified as the time when the chlorophyll-a concentration rises to the threshold of 1.0 mg/m3. Finally, the figure was created using three R packages: rasterVis, ggplot2 and gridExtra.
Overall, this simple example shows how combining statistical methods implemented in R with satellite data can help to characterize vast oceanic areas, and thus to better illustrate ecosystem functioning and possibly its response to environmental changes.
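The threshold rule for the bloom onset can be written in a few lines of R. The sketch below uses a synthetic climatological curve, and the function name bloom_onset is hypothetical. It returns the first day of the year on which the concentration reaches 1.0 mg/m3, or NA if the threshold is never reached.

```r
# First day on which chlorophyll-a reaches the threshold (NA if never)
bloom_onset <- function(day, chl, threshold = 1.0) {
  hit <- which(chl >= threshold)
  if (length(hit) == 0) return(NA_real_)
  day[hit[1]]
}

# Start day of each 8-day period: 1, 9, ..., 361
day <- seq(1, 361, by = 8)

# Synthetic climatology (mg/m3): low background plus one spring bloom
chl <- 0.3 + 1.6 * exp(-(day - 130)^2 / (2 * 25^2))

onset <- bloom_onset(day, chl)
onset  # 105 for this synthetic curve
```

Because the climatology has one value per 8-day period, the onset can only be resolved to the start of a period. Interpolating between periods would be one way to obtain a finer estimate.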
D'Ortenzio F., Ribera d'Alcalà M. (2009) On the trophic regimes of the Mediterranean Sea: a satellite analysis. Biogeosciences, 6, 139-148
Lacour L., Claustre H., Prieur L., D'Ortenzio F. (2015) Phytoplankton biomass cycles in the North Atlantic subpolar gyre: a similar mechanism for two different blooms in the Labrador Sea. Geophysical Research Letters, 42
The post Using R and satellite data to identify marine bioregions appeared first on MilanoR.
R Users!! Are you ready for some awesome news?!
Microsoft, in collaboration with Quantide, is offering a one-day live course in Milano.
And, listen up: the course is free and open to everybody!
If you want to deepen your data analysis knowledge using the most modern R tools, or you want to find out whether R is the right solution for you, this is the right class!
The topics range from the first R session to data manipulation, visualization and discovery, from data import and export to data modelling and mining: a wide overview of R as a data science tool.
In particular, we will cover:
- A bit of R history and online resources
- R and RStudio installation and configuration
- Your first R session
- Data import from external sources
- Data manipulation and data discovery with R
- Data visualization with R
- Statistical models and Data Mining with R
The course will take place on Tuesday 13 December at the Microsoft Innovation Campus in Via Lombardia, 2/a, Peschiera Borromeo, in the province of Milano, and will last an entire working day, from 9:00 to 18:00. Don’t worry about lunch, it is included. You only have to bring your laptop with the latest versions of R and RStudio installed.
We apologize to non-Italian speakers, but the course will be taught in Italian by Andrea Spanò, an RStudio certified instructor who has worked as an R trainer and consultant for over 20 years.
Be aware: the course is open to a maximum of 30 participants, so if you wish to participate you have to reserve a seat here:
For more information, please contact us at training[at]quantide[dot]com
The 7th MilanoR Meeting went great: 45 of you attended, and we were glad of the turnout and of the interest you showed in the two talks.
The MilanoR Facebook page was there with us to support the event with live streaming, and it's growing fast: check it out to stay up to date on events, new articles and future meetings about R.
We have collected all the slides and materials from the meeting, and they're now available online. You can find them below.
by Mariachiara Fortuna, Quantide
Talk 1: R and Big Data
Interactive big data analysis with R: SparkR and MongoDB: a friendly walkthrough
by Thimoty Barbieri, Marco Biglieri
Video presentation: Interactive Data Analysis with SparkR
Talk 2: R and Statistical Learning
Power consumption prediction based on statistical learning techniques
by Davide Pandini
This is an update on the 7th MilanoR meeting and the new MilanoR Facebook page.
Would you like to attend the meeting but could not get a ticket? Follow the live event online!
A few days after the announcement of the meeting, all of the tickets were already booked, leaving many interested people without the opportunity to attend. There have been a lot of requests for more tickets, but sadly space is limited, so these requests are very likely to remain unfulfilled. This is, of course, bad news for those who would like to attend the event but could not get a ticket. On the other hand, it is a positive sign of the growing interest in R and its community.
In order to compensate for the lack of additional tickets, the MilanoR staff has decided to live stream the event online so that everyone can attend the two main talks.
Where and when you can watch the event
The live event will be streamed on the new MilanoR Facebook page at around 6.30 PM on Thursday 27th October. Just go to MilanoR’s Facebook page and check out the live event.
If you would like to know more about the event and the talks, you can check out the official schedule here.
Do not hesitate to leave a comment, ask questions and share your observations.
Sidenote on the new Facebook page
As mentioned above, MilanoR has just created its very own Facebook page as a new way to promote the R community further. On the page you will find R-related news, articles and useful resources, such as the new free R introductory course, along with other news from the community. Check it out and feel free to contribute with your comments and suggestions.