From original post @ http://analyticsblog.mecglobal.it/en/analytics-tools/adobe-sitecatalyst-api-in-r/
Are you a heavy user of SiteCatalyst and the very well-known R package RSiteCatalyst to analyze your web analytics data and make insightful visualizations?
Or do you wonder why the hell you need to download lots of Excel files to get a simple report from the Adobe cloud?
Well, today we are going to solve a problem that everyone who's into reporting faces every day. We'll show you what to do in order to give analysts more time to concentrate on the data, and zero time spent aggregating it.
All you need is the R programming environment set up, and we are ready to go:
The first thing to do is obviously to load all the libraries we need and log in using our username and token, which you can find here:
library(RSiteCatalyst)
library(sqldf)

SCAuth("[USERNAME:COMPANY]", "[TOKEN]")
The next step is simply to get our hands dirty with the API.
This is an example of how we can use it to get visits and time on site by tracking code:
elements <- GetElements("[WEBSITE_ID]")

visits_per_day_by_tracking_code <- QueueTrended("[WEBSITE_ID]", "[DATE_BEGIN]", "[DATE_END]",
                                                "visits", elements = "trackingcode",
                                                date.granularity = "day", top = "1000", start = "0")

time_per_day_by_tracking_code <- QueueTrended("[WEBSITE_ID]", "[DATE_BEGIN]", "[DATE_END]",
                                              "totaltimespent", elements = "trackingcode",
                                              date.granularity = "day", top = "1000", start = "0")

visits_and_time_by_tracking_code <- merge(visits_per_day_by_tracking_code,
                                          time_per_day_by_tracking_code,
                                          by = c("name", "datetime"), all.x = TRUE)
The problem with this data is that it is far too detailed for our reporting scope. What we really need to know is how campaigns have performed, rather than individual tracking codes.
To accomplish that, we're going to load the SAINT classification file we can download from Adobe, and then we're ready to make our analysis easier. Once you have downloaded it, here is the code to put it all together.
library(xlsx)

saint_data <- read.xlsx("[SAINT_CLASSIFICATION_FILE.xlsx]", 1)

metrics_by_campaign_placement <- merge(visits_and_time_by_tracking_code, saint_data,
                                       by.x = "name", by.y = "Key")

landing_pages <- QueueRanked("[WEBSITE_ID]", "[DATE_BEGIN]", "[DATE_END]",
                             c("visits", "bounces"),
                             elements = c("entrypage", "trackingcode"),
                             top = "50000", start = "0")

landing_pages_by_campaign <- merge(landing_pages, saint_data,
                                   by.x = "trackingcode", by.y = "Key", all.x = TRUE)
As you can see, the code is quite straightforward: all we need to do is basically an inner join over the key, which is unique both in the tracking code and in the data retrieved from the API.
The cool thing here is that we can attach to the SAINT classification any metric we think is crucial for understanding our advertising activities, without having to load it into the platform. Internally, we attach key metrics such as advertising costs and impressions by placement, and we group campaigns by KPIs such as brand awareness or business performance.
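For instance, a minimal sketch of how extra media-plan columns could be joined onto the API data might look like the following (the file name, sheet layout and column names here are assumptions for illustration, not part of the original workflow):

# Hypothetical example: a workbook holds extra columns such as "impressions"
# and "cost" alongside the tracking-code "Key". Names are placeholders.
media_plan <- read.xlsx("[MEDIA_PLAN_FILE.xlsx]", 1)

# Left join the media-plan metrics onto the classified API data by tracking code,
# so costs and impressions become available for later aggregation.
metrics_with_costs <- merge(metrics_by_campaign_placement, media_plan,
                            by.x = "name", by.y = "Key", all.x = TRUE)

# A derived KPI such as cost per visit can then be computed directly in R.
metrics_with_costs$cost_per_visit <- metrics_with_costs$cost / metrics_with_costs$visits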
To give you an example of what you can do, here is the code to generate a chart that gives you an idea of which advertising publisher is performing better in terms of visits, time on site and impressions:
Campaign_performance <- sqldf("select Campaigns, sum(visits)
                               from metrics_by_campaign_placement
                               group by Campaigns
                               order by sum(visits)")

Publisher_performance <- sqldf("select Publisher, sum(visits) as visits,
                                       sum(totaltimespent) as time_on_site,
                                       sum(impressions) as impressions
                                from metrics_by_campaign_placement
                                group by Publisher
                                order by sum(visits)")

Placement_performance <- sqldf("select Ads, sum(visits), sum(totaltimespent), sum(impressions)
                                from metrics_by_campaign_placement
                                group by Ads
                                order by sum(visits)")

landing_pages_performance <- sqldf("select Campaigns, entrypage, sum(visits),
                                           sum(bounces)/sum(visits) as bouncerate
                                    from landing_pages_by_campaign
                                    group by Campaigns
                                    order by sum(visits) desc")
To summarize with a clear chart, here's the output you can expect from this data using the ggplot2 package:
# PLOT THE DATA USING GGPLOT2
library(ggplot2)

ggplot(data = Publisher_performance, aes(x = visits, y = time_on_site)) +
  geom_point(aes(size = impressions, colour = Publisher)) +
  scale_size_continuous(range = c(2, 15)) +
  theme(legend.position = "none") +
  geom_text(aes(label = Publisher), hjust = 1.5)
This week, the post is an interview with Michele Usuelli. Michele is the author of the book "R Machine Learning Essentials".
Hi, Michele. Welcome back to MilanoR. You're the second author of this blog, after Max Marchi, who wrote a book about R. How has this idea started?
Everything started when Packt Publishing contacted me on LinkedIn out of the blue. First, they proposed that I write a book about a well-known R tool for building charts: the ggplot2 package. Since I was more interested in writing about Machine Learning techniques than about an R package, I didn't follow up. When, a few weeks later, they proposed that I write a book about R and Machine Learning, I was really enthusiastic, and it took just a few days to make up my mind and start writing.
You've written this book by yourself. How was your experience as an author?
Writing this book involved a lot of challenges: decomposing the fundamental Machine Learning concepts, explaining them clearly, using a good writing style, and managing my time. Each of these challenges allowed me to learn new skills and to grow. Time-wise, it took about 10 hours per week for 5 months, mostly during the summer. I recommend working on such a big project in the winter, when there are fewer things going on.
Let’s get into the book. What kind of knowledge is expected from the audience? Should readers be a bit familiar with R? What about IT knowledge?
It'll definitely be helpful to be a little familiar with programming and/or statistics. However, no prior experience is required. The book starts from scratch, explaining the ideas behind Machine Learning. Then, it leads the reader through a path rather than just explaining concepts. The more familiar the reader is, the faster it will be to go through the path.
If you had to choose an example from your book, which code chunk would you share with the readers of this blog?
I'd choose the last chapter, since it displays a nice practical application. It is business-driven and at the same time shows different branches of Machine Learning. The business challenge is planning a marketing campaign. In order to determine which customers will be more likely to subscribe, the book performs Supervised Learning on the data of a past campaign. In addition, it applies Unsupervised Learning techniques to explore the customer base.
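As a purely illustrative sketch of that kind of workflow (this is not code from the book; the data frame campaign_data and its columns are assumptions), the supervised and unsupervised steps might look roughly like this in R:

# Illustrative only, not taken from the book.
# Assume "campaign_data" holds one row per contacted customer from a past
# campaign, with a factor column "subscribed" recording the outcome.
library(randomForest)

set.seed(1)
train_idx <- sample(nrow(campaign_data), 0.7 * nrow(campaign_data))
train <- campaign_data[train_idx, ]
test  <- campaign_data[-train_idx, ]

# Supervised Learning: predict which customers are likely to subscribe
# (assuming the outcome levels are "no"/"yes").
model <- randomForest(subscribed ~ ., data = train)
pred  <- predict(model, test, type = "prob")[, "yes"]

# Unsupervised Learning: explore the customer base by clustering
# the numeric attributes.
numeric_cols <- sapply(campaign_data, is.numeric)
clusters <- kmeans(scale(campaign_data[, numeric_cols]), centers = 4)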
Is there any suggestion you’d like to give to someone who wants to write a book about R?
The outcome of writing a book has paid off the effort. Plus, R has recently become a hot topic, so there are a lot of interesting unexplored topics. Don't worry about investing part of your time in a similar project. Also, try to focus on the skills that you can improve rather than just on the final product.
From original post @ http://analyticsblog.mecglobal.it/analytics-tools/bashr/
In the world of data analysis, the term automation goes hand in hand with the term "scripting". There is no single best programming language, only the one most suitable for the required task.
In our case, many data aggregation procedures run on Unix/Linux servers, collecting API data in real time, so it becomes essential to make sure that data is formatted and stored correctly for the analysis and visualization needs.
Some of these automatic procedures run via cron at night, calling multiple R scripts with various parameters.
Our challenge was to ensure that the R scripts could perform certain procedures or not, depending on the parameters passed via the bash script. The question was: how do we send parameters from a bash script to R in real time?
The answer is very simple, and two aspects need to be considered: the bash script that invokes the R script passing the parameters, and the R script itself, which must read the parameters.
In this example we will create a variable containing yesterday's date (the variable "fileshort") and we will pass it to R in order to save a file using the variable as the filename.
Let’s start from the bash script:
#!/bin/bash
data=`date --date=-1day +%Y%m%d`
fileshort=test_$data.csv
Rscript /home/file_repo/testfile.R $fileshort --save
As you can see, a simple variable fileshort is created and then sent to the R script. As for the syntax, to invoke R you can use either "Rscript" or "R <": the result will be identical.
Now it's time to edit our R script. First we need to tell our script to intercept the parameters/arguments passed by the shell, checking them with print as you can see below:
args <- commandArgs()
print(args)
On the console, R will print the following:
 "/usr/lib/R/bin/exec/R"  "--slave"  "--no-restore"  "--file=/home/file_repo/testfile.R"  "--args"  "test_20150201.csv"  "--save"
In our case the required parameter is the filename, "test_20150201.csv", which is the sixth element of the array.
At this point you just need to assign the element that interests us to a variable:
name <- args[6]
and then use the variable as we prefer. In our example, to write a file:
require(lubridate)

# db_final is the data frame produced earlier in the script;
# "name" (read from the shell arguments) is used as the output filename.
write.table(db_final, paste0(name),
            append = FALSE, quote = FALSE, sep = ",", eol = "\n",
            na = "NA", dec = ".", row.names = FALSE, col.names = FALSE,
            qmethod = c("escape", "double"), fileEncoding = "")
The generated file will be named "test_20150201.csv".
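As a side note, a slightly more robust variant is to read only the user-supplied arguments with commandArgs(trailingOnly = TRUE), so the script does not depend on the position of the filename in the full argument vector; a minimal sketch:

# Read only the arguments that follow "--args", regardless of how R was invoked.
args <- commandArgs(trailingOnly = TRUE)

# With the bash script above, the first user argument is the filename
# (e.g. "test_20150201.csv").
name <- args[1]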
Photos of the 6th MilanoR meeting
Milano, December 18, 2014
Up to 4 Post-doctoral positions on computational economics/behavioural sciences/networks/decision theory
Up to 4 posts are available to advance the understanding of the socio-economics of climate change using innovative research methods from computational, behavioural and complexity sciences. The selected applicants will join new research teams led by Prof. Valentina Bosetti and Prof. Massimo Tavoni, at Bocconi University (Department of Economics) and Politecnico di Milano (Department of Management and Economics), respectively. These projects are funded by two independent grants awarded by the European Research Council (ERC), but share the common aim of advancing the understanding of individual and group behaviour in climate change mitigation, as well as developing a new class of integrated assessment models. The two Principal Investigators are keen on building on the momentum offered by these two large grants to create a new and stimulating research group. General information about the two ERC projects can be found here and here.
Researchers will be recruited to work on one of the following topics:
1. Modeling climate change decision making under uncertainty
2. Integrating uncertainty, risk biases and perception issues into climate change policy assessment
3. Social networks and complex dynamic systems
4. Behavioral climate economics
The selected applicants are expected to begin their assignment in Summer-Fall 2015 in Milan, Italy. Positions will last between 2 and 4 years, with yearly evaluations. Gross salary is negotiable and competitive with other research and academic institutions.
The candidates should have a PhD (or be close to completion) in either computational economics/science, behavioural economics, or complex systems/networks. More senior (e.g. assistant professor) or more junior (e.g. PhD student) candidates can be considered for positions (3) and (4). Joint spouse applications are welcome (please specify).
How to apply
Applicants should send:
- Detailed curriculum
- Cover letter
Two letters of recommendation will be required for shortlisted candidates.