An Introduction to Reproducible Research with R Part 1

The concept is simple: provide a set of files that anyone can use to recreate your output exactly as you produced it. Reproducible research is not new, and the ability to look back and say “here is how we did that” is worth the extra time it takes. With a bit of forethought, R is an effective way to generate large amounts of customized reporting in a short time, with limited room for error. The same approach can be extended to generating reports for large distributions, or to more customized or exploratory reporting.

With R this is quite simple. The most common method is to create an *.Rnw file that contains your library calls, data and calculations. Sharing this file makes it easy for others to rerun your code and reproduce your output. Below is a really simple example that reports the results of a linear regression on auto-generated data. For the time being, we will not go into much detail about Sweave, but you should know that it’s the magic behind getting your R code into a format that can be converted into PDF.

Here are just a few use cases:

  • High quality, corporate themed articles and reports that are easy to duplicate or recreate
  • Customized reports, for business intelligence, marketing research, or data mining exercises
  • Reporting on financial data or metrics
  • Human readable tables for model selection or parameter interpretation
  • Exploratory data analysis: plot and tabulate everything, interpret later

The real challenge that surrounds reproducible research with R is learning LaTeX. It’s an old format, and you will encounter quite a few peculiarities that are neither immediately obvious nor intuitive. Let’s keep this in mind.

However, there are a few really useful places for information on using LaTeX:

  1. Introduction to LaTeX: before you begin, read it and follow the installation instructions for your operating system. It is an excellent source of information for a beginner and tackles many of the frequently asked questions that would otherwise crush one’s soul
  2. TeX Stack Exchange is where you go if your problem is specific to LaTeX; you can usually get a quick answer from real people there
  3. And of course, there’s Stack Overflow for questions pertaining to R

Without further ado, let’s create a small piece of reproducible research:

% Preamble
\documentclass{article}
\usepackage{Sweave}
 
% Start Document
\begin{document}
 
% Titlepage
\begin{titlepage}
\title{A Title}
\author{Author's Name}
\date {\today}
\maketitle
\end{titlepage}
 
% Content Begins
 
\section{A Linear Regression}
 
Here, we generate some sample data. By default, Sweave echoes the code, showing
us which commands are run in R. But we can stop that in our code chunk by specifying
"echo=FALSE". In the example below, we will generate random variables from the
normal and uniform distributions, put them in a data frame, run a linear regression
and output a table that shows us the pertinent information from the regression.
 
% R Code Chunk
<<echo=FALSE, results=tex>>=
library(xtable)
set.seed(1234) # Set seed for RNG
x <- rnorm(100) # Generate a vector of 100 random numbers from the normal distribution
y <- runif(100) # Generate a vector of 100 random numbers from the uniform distribution 
dat <- data.frame(x,y) # Create a data.frame object
fit <- lm(y ~ x, dat) # Fit a linear model of the form y = a + b*x
xtable(summary(fit)) # Output a latex summary of the results
@ 
 
\subsection{Plotting Example}
It is also quite simple to include plots and graphics. Resizing them is quite
painful if you are generating them automatically. My recommendation is to always
save them as a separate file, and then place them. But for the sake of example,
here is the Residuals vs. Fitted plot from our model. Note that when we want
the output of a figure, we have to specify "fig=TRUE" for our code chunk:
 
% R Code Chunk
<<echo=FALSE, fig=TRUE>>=
plot(fit, which=1)
@
 
% End Document
\end{document}
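
Before compiling, it can help to run the chunks from the document above directly in the R console. The following is a minimal sketch, not part of the original document: it repeats the regression, pulls out the coefficient matrix that the regression table is built from, and, following the recommendation to save plots as separate files, writes the Residuals vs. Fitted plot to a PDF with an explicit size. The file name and the 6 by 4 inch dimensions are assumptions chosen purely for illustration.

```r
# Same data and model as in the example document
set.seed(1234)
x <- rnorm(100)
y <- runif(100)
dat <- data.frame(x, y)
fit <- lm(y ~ x, data = dat)

# The coefficient matrix that the regression table is built from
coef_table <- coef(summary(fit))
print(round(coef_table, 4))

# Save the diagnostic plot as its own file with a fixed size
# (file name and dimensions are illustrative assumptions)
plot_file <- file.path(tempdir(), "residuals-vs-fitted.pdf")
pdf(plot_file, width = 6, height = 4)
plot(fit, which = 1)
invisible(dev.off())
```

A file saved this way can then be placed in the document with \includegraphics (provided by the graphicx package), so its size is set once in LaTeX instead of in the chunk that draws it.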

Now it’s a fairly simple matter of setting your working directory and running a few more lines of code in R. If you’re using RStudio (which I highly recommend), you can simply open the Rnw file and click “Compile PDF” on the toolbar. From the console:

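setwd('path/to/') # The directory that contains Example.Rnw
Sweave('Example.Rnw') # Produces Example.tex
tools::texi2pdf('Example.tex') # Produces Example.pdf; requires a LaTeX installation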
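
If you would rather not change your working directory by hand, the weaving step can be wrapped in a small function. This is only a sketch under stated assumptions: `weave_report` is a hypothetical helper name, not something provided by R or Sweave, and the demonstration file is a throwaway written to a temporary directory. It stops at the .tex file, so it runs even without LaTeX installed.

```r
# Hypothetical helper (not part of R or Sweave): weave an .Rnw file
# inside its own directory and return the path of the generated .tex file.
weave_report <- function(rnw_path) {
  stopifnot(file.exists(rnw_path))
  old_wd <- setwd(dirname(rnw_path))  # Sweave writes into the working directory
  on.exit(setwd(old_wd), add = TRUE)  # Restore the caller's directory on exit
  Sweave(basename(rnw_path), quiet = TRUE)
  tex_file <- sub("\\.Rnw$", ".tex", basename(rnw_path))
  file.path(dirname(rnw_path), tex_file)
}

# Demonstration on a throwaway file in a temporary directory
demo_dir <- tempfile("sweave-demo-")
dir.create(demo_dir)
demo_rnw <- file.path(demo_dir, "Demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=FALSE>>=",
  "set.seed(1234)",
  "mean(rnorm(100))",
  "@",
  "\\end{document}"
), demo_rnw)

tex_path <- weave_report(demo_rnw)
print(file.exists(tex_path))

# Final step, which needs a LaTeX installation:
# tools::texi2pdf(tex_path)
```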

That’s a wrap.

Posted in Articles, R

Remove “Leave a response” from hybrid theme

If you are using Disqus to manage your website’s feedback and commentary, you may prefer to remove the built-in hooks for WordPress’s built-in comment system while using the hybrid theme.

In ./wp-content/themes/hybrid/functions.php

On line 242, find the following code:

242
$meta = '' . __( '[entry-terms taxonomy="category" before="Posted in "] [entry-terms taxonomy="post_tag" before="| Tagged "] [entry-comments-link before="| "]', hybrid_get_textdomain() ) . '';

And replace it with the following line:

242
$meta = '' . __( '[entry-terms taxonomy="category" before="Posted in "] [entry-terms taxonomy="post_tag" before="| Tagged "]', hybrid_get_textdomain() ) . '';

Voila, you’re finished.

Posted in Articles

Loading Fundserv Data into R

If, like me, you do a bit of work with mutual fund data across a number of sources, you’ll eventually need to pull in the dealer codes or the fund codes. Below is a working example of how to pull in active dealership codes:

library(XML)
theurl <- "http://www.fundserv.com/services/code-lists.php?status_type=1&file_type=d"
dealer.codes <- readHTMLTable(theurl) # Parse every HTML table on the page into a list
n.rows <- unlist(lapply(dealer.codes, function(t) dim(t)[1])) # Row count of each table
dealer.codes <- dealer.codes[[which.max(n.rows)]] # Keep the table with the most rows

Minor adjustments of the method above can be used to pull most types of tabular data from a webpage into R.
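
As a rough illustration of such an adjustment, the same "keep the largest table" idea can be wrapped in a function and tried on a small inline HTML snippet, with no network access needed. This is a sketch only: `largest_table` is a hypothetical helper name, and the HTML and its codes are made up for the example rather than taken from any real page.

```r
library(XML)

# Hypothetical helper: return the table with the most rows from a parsed page
largest_table <- function(doc) {
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
  n_rows <- vapply(
    tables,
    function(tbl) if (is.null(tbl)) 0L else nrow(tbl),
    integer(1)
  )
  tables[[which.max(n_rows)]]
}

# Made-up page: a small navigation table followed by the data table we want
html <- '<html><body>
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
<tr><th>Code</th><th>Name</th></tr>
<tr><td>A001</td><td>Alpha</td></tr>
<tr><td>B002</td><td>Beta</td></tr>
<tr><td>C003</td><td>Gamma</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)
codes <- largest_table(doc)
print(codes)
```

Picking the table by row count is a heuristic; if the page gives its tables id attributes, selecting the one you want by name from the list that readHTMLTable returns is likely to be more robust.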

Posted in Articles, R