The concept is simple: provide a set of files that anyone can use to recreate your output exactly as you produced it. Reproducible research is not new, and the ability to look back and say “here is how we did that” is worth the extra time it takes. With a bit of forethought, R is an effective way to generate large amounts of customized reporting in a very short time span, with limited room for error. This concept can be extended to generating reports for large distributions or even more customized or exploratory reporting.
With R this is quite simple. The most common method is to create an .Rnw file that contains your library calls, data loading and calculations. Sharing this file makes it easy for others to rerun your code and regenerate your output. Below is a really simple example: the results of a linear regression on auto-generated data. For the time being, we will not go into much detail about Sweave, but you should know that it’s the magic behind getting your R code into a format (LaTeX) that can be converted into PDF.
Here are just a few use cases:
- High quality, corporate themed articles and reports that are easy to duplicate or recreate
- Customized reports for business intelligence, marketing research, or data mining exercises
- Reporting on financial data or metrics
- Human readable tables for model selection or parameter interpretation
- Exploratory data analysis: plot and tabulate everything, interpret later
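To make the model selection use case a little more concrete, here is a minimal sketch of a human readable comparison table. The simulated data, the three candidate models and the column names are all assumptions made for illustration; only base R is used.

```r
# Hypothetical data for illustration only (not from a real analysis)
set.seed(1234)
dat <- data.frame(x = rnorm(100), z = runif(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

# Candidate models (assumed for this sketch)
models <- list(
  intercept_only = lm(y ~ 1, data = dat),
  x_only         = lm(y ~ x, data = dat),
  x_and_z        = lm(y ~ x + z, data = dat)
)

# One row per model: number of coefficients, AIC and adjusted R-squared
comparison <- data.frame(
  model  = names(models),
  n_coef = vapply(models, function(m) length(coef(m)), integer(1)),
  aic    = vapply(models, AIC, numeric(1)),
  adj_r2 = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
  row.names = NULL
)

# Sort so the model with the lowest AIC comes first
comparison <- comparison[order(comparison$aic), ]
print(comparison)
```

Inside an Rnw file, a data frame like this could be passed to xtable() in a chunk with results=tex to typeset it as a LaTeX table.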
The real challenge with reproducible research in R is learning LaTeX. It’s an old format, and you will encounter quite a few peculiarities that are neither immediately obvious nor intuitive. Let’s keep this in mind.
However, there are a few really useful places for information on using LaTeX:
- Introduction to LaTeX: read this before you begin, and follow the installation instructions for your operating system. It is an excellent source of information for a beginner and tackles many of the frequently asked questions that would otherwise crush one’s soul
- TeX Stack Exchange is where you go if your problem is specific to LaTeX; you can usually get a quick answer from real people there
- And of course, there’s Stack Overflow for questions pertaining to R
Without further ado, let’s create a small piece of reproducible research:
- Here is the Rnw file
- And here is the resulting PDF
```
% Preamble
\documentclass{article}
\usepackage{Sweave}

% Start Document
\begin{document}

% Titlepage
\begin{titlepage}
\title{A Title}
\author{Author's Name}
\date{\today}
\maketitle
\end{titlepage}

% Content Begins
\section{A Linear Regression}
Here, we generate some sample data. By default, Sweave echoes the code, showing us what commands are run in R. But we can stop that in our code chunk by specifying "echo=FALSE". In the example below, we will generate random variables from the normal and uniform distributions, put them in a data frame, run a linear regression and output a table that shows us the pertinent information from the regression.

% R Code Chunk
<<echo=FALSE, results=tex>>=
library(xtable)
set.seed(1234)            # Set seed for RNG
x <- rnorm(100)           # Generate a vector of 100 random numbers from the normal distribution
y <- runif(100)           # Generate a vector of 100 random numbers from the uniform distribution
dat <- data.frame(x, y)   # Create a data.frame object
fit <- lm(y ~ x, dat)     # Fit the linear model y = a + b*x
xtable(summary(fit))      # Output a LaTeX summary of the results
@

\subsection{Plotting Example}
It is also quite simple to include plots and graphics. Resizing them is quite painful if you are generating them automatically. My recommendation is to always save them as a separate file, and then place them. But for the sake of example, here is the Residuals vs. Fitted plot from our model. Note that when we want the output of a figure we have to specify "fig=TRUE" for our code chunk:

% R Code Chunk
<<echo=FALSE, fig=TRUE>>=
plot(fit, which=1)
@

% End Document
\end{document}
```
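The recommendation in the example to save plots as separate files can be sketched roughly as follows. The output directory, the file name and the figure dimensions are assumptions chosen for illustration; in practice you would write the file next to your Rnw file rather than into a temporary directory.

```r
# Same simulated data and model as in the Rnw example
set.seed(1234)
x <- rnorm(100)
y <- runif(100)
fit <- lm(y ~ x)

# Hypothetical output location; tempdir() is only a stand-in here
out_dir   <- tempdir()
plot_file <- file.path(out_dir, "residuals-vs-fitted.pdf")

# Open a PDF device with an explicit size (in inches), draw, then close it
pdf(plot_file, width = 6, height = 4)
plot(fit, which = 1)   # Residuals vs. Fitted
dev.off()

# Print the LaTeX line that would place the saved figure in the document
cat(sprintf("\\includegraphics[width=0.8\\textwidth]{%s}\n",
            basename(plot_file)))
```

Because the size is fixed when the file is written, scaling the figure in the document becomes a matter of changing the width option of \includegraphics (from the graphicx package) instead of regenerating the plot.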
Now it’s a fairly simple matter of setting your working directory and running a few more lines of code in R. If you’re using RStudio (which I highly recommend), you can simply open the Rnw file and click “Compile PDF” on the toolbar. Otherwise, from the console:
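```r
setwd('path/to/')                # directory that contains Example.Rnw
Sweave('Example.Rnw')            # writes Example.tex
tools::texi2pdf('Example.tex')   # writes Example.pdf (requires a LaTeX installation)
```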
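If you run these steps often, they can be wrapped in a small helper. The sketch below is one possible way to do it, not an established function: `compile_rnw()`, its `make_pdf` argument and the throwaway `Demo.Rnw` file are all hypothetical names introduced here. It relies only on `Sweave()` from utils and `texi2pdf()` from tools.

```r
# Hypothetical helper: weave an Rnw file and optionally build the PDF
compile_rnw <- function(rnw_path, make_pdf = TRUE) {
  stopifnot(file.exists(rnw_path))

  # Work inside the file's directory, and restore the old one on exit
  old_wd <- setwd(dirname(rnw_path))
  on.exit(setwd(old_wd), add = TRUE)

  rnw_file <- basename(rnw_path)
  utils::Sweave(rnw_file)                           # Example.Rnw -> Example.tex
  tex_file <- sub("\\.[Rr]nw$", ".tex", rnw_file)

  if (make_pdf) {
    tools::texi2pdf(tex_file)                       # requires a LaTeX installation
  }
  file.path(getwd(), tex_file)                      # full path of the .tex file
}

# Usage on a throwaway file; make_pdf = FALSE so that no LaTeX is needed
demo_rnw <- file.path(tempdir(), "Demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "1 + 1",
  "@",
  "\\end{document}"
), demo_rnw)

tex_path <- compile_rnw(demo_rnw, make_pdf = FALSE)
print(tex_path)
```

Restoring the working directory with on.exit() means the helper does not leave your session pointing at a different folder, even if the weave step fails.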
That’s a wrap.



