Monday, September 29, 2014

Pander Tables with No Emphasized Columns or Rows

Pander is a great R package for writing pandoc markdown from R, for example, from an R markdown (*.Rmd) document. (R is an open-source statistics platform; pandoc converts documents between formats, e.g., markdown to HTML or HTML to LaTeX.) Pander makes it easy to integrate statistical analyses conducted in R into reports written in pandoc markdown, which pandoc can then convert to nearly any other format you would want.

Default Behavior

Pander can write paragraphs, lists, and tables out to pandoc markdown, all very useful. But its table-writing functions have at least one problem: they automatically emphasize (i.e., bold) row labels, and pander provides no option to change this behavior (see note 1).

Here is an example to illustrate the default behavior. Create an R markdown document whose R code chunk contains the following:

# Load pander
library(pander) 

# Get a small dataframe for demonstration purposes
df <- subset(mtcars, select=c("mpg","hp","wt"))[1:5,] 

# Print out the dataframe as a markdown table using pander
pander(df)

If you process the file with R, it creates a markdown file with the following contents:

----------------------------------------
        &nbsp;           mpg   hp   wt  
----------------------- ----- ---- -----
     **Mazda RX4**       21   110  2.62 

   **Mazda RX4 Wag**     21   110  2.875

    **Datsun 710**      22.8   93  2.32 

  **Hornet 4 Drive**    21.4  110  3.215

 **Hornet Sportabout**  18.7  175  3.44
----------------------------------------

The double asterisks enclosing each term in the first column render those terms in bold when the file is converted to another format, such as HTML. If the file above is called demo.md, it can be converted to HTML with the following command in a terminal:

pandoc -r markdown -w html -o demo.html demo.md

And here is how the file the command generates (called demo.html) renders:

                      mpg     hp     wt
  Mazda RX4           21      110    2.62
  Mazda RX4 Wag       21      110    2.875
  Datsun 710          22.8    93     2.32
  Hornet 4 Drive      21.4    110    3.215
  Hornet Sportabout   18.7    175    3.44

It’s an HTML table (with no border), all the columns are center-justified, and all the row labels appear in bold.

I’m not sure why this behavior is the default, let alone why it can’t be changed. Two of the style guides that I am most familiar with, APA and the Chicago Manual of Style, do not recommend setting row labels in bold. (These are also two of the most popular style guides for writing in American English.) In fact, I can’t think of any style guide that recommends setting row labels in bold. Such style guides could exist, but they’re clearly in the minority on this element of formatting. So without some kind of workaround, pander produces tables that can’t be used in most professional documents.

The Fix

It happens that I need to produce professional-looking tables using pander, so I wrote the following function in R:

no.emphasis.table <- function(df){
  # Move the row names into an ordinary first column, which pander
  # does not emphasize. The trailing "&nbsp;" is padding: for some
  # reason, when 'pandoc' writes the markdown table to LaTeX, it
  # doesn't make the first column wide enough without it.
  # (paste0 is vectorized, so no sapply is needed.)
  the.row.names.m <- paste0(rownames(df), "&nbsp;")
  rownames(df) <- NULL
  df <- cbind(the.row.names.m, df)
  colnames(df)[1] <- ''

  # Set horizontal justification for columns: the first column is
  # left-justified and the rest are centered. set.alignment is
  # called for its side effect on the table pander prints next.
  v.justify <- c('left', rep('center', ncol(df) - 1))
  set.alignment(v.justify)
  return(df)
}

Wrap a data frame in this function before passing it to pander, as shown here:

pander(no.emphasis.table(df))

When the original table is processed with this new command, the output is:

 ----------------------------------------
 &nbsp;                   mpg   hp   wt  
 ----------------------- ----- ---- -----
 Mazda RX4&nbsp;          21   110  2.62 
 
 Mazda RX4 Wag&nbsp;      21   110  2.875
 
 Datsun 710&nbsp;        22.8   93  2.32 
 
 Hornet 4 Drive&nbsp;    21.4  110  3.215
 
 Hornet Sportabout&nbsp; 18.7  175  3.44 
 ----------------------------------------

Pandoc converts the above markdown to HTML that renders as follows:

                      mpg     hp     wt
  Mazda RX4           21      110    2.62
  Mazda RX4 Wag       21      110    2.875
  Datsun 710          22.8    93     2.32
  Hornet 4 Drive      21.4    110    3.215
  Hornet Sportabout   18.7    175    3.44

Notice that the row labels are no longer in bold, but everything else is the same.
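The difference can also be checked in R rather than by eye. The sketch below is only an illustration: `labels.as.column` and `has.bold` are hypothetical helpers, not part of pander, and the helper reproduces just the row-label step of no.emphasis.table() without the alignment handling. It relies on pander's `pandoc.table.return`, which returns the markdown as a string instead of printing it.

```r
library(pander)

df <- subset(mtcars, select = c("mpg", "hp", "wt"))[1:5, ]

# Hypothetical helper: move the row names into an ordinary first column,
# as no.emphasis.table() does, but leave the alignment untouched
labels.as.column <- function(df) {
  out <- cbind(paste0(rownames(df), "&nbsp;"), df)
  rownames(out) <- NULL
  colnames(out)[1] <- ""
  out
}

# Capture the markdown for both versions of the table
default.md <- pandoc.table.return(df)
fixed.md <- pandoc.table.return(labels.as.column(df))

# Hypothetical helper: does the markdown contain any bold markers?
has.bold <- function(md) any(grepl("**", md, fixed = TRUE))

has.bold(default.md)  # default table: row labels wrapped in **
has.bold(fixed.md)    # wrapped table: no ** expected
```

If the second check ever reports bold markers, the row names were probably still attached to the data frame when it reached pander.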


Notes:


  1. Pander’s author fixed this problem after I submitted a bug report. However, the solution presented in this post can still be used with older versions of pander.
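The note does not say what form the fix took. In the pander releases I know of, it appears as an `emphasize.rownames` argument to `pandoc.table` (and `pandoc.table.return`). Treat the argument name as an assumption and check it against the documentation for your installed version. A minimal sketch under that assumption:

```r
library(pander)

df <- subset(mtcars, select = c("mpg", "hp", "wt"))[1:5, ]

# Assumes a pander version whose pandoc.table functions accept
# emphasize.rownames; older versions need the workaround from this post
md <- pandoc.table.return(df, emphasize.rownames = FALSE)

# Print the markdown table; the row labels should carry no ** markers
cat(md)
```

With this option the data frame keeps its row names, so no wrapper function is needed. The same setting may also be available globally through `panderOptions("table.emphasize.rownames", FALSE)`, though that too should be verified for your version.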
