Saturday, July 28, 2012

Academic Workflow for the Ages

I’ve been a connoisseur of citation software for a while now. But when people ask me which citation program they should use, my response is always: “It depends.” Aside from being a nice, safe answer (I am a well-trained graduate student at this point), it’s also true.

Do you want to prepare a manuscript for an academic journal? Do you need to share your references with colleagues? Are you willing to pay for the software? What word processor do you use?

Asking which citation software is the best is the wrong question. The right question is “What software do I need for a complete academic workflow?”

Here again, there are several possibilities depending on your specific situation, but at least you will arrive at a good answer. In this post, I first outline some of the strengths of different citation programs, and then I outline two good options for an academic workflow, starting with the one that I personally use (the aforementioned “Academic Workflow for the Ages”).

Some Citation Programs

First, for the love of your own sanity, use citation software! I cry a little every time I encounter a grad student nearing the end of his or her thesis without using any citation software. I shout to the heavens, “Why?!” and curse the gods for allowing this tragedy to occur. Please don’t do this to yourself. And talk to your peers: Friends don’t let friends do grad school without citation software.

Second, you should never have to write references by hand into a paper for class or a publication. To avoid this tedious work, make sure your citation program is compatible with your word processor. If you use Microsoft Word, make sure that whichever citation program you’re using can automatically insert in-text citations and a bibliography at the end (most can). If you’re using OpenOffice or Pages or LaTeX, similarly make sure that your citation software is compatible.

Third, don’t worry about the file format that the program uses. (For example, Zotero can export *.ris files; JabRef uses *.bib files and so on.) In my experience, programs can import and export any format you need, so you can throw this criterion out.

Finally, with the exception of Sente, the programs I mention below will work on both PCs and Macs.

Programs for research

These programs are better suited to doing literature reviews because they have some functions for taking notes on references in addition to managing citations.

Zotero versus Mendeley

Zotero and Mendeley are largely comparable: They’re both free; you can share libraries with people; they can automatically import citations from web pages and PDFs; and they both have cite-while-you-write plug-ins for Microsoft Word. Those are the basics, and both of these programs have them for free. I always recommend Zotero because, in my personal experience, it’s more stable than Mendeley and more user friendly. I’ve had Mendeley crash and delete references and do various weird things to my reference libraries. I don’t like that. Also, I like the way that Zotero grabs citation information from web pages. You simply click the icon in your browser’s address bar and in comes the reference information. Mendeley requires you to use a link (which you should save as a bookmark) that takes you to a new page and so forth. It’s awkward. So for my money, I’ll take Zotero every time.

Mendeley’s one saving grace is its PDF annotation feature. If you’re looking for free software that annotates PDFs, Mendeley has the advantage here, but if you’re willing to pay (and if you’re working on a PhD you probably should be), then there are better options. Also, Zotero can save notes on references, which isn’t as good as proper annotation but helps nonetheless.

Sente

Sente is paid software, but you also get something for the investment: the best PDF annotation software available (more details on this below). It also has the same features as these other programs: automatically import reference information, share reference libraries, and so forth. It doesn’t have a cite-while-you-write feature for the latest version of Microsoft Word, but it has a document scanning feature, which serves the same purpose. I actually like document scanning better than cite-while-you-write because you can copy and paste references across documents in different formats (e.g., from a *.rtf file to a *.docx file). There’s also an iPad version of Sente, so you can sync your library across your Mac and iPad and then review PDFs on the tablet. Personally, I think the iPad is overpriced, but if I had one, I would love to review PDFs on it.

Preparing Manuscripts

These are candidates for drafting manuscripts for academic journals because they have large libraries of citation formats that cover most journals. This feature is valuable because many journals have custom citation styles. You might know Chicago or APA style, but there’s a very small chance that you know Journal of Industrial Ecology-style. So instead of manually writing in-text citations and a bibliography in a new citation style, these programs will do that tedious work for you.

EndNote versus RefWorks

I recommend EndNote because the one time I used RefWorks, its format for the journal was incorrect! So I had to fix all the references by hand. Thanks, RefWorks. The downside of using EndNote is that it costs money. Keep in mind that in some fields, like mathematics, authors typically submit articles in LaTeX. If you’re writing manuscripts in LaTeX, a BibTeX program is best.

BibTeX Programs

By BibTeX programs, I mean software like JabRef (which is cross platform) and BibDesk (which is Mac specific). These programs make the most sense if you’re writing your papers in some version of TeX (e.g., LaTeX). But you can make some of them work with other word processors. For example, you can export your references from JabRef and import them directly into an MS Word document. In my experience, BibTeX programs require a bit more manual work, too. Using JabRef, I had to manually link each PDF to the reference in the library, and if I wanted to move the *.bib file to a new computer, I would need to remake all those links manually.

Workflows

I use the notecard method for reading literature and writing papers. Steps in the notecard method are the following: review a source; record each important quote or thought on its own notecard along with that source’s citation key and the page number; repeat for all your sources; organize notecards into piles; and finally turn notecard piles into the paragraphs of your paper. That’s the rough idea.

So for me, the goal is to create an electronic workflow that tracks citation information, that allows me to make electronic notecards for important quotes and thoughts, and that allows me to organize those notes and draft paragraphs based on them.

Academic Workflow for the Ages

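This workflow uses the following software: Sente 6 for managing references and annotating PDFs, DevonThink for storing and searching notes, and Scrivener for drafting.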

Unfortunately, all this software costs money. Academic licenses cut down on costs, but getting all this software will cost over $300. To run the software, you also need a Mac, and those aren’t cheap. I personally think that a PhD is such a massive investment of your own time that paying for the right setup is worth a little investment, too.

Consider the amount of money that you spend on a car (if you have one) and compare the amount of time you spend in your car to the amount of time you spend in front of your computer. I’d be surprised if buying the car plus insurance costs less than owning a computer, and that calculation doesn’t even take into account the fact that you’re almost certainly using the computer far more often. I use my computer more than anything else, and it’s also the most important tool I have for getting my research done. That warrants some investment in my opinion. Also, these programs all have free trial periods, so you can test all this out before making the investment.

Sente 6 is a good bibliography program, and it’s the best PDF annotation software available. When you’re annotating a PDF, Sente gives you a big view of the PDF and a sidebar for note taking. Each Sente note has four fields: title, page number, quote, and comment. You can highlight text in a PDF, and Sente can automatically create a note with a title (the first couple words of the quote), the quote itself, and the page number. It leaves the comment field blank, so you can enter your own thoughts. You don’t need to type anything except your own thoughts. That sounds like a pretty damn efficient way to make notes on sources to me. And Sente automatically tracks citation information (more on that later). After reviewing a source, I write an annotated bibliography and save it as a note for that source, as well.

Next is the most beautiful part of this workflow: use Robin Trew’s AppleScript to export your notes from Sente into DevonThink. The script gives each source its own folder. Each note is a text file stored in that source’s folder. The text file contains the note’s title, quote, comment and citation information. The citation information is a tag, like {Goldman 2009@375}. “Goldman 2009” points to the reference, and “@375” refers to the page number. Sente can read these citation tags in documents and replace them with properly formatted citations (more on that later). If you assign keyword tags to sources in Sente, those will be transferred into DevonThink, as well. You can also create new keyword tags in DevonThink. These text files are the equivalent of electronic notecards. And with Trew’s AppleScript, you can have a searchable database of them. Imagine what it would be like to have a searchable database of three years of literature review.

There are different versions of DevonThink, and I recommend getting the most expensive version because it comes with an OCR engine. OCR stands for optical character recognition. With OCR, you can import an image into DevonThink, and DevonThink will convert any text in the image into selectable and searchable text. You don’t need this feature for the PDFs that contemporary academic journals produce. Those PDFs are high quality, and you can select text in them and copy and paste easily. That’s important because in order to annotate the PDFs in Sente, the text in the PDF needs to be selectable.

There are two instances where OCR is valuable: (1) old journal articles and (2) selections from Google Books. Old journal articles tend to be PDFs as images without selectable text, so you won’t be able to annotate them in Sente. If you run them through DevonThink’s OCR engine, the text becomes selectable, and you can annotate them. Similarly with pages from Google Books, the pages are actually image files (*.png, I believe), so if you want to import them into Sente and take notes, you’ll need to run them through an OCR program.

When you’re ready to start writing, you can search your database of notes in DevonThink and drag and drop the most promising ones into Scrivener. Then, you can use Scrivener to write your first draft. Keep in mind that each notecard contains the citation (including the page number) of the source. So as you write your draft, you simply carry over the citation tags.

You can then export your draft from Scrivener and copy it into a Word document for formatting. When you’re done writing the document, you can scan it with Sente. Sente will go through the document, replace citation tags with properly formatted in-text citations and then put a properly formatted bibliography at the end. (Have a look at Sente’s guidance on citation tags.) Sente supports many citation formats but not as many as EndNote.

Another Elegant Solution

Using BibTeX software along with LaTeX is an efficient way to create documents. Unfortunately, I don’t know a good way to annotate PDFs and store notes with this workflow. The advantage is that LaTeX produces PDFs that are far more attractive than anything Word can produce. Sente can generate BibTeX tags, so Sente may offer a good solution. For certain fields, like mathematics, LaTeX is required. More journals are also allowing authors to submit manuscripts in LaTeX, and with its beautiful PDFs, LaTeX is worth considering.
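
For readers who haven’t seen this workflow, here is a minimal sketch of what it looks like on the LaTeX side. The file name refs.bib, the citation key, and the reference itself are placeholders I made up for illustration (in practice the *.bib file would come from JabRef or BibDesk), and the plain bibliography style is only a stand-in for whatever style your journal requires.

```latex
% Typical build sequence: pdflatex, bibtex, pdflatex, pdflatex.
% The filecontents* block writes a placeholder refs.bib so that this
% example is self-contained; the entry below is NOT a real reference.
\begin{filecontents*}{refs.bib}
@article{placeholder2012,
  author  = {Author, Ann},
  title   = {A Placeholder Title},
  journal = {Journal of Placeholder Studies},
  year    = {2012},
  volume  = {1},
  pages   = {1--10}
}
\end{filecontents*}

\documentclass{article}

\begin{document}

% The optional argument carries the page number, much like the
% page numbers recorded on notecards.
A claim that needs support \cite[p.~5]{placeholder2012}.

\bibliographystyle{plain}  % swap in the style your journal requires
\bibliography{refs}        % reads refs.bib

\end{document}
```

Changing the citation format for a different journal is then a matter of changing the \bibliographystyle line and typesetting again.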

Conclusion

These are some of the broad strokes of citation software and creating an integrated academic workflow. In future posts, I plan to provide more specifics, for example, on turning a chapter of a Google Book into a PDF with selectable text.

Sunday, July 22, 2012

Using All Kinds of Fonts in LaTeX (Part 1)

I like using LaTeX because of the beautiful PDFs that it produces. However, for a typesetting system that intends to be everything a typographer wants it to be, it’s surprisingly difficult to use a variety of fonts and maintain strong control over the typography.

The LaTeX Font Catalogue is a great resource for fonts. All of the fonts listed there can be used by including packages in the preamble, so they’re very easy to use. However, the selection is a bit limited. Arial is available as a close approximation to the ubiquitous Helvetica. But Arial/Helvetica is relatively thick. What if you want to use Helvetica Neue Light or Ultralight? As far as I know, I can’t use a package for either of those. And if I’m serious about document design, I’ll want access to display fonts for titles and headings, too. Unfortunately, the LaTeX Font Catalogue has a very poor selection of display fonts.

The next easiest way to get non-standard fonts into a document is to use XeTeX/XeLaTeX as the typesetting engine and the fontspec package. fontspec lets you insert system fonts into your documents (and unfortunately, it’s not compatible with pdfLaTeX). On a Mac, you can find your system fonts by looking into the Font Book application or by entering the following command in the Terminal:

fc-list

Because you’re going to get a lot of text in the output, you may want to redirect the results into a text file. You can use the following command to redirect the output into a plain-text file called fontlist.txt:

fc-list > fontlist.txt

Any font that is listed in the output can be included in a LaTeX document with the following code (in the document preamble). The example uses Helvetica Neue Light.

\usepackage{fontspec}              % Provide features for AAT
                                   %   and OpenType fonts
\setmainfont{Helvetica Neue Light} % Define the default font
                                   %   family

You can also set a sans-serif font and a monospaced font. For example, to set Helvetica Neue Light as the document’s sans-serif font, enter the following in the preamble of your LaTeX document:

\setsansfont{Helvetica Neue Light} % Make Helvetica Neue Light
                                   %   the default font for
                                   %   sans-serif text.
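
To see the pieces together, here is a minimal sketch of a complete document that sets all three families. The font names are assumptions on my part (fonts I would expect to find on a Mac); substitute any names that appear in your own fc-list output or in Font Book.

```latex
% Typeset with xelatex; fontspec does not work with pdflatex.
\documentclass{article}
\usepackage{fontspec}

% The font names below are assumptions; use names from fc-list.
\setmainfont{Hoefler Text}          % default (roman) family
\setsansfont{Helvetica Neue Light}  % used by \textsf and \sffamily
\setmonofont{Menlo}                 % used by \texttt and \ttfamily

\begin{document}

Body text in the main font.
\textsf{Sans-serif text.}
\texttt{Monospaced text.}

\end{document}
```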

And if you want to use a new font, you can simply install it on your system. For Mac OS X, at least, installing fonts is incredibly easy. You can go to a variety of websites to find and download new fonts to your computer.[1] Usually, they’re in ZIP files, so you simply unzip the file and then double click on the font file (e.g., a *.ttf file or an *.otf file). It will open in Font Book, and then you can click the “Install Font” button to install it (Source). It’s pretty straightforward.
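
Once a display font is installed, one way to use it for titles only is fontspec’s \newfontfamily command, which defines a switch for a single font without changing the document’s default families. This is a sketch rather than a recipe: “League Gothic” is just a stand-in for whatever display font you installed, and \displayfont is a command name I chose for the example.

```latex
% Typeset with xelatex.
\documentclass{article}
\usepackage{fontspec}

% "League Gothic" is a placeholder for any installed display font;
% \displayfont is a name chosen for this example.
\newfontfamily\displayfont{League Gothic}

\begin{document}

% The braces keep the font switch local to the title.
{\displayfont\Huge A Display-Font Title\par}

Body text stays in the default font.

\end{document}
```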

So it seems like this solves the problem of getting different fonts into a LaTeX document, right? Well, yes, but there’s an important disadvantage to using XeLaTeX to typeset your document: you can’t use any of the features in the microtype package. The microtype package makes available several functions that typographers need, such as control over kerning and inter-word spacing. (I recommend playing around with inter-word spacing, for example, to quickly see how it changes the “feel” of the document.) microtype also does protrusion, which lets characters like hyphens and periods hang slightly into the margin so that the margins appear straight to the human eye. And because all things are complicated in LaTeX, microtype is, as of this writing, not compatible with XeLaTeX.
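
For comparison, here is a minimal sketch of a pdfLaTeX document with these microtype features switched on. The option names (protrusion, expansion, kerning, spacing) are microtype’s own; the choice of Latin Modern as the font is my assumption, made only so that the example uses a scalable font.

```latex
% Typeset with pdflatex.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}  % a scalable font, assumed here for the example

% Each feature can be set to false to compare the results.
\usepackage[protrusion=true,
            expansion=true,
            kerning=true,
            spacing=true]{microtype}

\begin{document}

This paragraph only exists to give microtype something to work on.
With protrusion enabled, punctuation at the ends of lines hangs
slightly into the margin, and with spacing enabled, the inter-word
spaces are adjusted. Typeset the document with and without each
option and compare the paragraph edges.

\end{document}
```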

So if you want access to all the features in microtype, you need to use pdfLaTeX as the typesetting engine, and in this case, you can’t use the fontspec package to include system fonts in the document. As a result, you need to install new fonts in your TeX system. I’ll explain how to do that in the next blog post.


Notes:


  1. I haven’t gotten that far into fonts, but Font Squirrel seems like a good resource. Keep in mind that fonts can be free for personal use but not for commercial use. Personally, I would prefer to use only unrestricted, free fonts, because some of my work is commercial, and I’m not sure what I’ll be using my personal work for in the future.

Tuesday, July 10, 2012

LaTeX Isn't Just for Technical Documents

LaTeX is not just for peer-reviewed math journals. You can use LaTeX to make PDFs that rival those made with expensive software, like the Adobe line of products. You have to get the basics down first, but then it’s just a matter of picking and using good fonts, colors, and layouts.

So here is one of my examples to prove this point. It’s a quick reference sheet for cooking. Although the content isn’t actually my own, the formatting and layout are. In a reddit post on the LaTeX sub-reddit, user a_contact_juggler challenged the sub-reddit to make a LaTeX version of user Fredthecoolfish’s single-sided, handwritten cooking cheat sheet. The original handwritten cheat sheet is available here.

You can see other people’s LaTeX versions in this reddit post. My version is below. Click on it to download the PDF.

Cooking cheat sheet

And here are the raw files:

There isn’t much education in this post. The main point is to show that it’s possible to make colorful, attractive PDFs for a general audience—not just engineers and math enthusiasts—with LaTeX.

Although the main point here isn’t to describe new functions in LaTeX, I did make an interesting discovery while working on this: it’s possible to turn images into links with the standard \href command. For example, enter the following in a LaTeX document:

\href{URL}{%
  \includegraphics[height=18pt,keepaspectratio]%
  {IMAGE_FILENAME}%
}

Replace URL with the URL you want to link to, and replace IMAGE_FILENAME with the file name of the image.
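
As a self-contained sketch, the snippet needs hyperref for \href and graphicx for \includegraphics. The URL below is a placeholder, and example-image is a stock picture from the mwe package, which I’m assuming is installed (it ships with TeX Live); swap in your own link and image file.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics
\usepackage{hyperref}  % provides \href

\begin{document}

% Placeholder URL and placeholder image (example-image is from mwe).
\href{https://www.example.com}{%
  \includegraphics[height=18pt,keepaspectratio]{example-image}%
}

\end{document}
```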

Sunday, July 1, 2012

Using In-text Fonts in LaTeX Figures

One thing that has always frustrated me about putting figures into Word documents is the mismatch between the fonts in the figures and the running text.

Drafting images in Word is a hassle, and it’s hard to retain their look when exporting them to other programs. More importantly, figures drafted in Word don’t look very sharp or professional. If you create the image in another program like MS Visio and then import it, the font size rarely matches the font size in the rest of the document. And if you change the font type in the Word document, you have to go back and rework all the text in the Visio file. Some of the line breaks may have changed because the new font is larger or smaller. You may need to change the alignment and indents to get the right look. And so on. It’s a time-consuming nuisance.

And if you use another program that only gives you an image file as output (e.g., a PDF or JPEG), it’s even harder to match the font size in the rest of the document. All you can do is resize the entire image until the font looks about the same as the surrounding text. If the figure is larger than the margins of the document when the font sizes finally match, you’ll have to redraft it and try again.

If you’re using LaTeX, there’s an elegant solution to this problem, and it’s one of the reasons I love preparing documents in LaTeX: You can use Inkscape to prepare the figure and then save a copy as a PDF with this option checked: “PDF+LaTeX: Omit text in PDF, and create LaTeX file”.

First, you need to draft the figure in Inkscape. I won’t go into details about how to use Inkscape except to provide the following tip on horizontally and vertically centering text in a box or rectangle:

  1. Create a shape like a box or rectangle and create a text object.
  2. Open the “Align and distribute...” dialogue in the “Object” menu (or press Shift+Ctrl+A).
  3. Choose a “Relative to:” mode that refers to an object. I prefer “Biggest object” as the text object ought to be smaller than the shape it appears in.
  4. Select both the text and the shape.
  5. Press both horizontal and vertical align buttons in the “Align and distribute...” dialogue.

Now the text will appear vertically and horizontally centered in the shape. The following image is an example I created for illustrative purposes.

Example figure

After drafting the figure in Inkscape:

  1. Go to the “File” menu and select “Save a copy...”.
  2. Choose “Portable Document Format (*.pdf)”.
  3. On the following screen, check the box for “PDF+LaTeX: Omit text in PDF, and create LaTeX file”. (See the figure below.)

Inkscape Screenshot: File -> Save a copy... -> Portable Document Format (*.pdf)

Inkscape separates the graphics and the text into two files: One is a PDF with the graphics and the other is a LaTeX file (with a *.pdf_tex extension) which contains instructions for placing the text and the text itself.
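
To make the idea concrete, here is a hand-written sketch of the overlay technique: a picture environment that places an image and then puts text on top of it, so that the text is typeset by LaTeX in the document’s own font. This is not Inkscape’s actual output, only an illustration of the principle. The image example-image comes from the mwe package (assumed to be installed), and the label position is an arbitrary choice.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics

\begin{document}

\begin{figure}
  \centering
  % One picture unit equals the desired width of the figure.
  \setlength{\unitlength}{0.8\textwidth}%
  % Width 1, height 0.75: assumes the placeholder image is 4:3.
  \begin{picture}(1,0.75)%
    % The graphics go in first, scaled to one unit wide.
    \put(0,0){\includegraphics[width=\unitlength]{example-image}}%
    % The text is placed on top, centered on the given point.
    \put(0.5,0.1){\makebox(0,0){A label set in the document font}}%
  \end{picture}
  \caption{Text overlaid on graphics with the picture environment}
\end{figure}

\end{document}
```

Because the label is ordinary LaTeX text, changing the document font or font size changes the label along with the running text.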

To insert this image into the LaTeX file, simply insert the following code in your LaTeX document:

\input{filename.pdf_tex}

You can set the width of the figure with \svgwidth, and LaTeX will maintain the aspect ratio from the original image. For example, insert the following code in your LaTeX document:

\def\svgwidth{\textwidth}

So the full code for inserting a figure like this could be the following:

\begin{figure}
  \centering
  \def\svgwidth{\textwidth}
  \input{filename.pdf_tex}
  \caption{the caption}
  \label{fig:thelabel}
\end{figure}

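And you need to load the graphicx and color packages in the preamble of your LaTeX document, because the *.pdf_tex file places the graphics with \includegraphics and colors the text with \color:

\usepackage{graphicx}
\usepackage{color}

(Note that you don’t need the pstricks package. If the text in your figure uses transparency, also load the transparent package.)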

There are a lot of ways to get figures into LaTeX documents, but I like this approach and promote it here, because it’s relatively easy. With other approaches like PGF/TikZ, PSTricks and Xfig, you need to either learn an extensive new set of commands and directly code the figure or use graphical user interfaces (GUIs) that will output code for you. But in my experience, GUI programs for those packages tend to be awkward and not highly developed, so you’re left with the unappealing and time-consuming chore of digging through instruction manuals to learn how to code your figure. Inkscape, on the other hand, is relatively easy to use, and if you don’t want to draft the figure in Inkscape directly, you can import it from another program. I have imported PDFs from Visio into Inkscape, for example, touched them up and then exported them as described above.

Here are some example files to see this approach in action. Put all three in the same directory and then typeset the *.tex file for the document using XeLaTeX.

Here is the output, and here is the SVG file for the image in case you want to play around with it in Inkscape.

And finally I have to admit that this is a post with poor attribution. I learned about this approach several months ago, and the sources that I used to figure it out have since been lost to my foggy memory. If people have sources and post them in the comments, that would be welcome and appreciated.