Monday, March 2, 2015

Automating Sente Exports

In this post, I’ll provide some guidance on getting and using Sente’s reference library without opening the application itself. In last week’s post, I wrote about fixing Sente’s export to BibTeX. Being able to fix up the *.bib file that Sente exports helps with writing documents in LaTeX. Using the fix that I demoed in last week’s post, you could also make your reference metadata more consistent (e.g., removing internal periods from US state abbreviations) and ensure the metadata conforms to your style guide (e.g., capitalization of source titles). However, the fix that I demoed requires opening up Sente and exporting the reference library both as SenteXML and as BibTeX. Doing so is not an unbearable burden, but it obstructs fully scripting (i.e., fully automating) parts of the workflow.

Sente Library Basics

The Sente reference library is an SQLite3 database. SQLite is an embedded implementation of SQL and, according to its website, “the most widely deployed SQL database engine in the world.” In my experience, it is lightweight and relatively easy to use.

To access Sente’s SQLite database, you first have to find it. Sente library files have the extension *.sente6lib. This file is actually a folder that contains several files, including the SQLite database and PDFs for articles (if you have elected to include them in the Sente library file).

Open a terminal and navigate to the folder containing the Sente library file. Then, navigate into the sub-folder with the SQLite database. One of my library files is called ASU-References or ASU-References.sente6lib in full, so to get to the folder containing the SQLite database, I enter the following in the terminal:

cd ASU-References.sente6lib/Contents

As far as I can tell, the SQLite database is always called primaryLibrary.sente601, although it may differ in other versions of Sente. To get into the database, enter the following in the terminal:

sqlite3 primaryLibrary.sente601

This command opens SQLite and allows you to get information about the database and its tables and to perform SQL queries on the database. To get a list of tables in the database, enter

.tables

To get the schema for a given table, enter

.schema TABLENAME

For example,

.schema Reference

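To list just the columns of a table (one row per column, with its name and declared type), enter

PRAGMA table_info(TABLENAME);

For example,

PRAGMA table_info(Reference);

Note that the similar-sounding command .indices TABLENAME lists the indexes defined on a table, not its column headings.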
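The same inspection can be done from a script. Below is a minimal sketch in Python using the standard sqlite3 module. The function names list_tables and column_names are my own, and the demonstration runs against a throwaway in-memory table rather than a real Sente library.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names, roughly what .tables shows in the shell."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of one table via PRAGMA table_info."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each result row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


# Demonstration on an in-memory stand-in, not a real Sente library.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Demo (ReferenceUUID varchar, AttributeName varchar)")
print(list_tables(demo))           # ['Demo']
print(column_names(demo, "Demo"))  # ['ReferenceUUID', 'AttributeName']
```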

These are some helpful basics. For a full list of SQLite commands, simply enter

.help

To exit SQLite, enter

.q
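To reach the database from a script instead of the shell, the script has to locate the file inside the library package first. The following is a minimal sketch in Python, assuming the layout of my own library (Contents/primaryLibrary.sente601); the function name open_sente_library is hypothetical, and other versions of Sente may use a different file name. It opens the database read-only so that a script cannot modify the library by accident.

```python
import sqlite3
from pathlib import Path

# Assumption: the database sits at Contents/primaryLibrary.sente601 inside
# the *.sente6lib package, as in my library. Other versions may differ.
DB_RELATIVE_PATH = Path("Contents") / "primaryLibrary.sente601"


def open_sente_library(library: str) -> sqlite3.Connection:
    """Open the SQLite database inside a *.sente6lib package, read-only."""
    db_path = (Path(library).expanduser() / DB_RELATIVE_PATH).resolve()
    if not db_path.is_file():
        raise FileNotFoundError(f"No Sente database at {db_path}")
    # mode=ro asks SQLite to refuse writes on this connection.
    return sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)
```

For my library, the call would be open_sente_library("ASU-References.sente6lib"), run from the folder that contains the library file.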

Sente Citation Identifier

One of the most valuable pieces of identifying information about a source is its Sente citation identifier. The citation identifier is the tag that Sente reads when it scans text documents (e.g., Word documents). An example in-text citation follows:

This is a made up sentence for display purposes only {Arnstein 1969}.

The tag is “{Arnstein 1969}”. So if you’re using Sente to cite sources, your documents are full of Sente citation identifiers. And if you want to do something automatic with citations, there is a good chance you will want to retrieve information about a source based on its citation identifier.
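As a starting point for that kind of automation, here is a minimal sketch in Python that pulls citation identifiers out of a piece of text. It assumes a tag is simply whatever sits between one pair of curly braces, as in the example sentence; any richer tag syntax that Sente may support is not handled, and the function name citation_identifiers is my own.

```python
import re

# Assumption: a tag is the text between one pair of curly braces, as in the
# example sentence above. Nested or more elaborate tags are not handled.
TAG_PATTERN = re.compile(r"\{([^{}]+)\}")


def citation_identifiers(text: str) -> list[str]:
    """Return the contents of each {...} tag, in order of appearance."""
    return [match.strip() for match in TAG_PATTERN.findall(text)]


sentence = "This is a made up sentence for display purposes only {Arnstein 1969}."
print(citation_identifiers(sentence))  # ['Arnstein 1969']
```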

The citation identifier appears in the SparseAttribute table, which contains a lot of other additional information about each citation, too. The table has three columns:

  • ReferenceUUID
  • AttributeName
  • AttributeValue

Here are the rows for one reference in the SparseAttribute table:

C914FC95|EndNote reference number|116
C914FC95|publicationCountry|
C914FC95|publisher|Routledge
C914FC95|Primary contributor role|Author
C914FC95|DOI|10.1080/01944366908977225
C914FC95|BibTeX cite tag|arnstein1969ladder
C914FC95|Citation identifier|Arnstein 1969
C914FC95|publicationStatus|Unknown
C914FC95|ISSN|0194-4363
C914FC95|Web data source|Google Scholar

In the output above and throughout the rest of the post, I have abbreviated the entry in the first column, which is the ReferenceUUID, from C914FC95-5EE5-4C2D-BC80-1621EC22C2EC to C914FC95.
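Because each row is a name and value pair, the rows for one reference fold naturally into a dictionary. The sketch below, in Python, shows the idea on an in-memory stand-in table that holds three of the rows above. The function name attributes_for is hypothetical. The stand-in uses plain varchar columns; the real table declares a custom collation on two of its columns, so running this against a real library would probably need an extra workaround for that.

```python
import sqlite3


def attributes_for(conn: sqlite3.Connection, reference_uuid: str) -> dict:
    """Collect the name/value rows for one reference into a dictionary."""
    rows = conn.execute(
        "SELECT AttributeName, AttributeValue FROM SparseAttribute "
        "WHERE ReferenceUUID = ?",
        (reference_uuid,),
    )
    return {name: value for name, value in rows}


# Stand-in table with three of the rows shown above (full UUID restored).
uuid = "C914FC95-5EE5-4C2D-BC80-1621EC22C2EC"
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE SparseAttribute "
    "(ReferenceUUID varchar, AttributeName varchar, AttributeValue varchar)"
)
demo.executemany(
    "INSERT INTO SparseAttribute VALUES (?, ?, ?)",
    [
        (uuid, "publisher", "Routledge"),
        (uuid, "BibTeX cite tag", "arnstein1969ladder"),
        (uuid, "Citation identifier", "Arnstein 1969"),
    ],
)
attrs = attributes_for(demo, uuid)
print(attrs["Citation identifier"])  # Arnstein 1969
```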

This particular reference is a well-known article on community participation. Its full citation is

Arnstein, Sherry R. 1969. “A Ladder of Citizen Participation.” Journal of the American Planning Association 35 (4): 216–224. doi:10.1080/01944366908977225.

Sente References’ ReferenceUUID

As far as I can tell, the ReferenceUUID is Sente’s true key for each entry in the database. In database terms, the ReferenceUUID is the unique, primary key for a reference.

Aside from being the primary key for a reference, the ReferenceUUID is also important because Sente will serve it up to other applications that request information about highlighted Sente references. For example, to use Robin Trew’s AppleScript (which exports Sente reference notes to DevonThink), you highlight references in Sente and then run the script. The script gets the ReferenceUUID for highlighted references, uses it to grab more information (including notes) about each reference, applies some formatting, and then ships the results to DevonThink. I recommended using this AppleScript in my original post on academic workflows.

It should be possible to pull the citation identifier out of the AttributeValue column by looking up the ReferenceUUID for a given reference, selecting the row that contains “Citation identifier” in the AttributeName column, and pulling the value in the AttributeValue column for that row. For the Arnstein reference, the goal is to retrieve the seventh row from the output above. The seventh row is

C914FC95|Citation identifier|Arnstein 1969

I would expect the following SQL query to retrieve the citation identifier:

select AttributeValue from SparseAttribute 
where AttributeName='Citation identifier' 
and ReferenceUUID='C914FC95'; 

But this query fails. To figure out why it fails, I examined the schema for the SparseAttribute table, which is

CREATE TABLE SparseAttribute 
    ( ReferenceUUID varchar, 
    AttributeName varchar COLLATE SenteLocalizedNoCase, 
    AttributeValue varchar COLLATE SenteLocalizedNoCase );
CREATE INDEX SparseAttributeByName 
    ON SparseAttribute ( AttributeName );
CREATE UNIQUE INDEX SparseAttributePK 
    ON SparseAttribute 
    ( ReferenceUUID, AttributeName );

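Note that two columns in this table declare a collation sequence, SenteLocalizedNoCase. A collation sequence is the rule SQLite uses to compare and sort text. As far as I can tell, this one is not built into SQLite; presumably Sente registers it when it opens the library, so the sqlite3 shell does not know about it, and the SQL query throws an error. Overriding the collation in the query with SQLite’s built-in NOCASE works around the problem. The following is a corrected version of the SQL query: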

select AttributeValue from SparseAttribute 
where AttributeName='Citation identifier' COLLATE NOCASE 
and ReferenceUUID='C914FC95' COLLATE NOCASE;

This SQL query gives the following result:

Arnstein 1969

This is the correct citation identifier for the reference.
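From a script, another option is to register a stand-in collation under the name the schema expects, so that queries need no COLLATE override. The following Python sketch does this with the standard sqlite3 module. The comparison function is my own approximation (a plain case-insensitive comparison) and is almost certainly not identical to whatever Sente registers. For that reason it is only suitable for reading, and lookups that involve non-ASCII text may behave differently than they do inside Sente. The demonstration uses an in-memory copy of the schema above, not a real library.

```python
import sqlite3


def localized_nocase(a: str, b: str) -> int:
    """Stand-in comparison: case-insensitive, returning -1, 0, or 1."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


def register_stand_in_collation(conn: sqlite3.Connection) -> None:
    # Assumption: an approximation of Sente's collation, for reading only.
    conn.create_collation("SenteLocalizedNoCase", localized_nocase)


def citation_identifier(conn: sqlite3.Connection, reference_uuid: str):
    """Return the citation identifier for a reference, or None if absent."""
    row = conn.execute(
        "SELECT AttributeValue FROM SparseAttribute "
        "WHERE AttributeName = ? AND ReferenceUUID = ?",
        ("Citation identifier", reference_uuid),
    ).fetchone()
    return row[0] if row else None


# In-memory copy of the schema shown above, with one row for the demo.
uuid = "C914FC95-5EE5-4C2D-BC80-1621EC22C2EC"
demo = sqlite3.connect(":memory:")
register_stand_in_collation(demo)
demo.executescript(
    """
    CREATE TABLE SparseAttribute
        ( ReferenceUUID varchar,
        AttributeName varchar COLLATE SenteLocalizedNoCase,
        AttributeValue varchar COLLATE SenteLocalizedNoCase );
    CREATE INDEX SparseAttributeByName
        ON SparseAttribute ( AttributeName );
    CREATE UNIQUE INDEX SparseAttributePK
        ON SparseAttribute ( ReferenceUUID, AttributeName );
    """
)
demo.execute(
    "INSERT INTO SparseAttribute VALUES (?, ?, ?)",
    (uuid, "Citation identifier", "Arnstein 1969"),
)
print(citation_identifier(demo, uuid))  # Arnstein 1969
```

Binding the values as parameters (the ? placeholders) also avoids quoting problems when a UUID or identifier comes from another program.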

Next Steps

When I was working on my dissertation, I found that Robin Trew’s AppleScript stopped working. I was relying on the script to export Sente reference notes to DevonThink, and it was a key part of my workflow. The AppleScript makes SQL queries to Sente’s reference database, processes the information it retrieves, and then sends it to DevonThink. So I used all this information about Sente’s SQLite database to update the script and get it working again. If people are interested, I can post my version of the script, although mine may have become out of date by now, too.

In general, I hope this post helps other people who want to write scripts that retrieve information about their Sente references. With access to Sente’s SQLite database, it’s possible to do … pretty much anything with metadata about references in your Sente library. You could even reimplement Sente’s export commands (e.g., to fix errors in the way Sente exports to BibTeX).
