Scripts for writing papers

One of the nice things about LaTeX is that the definitive version of the document is stored as plain text. This simplifies version control and avoids the need to use a complicated editor with an opaque binary file format. It also means that writing scripts to check for writing mistakes is relatively easy. Of course, detecting grammatical errors in general is very challenging, but certain classes of mistakes are easy to check for. Matt Might has posted a few scripts which I’ve found quite useful.

I had cause to write another script to check for consistent use of hyphens. For example, both “non-deterministic” and “nondeterministic” are acceptable, but a single document should pick one variant and use it consistently. You can find the script here; suggestions for improvement (or, better, patches) are welcome.
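
As a rough illustration of the idea (this is not the script linked above, just a minimal Python sketch), one approach is to strip the hyphens from every word and report any word that shows up in more than one spelling. The function name and the regular expression are my own choices for the example. It does not strip LaTeX commands or comments, and it will not notice space-separated variants such as “lake bed”.

```python
import re
from collections import defaultdict

# A word is a run of letters, optionally joined by single hyphens.
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def find_hyphen_inconsistencies(text):
    """Map each de-hyphenated word to its spellings, if it has more than one."""
    variants = defaultdict(set)
    for match in WORD.finditer(text):
        word = match.group(0).lower()
        # Group spellings by the form with all hyphens removed.
        variants[word.replace("-", "")].add(word)
    return {
        key: sorted(forms)
        for key, forms in variants.items()
        if len(forms) > 1
    }


if __name__ == "__main__":
    sample = "A non-deterministic machine. The nondeterministic case."
    for forms in find_hyphen_inconsistencies(sample).values():
        print("inconsistent:", ", ".join(forms))
```

Expect some false positives from pairs that are genuinely different words, such as “re-sign” and “resign”.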

What other common mistakes are amenable to a simple script? Checking for consistent use of the Oxford comma should be straightforward, for example. There are also scripts like chktex that look for mistakes in LaTeX usage (although I’ve personally found chktex to be too noisy to be useful).
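
To give a sense of how simple an Oxford comma check could be, here is a hedged Python sketch. The two regular expressions and the function names are assumptions made for this example, not part of any existing tool. It counts “a, b, and c” patterns against “a, b and c” patterns and flags a document that contains both. It is a heuristic: a compound sentence such as “First, we ran X, and then Y” is counted as a serial comma, so the output needs a human to look it over.

```python
import re

# "a, b, and c": a comma directly before the conjunction.
SERIAL = re.compile(r",\s+[^,.;:!?]+,\s+(?:and|or)\s")
# "a, b and c": no comma before the conjunction.
NO_SERIAL = re.compile(r",\s+[^,.;:!?]+?\s(?:and|or)\s")


def oxford_comma_usage(text):
    """Count list-like patterns with and without a serial comma."""
    # Collapse line breaks so lists wrapped across lines still match.
    flat = " ".join(text.split())
    return {
        "with": len(SERIAL.findall(flat)),
        "without": len(NO_SERIAL.findall(flat)),
    }


def is_inconsistent(text):
    """True if both styles appear in the same text."""
    counts = oxford_comma_usage(text)
    return counts["with"] > 0 and counts["without"] > 0


if __name__ == "__main__":
    sample = "We use red, green, and blue. They sell apples, pears and plums."
    print(oxford_comma_usage(sample), is_inconsistent(sample))
```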

Naturally, there are limits on what you can do without parsing natural language. After The Deadline claims to provide some NLP-based grammar analysis, but I haven’t used it personally.

Resources

  • atdtool, a command-line interface to After The Deadline
  • diction, a GNU tool to check for common writing mistakes
  • LanguageTool, another open source style and grammar checker
  • TextLint, a writing style checker

2 responses to “Scripts for writing papers”

  1. joshrosen

    I’m a fan of style-check.rb (http://www.cs.umd.edu/~nspring/software/style-check-readme.html), a suite of grammar and style rules. It also detects some LaTeX formatting errors, but I’m not sure how it compares to chktex.

    I recently wrote something similar to your script in Python (https://github.com/JoshRosen/inconsistency.py). I tried to add some heuristics to detect capitalization inconsistencies in compound words. The main challenge is avoiding false positives due to capitalized words at the starts of sentences. Capitalization following em-dashes or colons can also cause false positives, but these are harder to deal with because the capitalization rule depends on content and context.

    I came across a couple of false positives due to words whose proper hyphenation depends on context. For example, ‘state-of-the-art’ is hyphenated when used as an adjective, but not as a noun.

  2. Charlie

    I am pleased to have found these scripts. I’ve been looking for something like this for a while.

    I personally find it difficult to be sure that I have kept my tenses and author’s voice consistent throughout a long document. I sometimes check this just by searching for “we” or “I” in a document. Sometimes it is a reasonable style choice to vary these things. Have you tried anything along these lines?

    Similar to your script, there are some words (such as lake-bed and lake bed) which are acceptable (according to the Oxford English Dictionary) in slightly different forms, but need to be used consistently within a document. I suppose one could compile a list of these words for a script to search against. I might try to add it myself, although I’ve never used or run Ruby before.
