A match made in heaven: academic writing with latex and git

Alternative titles:

  • A match made in heaven/hell: academic writing with latex and git
  • Procrastinating writing by over-engineering my workflow

If you are like me, you can happily write code for hours on end, but as soon as you need to write a paper you end up staring at a blank page. Luckily, I have come up with a foolproof way to trick myself into thinking I am coding when in reality I am finally getting around to writing up the work my supervisor has been wanting for the last month. Introducing Latex and git: this was my approach to drafting a review paper recently, and in this blopig post I will go through some of the ups and downs I had using these tools.

Latex

I’m not going to cover what Latex is in too much detail because there are tons of great resources online describing exactly what it is and how to use it (see https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes and https://www.blopig.com/blog/?s=Latex). At a high level though, Latex is a typesetting system: you write the document almost as code in a plain text source file, which is then compiled into a great looking pdf document.

For me, the main advantage of writing in Latex is that you don’t waste time redoing formatting on things like headings, body paragraphs, title pages, and figure captions. When writing in Word, I find I repeat the process of colouring my headings in a specific way about a hundred times. With Latex, the formatting instructions are given once and the document is produced the same way every time.

Also, when writing something like a review paper with 100+ references, keeping track of these and citing them correctly can become a real headache. There are plugins for Word that help with this, but I don’t find them anywhere near as good as the baked-in bibtex/biblatex, which uses unique keys to track references throughout your document before producing a perfect bibliography at the end. Paired with a program like Zotero that produces bibtex references for you, citing becomes a breeze.

My biggest pet peeve with something like Microsoft Word is that figures are always moving around and ending up in weird spots. If you edit or write something above a figure, it can jump around, get separated from its caption, or any number of things. With Latex, figures are placed where the program decides looks best, and in my opinion the result is always better than what I could come up with.
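
To make the idea of citation keys concrete, here is a minimal sketch that writes a two-file example: a bibtex database and a document that cites it. The function name, the file names, and the key knuth1984 are all made up for illustration; in practice the .bib file would come from something like Zotero.

```shell
# Write a tiny bibliography and a document that cites it by key.
# Usage: write_citation_demo [directory]   (defaults to the current directory)
write_citation_demo() {
    dir=${1:-.}

    # The key "knuth1984" is what ties the entry to the citation.
    cat > "$dir/refs.bib" <<'EOF'
@book{knuth1984,
  author    = {Donald E. Knuth},
  title     = {The {\TeX}book},
  publisher = {Addison-Wesley},
  year      = {1984}
}
EOF

    # \cite{knuth1984} looks the key up in refs.bib at compile time.
    cat > "$dir/main.tex" <<'EOF'
\documentclass{article}
\begin{document}
\section{Introduction}
Typesetting is hard~\cite{knuth1984}.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
EOF
}
```

Assuming latexmk is installed, running latexmk -pdf main.tex in that directory should call bibtex as needed and produce a pdf with the citation and a reference list.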

The cherry on top of using Latex is that I can use the same workflow I have for writing code to write the document source. Since the source files are plain text, whatever editor you already use will work for writing Latex: VSCode, (neo)vim, Sublime, and TeXShop all have great Latex support after installing a plugin or two. This really makes it feel as though you are in your happy place coding instead of slogging out a boring paper.

Git (version control, not just an insult)

If you don’t already know what git is, I would say it’s probably not worth trying to use it for your writing; it would most likely introduce more headaches than it solves. But for me, it is incredibly useful for version controlling my documents. What really annoys me when writing is the number of different files I end up needing to keep track of. I always end up with 10 different Word documents titled things like my-doc_v1.docx, my-doc_v1_rev3.docx, my-doc_vfinal.docx, my-doc_vActually_final.docx, my-doc_v_final_edited.docx, and so on. Trying to figure out which one of these is the most recent working copy can be a real nightmare. Enter git: it works well with any plain text file, and .tex source files are no exception.

The main advantage of using git is that it provides version control over your document. Every change you commit is tracked, logged and annotated with a description of why it was made. When I open up the document, I can see every change I’ve made and easily go back if I decide I liked the way something was before. It gives me the confidence to edit aggressively because I know I can always recover the earlier version if needed.

Another feature I use heavily when writing is branches. Branches allow you to split your work into different streams and work on them independently. I use them to keep more experimental ideas away from the main draft. For example, I was experimenting with one-column vs two-column layouts. With each layout on its own branch, I could keep both versions of the document up to date content-wise while deciding which looked better. I also used branches for some big re-writes where I wanted to try re-framing some of the body sections and changing the way the narrative was told. I could do these big re-writes while still having a main draft that I could send out if anyone asked for it.

Another feature I like about git is tags. Every time I sent out a copy of the manuscript, I would add a tag to the commit that draft was taken from. Since I was still writing and editing while waiting for feedback, the tag let me see which changes I had already made since that draft and what still needed to get done.
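
The branch and tag habits described above could be wrapped in a few small shell helpers. This is only a sketch: the function names and the sent-<recipient>-<date> tag format are my own invention, not anything git prescribes.

```shell
# Start an experimental rewrite on its own branch, leaving the main draft alone.
# Usage: start_experiment two-column
start_experiment() {
    git checkout -b "$1"
}

# Tag the current commit as a draft that was sent to someone.
# Usage: tag_sent_draft supervisor
tag_sent_draft() {
    git tag -a "sent-$1-$(date +%Y-%m-%d)" -m "Draft sent to $1"
}

# Summarise which files have changed since the most recent tagged draft.
changes_since_last_sent() {
    last_tag=$(git describe --tags --abbrev=0) || return 1
    git diff --stat "$last_tag" HEAD
}
```

For example, tag_sent_draft supervisor followed a week later by changes_since_last_sent gives a quick list of the files edited since that copy went out, which helps when the feedback refers to text that has already moved on.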

Here are some general tips for using git to version control Latex documents:

.gitignore is your friend

Don’t track anything that can be compiled from your source code. Things like .pdf files are binary, so git cannot diff them meaningfully and ends up storing a whole new copy every time they change. These files can easily be recreated by compiling your document, so there is no need to track their evolution over time. Here is my .gitignore file for the manuscript I was writing:

**.aux
**.bbl
**.bcf
**.blg
**.fdb_latexmk
**.fls
**.log
**.out
**.pdf*
**.run.xml
**.synctex.gz
**.dvi
**.docx
**/.DS_Store
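
As far as I understand git’s pattern rules, a pattern like **.aux behaves the same as *.aux and matches in any subdirectory. If you want to check which rule is catching a file, or you committed a pdf before adding it to .gitignore, helpers along these lines may be useful. The function names are hypothetical; the underlying git commands are standard.

```shell
# Report which of the given paths git would ignore, and the rule responsible.
# Exits non-zero when none of the paths are ignored.
# Usage: show_ignored build/main.aux main.tex
show_ignored() {
    git check-ignore -v "$@"
}

# Stop tracking a build product that was committed before it was ignored.
# The file stays on disk; only the index entry is removed.
# Usage: untrack main.pdf
untrack() {
    git rm --cached --quiet -- "$@"
}
```

After untrack, the removal still needs to be committed for the file to disappear from future history.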

Use githooks!

I like adding hooks to compile the document and clean outputs. Things like pre-commit and post-checkout hooks are great for ensuring that every state your document lives in is compilable and that switching between them happens seamlessly.
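
As a rough illustration of what such hooks could look like (this is a sketch, not necessarily the hooks I used), the function below installs a pre-commit hook that compiles the document and a post-checkout hook that cleans auxiliary files. It assumes latexmk is installed and that the root file is called main.tex; both are assumptions to adapt.

```shell
# Install two illustrative hooks into the repository in the current directory.
install_latex_hooks() {
    hooks_dir=$(git rev-parse --git-path hooks) || return 1
    mkdir -p "$hooks_dir"

    # pre-commit: refuse the commit if the document does not compile.
    cat > "$hooks_dir/pre-commit" <<'EOF'
#!/bin/sh
exec latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
EOF

    # post-checkout: clear stale auxiliary files after switching branches.
    cat > "$hooks_dir/post-checkout" <<'EOF'
#!/bin/sh
# $3 is 1 for a branch checkout and 0 for a file checkout.
[ "$3" = 1 ] || exit 0
latexmk -c main.tex
EOF

    chmod +x "$hooks_dir/pre-commit" "$hooks_dir/post-checkout"
}
```

One caveat with this simple version: the pre-commit hook compiles whatever is in the working directory, which may differ from what is staged. When you knowingly want to commit a broken state, git commit --no-verify skips the hook.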

Why not version control your figures as well?

Along with the main document body, I generated my figures as plain text SVG files using Inkscape. These were then version controlled along with the rest of the document. I added a script to compile them to pdf, with separate tex overlays for the text objects, before compiling the main document. This seemed to work best for keeping the quality of the figures high and giving the document an integrated look, where the body and the figure annotations share the same font.
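
A sketch of what such a figure script might look like is below. It assumes the Inkscape 1.x command line (older 0.9x releases used different flags) and a figures/ directory; the function names are made up for this example.

```shell
# Export one SVG to a pdf plus a .pdf_tex overlay holding the text objects.
# Usage: svg_to_pdf_tex figures/overview.svg
svg_to_pdf_tex() {
    svg=$1
    pdf=${svg%.svg}.pdf
    inkscape --export-area-drawing --export-latex \
             --export-filename="$pdf" "$svg"
}

# Convert every SVG in a directory (default: figures/).
export_all_figures() {
    for fig in "${1:-figures}"/*.svg; do
        [ -e "$fig" ] || continue   # skip when the glob matches nothing
        svg_to_pdf_tex "$fig"
    done
}
```

Each exported figure is then pulled into the document by inputting the generated .pdf_tex file rather than including the pdf directly, which is what lets Latex typeset the figure text in the document font.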

Are there any disadvantages?

Of course there are. This workflow was largely the product of procrastination and is certainly not necessary to produce your manuscripts. I sometimes find that to get through writer’s block, I just need to get words down on the page and not worry about what section I’m working on or where in the document I am. This free-form style is much harder to keep in a tidy version history, since the changes happen all over the document and are not fully formed at some points. For these cases I will have a commit with a description like:

commit b9854369be5c5c6b8f5d756ec954ee5f544b9514
Author: Benjamin McMaster <benjamin.mcmaster@rdm.ox.ac.uk>
Date:   Wed Jun 21 13:57:32 2023 +0100

    Big edits

which isn’t that helpful for tracking what was going on.

… but can you send me a word document?

Alright, you have made it: your manuscript is finally written up. Your final document has been typeset in Latex, the reference list is perfectly cited, your figures are placed in exactly the right place, so you send your beautiful pdf document to your supervisor… but then the dreaded question comes: “can you send me a word document of this?”. Arggggg.

Fear not, there are some tools to help make this process a bit smoother, but it is still a pain. Pandoc does a decent job of converting your Latex source code into a Word document. Running a command like:

 pandoc my-manuscript.tex --citeproc --bibliography my-references.bib -o word-document.docx

will produce a Word document from your Latex source code with all of the referencing done. From personal experience, I find some things like figures and Greek letters do not convert well, so some fiddling with the document is required to get it right.

Self-plug: I have created a github repository with some scripts I use while writing that will hopefully be useful when drafting your manuscript. These include commands for cleaning up the Latex compilation files, creating latex-pdfs from Inkscape SVGs, and converting Latex source code into Word document format.

https://github.com/benjiemc/writing-tools
