Section 6 Key Technologies

The key technologies to implement all of the ideas proposed here already exist. What is needed is a software solution which ties a number of them together in a particular way that fits the use case of writing, reviewing and publishing academic manuscripts.

6.1 Literate Programming

The case for writing in plain text

Why would I write in a plain text file when I could use something like word or google docs? I used to be a word aficionado I new all the fancy features, and when real-time collaborative editing came along in google docs I was on-board and longing for reference management integrations. Now I write almost exclusively in plain text files, I can still do all the same stuff I did in word and google docs but also much more. I do this even when I’m not coding I wrote my thesis like this, I’m doing it now. I can work in an editor that is a bit more like word or google docs in appearance but because I often write code mixed in with my prose I generally don’t. I’m using a simple set of rules for formatting my text document called markdown.

Here is ~98% of all the markdown syntax you are likely to need to know:

# Heading 1

## Heading 2

## Heading 3 {#h3}

a reference to a section \@ref(h3)

- bullet point
- another point

1. *italics!*
2. __Bold__

[hyperlink](https://en.wikipedia.org/wiki/Hyperlink)

Images: [!Alt text](path/to/image)

an inline reference to @key123 or a parenthasized one [@key123]

inline `code`

```
# A block of code
print("Hello, World!")

```

If using mathjax or a simialar tool: inline latex math: $\frac{x}{y}$

Equations:

$$
a=\frac{x}{y^2}
$$

Pretty simple right? If you can learn to use some of the slightly more advanced features of MS Word like styles, sections, track changes or automated reference management you can learn markdown pretty easily. Also there are now some quite nice visual editors to give you more of the WYSIWYG editing experience if you need a leg-up with the syntax, there is one built right into the tool I’m authoring this document in right now RStudio. It even has a nice feature that will let me insert references from my Zotero library or by DOI search and add them to the bibliography for this bookdown project.

But why do I do this? for one reason as I write this it is being rendered every time this file is saved to a window I have open on my second monitor so I can see what you will be seeing. That is if you are reading this on the website because I’m also generating an epub and pdf document from my source text. If I felt so inclined I could add a few lines of configuration and also generate a word version, or use I could use rticles to format it in the style of a paper from my favorite journal. I could also try for various forms of slideshow including powerpoint but this is a bit verbose for slides. The point is you get a lot of different output options from the same starting point. This is because writing this way makes use of a key insight that is now central to good web design, the separation of the semantic content from how it is formatted.

If like me you are old enough to have been taught to write webpages in school in the early 2000’s or to have had a myspace page you may remember setting formatting rules for text in html tags. Setting the color of some text in an html tag is now anathema to the web designer, formatting is done with css not in the html! Why? Because you can completely change the look, layout, navigation or the way you display the same content if you structure things this way, that’s why I’m able to render this document in three different formats at a keystroke. Its semantic structure is governed by the very simple rules outline above, from that this text can be parsed into a data structure that permits it to be formatted and output in a myriad different ways. This also allows for much improved accessibility as a sane order for screen readers inherent in the structure of the source document and what ever size and typeface you’d like this represented in is up to you. As well as easy text mining and searching that does not require wrestling this text back out of a pdf into a nice structure for computers to work with.

Also as I alluded to above I often write a mixture of prose and code this is called Literate Programming where the documentation to my code and my description of what it does is interwoven with the code. It is from this concept that I get the title of this piece “Literate science” my proposition is that we should move to a paradigm of authoring academic manuscripts that resembles literate programming. To be clear I’m not suggesting that everyone learns to code nor that our manuscripts should be detailed documentation of code. What I am suggesting is that the source document that generates your manuscript should generate every number and figure in that manuscript from your input data. You shouldn’t need to learn to code to do this. You may, however, need to use tools which generate code which is inserted into the source document for your manuscript, instead of using tools where there is no means of exactly reproducing the steps you took in the analysis you performed with a gui tool (see section 5.2). Remember that interactive plot I told you to play with back in section 4.2 on in

4

6.2 Collaboration and version control with git

git is a tool used by programmers to keep track of and navigate through the history of changes they have made to code, also to collaborate on code. Originally developed by Linus Torvalds to facilitate the distributed development of the Linux kernel this tool can be applied as well to any body of text as is can to computer source code. This document I’m writing now is in a git repository. It is keeping track of the differences between what is in the files I’m editing now and the last time I ‘commited’ my previous set of changes. With it I can navigate through all my past changes, and revert to previous versions should I choose to. Each commit is associated with a unique hash so if you needed to cite a very specific version of this document you could identify the commit hash associated with the exact version where the change you are interested in was made. This is a very useful feature of git for the purposes of academic publishing, the ability to cite a repository and a specific version of that repository permits us to handle updates and retractions much more smoothly. If you go to a

I can collaborate on this text asynchronously with others using git. If they ‘fork’ off their own ‘branch’ of the project and make some changes we can ‘merge’ them back together looking at all the places our versions differ and choosing which to keep. It also keeps track of who authored which parts of the text, a potentially useful feature for the attribution of authorship credit but with the clear limitation of capturing only who actually wrote the text. One could also do google docs style real-time collaborative editing of your source document and then commit the resultant changes with git using a tool such as hackmd.io.

git has a fairly rich and complex feature set but these are the key ones for our purposes and they can all be accomplished with fairly easy to use gui interfaces such as those implemented at github and git lab. However a git interface tailored to the needs of document authors and not explicitly software development would be a useful addition to the ecosystem and something I would want to see a a part of the platform I’m proposing.

6.3 Containers and build pipelines

git and gitlab for versioning in collaborative plaintext document editing markdown as a simple to use plaintext markup lanugage Rmarkdown/kintr/pandoc with Rstudio as document editor/renderer literate programming gitlab ci-cd / docker for reproducible computational environments and automated checking automatically see if your edit to a manuscript if that change breaks a formatting rule automated QC checking and helping, readability analysis, plagarism, stats checks etc.

6.4 Why not Blockchain (yet)?

You might want to skip this if your not a crytpo person, it’s mostly to explain to people on the crypto train while I’m not on board yet for this project

All this stuff about putting up a bounty, being able to claim that bounty, having amounts of money accrue back to certain parties, doing compute, covering the cost of compute, being open and public sounds like the sort of thing that would gel really interestingly with blockchain with smart contracts and distributed compute? - I hear you ask. Yes, I answer, but the tooling is not there and the ecosystem is not yet mature enough to support what I am proposing here. most of tech stack to support my proposals exists in stuff that was not built to talk to blockchains. I expect to see a version of what I propose built on blockchain technology in the future but I anticipate a timescale on the order of decades. In addition It would be a barrier to the wider adoption of my proposed approach until such time as crypto has gained wider acceptance and understanding. Like it or not justified or not crypto has in the minds of many some negative connotations. I’m already asking for quite a big jump from people crypto is like a bridge to far.

There are many potential advantages: Cyptographically backed identity management for authors, very robust provenance of ideas in a public record, preventing the raising of barriers to knowledge or censorship by governmental of corporate entities through its decentralized nature, etc. A strong crytographic base might useful for the implementation of certain security tools such as limiting data asses to approved individuals with verified credentials and access rights. e.g. allowing access to data only if an individual has an api key signed by by several parties such as their personal identification token to demonstrate it is them and by the tokens of an ethics review board. It may also be possible to do things like run analyses on data to which the researcher is never granted access directly and the code to run the analysis has to be signed by a technical review board who have checked the code is not able to exfiltrate the data it is given access to for analysis.

The ability to transact and have ‘smart contracts’*, to govern the review bounty process does provide a potential upside of this tech especially when it comes to dealing with potential regulatory hurdles associated with this. Any situation where you are holding money which you may at some point be obliged to give back to another party tends to attract a lot of red tape. Cardano is I think the most likely candidate in the current crypto market to be the basis for such a system given it’s technical characteristics and its governance structure’s fondness for peer-review. (Declaration of interest: I have some modest investments in cardano, etherium, bitcoin and monero; cardano having the largest share)

However in short almost all of the aims I outline here and even many of those stated by people trying to deploy blockchain tech in this space, such as DEIP & VitaDAO, can be achieved without the use of blockchain technology. Things like author contribution metrics tied to git commits, (imperfect as these would be as a representation of relative contribution), and reputation metrics associated with an author’s ID, even crytographically verifiable IDs, can be accomplished with a PGP web of trust a yubikey and existing git tools like github/gitlab. Even much of the decentralized model is achievable through a federated architecture, don’t get me wrong I’m bullish on crypto I just don’t think the layer 2/3 tech that I think this approach to publishing would need to be blockchain-native is ready for prime-time just yet. Happy to be convinced otherwise, but I’m going to want to see some running code.

*This is a terrible name BTW, automated contracts would be a lot more accurate, basic conditional logic != smart.


  1. NB Markdown’s competitors that largely lost the format wars reStructuredText (rST) and AsciiDoc would arguably be a better choice for the text-based lingua franka format of academic publication going forward but the tooling and ecosystem around these is just not quite as mature.↩︎