Bitcoin: A Solution to Publisher Authentication and Usage Accounting

Posted by Phil Davis, in the Scholarly Kitchen

Until recently, I considered Bitcoin to be a shady digital currency that facilitates the activities of drug lords, arms dealers, smugglers, prostitution rings, and others who hide in the shadows of an open market. In recent years, however, Bitcoin has been moving into more pedestrian and lawful activities. Many online stores, pubs and coffee shops now accept Bitcoin. On April 1st, PeerJ announced that it would start accepting Bitcoin, leading some to wonder whether it was a clever joke. Nope. No joke.

This post is not about Bitcoin as a financial tool; publishers can decide how they want to structure their own financial transactions. Instead, it explores how the technology behind Bitcoin, the blockchain, can be used to solve the intractable problems of authentication and usage accounting.

For more than a decade, publishers and institutions have settled, in most cases, on an IP-based model for authentication to paywalled material. If you’re a researcher or graduate student or librarian sitting in front of a computer that is physically located within an authorized IP range, you have access. If you aren’t, you need to find a work-around, like a proxy server (a computer that sits within an authenticated IP range) or a virtual private network (VPN), which creates a secure connection that mimics a physical one. There are other models for remote authentication, but none of them, in my opinion, works exceptionally well, especially if you are an infrequent user. We should not be surprised to find that people who should have access to this paywalled material ultimately decide to turn to the dark web, like Sci-Hub, for their access. If publishers wish to stay with a traditional paywall model, they will need to develop a simpler authentication model that identifies individuals and not the network location of their devices.
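To make the IP-based model concrete, here is a minimal sketch of the check a publisher's server might run. The institution name, the address range (a block reserved for documentation), and the function name are all assumptions for illustration, not any publisher's actual configuration.

```python
import ipaddress

# Hypothetical licensed ranges; 192.0.2.0/24 is reserved for documentation.
LICENSED_RANGES = {
    "Example University": [ipaddress.ip_network("192.0.2.0/24")],
}


def institution_for(client_ip: str):
    """Return the institution whose range contains the address, if any."""
    address = ipaddress.ip_address(client_ip)
    for name, networks in LICENSED_RANGES.items():
        if any(address in network for network in networks):
            return name
    return None


on_campus = institution_for("192.0.2.17")     # inside the licensed range
at_home = institution_for("198.51.100.5")     # outside it, so no access
```

Note that the check says nothing about who is sitting at the machine. The same researcher is recognized on campus and turned away at home, which is exactly the weakness described above.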

When users turn to the dark web, it creates a second problem — a problem of accounting. More than ten years ago, as a publisher-librarian committee developed Project COUNTER, it was assumed, a priori, that most usage could be counted and reported from individual content providers. COUNTER’s role at the time was standardizing how downloads were counted and how to report them to their customers with some degree of trust.

Today, it is not clear how many downloads are taking place that go unrecorded, but we can be certain that this undocumented usage is growing and may eclipse the traditional model of distribution from publisher to reader. Many years ago, free accessibility from PubMed Central was responsible for a significant diversion of traffic from the publishers’ websites. Today, we may add institutional and subject repositories, peer-to-peer sharing, commercial archives (ResearchGate, Academia.edu), and now the likes of Sci-Hub, which is built upon abusing the authentication system built to facilitate access to online journals. Publisher-provided usage statistics may be reporting just a small and declining slice of total usage: our bigger problem is not knowing what the other slices look like.

Strangely, at a time when we can capture and report tweets, blog posts, news, Facebook, Google+, and reference manager use of scholarly content, the metric that may best indicate reading — the article download — is becoming more elusive.

The savvy librarian or consortial negotiator will use this information (or rather, the lack thereof) to her advantage. “Look,” she says. “Article downloads have dropped this year by 5%. Let’s begin our price negotiation at 5% lower than last year.” While both the librarian and the publisher know full well that the publisher-derived downloads reflect just a portion of overall use by the institution’s members, no one at that table can provide even an estimate of overall use.

This is not just a problem for publishers. Authors may also be misled by the underreporting of article downloads when viewing their article-performance dashboard, as are their funders, who are interested in the impact of the research they sponsor. A lack of reliable usage data also means that editors are incapable of learning from their decisions on what to accept for publication. Put simply, a dark web occludes what can be known about article impact.

Understanding how the dark web affects their business, publishers have come together recently to discuss how their articles can be shared, resulting in a website that attempts to educate and document publisher policies, although I’m not convinced this will change reader behavior. Moreover, the best publishers have achieved through these discussions is a draft set of voluntary principles. This is like agreeing that the sea is rising, but offering no concrete solution besides crying “every man for himself!”

This is the point in the post where readers anticipate a solution, and I first need to state outright that I don’t know if this solution will work technologically, politically, legally, and socially. But sitting back and complaining is not enough. Those readers who take a pessimistic view of scholarly publishing are welcome to remain critical: I am willing to try to work towards a solution. This solution may not be an ultimate solution, but at least it may be better than what we have right now, which is a model where content providers are finding it more and more difficult to account for what they do. I’m going to propose that a possible solution to the growing problem of authentication and accounting in scholarly publishing may be found in the technology behind Bitcoin: the blockchain.

One of the first principles of Bitcoin is that each and every transaction is public and transparent. When Joe sends Alice some bitcoins, this transaction is broadcast publicly and recorded in a public ledger. The ledger does not record the names “Joe” and “Alice” but includes each of their digital signatures, which are unique and pseudonymous rather than tied to a real name. Once recorded, this transaction is verified and validated by the other computers that hold copies of the ledger. The accuracy of these public transaction ledgers is maintained by other computers (called “miners”) that work out a computationally difficult problem in order to validate whether the transaction was real. In the Bitcoin system, miners are rewarded with new bitcoins, so there is a financial incentive to devote computational power to maintaining the integrity of the accounting system.
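The “computationally difficult problem” can be sketched in a few lines. This is a heavily simplified illustration, not Bitcoin's actual block format: real blocks use a binary header, a Merkle root of transactions, and a numeric target. The difficulty value and the field names here are assumptions.

```python
import hashlib
import json

DIFFICULTY = 3  # assumed: number of leading zero hex digits required


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def mine(prev_hash: str, transactions: list) -> dict:
    """Try nonces until the block's hash meets the difficulty requirement."""
    nonce = 0
    while True:
        block = {"prev": prev_hash, "tx": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1


# Each block commits to the hash of the one before it, forming the chain.
genesis = mine("0" * 64, [])
block1 = mine(block_hash(genesis), [{"from": "sig-joe", "to": "sig-alice", "amount": 2}])
```

Finding a valid nonce takes many attempts, while checking one takes a single hash. That asymmetry is what lets every other participant verify a miner's work cheaply.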

If we apply this system to publishing, it is not hard to substitute a published document (a journal article, book chapter, or dataset) as the currency of transaction. A transaction from Journal A to User B is recorded in the same way as a peer-to-peer transaction from User C to User D, or a transaction from Repository D to User E. In this model, every document transaction is recorded and public. There is no longer a dark web. We see the entire usage pie, not just one small slice of it.
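As a rough sketch of what “seeing the entire usage pie” could mean in practice, the records below mirror the three kinds of transaction just described. The record fields, identifiers, and channel labels are hypothetical; the post does not specify a data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str      # hypothetical ID of a journal, repository, or peer
    recipient: str   # hypothetical ID of the reader
    document: str    # hypothetical document identifier
    channel: str     # "publisher", "repository", or "peer"


ledger = [
    Transfer("journal-A", "user-B", "doc-1", "publisher"),
    Transfer("user-C", "user-D", "doc-1", "peer"),
    Transfer("repository-D", "user-E", "doc-1", "repository"),
    Transfer("journal-A", "user-C", "doc-2", "publisher"),
]


def usage_by_channel(records, document):
    """Count transfers of one document, broken down by channel."""
    return Counter(t.channel for t in records if t.document == document)


totals = usage_by_channel(ledger, "doc-1")
total_use = sum(totals.values())
publisher_share = totals["publisher"] / total_use
```

In this toy ledger the publisher's own logs would show one download of doc-1, while the shared ledger shows three, so the publisher-reported figure covers only a third of actual use.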

While the system of distributed ledgers is public, it is built around privacy. Digital signatures do not necessarily need to disclose the identity of an individual, only that a transactor is a unique individual rather than a computer or a proxy server. Similarly, the blockchain does not need to disclose the full details of what was sent, only that the document was tied to a unique content creator. I think these two details are essential for such a system to be adopted: users will want to maintain their privacy and publishers will want to keep detailed information away from their competitors.
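One way these two privacy properties might be achieved is with hashing, sketched below. This is an assumption about how such a record could be built, not a described design: the reader appears only as a hash of a key, and the document appears only as a salted hash that the publisher alone can recompute. The identifiers are made up.

```python
import hashlib
import secrets


def pseudonym(public_key: bytes) -> str:
    """Derive a stable ledger ID from a key; it reveals no name."""
    return hashlib.sha256(public_key).hexdigest()[:16]


def document_commitment(publisher_id: str, document_id: str, salt: bytes) -> str:
    """Hide which document moved while tying the record to its publisher."""
    digest = hashlib.sha256(salt + document_id.encode()).hexdigest()
    return f"{publisher_id}:{digest}"


reader_key = secrets.token_bytes(32)       # stand-in for a real public key
publisher_salt = secrets.token_bytes(16)   # kept secret by the publisher

record = {
    "reader": pseudonym(reader_key),
    "item": document_commitment("publisher-A", "10.1234/example.5678", publisher_salt),
}
```

Anyone reading the ledger can count transactions per publisher and per pseudonymous reader. Only the publisher, who holds the salt, can work out which article each record refers to.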

As for who will devote computational power to maintain the public ledgers, I see several large groups who are incentivized to take this role: publishers themselves, libraries and their consortia, and funders.

One of the strengths of a public and distributed accounting system is its decentralized nature. There is no need for librarians to trust the numbers they receive from their publishers. There is no need to trust that Project COUNTER is doing its job and auditing these publisher reports. There is no need to trust the numbers reported by third-party services and archives. Trust is built into the transaction system itself and verified by other players. More importantly, it is very, very difficult to game this system.
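The reason such a ledger is hard to game can be shown with a small hash chain that any participant can re-check independently. This sketch omits mining and networking, and the record fields are hypothetical.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hash one ledger entry deterministically."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a record that commits to the hash of the previous entry."""
    prev = entry_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "record": record})


def is_valid(chain: list) -> bool:
    """Recompute every link; an edited entry breaks the link that follows it."""
    expected = "0" * 64
    for entry in chain:
        if entry["prev"] != expected:
            return False
        expected = entry_hash(entry)
    return True


chain = []
for n in range(3):
    append(chain, {"sender": "journal-A", "recipient": f"user-{n}", "document": "doc-1"})

valid_before = is_valid(chain)
chain[0]["record"]["recipient"] = "user-999"   # an attempt to rewrite history
valid_after = is_valid(chain)
```

This toy check does not catch an edit to the most recent entry, since nothing yet links to it. In a distributed system, participants would also compare the hash of the latest entry with one another.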

Obviously, there are some changes that will need to take place before such an open transactional model can be implemented. First, every reader will need a digital signature. This signature will identify the individual wherever s/he goes, whether it is back and forth from home to the office, to a conference, or to a new institution. It will be like an ORCID iD you keep in your wallet at all times. Second, the digital signature will need to include information that will be used to authenticate that individual if the content is restricted to members of an institution. Unlike the current IP-based model that authenticates individual machines, the digital signature will authenticate individual people. The digital signature will work very much like a passport, but unlike passports, you can only have one.
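A rough sketch of a credential that carries institutional membership follows. It makes one loud assumption: HMAC with a shared secret stands in for a real public-key signature, because Python's standard library has no asymmetric signing. All names, the key, and the dates are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: HMAC stands in for a public-key signature (such as Ed25519).
# In a real system the verifier would hold only a public key, not this secret.
INSTITUTION_KEY = b"demo-key-held-by-the-institution"


def issue_credential(reader_id: str, institution: str, expires: str) -> dict:
    """The institution vouches that a reader is one of its members."""
    claims = {"reader": reader_id, "institution": institution, "expires": expires}
    body = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(INSTITUTION_KEY, body, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def has_access(credential: dict, licensed: set, today: str) -> bool:
    """Check the tag, the expiry date (ISO format), and the licence list."""
    body = json.dumps(credential["claims"], sort_keys=True).encode()
    expected = hmac.new(INSTITUTION_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["tag"]):
        return False
    claims = credential["claims"]
    return claims["expires"] >= today and claims["institution"] in licensed


cred = issue_credential("reader-42", "Example University", "2030-12-31")
forged = {"claims": {**cred["claims"], "institution": "Other College"}, "tag": cred["tag"]}
```

The credential travels with the person rather than the machine, so the same check works from home, the office, or a conference. Altering the claimed institution invalidates the tag.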

The Bitcoin public ledger model described above attempts to solve the problem of tracking and accounting for the distribution of scholarly documents around an increasingly dark web. There is another way that publishers and institutions can use the blockchain: to solve the problem of authentication.

If we move away from the IP-based model of authentication and authenticate the individual instead, the logical place to put that check is in the document reader itself (e.g., Adobe Reader, Mendeley, ReadCube, and Papers). Digital Rights Management (DRM) will need to be built into document readers. Each user will need a digital signature and will need to allow their document reader to access it. In this way, it doesn’t matter whether the user is physically in the lab, in the library, at home, or on a business trip. No proxy server, no VPN, no Shibboleth. Moreover, every digital signature is cryptographically generated and is not based on a username and password. Under this model, even Bob1968 is unbreakable.

A DRM-based reader means that content will only display if the reader is able to authenticate the individual. I don’t see any way around this, and it does mean that someone going offline will either need to pre-authenticate in order to view the documents at a later date or receive some kind of grace period until the individual returns online. This notion of individual authentication will be a problem for some, but I should remind readers that the free reference managers listed above already track and send detailed information about reader behavior back to individual publishers. A digital signature based on blockchain would provide much more privacy than the personal registration model currently used in the reference manager model.
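The offline rule could look something like the sketch below. The seven-day grace period and the function signature are assumptions; the post does not say how long a grace period should last.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # assumed value, chosen only for illustration


def may_display(
    last_authenticated: datetime,
    now: datetime,
    online_check_passed: Optional[bool] = None,
) -> bool:
    """Decide whether a DRM-enabled reader should show a document.

    online_check_passed is True or False when a live check was possible,
    and None when the device is offline.
    """
    if online_check_passed is not None:
        return online_check_passed
    # Offline: fall back on how recently the reader last authenticated.
    return now - last_authenticated <= GRACE_PERIOD


last_seen = datetime(2030, 1, 1, 9, 0)
on_a_flight = may_display(last_seen, datetime(2030, 1, 4, 9, 0))
long_offline = may_display(last_seen, datetime(2030, 1, 11, 9, 0))
```

A live check always takes precedence, so a reader whose access has been revoked cannot rely on the grace period once the device reconnects.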

While I admit that I don’t have all of the details worked out in this blog post, the distributed public ledger model using encrypted blockchain may provide a working model for solving some intractable problems we currently face in authenticating users and accounting for usage and sharing in an increasingly dark web. Often, technology is used to solve problems it wasn’t initially built to solve.