Rethinking Authentication, Revamping the Business

Posted by Roger C. Schonfeld ⋅ Jun 22, 2016

IP authentication is the most important mechanism for authorizing access to licensed e-resources. Substantial business and policy issues for libraries and publishers alike connect up to IP authentication. Today, there is growing interest in eliminating IP authentication, so it is timely to examine the implications if we were soon to see its end. Earlier this month, I had the opportunity to attend and participate in the Universal Resource Access Forum, hosted by the Copyright Clearance Center (CCC). The day-long meeting brought together publishers, corporate librarians and other information professionals, publishing technologists, and several academic librarians. It focused primarily on barriers to information usage in the corporate sector, with somewhat of an emphasis on multinational pharmaceutical companies. I spoke about the findings of a project on barriers to discovery and access in the corporate sector.  There were many interlinked issues discussed during the course of the day. IP authentication, and the opportunities to move beyond it for licensed e-resources, was perhaps the forum’s most important theme.

The Bedrock Of the Site License

First, some basic background. IP authentication is the bedrock of access control for licensed e-resources. A content provider calculates whether the internet address of a user is within a subscribing institution’s range of IP addresses. If yes, access to content is provided. If no, access is denied.

Off-site access is only growing in importance, and when a user is working remotely from the campus or corporate network, one of several mechanisms is made available to provide access. In US academic libraries, the most common solution is a proxy, which make the user appear to be on the corporate network by using institutional credentials to login to a separate service. In many other countries, a SAML-based solution is more common; these allow a user to login directly through an institution’s single sign on infrastructure. As they are implemented, the SAML-based systems are more likely to appear at the right moment in a research workflow, although there is no inherent reason why proxies cannot be used in this way. Of more substantive differentiation, SAML-based solutions allow a user to be more readily associated with one’s usage activity, providing advantages both for security and personalization. By contrast, proxies provide greater anonymity and privacy.

IP authentication in combination with proxy servers for remote access are the basis for the bundled site license model. The bundled aspect receives the greatest attention in discussions about what is wrong with the model for libraries. Two other characteristics are key: First, that it provides access to an entire site as defined by IP address; and Second, that it affords unlimited usage within that site as a result.

Not all e-resources used IP authentication in combination with proxy-based off-site access. For example, the library collaboration HathiTrust does not permit IP-based or proxy-driven access, explaining that “HathiTrust uses rate-limiting to ensure compliance with third-party agreements…Our rate-limiting mechanisms treat all users accessing through a proxy server as a single user…” Nevertheless, IP-based site licenses, with proxy-based off-site access for US academia, remains the common solution.

Recently, however, publishers have had a substantial change of heart. Among the factors at play is the experience of users who require greater seamlessness and personalization. But let there be no doubt that the growing scope and prominence of Sci-Hub has concentrated the mind of many a publishing executive. In this view, the anonymity of IP authentication has facilitated piracy and continues to complicate their efforts to shut down suspicious use.

IP Alternatives

While I have heard these arguments on and off this year, the meeting hosted by CCC made abundantly clear that there is great dissatisfaction with IP-based authentication across the community. Publishers want to move away from it due to their piracy concerns, their desire to improve seamlessness for researchers, and their expectations about the value they can offer through greater personalization. Corporate librarians want to move away from it because of administrative headaches and workflow deficiencies it imposes in their environment. And at least some academic librarians want to move away from it because of the poor user experience, especially with off-site access. Taking aim at IP authentication and proxy servers has become all the rage. But what might supplant them?

The most radical, and I would argue the most user-centric option, is to decouple identity from institutional affiliation. Right now, for authentication purposes, one’s identity is provided by a single institution. But any person is more complicated that a single institutional role; we may be employees of one organization, students or faculty members of another, alumni of a third, and residents with privileges at a local public library. We have access rights from each of these affiliations, each of which may be in flux, and want to be able to work across then seamlessly. Unfortunately, developing this approach would require substantial platform re-engineering. And let’s not forget that, at least in the basic analysis, empowering users would commensurately weaken the role of publishers and institutions in the various data strategies that all are pursuing.

But more pragmatic options are advancing in the near term. One option, which has not received the attention it deserves, Google Apps for Education, which is in widespread use across US higher education and beyond. The outsourcing of email, calendar, and other basic applications to Google and Microsoft has opened up the possibility of using these services for authentication, a modified version of “social” login. The benefit is using these existing consumer-grade authentication providers at little to no cost. There are presumably risks to outsourcing this important function in a way that further consolidates Google’s position in publishing and discovery. Perhaps this is one reason it has not been rolled out broadly, although Google apps authentication is available for a number Gale products.

Another development of some importance is that publisher platform providers will offer more seamless authentication across the platforms they power. This direction, being pioneered by Semantico, will reduce the number of tiny content-platform authentication silos that currently exist. It does not eliminate the underlying authentication issues that are motivating publishers and libraries to wish to move beyond IP authentication.

One direction that seems most likely to gain traction is the further rollout of SAML-based solutions. For licensed e-resources, SAML had been implemented most commonly through Shibboleth federations or OpenAthens. Users attempting to access licensed content are typically confronted with a list of institutions with such implementations, sometimes grouped by the federations of which they are members. The inability for the content provider to send a user automatically to the appropriate institutional authentication service (the so-called Where Are You From, or WAYF tool) creates a confusing and complicated step for the user. But WAYF is a problem that has ready technical solutions. SAML-based solutions are in more widespread use for off-site access elsewhere, but they have not gained traction as the preferred solution for US academic libraries. While SAML is sometimes used for single sign on course management systems, grades, and other student information, I am not aware of it being used for on-site e-resource authentication .

At the Universal Resource Access Forum, presenters proposed a number of approaches to address the issues in question. There were suggestions that corporations could organize their own Shibboleth federation or that SAML solutions could be implemented without Shibboleth. A number of potential pilots are under consideration.

While a SAML-driven solution may not take hold in the long run, we should expect to see much greater energy in alternatives to IP authentication and proxies. Libraries, publishers, and intermediaries should be planning on policy, business, and technical levels for the future they wish to see. Here are several areas for consideration:

Privacy and Personalization

Libraries have stood up for the privacy of their user communities in many ways, and in recent years have expressed concern that data collection and personalization efforts by vendors not betray these principles. The site license model, built on IP authentication, has enabled some important efforts to ensure user privacy. While few libraries have in fact attempted to route all users through a single anonymous IP address, even this type of effort, common in large corporations, has been possible.

Any new authentication model will, in all likelihood, connect directly with vendor platform user accounts. Such a connection will be a real boon to personalization, allowing for the tracking of user-level usage patterns and the delivery of personalization with every interaction. It will commensurately interfere with privacy in a variety of ways.

Key factors to consider:

  • Will it be possible for users or libraries to opt in or opt out of the new tracking techniques?
  • Will it be functionally possible for users be anonymous? Or will the data gathering apply to everyone, whether the personalization is delivered to all?
  • Will it be possible for users to merge more aspects of their identity and activity across services together? In other words, will those who desire a more personalized experience be able to have their data shared across platforms?

Pricing Models

Content providers have been interested for some time in moving away from the unlimited-access site-license model. Many academic libraries, at least, are today paying for journal bundles based on their historical print journal spend plus a variety of inflationary factors and the effects of negotiations over time. Most content providers have at least examined alternatives and some have attempted to establish versions of “value-based pricing” in the marketplace. Such models can emphasize FTE or research expenditures, but perhaps even more promising, in one sense, are models that utilize article downloads or other usage metrics.

In a new authentication model, content providers will gain access to more granular user data than has previously been available to them. For example, they will likely gain access to information about the total number of active users from each institution and the pattern of usage across those users. This may increase the opportunities to introduce pricing models that distinguish more effectively based on the value that customer libraries receive.

While libraries have substantial concerns with the pricing models in place today, they generally prefer not to move to a different model for fee calculation. One of the benefits of unlimited access models has been that the price is known in advance rather than introducing variability into library budgets. While from a publisher perspective this pricing and authentication model has a variety of disadvantages, it has in effect been grandfathered in.

Key factors to consider:

  • What additional data will publishers and vendors derive from various authentication alternatives that may influence the pricing models available to them?
  • Are there alternative pricing models that would be of benefit to publishers and libraries alike?

Library as Gateway

My colleagues and I at Ithaka S+R have been tracking the evolving position of the library as research starting point, or gateway, for more than 15 years. Our most recent surveys of academics in the US and UK have found a recovery in the share perceiving the library as a research starting point: evidence to some that the index-based discovery services are influencing discovery behaviors.

Authentication also plays a vital role in influencing discovery. As I showed in a presentation last year, proxy-based authentication forces users to navigate through the library infrastructure, and to use library discovery tools, in order to gain access. Libraries and library vendors should anticipate that changes to authentication models could negatively impact the library’s role as a gateway.

Key factors to consider:

  • Is it strategically important that the library be seen by researchers as their starting point?
  • Will the elimination of an authentication workflow that routes researchers through the library website weaken the library’s gateway role?
  • Is the mechanism of authentication an appropriate way for the library to defend an intermediary role?

Unaffiliated Users

IP authentication made it possible for libraries to serve anonymous and walk-in users. This includes the unaffiliated general public, which is a key priority for public libraries and academic libraries in public universities. A system can incorporate stronger authentication of individual users while still maintaining options for anonymous and unaffiliated walk-in use, although libraries might need to route that unaffiliated use through a specific account, which presumably could then have different permissions, usage throttling policies, and privacy considerations.

Key questions to consider:

  • Will there be a way to authorize unaffiliated users?
  • Will any restrictions be imposed as compared with the fairly anonymous approach in an IP-authenticated environment?
  • Will libraries be able to accept any possible restrictions on walk-in use in order to maintain their ability to serve these users?

Stepping Back

Most academic libraries and scholarly publishers have accepted IP authentication as a given for licensed e-resources. Few have thought about the strategic, policy, and business implications of a fundamental shift in how users are authorized. The time to begin doing so is now. While simple existing solutions like SAML-based options may have been perfectly acceptable as one in an array of alternatives, they have different affordances when examined as the potential sole means of authentication. In that light, the key issue is whether a comparatively easy fix, using existing technologies, can be acceptable enough to all parties — or whether a more extensive but technically complex solution would allow for a better negotiated transition.