[RE] Our Software Dependency Problem

by Ciprian Dorin Craciun (https://volution.ro/ciprian) on 
with regard to reading

About the "dependency hell" that lately seems to plague many of the (modern) development ecosystems.

Overview

This article presents Russ Cox's informed view of the state we are in with regard to software dependencies. Unfortunately, although his assessments and suggestions are 100% correct, his perspective is extremely limited: it is that of a Google employee who develops high-value / high-stakes software systems, with (presumably) a large budget and lots of talent at hand.

Unfortunately, he does not provide any insight into how such an approach could be scaled down for small start-ups or hobbyist developers.

Moreover, he fails to highlight the same shortcomings in Google's own Go, which is especially surprising since Russ is a core developer of the language and its tooling.

I don't want to seem harsh on Russ; I highly respect his talent, achievements and opinions, but I had much higher expectations when I started reading this article. I hope that in future iterations he will touch upon some of these shortcomings.

However, the article does an excellent job of presenting the issues and the "correct" techniques to handle them.

Highlights

The first paragraph, I think, perfectly describes the current state of affairs -- from almost "no reuse" to "overuse" in only a handful of years:

For decades, discussion of software reuse was far more common than actual software reuse. Today, the situation is reversed: developers reuse software written by others every day, in the form of software dependencies, and the situation goes mostly unexamined. [...] Software dependencies carry with them serious risks that are too often overlooked.

It then follows with a few observations about the rise of "dependency managers", which in the end make it easier to write, publish and reuse a 5-line module than to copy-paste it into one's own projects (I took the liberty of including such a large quote because I think that if someone takes only one thing from this article, it should be this):

As dependency managers make individual packages easier to download and install, the lower fixed costs make smaller packages economical to publish and reuse. For example, the Node.js dependency manager NPM provides access to over 750,000 packages. One of them, escape-string-regexp, provides a single function that escapes regular expression operators in its input. The entire implementation is:

var matchOperatorsRe = /[|\\{}()[\]^$+*?.]/g;

module.exports = function (str) {
    if (typeof str !== 'string') {
        throw new TypeError('Expected a string');
    }
    return str.replace(matchOperatorsRe, '\\$&');
};

Before dependency managers, publishing an eight-line code library would have been unthinkable: too much overhead for too little benefit.

It also provides an interesting view of using software dependencies, namely equating it to "development outsourcing", although it "dramatizes" the issue a little too much:

Adding a package as a dependency outsources the work of developing that code—designing, writing, testing, debugging, and maintaining—to someone else on the internet, someone you often don't know. [...] Your program's execution now literally depends on code downloaded from this stranger on the internet.

Building upon that, it formalizes the "cost" of such "free" reuse, which, although perhaps a little too academic, does present a perspective that few have considered:

The cost of adopting a bad dependency can be viewed as the sum, over all possible bad outcomes, of the cost of each bad outcome multiplied by its probability of happening (risk). [...] The context where a dependency will be used determines the cost of a bad outcome. At one end of the spectrum is a personal hobby project [...]. At the other end of the spectrum is production software, [...], sensitive data may be divulged, customers may be harmed, companies may fail. High failure costs make it much more important to estimate and then reduce any risk of a serious failure.
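
As a rough illustration of that formula, the sum could be computed as in the sketch below. The outcomes, probabilities and costs are invented for the example (they are not taken from the article), and `expectedCost` is a hypothetical helper name.

```javascript
// Hypothetical bad outcomes for a single dependency.
// Every probability and cost here is an illustrative assumption.
const outcomes = [
    { name: 'minor bug', probability: 0.2, cost: 500 },
    { name: 'service outage', probability: 0.05, cost: 20000 },
    { name: 'data breach', probability: 0.01, cost: 500000 },
];

// Expected cost of adopting the dependency: the sum, over all bad
// outcomes, of each outcome's cost multiplied by its probability.
function expectedCost(outcomes) {
    return outcomes.reduce((sum, o) => sum + o.probability * o.cost, 0);
}

console.log(expectedCost(outcomes));
```

The same dependency would get very different totals in a hobby project and in production software, simply because the `cost` values differ, which is the point the quote makes about context.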

However, although the previous section on costs and risks is not followed by clear advice on how to "evaluate" said metrics, the article does follow with a few pieces of hands-on advice on how one can assess the "quality" of a dependency:

Is package's documentation clear? Does the API have a clear design? If the authors can explain the package's API and its design well to you, [...] they have explained the implementation well to the computer, in the source code. [...]

Is the code well-written? Read some of it. [...] Does it look like code you'd want to debug? You may need to. [...] Keep an open mind to development practices you may not be familiar with. [...]

Does the code have tests? Can you run them? Do they pass? [...]

Find the package's issue tracker. [...]

Look at the package's commit history. [...]

Do many other packages depend on this code? [...]

[...] Does it have a history of security problems listed in the National Vulnerability Database (NVD)? [...]

Is the code properly licensed? [...]

Does the code have dependencies of its own? [...] each of them should ideally be inspected as described in this section.

It then continues with a few items that go beyond "passively" evaluating your dependency and instead require hands-on "active" evaluation:

[...] If the package passes the inspection and you decide to make your project depend on it, the next step should be to write new tests focused on the functionality needed by your application. [...]

[...] it makes sense to define an interface of your own, along with a thin wrapper implementing that interface using the dependency. [...]

It may also be appropriate to isolate a dependency at run-time, to limit the possible damage caused by bugs in it. [...]
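
To make the first two suggestions concrete, here is a minimal sketch of a thin wrapper plus a few tests focused on what the application needs. It assumes the dependency is the escape-string-regexp function quoted earlier, inlined here as a stand-in so the example runs without installing anything; the wrapper name `literalPattern` is hypothetical.

```javascript
const { ok, throws } = require('assert');

// Stand-in for the third-party package: the implementation quoted
// earlier, inlined so this sketch does not need an install step.
const matchOperatorsRe = /[|\\{}()[\]^$+*?.]/g;
function escapeStringRegexp(str) {
    if (typeof str !== 'string') {
        throw new TypeError('Expected a string');
    }
    return str.replace(matchOperatorsRe, '\\$&');
}

// Thin wrapper (hypothetical name): the rest of the application calls
// only this function, so swapping the dependency touches one place.
function literalPattern(text) {
    return new RegExp(escapeStringRegexp(text));
}

// Tests focused on the functionality this application relies on:
// user input must be matched literally, never as regex operators.
ok(literalPattern('1+1=2').test('so 1+1=2 holds'));
ok(!literalPattern('a.c').test('abc'));
throws(() => literalPattern(42), TypeError);
```

For a dependency this small the wrapper is nearly as long as the code it wraps, so the cost of wrapping is worth weighing case by case.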

Perhaps the best (and easiest to apply) suggestions come at the end:

For a long time, the conventional wisdom about software was "if it ain't broke, don't fix it." [...] The second is the cost of discovering already-fixed bugs the hard way. [...] The window for security-critical upgrades is especially short. [...]

Even after all that work, you’re not done tending your dependencies. It’s important to continue to monitor them and perhaps even re-evaluate your decision to use them. [...]

Finally the conclusion offers "three broad recommendations":

Recognize the problem. Establish best practices for today. Develop better dependency technology for tomorrow.

Regarding the applicability of these suggestions...

Throughout the article a few examples are used to prove the points it makes, namely PCRE (the regular expression library), Google Code Search, and a few other Google services or products. However, these are large and complex libraries and products, and especially in the case of Google products the development budget surely enables such thorough dependency analysis.

However, the author fails to describe how these suggestions scale down to small projects, especially in small start-ups where business-related tasks (features, UI, etc.) trump everything else (unfortunately, most of the time, including security).

For example, the suggestion of "abstracting the dependency" (or "isolating the dependency", etc.) is not applicable in general, because the main reason one searches for dependencies is the lack of "resources" (time or talent) to (re)implement them; therefore "wrapping" a dependency could become as costly as writing it from scratch.

Regarding the missing "irony" towards the Go ecosystem...

Given that the author of the article, Russ Cox, is a core developer of the Go language, and that to date Go still lacks a proper "dependency manager", thus forcing developers to either "vendor" (i.e. embed the source code of dependencies in their own repositories) or "download the latest version", I find the lack of any Go-related criticism to be a major fault of this article...

You can't "preach one thing" and "do another"... Russ cites a lot of fiascos that happened in the last few years, from the NodeJS left-pad and event-stream libraries to the Equifax breach; however, he fails to clearly highlight what the official Go language and tools (to date) do (nothing?) to keep the Go ecosystem from falling into these pitfalls.

Moreover, he explicitly states that one should even record "the cryptographic hash of the expected source code", yet with the current Go tools one can't even choose a particular version of said dependency... In fact Go doesn't even have the concept of "dependency" or "module", let alone "version"... On the contrary, (to date) if one wants to distribute a certain variant of a Go dependency library (that one has reviewed, tested, abstracted and isolated), one has to jump through a lot of hoops to do so.
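
For reference, recording and checking "the cryptographic hash of the expected source code" does not take much. The sketch below is language-agnostic in spirit and uses Node.js only to stay consistent with the snippet quoted earlier; it is not how any particular dependency manager does it, and the helper names and the sample source are made up for the example.

```javascript
const crypto = require('crypto');

// SHA-256 digest of a string or Buffer, as lowercase hex.
function sha256Hex(data) {
    return crypto.createHash('sha256').update(data).digest('hex');
}

// True only if the source is byte-for-byte what was reviewed.
function verifySource(source, expectedHash) {
    return sha256Hex(source) === expectedHash;
}

// At review time: record the hash of the source that was inspected.
// (The source text is a made-up placeholder for a real dependency.)
const reviewedSource = 'module.exports = function () { return 42; };\n';
const recordedHash = sha256Hex(reviewedSource);

// At build time: refuse anything that differs from the reviewed code.
console.log(verifySource(reviewedSource, recordedHash)); // true
console.log(verifySource(reviewedSource + '// injected', recordedHash)); // false
```

In practice the recorded hash would live in a file committed next to the project, so that every build checks the downloaded dependency against what was actually reviewed.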

(I keep mentioning "to date" because lately some work has been done in the Go ecosystem to include a "dependency manager" in the official tools.)