
Reuse Considered Harmful

July 31, 2007

A place for everything and everything in its place
Isabella Mary Beeton, The Book of Household Management, 1861

For a number of years it has been a matter of faith that the more content a technical documentation team reuses, the more efficient they are presumed to be. Vasont Systems, a Content Management System (CMS) vendor, proudly claims its users average 71% content reuse. That’s a bold claim, but I suspect that if you could show that even 30% or 40% of your content is reused in more than one deliverable, you’d earn bonus brownie points with pretty much any manager. But, are you really more efficient? Let’s take a deeper look.

First of all, let’s define our terms. Duplication means that you separately maintain more than one copy of some piece of content in your source control system or CMS. Reuse means that you put the same piece of content into more than one location in the same output medium. Single sourcing means that you deliver the same piece of content via different media.

If you keep two copies of a glossary definition in your source control system, that would be duplication. If you have just one copy of that glossary definition in source control, but include it in the printed versions of your Installation Guide and User’s Guide, that would be reuse. And, if you deliver that glossary definition in print and also on the web, that would be single sourcing. Any given piece of content can be reused or single-sourced or both.
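To make the three definitions concrete, here’s a minimal sketch that labels a piece of content from two facts: how many copies you maintain in source, and where it appears in your output. The `ContentItem` and `Placement` types, the medium names, and the deliverable names are hypothetical, chosen only to mirror the glossary example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Placement:
    deliverable: str  # hypothetical, e.g. "Installation Guide"
    medium: str       # hypothetical, e.g. "print" or "web"


@dataclass
class ContentItem:
    name: str
    source_copies: int  # copies maintained separately in source control
    placements: list[Placement] = field(default_factory=list)


def classify(item: ContentItem) -> set[str]:
    """Label an item using the duplication / reuse / single sourcing definitions."""
    labels: set[str] = set()
    # Duplication: more than one separately maintained copy in source.
    if item.source_copies > 1:
        labels.add("duplication")
    # Count how many locations use the item in each output medium.
    per_medium = Counter(p.medium for p in item.placements)
    # Reuse: more than one location in the same output medium.
    if any(n > 1 for n in per_medium.values()):
        labels.add("reuse")
    # Single sourcing: delivered via more than one medium.
    if len(per_medium) > 1:
        labels.add("single sourcing")
    return labels


glossary_entry = ContentItem(
    name="glossary: CMS",
    source_copies=1,
    placements=[
        Placement("Installation Guide", "print"),
        Placement("User's Guide", "print"),
        Placement("Online Help", "web"),
    ],
)
print(sorted(classify(glossary_entry)))  # ['reuse', 'single sourcing']
```

One source copy in two printed guides plus the web comes out as both reuse and single sourcing, while a second source copy would add duplication regardless of where the content is delivered.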

I doubt anyone would argue against minimizing duplication. The benefits are clear and the exceptions are relatively few. I also agree wholeheartedly that single sourcing makes very good sense. But, I differ with the mainstream regarding reuse. I believe you should minimize reuse, not maximize it.

There are two main reasons for minimizing reuse. First, every time you reuse some content, you give your users yet another place to look when they search online or in a printed index. If you have the same content in several different places, your users end up jumping among those places, trying to figure out which one they should use. Having a single, authoritative place for any particular module simplifies their search and avoids confusion.

Second, even with highly structured methodologies, reusing content is not free. For example, if you aggressively try to drive out duplication, you will inevitably find places where the same content almost fits in two or more contexts, and you need to decide how to handle the situation. When this happens, you have three choices:

  1. decide that the modules are in fact different and maintain them separately,
  2. edit the modules to cover both situations and use just one copy in your source control, or
  3. eliminate one, or both, of the situations.

All three of these choices will cost something. If you take choice one, which gives up on reuse entirely, you will end up with more content to maintain. That doesn’t make it a bad choice, as long as the cost of maintaining the extra content is less than the cost of either of the other options.

Choice two is classic reuse: you will have some additional work making the module fit both situations, and you’ll have additional work over time keeping it independent of either context.

Choice three eliminates both duplication and reuse. If you can eliminate one of the situations that used the content, you’ve not only eliminated the duplication, you’ve reduced your overall content. When it works, this choice is the most efficient of the three and ought to be your first choice.

However, if you try to maximize reuse, you will probably head straight to choice two, and may not even consider choice three. In fact, given that it’s easy to create a metric to measure reuse, but difficult to create one to measure where you’ve avoided the need to reuse, the metrics themselves will create a bias towards choice two.
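To see how a reuse metric can reward choice two over choice three, here’s a minimal sketch. It assumes the metric is the percentage of distinct modules that appear in more than one deliverable; that definition is my assumption, and vendors may compute their figures differently. The module names and the scenario are hypothetical.

```python
from collections import Counter


def reuse_percentage(deliverables: dict[str, list[str]]) -> float:
    """Assumed metric: percentage of distinct modules used in more than one deliverable."""
    # Count each module at most once per deliverable.
    counts = Counter(m for mods in deliverables.values() for m in set(mods))
    if not counts:
        return 0.0
    reused = sum(1 for n in counts.values() if n > 1)
    return 100.0 * reused / len(counts)


def placements(deliverables: dict[str, list[str]]) -> int:
    """Total module placements, a rough proxy for the bulk of the deliverables."""
    return sum(len(mods) for mods in deliverables.values())


# Starting point: two near-duplicate setup modules (hypothetical).
before = {
    "Installation Guide": ["intro", "setup-a", "glossary"],
    "User's Guide": ["intro", "setup-b", "tasks", "glossary"],
}
# Choice two: edit them into one shared "setup" module.
choice_two = {
    "Installation Guide": ["intro", "setup", "glossary"],
    "User's Guide": ["intro", "setup", "tasks", "glossary"],
}
# Choice three: eliminate the setup situation from the User's Guide.
choice_three = {
    "Installation Guide": ["intro", "setup-a", "glossary"],
    "User's Guide": ["intro", "tasks", "glossary"],
}

for label, docs in [("before", before), ("choice two", choice_two),
                    ("choice three", choice_three)]:
    print(label, reuse_percentage(docs), placements(docs))
# before 40.0 7
# choice two 75.0 7
# choice three 50.0 6
```

Both choices leave four distinct modules, yet this metric scores choice two at 75% and choice three at 50%, even though choice three ships less content and leaves the setup module with only one context to maintain. Nothing in the metric records the reuse you avoided.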

As if that weren’t bad enough, human nature pushes you towards choice two. After all, choice three requires you to eliminate content, and nearly all content was originally generated because someone needed it. Therefore, your natural inclination will be to keep content, even if it’s redundant.

Finally, the typical CMS is designed to make reuse easy. Just mix and match modules, push a button, and poof, you’ve got a new deliverable.

If unchecked, these biases will leave you with a lot of unnecessary reuse. You can argue that’s not a big deal, but even when well structured, a heavily reused module will take more maintenance than one that is used in just one place. In addition, it will needlessly increase the bulk of your deliverables. Both of these factors decrease efficiency. If you are serious about maximizing your efficiency, you need to restructure your documentation with a bias against both duplication and reuse.
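One way to see why a heavily reused module costs more to maintain is to count the contexts an author has to recheck after editing it. Here’s a minimal sketch of a “where used” lookup, assuming every deliverable that includes an edited module must be reread to confirm the edit still makes sense there. The function names and deliverables are hypothetical.

```python
def where_used(deliverables: dict[str, list[str]], module: str) -> list[str]:
    """Return, sorted, every deliverable that includes the module."""
    return sorted(name for name, mods in deliverables.items() if module in mods)


def review_load(deliverables: dict[str, list[str]], changed: list[str]) -> int:
    """Contexts to recheck after editing the changed modules.

    Assumes each deliverable that includes an edited module must be reread,
    because the edit has to make sense in every one of those contexts.
    """
    return sum(len(where_used(deliverables, m)) for m in changed)


docs = {
    "Installation Guide": ["intro", "setup", "glossary"],
    "User's Guide": ["intro", "tasks", "glossary"],
    "Admin Guide": ["intro", "backup"],
}
print(where_used(docs, "intro"))      # ['Admin Guide', 'Installation Guide', "User's Guide"]
print(review_load(docs, ["intro"]))   # 3
print(review_load(docs, ["backup"]))  # 1
```

Under this assumption, a one-line change to the shared `intro` module triggers three reviews, while the same change to the single-use `backup` module triggers one.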

So, am I arguing against modular documentation? No. The consistent structure and style used in modular methodologies help people use your documentation. And good methodologies give your authors the guidelines they need to produce that consistent structure and style. Where things go off the rails is when you try to treat your documentation as a set of modules that can be indiscriminately mixed and matched to create whatever deliverables you want.

Jon Bosak put it very nicely in his Closing Keynote at the XML 2006 conference:

Another ancient subject that seems to be popping up again is the idea of modular document creation. This is one of those concepts that comes through about once a decade, seduces all the writing managers with the prospect of greater efficiency, takes over entire writing departments for a couple of years, and then falls out of favor as people finally realize that document reuse is not a solvable problem in document delivery but rather an intractable problem in document writing— which is, how to retain any sense of logical connection between pieces of information while writing as if your target audience consisted entirely of people afflicted with ADD.[Bosak06]

While I advocate minimizing reuse, I don’t advocate eliminating it. Legal notices, glossary definitions, and other boilerplate are obvious cases where reuse is the right strategy. But, in most of the cases where reuse really works, the content being reused is really “meta-content,” content that enhances, but is not central to, the document.

Content that is central to your message deserves a context within which it can live. If it is pulled out of context, as many methodologies encourage, it will either be confusing or require additional information to supply that context, whether in the module itself or in the including document.

Instead of trying to build a modular Chinese menu from your content, go ahead and eliminate duplication, then restructure your content as needed to better serve your customers. If you don’t try to maximize reuse and instead simply reuse content when you discover an opportunity[1], you will end up with less reuse, better structured documentation, and a more efficient process. And, your readers won’t feel like they’re afflicted with ADD.

Bibliography

[Bosak06] Jon Bosak. “Closing Keynote.” XML 2006 Conference, Boston, MA, December 5-1, 2006. Idealliance. http://2006.xmlconference.org/proceedings/162/presentation.html.

[Dijkstra68] Edsger W. Dijkstra. “Go To Statement Considered Harmful.” Communications of the ACM, 11(3), March 1968, pp. 147-148. Association for Computing Machinery, Inc. Copyright © 1968 Association for Computing Machinery, Inc. http://www.acm.org/classics/oct95/.

[KHLR07] Kathy Haramundanis and Larry Rowland. “Experience Paper – A Content Reuse Documentation Design Experience.” SIGDOC 2007, El Paso, TX, October 22-24, 2007. Association for Computing Machinery. http://sigdoc2007.org.


[1] In a forthcoming paper [KHLR07], Kathy Haramundanis and Larry Rowland describe this as opportunistic reuse.



7 comments

  1. I certainly agree that eliminating duplication is extremely important, and that content should be organized in such a way that users have a single, authoritative place to look for specific information. I would venture to say that I have not seen a lot of actual, direct reuse of content. Often, you see a lot of “repurposing”, where similar content is adjusted in some way to better fit the exact context. This is where DITA gets a little dicey: how do you reuse content without it reading like a ransom note? Modular content works, but I think you have to have a way to “tweak” it slightly for context and readability. A nice feature of a CMS is that you can add relationship metadata to track related but repurposed content. Then, when it is updated, you can be notified and then update any repurposed chunks as appropriate. Sure, you are still maintaining multiple content objects, but your document is likely more readable. As you mentioned, it’s best to re-organize your content in such a way that minimizes reuse/repurposing.


  2. Scott, thanks for the comment. I suspect that a good CMS could make updating related content reasonably painless by using an algorithm similar to the GNU patch command, which can generate and apply context-sensitive patches. Do you know of any that do that kind of thing?


  3. I enjoyed your ideas to first eliminate duplication and then restructure content to better serve a customer. As a CMS vendor, I have seen increased productivity when our customers eliminate duplication and focus on improving the eventual navigation of content by their customers. The faster a customer finds an answer, the less likely they are to call the customer support hot line. This requires excellent structuring of the content during the creation phase. One of our customers reduced their traditional book page count by 30 percent by following this practice as well as minimalization guidelines.

    My experiences have taught me that re-use practices vary by industry, with higher re-use (25 to 40 percent) prevalent in regulated industries such as medical device manufacturing or aerospace. Components of content typically require frequent revisions with several layers of review. The right balance of re-use speeds up approval and re-publishing of updated content. A good CMS should also quickly show an author everywhere a reused component is referenced within various documents.

    Additionally, the better re-use models I’ve seen span different documents used by different audiences, for example:

    – Field engineer repair guide vs. end user documentation
    – Technical product data sheets that also require a summary book re-using content from several datasheets
    – On-line help system for different OEM deliverables that vary greatly in configuration (could also be achieved with attributes and filtering)


  4. LOL! Well put! Do you remember the time Spence found around 8 copies of the same procedure across our doc set, most of them different? The original had been copied at some point into each printed document to reduce the users’ need to pull multiple books to complete a task. Over time, each copy had been updated or tweaked independently of (or in ignorance of) the other copies. That’s one of the times I’ve found where re-use makes a lot of sense.

    With online capabilities, of course, only one instance and links to it are needed. There are still times where print is critical and I find re-use is more common in those cases.


  5. I’m rather surprised that there has been no mention of DITA in this discussion on reuse. Because DITA focuses on the creation of topics, it allows writers to work within a structure that really makes sense and facilitates reuse. A topic includes all of the context that is necessary in order for the topic to stand alone.

    Too often, I think, people have tried to engineer reuse of content at a level of granularity that is simply too specific. It can work for a legal disclaimer that is attached to everything that a company may publish, but very fine granularity tends to break information into such small bits that they have virtually no context and are, therefore, much less useful.

    Also, DITA provides a mechanism called “content references” (CONREF), which allows users to identify content within a topic for reuse. This piece of content can reside in one topic, where it is used in context, but can also be reused in others.


  6. Judy,

    Thanks for your comments. You make some excellent points.

    Regarding DITA, I tried to stay away from specific technologies in this commentary because I think the basic issues are independent of any particular technology. Also, being a member of the DocBook TC, I’m somewhat of a partisan and didn’t want to write specifically about DITA (though I find DITA intriguing, have spent time working with it and the toolkit, and will probably write about it in the future).

    That said, I don’t honestly know if I consider DITA to be a positive or negative force on the question of reuse. As you say, DITA facilitates creation of topics that can stand alone and simplifies reuse through CONREF and other mechanisms. If you’re writing modular documentation, that is a good thing.

    But, while I generally think modular documentation makes sense in a lot of situations, I think it can easily be abused. And I think it’s fair to ask whether DITA, modular methodologies, and some of the CMS vendors encourage the overuse of reuse at the expense of more economical design.

    In the end, I’m inclined to come down on the side of DITA. It may be easier to speed in a Porsche than a Volkswagen, just as it may be easier to use and abuse reuse with DITA, but the person with his or her foot on the accelerator is ultimately responsible.

    Thanks for taking the time to read the piece and leave your comments.


  7. […] Richard Hamilton has an interesting, lengthy blog post on why he considers reuse to be harmful. Hamilton definitely isn’t against reuse, but as he says: While I advocate minimizing reuse, I don’t advocate eliminating it. Legal notices, glossary definitions, and other boilerplate are obvious cases where reuse is the right strategy. But, in most of the cases where reuse really works, the content being reused is really “meta-content,” content that enhances, but is not central to, the document. […]


