Month: November 2011

Kindle xRef link bug

Background

When I first started generating Mobi files (for the Kindle) in addition to my ePub workflow, I came up with a list of annoying formatting issues that I just accepted and chalked up as “Kindle Formatting Limitations”. For example:

  • No embedded fonts
  • Extremely limited stylesheet options
  • Formatting being ignored when linking from one section to another
  • Page elements being pushed to previous page when linking from one section to another
  • And so on

However, the third item in the above list (formatting being ignored when linking from one section to another) bugged me the most because it just didn’t seem right.

After digging much deeper into the issue, I finally found the cause, followed by a solution.
(I use InDesign CS5.5 for the majority of my ePub/Mobi workflows, so that is what I will be referencing in the following examples.)

My workflow:
InDesign–>ePub–>(Oxygen/TextWrangler)–>ePub–>Kindlegen–>Mobi

It seems that the Kindle will choke if an “Anchor” id is placed (inline) with the referenced line of text:

<h1 id="toc_marker-2"><a id="Anchor"/>CHAPTER 2</h1>

This example references the id on the opening tag above (id="toc_marker-2"):

Jumping to Chapter 2 from the TOC link causes no formatting issues on the Kindle

And this example references the inline anchor id above (id="Anchor"):

Jumping to Chapter 2 from an in-text cross-reference causes the Kindle to ignore formatting for the destination line

Why are there 2 different “id” locations?

InDesign CS5.5 creates ePub “id”s based on 2 types of sources:

  1. In the ePub Export Settings Dialog, you choose which InDesign TOC preset to use in order to create the .ncx file for the ePub

    When InDesign creates these links within the ePub files, it places the “id”s inside the opening HTML tag:

    <h1 id="toc_marker-2">

    When run through Kindlegen to create the Mobi file, these links work just fine on the Kindle.

  2. When you create internal Hyperlinks and Cross-references within InDesign, they export to ePub links as “id”s wrapped in <a> tags, and then placed in-between the corresponding HTML opening/closing tags:
    <h1 id="toc_marker-2"><a id="Anchor"/>CHAPTER 2</h1>

    When run through Kindlegen to create the Mobi file, these links cause the “loss of formatting” issue (illustrated above).

Note: both of these “id” locations are perfectly valid HTML and ePub. The Kindle is the only eBook reader/device (that I am aware of) that does not play nicely with both.
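
To see which ids in a file fall into each of the two categories described above, something like the following Python sketch could be used. It is only an illustration: the function name `classify_ids` and the choice of tags (`h1` to `h6`, `p`, `div`) are my own assumptions, and a regular expression is a rough stand-in for a real XHTML parser.

```python
import re

# An id attribute sitting on a block-level opening tag,
# e.g. <h1 id="toc_marker-2">
TAG_ID = re.compile(r'<(h\d|p|div)\b[^>]*\bid="([^"]+)"[^>]*>')

# An empty inline anchor placed right after an opening tag,
# e.g. <h1 id="toc_marker-2"><a id="Anchor"/>
INLINE_ANCHOR = re.compile(r'<(?:h\d|p|div)\b[^>]*>\s*<a id="([^"]+)"\s*/>')


def classify_ids(xhtml):
    """Return (ids on opening tags, ids in inline <a> anchors)."""
    tag_ids = [m.group(2) for m in TAG_ID.finditer(xhtml)]
    inline_ids = [m.group(1) for m in INLINE_ANCHOR.finditer(xhtml)]
    return tag_ids, inline_ids


sample = '<h1 id="toc_marker-2"><a id="Anchor"/>CHAPTER 2</h1>'
tag_ids, inline_ids = classify_ids(sample)
print("ids on opening tags:", tag_ids)
print("ids in inline anchors:", inline_ids)
```

The second list holds the ids that trigger the formatting loss on the Kindle.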

How to fix this on the Kindle?

After some testing, I found that simply moving the “bad id” outside of the HTML element solves the issue:

From this:

<h1 id="toc_marker-2"><a id="Anchor"/>CHAPTER 2</h1>

To this:

<a id="Anchor"/><h1 id="toc_marker-2">CHAPTER 2</h1>

How I accomplish this using GREP

I use either Oxygen or TextWrangler (on my Mac) to do my ePub/Mobi post-processing.

  • Oxygen is my preferred program because it does not require cracking open the ePub ahead of time. And it also has built-in ePub 2 validation. However, it is painfully slow when trying to apply batch fixes on large ePub files (hundreds or thousands of internal ePub HTML files), which I do a lot of.
  • I use TextWrangler to run my batch fixes on these larger projects. You need to crack the ePub file open to do so, but it is extremely fast and worth the extra step. And it is also free.
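
For anyone who prefers to script the "crack open" step, here is a minimal Python sketch using only the standard library. The function names `unpack_epub` and `repack_epub` are hypothetical. The one detail it relies on is that an ePub is a zip archive whose `mimetype` file has to be the first entry and stored uncompressed.

```python
import os
import zipfile


def unpack_epub(epub_path, dest_dir):
    """Extract every file in the ePub (a zip archive) into dest_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(dest_dir)


def repack_epub(src_dir, epub_path):
    """Zip src_dir back into an ePub, writing mimetype first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir)
                if rel == "mimetype":
                    continue  # already written as the first entry
                zf.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

A typical round trip would be `unpack_epub("book.epub", "book_src")`, then the batch fixes on the files in `book_src`, then `repack_epub("book_src", "book_fixed.epub")`. The file and folder names here are placeholders.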

Here is the GREP pattern that I would use to fix the example in this post:

Find:

(<h\d[^>]*>)(<a id="Anchor"/>)

Replace:

\2\1

The above pattern captures the match in 2 groups (the opening heading tag and the inline anchor) and then swaps their order.

It can (and should) be re-written to account for other HTML tags (e.g. <p>, <pre>, <div>, etc.) and multiple “Anchor id”s (e.g. “Anchor-1”, “Anchor-12”, etc.).
Basically, it can be re-written to account for your specific situations.
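
As a rough illustration of that kind of generalization, here is a Python sketch of the same swap. The tag list and the `Anchor` / `Anchor-N` id naming are assumptions based on the examples in this post, and `move_anchors_out` is a made-up name. Adjust all of them to match what your own export actually produces.

```python
import re

PATTERN = re.compile(
    # Group 1: an opening <h1>-<h6>, <p>, <pre> or <div> tag with any attributes
    r'(<(?:h\d|p|pre|div)\b[^>]*>)'
    # Group 2: one or more inline anchors named Anchor or Anchor-<number>
    r'((?:<a id="Anchor(?:-\d+)?"\s*/>)+)'
)


def move_anchors_out(xhtml):
    """Move inline anchors in front of the opening tag they follow."""
    return PATTERN.sub(r'\2\1', xhtml)


before = '<h1 id="toc_marker-2"><a id="Anchor"/>CHAPTER 2</h1>'
print(move_anchors_out(before))
```

Elements without an inline anchor are left untouched, so running it over every HTML file in an unpacked ePub should only change the lines that need it.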

I hope this is helpful to you all.

Cheers.