Refactoring Incremental Reader, or How to highlight text across nodes in the DOM

Over the last few days, I’ve been able to carve out some time to get back to working on Incremental Reader. My short-term goal is to get a saner structure so I can go back and add unit tests. I was recently exposed to Test Driven Development through a Coursera class, “Introduction to Systematic Program Design”, and I’m pretty excited to figure out how to work this into my development process.

Highlighting text! As easy as 1,2…

The last feature I implemented was the right-click context menu to highlight selected text in the current browser tab. Correctly inserting tags to create the highlighted text was a huge pain, mainly because I want notes (highlighted text) to span multiple DOM elements and split anywhere (including the middle of an element). It’s been a while since I’ve worked on the extension, so I thought I’d walk through the code to refresh my memory on how I implemented this feature.

First to get the selected text:

var highlightRange = window.getSelection().getRangeAt(0);

This returns a Range object, which can contain full nodes and parts of text nodes. The Range object has startContainer and endContainer properties, which indicate the nodes at the start and end of the range. When these are equal, our range doesn’t cross into other nodes, making our job easy. All we have to do is split the text node at the start and end offsets, and wrap the new node with any tag that has a background (highlight) color.

var startContainer = highlightRange.startContainer;
var endContainer = highlightRange.endContainer;
var newNode = document.createElement('mark');
newNode.style.backgroundColor = "yellow";

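if (startContainer === endContainer) {
    // Split at the start, then at the end, of the highlight.
    var splitNode = startContainer.splitText(highlightRange.startOffset);
    var temp = splitNode.splitText(highlightRange.endOffset);
    // Insert the <mark> before the selected text, then move the text into it.
    var insertElement = startContainer.parentNode.insertBefore(newNode, splitNode);
    insertElement.appendChild(splitNode);
} else {
    // The range spans multiple nodes (covered below).
}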

First, we get the nodes where the range starts and ends. The .startOffset property of the range object contains the offset from the start of the node where the highlight begins. .splitText() breaks a text node at a given offset and returns the portion after the offset as a new node. The .endOffset actually remains correct (the range is updated on splitText, so the offset becomes relative to the new node), so we don’t have to recalculate that value. We simply call .splitText() again to split the node at the end of the highlight.
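
To make the offset bookkeeping concrete, here is a small sketch that uses plain strings in place of text nodes. The helper names (splitAt, splitForHighlight) are made up for this illustration and are not part of the extension. A string has no live Range to update for us, so the end offset is adjusted by hand; in a spec-following browser, that same adjustment is applied to endOffset after the first splitText call.

```javascript
// Plain-string stand-in for Text.splitText: returns [head, tail].
function splitAt(text, offset) {
  return [text.slice(0, offset), text.slice(offset)];
}

// Hypothetical helper: split one text node's content into the parts
// before, inside, and after the highlight.
function splitForHighlight(text, startOffset, endOffset) {
  const [before, rest] = splitAt(text, startOffset);
  // `rest` begins at startOffset, so the end offset has to be made
  // relative to it before the second split.
  const [selected, after] = splitAt(rest, endOffset - startOffset);
  return { before, selected, after };
}

const parts = splitForHighlight("Highlight this text please", 10, 19);
console.log(parts.selected); // the portion that would end up inside <mark>
```

The middle piece, selected, corresponds to splitNode in the snippet above.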

[Screenshot: Highlighting a single DOM node]

splitNode now contains the node we want to visually highlight on the screen. To do this we create a <mark> element, style it, and insert it in the correct location. This is as easy as just getting the parentNode of our splitNode, and using insertBefore to insert it as a child element, just before the splitNode.

We then use .appendChild to move the node into the new <mark> element. If you use .appendChild on a node that is already in the DOM, it will be removed from its current position and added at the new location.
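
The insert-then-move step can be pulled out into a small helper. This is only a sketch: wrapInMark is a hypothetical name, and the document is passed in as a parameter (an assumption made here so the function is easy to exercise with a stand-in object in a unit test).

```javascript
// Hypothetical helper: wrap `node` in a new <mark> element.
// `doc` is the Document used to create the element (pass `document` in a page).
function wrapInMark(doc, node) {
  const mark = doc.createElement("mark");
  // Put the empty <mark> where the node currently sits...
  node.parentNode.insertBefore(mark, node);
  // ...then move the node inside it. appendChild relocates a node that is
  // already attached to a parent rather than copying it.
  mark.appendChild(node);
  return mark;
}
```

In the single-node case above, this would be called as wrapInMark(document, splitNode).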

Now we’re done, and that was pretty simple, right? Here’s where it gets a little trickier. Imagine you want to start highlighting a note in the middle of an <h3> and end a few elements away in another <h3> or a <p> element. We can’t just split and wrap the entire block with one tag, because that would break the DOM structure.

[Screenshot: Highlighting across multiple DOM nodes]

The solution I came up with is to split the first element and the end element, then wrap all the elements with <mark> individually. Since the <mark> element is just for the user to have visual confirmation of the notes they’ve highlighted on the page, I have no problem with doing it this way. I also haven’t thought of any other solutions, so there’s that too.

After we have split the start and end nodes, we extract just the text nodes, since these are what we want to wrap. The way I found to do this is to collect the text nodes under range.commonAncestorContainer, meaning the closest node that contains the entire range.

Now we have all the text nodes under the common ancestor, including ones that fall outside our highlighted text. To get rid of the extra nodes, we can use Range.intersectsNode to determine which nodes in our list are actually in the selection. Then, we just loop through our nodes and wrap them all with a <mark> element. You can check out the full code here: content.js on GitHub
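
The collect-and-filter step might look roughly like the sketch below. It is not the code from content.js: collectTextNodes and textNodesInRange are hypothetical names, and the walk is a plain recursion over childNodes rather than whatever the extension actually uses.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the browser

// Collect every text node under `root`, in document order.
function collectTextNodes(root) {
  const found = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      found.push(node);
      return;
    }
    for (const child of Array.from(node.childNodes || [])) {
      walk(child);
    }
  })(root);
  return found;
}

// Keep only the text nodes that the range actually touches.
function textNodesInRange(range) {
  return collectTextNodes(range.commonAncestorContainer)
    .filter((node) => range.intersectsNode(node));
}
```

Each node returned by textNodesInRange(highlightRange) would then be wrapped in its own <mark>. Whitespace-only text nodes between elements are included too, so a real implementation might want to skip those. It may also be worth checking the nodes at the very edges of the range, since a node that only touches a boundary can still be reported as intersecting.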

[Screenshot: Successful DOM highlighting]


I’ve got way too much going on in the popup code. I’m going to try to consolidate everything I can into the event script. I’m not sure if the popup page is even going to stay, as the plan is to have a management view to manage articles and notes. The popup might be redundant, as saving pages can be done with a context menu instead.

I’ve also been working out a new way to structure the data. Right now all pages are stored as an array in localStorage. Each entry is itself an array containing the URL, page title, and the current vertical scroll position of the window (scrollTop or pageYOffset). This was fine for testing, but I plan to store much more info.
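
As a rough picture of that flat layout, here is a sketch that assumes the whole list is serialized as JSON under a single key. The key name "pages" and the helper names are made up for illustration, and the storage object is passed in so that anything with getItem/setItem (such as window.localStorage) will work.

```javascript
const STORAGE_KEY = "pages"; // hypothetical key name

// Read the saved list; localStorage.getItem returns null for a missing key.
function loadPages(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}

// Each saved page is a [url, title, scrollPosition] triple.
function savePage(storage, url, title, scrollPosition) {
  const pages = loadPages(storage);
  pages.push([url, title, scrollPosition]);
  storage.setItem(STORAGE_KEY, JSON.stringify(pages));
}
```

Positional arrays like this are compact, but every new field means remembering another index, which is part of why a named structure is appealing.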

Here is the current working list of how I plan to restructure it:

// Collection of articles, future possibility to group articles
articleCollection {
    articles:        // list of articles

    // methods
    save:            // save to local storage
    load:            // retrieve from local storage
}

article {
    url:             // url of article
    prevLocation:    // scrollTop or pageYOffset
    notes:           // list of notes
    created:         // date article added
    accessed:        // date last accessed
    archived:        // bool, allow user to set when done reading

    // methods
    save:            // save article to collection
    remove:          // remove article from collection
}

note {
    content:         // raw text content of note
    location:        // original DOM node position, may help
                     // when locating note on page reload

    // methods
    save:            // save note to article
    remove:          // remove note from article
}

You can see I’ve tried to structure everything in a slightly more object-oriented way. I’ve added some possibilities for helper methods, but haven’t added getters. I’m not sure that I need to hide any of the data (i.e. wrap it in a closure), but I’ll consider that in the future.
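
One way the outline could translate into code is with plain factory functions, sketched below. The field names follow the outline; the function names, the storage key, the ISO-string dates, and the injected storage object are all assumptions made for this sketch, and nothing here is hidden in a closure.

```javascript
// Hypothetical factories; plain data objects survive a JSON round trip.
function createNote(content, location) {
  return { content, location };
}

function createArticle(url, prevLocation = 0) {
  const now = new Date().toISOString(); // stored as a string for JSON
  return { url, prevLocation, notes: [], created: now, accessed: now, archived: false };
}

// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function createCollection(storage, key = "articleCollection") {
  return {
    articles: [],
    save() {
      storage.setItem(key, JSON.stringify(this.articles));
    },
    load() {
      const raw = storage.getItem(key);
      this.articles = raw ? JSON.parse(raw) : [];
      return this.articles;
    },
  };
}
```

Keeping articles and notes as plain data, with save and load on the collection only, means nothing but data has to be serialized. The per-article and per-note methods from the outline could be added later as functions that take the collection as an argument.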

I’ve got a pretty good start on getting this up and running, but I’ve still got some more planning to do. I haven’t even gotten into adding unit tests yet!