This was annoying. The solution I talk through below is still not perfect.

A little background first. As a side project to better understand Chrome extensions, I built an extension called Traffic Light SEO.

As you navigate to a page, it shows a preview of the title and meta description as they would appear on Google, plus 3 traffic lights that tell you whether the page is indexable by Google.

Often when talking to a client I needed to be able to tell immediately whether a page is indexable, and that involves 3 things:

  • Checking robots.txt
  • Checking the canonical & no-index directives in the head of the page.
  • Checking the canonical & no-index directives in the headers.

The last one is the fiddly one.

A quick background on how a Chrome extension works

There are two main ways to run JS in a Chrome extension.

  1. A background script that runs essentially as its own tab and can hook into the Chrome internals.
  2. A content script that can run on every page.

The second is where the majority of my extension runs. I want to have the traffic lights appear on the page as you’re browsing so I need to run JS on that page.

However, a content script can’t access the headers (I assume for some combination of security and performance reasons), so you need to pass them through from the background script.
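One possible shape for that hand-off is sketched below. This is not how Traffic Light SEO does it; it is an alternative where the background script remembers the latest headers per tab and the content script asks for them. The helper createHeaderStore and the "get-headers" message type are names invented for this sketch.

```javascript
// Hypothetical helper: remembers the latest main_frame response headers per tab.
function createHeaderStore() {
	const headersByTab = new Map();
	return {
		// Call with the `details` object a webRequest listener receives.
		record(details) {
			headersByTab.set(details.tabId, details.responseHeaders || []);
		},
		// Returns the stored headers for a tab, or null if none were seen.
		lookup(tabId) {
			return headersByTab.has(tabId) ? headersByTab.get(tabId) : null;
		},
		// Drops the entry for a tab, e.g. once the tab is closed.
		forget(tabId) {
			headersByTab.delete(tabId);
		}
	};
}

// Wiring: only attempted where the extension APIs exist (a background script).
if (
	typeof chrome !== "undefined" &&
	chrome.webRequest &&
	chrome.tabs &&
	chrome.runtime &&
	chrome.runtime.onMessage
) {
	const store = createHeaderStore();

	chrome.webRequest.onResponseStarted.addListener(
		store.record,
		{ urls: ["*://*/*"], types: ["main_frame"] },
		["responseHeaders"]
	);

	// The content script would ask with:
	// chrome.runtime.sendMessage({ type: "get-headers" }, callback)
	// "get-headers" is a made-up message type for this sketch.
	chrome.runtime.onMessage.addListener(function (message, sender, sendResponse) {
		if (message && message.type === "get-headers" && sender.tab) {
			sendResponse({ headers: store.lookup(sender.tab.id) });
		}
	});

	chrome.tabs.onRemoved.addListener(store.forget);
}
```

The store only lives in memory, and it still depends on the webRequest listener firing, so a null lookup means the headers were missed for that load.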

Passing headers to a content script

What did I settle on?

While the chrome.webRequest API definitely appears to be the correct way to pass the headers through to a content script, all the listeners I’ve tried have been inconsistent and occasionally miss requests for reasons I don’t really understand. (See “How did I get there?” below for more details.)

However, chrome.webRequest.onResponseStarted.addListener performs notably more consistently and catches more main_frame requests than chrome.webRequest.onCompleted.addListener, so I’ve switched Traffic Light SEO to use that.

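chrome.webRequest.onResponseStarted.addListener(
	// This takes 3 parameters:
	// 1. the function to be executed on callback
	executeTrafficLights,
	// 2. a filter deciding which requests trigger it
	{
		// the URL patterns to match
		urls: ["*://*/*"],
		// the request types to be applied to
		types: ["main_frame"]
		// the tab id to watch
		// tabId: tab.id
	},
	// 3. extra info to include: ask for the response headers
	["responseHeaders"]
);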

function executeTrafficLights(details) {
	/*
	* This function takes the URL, removes the hash and then
	* runs a function on the tab submitted.
	*/

	var url = details.url.split("#")[0];
	checkURLExclusion("extension_default.js", details.tabId, false, url);
	checkURLExclusion(
		"built_seo_traffic.js",
		details.tabId,
		true,
		url,
		details // headers are passed through in this
	);

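	// Then remove self after executing. The listener was added to
	// onResponseStarted, so it has to be removed from that same event.
	chrome.webRequest.onResponseStarted.removeListener(executeTrafficLights);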
}

Hopefully you’ll enjoy the notably improved consistency of the extension.

How did I get there?

To figure out how to pass through header information to a content script I started working my way through the various relevant APIs.

First I had a look at chrome.tabs.onUpdated.addListener.

  1. For each tab it will always trigger with a status of complete when the tab has finished loading (I’m unsure of the exact definition of “load” in this situation). This is exactly what we want.
  2. It doesn’t contain the headers. This, unfortunately, is not.

The webRequest API does contain the headers, however, and chrome.webRequest.onCompleted.addListener certainly reads as the most appropriate listener.

  1. It should fire every time a request completes.
  2. It does indeed contain the headers information.

However for reasons beyond me, the first of those isn’t true.

chrome.webRequest.onCompleted.addListener doesn’t see a main_frame request for every tab load. I suspect it’s caching related, although I’ve also seen this happen when the network request reports that the URL was loaded properly (i.e. not loaded from disk). It’s also wildly inconsistent, which makes it very hard to debug (and which, in my experience, points to caching).

This was previously what Traffic Light SEO was using. It then ran a second content script from chrome.tabs.onUpdated.addListener, which checked whether the first had run and, if not, showed a “please reload the tab” message instead.
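The fallback decision can be sketched roughly as below. This is a simplified background-side variant rather than the original two-content-script setup, and createLoadTracker, markHeadersSeen and the returned strings are names invented for illustration.

```javascript
// Hypothetical tracker: which tabs had headers delivered for the current load.
function createLoadTracker() {
	const tabsWithHeaders = new Set();
	return {
		// Call from the webRequest listener when it fires for a tab.
		markHeadersSeen(tabId) {
			tabsWithHeaders.add(tabId);
		},
		// Call from chrome.tabs.onUpdated with its tabId and changeInfo.
		// Returns "wait", "ok" or "ask-reload".
		onTabUpdated(tabId, changeInfo) {
			if (changeInfo.status !== "complete") {
				return "wait";
			}
			// delete() reports whether the tab was marked, and resets it
			// ready for the next navigation in the same tab.
			const seen = tabsWithHeaders.delete(tabId);
			return seen ? "ok" : "ask-reload";
		}
	};
}
```

In an extension, "ask-reload" would be the cue to inject the reload message, and "ok" means the header-based checks already ran for that load.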

I then started playing around with other APIs and found that chrome.webRequest.onResponseStarted.addListener not only still contains the headers, but also seems to fire successfully nearly all of the time (although, frustratingly, still not every time), as well as shaving a small bit of time off the run.

Notes on the APIs and what they return

If you want to build your own extension, this may also be useful information.

What does chrome.tabs.onUpdated.addListener see?

chrome.tabs.onUpdated.addListener(function(tabId, changeInfo, tab) {
	// code goes here.
});

The two interesting parts of the above are changeInfo & tab.

changeInfo

The listener fires multiple times as a tab changes, and each time changeInfo contains only what has changed. This includes the status of the tab:

{status: "loading"}
{status: "complete"}

Or sometimes just the title or favIconUrl.

{title: "Online Marketing & Search Conferences by Distilled"}
{favIconUrl: "https://www.distilled.net/static/images/favicon.ico"}

tab

tab shows default information about a tab:

{
    active: true
    audible: false
    autoDiscardable: true
    discarded: false
    favIconUrl: "https://www.distilled.net/static/images/favicon.ico"
    height: 723
    highlighted: true
    id: 797
    incognito: false
    index: 27
    mutedInfo: {muted: false}
    pinned: false
    selected: true
    status: "complete"
    title: "Online Marketing & Search Conferences by Distilled"
    url: "https://www.distilled.net/events/"
    width: 1536
    windowId: 3
}

What does chrome.webRequest.onCompleted.addListener see?

chrome.webRequest.onCompleted.addListener(
	// the function to be run
	my_func,
	// A filter that decides which requests
	// the function should be run for
	{
		// the URL patterns to match
		urls: ["*://*/*"],
		//the request types to be applied to
		types: ["main_frame"],
		// the tab id to watch
		tabId: tab.id
	},
	// Ask for the headers
	["responseHeaders"]
);

function my_func (details) {
	// code goes in here
}

The function will then receive the following object:

{
    frameId: 0
    fromCache: false
    ip: "143.204.181.74"
    method: "GET"
    parentFrameId: -1
    requestId: "190236"
    responseHeaders: (4) [{…}, {…}, {…}, {…}]
    statusCode: 200
    statusLine: "HTTP/1.1 200 OK"
    tabId: 797
    timeStamp: 1547638168224.9392
    type: "main_frame"
    url: "https://www.distilled.net/events/"
}
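As a rough illustration of what can be done with responseHeaders once it arrives, here is a minimal sketch that looks for the two header-level directives mentioned earlier. It assumes no-index arrives in an X-Robots-Tag header and the canonical in a Link header with rel="canonical"; readIndexingHeaders is a made-up helper name, and the Link parsing is deliberately simple (it would trip over URLs containing commas).

```javascript
// Hypothetical helper: pulls no-index and canonical hints out of the
// `responseHeaders` array found on a webRequest `details` object.
function readIndexingHeaders(responseHeaders) {
	const result = { noindex: false, canonical: null };

	for (const header of responseHeaders || []) {
		// Header names are case-insensitive, so normalise before comparing.
		const name = header.name.toLowerCase();
		const value = header.value || "";

		// "none" is shorthand for "noindex, nofollow".
		if (name === "x-robots-tag" && /\b(noindex|none)\b/i.test(value)) {
			result.noindex = true;
		}

		if (name === "link") {
			// Matches e.g. <https://example.com/page>; rel="canonical"
			const match = value.match(
				/<([^>]+)>\s*;[^,]*\brel\s*=\s*"?canonical"?/i
			);
			if (match) {
				result.canonical = match[1];
			}
		}
	}

	return result;
}
```

Inside the listener this would be called as readIndexingHeaders(details.responseHeaders).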

The caveat is that this doesn’t fire for the main request on every tab.

Comments?

Has anyone run into this issue before?

If you have successfully built an extension which passes headers through to the page every time, please let me know what I’m missing.

Otherwise hope this is helpful.