WordPress HTML output filter

My WordPress HTML output filter (the HOF) is available for download. It is a plugin for WordPress 2.9; it might work with earlier versions, though it may need some tweaking. WordPress installations on large-scale shared hosts have come under assault recently, as noted on the WordPress Development blog, not through any flaw in WordPress but through flaws in the web hosts’ servers. Those of us who use such services can never be sure how secure the server is.

The plugin does two things:
1. It provides a dashboard utility, under Tools in the left-hand menu of the admin pages, which scans your WordPress pages for malicious code (the scan might not spot all malicious code; see below).
2. It provides a filter that removes malicious HTML code from your whole site, regardless of how the code was placed on the server (a malicious or hacked plugin, or comments or posts altered in the database).

The filter (number 2) is turned off when the plugin is installed. First run a scan and whitelist all the code and domains that should be allowed, then turn on the filter.

Number 2 is important because some recent hacks have been targeted only at certain traffic (for example, traffic coming from Google) to hide the fact that the site has been hacked. The scan (number 1) will not pick up a hack targeted in this way, because the server requests all the pages from itself and so never looks like the targeted traffic. Number 2 seeks to prevent any harm coming to visitors to a hacked site (harm which can lead to the site being blacklisted). The plugin does nothing to stop the site being hacked in the first place, but it seeks to limit the harm done and ruin the hackers’ plans. Even if your templates and database were hacked, it should still filter out the crud.
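To make the difference concrete, here is a minimal Python sketch of a hack that only shows itself to visitors arriving from a search engine. This is not the plugin's code: the names render_page, scan_finds_hack and output_filter, and the domain evil.example, are all made up for illustration, and the regular expression is a deliberately crude stand-in for proper whitelist checking.

```python
import re

CLEAN_PAGE = "<html><body><p>Hello</p></body></html>"
INJECTED = '<script src="http://evil.example/x.js"></script>'  # hypothetical payload

def render_page(referer):
    """Simulate a hacked template that injects code only for search traffic."""
    if "google." in referer:
        return CLEAN_PAGE.replace("</body>", INJECTED + "</body>")
    return CLEAN_PAGE

def scan_finds_hack():
    """A self-scan requests the page without a search-engine referer."""
    return "evil.example" in render_page(referer="")

def output_filter(html, allowed_domains):
    """Drop external script tags whose domain is not whitelisted (crude illustration)."""
    def keep_or_drop(match):
        domain = match.group(1).lower()
        return match.group(0) if domain in allowed_domains else ""
    pattern = r'<script[^>]*\bsrc="https?://([^/"]+)[^"]*"[^>]*>\s*</script>'
    return re.sub(pattern, keep_or_drop, html, flags=re.IGNORECASE)

# What a visitor from Google would receive once the output passes through the filter.
served = output_filter(render_page("http://www.google.com/search?q=blog"), {"example.org"})
```

In this toy setup scan_finds_hack() returns False because the scan never triggers the injection, while served comes out identical to CLEAN_PAGE because the filter sees exactly what the visitor would have been sent.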

Because the plugin checks for correct HTML first (to correct mismatched tags ready for the second stage of validation), I recommend first testing the site with either the W3C validator (http://validator.w3.org/) or the Firefox HTML Validator plugin (https://addons.mozilla.org/en-US/firefox/addon/249) and correcting the HTML until it validates. All decent themes should validate OK. Otherwise you will have lots of HTML errors listed when you do a scan.
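For a rough idea of what "mismatched tags" means in practice, the following Python sketch flags closing tags that do not match the most recently opened element. It is only an illustration and no substitute for the validators above: it assumes XHTML-style markup in which every non-void element is explicitly closed, and TagBalanceChecker and find_tag_errors are names I have made up, not anything from the plugin.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TagBalanceChecker(HTMLParser):
    """Track open elements on a stack and record closing tags that do not match."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def find_tag_errors(html):
    """Return a list of problems; an empty list means the tags are balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```

For example, find_tag_errors("<div><p>ok</p></div>") returns an empty list, while find_tag_errors("<div><p>oops</div>") reports the stray closing div along with the elements left open.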

The first time you run a scan you’ll get a lot of errors reported, since it won’t recognise any of the domains in the links on your site or any of the JavaScript. You can tick the checkbox to add everything to the whitelist automatically. This makes setup easier, but I recommend checking the whitelists afterwards to see whether the scan has found any domains or code that shouldn’t be there.
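The idea of checking domains against a whitelist can be sketched in a few lines of Python. Again this is an assumption-laden illustration rather than how the plugin itself works: DomainCollector and unknown_domains are hypothetical names, and the sketch only looks at href and src attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class DomainCollector(HTMLParser):
    """Collect the host names referenced by href and src attributes."""
    def __init__(self):
        super().__init__()
        self.domains = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host:
                    self.domains.add(host)

def unknown_domains(html, whitelist):
    """Return the domains found in the page that are not on the whitelist."""
    collector = DomainCollector()
    collector.feed(html)
    collector.close()
    return collector.domains - set(whitelist)

page = ('<a href="http://example.org/about">About</a>'
        '<script src="http://evil.example/x.js"></script>'
        '<a href="/contact">Contact</a>')
print(sorted(unknown_domains(page, {"example.org"})))  # prints ['evil.example']
```

A first scan with an empty whitelist would report every domain on the page, which is why reviewing the automatically built whitelist afterwards matters: anything added blindly is trusted from then on.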

With the filter turned on, go to a normal blog page in your browser (or hit Refresh) and select View Source. You should see this line at the end of the file:

<!-- scanned by HTML Output Filter by /wordpress-html-output-filter/ -->

The filter also gives pretty good feedback on what it has removed, in HTML comments at the end of the returned file.
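If you would rather not read the page source by eye, a small Python helper along these lines could pull out the HTML comments (which are delimited by `<!--` and `-->`) and check for the "scanned by" line. The function name filter_report is made up, and since the exact wording of the removal notes is not given here, the sample page below uses an invented placeholder for one.

```python
import re

def filter_report(html):
    """Return (was_scanned, comments) for a page's HTML source."""
    comments = [c.strip() for c in re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)]
    was_scanned = any(c.startswith("scanned by HTML Output Filter") for c in comments)
    return was_scanned, comments

# Hypothetical page source; the first comment is a made-up removal note.
sample = ("<html><body><p>Hello</p></body></html>\n"
          "<!-- example removal note (invented for this sketch) -->\n"
          "<!-- scanned by HTML Output Filter by /wordpress-html-output-filter/ -->")

was_scanned, comments = filter_report(sample)
```

Here was_scanned is True and comments holds both comment texts in page order; a page that never went through the filter would give False and, if it has no comments at all, an empty list.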
