
Filter your output with HTML Purifier

paan, 21 July 2008

Now that you have set up a rich text editor for your phpLD directory, you will want to set up some filtering to stop submitters from doing nasty things, like inserting JavaScript into your site. Most phpLD templates already do this, but they use the Smarty ‘escape’ modifier, which escapes ALL tags and leaves you with something like this:

Ugh... we don't want these ugly HTML tags showing.

See the <p> tags? We don’t want that. We want to give submitters some HTML formatting power, yet we don’t want to let them use fancy XSS or JavaScript on the site. In keeping with the open source spirit, we will use another freely available component to filter the submitted HTML. It is called HTML Purifier.
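For comparison, Smarty’s escape modifier in its default html mode essentially runs the value through PHP’s htmlspecialchars(), which is why the <p> tags show up as literal text. A minimal sketch of that behaviour (the sample description is made up for illustration):

```php
<?php
// Roughly what Smarty's escape modifier does in its default 'html' mode:
// every tag is turned into HTML entities, so the browser displays the
// markup as literal text instead of rendering it.
// The sample description below is made up for illustration.
$description = '<p>Great <b>widgets</b> here</p><script>alert(1)</script>';

$escaped = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;p&gt;Great &lt;b&gt;widgets&lt;/b&gt; here&lt;/p&gt;&lt;script&gt;alert(1)&lt;/script&gt;
```

It is safe, but the harmless <b> goes the same way as the <script>. What we want is something that keeps the formatting tags and drops the dangerous ones.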

Getting HTML Purifier

Head over to the HTML Purifier download page and download it. There is a lite download that strips out all the extra documentation from the package. That is really all we need, but the documentation does come in handy sometimes, so I recommend the full package.

Installing HTML Purifier

We need to put the library somewhere and, like we did with the TinyMCE library in the previous article in the series, the libs directory in your phpLD installation is perfect for this. Just copy the whole library directory from the HTML Purifier package into the libs directory and rename it to something more descriptive; ‘htmlpurifier’ will do.

This is what your libs directory will look like.

Configure

[The few paragraphs below are just my thoughts on why I do things a certain way. They are kind of rant-ish, so you can skip ahead to the "Method 1: Handle it at code level" section if you want.]

OK, now we need to configure it so that phpLD knows how to use it. There are two ways to go about this.

One is to do it while getting the data from the database: purify the data as we fetch it, then pass it to the template engine. This is usually the way I prefer to do it. But phpLD assigns the link info to a variable straight from the result set of an SQL query, so there’s no way to intercept it. Instead, we need to take the $links variable, traverse the result set and filter it after it has been assigned. Seems like a hack to me.

The other way is to send the data as is and let the template system handle the purifying. This involves writing a Smarty plugin, which in this case is the better solution in my opinion, but if you change themes you have to remember to update the new theme as well.

I’m leaning towards the first option because I feel this is logic code rather than UI code and, following the MVC convention, it should be handled by the code before being handed over to the template system. Still, I’m really divided on this, so I’ll show you both ways (I’ll show the template method in the next article) and you can try out whichever one you think fits your purpose.
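To give a rough idea of what a Smarty plugin for this could look like, here is a minimal sketch (not the version from the next article). It relies on Smarty’s naming convention for modifier plugins: a function called smarty_modifier_<name> in a file called modifier.<name>.php in the plugins directory. The modifier name purify is a hypothetical choice, and the sketch assumes the HTML Purifier include from Method 1 below has already run.

```php
<?php
/*
 * Hypothetical Smarty modifier plugin, saved as modifier.purify.php in your
 * Smarty plugins directory. Template usage: {$link.DESCRIPTION|purify}
 * Assumes HTMLPurifier.auto.php has already been included (e.g. from init.php).
 */
function smarty_modifier_purify($string)
{
    // Create the purifier once and reuse it for every call in this request.
    static $purifier = null;
    if ($purifier === null) {
        $purifier = new HTMLPurifier(); // default configuration
    }
    return $purifier->purify((string) $string);
}
```

This leaves index.php untouched, but as mentioned above, every template that displays descriptions would need |purify in place of |escape.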

Method 1: Handle it at code level

Include the libraries

First you need to include the HTML Purifier include file in init.php. Add an extra include right after the existing includes for Smarty and the other libraries.

Insert your include at the end of the other library includes, around line 52.

After line 52, add the following:

// Load HTML Purifier; HTMLPurifier.auto.php sets up the library's autoloader.
require_once 'libs/htmlpurifier/HTMLPurifier.auto.php';

Calling the purifier

Now that the includes are done, we need to call the purifier on our data. Open index.php and go near the bottom, right before the variables are assigned to the template and the template is displayed.

index.php: just before assigning to the template, around line 178.

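Here we need to insert the following:

// Purify every link description before it is handed to Smarty.
if (!empty($links)) {
	// No arguments: use HTML Purifier's default configuration.
	$purifier = new HTMLPurifier();
	foreach ($links as &$link) {
		$link['DESCRIPTION'] = $purifier->purify($link['DESCRIPTION']);
	}
	// Break the reference so a later foreach over $links can't overwrite the last link.
	unset($link);
}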
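If you would rather not modify $links through a reference at all, the same traversal can be written as a small function that returns a filtered copy. This is a sketch, not phpLD code: purify_link_descriptions is a hypothetical name, and the filter is passed in as a PHP callable so that in index.php you could hand it array($purifier, 'purify'). The demo uses strip_tags() only as a stand-in so it runs without the library; it simply throws away all the formatting we want to keep, so it is no replacement for HTML Purifier.

```php
<?php
// Hypothetical helper: returns a copy of $links with every DESCRIPTION filtered.
// $filter is any PHP callable, e.g. array($purifier, 'purify') in index.php.
function purify_link_descriptions($links, $filter)
{
    if (empty($links)) {
        return $links; // nothing to do (null, false or an empty array)
    }
    foreach ($links as $key => $link) {
        if (isset($link['DESCRIPTION'])) {
            $links[$key]['DESCRIPTION'] = call_user_func($filter, $link['DESCRIPTION']);
        }
    }
    return $links;
}

// Demo with strip_tags() as a stand-in filter, so it runs without HTML Purifier.
$links = array(
    array('TITLE' => 'Example', 'DESCRIPTION' => '<b>Bold</b> <script>bad()</script>'),
);
$clean = purify_link_descriptions($links, 'strip_tags');
echo $clean[0]['DESCRIPTION'], "\n"; // prints "Bold bad()"
```

In index.php the loop would then shrink to something like $links = purify_link_descriptions($links, array(new HTMLPurifier(), 'purify'));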

Disabling the Smarty filtering

One more thing we need to do is disable the escaping done by Smarty. Open link.tpl and remove the escape modifier from the Smarty variable {$link.DESCRIPTION}.

Remove the escape modifier on $link.DESCRIPTION, around line 11 or 12.

This will of course be a little different depending on your template, but the important thing is that you remove the escape modifier from $link.DESCRIPTION.

{* DESCRIPTION is already purified in index.php, so no escape modifier here *}
{$link.DESCRIPTION|trim}<br />

And you’re done.

Yay. Now we have bold text.

Other considerations

Remember that HTML Purifier is PHP5-only. There is an older PHP4-compatible version you can use if you are still on PHP4. (You really should move to PHP5 :D)

You can also specify exactly which HTML tags you want to allow and which ones you want to disable. Personally, I think the defaults that come with HTML Purifier are good enough. You can consult the documentation on how to configure these.
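If you do want to restrict things, one way is a whitelist through the HTML.Allowed directive. A minimal sketch: make_description_purifier is a hypothetical helper name, the tag list is only an example, and the single-string 'HTML.Allowed' form of $config->set() assumes HTML Purifier 3.1 or later, so check the documentation for the version you downloaded.

```php
<?php
// Hypothetical helper for index.php: a purifier that only allows a small,
// explicit set of tags instead of HTML Purifier's defaults.
// Assumes HTMLPurifier.auto.php has already been included.
function make_description_purifier()
{
    $config = HTMLPurifier_Config::createDefault();
    // Comma-separated element names; allowed attributes go in square brackets.
    $config->set('HTML.Allowed', 'p,br,b,strong,i,em,ul,ol,li,a[href]');
    return new HTMLPurifier($config);
}
```

You would then use $purifier = make_description_purifier(); in index.php instead of new HTMLPurifier().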

The manual also has something to say about the loaders. If you are using an opcode cache like APC, the HTML Purifier autoloader can cause problems, and you need to use a different include file that doesn't rely on autoloading. The manual also covers performance issues; if you are concerned about that, you should definitely read it.

Summary

HTML Purifier is a good way to give your submitters more flexibility. There are many things you can do with it, for example letting premium listings use formatting that makes them stand out more. It’s all up to you how you implement this.

This entry is part 2 of 3 in the series Powered by phpLD