[geeklog-devel] COM_makeClickableLinks

Sami Barakat furiousdog at gmail.com
Wed Jul 30 09:17:09 EDT 2008


Hi,

I think I've got it now, although it's not a complete solution:

function COM_makeClickableLinks( $text )
{
    // \\6 is the optional "www." prefix, \\9 is the rest of the URL
    $text = preg_replace(
        '/([^"]?)(((ht|f)tps?):(\/\/)|(www\.))+((?=([^\s]+) ))?(\8|[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)/is',
        '\\1<a href="http://\\6\\9">\\6\\9</a>', $text );
    return $text;
}

It seems to work well with the following strings:

normal link http://www.url.com
normal link with early quote http://www.url.com/folder"stuff
link with &nbsp; and quotes "http://www.url.com&nbsp;"
www.url.com/ps&nbsp;
complicated link www.sub.url.com/folder/index.php?id=foo&amp;user=bar&nbsp;

It still fails, however, on these strings:

link with two &nbsp; www.url.com/ps&nbsp;&nbsp;
link with early quote and &nbsp; "http://www.url.com/folder"stuff&nbsp;

The results for the two failing strings are:

link with two &nbsp; <a
href="http://www.url.com/ps&nbsp;">www.url.com/ps&nbsp;</a>&nbsp;
link with early quote and &nbsp; "<a
href="http://www.url.com/folder"stuff">www.url.com/folder"stuff</a>&nbsp;

The second string could probably be fixed by replacing this part of
the regular expression, '[^\s]+', with this:
'[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+'
But really, regular expressions are most helpful for validating
strings or finding substrings within complicated strings; they are
not really made to exclude parts of a string. So it might be more
effective and less complicated to run through the expression twice,
the first time matching URLs with &nbsp; on the end and the second
time without.
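
A different route from the two passes described above (a swapped-in technique, not what the regex in this thread does) would be a single pass with preg_replace_callback: let the URL match run greedily, then peel any trailing &nbsp; entities back off inside the callback. The following is only a minimal sketch. The name makeClickableLinksSketch is hypothetical, it assumes PHP 5.3 or later for the anonymous function, and unlike the function above it keeps the scheme in the link text.

```php
<?php
// Hypothetical helper for illustration; not Geeklog's COM_makeClickableLinks.
// Assumes PHP 5.3+ (anonymous functions).
function makeClickableLinksSketch($text)
{
    // Group 1: scheme or "www." prefix; group 2: the rest of the URL,
    // using the same character class as the regex in this thread.
    $pattern = '/((?:ht|f)tps?:\/\/|www\.)([a-z0-9%&_\-+,;=:@~#\/.?\[\]]+)/i';

    return preg_replace_callback($pattern, function ($m) {
        // Split the greedy match into the URL proper and any run of
        // trailing &nbsp; entities that was swallowed along with it.
        preg_match('/^(.*?)((?:&nbsp;)*)$/i', $m[2], $parts);
        if ($parts[1] === '') {
            return $m[0]; // nothing left to link, leave the text alone
        }
        $url = $m[1] . $parts[1];
        // Bare "www." links need a scheme in the href.
        $href = (stripos($m[1], 'www.') === 0) ? 'http://' . $url : $url;
        return '<a href="' . $href . '">' . $url . '</a>' . $parts[2];
    }, $text);
}
```

With this sketch, the string www.url.com/ps&nbsp;&nbsp; should come out as <a href="http://www.url.com/ps">www.url.com/ps</a>&nbsp;&nbsp;, and a quote in the middle of a URL simply ends the match because '"' is not in the character class. An &nbsp; that is followed by more URL characters is still treated as part of the URL.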

Hope this helps
Sami

2008/7/29 Sami Barakat <furiousdog at gmail.com>:

> Hey,
>
> I have tried looking into this and I have come up with a partial
> solution. From my understanding, the problem occurs when a URL has an
> &nbsp; at the end, which gets parsed along with the URL. I ask because I
> think Gmail has filtered out some of them. Anyway, the following regex
>
> ([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])
>
> seems to work fairly well. Here is the test code that I am using:
>
> echo '<pre>';
> $string = "normal link http://www.url.com PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "link with &nbsp; and quotes \"http://www.url.com&nbsp;\" PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "complicated link \"www.sub.url.com/folder/index.php?id=foo&amp;user=bar&nbsp;\" PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "problem link \"www.url.com/words&nbsp;\" FAIL\n";
> echo htmlentities(COM_makeClickableLinks($string));
> echo '</pre>';
>
> This produces:
>
> normal link <a href="http://www.url.com">www.url.com</a> PASS
> link with &nbsp; and quotes "<a
> href="http://www.url.com">www.url.com</a>&nbsp;" PASS
> complicated link "<a
> href="http://sub.url.com/folder/index.php?id=foo&amp;user=bar">sub.url.com/folder/index.php?id=foo&amp;user=bar</a>&nbsp;"
> PASS
> problem link "<a href="http://url.com/word">url.com/word</a>s&nbsp;" FAIL
>
> As you can see, the first three work. The problem occurs when a URL ends
> with any of the characters '&', 'n', 'b', 's', 'p' or ';'.
>
> So www.url.com/ps would return <a href="http://url.com/">url.com/</a>ps
>
> This is due to the last bit of the regex, "(?<![&nbsp;])": the square
> brackets make it a character class, so it rejects any one of those
> characters rather than the string "&nbsp;". I tried just doing
> (?<!&nbsp;), but it does not work at all because the preceding part of
> the expression is too greedy. There is also an issue with the www.
> being removed, but that's not too much of a problem at the moment.
>
> Also, the COM_makeClickableLinks function can be simplified by removing
> the str_replace statement, resulting in simply this:
>
> function COM_makeClickableLinks( $text )
> {
>     $text = preg_replace(
>         '/([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])/is',
>         '\\1<a href="http://\\6">\\6</a>', $text );
>     return $text;
> }
>
> In the original regex I was unsure why the "(\/|[+0-9a-z])" part was
> included. I don't think it's necessary so I took it out; maybe there was
> a particular case that required it which I'm overlooking.
>
> Anyhow, I will have another crack at it later on. It really is a tough
> one, but this is as far as I've got so far.
>
> Sami
>
> 2008/7/28 Michael Jervis <mjervis at gmail.com>:
>> All (especially Sami!),
>>
>> There is a bug in the subject function. If it finds
>> "http://www.url.com" we end up with &nbsp<a
>> href=";http://www.url.com&nbsp">;http://www.url.com&nbsp</a>;
>>
>> Which isn't good.
>>
>> The original regexp in COM_MakeClickableLinks is:
>>
>> /([^"]?)((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>>
>> I think the first match ([^"]?) is spurious, it matches anything other
>> than " before a link. So bhttp://www.foo.com" matches, but
>> "http://www.foo.com doesn't.
>>
>> So that gives:
>> /((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>>
>> Resulting in:
>> &nbsp;<a href="http:///www.url.com&nbsp">http://www.url.com&nbsp</a>
>>
>> So, need to add an "ignore trailing &nbsp;" bit to the clause. Closest
>> I can get is:
>> ((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))(?=&nbsp;)
>>
>> Which results in:
>> &nbsp;<a href="http:///www.url.com">http://www.url.com</a>&nbsp;
>>
>> However, unless there were quotes round the link, it won't match! So
>> "http://www.foo.com" matches and is correctly processed, but
>> http://www.foo.com is not matched.
>>
>> My head is now hurt. Any suggestions?
>>
>> --
>> Michael Jervis
>> mjervis at gmail.com
>> 504B03041400000008008F846431E3543A820800000006000000060000007765
>> 62676F642B4F4D4ACF4F0100504B010214001400000008008F846431E3543A82
>> 0800000006000000060000000000000000002000000000000000776562676F64
>> 504B05060000000001000100340000002C0000000000



