[geeklog-devel] COM_makeClickableLinks
Sami Barakat
furiousdog at gmail.com
Wed Jul 30 09:17:09 EDT 2008
Hi,
I think I've got it now, although it's not a complete solution:
function COM_makeClickableLinks( $text )
{
$text = preg_replace(
'/([^"]?)(((ht|f)tps?):(\/\/)|(www\.))+((?=([^\s]+)&nbsp;))?(\8|[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)/is',
'\\1<a href="http://\\6\\9">\\6\\9</a>', $text );
return $text;
}
It seems to work well with the following strings:
normal link http://www.url.com
normal link with early quote http://www.url.com/folder"stuff
link with &nbsp; and quotes "http://www.url.com&nbsp;"
www.url.com/ps
complicated link www.sub.url.com/folder/index.php?id=foo&user=bar
It still fails, however, on these strings:
link with two &nbsp; www.url.com/ps&nbsp;&nbsp;
link with early quote and &nbsp; "http://www.url.com/folder"stuff&nbsp;
The results for the two failed strings are:
link with two <a
href="http://www.url.com/ps&nbsp;">www.url.com/ps&nbsp;</a>
link with early quote and "<a
href="http://www.url.com/folder"stuff">www.url.com/folder"stuff</a>
The second string could probably be fixed by replacing this part of
the regular expression '[^\s]+' with this
'[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+'
But really, regular expressions are most helpful for validating
strings or finding substrings in complicated strings; they are
not really made to exclude parts of a string. So it might be more
effective and less complicated to run through the expression twice:
the first time matching URLs with &nbsp; on the end, and the second
time without.
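
A related idea, as a rough sketch only: instead of two preg_replace passes, match greedily with the same character class and then move any trailing &nbsp; out of the link in a callback. This is not the existing Geeklog code. The name linkify_sketch is made up for illustration, the entity being stripped is assumed to be &nbsp;, and closures need PHP 5.3 or later. It also drops the ([^"]?) guard, so it will re-link URLs that are already inside an href.

```php
<?php
// Hypothetical helper, not part of Geeklog: link URLs in a single pass and
// peel trailing "&nbsp;" entities off each match inside a callback.
function linkify_sketch($text)
{
    // Group 1: scheme or "www." prefix. Group 2: the rest of the URL,
    // using the same character class as the regex in this thread.
    $pattern = '/((?:ht|f)tps?:\/\/|www\.)([a-z0-9%&_\-+,;=:@~#\/.?\[\]]+)/i';

    return preg_replace_callback($pattern, function ($m) {
        $rest = $m[2];
        $tail = '';
        // Move every trailing "&nbsp;" from the URL to the text after it.
        while (substr($rest, -6) === '&nbsp;') {
            $rest = substr($rest, 0, -6);
            $tail .= '&nbsp;';
        }
        // Keep an explicit scheme as is; assume http:// for bare www. links.
        $prefix = (strcasecmp($m[1], 'www.') === 0) ? 'http://' : '';
        $label = $m[1] . $rest;
        return '<a href="' . $prefix . $label . '">' . $label . '</a>' . $tail;
    }, $text);
}
```

With this, linkify_sketch('www.url.com/ps&nbsp;&nbsp;') should return <a href="http://www.url.com/ps">www.url.com/ps</a>&nbsp;&nbsp; and the www. prefix stays in the link text.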
Hope this helps
Sami
2008/7/29 Sami Barakat <furiousdog at gmail.com>:
> Hey,
>
> I have tried looking into this and I have come up with a partial
> solution. From my understanding, the problem is that when a URL has a
> &nbsp; at the end, it gets parsed along with the URL. I ask because I
> think Gmail has filtered out some of them. Anyway, the following regex
>
> ([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])
>
> Seems to work fairly well. Here is the test code that I am using.
>
> echo '<pre>';
> $string = "normal link http://www.url.com PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "link with &nbsp; and quotes \"http://www.url.com&nbsp;\" PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "complicated link
> \"www.sub.url.com/folder/index.php?id=foo&user=bar&nbsp;\"
> PASS\n";
> echo htmlentities(COM_makeClickableLinks($string));
> $string = "problem link \"www.url.com/words&nbsp;\" FAIL\n";
> echo htmlentities(COM_makeClickableLinks($string));
> echo '</pre>';
>
> This produces
>
> normal link <a href="http://www.url.com">www.url.com</a> PASS
> link with &nbsp; and quotes "<a
> href="http://www.url.com">www.url.com</a>&nbsp;" PASS
> complicated link "<a
> href="http://sub.url.com/folder/index.php?id=foo&user=bar">sub.url.com/folder/index.php?id=foo&user=bar</a>&nbsp;"
> PASS
> problem link "<a href="http://url.com/word">url.com/word</a>s&nbsp;" FAIL
>
> As you can see, the first 3 work. The problem occurs when a URL ends
> with any of the characters '&', 'n', 'b', 's', 'p' or ';'.
>
> So www.url.com/ps would return <a href="http://url.com/">url.com/</a>ps
>
> This is due to the last bit of the regex, "(?<![&nbsp;])", which is a
> character class and so rejects any one of those characters. I tried
> just doing (?<!&nbsp;), but it does not work at all because the
> previous statement is being too greedy. There is also an issue with
> the www. being removed, but that's not too much of a problem at the
> moment.
>
> Also, the COM_makeClickableLinks function can be simplified by removing
> the str_replace statement, resulting in simply this:
>
> function COM_makeClickableLinks( $text )
> {
> $text = preg_replace(
> '/([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])/is',
> '\\1<a href="http://\\6">\\6</a>', $text );
> return $text;
> }
>
>
> In the original regex I was unsure why the "(\/|[+0-9a-z])" part was
> included. I don't think it's necessary so I took it out; maybe there was
> a particular case that required it which I'm overlooking.
>
> Anyhow, I will have another crack at it later on. It really is a tough
> one, but this is as far as I've got so far.
>
> Sami
>
> 2008/7/28 Michael Jervis <mjervis at gmail.com>:
>> All (especially Sami!),
>>
>> There is a bug in the subject function. If it finds
>> "http://www.url.com&nbsp;" we end up with <a
>> href="http://www.url.com&nbsp;">http://www.url.com&nbsp;</a>
>>
>> Which isn't good.
>>
>> The original regexp in COM_MakeClickableLinks is:
>>
>> /([^"]?)((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>>
>> I think the first match ([^"]?) is spurious, it matches anything other
>> than " before a link. So bhttp://www.foo.com" matches, but
>> "http://www.foo.com doesn't.
>>
>> So that gives:
>> /((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>>
>> Resulting in:
>> <a href="http:///www.url.com&nbsp;">http://www.url.com&nbsp;</a>
>>
>> So, need to add an "ignore trailing &nbsp;" bit to the clause. Closest
>> I can get is:
>> ((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))(?=&nbsp;)
>>
>> Which results in:
>> <a href="http:///www.url.com">http://www.url.com</a>
>>
>> However, unless there were quotes round the link, it won't match! So
>> "http://www.foo.com" matches and is correctly processed, but
>> http://www.foo.com is not matched.
>>
>> My head is now hurt. Any suggestions?
>>
>> --
>> Michael Jervis
>> mjervis at gmail.com
>> 504B03041400000008008F846431E3543A820800000006000000060000007765
>> 62676F642B4F4D4ACF4F0100504B010214001400000008008F846431E3543A82
>> 0800000006000000060000000000000000002000000000000000776562676F64
>> 504B05060000000001000100340000002C0000000000
>> _______________________________________________
>> geeklog-devel mailing list
>> geeklog-devel at lists.geeklog.net
>> http://eight.pairlist.net/mailman/listinfo/geeklog-devel
>>
>