[geeklog-users] fix in COM_isemail

Bob Apthorpe apthorpe+geeklog at cynistar.net
Tue Dec 9 16:43:42 EST 2003


Hi,

On Tue, 9 Dec 2003, Lucas Gonze wrote:


> I added the '+' character to the list of allowed chars on the left side
> of an email address. This permits email addresses like
> joe+geeklog@site.com, which some people use to flag the source of spam.
>
> New code:
> function COM_isemail( $email )
> {
>     if( eregi(
>         "^([-_0-9a-z+])+([-._0-9a-z+])*@[0-9a-z+]([-.]?[0-9a-z])*.[a-z]{2,3}$",
>         $email, $check ))
>     // was:
>     // if( eregi(
>     //     "^([-_0-9a-z])+([-._0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*.[a-z]{2,3}$",
>     //     $email, $check ))
>     {
>         return TRUE;
>     }
>     else
>     {
>         return FALSE;
>     }
> }


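You can reduce your regex to:

"^[-_0-9a-z+]+[-._0-9a-z+]*@[0-9a-z]([-.]?[0-9a-z])*\\.[a-z]{2,3}$",

I've stripped out some of the unnecessary parens, escaped the '.' before
the TLD (unescaped, it matches any character, so joe@sitexcom would pass),
and dropped '+' from the first character of the domain, since '+' isn't
valid in hostnames. I'm not sure if you want to allow addresses of the
form +++++++@example.com, but that's technically allowed.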

Here's what I'm using:

if( eregi(
"^[-_0-9a-z][-_.0-9a-z]*\\+?[-_.0-9a-z]*@[0-9a-z]([-.]?[0-9a-z])*\\.[a-z]{2,6}$",
$email, $check))
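eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so on a current PHP the check above needs preg_match() instead. A minimal sketch of COM_isemail() ported that way, assuming PHP 7 or later. The pattern is unchanged apart from PCRE delimiters and escaping (in single quotes '\+' no longer needs doubling), plus the D modifier, added as an assumption of mine so that '$' does not also match just before a trailing newline.

```php
<?php
// Sketch: COM_isemail() ported from eregi() to preg_match().
// Assumes PHP 7+, where eregi() no longer exists.
function COM_isemail($email)
{
    // Same pattern as the eregi() version, in PCRE syntax:
    //   i  case-insensitive, like eregi()
    //   D  '$' matches only at the very end, not before a trailing "\n"
    $pattern = '/^[-_0-9a-z][-_.0-9a-z]*\+?[-_.0-9a-z]*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]{2,6}$/iD';

    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $email) === 1;
}
```

For example, COM_isemail('joe+geeklog@site.com') is true, while COM_isemail('a+b+c@example.com') is false because only one '+' is allowed.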

Breaking it down, here's what it does:

"^[-_0-9a-z] # starts with one of [-_0-9a-z]
[-_.0-9a-z]* # followed by 0 or more of [-_.0-9a-z]
\\+? # then 0 or 1 '+'
[-_.0-9a-z]* # then 0 or more of [-_.0-9a-z]
@[0-9a-z] # then '@' followed by one of [0-9a-z]
([-.]?[0-9a-z])* # then 0 or more of ( 0 or 1 of [-.]
# and 1 of [0-9a-z])
\\. # a literal '.'
[a-z]{2,6}$" # terminated with 2-6 of [a-z]
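The annotated breakdown can also live inside the regex itself. A sketch, assuming PCRE via preg_match(): the x modifier makes the engine ignore unescaped whitespace and treat '#' to end of line as a comment. is_email_verbose() is a hypothetical name used only for this illustration, and '~' is the delimiter, so it must not appear in the comments.

```php
<?php
// Sketch: the commented breakdown written as one extended (x) PCRE pattern.
// is_email_verbose() is a hypothetical helper name, not part of Geeklog.
function is_email_verbose($email)
{
    $pattern = <<<'REGEX'
~^[-_0-9a-z]        # starts with one of [-_0-9a-z]
  [-_.0-9a-z]*      # followed by 0 or more of [-_.0-9a-z]
  \+?               # then 0 or 1 '+'
  [-_.0-9a-z]*      # then 0 or more of [-_.0-9a-z]
  @[0-9a-z]         # then '@' followed by one of [0-9a-z]
  ([-.]?[0-9a-z])*  # then 0 or more of (0 or 1 of [-.] and 1 of [0-9a-z])
  \.                # a literal '.'
  [a-z]{2,6}$       # terminated with 2-6 of [a-z]
~ixD
REGEX;
    // i: case-insensitive like eregi(); x: whitespace and comments ignored;
    // D: '$' only matches at the very end of the string
    return preg_match($pattern, $email) === 1;
}
```

The closing REGEX; marker sits in column 0 so the nowdoc also works on PHP versions before 7.3.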

The major differences are that I only allow one '+' (technically you can
have more, but most people only use one); that the [something][otherthing]*
pattern is better behaved than [something]+[otherthing]*, because the regex
engine doesn't backtrack as much (this matters when [something] and
[otherthing] are very similar; it's an efficiency tweak); and that the TLD
is 2-6 characters rather than 2-3, to allow for the newer, longer TLDs
.aero, .coop, and .museum (not that these matter much in practice, but
they are legal).
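The differences are easy to check directly. A small sketch, assuming PHP with PCRE: both patterns are translated from eregi() to preg_match() (i for case-insensitivity, D so '$' anchors only at the very end), and the pattern from the original post is kept exactly as posted, unescaped '.' included.

```php
<?php
// Sketch: run the originally posted pattern and the revised one side by side.
// $original keeps the unescaped '.' before the TLD, as in the first post.
$original = '~^([-_0-9a-z+])+([-._0-9a-z+])*@[0-9a-z+]([-.]?[0-9a-z])*.[a-z]{2,3}$~iD';
$revised  = '~^[-_0-9a-z][-_.0-9a-z]*\+?[-_.0-9a-z]*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]{2,6}$~iD';

$inputs = [
    'joe+geeklog@site.com', // one '+': both accept
    'a+b+c@example.com',    // several '+': only the original accepts
    '+++@example.com',      // local part of only '+': only the original accepts
    'joe@sitexcom',         // no dot at all: the original's '.' matches the 'x'
];

foreach ($inputs as $email) {
    // preg_match() returns 1 for a match and 0 for no match
    printf("%-22s original=%d revised=%d\n",
        $email, preg_match($original, $email), preg_match($revised, $email));
}
```

Both patterns happen to accept curator@example.museum, but the original does so only because its unescaped '.' lets the domain group swallow the real dot, so its 2-3 letter TLD limit is not actually enforced.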

I still haven't found the time to implement a more robust email address
validator in PEAR's Mail module. Brave people should look at
http://www.faqs.org/rfcs/rfc822.html to see the pain involved in parsing
email addresses.

hth,

--
Bob Apthorpe


