[geeklog-users] fix in COM_isemail

Bob Apthorpe apthorpe+geeklog at cynistar.net
Tue Dec 9 16:43:42 EST 2003


Hi,

On Tue, 9 Dec 2003, Lucas Gonze wrote:

> I added the '+' character to the list of allowed chars on the left side
> of an email address.  This permits email addresses like
> joe+geeklog at site.com, which some people use to flag the source of spam.
>
> New code:
> function COM_isemail( $email )
> {
>      // was:
>      // if( eregi( "^([-_0-9a-z])+([-._0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*.[a-z]{2,3}$",
>      //            $email, $check ))
>      if( eregi( "^([-_0-9a-z+])+([-._0-9a-z+])*@[0-9a-z+]([-.]?[0-9a-z])*.[a-z]{2,3}$",
>                 $email, $check ))
>      {
>          return TRUE;
>      }
>      else
>      {
>          return FALSE;
>      }
> }

You can reduce your regex to:

"^[-_0-9a-z+]+[-._0-9a-z+]*@[0-9a-z+]([-.]?[0-9a-z])*.[a-z]{2,3}$",

I've stripped out some of the unnecessary parens. I'm not sure you want
to accept addresses like +++++++ at example.com, though they are
technically legal.

Here's what I'm using:

    if( eregi(
"^[-_0-9a-z][-_.0-9a-z]*\\+?[-_.0-9a-z]*@[0-9a-z]([-.]?[0-9a-z])*\\.[a-z]{2,6}$",
$email, $check))

Breaking it down, here's what it does:

"^[-_0-9a-z]		# starts with one of [-_0-9a-z]
[-_.0-9a-z]*		# followed by 0 or more of [-_.0-9a-z]
\\+?			# then 0 or 1 '+'
[-_.0-9a-z]*		# then 0 or more of [-_.0-9a-z]
@[0-9a-z]		# then '@' followed by one of [0-9a-z]
([-.]?[0-9a-z])*	# then 0 or more of ( 0 or 1 of [-.]
			# and 1 of [0-9a-z])
\\.			# a literal '.'
[a-z]{2,6}$"		# terminated with 2-6 of [a-z]
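
The commented breakdown above can live in the code itself. This is a minimal sketch, not the code I'm running: eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so it assumes preg_match() from the PCRE extension instead. The x flag lets the pattern carry whitespace and # comments, i makes it case-insensitive like eregi(), and D (an addition of this sketch) stops $ from matching before a trailing newline. The function name is hypothetical.

```php
<?php
// Hypothetical helper; the name and the move from eregi() to preg_match()
// are assumptions of this sketch, not part of the original code.
function is_plausible_email($email)
{
    $pattern = '/
        ^[-_0-9a-z]          # starts with one of [-_0-9a-z]
        [-_.0-9a-z]*         # followed by 0 or more of [-_.0-9a-z]
        \+?                  # then 0 or 1 literal plus sign
        [-_.0-9a-z]*         # then 0 or more of [-_.0-9a-z]
        @[0-9a-z]            # then @ followed by one of [0-9a-z]
        (?:[-.]?[0-9a-z])*   # then 0 or more of (optional - or . and one of [0-9a-z])
        \.                   # a literal dot
        [a-z]{2,6}$          # terminated with 2-6 letters
    /xiD';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $email) === 1;
}

var_dump(is_plausible_email('joe+geeklog@site.com'));
```

Note that the comments inside the pattern must not contain the delimiter character (here a forward slash) or a single quote, since either would end the pattern or the string early.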

The major differences: I only allow one '+' (technically you can have
more, but most people only use one). The [something][otherthing]* pattern
is better behaved than [something]+[otherthing]* because the regex engine
doesn't backtrack as much; this is an efficiency tweak that matters when
[something] and [otherthing] are very similar. The '.' before the TLD is
escaped, so it matches only a literal dot rather than any character. And
the TLD is 2-6 characters rather than 2-3, to take in the newer, longer
TLDs .aero, .coop, and .museum (not that these matter much in practice,
but they are legal).
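
To see where the two patterns disagree, it helps to run a few sample addresses through both. The sketch below assumes preg_match() rather than eregi() (removed in PHP 7.0); both patterns are copied from this thread and wrapped in PCRE delimiters with the i flag. The helper name, variable names, and sample addresses are made up for illustration.

```php
<?php
// Patterns copied from the thread; names here are hypothetical.
$earlier = '/^[-_0-9a-z+]+[-._0-9a-z+]*@[0-9a-z+]([-.]?[0-9a-z])*.[a-z]{2,3}$/i';
$revised = '/^[-_0-9a-z][-_.0-9a-z]*\+?[-_.0-9a-z]*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]{2,6}$/i';

// True when $email matches $pattern.
function accepts($pattern, $email)
{
    return preg_match($pattern, $email) === 1;
}

$samples = array(
    'joe+geeklog@site.com',  // single plus tag
    'a+b+c@site.com',        // more than one plus
    '+++++++@example.com',   // local part made only of plus signs
    'joe@sitecom',           // no dot before the TLD
    'curator@site.museum',   // six-letter TLD
);

// Print one line per sample showing what each pattern decides.
foreach ($samples as $email) {
    printf("%-24s earlier=%-3s revised=%s\n",
        $email,
        accepts($earlier, $email) ? 'yes' : 'no',
        accepts($revised, $email) ? 'yes' : 'no');
}
```

The earlier pattern accepts joe@sitecom because its unescaped '.' matches any character. The same accident lets it accept the .museum address, since the '.' can land on a letter inside the TLD.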

I still haven't found the time to implement a more robust email address
validator in PEAR's Mail module. Brave people should look at
http://www.faqs.org/rfcs/rfc822.html to see the pain involved in parsing
email addresses.
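
For anyone reading this on a later PHP: versions 5.2 and up ship filter_var() with FILTER_VALIDATE_EMAIL, which checks an address against roughly the RFC 822 addr-spec syntax, with some documented exceptions. This is not the PEAR validator mentioned above, only a sketch of the built-in alternative; the wrapper name is hypothetical.

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter (PHP 5.2+).
function looks_like_email($email)
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(looks_like_email('joe+geeklog@site.com'));
var_dump(looks_like_email('not an address'));
```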

hth,

-- 
Bob Apthorpe


