activeperl@listserv.activestate.com

Re: ActivePerl Digest, Vol 27, Issue 49

Subject: Re: ActivePerl Digest, Vol 27, Issue 49
From:
Date: Wed, 26 Apr 2006 15:07:37 -0400
Deane,

My response is embedded below.

activeperl-bounces@xxxxxxxxxxxxxxxxxxxxxxxx wrote on 04/26/2006 12:32:01 PM:
> Today's Topics:
>    3. Got this far with regex, now I'm stumped
>       (Deane.Rothenmaier@xxxxxxxxxxxxx)
> ----------------------------------------------------------------------
> ------------------------------
> 
> Message: 3
> Date: Wed, 26 Apr 2006 09:08:13 -0500
> From: Deane.Rothenmaier@xxxxxxxxxxxxx
> Subject: Got this far with regex, now I'm stumped
> To: activeperl@xxxxxxxxxxxxxxxxxxxxxxxx
> Message-ID: <OFF7B2E756.F74863B7-ON8625715C.004BB2BE-8625715C.004DAC50@xxxxxxxxxxxxx>
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> Hi, all.
> 
> I have a sub that uses a set of URL-parsing regexes that almost works:
It took me a bit to spot the mistake, but I see it now.

> 
> if ($url =~ m{^(.*)\.([^\.]+\...\...)$}) {
>    $domain = $2; 
>    $child = $1;
> }
> else {
>    if ($url =~ /^[^\.]+?\.\w{2,4}$/) {
>       $domain = $url;   # "www.xx.yy" should have ended up here. . .

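Your test URL has the shape \w{3}\.\w{2}\.\w{2}, which contains two dots.
Your pattern is ^[^\.]+?\.\w{2,4}$, and [^\.] cannot match a dot, so it
only accepts strings with exactly one dot.
If the URL had the shape \w{3}\.\w{2} (for example "www.xx"), it would
have matched.
Dropping the lazy ? does not help: ^[^\.]+\.\w{2,4}$ accepts exactly the
same strings. Loosening the class, as in ^.+\.\w{2,4}$, would match
"www.xx.yy", but it would also match everything the else branch below is
meant to handle.
Decide exactly which formats should land in this branch, then rework the
pattern so it matches only those.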
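
To see this concretely, here is a small sketch that runs the pattern from
the second if statement against a few URLs, both with and without the
lazy ?, next to a loosened variant that lets the leading part contain
dots. The names check_url and %pattern, the "loose" variant, and the test
URLs other than "www.xx.yy" are my own illustrations, not part of the
original sub.

```perl
use strict;
use warnings;

# Patterns under discussion. "loose" is a hypothetical variant that
# lets the leading part contain dots; it is not from the original sub.
my %pattern = (
    lazy   => qr/^[^\.]+?\.\w{2,4}$/,
    greedy => qr/^[^\.]+\.\w{2,4}$/,
    loose  => qr/^.+\.\w{2,4}$/,
);

# Returns a hash ref mapping each pattern name to 1 (match) or 0.
sub check_url {
    my ($url) = @_;
    my %result;
    for my $name (keys %pattern) {
        $result{$name} = ($url =~ $pattern{$name}) ? 1 : 0;
    }
    return \%result;
}

for my $url ('www.xx', 'www.xx.yy', 'www.example.com') {
    my $r = check_url($url);
    printf "%-16s lazy=%d greedy=%d loose=%d\n",
        $url, $r->{lazy}, $r->{greedy}, $r->{loose};
}
```

The lazy and greedy forms agree on every input, because laziness changes
how a match is found, not whether an anchored pattern can match. Only the
loose form accepts the two-dot URLs, and it accepts "www.example.com" too.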

>    }
>    else {
>       $url =~ m{^(.*)\.(.+\.\w{2,4}).*$};   # . . . but it ended up here
>       $domain = $2;
>       $child = $1;
>    }
> }
> 
> It catches almost all the URL formats it needs to, like "www.defgh.xx.yy",
> but it misses one possible format, "www.xx.yy". For this URL the sub that
> uses the regex returns "xx.yy" as the domain and "www" as the child, which
> means that there's still something not quite right with the regex in the
> second if statement. The sub should've returned "www.xx.yy" as the domain,
> with no child. See the comments in the code sample for where that URL
> landed, vs. where it should've landed.
> 
> I've ordered "Mastering Regular Expressions" but it hasn't arrived yet, so
> any help would be appreciated.
> 
> Thanks,
> 
> Deane
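
One possible rework, as a sketch only: add an explicit test for the bare
three-label form ahead of the patterns that split off a child. The sub
name split_url, the empty-string child, and the rule that a three-label
URL ending in two two-character labels is all domain are my assumptions
about what you want. The other three patterns are copied from your
message.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($domain, $child). $child is '' when
# the whole URL is the domain. Returns an empty list if nothing matches.
sub split_url {
    my ($url) = @_;

    # Assumed new rule: exactly three labels, the last two of them two
    # characters long ("www.xx.yy"), is all domain with no child.
    return ($url, '') if $url =~ /^[^.]+\.[^.]{2}\.[^.]{2}$/;

    # Original first pattern: something, a dot, then label.xx.yy.
    return ($2, $1) if $url =~ m{^(.*)\.([^\.]+\...\...)$};

    # Original second pattern: single-dot form, all domain.
    return ($url, '') if $url =~ /^[^\.]+?\.\w{2,4}$/;

    # Original fallback pattern, unchanged.
    return ($2, $1) if $url =~ m{^(.*)\.(.+\.\w{2,4}).*$};

    return;
}

for my $url ('www.xx.yy', 'www.defgh.xx.yy', 'www.example.com') {
    my ($domain, $child) = split_url($url);
    print "$url: domain=$domain child=$child\n";
}
```

With this ordering "www.xx.yy" comes back as the domain with an empty
child, while "www.defgh.xx.yy" still splits into "defgh.xx.yy" and "www".
Whether the three-label rule is too broad depends on your real data.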



