kc@mail.pm.org
Subject: Re: [Kc] Perl Question: XML::Twig module
From: Daryl Fallin
Date: Mon, 28 Jun 2010 10:12:00 -0500
All -

Figured out the problem.  Sterling Hanenkamp got me going in the right direction.

Anyway... I used an abstract example to ask my question, so here is an explanation and my actual code.

I am working with the Qualys API, and I wanted to pull all scan data back from Qualys so that I can store it and mash it up against other data sources.

The DTD for the Qualys XML is:  https://qualysapi.qualys.com/scan-1.dtd  (This will give you the structure of the XML file.)

Here is the basic code that I ended up with.  It works on the XML file after it has been retrieved from Qualys.


*************************************************
#!/usr/bin/perl -w

# Indentation style: 4 spaces

use strict;
use XML::Twig;

# Handler shared by every element of interest.
# XML::Twig calls it with the twig object and the element it just finished parsing.
sub info {
    my ($twig, $info) = @_;
    my $elt = $info;
    my $tag = $elt->tag;

    # These elements sit three levels below the element that carries "value".
    if ($tag =~ m/^(?:VULN|SERVICE|INFO|PRACTICE)$/) {
        printf "VALUE: %s \n", $elt->parent->parent->parent->att("value");
        printf "ENT: %s \n", $tag;
    }

    # These elements are direct children of the element that carries "value".
    if ($tag =~ m/^(?:OS|NETBIOS_HOSTNAME)$/) {
        printf "VALUE: %s \n", $elt->parent->att("value");
        printf "ENT: %s \n", $tag;
        printf "%s\n", $elt->text;
    }

    # Walk the descendants of the handled element; next_elt returns undef
    # once it would leave the subtree rooted at $info.
    while ($elt = $elt->next_elt($info)) {
        my $localname = $elt->local_name;
        # Skip the text nodes themselves; their content is printed via the enclosing tag.
        if ($localname ne '#CDATA' && $localname ne '#PCDATA') {
            printf "%s: ", $localname;
            printf "%s\n", $elt->text;
        }
    }
    printf "\n\n";
}

#===================================================
# Main program section

my $twig = XML::Twig->new(
    twig_handlers => {
        SERVICE          => \&info,
        VULN             => \&info,
        OS               => \&info,
        NETBIOS_HOSTNAME => \&info,
        INFO             => \&info,
        PRACTICE         => \&info,
        HEADER           => \&info,
        # _all_          => \&info,    # not using _all_ to ignore the toplevel SCAN tag
    },
    error_context => 1,
);

# Parse the XML
$twig->parsefile('sample.xml');

******************************************************************
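
A possible refinement to the chained parent calls in the script above: the parent method in XML::Twig accepts an optional condition and then returns the nearest matching ancestor, so the lookup no longer depends on the exact nesting depth. This is only a sketch and has not been run against real Qualys output; the REPORT/HOST/GROUPS/GROUP/ITEM tags, the address, and the owner_value helper are all made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: tag names and values are placeholders, not Qualys output.
my $xml = <<'XML';
<REPORT>
  <HOST value="10.0.0.1">
    <GROUPS>
      <GROUP>
        <ITEM>first finding</ITEM>
      </GROUP>
    </GROUPS>
  </HOST>
</REPORT>
XML

# Hypothetical helper: return the "value" attribute of the nearest ancestor
# with the given tag, or undef when there is no such ancestor.
sub owner_value {
    my ($elt, $ancestor_tag) = @_;
    my $owner = $elt->parent($ancestor_tag);
    return $owner ? $owner->att('value') : undef;
}

my @found;
my $twig = XML::Twig->new(
    twig_handlers => {
        ITEM => sub {
            my ($t, $elt) = @_;
            # Ancestors are already in the tree while the handler runs.
            push @found, sprintf("%s: %s", owner_value($elt, 'HOST'), $elt->text);
        },
    },
);
$twig->parse($xml);
print "$_\n" for @found;
```

If the real document nests the elements differently than assumed here, only the tag passed to owner_value needs to change, not the number of parent calls.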


On Fri, Jun 25, 2010 at 7:31 PM, Daryl Fallin <darylvf@xxxxxxxxx> wrote:
Hi All ....

I have been trying to work with XML::Twig lately to parse an xml file.

I just want to dump every element/tag of the XML file.  But my while loop seems to be doing something weird, or it's the way that XML::Twig works, I'm not sure, but I get duplicate information from the original XML file.  It's like it is running part of the while loop twice.

I know there are other modules that I could use, but I am using XML::Twig for other parts of what will be a larger program and I want the chunking that XML::Twig allows.

Any help would be greatly appreciated.

Here is my sample code:

#!/usr/bin/perl -w

require XML::Twig;

sub info {
        my ($xml, $info) = @_;
        my $elt = $info;
        while ($elt= $elt->next_elt($info) )
        {
                $elt->set_remove_cdata(1);
                $elt->set_pretty_print("record");  # print one field per line
                printf "%s\n", $elt->sprint;
        }
}

$xml = new XML::Twig(
        TwigHandlers => {
                XML_DIZ_INFO       => \&info,
        }
);

# Parse the XML
$xml->parsefile('sample.xml');

************************

sample.xml
-----------------
<?xml version="1.0" ?>
<XML_DIZ_INFO>
        <MASTER_PAD_VERSION_INFO>
                <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
                <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
                <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
        </MASTER_PAD_VERSION_INFO>
        <Company_Info>
                <Company_Name>Moyea Software Co., Ltd.</Company_Name>
                <Country>China</Country>
                <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
                <Contact_Info>
                        <Author_First_Name>Bob</Author_First_Name>
                        <Author_Last_Name>King</Author_Last_Name>
                        <Author_Email>product@xxxxxxxxx</Author_Email>
                </Contact_Info>
        </Company_Info>
</XML_DIZ_INFO>

============================================
The following is the output I get.  After the closing </Company_Info> it should stop.
============================================

  <MASTER_PAD_VERSION_INFO>
    <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
    <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
    <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
  </MASTER_PAD_VERSION_INFO>

    <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
1.0

    <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
Master Editor here

    <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
information would go here

  <Company_Info>
    <Company_Name>Moyea Software Co., Ltd.</Company_Name>
    <Country>China</Country>
    <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
    <Contact_Info>
      <Author_First_Name>Bob</Author_First_Name>
      <Author_Last_Name>King</Author_Last_Name>
      <Author_Email>product@xxxxxxxxx</Author_Email>
    </Contact_Info>
  </Company_Info>

    <Company_Name>Moyea Software Co., Ltd.</Company_Name>
Moyea Software Co., Ltd.

    <Country>China</Country>
China

    <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
http://www.whatever.com

    <Contact_Info>
      <Author_First_Name>Bob</Author_First_Name>
      <Author_Last_Name>King</Author_Last_Name>
      <Author_Email>product@xxxxxxxxx</Author_Email>
    </Contact_Info>

      <Author_First_Name>Bob</Author_First_Name>
Bob

      <Author_Last_Name>King</Author_Last_Name>
King

      <Author_Email>product@xxxxxxxxx</Author_Email>
product@xxxxxxxxx
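
The repetition in the output above is consistent with how the two calls in the loop behave: sprint serializes an element together with everything beneath it, while next_elt goes on to visit each of those descendants in turn, including the #PCDATA text nodes. Every nested value is therefore printed once per enclosing level. A minimal sketch of one way around it, using a trimmed copy of sample.xml and a hypothetical helper named leaf_lines, prints only the elements that have no child elements:

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of sample.xml from this message.
my $xml = <<'XML';
<XML_DIZ_INFO>
  <MASTER_PAD_VERSION_INFO>
    <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
  </MASTER_PAD_VERSION_INFO>
  <Company_Info>
    <Country>China</Country>
    <Contact_Info>
      <Author_First_Name>Bob</Author_First_Name>
    </Contact_Info>
  </Company_Info>
</XML_DIZ_INFO>
XML

# Hypothetical helper: one "TAG: text" line per element without child elements.
sub leaf_lines {
    my ($root) = @_;
    my @lines;
    my $elt = $root;
    # next_elt returns undef once it would leave the subtree rooted at $root.
    while ($elt = $elt->next_elt($root)) {
        next unless $elt->is_elt;                    # skip #PCDATA and #CDATA nodes
        next if grep { $_->is_elt } $elt->children;  # skip container elements
        push @lines, sprintf("%s: %s", $elt->tag, $elt->text);
    }
    return @lines;
}

my $twig = XML::Twig->new;
$twig->parse($xml);
print "$_\n" for leaf_lines($twig->root);
```

Container elements such as Company_Info are skipped entirely here; if their tags should still appear as headings, they would need to be printed separately without calling sprint or text on them.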



_______________________________________________
kc mailing list
kc@xxxxxx
http://mail.pm.org/mailman/listinfo/kc