Re: diff says memory exhausted need help with perl

Subject: Re: diff says memory exhausted need help with perl
From: John W. Krahn
Date: Fri, 28 Mar 2008 17:03:54 -0700
Newsgroups: perl.beginners


tc314@xxxxxxxxxxx wrote:
I've got two similar large files with one word per line and they're
sorted.
Each file has a few words not in the other.
I typically identify the unique words in the file using diff, grep and cut.
When the files are too big (2 GB), diff dies with "memory exhausted".

I want to search for the unique words in file1 but I might need to
ping-pong since neither file is a superset of the other.
I don't want to be limited by physical RAM as the file sizes exceed
RAM.

I assume I'm not the first to have this problem.
Can someone point me to perl code?

This appears to do what you require. It reads both files one line at a
time, so memory use stays constant regardless of file size:

#!/usr/bin/perl
use warnings;
use strict;


my ( $file1, $file2 ) = ( 'file1', 'file2' );

open my $F1, '<', $file1 or die "Cannot open '$file1' $!";
open my $F2, '<', $file2 or die "Cannot open '$file2' $!";


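# Prime the merge with the first line of each file. <$fh> returns undef at
# end of file, which is used here instead of a sentinel string.
my $first  = <$F1>;
my $second = <$F2>;

# Loop until both files are used up, so the final lines are compared too.
while ( defined $first or defined $second ) {
    if ( !defined $second or ( defined $first and $first lt $second ) ) {
        # $first sorts before $second, or file2 has run out: only in file1
        print "$file1: $first";
        $first = <$F1>;
        }
    elsif ( !defined $first or $first gt $second ) {
        # $second sorts before $first, or file1 has run out: only in file2
        print "$file2: $second";
        $second = <$F2>;
        }
    else {
        # same word in both files: skip it and advance both
        $first  = <$F1>;
        $second = <$F2>;
        }
    }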

__END__

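The merge above gives correct results only if both files are sorted in the
order that Perl's lt and gt compare strings, which without "use locale" is
plain byte order (the order sort produces under LC_ALL=C). As a minimal
sketch, assuming a hypothetical helper named first_unsorted_line that is not
part of the original post, a file could be checked in one streaming pass
before the comparison is run:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not from the original post). Returns 0 if the lines
# read from $fh are in non-decreasing byte order, the order used by lt/gt;
# otherwise returns the number of the first line that sorts before its
# predecessor. Only two lines are held in memory at any time.
sub first_unsorted_line {
    my ($fh) = @_;
    my $prev = <$fh>;
    return 0 unless defined $prev;    # an empty file counts as sorted
    my $n = 1;
    while ( defined( my $line = <$fh> ) ) {
        $n++;
        return $n if $line lt $prev;
        $prev = $line;
    }
    return 0;
}

# Small in-memory demonstration; for real data, open the file by name.
my $data = "apple\nbanana\nCherry\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my $bad = first_unsorted_line($in);
print $bad ? "line $bad is out of byte order\n" : "sorted in byte order\n";
```

In the sample data, "Cherry" follows "banana" as it would in a
case-insensitive sort, but an uppercase "C" has a lower byte value than a
lowercase "b", so line 3 is reported. A file that fails this check can be
re-sorted in byte order before it is compared.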


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
