Perl Read File Into Memory for Parsing

Perl script running slow with big files


Hi all,

I've written a perl script which compares two vcf files and writes the output to a third file. It runs fine with small files, but when I input big files (in GBs), my system becomes slow and it hangs.

I'm using the dbSNP vcf file as input, which is around 9GB.

Please suggest something that will make my perl script run faster. This is my first perl script and I'm new to perl.

Any help would be appreciated !!

Thank you !!!!

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use List::MoreUtils;

    open (FILE, "<", "dbSNP_in.vcf") or die "failed to open: $!\n";
    my @array = (<FILE>);
    my @CHR;
    my @location;
    my @rs;
    my @ref_n;
    my @alt_n;

    foreach (@array) {
        chomp;
        my ($chrom, $pos, $id, $ref, $alt, $qual, $filter, $info) = split(/\t/, $_);
        push @CHR, $chrom;
        push @location, $pos;
        push @rs, $id;
        push @ref_n, $ref;
        push @alt_n, $alt;
    }

    open (FILE1, "<", "trial_rs.vcf") or die "failed to open: $!\n";
    my @array1 = (<FILE1>);

    open (OUT, ">trial_output.vcf");
    my @columns;

    foreach (@array1) {
        chomp;
        @columns = split(/\t/, $_);
        my $i;
        for ($i = 0; $i < @array; $i++) {
            if (($columns[0] eq $CHR[$i]) and ($columns[1] eq $location[$i])
                and ($columns[3] eq $ref_n[$i]) and ($columns[4] eq $alt_n[$i])) {
                $columns[2] = $rs[$i];
            }
        }
        print OUT join("\t", @columns), "\n";
    }



The problem is this:

    my @array = (<FILE>); ## slurp a whole Terabyte into RAM

Do not read a file like this unless it is very small or your memory is huge; instead use:

    while (<FILE>) { # read each line, then forget about it
        chomp;
        split /\t/;
        ....
    }

In addition, your code is extremely inefficient:

You are iterating over all entries of a huge array for each line of a huge file. Hashes in Perl provide constant-time access to elements given the hash key. Use a hash of hashes to store your filter table.

You seem to filter a large file by a small file, therefore:

  • process the small file first (using the while construct), parsing the small file into a hash that uses the location as the key of a nested hash
  • process the large file as above and look up each of its entries in the hash created before

As a result you will only need as much memory as is required to store the small file. A minimal sketch of this approach follows.
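The sketch below is an illustration of that strategy, not the answerer's exact code. It reuses the file names from the script above and assumes plain tab-separated VCF records, with one record per CHROM/POS in the small file:

    use strict;
    use warnings;
    use autodie;

    # Pass 1: the small file fits in memory; keep each record and index it in
    # a hash of hashes keyed by CHROM, then POS.
    my ( @records, %wanted );
    open my $small, "<", "trial_rs.vcf";
    while (<$small>) {
        chomp;
        my @columns = split /\t/;
        push @records, \@columns;
        next if /^#/;    # keep header lines for output, but do not index them
        $wanted{ $columns[0] }{ $columns[1] } = \@columns;
    }
    close $small;

    # Pass 2: stream the big dbSNP file line by line; a constant-time hash
    # lookup replaces the inner loop over every dbSNP entry.
    open my $big, "<", "dbSNP_in.vcf";
    while (<$big>) {
        next if /^#/;
        my ( $chrom, $pos, $id, $ref, $alt ) = split /\t/, $_, 6;
        my $rec = $wanted{$chrom}{$pos} or next;
        $rec->[2] = $id if $rec->[3] eq $ref && $rec->[4] eq $alt;
    }
    close $big;

    # Write the (possibly updated) records back out.
    open my $out, ">", "trial_output.vcf";
    print {$out} join( "\t", @$_ ), "\n" for @records;
    close $out;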

Consider the following refactoring:

    use strict;
    use warnings;
    use autodie;

    my ( @array, %hash );

    open my $FILE, "<", "dbSNP_in.vcf";

    while (<$FILE>) {
        $hash{"@array[0, 1, 3, 4]"} = $array[2] if @array = split /\t/, $_, 6;
    }

    close $FILE;

    open my $OUT,   ">", "trial_output.vcf";
    open my $FILE1, "<", "trial_rs.vcf";

    while (<$FILE1>) {
        my @columns = split /\t/;

        $columns[2] = $hash{"@columns[0, 1, 3, 4]"}
          if exists $hash{"@columns[0, 1, 3, 4]"};

        print $OUT join "\t", @columns;
    }

    close $OUT;
    close $FILE1;
  • autodie is used to trap all I/O errors.
  • chomp is not needed within either loop.
  • Splitting only the amount needed is faster than splitting it all.
  • An interpolated array slice (@array[0, 1, 3, 4], which produces a string comprised of elements 0, 1, 3, 4) is used as a hash key with a corresponding value of array element 2--for use later.
  • An interpolated array slice, as a hash key, is tested for existence, and $columns[2] is set to that key's corresponding value if that key exists. Doing this, instead of using four eq statements, should speed up the file processing (a tiny illustration of such a key follows).
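For illustration only (the field values below are made up): with Perl's default list separator $" of a single space, the interpolated slice flattens the four matched fields into one string that serves as the hash key.

    my @columns = ( "chr1", "10177", ".", "A", "AC" );   # hypothetical VCF fields
    my $key = "@columns[0, 1, 3, 4]";                    # interpolated array slice
    print "$key\n";                                      # prints: chr1 10177 A AC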

Hope this helps!

I'd suggest using Tabix to index the larger file and use the tabix perl module to query it for each locus in the smaller VCF file. That's likely the optimal solution in terms of time and memory; a rough sketch follows below the link.

http://samtools.sourceforge.net/tabix.shtml
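The sketch below assumes the Tabix.pm bindings that ship with the tabix source tree (method names new/query/read as in its synopsis; check the module actually installed on your system), and that the big file has first been compressed with bgzip and indexed with tabix -p vcf:

    use strict;
    use warnings;
    use Tabix;    # assumption: the Perl bindings from the tabix distribution are installed

    # One-time preparation on the command line:
    #   bgzip dbSNP_in.vcf
    #   tabix -p vcf dbSNP_in.vcf.gz

    my $tabix = Tabix->new( -data => "dbSNP_in.vcf.gz" );

    open my $in,  "<", "trial_rs.vcf"     or die "failed to open: $!\n";
    open my $out, ">", "trial_output.vcf" or die "failed to open: $!\n";

    while (<$in>) {
        chomp;
        if (/^#/) {                      # pass header lines straight through
            print {$out} $_, "\n";
            next;
        }
        my @columns = split /\t/;

        # Query only the single position of this record (0-based start assumed).
        my $iter = $tabix->query( $columns[0], $columns[1] - 1, $columns[1] );
        while ( my $hit = $tabix->read($iter) ) {
            my @db = split /\t/, $hit;
            $columns[2] = $db[2]
                if $db[1] == $columns[1]
                && $db[3] eq $columns[3]
                && $db[4] eq $columns[4];
        }
        print {$out} join( "\t", @columns ), "\n";
    }

    close $in;
    close $out;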

There are better answers that address your specific issue, but they do not answer the general problem: what do you do if your Perl script is slow?

You profile your code. NYTProf + kcachegrind are good tools for figuring out which parts of your code need attention:

                    perl -d:NYTProf testme.pl;  nytprofcg;  kcachegrind nytprof.callgrind                                      

When you cannot make that faster, see if you can somehow parallelize the code. Can you split your input into chunks, compute each chunk individually and merge the output when you are done? This technique is called map-reduce.

For smaller jobs (requiring fewer than 1000 CPUs) GNU Parallel can often help with chunking the input and running the jobs. You just need to merge the final output.

If you still need more speed, you will need to look at a compiled language such as C++. Again you will use the profiler to figure out which parts of your code need the speedup, and instead of rewriting everything in C++ you can address just the small part of the code that is the bottleneck. The good part about this is that much of the error handling and corner cases can often be left to Perl.
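As a toy illustration of that last point, here is a sketch using Inline::C (an assumption on my part; the answer above mentions C++, and the hot spot below is made up rather than taken from the OP's script):

    use strict;
    use warnings;

    # The tight inner loop lives in C; Perl keeps the I/O and error handling.
    use Inline C => <<'END_C';
    long sum_to(long n) {
        /* stand-in for a real numeric hot spot */
        long i, s = 0;
        for (i = 1; i <= n; i++) s += i;
        return s;
    }
    END_C

    print sum_to(10000), "\n";    # prints 50005000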



Source: https://www.biostars.org/p/106737/
