Bug 379425 - sys-apps/file-5.07-r3: big slow down on large ASCII file
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Gentoo's Team for Core System packages
URL: http://bugs.gw.com/view.php?id=164
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-16 15:51 UTC by Michael Kohn
Modified: 2011-12-31 22:48 UTC



Description Michael Kohn 2011-08-16 15:51:25 UTC
If I run "file" against a sample with many empty lines (>= 100,000), it seems to hang. Depending on the system, it runs for minutes to hours at 100% CPU load.

The regular expression for detecting AWK scripts in file-5.07/magic/Magdir/commands seems to be the cause.

Without this regex, "file" works as expected:

--- file-5.07/magic/Magdir/commands     2011-05-02 14:36:41.000000000 +0200
+++ file-5.07/magic/Magdir/commands.patched     2011-08-16 17:37:12.327729653 +0200
@@ -48,8 +48,8 @@
 0      string/wt       #!\ /bin/awk            awk script text executable
 !:mime text/x-awk
 0      string/wt       #!\ /usr/bin/awk        awk script text executable
-!:mime text/x-awk
-0      regex           =^\\s*BEGIN\\s*[{]      awk script text
+# !:mime       text/x-awk
+# 0    regex           =^\\s*BEGIN\\s*[{]      awk script text

 # AT&T Bell Labs' Plan 9 shell
 0      string/wt       #!\ /bin/rc     Plan 9 rc shell script text executable
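
The report does not say how the patched database was built. One possible way, sketched here as an assumption, is to compile the patched magic sources with `file -C`. The helper name `build_magic_db` is hypothetical. As far as I know, `file -C -m <source>` writes a `.mgc` file named after the source into the current directory, but this is worth checking against `man file` for the installed version.

```shell
#!/bin/bash
# Hypothetical helper: compile a magic source directory into a .mgc database.
# Usage: build_magic_db <magic-source-dir> <output.mgc>
build_magic_db() {
    local src out workdir
    # Resolve the source directory (e.g. file-5.07/magic/Magdir) to an absolute path.
    src=$(cd "$1" && pwd) || return 1
    out=$2
    workdir=$(mktemp -d) || return 1
    # -C compiles the sources named by -m. The compiled file lands in the
    # current directory, so run the command inside a scratch directory.
    ( cd "$workdir" && file -C -m "$src" ) || { rm -rf "$workdir"; return 1; }
    # Assumption: exactly one *.mgc file was produced.
    mv "$workdir"/*.mgc "$out"
    rm -rf "$workdir"
}
```

After applying the patch above, a call such as `build_magic_db file-5.07/magic/Magdir /tmp/magic.patched.mgc` would produce a database that can be passed to `file -m`.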


Test with 10,000 empty lines:

1. Create a test file with 10,000 empty lines for a quick test:
for i in $(seq 1 10000); do echo; done > /tmp/file_with_10000_empty_lines.txt
2. Run file with the original magic.mgc:
time file -m /usr/share/misc/magic.mgc /tmp/file_with_10000_empty_lines.txt

/tmp/file_with_10000_empty_lines.txt: ASCII text

real    0m2.243s
user    0m2.230s
sys     0m0.000s

3. Run file with the patched magic.mgc:

time file -m /usr/share/misc/magic.patched.mgc /tmp/file_with_10000_empty_lines.txt

/tmp/file_with_10000_empty_lines.txt: ASCII text

real    0m0.005s
user    0m0.000s
sys     0m0.000s
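
To see how the run time grows with input size, the test above can be repeated for several line counts. The following is a minimal sketch; the helper names `gen_empty_lines` and `scaling_test` are hypothetical, and the line counts are arbitrary. If the slowdown comes from the regex being retried from every line start (an assumption, not something confirmed in this report), doubling the line count should roughly quadruple the time.

```shell
#!/bin/bash
# Hypothetical helper: write N empty lines to a file.
# `yes ''` prints empty lines until `head` closes the pipe.
gen_empty_lines() {
    yes '' | head -n "$1" > "$2"
}

# Time `file` on inputs that double in size.
# Uses the system default magic database, which is assumed to still
# contain the awk regex entry.
scaling_test() {
    local tmpdir n
    tmpdir=$(mktemp -d) || return 1
    for n in 1000 2000 4000 8000; do
        gen_empty_lines "$n" "$tmpdir/empty_$n.txt"
        echo "== $n empty lines =="
        time file "$tmpdir/empty_$n.txt"
    done
    rm -rf "$tmpdir"
}
```

Running `scaling_test` prints one timing per input size. The sizes are kept small so that the whole run finishes quickly even with the unpatched database.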
Comment 1 SpanKY gentoo-dev 2011-12-31 22:48:43 UTC
presumably you're running this on a server.  you can mitigate the issue by running file in a LC_ALL=C locale rather than something like en_US.UTF8.
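
The mitigation from this comment can be checked by timing the same file under both locales. This is a sketch only; the helper names `ensure_sample` and `compare_locales` are hypothetical, and the sample path is the one from the description.

```shell
#!/bin/bash
# Hypothetical helper: create the sample file with 10,000 empty lines
# if it does not exist yet.
ensure_sample() {
    [ -f "$1" ] || yes '' | head -n 10000 > "$1"
}

# Time `file` once with the current locale and once with LC_ALL=C.
compare_locales() {
    local f=${1:-/tmp/file_with_10000_empty_lines.txt}
    ensure_sample "$f"
    echo "== current locale (LC_ALL=${LC_ALL:-unset}, LANG=${LANG:-unset}) =="
    time file "$f"
    echo "== LC_ALL=C =="
    # The assignment prefix sets the locale for this one command only.
    time LC_ALL=C file "$f"
}
```

Calling `compare_locales` with no argument uses the sample path from the description. Setting `LC_ALL=C` as a command prefix avoids changing the locale of the surrounding shell session.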