
Bug 301026

Summary: app-portage/gentoolkit-0.3.0_rc7 equery s counts hardlinks as separate files
Product: Gentoo Linux
Reporter: A.C.Heron <acheron>
Component: Current packages
Assignee: Portage Tools Team <tools-portage>
Status: RESOLVED FIXED    
Severity: normal    
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 237964    

Description A.C.Heron 2010-01-14 20:45:13 UTC
When calculating the package size, equery s from app-portage/gentoolkit-0.3.0_rc7 counts each file as many times as there are hardlinks to it.


Reproducible: Always

Steps to Reproduce:
equery s dev-util/git

Actual Results:  
416.10 MiB for dev-util/git-1.6.4.4


Expected Results:  
39.33 MiB

/usr/libexec/git-core/git has 97 hardlinks, which adds an extra
96*4115362 bytes (376.77 MiB) to the total.
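The reporter's arithmetic can be checked with a short Python sketch. It assumes, as the report does, that only one of the 97 names should contribute the file's size; `hardlink_overcount` is a hypothetical helper, and the figures are the ones quoted above.

```python
MIB = 1024 * 1024  # equery reports sizes in MiB (2**20 bytes)


def hardlink_overcount(nlink: int, size: int) -> int:
    """Extra bytes counted when all nlink names of one inode are summed separately."""
    # One name legitimately contributes the size; the other nlink - 1 are duplicates.
    return (nlink - 1) * size


extra = hardlink_overcount(97, 4115362)
print(extra, f"{extra / MIB:.2f} MiB")    # 395074752 376.77 MiB
print(f"{416.10 - extra / MIB:.2f} MiB")  # 39.33 MiB, the expected result
```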
Comment 1 Patrick Lauer gentoo-dev 2010-01-15 19:46:31 UTC
*** Bug 301022 has been marked as a duplicate of this bug. ***
Comment 2 Douglas Anderson 2010-01-16 21:25:31 UTC
I wasn't able to reproduce your results on the latest in-tree version (rc8-r1):
$ equery --version
equery (0.3.0_rc8) - Gentoo package query tool
$ equery s git
 * dev-util/git-1.6.4.4
         Total files : 660
         Total size  : 96.30 MiB

However, here's a patch against gentoolkit svn HEAD that should, in theory, remove hardlinks from the list of files being "sized". It works by making the list of files unique by inode. It made no difference for me, so further testing and consideration are needed:

Index: gentoolkit/pym/gentoolkit/package.py
===================================================================
--- gentoolkit/pym/gentoolkit/package.py	(revision 147)
+++ gentoolkit/pym/gentoolkit/package.py	(working copy)
@@ -297,11 +297,15 @@
 		@return: (size, number of files in total, number of uncounted files)
 		"""
 
-		contents = self.parsed_contents()
+		seen = set()
+		content_stats = (os.lstat(x) for x in self.parsed_contents())
+		# Remove hardlinks by checking for duplicate inodes. Bug #301026.
+		unique_file_stats = (x for x in content_stats if x.st_ino not in seen
+			and not seen.add(x.st_ino))
 		size = n_uncounted = n_files = 0
-		for cfile in contents:
+		for st in unique_file_stats:
 			try:
-				size += os.lstat(cfile).st_size
+				size += st.st_size
 				n_files += 1
 			except OSError:
 				n_uncounted += 1
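Two details of the patch above may matter. First, os.lstat() runs inside the generator expression, which is only evaluated as the for loop advances, i.e. outside the try block, so a missing file would raise OSError instead of incrementing n_uncounted. Second, inode numbers are only unique within one filesystem, so keying on st_ino alone could in principle merge unrelated files. The following is a minimal sketch under those assumptions: `size_unique` is a hypothetical standalone function that keeps the same (size, files, uncounted) tuple. It is not gentoolkit's code, nor the modified version that was actually committed.

```python
import os
from typing import Iterable, Tuple


def size_unique(paths: Iterable[str]) -> Tuple[int, int, int]:
    """Return (size, n_files, n_uncounted), counting each inode only once.

    Hypothetical helper modelled on the patched size() method above.
    """
    seen = set()
    size = n_files = n_uncounted = 0
    for path in paths:
        try:
            # lstat() inside the try, so an inaccessible entry is tallied, not raised.
            st = os.lstat(path)
        except OSError:
            n_uncounted += 1
            continue
        # An inode number is only unique per device, so key on (st_dev, st_ino).
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another hardlink to a file that was already counted
        seen.add(key)
        size += st.st_size
        n_files += 1
    return size, n_files, n_uncounted
```

With this approach, all 97 names of /usr/libexec/git-core/git would contribute the file's size exactly once.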
Comment 3 Douglas Anderson 2010-03-07 06:00:31 UTC
A modified version of this patch has been included in genscripts svn.
Comment 4 Paul Varner (RETIRED) gentoo-dev 2010-04-08 14:59:28 UTC
This is in gentoolkit-0.3.0_rc10. Please let me know if the problem is resolved.
Comment 5 A.C.Heron 2010-04-23 11:45:43 UTC
Tried app-portage/gentoolkit-0.3.0_rc10-r1.

For dev-vcs/git-1.6.4.4 I get 102.22 MiB. 

Counting all hardlinks should give 106.33 MiB for my installation, while omitting all hardlinks should give only 13.10 MiB. It looks like the new version omits some of the hardlinks, but not all of them.
Comment 6 Douglas Anderson 2010-07-18 17:43:01 UTC
(In reply to comment #5) 
> Counting all hardlinks should give 106.33 MiB for my installation, while
> omitting all hardlinks should give only 13.10 MiB. Looks like the new version
> omits at least some of hardlinks.

You'll have to be more specific about how you're getting your numbers. qsize (from portage-utils) gives very similar numbers to equery size.
Comment 7 Douglas Anderson 2010-10-24 06:32:30 UTC
$ qsize -b git
dev-vcs/git-1.7.2.2: 687 files, 36 non-files, 129944873 bytes
$ equery -q s git
dev-vcs/git-1.7.2.2: total(607), inaccessible(0), size(9707633)

That's qsize from portage-utils-0.3.1 and equery from gentoolkit svn. qsize reports about 124 MiB and equery a little over 9 MiB, which is consistent with your expectations when hardlinks are excluded.
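The byte counts printed by qsize -b and equery -q s can be converted to MiB (1 MiB = 2^20 bytes) with a minimal Python check; `to_mib` is a hypothetical helper, and the inputs are the two byte counts shown above.

```python
MIB = 2 ** 20  # 1 MiB = 1,048,576 bytes


def to_mib(n_bytes: int) -> float:
    """Convert a raw byte count to MiB."""
    return n_bytes / MIB


print(f"qsize:  {to_mib(129944873):.2f} MiB")  # qsize:  123.93 MiB
print(f"equery: {to_mib(9707633):.2f} MiB")    # equery: 9.26 MiB
```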

Will close after the gentoolkit 0.3.0 release.
Comment 8 Paul Varner (RETIRED) gentoo-dev 2010-11-22 20:48:04 UTC
Released in gentoolkit-0.3.0_rc11