Bug 506170 - net-proxy/squid-3.4.4 - malloc(): smallbin double linked list corrupted in operator new (size=64) at ../../include/SquidNew.h:45
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: All Linux
Importance: Normal normal
Assignee: No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-29 16:15 UTC by Martin von Gagern
Modified: 2019-02-24 10:47 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (net-proxy:squid-3.4.4.emerge--info,7.30 KB, text/plain)
2014-03-29 16:15 UTC, Martin von Gagern
Valgrind output 1 (valgrind1.txt,332.63 KB, text/plain)
2014-03-31 16:23 UTC, Martin von Gagern
Fix SOME issues (gentoo506170a.patch,1014 bytes, patch)
2014-04-01 06:46 UTC, Martin von Gagern
Valgrind output 2 (valgrind2.txt,365.91 KB, text/plain)
2014-04-01 09:11 UTC, Martin von Gagern

Description Martin von Gagern 2014-03-29 16:15:45 UTC
Created attachment 373834 [details]
emerge --info

I recently had several cases of squid becoming unavailable while I was using it. Lacking time to investigate, I switched to a no-proxy configuration. But squid still takes ages to shut down. Something strange is going on there.

Today I took some time to investigate and found the following:

# dmesg | grep squid
[ 6575.837258] squid[2781]: segfault at ba63f4d0 ip 00000000006b74d3 sp 00007fffc0baf2a0 error 4 in squid[400000+445000]
# ps -C squid -o pid,ppid,cmd
  PID  PPID CMD
 2779     1 /usr/sbin/squid -YC -f /etc/squid/squid.conf
24573  2779 (squid-1) -YC -f /etc/squid/squid.conf
# gdb -p 24573
[…]
(gdb) bt
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x0000003d0dc81421 in _L_lock_10433 () from /lib64/libc.so.6
#2  0x0000003d0dc7ecb5 in __GI___libc_malloc (bytes=56) at malloc.c:2852
#3  0x0000003d0d80c811 in _dl_map_object_deps (map=map@entry=0x7f6f8af9e9c0, 
    preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, 
    trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648)
    at dl-deps.c:511
#4  0x0000003d0d812b2c in dl_open_worker (a=a@entry=0x7fffe1cbe628)
    at dl-open.c:261
#5  0x0000003d0d80e889 in _dl_catch_error
    (objname=objname@entry=0x7fffe1cbe618, 
    errstring=errstring@entry=0x7fffe1cbe620,
    mallocedp=mallocedp@entry=0x7fffe1cbe617, 
    operate=operate@entry=0x3d0d812a20 <dl_open_worker>,
    args=args@entry=0x7fffe1cbe628)
    at dl-error.c:177
#6  0x0000003d0d8124eb in _dl_open (file=0x3d0dd640a2 "libgcc_s.so.1", 
    mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=4, 
    argv=0x7fffe1cbfb18, env=0x7fffe1cbfb40) at dl-open.c:650
#7  0x0000003d0dd1d4a2 in do_dlopen (ptr=ptr@entry=0x7fffe1cbe850)
    at dl-libc.c:87
#8  0x0000003d0d80e889 in _dl_catch_error (objname=0x7fffe1cbe830, 
    errstring=0x7fffe1cbe838, mallocedp=0x7fffe1cbe82f, 
    operate=0x3d0dd1d460 <do_dlopen>, args=0x7fffe1cbe850) at dl-error.c:177
#9  0x0000003d0dd1d56f in dlerror_run
    (operate=operate@entry=0x3d0dd1d460 <do_dlopen>, 
    args=args@entry=0x7fffe1cbe850) at dl-libc.c:46
#10 0x0000003d0dd1d5f1 in __GI___libc_dlopen_mode (
    name=name@entry=0x3d0dd640a2 "libgcc_s.so.1", mode=mode@entry=-2147483647)
    at dl-libc.c:163
#11 0x0000003d0dcf7bf5 in init () at ../sysdeps/x86_64/backtrace.c:52
#12 0x0000003d0e80dd93 in pthread_once ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
#13 0x0000003d0dcf7d1c in __GI___backtrace (array=<optimized out>, size=64)
    at ../sysdeps/x86_64/backtrace.c:103
#14 0x0000003d0dc1fc8b in backtrace_and_maps (do_abort=234497600,
    do_abort@entry=2, written=128, fd=2)
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:47
#15 0x0000003d0dc75952 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x3d0dd69600 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:172
#16 0x0000003d0dc7b69d in malloc_printerr (action=3, 
    str=0x3d0dd699c0 "malloc(): smallbin double linked list corrupted", 
    ptr=<optimized out>) at malloc.c:4916
#17 0x0000003d0dc7d96d in _int_malloc (av=0x3d0dfa2640 <main_arena>, bytes=64)
    at malloc.c:3310
#18 0x0000003d0dc7ecc3 in __GI___libc_malloc (bytes=64) at malloc.c:2855
#19 0x000000000076d97d in xmalloc (sz=64) at xalloc.cc:116
#20 0x00000000006b22d8 in operator new (size=64) at ../../include/SquidNew.h:45
#21 allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/ext/new_allocator.h:94
#22 _M_allocate_map (__n=<optimized out>, this=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/bits/stl_deque.h:545
#23 std::_Deque_base<ACLChecklist::Breadcrumb,
    std::allocator<ACLChecklist::Breadcrumb> >::_M_initialize_map
    (this=this@entry=0x2ef4398, __num_elements=__num_elements@entry=0)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/bits/stl_deque.h:590
#24 0x00000000006b1f0f in _Deque_base (__x=..., this=0x2ef4398)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/bits/stl_deque.h:472
#25 deque (__x=std::deque with 0 elements, this=0x2ef4398)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/bits/stl_deque.h:854
#26 stack (__c=std::deque with 0 elements, this=0x2ef4398)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/
    4.7.3/include/g++-v4/bits/stl_stack.h:138
#27 ACLChecklist::ACLChecklist (this=0x2ef4338) at Checklist.cc:179
#28 0x000000000066f026 in ACLFilledChecklist::ACLFilledChecklist
    (this=0x2ef4338, A=
    0x2a63288, http_request=0x2fe5ad0, ident=0x0) at FilledChecklist.cc:150
#29 0x00000000005e7be0 in peerSelectFoo (ps=0x2ef40c8) at peer_select.cc:442
#30 0x00000000005e92be in peerSelect (paths=paths@entry=0x2fe6770, 
    request=<optimized out>, entry=<optimized out>, 
    callback=callback@entry=0x578160 <fwdPeerSelectionCompleteWrapper
    (Comm::ConnectionList*, ErrorState*, void*)>,
    callback_data=callback_data@entry=0x2fe6718)
    at peer_select.cc:176
#31 0x00000000005701b9 in FwdState::start (this=this@entry=0x2fe6718, aSelf=...)
    at FwdState.cc:166
#32 0x0000000000578434 in FwdState::Start (clientConn=..., entry=0x2efdc60, 
    request=0x2fe5ad0, al=...) at FwdState.cc:356
#33 0x00000000005787d6 in FwdState::fwdStart (clientConn=...,
    entry=<optimized out>, request=<optimized out>) at FwdState.cc:367
#34 0x000000000073342a in netdbExchangeStart (data=0x2a62528) at net_db.cc:1337
#35 0x00000000006b3819 in AsyncCall::make (this=this@entry=0x2f17060)
    at AsyncCall.cc:32
#36 0x00000000006b74c5 in AsyncCallQueue::fireNext (this=this@entry=0x2ee86b0)
    at AsyncCallQueue.cc:52
#37 0x00000000006b7808 in AsyncCallQueue::fire (this=0x2ee86b0)
    at AsyncCallQueue.cc:38
#38 0x0000000000550075 in EventLoop::dispatchCalls
    (this=this@entry=0x7fffe1cbf8e0) at EventLoop.cc:158
#39 0x0000000000550228 in EventLoop::runOnce (this=this@entry=0x7fffe1cbf8e0)
    at EventLoop.cc:123
#40 0x0000000000550408 in EventLoop::run (this=this@entry=0x7fffe1cbf8e0)
    at EventLoop.cc:99
#41 0x00000000005c4fc2 in SquidMain (argc=<optimized out>, argv=<optimized out>)
    at main.cc:1534
#42 0x00000000004c814b in SquidMainSafe (argv=<optimized out>,
    argc=<optimized out>) at main.cc:1262
#43 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1254

So the way I read it, in frame #17 something went wrong with memory allocation, and #16 tries to print the corresponding error message, namely
“malloc(): smallbin double linked list corrupted”.
But #14 decides to also print a backtrace for this, and for that it calls stuff which eventually results in some kind of deadlock. Perhaps because the thread tries to acquire a lock it is already holding in frame #18.
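
To illustrate the suspected mechanism, here is a toy sketch of a non-recursive lock being re-entered from an error reporter. It is not glibc code: ToyArenaLock, toy_malloc and toy_report_error are hypothetical names, and the sketch uses a try-acquire so that it records the re-entry instead of actually hanging the way a blocking lock would.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the allocator's arena lock (hypothetical). It is NOT
// recursive: a second acquire from the same thread cannot succeed.
class ToyArenaLock {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_acquire() { return !held_.exchange(true, std::memory_order_acquire); }
    void release() { held_.store(false, std::memory_order_release); }
    bool held() const { return held_.load(std::memory_order_relaxed); }
private:
    std::atomic<bool> held_{false};
};

static ToyArenaLock g_arena_lock;
static std::vector<std::string> g_events;  // trace of what happened

bool toy_malloc(bool corrupted);

// Error reporter that itself needs memory, in the way the backtrace code in
// the trace above ends up in malloc again (frame #2) while loading a library.
static void toy_report_error() {
    assert(g_arena_lock.held());  // the reporter runs with the lock still held
    g_events.push_back("report: need memory for backtrace");
    if (!toy_malloc(false))
        g_events.push_back("report: nested allocation blocked");
}

// Hypothetical allocator entry point; `corrupted` simulates a failed
// consistency check such as the smallbin one. Returns true on success.
bool toy_malloc(bool corrupted) {
    if (!g_arena_lock.try_acquire()) {
        // A blocking lock would wait here forever, since the holder is us.
        g_events.push_back("malloc: lock already held (would deadlock)");
        return false;
    }
    if (corrupted)
        toy_report_error();  // called before the lock is released
    g_arena_lock.release();
    return !corrupted;
}
```

Calling toy_malloc(true) records the nested attempt in g_events. With a blocking lock in place of try_acquire, the same call sequence would never return, which would match the process sitting in __lll_lock_wait_private in frame #0.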

That other process, 2779, seems to be some kind of controller which spawns children to do the actual work:

# gdb -p 2779
[…]
(gdb) bt
#0  0x0000003d0e8101fa in __libc_waitpid (pid=-1, stat_loc=0x7fff4e952d3c,
    options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00000000005c56f9 in watch_child (argv=0x7fff4e953008) at main.cc:1774
#2  SquidMain (argc=<optimized out>, argv=0x7fff4e953008) at main.cc:1458
#3  0x00000000004c814b in SquidMainSafe (argv=<optimized out>,
    argc=<optimized out>) at main.cc:1262
#4  main (argc=<optimized out>, argv=<optimized out>) at main.cc:1254
[…]
# cat /run/squid.pid
24573

The libc error message does not seem to be going anywhere, because stderr (suggested by that fd=2 in frame #14) is redirected to /dev/null:

# lsof -p 24573 | grep ' 2u'
squid   24573 squid    2u      CHR    1,3      0t0    1028 /dev/null
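
The same check can be made from inside a process. The following is a minimal sketch using plain POSIX calls, with fd_is_dev_null being a hypothetical helper name: it compares the device number of a descriptor with that of /dev/null (character device 1,3 on Linux, as in the lsof line above).

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if `fd` refers to the same character device as /dev/null.
// Returns false for any other file type or if either stat call fails.
bool fd_is_dev_null(int fd) {
    assert(fd >= 0);  // precondition: caller passes a plausible descriptor
    struct stat fd_st;
    struct stat null_st;
    if (fstat(fd, &fd_st) != 0)
        return false;
    if (stat("/dev/null", &null_st) != 0)
        return false;
    return S_ISCHR(fd_st.st_mode) && S_ISCHR(null_st.st_mode) &&
           fd_st.st_rdev == null_st.st_rdev;
}
```

A daemon could call fd_is_dev_null(STDERR_FILENO) at startup and log a warning that libc diagnostics will be lost.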

So the way I see it, there are three problems here:
1. The crash of 2781 for reasons unknown
2. The error detected in frame #17
3. The deadlock while trying to report that error

Not sure how to proceed from here. I guess I'll try a few things, like rebuilding squid, or trying to catch such a segfault in gdb, but as I'm unsure how to proceed, and as I might reboot my system in between, I'm posting the current state of affairs here.
Comment 1 Martin von Gagern 2014-03-29 18:17:52 UTC
Problem 3, the deadlock during error reporting, seems to me to be a glibc issue. Should I file a separate bug for this?
Comment 2 Martin von Gagern 2014-03-29 18:38:30 UTC
(In reply to Martin von Gagern from comment #0)
> So the way I see it, there are three problems here:
> 1. The crash of 2781 for reasons unknown
> 
> […] trying to catch such a segfault in gdb

Killed deadlocked squid, removed pid file, restarted squid, attached gdb, surfed for some time, and got this:

Program received signal SIGABRT, Aborted.
0x0000003d0dc35855 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x0000003d0dc35855 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x0000003d0dc36cc7 in __GI_abort () at abort.c:89
#2  0x0000003d0dc75957 in __libc_message (do_abort=do_abort@entry=2,
    fmt=fmt@entry=0x3d0dd69600 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x0000003d0dc7b69d in malloc_printerr (action=3, str=0x3d0dd697c0
    "free(): invalid next size (fast)", ptr=<optimized out>) at malloc.c:4916
#4  0x0000003d0dc7c473 in _int_free (av=<optimized out>, p=0x1c87830,
    have_lock=0) at malloc.c:3772
#5  0x00000000005e0c78 in peerDigestRequest (pd=<optimized out>)
    at peer_digest.cc:402
#6  peerDigestCheck (data=<optimized out>) at peer_digest.cc:290
#7  0x00000000006b3819 in AsyncCall::make (this=this@entry=0x1e11220)
    at AsyncCall.cc:32
#8  0x00000000006b74c5 in AsyncCallQueue::fireNext (this=this@entry=0x21156b0)
    at AsyncCallQueue.cc:52
#9  0x00000000006b7808 in AsyncCallQueue::fire (this=0x21156b0)
    at AsyncCallQueue.cc:38
#10 0x0000000000550075 in EventLoop::dispatchCalls
    (this=this@entry=0x7fff9f516510) at EventLoop.cc:158
#11 0x0000000000550228 in EventLoop::runOnce (this=this@entry=0x7fff9f516510)
    at EventLoop.cc:123
#12 0x0000000000550408 in EventLoop::run (this=this@entry=0x7fff9f516510)
    at EventLoop.cc:99
#13 0x00000000005c4fc2 in SquidMain (argc=<optimized out>,
    argv=<optimized out>) at main.cc:1534
#14 0x00000000004c814b in SquidMainSafe (argv=<optimized out>,
    argc=<optimized out>) at main.cc:1262
#15 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1254

So that, too, appears to be some corruption of internal malloc state.
Comment 3 Martin von Gagern 2014-03-29 20:23:13 UTC
Ran squid through valgrind:

# valgrind /usr/sbin/squid -N -YC -f /etc/squid/squid.conf
==26273== Memcheck, a memory error detector
==26273== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26273== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==26273== Command: /usr/sbin/squid -N -YC -f /etc/squid/squid.conf
==26273== 
==26273== Conditional jump or move depends on uninitialised value(s)
==26273==    at 0x3D0D817C86: index (strchr.S:55)
==26273==    by 0x3D0D80763D: expand_dynamic_string_token (dl-load.c:425)
==26273==    by 0x3D0D807F4C: _dl_map_object (dl-load.c:2302)
==26273==    by 0x3D0D80145D: map_doit (rtld.c:626)
==26273==    by 0x3D0D80E888: _dl_catch_error (dl-error.c:177)
==26273==    by 0x3D0D800F5F: do_preload (rtld.c:815)
==26273==    by 0x3D0D8035B4: dl_main (rtld.c:1629)
==26273==    by 0x3D0D81539F: _dl_sysdep_start (dl-sysdep.c:249)
==26273==    by 0x3D0D804A33: _dl_start (rtld.c:331)
==26273==    by 0x3D0D801227: ??? (in /lib64/ld-2.18.so)
==26273==    by 0x4: ???
==26273==    by 0xFFEFFF34E: ???
==26273== 
==26273== Conditional jump or move depends on uninitialised value(s)
==26273==    at 0x3D0D817C8B: index (strchr.S:58)
==26273==    by 0x3D0D80763D: expand_dynamic_string_token (dl-load.c:425)
==26273==    by 0x3D0D807F4C: _dl_map_object (dl-load.c:2302)
==26273==    by 0x3D0D80145D: map_doit (rtld.c:626)
==26273==    by 0x3D0D80E888: _dl_catch_error (dl-error.c:177)
==26273==    by 0x3D0D800F5F: do_preload (rtld.c:815)
==26273==    by 0x3D0D8035B4: dl_main (rtld.c:1629)
==26273==    by 0x3D0D81539F: _dl_sysdep_start (dl-sysdep.c:249)
==26273==    by 0x3D0D804A33: _dl_start (rtld.c:331)
==26273==    by 0x3D0D801227: ??? (in /lib64/ld-2.18.so)
==26273==    by 0x4: ???
==26273==    by 0xFFEFFF34E: ???
==26273== 
==26273== Mismatched free() / delete / delete []
==26273==    at 0x4A07D8C: operator delete(void*) (vg_replace_malloc.c:502)
==26273==    by 0x3D0F49FADF: std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream() (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.2/libstdc++.so.6.0.18)
==26273==    by 0x532157: Debug::finishDebug() (debug.cc:762)
==26273==    by 0x5DBB94: PconnModule::PconnModule() (pconn.cc:485)
==26273==    by 0x5DBC14: PconnModule::GetInstance() (pconn.cc:493)
==26273==    by 0x5DBCF7: PconnPool::PconnPool(char const*) (pconn.cc:387)
==26273==    by 0x4C747D: _GLOBAL__sub_I__ZN8FwdState15CBDATA_FwdStateE (FwdState.cc:98)
==26273==    by 0x76DDAC: __libc_csu_init (elf-init.c:88)
==26273==    by 0x3D0DC21A04: (below main) (libc-start.c:244)
==26273==  Address 0x4cbc860 is 0 bytes inside a block of size 352 alloc'd
==26273==    at 0x4A09670: malloc (vg_replace_malloc.c:291)
==26273==    by 0x76D97C: xmalloc (xalloc.cc:116)
==26273==    by 0x531CFF: Debug::getDebugOut() (SquidNew.h:45)
==26273==    by 0x5DBAE9: PconnModule::PconnModule() (pconn.cc:485)
==26273==    by 0x5DBC14: PconnModule::GetInstance() (pconn.cc:493)
==26273==    by 0x5DBCF7: PconnPool::PconnPool(char const*) (pconn.cc:387)
==26273==    by 0x4C747D: _GLOBAL__sub_I__ZN8FwdState15CBDATA_FwdStateE (FwdState.cc:98)
==26273==    by 0x76DDAC: __libc_csu_init (elf-init.c:88)
==26273==    by 0x3D0DC21A04: (below main) (libc-start.c:244)
==26273== 
==26273== Mismatched free() / delete / delete []
==26273==    at 0x4A07D8C: operator delete(void*) (vg_replace_malloc.c:502)
==26273==    by 0x3D0F49FADF: std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream() (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.2/libstdc++.so.6.0.18)
==26273==    by 0x532157: Debug::finishDebug() (debug.cc:762)
==26273==    by 0x5C388B: mainSetCwd() (main.cc:988)
==26273==    by 0x5C490E: SquidMain(int, char**) (main.cc:996)
==26273==    by 0x4C814A: main (main.cc:1262)
==26273==  Address 0x4e69240 is 0 bytes inside a block of size 352 alloc'd
==26273==    at 0x4A09670: malloc (vg_replace_malloc.c:291)
==26273==    by 0x76D97C: xmalloc (xalloc.cc:116)
==26273==    by 0x531CFF: Debug::getDebugOut() (SquidNew.h:45)
==26273==    by 0x5C37BA: mainSetCwd() (main.cc:978)
==26273==    by 0x5C490E: SquidMain(int, char**) (main.cc:996)
==26273==    by 0x4C814A: main (main.cc:1262)

Followed by MANY more reports, all of them during startup. I'm not sure whether this mismatch is indeed a problem, or an artifact of how valgrind works. SquidNew.h seems to be crucial here. It seems that the operator new will be inlined, thus calling malloc, whereas the operator delete is not inlined so the valgrind version gets called.
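
To make clear what kind of mismatch is being reported, here is a toy bookkeeping sketch. It is not how Memcheck is implemented, and Family, tracked_alloc and tracked_release are hypothetical names: each block remembers which allocation family produced it, and the release side says whether it was handed back through the same family.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Which interface produced (or is releasing) a block.
enum class Family { Malloc, New };

// Bookkeeping header stored in front of every user block. The alignment
// keeps the user pointer suitably aligned for any object type.
struct alignas(std::max_align_t) Header {
    Family family;
};

// Allocates `n` user bytes and records the allocating family.
void* tracked_alloc(std::size_t n, Family f) {
    void* raw = std::malloc(sizeof(Header) + n);
    if (!raw)
        return nullptr;
    new (raw) Header{f};
    return static_cast<char*>(raw) + sizeof(Header);
}

// Releases a block from tracked_alloc. Returns true if the releasing family
// matches the allocating one, false for a mismatch like the one reported.
bool tracked_release(void* p, Family f) {
    assert(p != nullptr);  // precondition: p came from tracked_alloc
    Header* h = reinterpret_cast<Header*>(static_cast<char*>(p) - sizeof(Header));
    const bool matched = (h->family == f);
    std::free(h);
    return matched;
}
```

In these terms, the reports above correspond to tracked_alloc(352, Family::Malloc) followed by tracked_release(p, Family::New). Whether that is harmful in practice depends on whether both families end up in the same underlying allocator.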

Looking at a disassembly of debug.o during compilation, I can verify this distinction: Debug::getDebugOut directly calls xmalloc, whereas Debug::finishDebug seems to do some virtual function table magic instead. In any case, the destructor mentioned above is apparently not part of squid object code, but instead provided by libstdc++.

According to lsof, squid has /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.2/libstdc++.so.6.0.18 mapped and according to objdump that file provides symbols:

GLIBCXX_3.4 operator delete[](void*)
GLIBCXX_3.4 operator delete(void*, std::nothrow_t const&)
GLIBCXX_3.4 operator delete[](void*, std::nothrow_t const&)
GLIBCXX_3.4 operator delete(void*)
GLIBCXX_3.4 operator new[](unsigned long)
GLIBCXX_3.4 operator new(unsigned long)
GLIBCXX_3.4 operator new[](unsigned long, std::nothrow_t const&)
GLIBCXX_3.4 operator new(unsigned long, std::nothrow_t const&)

Looking closer at SquidNew.h, we can read:

/* Any code using libstdc++ must have externally resolvable overloads
 * for void * operator new - which means in the .o for the binary,
 * or in a shared library. static libs don't propogate the symbol
 * so, look in the translation unit containing main() in squid
 * for the extern version in squid
 */

But in main.cc I see no reference to this. And according to ldd, /usr/sbin/squid has these operators as undefined symbols. So I'd say that even without valgrind, squid will likely use its own operator new with malloc where things got inlined by the compiler, but the delete operator from libstdc++ since that apparently doesn't get inlined as easily.

If that interpretation is correct, it seems to me that by the time squid reaches operational state, its memory management might already be in a highly inconsistent state.

http://bazaar.launchpad.net/~squid/squid/3-trunk/revision/6378 tells me that the comment about SquidNew.h and why it should be needed dates back to 2003, and doesn't apply to clang. I wonder whether it applies to recent GCC versions. It doesn't sound particularly reasonable to me.

I guess I'll simply disable SquidNew.h in my next build, by adding a 0 to the #if preprocessor switch. Will see whether that helps.
Comment 4 Martin von Gagern 2014-03-30 18:48:15 UTC
Disabling SquidNew.h seems to work as intended so far. If you can confirm that this issue is not specific to Gentoo, i.e. not caused by any Gentoo patches or unconventional libstdc++ settings, I'll gladly report this upstream.
Comment 5 Martin von Gagern 2014-03-31 13:40:39 UTC
Again got a deadlocked process trying to report that smallbin error. So the problem is not resolved. But valgrind has a lot less to complain about now that I disabled SquidNew.h, so it might be possible to track the actual problems using valgrind now. Will take some time, though.
Comment 6 Martin von Gagern 2014-03-31 16:23:05 UTC
Created attachment 373972 [details]
Valgrind output 1

Here is some valgrind output from the build with SquidNew.h disabled.

* epoll_ctl pointing to uninitialized byte(s).
* closing invalid file descriptors after fork, but this should be safe.
* various invalid reads and writes.

I guess I'll keep tweaking and patching squid till valgrind is satisfied, and see if that finally gets rid of my crashes as well.
Comment 7 Amos Jeffries 2014-03-31 21:01:03 UTC
This looks exactly like a use-after-free race condition in FwdState and cache_peer selection we have been tracking down upstream in 3.HEAD. I am a bit surprised to see it in 3.4 though. Francesco Chemolli is working on a fix right now, I have notified him about this bug report.
Comment 8 Martin von Gagern 2014-04-01 06:46:52 UTC
Created attachment 374006 [details, diff]
Fix SOME issues

I just emerged squid with the attached patch applied, and furthermore with

EXTRA_ECONF="--with-valgrind-debug --disable-optimizations"
CFLAGS="-ggdb"
CXXFLAGS="-ggdb"

As a result, valgrind is almost silent during startup. And so far it is completely quiet after that. Might be too early to tell for sure, but I guess that this will avoid the issues reported earlier. So those are likely due to compiler optimization settings, in particular inlining of some things.

The change to the ~IcmpConfig destructor in the attached patch avoids a report by valgrind during shutdown. The string in question was created using xstrdup while installing a default configuration, so it shouldn't be deleted but freed instead.
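
The idea behind that destructor change can be sketched as follows. This is not the attached patch: ToyIcmpConfig, its program member and toy_xstrdup are hypothetical stand-ins, with toy_xstrdup playing the role of a malloc-based string duplicator like xstrdup.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical malloc-based duplicator standing in for xstrdup.
static char* toy_xstrdup(const char* s) {
    const std::size_t len = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy)
        std::memcpy(copy, s, len);
    return copy;
}

// Hypothetical config object owning one malloc-allocated string.
struct ToyIcmpConfig {
    char* program = nullptr;

    ToyIcmpConfig() = default;
    ToyIcmpConfig(const ToyIcmpConfig&) = delete;             // single owner
    ToyIcmpConfig& operator=(const ToyIcmpConfig&) = delete;

    // Installs a default value, releasing any previous one.
    void setDefault(const char* path) {
        assert(path != nullptr);
        std::free(program);
        program = toy_xstrdup(path);
    }

    // The string came from the malloc family, so it goes back through
    // free(), not through operator delete.
    ~ToyIcmpConfig() { std::free(program); }
};
```

The point is only that allocation and release use the same family: a string obtained from a malloc-based helper is released with free(), whatever the rest of the class does.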
Comment 9 Martin von Gagern 2014-04-01 09:11:06 UTC
Created attachment 374012 [details]
Valgrind output 2

OK, even with no optimization valgrind finds reason to complain. This trace should be more useful since it should report stack traces more accurately.
Comment 10 Amos Jeffries 2014-04-03 09:02:11 UTC
Please try with this patch (http://treenet.co.nz/projects/squid/patches/PSC_params_mk1.patch) it is based on 3.HEAD but should also apply to 3.4 easily.

Also, do you have Squid memory pools feature enabled or disabled?
Comment 11 Martin von Gagern 2014-04-03 13:40:35 UTC
(In reply to Amos Jeffries from comment #10)
> Please try with this patch
> (http://treenet.co.nz/projects/squid/patches/PSC_params_mk1.patch) it is
> based on 3.HEAD but should also apply to 3.4 easily.

Applied cleanly with slight fuzz, compiling now but will test only later as I'm remote just now.

Note: all of this would be that much easier if the squid ebuild were to support epatch_user. Then I could simply drop my patches into that directory and call “emerge squid”. Now I have to do the whole “ebuild prepare” + “patch” + “ebuild merge” chain for every patch.

> Also, do you have Squid memory pools feature enabled or disabled?

I have no memory_pools setting in my config, so the default of enabled with 5 MB should apply.
Comment 12 Martin von Gagern 2014-04-05 17:52:52 UTC
(In reply to Martin von Gagern from comment #11)
> (In reply to Amos Jeffries from comment #10)
> > Please try with this patch
> > (http://treenet.co.nz/projects/squid/patches/PSC_params_mk1.patch) it is
> > based on 3.HEAD but should also apply to 3.4 easily.
> 
> […] will test only later

Squid just crashed again, with that patch and the one from comment 8 applied.
"malloc(): smallbin double linked list corrupted" again.
Comment 13 Eray Aslan gentoo-dev 2014-04-09 09:23:59 UTC
(In reply to Martin von Gagern from comment #11)
> Note: all of this would be that much easier if the squid ebuild were to
> support epatch_user.

Done.
Comment 14 Amos Jeffries 2014-05-05 13:55:47 UTC
The upstream fix for this turned out to be http://www.squid-cache.org/Versions/v3/3.HEAD/changesets/squid-3-13340.patch
Comment 15 Martin von Gagern 2014-05-07 10:08:32 UTC
(In reply to Amos Jeffries from comment #14)
> The upstream fix for this turned out to be
> http://www.squid-cache.org/Versions/v3/3.HEAD/changesets/squid-3-13340.patch

Applied that in addition to the one from comment #8, and still got a hung process and the backtrace showing "malloc(): smallbin double linked list corrupted". I've disabled squid for my daily work now, since it is absolutely unusable in its current state. But I'm willing to test more patches or try new debugging techniques if you have anything like this for me.
Comment 16 Pacho Ramos gentoo-dev 2019-02-24 10:47:31 UTC
Please retry with net-proxy/squid-3.5.28.