166545 – baselayout serialization

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] baselayout
Hardware:	All Linux

Importance:	High normal
Assignee:	Gentoo's Team for Core System packages

Reported:	2007-02-12 19:05 UTC by Daniel Drake (RETIRED)
Modified:	2007-05-03 11:33 UTC
CC List:	0 users

Attachments
initial implementation (locking.patch, 38.32 KB, patch) 2007-02-12 19:06 UTC, Daniel Drake (RETIRED)

Description Daniel Drake (RETIRED) gentoo-dev

2007-02-12 19:05:47 UTC

There are issues with different processes accessing the same baselayout data at the same time; see bug #154670 for an example. Although 1.13 includes some workarounds, restarting net.ath0 in a loop usually breaks something after fewer than 15 restarts.

I was losing confidence in the stability of 1.13 (well, I know it's still alpha), so I decided to try applying a locking strategy to 1.12 to solve this.

My locking implementation is simple and uses FIFOs. It adds a lock directory at /var/lib/init.d/locks.

My basic strategy is to provide per-init-script locks and use access to ${svcdir}/{starting,started,inactive,stopping,scheduled,...} as a basis for data protection.
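A minimal sketch of what a FIFO-based per-init-script lock could look like, for illustration only. It is not taken from locking.patch: the function names (lock_try, lock_acquire, lock_release) and the polling loop are assumptions, and the lock directory defaults to a scratch path so the sketch can be run without touching /var/lib/init.d/locks. It relies on mkfifo failing when the path already exists, so creating the FIFO acts as an atomic "take the lock" step.

```shell
#!/bin/sh
# Hypothetical sketch, not the code from locking.patch.
# The report uses /var/lib/init.d/locks; a scratch directory is assumed here.
LOCKDIR="${LOCKDIR:-/tmp/init-locks.$$}"

# Try once to take the lock for service $1.
# mkfifo fails if the path already exists, so only one caller can succeed.
lock_try() {
    mkdir -p "${LOCKDIR}" && mkfifo "${LOCKDIR}/$1" 2>/dev/null
}

# Block (by polling once a second) until the lock for service $1 is taken.
lock_acquire() {
    until lock_try "$1"; do
        sleep 1
    done
}

# Release the lock for service $1 by removing its FIFO.
lock_release() {
    rm -f "${LOCKDIR}/$1"
}
```

With these helpers, a second lock_try on the same service name fails until lock_release has been called.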

As a result, I made sure that the callers of the following functions hold the lock for the init script they are dealing with:

test_service_state
service_coldplugged
service_starting
service_started
service_inactive
service_wasinactive
service_stopping
service_stopped
service_scheduled_by
service_scheduled
service_started_daemon
service_failed
mark_service_coldplugged
mark_service_starting
mark_service_started
mark_service_inactive
mark_service_stopping
mark_service_stopped
mark_service_failed
dep_stop
start_service
stop_service
update_service_status
svc_quit
restart
svc_schedule_start
svc_stop
svc_start
svc_restart
svc_status
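The "caller holds the lock" rule for the functions listed above can be pictured as a wrapper that takes the per-service lock, runs a state helper, and releases the lock afterwards. The sketch below is an assumption about the shape of such a wrapper: with_service_lock is a hypothetical name, and the bodies of service_started and mark_service_started are simplified stand-ins that use marker files under a scratch svcdir, not baselayout's real implementations.

```shell
#!/bin/sh
# Hypothetical sketch; with_service_lock is an invented name and the two
# state helpers are stand-ins, not baselayout code.
svcdir="${svcdir:-/tmp/svcdir.$$}"
lockdir="${svcdir}/locks"
mkdir -p "${lockdir}" "${svcdir}/started"

# Stand-ins: both assume the caller already holds the lock for service $1.
mark_service_started() { : > "${svcdir}/started/$1"; }
service_started()      { [ -e "${svcdir}/started/$1" ]; }

# Run a command while holding the lock for service $1.
# Usage: with_service_lock <service> <command> [args...]
with_service_lock() {
    service="$1"; shift
    # mkfifo fails while another holder's FIFO exists, so poll until free.
    until mkfifo "${lockdir}/${service}" 2>/dev/null; do
        sleep 1
    done
    "$@"
    status=$?
    rm -f "${lockdir}/${service}"
    return "${status}"
}
```

For example, `with_service_lock net.ath0 service_started net.ath0` queries the state with the lock held and passes the helper's exit status back to the caller.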

One challenge with the above is that some internal code invokes baselayout again. For example, start_service (which requires the caller to hold the lock) calls /etc/init.d/service start, which attempts to take the same lock --> DEADLOCK

So I moved much of runscript.sh into a new file (runscript-shared.sh), which is shared between rc-services.sh and runscript.sh.
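One way to read this split: the function that does the real work assumes the lock is already held, the init script entry point takes the lock before calling it, and code that already holds the lock calls it directly instead of re-running the init script. The sketch below illustrates that reading only. The function bodies are placeholders, runscript_start is a hypothetical name, svc_start taking the service as an argument is an assumption, and a non-blocking lock attempt is used so that re-entry fails visibly instead of hanging.

```shell
#!/bin/sh
# Hypothetical sketch of the split; bodies are placeholders, not patch code.
lockdir="${lockdir:-/tmp/locks.$$}"
mkdir -p "${lockdir}"

# Non-blocking lock helpers (mkfifo fails if the FIFO already exists).
lock_try()     { mkfifo "${lockdir}/$1" 2>/dev/null; }
lock_release() { rm -f "${lockdir}/$1"; }

# Would live in runscript-shared.sh: assumes the caller holds the lock.
svc_start() {
    echo "starting $1"
}

# Stand-in for the runscript.sh entry point (/etc/init.d/<service> start):
# takes the lock itself, so it must not be called by a lock holder.
runscript_start() {
    lock_try "$1" || { echo "$1: lock busy" >&2; return 1; }
    svc_start "$1"
    status=$?
    lock_release "$1"
    return "${status}"
}

# Stand-in for start_service in rc-services.sh: the caller already holds
# the lock, so call the shared function rather than runscript_start.
start_service() {
    svc_start "$1"
}
```

With a blocking lock, calling runscript_start from a process that already holds the lock would wait forever, which is the deadlock described above.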

In addition, this patch has the following disadvantages:
 - parallel startup is probably broken
 - hacks are needed for locking with svc_start_scheduled
 - there are probably still race windows in other places

However, it seems to be working nicely with this configuration.

Comment 1 Daniel Drake (RETIRED) gentoo-dev

2007-02-12 19:06:49 UTC

Created attachment 109978
initial implementation

Comment 2 Roy Marples (RETIRED) gentoo-dev

2007-05-03 11:33:02 UTC

This is fixed in baselayout-2.