When drbd is started at boot, it hangs and is unable to sync its disks, because it needs the network to be up and running. Hence "net" should be a NEED dependency instead of a USE dependency, so that the network gets started before drbd. If drbd can't sync its data, data loss is to be expected.

Reproducible: Always

Steps to Reproduce:
1. emerge drbd and configure it
2. Add it to the default runlevel: rc-update add drbd default
3. Reboot the server
4. drbd hangs if there is no network connection and asks the user to force primary state (which will cause data loss if the other server of the cluster has been primary)

Actual Results:
drbd hangs and waits for the user to force primary state (which can cause data loss if the other server was primary and hence has the more recent data).

Expected Results:
The network should be started before drbd so that drbd can find the other server/node and sync with it.
Thank you for reporting this "issue".

Data loss: this should never happen, as your Cluster Manager (heartbeat or something else) should be able to detect such a condition and resolve it together with drbd. Remember: you should have at least 2 (different) connections between the nodes to allow the CM to detect such a condition.

NEED dependency: drbd is often used in multi-link setups with one or more cluster-internal links and one or more links to the outside world. In such a setup the interfaces to the outside world could have the NEED flag and the cluster-internal links a USE flag, because the internal link or one of the 2 nodes is expected to fail. (A failure of the external link should in this case cause a node failure [which currently doesn't happen], so drbd does not start.) For example:

    NEED net.eth0
    USE net.eth1

This way drbd can start even if net.eth1 fails. The node can start up even without the link (which might be a failed WAN link to the other node). A lot of other configurations are possible, depending on individual requirements.

If you specify only NEED net, the node will fail to start up if one of the links or the other node has failed. That does not provide high availability.

In other words: IMHO, with the current Gentoo startup-script environment it is not possible to specify a setup which works in almost all cases; the individual requirements differ a lot. The sysadmin is responsible for configuring (editing the drbd startup script to match the individual requirements) and testing.

Please check your configuration and change it to meet your node-failure requirements. You can specify a timeout so that no user interaction is required; drbd then starts up, USEing the net. As soon as the net is up and the CM is there, synchronization will proceed.

No need to fix anything. Feel free to take over maintainership of this ebuild.
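For illustration only, the NEED/USE example above might look roughly like this in the depend() function of the drbd init script. This is a minimal sketch, not the script shipped with the ebuild: net.eth0 and net.eth1 are just the interface names from the example, and the rest of the script is omitted.

```shell
# Sketch of depend() for a two-link setup (assumed interface names).
depend() {
	# Hard dependency: drbd is not started unless the external link is up.
	need net.eth0
	# Soft dependency: ordered before drbd if it is in the runlevel,
	# but drbd still starts when the cluster-internal link is missing.
	use net.eth1
}
```

A setup with only "need net" would instead tie the drbd start to the network as a whole, which is the behaviour asked for in the original report.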
So I take it you have solved this problem?
@michael Yes, sorry for not answering, I didn't have internet access for the last few days. You can close this "bug".

@jan Thank you for your fast and very detailed answer! When implementing drbd for our little (student) project we didn't think of setups like the ones you described. As you mentioned, we just edited the init script ourselves. Sorry for the inconvenience!

JG
Thanks for your response. So I'm closing this bug now.