In an attempt to fix a race condition when bringing up the bonding interface while the slave interfaces are not yet ready, I came up with the following patch:
=================================================================
--- bonding.orig 2012-07-27 15:36:00.000000000 +0400
+++ bonding 2012-08-21 11:29:01.906575329 +0400
@@ -23,8 +23,14 @@
# Add slaves
for S in $SLAVES; do
report_debug "Attempting to add slave $S"
- ifenslave -d $INTERFACE $S &>/dev/null
ip link set $S up || return 1
+ let timeout=10
+ while ethtool $S|grep -q "Speed: Unknown"; do
+ [[ $timeout -le 0 ]] && break
+ sleep .2
+ let timeout--
+ done
+ ifenslave -d $INTERFACE $S &>/dev/null
C=$(ifenslave $INTERFACE $S)
if [[ $C -eq 0 ]]; then
report_debug "Interface $S enslaved to $INTERFACE properly"
=============================================================================
Could you please consider including it (or some variant) in the next version?
Without it, at least one interface (or even both, in my setup) usually fails to come up as a slave, and I get the following in syslog at system startup:
bonding: bond0: link status definitely up for interface eth1, 4294967295 Mbps full duplex.
bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
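Editorial note on the odd speed value: the Linux ethtool headers define SPEED_UNKNOWN as -1, and when -1 is stored in an unsigned 32-bit field and printed as unsigned it comes out as 2^32 - 1, which matches the bogus "4294967295 Mbps" in the log above. A quick arithmetic check:

```shell
# -1 in an unsigned 32-bit field prints as 2^32 - 1, which is the
# "4294967295 Mbps" the bonding driver logs when the speed is unknown.
unknown_speed=$(( (1 << 32) - 1 ))
echo "$unknown_speed"   # 4294967295
```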
Package Details: netcfg-bonding 1.5.1-1
| Package Base: | netcfg-bonding |
|---|---|
| Description: | Support for network interface bonding for netcfg2 |
| Upstream URL: | https://aur.archlinux.org/packages.php?ID=25771 |
| Category: | network |
| Licenses: | |
| Submitter: | Schnouki |
| Maintainer: | xyem |
| Last Packager: | None |
| Votes: | 8 |
| First Submitted: | 2009-04-21 00:23 |
| Last Updated: | 2012-07-02 17:58 |
Latest Comments
Comment by mihanson
@xyem: Sorry, I just got around to checking on my request. I re-packaged and added the SLAVE_TIMEOUT to /etc/network.d/bonded. After 5 reboots and a couple cold boots, all seems well.
Comment by xyem
Sorry for the delay with this change; I got busy and then it slipped my mind.
1.5 introduces a SLAVE_TIMEOUT variable so you can change how long the profile will wait for slaves to come up, in 0.5-second increments. The example uses 5, as was hardcoded pre-1.5, and it defaults to 5 if you have not specified it.
@mihanson: Could you let me know if this works for you?
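The waiting behaviour described in this comment can be sketched as follows. This is an illustrative sketch, not the actual netcfg-bonding source; `slave_is_up` is a hypothetical stand-in for the real ethtool-based link check.

```shell
# Sketch: wait for a slave link in 0.5-second steps, up to
# SLAVE_TIMEOUT checks (defaulting to 5, i.e. 2.5 seconds maximum).
SLAVE_TIMEOUT=${SLAVE_TIMEOUT:-5}

slave_is_up() {
    # Hypothetical stand-in for the real link check (ethtool parsing).
    [[ -n $SLAVE_READY ]]
}

timeout=0
until slave_is_up; do
    timeout=$(( timeout + 1 ))
    if (( timeout > SLAVE_TIMEOUT )); then
        echo "Slave is down and timeout reached" >&2
        break
    fi
    sleep 0.5
done
```

With SLAVE_TIMEOUT unset the default of 5 applies, matching the pre-1.5 hardcoded behaviour, while a profile can override it with a single assignment.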
Comment by xyem
At the moment, bonding will wait a maximum of 2.5 seconds (0.5 seconds sleep * 5 checks). Rather than subject everyone to a longer sleep, it would make more sense to increase the number of checks (line 40):
- if [[ $timeout -eq 5 ]]; then
+ if [[ $timeout -eq 10 ]]; then
This still doubles the total time allowed for the link to become ready, but does not increase the minimum waiting time when the link comes up quickly.
I think this would be better off as a configuration option (SLAVE_TIMEOUT?) where it can be defaulted to 5, but easily changed for situations like yours. I'll update with this change shortly.
Comment by mihanson
I would like to request/recommend the sleep time in line 38 of bonding be increased to 1.0 seconds. I have 2 different hardware setups that are slow to initialize. More often than not, the bond fails because the slaves are not "ready." Increasing this sleep time solves the problem for me. I rebooted 10-15 times on each setup and the bond has not failed since increasing this sleep.
Comment by devoncrouse
Thanks for the ultra-fast response. Testing now.
Comment by xyem
netcfg 2.8.2 removed the symlink 'ethernet-iproute' which netcfg-bonding was still using.
'ethernet' and 'ethernet-iproute' became the same in netcfg 2.5.2, so I've bumped the minimum netcfg dependency to that version.
1.4
= Change Log =
* Changed to use 'ethernet' script instead of the symlink 'ethernet-iproute'
Comment by devoncrouse
Here's some version info:
Linux hostname 3.3.6-1-ARCH #1 SMP PREEMPT Sun May 13 10:52:32 CEST 2012 x86_64 GNU/Linux
core/netcfg 2.8.3-1
core/ifenslave 1.1.0-7
netcfg-bonding 1.3-1
Comment by devoncrouse
With a recent update (of netcfg?) I'm no longer able to start net-profiles. I can manually set the address and netmask for the bonded interface after the failed attempt and it works fine, but net-profiles isn't doing it anymore. Let me know if I can provide more information.
(1:544)# rc.d restart net-profiles
> Using /etc/rc.conf for netcfg settings is deprecated.
Use /etc/conf.d/netcfg instead.
:: wan down [DONE]
:: lan up [BUSY]
> Slave eth1 is down and timeout reached
> Slave eth2 is down and timeout reached
> No slaves up, aborting
[FAIL]
:: wan up [DONE]
(ifconfig now shows the slaves ARE UP...)
Here's some relevant configuration:
/etc/rc.conf:
...
NETWORKS=(lan)
interface=bond0
...
/etc/network.d/lan
CONNECTION="bonding"
INTERFACE="bond0"
SLAVES="eth1 eth2"
IP='static'
ADDR='10.10.10.1'
GATEWAY='10.10.10.1'
DNS='10.10.10.1'
Comment by xyem
1.3
= Change Log =
* Applied patch contributed by aslag
** Fixed an error in bonding_up which prevented it from making use of eligible slaves
** Additional output for debugging/logging
** Replaced instances of non-portable grep invocations
Comment by falconindy
When is this going to be submitted as a patch to netcfg?
Comment by xyem
1.2-1 is a quick release to get the package usable again. Thanks to Schnouki for creating it and allowing me to maintain it.
= Change Log =
* Changed 'err_append' calls to 'report_fail'
* Changed 'arch' to 'any'
* Fixed exit code checking on link status check which was causing UP slaves to be considered DOWN
* Added INSTALL and CHANGES
Comment by xyem
The script included in this (/usr/lib/network/connections/bonding) does not fully work.
Issues found:
* "err_append" function does not exist
* Does not correctly identify "up" interfaces (i.e. it always fails)
I had to replace "err_append" with "report_fail" and stop the script from returning 1 when no slaves were up. It functions correctly after these changes (but still complains that no slaves are "up").
I'm willing to take over maintaining this if you are otherwise unable to, where I will fix these two issues properly.
Thanks
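The workaround described in this comment can be sketched roughly as below. This is an illustrative sketch only: the real report_fail comes from netcfg's helper scripts and is stubbed here, and `slave_came_up` is a hypothetical stand-in for the script's actual link check.

```shell
# Sketch of the described change: report the failure instead of
# aborting ('return 1') when not all slaves come up.
report_fail() { echo "FAIL: $*" >&2; }        # stub for netcfg's helper
slave_came_up() { [[ $1 == eth1 ]]; }          # pretend only eth1 is up

SLAVES="eth1 eth2"
ok=0
for S in $SLAVES; do
    slave_came_up "$S" && ok=1
done
if (( ! ok )); then
    report_fail "No slaves up"
    # The pre-fix script did 'return 1' here, aborting the profile
    # even though some slaves might still come up later.
fi
echo "any_slave_up=$ok"
```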