Comments (50)
I think this is related to #39
from authsae.
I have not seen the message 'confirm did not verify!' anywhere in the logs, so I'm not sure about this.
Reading back, I suppose I should have attached a log from one of the nodes as well. It's attached now.
meshd-log.txt
I just noticed your commit, I will test again with the latest master tomorrow morning.
from authsae.
This appears to be a long-standing issue with ath9k. I submitted this as a bug to the ath9k mailing list last year ( https://www.mail-archive.com/ath9k-devel%40lists.ath9k.org/msg13595.html ) and there was an earlier report of very similar issues ( http://lists.shmoo.com/pipermail/hostap/2014-November/031377.html ). I was not able to get any traction. I've gone as far as reading the key back from the card registers, and it matches what's expected. Our workaround was to have a unicast probe between the nodes that occurs right after rekey and, if it fails, to rekey again.
from authsae.
@alexgrin:
I'm inclined to believe you on that, looking at the debugging you did. Do you have a patch of that workaround somewhere?
@fhuberts:
I've tested with your patches, no changes as far as the bug is concerned.
from authsae.
Would you mind sharing the code of your workaround?
from authsae.
The code is pretty awful looking but I'll see if I can button it up this
weekend and push up to my repo.
from authsae.
@alexgrin: I was looking at the workaround you described, I assume that you use the NL80211_CMD_PROBE_CLIENT netlink call for this? This would probably require a small patch in net/wireless/nl80211.c to make this work (this call is only allowed for AP and P2P interfaces, by default). Or do you probe differently?
from authsae.
Nope, it's nowhere near as awesome as you think. Authsae does a multicast ping (layer 3) after rekey and waits to hear a unicast response from a device with the MAC address of the peer we just rekeyed with. If there's no response, a rekey is triggered. You have to specify the interface for multicast to the daemon for this to work. It's a pretty nasty hackjob and I'll post the code as is (-ish) soon.
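For readers following along, the probe-then-rekey idea described above can be sketched roughly like this. It is not the code from the authsae workaround, just a minimal standalone illustration: `rekey_until_verified`, the `probe_fn`/`rekey_fn` callbacks, the `max_rekeys` limit and the `fake_link` test double are all hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks; none of these names come from authsae. */
typedef bool (*probe_fn)(void *ctx);  /* true if the peer answered the probe */
typedef void (*rekey_fn)(void *ctx);  /* trigger a fresh key exchange */

/*
 * Probe the peer after a rekey; if the probe goes unanswered, rekey and
 * probe again. Returns the number of extra rekeys that were needed, or -1
 * if the peer still does not answer after max_rekeys extra rekeys.
 */
int rekey_until_verified(probe_fn probe, rekey_fn rekey, void *ctx,
                         int max_rekeys)
{
    assert(probe && rekey);

    for (int rekeys = 0; ; rekeys++) {
        if (probe(ctx))
            return rekeys;      /* traffic flows with the current key */
        if (rekeys == max_rekeys)
            return -1;          /* give up instead of rekeying forever */
        rekey(ctx);             /* no reply: assume the key install went bad */
    }
}

/* Toy stand-in for a peer link, used only to exercise the loop. */
struct fake_link {
    int bad_installs_left;  /* how many more key installs will "fail" */
    int rekeys;             /* rekeys triggered so far */
};

static bool fake_probe(void *ctx)
{
    const struct fake_link *l = ctx;
    return l->bad_installs_left == 0;
}

static void fake_rekey(void *ctx)
{
    struct fake_link *l = ctx;
    l->rekeys++;
    if (l->bad_installs_left > 0)
        l->bad_installs_left--;
}
```

In the workaround as described, the probe is a layer-3 multicast ping followed by a wait for a unicast reply from the peer's MAC address; here that is hidden behind the `probe` callback. The retry cap is an addition for the sketch so that a dead peer cannot cause endless rekeying.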
from authsae.
Here's the yuckyness - uniumwifi@a1591d3
from authsae.
thanks for sharing!
from authsae.
Thank you for the patch! I'll have a look at it.
Sorry for the radio silence, was testing a patch which may help find a solution.
It works by resetting the ath9k chip when a new key is installed. It does seem to cause a LOT of authsae renegotiation traffic.
The link does keep forwarding traffic.
Although it is a start, I would hardly call this a nice patch; it's the equivalent of using buckshot to swat mosquitoes.
Would someone mind checking it out and maybe suggesting a better approach?
ath9k-install_key-buckshot.diff.txt
from authsae.
I've posted the issue on the ath9k-devel list as well, hopefully I can stir something up/get the help to address this.
https://lists.ath9k.org/pipermail/ath9k-devel/2016-July/014676.html
from authsae.
So far 0 responses; I tried to bump it once, with no effect.
For the foreseeable future, I've chosen to use software encryption.
from authsae.
Please note that your maximum achievable throughput will degrade if using
software encryption.
from authsae.
Hello Chun-Yeow,
I agree, there is a measurable performance drop of about 2 Mbps. Luckily, for this particular application, high bandwidth is not the most important, but link stability is.
I see it as a temporary solution so the project can move forward on my employer's side. When a fix is available, we're definitely switching back to hw encryption.
On a side note, I seem to have gotten a little traction on the ath9k-devel list; Adrian Chadd has taken a look at the Atheros reference driver, which seems to have a fix called ATH_SUPPORT_KEYPLUMB_WAR. This reinserts the key when there are Rx decryption errors.
This fix may have to be ported into the ath9k driver.
Would those of you who have run into this issue at some point be willing to pitch in on the ath9k-devel list?
Cheers,
Michel
from authsae.
Can you point me to that thread? It's very relevant for our deployment as well
from authsae.
Of course,
https://lists.ath9k.org/mailman/listinfo/ath9k-devel - The list
https://lists.ath9k.org/pipermail/ath9k-devel/2016-July/014676.html - The thread
from authsae.
tnx
from authsae.
I recently got a mail from Sven Eckelmann about a patch which may solve the situation:
https://lkml.kernel.org/r/[email protected]
I have not yet had time to take a look at this, so caveat emptor.
from authsae.
thanks.
will try to see if this works in our setup.
please let me know more if you have more information :-)
from authsae.
Is that patch being upstreamed?
from authsae.
No, it is not. Sven added the following to the message:
The patch itself has (at least) one big problem. It is using some mac80211
internals in ath_key_config_iter to make sure that the uploaded keys were
actually programmed in the hardware. Without this check the keys could end up
in the lower slots and thus break all connections.

So this patch could be a starting point for someone who wants to add a
workaround which is acceptable by upstream.
Here is the original email:
http://www.mail-archive.com/[email protected]/msg14458.html
from authsae.
I just contacted Antonio via email, offering help in upstreaming.
Are you willing to help here? Maybe Bob and Alex can participate as well?
from authsae.
Sure, I was planning to do this somewhere in the coming days. Maybe I can re-use some of the kludge I wrote up to get around the use of internals (unless I am doing that myself as well).
from authsae.
shall we continue via direct email? mailings (at) hupie (dot) com
from authsae.
After spending 2 weeks on this issue together with Ferry Huberts, we did not get much further.
We tried:
- Introducing a worker process that gets activated on a key_set or whenever a number of decrypt_errors occur. In the former case it does not seem to have any effect, and decrypt_errors do not seem to occur very often, so the decrypt_error counter does not increment enough. Setting the limit to 1 does not help, either.
- Introducing the KEYPLUMB_WAR as described in [https://github.com/qca/qcamain_open_hal_public/blob/master/hal/ar9300/ar9300_keycache.c], see below. This too does not work: the write and subsequent check indicate that the data is written correctly at the first attempt. The extra xorKey argument introduced does not solve this issue other than adding an extra read/write cycle.
- We tried to silence the chip completely by disabling the interrupts and cancelling the tasks (and waiting for completion). This does not solve the issue other than adding instability.
- Lastly we tried to schedule a worker that replumbs the key 2, 5 or 10 seconds after a set_key. This too, does not help.
It seems to me that the chip gets very confused when a key is installed while it is processing a lot of traffic. Quieting the chip does not seem to help, unless I did not get it quiet enough.
I have attached the various patches as an example of what was tried.
- Replumb on decrypt_error and key_set: 0001-ath9k-Implement-key-cache-corruption-work-around.patch.txt
- Open HAL key replumb: 0001-Rework-of-the-key-plumbing-using-ath-hal-code.patch.txt
- Replumb x seconds after key_set: 0001-ath9k-rekey-after-10-seconds.patch.txt
I lost the patch which quiets the chip prior to keying (accidental delete).
Description: in ath9k_set_key (ath9k/main.c), just before the switch statement, add:
    spin_lock_bh(&sc->sc_pcu_lock);
    ath9k_hw_disable_interrupts(ah);
    tasklet_disable(&sc->intr_tq);
    spin_unlock_bh(&sc->sc_pcu_lock);
Then, after the switch statement, add:
    spin_lock_bh(&sc->sc_pcu_lock);
    tasklet_enable(&sc->intr_tq);
    ath9k_hw_enable_interrupts(ah);
    spin_unlock_bh(&sc->sc_pcu_lock);
Adrian Chadd has suggested in an email to try my original buckshot patch ath9k-install_key-buckshot.diff.txt, but this time reinstall keys after the reset. I will try to find some time and do this, see if it helps.
Cheers,
Michel
from authsae.
Yes, I have never seen hardware behave so bafflingly, and I suspect we will not get any further unless we get more information on what is really going on. Very, very unfortunate.
from authsae.
Ok. So after a few valiant attempts, I got a little further, but still nowhere close to a working patch.
Resetting the chip, then replumbing the cache seems to be the way to go.
From a bug which is triggered on every rekey, I'm now at a situation where the error usually occurs every couple of minutes. The longest I got without an error was 500 seconds, rekeying every 60 seconds.
I'm getting the idea that the chip must be reset as quickly as possible after a key insertion, otherwise the chip gets confused.
Another thing which significantly delayed my progress is the sheer amount of locking in the ath9k driver. I needed to grab the rtnl_lock in order to access the key material in mac80211, but sc->mutex also seems to be required. Grabbing both is inviting all sorts of locking issues which usually result in the whole network stack hanging. As a quick hack, I stopped using sc->mutex and grabbed only rtnl_lock (not doing so will cause a BUG_ON every rekey).
Please take a close look at this patch. It is by no means complete or clean yet, so not ready for production.
0002-ath9k-reset-and-replumb-key.patch.txt
from authsae.
I do have a fix in authsae (rekeying) but need to test it further
from authsae.
My rekey code works well.
The issues we had with ath5k systems were related to ath5k barfing over 'htmode=HT20'.
I've re-opened my rekey PR (#55)
from authsae.
One question, why do you use authsae rather than wpa_supplicant, I'd think wpa_supplicant is much more widely used?
from authsae.
Personally, I had some issues with wpa_supplicant in combination with OpenWRT. Some race condition which prevented either the AP or mesh function from working. Did not have this problem when starting everything manually, just when using the OpenWRT configuration system. Since AuthSAE did work, and seemed more stable at the time I settled for that.
from authsae.
Thanks for your answer.
I was having a race condition with dnsmasq (for Ethernet). Not sure if the link below is of any help.
https://dev.openwrt.org/ticket/7423
from authsae.
Any final solutions on this? I have exactly the same issue in ath9k. Thanks!
from authsae.
... I think I'm finally hitting it in ath9k at my current employer. Let's see how far down the rabbit hole I go.
from authsae.
interesting. if I replumb the keys on the receiver side then it doesn't fix things. That's ... odd.
from authsae.
(so I wonder if there are two bugs here..)
from authsae.
Per my understanding:
- It is not specific to Authsae. It is a common issue.
- As someone mentioned, it is an ath9k chip h/w bug. I have read this: https://mentor.ieee.org/802.11/dcn/10/11-10-0018-00-000m-4-way-handshake-synchronization-issue.ppt
Looks like they're kind of related.
from authsae.
Oh I know it's an ath9k chip bug. :-) That's what I have on my desk atm
from authsae.
I mean, the link shows a key installation deficiency in the 4-way handshake mechanism. Following that idea to solve that key re-installation problem may help to solve this "ath9k chip bug" too.
from authsae.
my rekeying patch (which was merged in #59 ) does that, and works well, except when the chip is under heavy load.
from authsae.
Yeah, the "heavy load" issue is our issue here. I'll keep digging and see what I can find.
Is your patch focused on rekeying the transmit side key, or the remote/receiver key? (yes the CCMP keys are used for both TX/RX, I'm more interested in which side is doing the replumbing of keys to the HW.)
from authsae.
ok, yeah. I'm seeing three separate bugs:
- Somehow mac80211's PN tracking gets messed up on the receive side and it gets set to 1, which means all subsequent frames get rejected as CCMP replay until it reaches the old sequence counter;
- The receiver sometimes needs replumbing; it looks like about 10% of the time;
- The /transmitter/ sometimes needs replumbing, about 50% of the time.
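To illustrate why a PN mismatch (the first bug in the list) stalls a link for so long, here is a simplified model of a CCMP-style replay check. It is not mac80211 code, and it assumes one particular reading of the report: the receiver still compares against the old, high counter while incoming frames start again from PN 1. The names `rx_key_state`, `ccmp_rx_accept` and `count_replay_drops` are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-key receive state: only the last accepted packet number. */
struct rx_key_state {
    uint64_t last_pn;   /* CCMP PNs are 48 bits wide; uint64_t is a convenience */
};

/* Accept a frame only if its PN is strictly greater than the last accepted one. */
bool ccmp_rx_accept(struct rx_key_state *rx, uint64_t pn)
{
    assert(rx);
    if (pn <= rx->last_pn)
        return false;       /* treated as a replay and dropped */
    rx->last_pn = pn;
    return true;
}

/* Feed `frames` consecutive PNs starting at first_pn; return how many are dropped. */
unsigned count_replay_drops(struct rx_key_state *rx, uint64_t first_pn,
                            unsigned frames)
{
    unsigned dropped = 0;

    for (unsigned i = 0; i < frames; i++)
        if (!ccmp_rx_accept(rx, first_pn + i))
            dropped++;
    return dropped;
}
```

With `last_pn` at 1000 and frames arriving from PN 1 onward, the first 1000 frames are dropped as replays and traffic only resumes at PN 1001, which matches the "rejected until it reaches the old sequence counter" behaviour described in the list.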
I'm doing a UDP iperf of a few tens of mbit from an ath9k AP -> ath9k STA (both Peacock / AR9580) to reproduce this. In all cases the right keys make it through to the keycache code. When it breaks then at least whenever I've caught it the STA can still send frames to the AP which the AP can decrypt, but the frames from the AP can't be decrypted by the STA.
I wonder if the sender side hardware bug will suck less if we just completely pause transmit before doing a rekey (because that's what the rekeying patch seems to detect). The RX side rekey thing that QCA does is a different beast and I think fixes another bit of the problem - I'm not actively TXing (besides ACKs, obviously) from the STA -> AP during the failure mode, so if there's a hardware bug it's likely due to having a packet in receive flight during rekeying.
(Maybe I can experiment with pausing the MAC TX/RX whilst replumbing the key, which would have the added benefit of not ACKing anything during that window..)
from authsae.
Ok, so bug 1 here was fixed by just disabling PTK rekey. It turns out the data/control path for transmit and mac80211 key management is not really set up for doing seamless PTK rekey, at least on the transmit side, and you can't guarantee not to drop frames on the receive side either. So, I'm not going to do it.
Which means the second and third don't happen.
Now, if someone has some spare time (and maybe me too), I think we should experiment with draining the ath9k station queue so we aren't transmitting anything that can use that keycache entry before we plumb in the new key. It's tricky because mac80211/ath9k aren't set up for that. But I /think/ that'll work around the TX keycache bug.
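The ordering being proposed (stop feeding the queue, let in-flight frames finish, write the key, resume) can be shown with a toy model. This is only a sketch of the sequence under the assumption that the TX keycache bug is triggered by frames in flight during the write; it is not ath9k or mac80211 code, and `key_slot`, `tx_enqueue`, `tx_complete_one` and `plumb_key_drained` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 16

/* Toy model of one keycache entry and the frames that reference it. */
struct key_slot {
    uint8_t  key[KEY_LEN];
    unsigned in_flight;    /* frames handed to "hardware" that use this slot */
    bool     tx_stopped;   /* while set, new frames for this slot are held back */
};

/* Queue one frame for transmission; refused while the slot is being rekeyed. */
bool tx_enqueue(struct key_slot *s)
{
    if (s->tx_stopped)
        return false;
    s->in_flight++;
    return true;
}

/* Model one TX completion. */
void tx_complete_one(struct key_slot *s)
{
    if (s->in_flight > 0)
        s->in_flight--;
}

/* Install new_key only once nothing in flight references the slot.
 * Returns how many completions had to be waited for. */
unsigned plumb_key_drained(struct key_slot *s, const uint8_t new_key[KEY_LEN])
{
    unsigned waited = 0;

    s->tx_stopped = true;            /* 1. stop feeding the queue */
    while (s->in_flight > 0) {       /* 2. drain; a driver would wait on completions */
        tx_complete_one(s);
        waited++;
    }
    assert(s->in_flight == 0);
    memcpy(s->key, new_key, KEY_LEN); /* 3. write the key while the slot is idle */
    s->tx_stopped = false;           /* 4. resume transmission */
    return waited;
}
```

In a real driver step 2 would block on TX completion rather than fake it, and, as the comment notes, the hard part is that the existing transmit path offers no clean hook for steps 1 and 2.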
The RX keycache bug shouldn't be triggered if you're not actively receiving packets to decrypt whilst you're changing the key - which you can guarantee if your AP is not doing stupid crap (ie, has this bug fixed) but you can't otherwise; that particular one will benefit from the keycache plumb hack from QCA. However, and here's the rub - it doesn't seem to work reliably if you're constantly hitting it with a stream of packets. It really needs to sneak in when no active RX is being done for that keycache entry.
from authsae.
I've found the same issue with rekeying a PTK under load and am currently trying to upstream a fix for that. In fact there are different ways in which normal kernels can mess up the PN, but I think we have found them all now.
I've not yet tested it with mesh networks but I'm quite confident the current version of the patch will fix the issue for ath9k mesh also. And I would like to get more feedback on the patch, so if you want to test the current version here it is:
https://patchwork.kernel.org/project/linux-wireless/list/?series=8045&state=*
You will get warnings that the userspace (wpa_supplicant) is requesting rekeys while it should not which will need patches to either ath9k or wpa_supplicant which simply are not available, yet.
But the "fallback" path printing this warning is working quite nicely with an ath9k AP and I expect it to do the same for ath9k in a mesh.
When testing the patch I suggest you also make sure you have this fix applied:
https://patchwork.kernel.org/patch/10399613/
While I believe ath9k will not trigger this race, the obvious symptoms are pretty much the same, so better make sure it's fixed as well.
Edit: updated link
from authsae.