Comments (50)

fhuberts avatar fhuberts commented on July 19, 2024

I think this is related to #39

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

I have not seen the messages:
'confirm did not verify!'
anywhere in the logs, so I'm not sure about this.

Reading back, I suppose I should have attached a log from one of the nodes as well. It's attached now.
meshd-log.txt

I just noticed your commit, I will test again with the latest master tomorrow morning.

from authsae.

alexgrin avatar alexgrin commented on July 19, 2024

This appears to be a long-standing issue with ath9k. I've submitted this as a bug to ath9k mailing list last year ( https://www.mail-archive.com/ath9k-devel%40lists.ath9k.org/msg13595.html ) and there was an earlier report of very similar issues ( http://lists.shmoo.com/pipermail/hostap/2014-November/031377.html ). I was not able to get any traction. I've gone as far as reading the key back from the card registers and it matches what's expected. Our workaround was to have a unicast probe between the nodes that occurs right after rekey and, if it fails, rekey.

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

@alexgrin:
I'm inclined to believe you on that, looking at the debugging you did. Do you have a patch of that workaround somewhere?

@fhuberts:
I've tested with your patches, no changes as far as the bug is concerned.

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

On 23/06/16 18:10, Alexis Green wrote:

This appears to be a long-standing issue with ath9k. I've submitted this
as a bug to ath9k mailing list last year (
https://www.mail-archive.com/ath9k-devel%40lists.ath9k.org/msg13595.html
) and there was an earlier report of very similar issues (
http://lists.shmoo.com/pipermail/hostap/2014-November/031377.html ). I
was not able to get any traction. I've gone as far as reading the key
back from the card registers and it matches what's expected. Our
workaround was to have a unicast probe between the nodes that occurs
right after rekey and, if it fails, rekey.

Would you mind sharing the code of your workaround?

from authsae.

alexgrin avatar alexgrin commented on July 19, 2024

The code is pretty awful looking but I'll see if I can button it up this
weekend and push up to my repo.

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

@alexgrin: I was looking at the workaround you described, I assume that you use the NL80211_CMD_PROBE_CLIENT netlink call for this? This would probably require a small patch in net/wireless/nl80211.c to make this work (this call is only allowed for AP and P2P interfaces, by default). Or do you probe differently?

from authsae.

alexgrin avatar alexgrin commented on July 19, 2024

Nope, it's nowhere near as awesome as you think. Authsae does a multicast ping (layer 3) after rekey and waits to hear a unicast response from a device with the MAC address of the peer we just rekeyed with. If there's no response, a rekey is triggered. You have to specify the interface for multicast to the daemon for this to work. It's a pretty nasty hack job and I'll post the code as is (-ish) soon.
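A minimal sketch of the probe-after-rekey logic described above, assuming a simple per-peer state machine. All names here (peer_probe, probe_start, probe_on_reply, probe_tick) and the one-second timeout are hypothetical and are not taken from the actual patch; sending the multicast ping and capturing replies are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define PROBE_TIMEOUT_MS 1000 /* assumed value, not from the real patch */

enum probe_state { PROBE_IDLE, PROBE_WAITING, PROBE_OK, PROBE_REKEY };

/* Hypothetical per-peer bookkeeping for one outstanding probe. */
struct peer_probe {
    uint8_t peer_mac[MAC_LEN]; /* peer we just rekeyed with */
    uint64_t deadline_ms;      /* when we stop waiting for a reply */
    enum probe_state state;
};

/* Call right after a rekey, once the multicast ping has been sent. */
void probe_start(struct peer_probe *p, const uint8_t *mac, uint64_t now_ms)
{
    memcpy(p->peer_mac, mac, MAC_LEN);
    p->deadline_ms = now_ms + PROBE_TIMEOUT_MS;
    p->state = PROBE_WAITING;
}

/* Call for every unicast reply seen; only the rekeyed peer's MAC counts. */
void probe_on_reply(struct peer_probe *p, const uint8_t *src_mac)
{
    if (p->state == PROBE_WAITING &&
        memcmp(p->peer_mac, src_mac, MAC_LEN) == 0)
        p->state = PROBE_OK;
}

/* Call periodically; returns true when the caller should rekey again. */
bool probe_tick(struct peer_probe *p, uint64_t now_ms)
{
    if (p->state == PROBE_WAITING && now_ms >= p->deadline_ms)
        p->state = PROBE_REKEY;
    return p->state == PROBE_REKEY;
}
```

A reply from any other MAC address leaves the probe pending, so only traffic that the rekeyed peer actually managed to send back counts as proof that the new key works in both directions.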

from authsae.

alexgrin avatar alexgrin commented on July 19, 2024

Here's the yuckyness - uniumwifi@a1591d3

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

thanks for sharing!

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Thank you for the patch! I'll have a look at it.

Sorry for the radio silence, was testing a patch which may help find a solution.
It works by resetting the ath9k chip when a new key is installed. It does seem to cause a LOT of authsae renegotiation traffic.
The link does keep forwarding traffic.
Although it is a start, I would hardly call this a nice patch; it's the equivalent of using buckshot to swat mosquitoes.

Would someone mind checking it out and maybe suggesting a better approach?

ath9k-install_key-buckshot.diff.txt

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

I've posted the issue on the ath9k-devel list as well, hopefully I can stir something up/get the help to address this.

https://lists.ath9k.org/pipermail/ath9k-devel/2016-July/014676.html

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

So far 0 response, tried to bump it one time with no effect.

For the foreseeable future, I've chosen to use software encryption.

from authsae.

chunyeow avatar chunyeow commented on July 19, 2024

Please note that your maximum achievable throughput will degrade if using
software encryption.

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Hello Chun-Yeow,

I agree, there is a measurable performance drop of about 2 Mbps. Luckily, for this particular application, high bandwidth is not the most important, but link stability is.
I see it as a temporary solution so the project can move forward on my employer's side. When a fix is available, we're definitely switching back to hw encryption.

On a side note, I seem to have gotten a little traction on the ath9k-devel list; Adrian Chadd has taken a look at the Atheros reference driver, which seems to have a fix called ATH_SUPPORT_KEYPLUMB_WAR. This reinserts the key when there are Rx decryption errors.
This fix may have to be ported into the ath9k driver.
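To illustrate the general idea of reinserting a key on Rx decryption errors, here is a rough sketch. It does not reproduce the reference driver's logic: the struct, the function names, and the threshold of three consecutive errors are all assumptions made for the example, and replumb_key is only a stand-in for rewriting the hardware key cache entry.

```c
#include <stdbool.h>
#include <stdint.h>

#define DECRYPT_ERR_LIMIT 3 /* assumed threshold, purely illustrative */

/* Hypothetical software view of one key cache entry. */
struct key_slot {
    uint8_t key[16];
    unsigned int decrypt_errors; /* consecutive Rx decrypt failures */
    unsigned int replumb_count;  /* how often the key was rewritten */
};

/* Stand-in for writing the key into the hardware key cache again. */
static void replumb_key(struct key_slot *slot)
{
    slot->replumb_count++;
    slot->decrypt_errors = 0;
}

/* Call once per received frame; returns true if a replumb was triggered. */
bool rx_decrypt_status(struct key_slot *slot, bool decrypt_ok)
{
    if (decrypt_ok) {
        slot->decrypt_errors = 0;
        return false;
    }
    if (++slot->decrypt_errors < DECRYPT_ERR_LIMIT)
        return false;
    replumb_key(slot);
    return true;
}
```

Counting consecutive failures rather than total failures is a design choice of this sketch; it keeps a single corrupted frame from forcing a key rewrite.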

Is it maybe an idea for those of you that have run into this issue as well at some point or the other to pitch in on the ath9k-devel list?

Cheers,

Michel

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

Can you point me to that thread? It's very relevant for our deployment as well

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Of course,
https://lists.ath9k.org/mailman/listinfo/ath9k-devel - The list
https://lists.ath9k.org/pipermail/ath9k-devel/2016-July/014676.html - The thread

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

tnx

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

I recently got a mail from Sven Eckelman about a patch which may solve the situation:
https://lkml.kernel.org/r/[email protected]

I have not yet had time to take a look at this, so caveat emptor.

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

thanks.
will try to see if this works in our setup.
please let me know more if you have more information :-)

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

Is that patch being upstreamed?

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

No, it is not. Sven added the following to the message:

The patch itself has (at least) one big problem. It is using some mac80211
internals in ath_key_config_iter to make sure that the uploaded keys were
actually programmed in the hardware. Without this check the keys could end up
in the lower slots and thus break all connections.

So this patch could be a starting point for someone who wants to add a
workaround which is acceptable by upstream.

Here is the original email:
http://www.mail-archive.com/[email protected]/msg14458.html

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

I just contacted Antonio via email, offering help in upstreaming.
Are you willing to help here? Maybe Bob and Alex can participate as well?

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Sure, I was planning to do this somewhere in the coming days. Maybe I can re-use some of the kludge I wrote up to get around the use of internals (unless I am doing that myself as well).

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

shall we continue via direct email? mailings (at) hupie (dot) com

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

After spending 2 weeks on this issue together with Ferry Huberts, we did not get much further.

We tried:

  • Introducing a worker that gets activated on a key_set or whenever a number of decrypt_errors occur. In the former case it does not seem to have any effect, and decrypt errors do not occur often enough for the decrypt_error counter to reach the limit. Setting the limit to 1 does not help either.
  • Introducing the KEYPLUMB_WAR as described in https://github.com/qca/qcamain_open_hal_public/blob/master/hal/ar9300/ar9300_keycache.c (see below). This does not work either: the write and subsequent check indicate that the data is written correctly at the first attempt. The extra xorKey argument it introduces only adds an extra read/write cycle and does not solve the issue.
  • Silencing the chip completely by disabling the interrupts and cancelling the tasks (and waiting for completion). This does not solve the issue and only adds instability.
  • Lastly, scheduling a worker that replumbs the key 2, 5 or 10 seconds after a set_key. This does not help either.

It seems to me that the chip gets very confused when a key is installed while it is processing a lot of traffic. Quieting the chip does not seem to help, unless I did not get it quiet enough.

I have attached the various patches as an example of what was tried.

I lost the patch which quiets the chip prior to keying (accidental delete).
Description: in ath9k_set_key (ath9k/main.c), just before the switch statement, add:
    spin_lock_bh(&sc->sc_pcu_lock);
    ath9k_hw_disable_interrupts(ah);
    tasklet_disable(&sc->intr_tq);
    spin_unlock_bh(&sc->sc_pcu_lock);

Then, after the switch statement, add:
    spin_lock_bh(&sc->sc_pcu_lock);
    tasklet_enable(&sc->intr_tq);
    ath9k_hw_enable_interrupts(ah);
    spin_unlock_bh(&sc->sc_pcu_lock);

Adrian Chadd has suggested in an email to try my original buckshot patch ath9k-install_key-buckshot.diff.txt, but this time reinstall keys after the reset. I will try to find some time and do this, see if it helps.

Cheers,

Michel

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

Yes, I have never seen hardware behave in such a baffling way, and I suspect we will not get any further unless we get more information on what is really going on. Very, very unfortunate.

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Ok. So after a few valiant attempts, I got a little further, but still nowhere close to a working patch.
Resetting the chip, then replumbing the cache seems to be the way to go.

From a bug which was triggered on every rekey, I'm now at a situation where the error usually occurs every couple of minutes. The longest run I got was 500 seconds, rekeying every 60 seconds.
I'm getting the idea that the chip must be reset as quickly as possible after a key insertion, otherwise the chip gets confused.

Another thing which significantly delayed my progress is the sheer amount of locking in the ath9k driver. I needed to grab the rtnl_lock in order to access the key material in mac80211, but sc->mutex also seems to be required. Grabbing both is inviting all sorts of locking issues which usually result in the whole network stack hanging. As a quick hack, I stopped using sc->mutex and grabbed only rtnl_lock (not doing so will cause a BUG_ON every rekey).

Please take a close look at this patch. It is by no means complete or clean yet, so not ready for production.
0002-ath9k-reset-and-replumb-key.patch.txt

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

I do have a fix in authsae (rekeying) but need to test it further

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

My rekey code works well.
The issues we had with ath5k systems were related to ath5k barfing over 'htmode=HT20'.
I've re-opened my rekey PR (#55)

from authsae.

xbing6 avatar xbing6 commented on July 19, 2024

One question, why do you use authsae rather than wpa_supplicant, I'd think wpa_supplicant is much more widely used?

from authsae.

MichelStam avatar MichelStam commented on July 19, 2024

Personally, I had some issues with wpa_supplicant in combination with OpenWRT. Some race condition which prevented either the AP or mesh function from working. Did not have this problem when starting everything manually, just when using the OpenWRT configuration system. Since AuthSAE did work, and seemed more stable at the time I settled for that.

from authsae.

xbing6 avatar xbing6 commented on July 19, 2024

Thanks for your answer.

I was having a race condition with dnsmasq (for Ethernet). Not sure if the link below is of any help.
https://dev.openwrt.org/ticket/7423

from authsae.

zhejunli avatar zhejunli commented on July 19, 2024

Any final solutions on this? I have exactly the same issue in ath9k. Thanks!

from authsae.

erikarn avatar erikarn commented on July 19, 2024

... I think I'm finally hitting it in ath9k at my current employer. Let's see how far down the rabbit hole I go.

from authsae.

erikarn avatar erikarn commented on July 19, 2024

interesting. if I replumb the keys on the receiver side then it doesn't fix things. That's ... odd.

from authsae.

erikarn avatar erikarn commented on July 19, 2024

(so I wonder if there are two bugs here..)

from authsae.

zhejunli avatar zhejunli commented on July 19, 2024

Per my understanding:

  1. It is not specific to Authsae. It is a common issue.
  2. As someone mentioned, it is an ath9k chip h/w bug. I have read this: https://mentor.ieee.org/802.11/dcn/10/11-10-0018-00-000m-4-way-handshake-synchronization-issue.ppt
    Looks like they're kind of related.

from authsae.

erikarn avatar erikarn commented on July 19, 2024

Oh I know it's an ath9k chip bug. :-) That's what I have on my desk atm

from authsae.

zhejunli avatar zhejunli commented on July 19, 2024

I mean, the link shows a key installation deficiency in the 4-way handshake mechanism. Following that idea to solve that key re-installation problem may help to solve this "ath9k chip bug" too.

from authsae.

fhuberts avatar fhuberts commented on July 19, 2024

my rekeying patch (which was merged in #59 ) does that, and works well, except when the chip is under heavy load.

from authsae.

erikarn avatar erikarn commented on July 19, 2024

Yeah, the "heavy load" issue is our issue here. I'll keep digging and see what I can find.

Is your patch focused on rekeying the transmit side key, or the remote/receiver key? (yes the CCMP keys are used for both TX/RX, I'm more interested in which side is doing the replumbing of keys to the HW.)

from authsae.

erikarn avatar erikarn commented on July 19, 2024

ok, yeah. I'm seeing three separate bugs:

  • Somehow mac80211's PN tracking gets messed up on the receive side and it gets set to 1, which means all subsequent frames get rejected as CCMP replay until it reaches the old sequence counter;
  • The receiver sometimes needs replumbing; looks about 10% of the time;
  • The /transmitter/ sometimes needs replumbing, about 50% of the time.
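To make the first bullet concrete, here is a small sketch of the CCMP replay rule under one reading of that bullet: the receiver still holds the old, high packet number (PN) while incoming frames restart at 1. The function and struct names are hypothetical and this is not mac80211 code; it only models the rule that a frame is accepted when its PN is strictly greater than the last accepted one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side replay state for one key (PN is 48 bits). */
struct rx_key_state {
    uint64_t last_pn; /* highest PN accepted so far */
};

/* CCMP replay rule: accept only a strictly increasing PN. */
bool ccmp_rx_accept(struct rx_key_state *rx, uint64_t pn)
{
    if (pn <= rx->last_pn)
        return false; /* treated as a replay and dropped */
    rx->last_pn = pn;
    return true;
}

/* Feed n frames with consecutive PNs starting at first_pn and count
 * how many of them the replay check drops. */
unsigned int count_replay_drops(struct rx_key_state *rx,
                                uint64_t first_pn, unsigned int n)
{
    unsigned int dropped = 0;
    for (unsigned int i = 0; i < n; i++) {
        if (!ccmp_rx_accept(rx, first_pn + i))
            dropped++;
    }
    return dropped;
}
```

With a stale receiver counter of 1000 and a sender that restarts at PN 1, the first 1000 frames are dropped and traffic only resumes once the sender's PN passes the old value, which matches the "rejected until it reaches the old sequence counter" symptom.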

I'm doing a UDP iperf of a few tens of Mbit from an ath9k AP -> ath9k STA (both Peacock / AR9580) to reproduce this. In all cases the right keys make it through to the keycache code. When it breaks, at least whenever I've caught it, the STA can still send frames to the AP which the AP can decrypt, but the frames from the AP can't be decrypted by the STA.

I wonder if the sender side hardware bug will suck less if we just completely pause transmit before doing a rekey (because that's what the rekeying patch seems to detect). The RX side rekey thing that QCA does is a different beast and I think fixes another bit of the problem - I'm not actively TXing (besides ACKs, obviously) from the STA -> AP during the failure mode, so if there's a hardware bug it's likely due to having a packet in receive flight during rekeying.

(Maybe I can experiment with pausing the MAC TX/RX whilst replumbing the key, which would have the added benefit of not ACKing anything during that window..)

from authsae.

erikarn avatar erikarn commented on July 19, 2024

Ok, so bug 1 here was fixed by just disabling PTK rekey. It turns out the data/control path for transmit and mac80211 key management is not really set up for doing seamless PTK rekey, at least on the transmit side, and you can't guarantee to not drop frames on the receive side either. So, I'm not going to do it.

Which means the second and third don't happen.

Now, if someone has some spare time (and maybe me too) I think we should experiment with trialling draining the ath9k station queue so we aren't transmitting anything that can use that keycache entry before we plumb in the new key. It's tricky because mac80211/ath9k aren't set up for that. But I /think/ that'll work around the TX keycache bug.
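The ordering proposed here (pause, drain, plumb, resume) could be sketched roughly as follows. This is a toy model, not ath9k or mac80211 code: the queue struct, the function names, and the in-memory hw_key array standing in for the keycache entry are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 16

/* Hypothetical per-station transmit queue model. */
struct sta_txq {
    unsigned int pending;    /* frames queued or in flight with the old key */
    bool paused;             /* while true, no new frames are accepted */
    uint8_t hw_key[KEY_LEN]; /* stand-in for the hardware keycache entry */
};

/* Queue one frame; refused while the queue is paused for a rekey. */
bool txq_enqueue(struct sta_txq *q)
{
    if (q->paused)
        return false;
    q->pending++;
    return true;
}

/* The hardware reports one frame as transmitted. */
void txq_complete_one(struct sta_txq *q)
{
    if (q->pending > 0)
        q->pending--;
}

/* Step 1 of a rekey: stop feeding frames that would use the old key. */
void txq_pause(struct sta_txq *q)
{
    q->paused = true;
}

/* Steps 2 and 3: write the new key only once the queue is paused and
 * empty, then resume transmission. Returns false if not yet safe. */
bool rekey_when_drained(struct sta_txq *q, const uint8_t *new_key)
{
    if (!q->paused || q->pending > 0)
        return false;
    memcpy(q->hw_key, new_key, KEY_LEN);
    q->paused = false;
    return true;
}
```

The point of the sketch is only the invariant: the keycache entry is never rewritten while a frame that references it is still queued or in flight.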

The RX keycache bug shouldn't be triggered if you're not actively receiving packets to decrypt whilst you're changing the key - which you can guarantee if your AP is not doing stupid crap (ie, has this bug fixed) but you can't otherwise; that particular one will benefit from the keycache plumb hack from QCA. However, and here's the rub - it doesn't seem to work reliably if you're constantly hitting it with a stream of packets. It really needs to sneak in when no active RX is being done for that keycache entry.

from authsae.

alexw65500 avatar alexw65500 commented on July 19, 2024

I've found the same issue with rekeying a PTK under load and am currently trying to upstream a fix for that. In fact there are different ways in which normal kernels can mess up the PN, but I think we found them all now.
I've not yet tested it with mesh networks but I'm quite confident the current version of the patch will fix the issue for ath9k mesh also. And I would like to get more feedback on the patch, so if you want to test the current version here it is:
https://patchwork.kernel.org/project/linux-wireless/list/?series=8045&state=*

You will get warnings that the userspace (wpa_supplicant) is requesting rekeys while it should not which will need patches to either ath9k or wpa_supplicant which simply are not available, yet.
But the "fallback" path printing this warning is working quite nicely with an ath9k AP and I expect it to do the same for ath9k in a mesh.

When testing the patch I suggest you also make sure you have this fix applied:
https://patchwork.kernel.org/patch/10399613/
While I believe ath9k will not trigger this race, the obvious symptoms are pretty much the same, so better make sure it's fixed too.

Edit: updated link

from authsae.
