faucetsdn / chewie

A python 802.1x daemon

License: Apache License 2.0

Python 95.68% Shell 2.74% Makefile 1.58%

chewie's Introduction

Chewie

Chewie - EAPOL / 802.1x

Chewie is an EAPOL/802.1x implementation in Python. It is designed to work on its own, but primarily as a module for the Faucet Project, an open-source SDN controller implemented in Python.

Supported Features:

PEAP
MD5-SUM
TLS
TTLS

Configuration

Credentials for Chewie are configured on the RADIUS server. With the default configuration, they can be found in the etc/freeradius/users file.

The default credentials for the username and password are user and microphone respectively. Example authentication certificates for TLS / TTLS / PEAP have been provided in the etc folder.

NOTE: These are self-signed certificates

Getting Started:

Getting started with Chewie is as easy as starting a docker-compose network, as described below. The requirements for running Chewie are defined in the Dockerfile.chewie file, with the pip dependencies defined in the requirements.txt and test-requirements.txt files.

Docker / Docker-Compose:

Setup

If needed, installation instructions for Docker and Docker-Compose can be found on the official Docker website or by following the links provided.

Starting a Docker-Compose

To run Chewie in the Docker-Compose Environment:

docker-compose up --build

To Stop and Clean Up the Docker Environment

docker-compose down

Questions / Bugs

If there are any questions or bugs found please report them to the Chewie project via the issue link. This can be found at https://github.com/faucetsdn/chewie/issues


chewie's Issues

Output Graphs for the State Machines

It would be handy to have diagrams in the docs, built from the state machines using the pytransitions GraphMachine functionality.

https://github.com/pytransitions/transitions

RADIUS integration

Chewie currently has hardcoded credentials.

Let's add the ability to do a RADIUS lookup at auth time

Criteria

Unit tests where possible
An integration test harness (currently there's deploy_test.sh to fire up chewie, deploy_wpasupplicant.sh to deploy a client, we should add a deploy_radius.sh to deploy a radius container that we can test against)
A test case (not necessarily automated) that shows wpasupplicant authing against chewie and succeeding/failing based on a radius lookup

RADIUS Proxy

I've had a look into how we could use Chewie and Faucet with a wireless access point running WPA2 Enterprise.

A TP-Link Archer C7 running OpenWRT, OVS, and hostapd was used, see the 'Radius from Access Point' below for how it was configured.

What/Why

Forward RADIUS Packets that come from another Authenticator to the Authentication Server (RADIUS Server).
Possible way to have Faucet/Chewie control a Wireless Access Point - apply ACLs on the Access Point.
A naive approach would be to just blindly forward on what is received, however:
- This would not work when proxying for multiple authenticators (the radius.ID is going to be different).
  This would not be an issue if there was a different socket pair for each authenticator.
- Can't have separate RADIUS Secrets between Authenticator and proxy (Chewie), and proxy (Chewie) and RADIUS Server.
- Blindly forwarding allows an attacker to impersonate either the RADIUS Server or the authenticator,
  and the proxy would pass on any mess it received.

How

Basic process for proxying is:

From Authenticator to RADIUS Server:

RADIUS Request (RReq) received from Authenticator.
Validate RReq.message-authenticator.
Read any RADIUS Attributes e.g. Username, Calling Station (MAC Address), Called Station (SSID/Port).
Replace RReq.id and RReq.request-authenticator.
Recalculate RReq.message-authenticator.
Send RReq to RADIUS Server.

From RADIUS Server to Authenticator:

RADIUS Response (RRes) received from RADIUS Server.
Validate RRes.response-authenticator, and RRes.message-authenticator.
Read any RADIUS Attributes e.g. Filter ID, Username, Calling Station.
Check the packet Code (Success/Reject).
Replace RRes.id with the original RReq.id that this RRes is a response to.
Recalculate RRes.response-authenticator and RRes.message-authenticator.
Send RRes to Authenticator.
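
The request half of the process above could be sketched roughly as follows. This is only an illustration and not code from Chewie: the function names are made up, it assumes a separate shared secret on each side of the proxy, and it assumes the Message-Authenticator is the HMAC-MD5 of the whole packet with the attribute value zeroed (RFC 3579).

```python
import hashlib
import hmac
import os
import struct

HEADER_LENGTH = 20          # Code, ID, Length, 16-byte Authenticator
MESSAGE_AUTHENTICATOR = 80  # attribute type from RFC 3579


def message_authenticator_offset(packet):
    """Return the offset of the Message-Authenticator value, or None."""
    (length,) = struct.unpack("!H", packet[2:4])
    offset = HEADER_LENGTH
    while offset + 2 <= min(length, len(packet)):
        attr_type, attr_length = packet[offset], packet[offset + 1]
        if attr_length < 2:
            raise ValueError("malformed RADIUS attribute")
        if attr_type == MESSAGE_AUTHENTICATOR and attr_length == 18:
            return offset + 2
        offset += attr_length
    return None


def sign_request(packet, secret):
    """Recalculate Message-Authenticator over the packet with its value zeroed."""
    value_at = message_authenticator_offset(packet)
    if value_at is None:
        raise ValueError("no Message-Authenticator attribute")
    zeroed = packet[:value_at] + bytes(16) + packet[value_at + 16:]
    digest = hmac.new(secret, zeroed, hashlib.md5).digest()
    return packet[:value_at] + digest + packet[value_at + 16:]


def request_is_valid(packet, secret):
    """True if the packet carries a correct Message-Authenticator."""
    try:
        return hmac.compare_digest(sign_request(packet, secret), packet)
    except (ValueError, struct.error):
        return False


def forward_request(packet, inbound_secret, outbound_secret, new_id):
    """Validate, replace id and request-authenticator, then re-sign."""
    if not request_is_valid(packet, inbound_secret):
        raise ValueError("Message-Authenticator check failed; dropping packet")
    rewritten = (packet[:1] + bytes([new_id]) + packet[2:4]
                 + os.urandom(16) + packet[HEADER_LENGTH:])
    return sign_request(rewritten, outbound_secret)
```

One caveat with this approach: a User-Password attribute is hidden using the request-authenticator, so a request carrying one would need that attribute re-encoded as well. EAP requests do not carry User-Password, so the sketch ignores it.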

What needs to be done

Add 4 new sockets.
RADIUS Accounting needs minimal support so we can be notified of log offs.
2 (Auth & Accounting) for interface to the authenticators.
2 (Auth & Accounting) for the RADIUS Server - keeping this separate from the one forwarding the normal mode's EAP RADIUS packets will make things simpler and hopefully easier to maintain (different lifecycles).

Add tracking of the original request.id, the request.request-authenticator, and which authenticator the request came from, to be used on the reply.
Also track the mapping from the original request.id to the new (forwarded) request.id.
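
A minimal sketch of that tracking, assuming a single socket towards the RADIUS Server so that forwarded ids have to be unique across all authenticators. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingRequest:
    authenticator_address: tuple       # (ip, udp_port) the request came from
    original_id: int                   # request.id chosen by the authenticator
    original_request_authenticator: bytes


class ProxyRequestTable:
    """Maps forwarded request ids back to the request they replaced."""

    def __init__(self):
        self._pending = {}
        self._next_id = 0

    def track(self, address, original_id, request_authenticator):
        """Remember a request and return the id to forward it with."""
        for _ in range(256):
            candidate = self._next_id
            self._next_id = (self._next_id + 1) % 256
            if candidate not in self._pending:
                self._pending[candidate] = PendingRequest(
                    address, original_id, request_authenticator)
                return candidate
        raise RuntimeError("all 256 RADIUS ids are in flight")

    def resolve(self, forwarded_id):
        """Look up (and forget) the request a reply belongs to."""
        return self._pending.pop(forwarded_id)
```

Two authenticators that happen to pick the same request.id get different forwarded ids, and resolve() returns everything needed to rebuild the reply for the right one.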

Configuration options.

Appendix/General Notes

EAP from Access Point

Using a TP-Link Wireless Router running OpenWRT with OVS, I was unable to get hostapd to take EAP packets that were generated by Chewie and pass them on to the supplicant.
It appears that hostapd would intercept them and not forward them from Chewie to the supplicant,
or not forward them as they are not successfully connected.

RADIUS from Access Point

A patch exists to allow hostapd to connect to an OVS bridge - https://forum.archive.openwrt.org/viewtopic.php?id=59129,
~~but I've so far been unable to compile it with the openwrt toolchain.~~ I've now got this patch working except there is a problem getting the name of the ovs bridge to add the interface to from /etc/config/wireless and putting it in the hostapd config is not working (see next comment).

~~Configured a TPLink Wireless Router running OpenWRT with OVS so that RADIUS Packets that Hostapd sends are put onto the dataplane, along with the users dataplane traffic.~~
~~Hostapd attaches the uplink to a Linux Bridge, which in turn is connected to the OVS bridge via a veth pair.~~
~~This leaves hostapd in control of the Wireless interface and can use WPA(2) (Enterprise), unencrypted traffic is then put on the dataplane via the Linux Bridge.~~
~~Two hosts on the same SSID should not be able to ping each other unless Faucet allows it (see AP Isolate below).~~

~~Note:~~
~~Not sure if/how this will work with another OpenFlow AP (e.g. An Allied Telesis AP), as I don't have access to one yet.~~

                                         +-------+
                                         |ETH0   |
                                +------+ |Control|
                                | ETH1 | |Plane  |
+-------------------------------------------------+
|                               +------+ +-------+|
|                                   |       .     |
|                                   |       .     |
|                                   |       .     |
|       +-----------+          +----+-----+ .     |
|       | Linux     |veth1     |          | .     |
|       | Bridge    +----------+   OVS    +^.     |
|       |           |     veth2|          |       |
|       +-----+-----+          +----------+       |
|             |                                   |
|             |                                   |
|       +-----+-----+                             |
|       |           |                             |
|       | Hostapd   |                             |
|       |           |                             |
|       +----+------+                             |
|            |                TP-Link Archer C7   |
|       +-------+                                 |
+-------------------------------------------------+
        | WLAN0 |
        +---+---+
            |
            |
            |
        +---+----+
        |Client  |
        |Android |
        +--------+

AP Isolate

~~Hostapd needs to be set to AP Isolate mode for Faucet to perform ACLs (we want all traffic to go via OVS).~~

However, despite being allowed to (minimal config with no ACLs), and with ap_isolate and hairpin enabled, the two hosts on the same SSID cannot ping each other, but they can ping the outside world.
(I vaguely remember a while ago (> 1 year) that they should be able to, Mohammed also says he used this successfully.)
~~With hairpin on, hosts fail to obtain an IP address as well.~~

There is an arp response sent by host B, but A never receives the response.
It is being dropped after Faucet forwards it back toward A - (veth1 & linux bridge show the packet the second time. wlan0 does not)

~~With ap_isolate disabled, looks like the bridging is being done by the bridge br-wifi.~~

Multi SSID/Radios?

~~Create another linux bridge like above and connect it to ovs on a different port.~~
~~RADIUS Packets all come from the first SSID's bridge, not the one associated with the SSID the client is on.~~

OpenWRT Packages used (list may be incomplete)

bridge
hostapd
veth
openvswitch

Logoff

RADIUS Accounting sends a 'session stop' request when the user disconnects.
There are various reasons in 'Acct-Terminate-Cause'
(Idle-Timeout (occurred when I moved away), User-Request (when I turned off wifi on phone), ...)

Implement MAB

Implement MAB when EAPOL times out. Needed for IoT devices.

Add Checks and logging for Chewie Socket Setup

Change the focus of EAP state machine

There are a couple of improvements we could make here:

Build the test suite around outcomes (unit test should not care what internal states get transitioned, especially with UCTs)
Move to a state machine framework (pytransitions, pysm etc) so we don't need to think about testing the state machine itself
Rename the variables to match the convention rather than the exact variable names in the RFC
Consider moving away from the RFC state machine in places where we still get the same results (hence starting with the test suite) but have cleaner ways of writing this - for example, the eapFail/eapSuccess/eapTimeout flags are superfluous - we can infer them from the state

In some cases, states are replayed when they receive unexpected events in EAP State Machine

chewie/chewie/state_machines/eap_state_machine.py

Lines 885 to 890 in 0703e71

if isinstance(message, EapolStartMessage) or \
        (self.state in (FullEAPStateMachine.TIMEOUT_FAILURE,
                        FullEAPStateMachine.TIMEOUT_FAILURE2) and
         isinstance(message, EapMessage) and message.code == Eap.RESPONSE
        ):
    self.eap_restart = True

Improve internal message routing

There's a lot of the following pattern:

if isinstance(x, Class1):
  do_something(x)
elif isinstance(x, Class2):
  do_something_else(x)

This started as a quick way to get a prototype up and running but we need to find a better pattern for this. Python isn't the nicest language for this, but even moving to a dictionary of {class: method} would be an improvement
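
As a rough illustration of the {class: method} idea, with made-up message classes standing in for Chewie's real ones:

```python
class EapolStart:
    """Hypothetical stand-in for a parsed EAPOL-Start message."""


class EapolLogoff:
    """Hypothetical stand-in for a parsed EAPOL-Logoff message."""


class MessageRouter:
    def __init__(self):
        # One table replaces the isinstance() chain.
        self.handlers = {
            EapolStart: self.on_start,
            EapolLogoff: self.on_logoff,
        }

    def route(self, message):
        handler = self.handlers.get(type(message))
        if handler is None:
            raise TypeError(f"no handler for {type(message).__name__}")
        return handler(message)

    def on_start(self, message):
        return "start"

    def on_logoff(self, message):
        return "logoff"
```

Note that looking up type(message) only matches the exact class, unlike isinstance(), so subclasses would need their own entries. functools.singledispatchmethod from the standard library is another option that does honour inheritance.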

chewie.py is not really tested here.

#39 adds some tests, but more would be good.

Look into mocking the socket operations.
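
Something along these lines might work for the socket mocking, using unittest.mock from the standard library. PacketReader is a hypothetical stand-in, not a class in chewie.py.

```python
from unittest import mock


class PacketReader:
    """Hypothetical stand-in for the code that reads frames off a socket."""

    ETHERNET_HEADER_LENGTH = 14

    def __init__(self, sock):
        self.sock = sock

    def read_one(self):
        frame = self.sock.recv(4096)
        header = frame[:self.ETHERNET_HEADER_LENGTH]
        payload = frame[self.ETHERNET_HEADER_LENGTH:]
        return header, payload


# The fake socket never touches the network; recv() returns canned bytes.
fake_socket = mock.Mock()
fake_socket.recv.return_value = bytes(14) + b"\x01\x01\x00\x00"

reader = PacketReader(fake_socket)
header, payload = reader.read_one()
fake_socket.recv.assert_called_once_with(4096)
```

Because the socket is passed in rather than created inside the object, no patching of the socket module is needed.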

Set up Codecov

Set up code coverage reports in Codecov like in Faucet.
@gizmoguy

Docker infrastructure is broken and packaging has some issues

I would make a separate pull request (dealing with at least the docker stuff) but I would like some clarifications first if possible. I believe right now chewie won't run from the current setup because either the packaging or something to do with the sys path has some issues.

If you clone the repo and try to run main.py, it will throw this error:

Running docker-compose up --build, or creating the image with the command below from the documentation, results in the same problem.

docker build -t chewie_image -f Dockerfile.chewie .

Now to fix this temporarily on my machine, I've added the below line into my main.py file.

sys.path.insert(0, "/home/snakamura/code/chewie")

Once I do this I can run it natively fine from only that directory.

Clearly this is a terrible temp fix so it would be really great if I could have some guidance on fixing this issue. Thank you!

Coding standards

We're getting to the point where we need to put some sort of standards together to help code reviewers and protect the codebase so it's easily extensible.

Open for ideas on this, some general thoughts to get it going:

Use tests as your rule of thumb - if it's hard to test then the interfaces probably need a rethink
Test your use cases and name accordingly - you want to test individual paths and edge cases so focus the tests on object_gets_this_and_does_that rather than test_method_name
Aim for 90% test coverage - the goal of testing is to check your code against your specification, so that needs to be the focus. There will be times where a method or a use case is hard to test, and it's much better to have a caveat and an untested method than a bad test that merely replicates the code.
Don't mock the system under test - chances are this should be extracted into its own object
Don't test private methods - if other objects aren't calling this method then you shouldn't be either.
Don't DRY too early - extending an existing class can often be useful, but it also commits you to an abstraction that might not make sense. 2 copies of code is fine, 3 is a little iffy, and 4 is the point where you should be thinking about extracting a common class and extending from there.
No more than 5 lines of code per method (a little insane, I expect this to be broken fairly regularly but it should always be with good reason - check out https://robots.thoughtbot.com/sandi-metz-rules-for-developers for some more thought on this)
Names have meaning - no single-letter variables, no truncating words unnecessarily (recv, msg etc don't save that much typing)

Renew/Expire authentications

After X time log supplicant off (notify logoff handler).

Send Identity Request on port up.

PR #40 adds the interface for port status changes from Faucet.

It also sends Identity Requests for supplicants that already have a state machine on the affected port.
However, if there is a supplicant without a sm on the port they will not process the request (it's not addressed to them).

MAB does not support multiple attempts on the same MAC address.

MAB State machine needs to have restart enabled on access denied and on receiving unexpected events.

Current MAB does not support multiple attempts on the same MAC address.

Timer scheduler isn't working.

Used for retransmitting potentially lost packets.

Chewie tests claim it's working ([test] not failing).

Problem with Eventlet?

Add Integration Tests for Chewie

RADIUS Attribute 'State' should be cleared on new authentications

Basically, if the server sends the State attribute we need to return it with the next Access-Request unmodified, unless it is a new authentication attempt.

https://tools.ietf.org/html/rfc2865#section-5.24

https://tools.ietf.org/html/rfc5080#section-2.1.1

I think this will become a problem when an authentication is interrupted (no Access-Accept/Reject received)
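
A small sketch of the bookkeeping this implies. The class and the dictionary-shaped attributes are hypothetical simplifications, not Chewie's real types.

```python
class RadiusSession:
    """Tracks the State attribute for one supplicant's authentication."""

    def __init__(self):
        self.state = None

    def start_new_authentication(self):
        # A fresh attempt must not carry State left over from an
        # interrupted exchange (no Access-Accept/Reject received).
        self.state = None

    def on_access_challenge(self, attributes):
        # Keep the value exactly as the server sent it.
        self.state = attributes.get("State")

    def attributes_for_next_request(self):
        if self.state is None:
            return {}
        return {"State": self.state}
```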

When pip3 installing requirements for Faucet, chewie fails because it can't find the LICENSE file

  Copying chewie.egg-info to build/bdist.linux-x86_64/wheel/chewie-0.0.12-py3.5.egg-info
  running install_scripts
  error: [Errno 2] No such file or directory: 'LICENSE'
  
  ----------------------------------------
  Failed building wheel for chewie

Could verify that all EAP packets in a sequence come from the same port.

If we have 2 hosts with the same MAC (e.g. 1 good, 1 malicious) and they both try to authenticate, it could be possible to authenticate on the malicious port when the good host is successful: if the malicious host sends an EAP packet before the RADIUS Access-Accept has been received, it sets the sm.port_id_mac.

Basically we want to stop this. 2 obvious ways.

Make a state machine tied to a mac and port. So in the above case there would be 2 state machines.
Some sort of verification logic.

Option 1 looks easier to implement.
It will also handle the case of 2 good hosts: both would auth.
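
Option 1 could look something like this, with a hypothetical registry that keys state machines on (MAC, port) instead of MAC alone:

```python
class StateMachineRegistry:
    """Hands out one state machine per (source MAC, port) pair."""

    def __init__(self, factory):
        self._factory = factory   # callable(src_mac, port_id) -> state machine
        self._machines = {}

    def get(self, src_mac, port_id):
        key = (src_mac.lower(), port_id)
        if key not in self._machines:
            self._machines[key] = self._factory(src_mac, port_id)
        return self._machines[key]
```

With this, an EAP packet arriving on a second port with the same MAC starts its own state machine rather than overwriting the port recorded on the first one.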

De-authenticate successful auths after period

period since last successful auth.

NFV mode

Currently set up for the multicast group, but needs to be slightly different to do NFV

Make socket promiscuous instead of on the multicast group (easy)
Remember the dest_mac we receive and return packets to that dest_mac instead of the multicast group

Build docs on readthedocs

support for switches with OF port numbers > 255

A vendor reports that they use a chassis slot number based scheme to assign OF port numbers, so they have a switch with OF port numbers > 255. It would be good to support 2 bytes worth of OF port.

Replace build_byte_string() with bytes.fromhex()

Does anyone have any objection to me submitting a PR that replaces netils.build_byte_string with bytes.fromhex()?

There's no need to support Python 2.x, and bytes.fromhex() will do the right thing.

netils.build_byte_string doesn't deal with certain edge cases correctly...

>>> import netils
>>> netils.build_byte_string("123")
b'\x12'
>>> netils.build_byte_string("12 3")
b'\x12\x03'
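
For comparison, a quick way to see how bytes.fromhex() treats the same inputs. The parse_hex wrapper is only there for the demonstration.

```python
def parse_hex(text):
    """Return the decoded bytes, or None if bytes.fromhex() rejects the input."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        return None


print(parse_hex("12 03"))  # complete pairs, spaces allowed between bytes
print(parse_hex("123"))    # odd number of hex digits
print(parse_hex("12 3"))   # lone trailing digit
```

bytes.fromhex() requires two hex digits per byte, so the last two inputs raise ValueError instead of being silently truncated or padded.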

Thanks.

Exception handling with MessageParser

There are times we raise exceptions during packet parsing which should result in the processed packet being dropped, and Chewie can safely continue processing other packets.

Exceptions (and example of when raised) include:

KeyError: unknown EAP type.
UnicodeDecodeError: Illegal bytes in the EAP.identity field.
ValueError: unknown EAP code.
struct.error: the payload doesn't contain enough bytes to unpack.

Should all of these be caught by the caller, or a new exception created (e.g. InvalidPacketException)?
A new exception can still contain the original reason.

Problem if we catch them all, the same exception could be raised somewhere else, that we might not expect (e.g. KeyError & ValueError are reasonably common errors for other parts of python to raise with) and it will basically be swallowed.

Does anyone have any thoughts on this?
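
One possible shape for the second option, where only the parser call is wrapped so that a KeyError or ValueError raised elsewhere is not swallowed. parse_identity is a toy parser made up for the example; InvalidPacketException is the name suggested above.

```python
import struct


class InvalidPacketException(Exception):
    """The packet could not be parsed and should be dropped."""


PARSE_ERRORS = (KeyError, UnicodeDecodeError, ValueError, struct.error)


def parse_identity(payload):
    """Toy parser: 2-byte length followed by a UTF-8 identity."""
    (length,) = struct.unpack("!H", payload[:2])
    return payload[2:2 + length].decode("utf-8")


def safe_parse(payload):
    try:
        return parse_identity(payload)
    except PARSE_ERRORS as error:
        # "from error" keeps the original reason available as __cause__.
        raise InvalidPacketException("dropping malformed packet") from error
```

The caller then catches InvalidPacketException alone, logs error.__cause__, and carries on with the next packet.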

Add travis for automatic pypi building

Port status handler

Have faucet pass port status changes (on dot1x ports) onto chewie

Add gitlab ci for automatic deb package building

Statemachine logging isn't working with Faucet

Investigate. It is working when running standalone.

Repackage with PBR

Repackage chewie with https://docs.openstack.org/pbr/

Concat datatype incorrectly assumes data is an eap message

The Concat RADIUS datatype assumes the data is going to be an EAP message. This isn't necessarily correct.

Preemptive Identity Requests are sent when a port is flagged as a MAB port

Shard Tests for Travis

Tests are now taking upwards of 10 minutes due to the integration tests.

Follow @Faucet as an example for travis and split tests into smaller components.

Architect-out the circular dependencies in Chewie

Architect-out the circular dependencies in Chewie
https://github.com/faucetsdn/chewie/blob/master/chewie/radius_attributes.py#L303-L321

usage and basic instructions

I'd like to get started on trying this out when it's ready. Are there some basic instructions for using this?

Add Pylintrc to Ignore Generated Libraries (such as Eventlet.green)

https://stackoverflow.com/questions/20553551/how-do-i-get-pylint-to-recognize-numpy-members

Fuzz parsers

AFL fuzz the EAP & RADIUS parsers

Currently working on this

If Message-Authenticator doesn't exist we should still check the request_authenticator

https://github.com/faucetsdn/chewie/blob/master/chewie/radius.py#L151

correct logic should be:

if message_authenticator:
    validate message_authenticator
if request_authenticator:
    validate request_authenticator

In other words, if Message-Authenticator doesn't exist we still check the request_authenticator.
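
As a sketch of that logic for a reply packet, assuming the caller already knows the request authenticator of the matching request and where (if anywhere) the Message-Authenticator value sits. The function and parameter names are made up and this is not the code in radius.py. The digests follow RFC 2865 (Response Authenticator) and RFC 3579 (Message-Authenticator).

```python
import hashlib
import hmac


def validate_response(packet, secret, request_authenticator,
                      message_authenticator_at=None):
    """Raise ValueError if either authenticator check fails.

    message_authenticator_at is the offset of the 16-byte
    Message-Authenticator value, or None when the attribute is absent.
    """
    if message_authenticator_at is not None:
        end = message_authenticator_at + 16
        zeroed = (packet[:4] + request_authenticator
                  + packet[20:message_authenticator_at] + bytes(16)
                  + packet[end:])
        expected = hmac.new(secret, zeroed, hashlib.md5).digest()
        if not hmac.compare_digest(expected,
                                   packet[message_authenticator_at:end]):
            raise ValueError("bad Message-Authenticator")

    # This check runs whether or not Message-Authenticator was present.
    expected = hashlib.md5(
        packet[:4] + request_authenticator + packet[20:] + secret).digest()
    if not hmac.compare_digest(expected, packet[4:20]):
        raise ValueError("bad Response Authenticator")
```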

Add a facade to abstract away calls to eventlet

Carrying on from the discussion in #70:

Changing put to put_nowait sounds like the correct thing to do in any case so I'd be happy to take that in. There's also an argument for making Chewie (and Beka, given the main message pumps are almost identical) be passed some sort of facade that means we can swap in a greenlet/coroutine system at runtime and nobody is any the wiser.

I'm in the middle of trying to decouple some of the radius code and for part of that I think I'm going to make a ChewieFactory, and that will probably inject an EAP socket, Radius socket, Radius lifecycle, and some sensible default for concurrency - that way the asyncio fork can be a one line change of from concurrency.asyncio import facade instead of from concurrency.greenlet import facade
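
To make the idea a bit more concrete, here is a rough sketch using standard-library threads as the backing implementation. All names are hypothetical; an eventlet or asyncio version would expose the same three methods.

```python
import queue
import threading
import time


class ThreadFacade:
    """One possible concurrency facade, backed by threads."""

    def spawn(self, function, *args):
        thread = threading.Thread(target=function, args=args, daemon=True)
        thread.start()
        return thread

    def make_queue(self):
        return queue.Queue()

    def sleep(self, seconds):
        time.sleep(seconds)


class MessagePump:
    """Only talks to the facade, never to a concurrency library directly."""

    def __init__(self, concurrency):
        self.concurrency = concurrency
        self.inbox = concurrency.make_queue()
        self.seen = []

    def run_once(self):
        self.seen.append(self.inbox.get())
```

Swapping the concurrency system then means passing a different facade object into MessagePump, with no other code changes.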

What are your thoughts on this?

Move Pre-emptive requests and reauthentication jobs out of Chewie

There are too many timer-based async actions running inside the Chewie main file, making it difficult to read.

[ ] Move port reauthentication and session management out of Chewie.
[ ] Move pre-emptive identity request generation and management out of Chewie (probably into a port SM or EAP SM)

Change set_port_status to be a Property

The set_port_status function in Chewie.py does not use properties to control access. Instead, the value of port_status is public, and there is also a public function intended to be used to drive the events. This can leave the state machines that are subscribed fall out of sync with Chewie.

Change to a property and enforce a clean sub/observer pattern.
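
A minimal sketch of what that could look like, with a hypothetical Port class rather than Chewie's actual port_status handling:

```python
class Port:
    """Port status behind a property, with subscribers notified on change."""

    def __init__(self):
        self._status = False
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        if value == self._status:
            return                # no change, nothing to announce
        self._status = value
        for callback in self._observers:
            callback(value)
```

Since the setter is the only way to change the value, subscribers cannot miss an update the way they can when port_status is written to directly.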

Move radius logic out of Chewie

The radius serialising/deserialising code mirrors the EAP code but there are enough differences that they should probably follow different patterns.

One of the things with the EAP code is that it's stateless so can be done out of a bunch of static class methods, whereas the radius deserialisation needs things like the packet_id_to_request_authenticator lookup. Passing a callback two layers through a method call is a red flag and a sign that this function probably lives closer to the deserialiser.

Going from this

chewie -> radius packer/unpacker (with callbacks back to chewie)

To this

chewie -> radius lifecycle -> radius packer/unpacker

The lifecycle-packer/unpacker relationship could be done a number of ways, but the key is for that to be its own object that Chewie doesn't know or care about.
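
One way the lifecycle object could be arranged, as a sketch only. The pack and unpack callables are placeholders for whatever the real packer/unpacker interface ends up being.

```python
class RadiusLifecycle:
    """Owns the request bookkeeping so Chewie does not have to."""

    def __init__(self, pack, unpack):
        self._pack = pack        # callable(packet_id, authenticator, payload)
        self._unpack = unpack    # callable(data, request_authenticator)
        self.packet_id_to_request_authenticator = {}

    def send(self, packet_id, request_authenticator, payload):
        self.packet_id_to_request_authenticator[packet_id] = request_authenticator
        return self._pack(packet_id, request_authenticator, payload)

    def receive(self, packet_id, data):
        # The lookup lives next to the deserialiser: no callback into Chewie.
        request_authenticator = self.packet_id_to_request_authenticator.pop(packet_id)
        return self._unpack(data, request_authenticator)
```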

Configuration options

At least these should be configurable:

chewie.radius_secret
chewie.chewie_id (maybe faucet's id, dpid??)

and maybe these:

FullEAPStateMachine.DEFAULT_TIMEOUT
chewie.radius_udp_port (may be useful for testing)
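
These could be gathered into a single object, for example as below. The class name, field names and the timeout default are placeholders for illustration; 1812 is the standard RADIUS authentication port.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChewieConfig:
    radius_secret: str
    chewie_id: str
    radius_udp_port: int = 1812          # standard RADIUS authentication port
    eap_timeout_seconds: float = 30.0    # placeholder, not Chewie's real default


def load_config(values):
    """Build a config from a dict, rejecting options we don't know about."""
    known = {field.name for field in fields(ChewieConfig)}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return ChewieConfig(**values)
```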

Support more (all) RADIUS attributes

Chewie will currently throw an error if it receives a RADIUS attribute we don't know about.

Send NAS-Port in Access-Request

RFC 2865: "... Either NAS-Port or NAS-Port-Type (61) or both SHOULD
be present in an Access-Request packet, if the NAS differentiates
among its ports."

We only send NAS-Port-Type, but we now know where the packet came from, so can also send NAS-Port.

We have 4 octets - [n/a, n/a, dp, port] would match the current implementation of the mac encoded port-id.
00:00:00:00:dp:port
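
The [n/a, n/a, dp, port] layout could be packed like this. encode_nas_port is a made-up helper name.

```python
import struct


def encode_nas_port(dp_id, port_no):
    """Pack dp and port into the 4-octet NAS-Port value [0, 0, dp, port]."""
    for name, value in (("dp_id", dp_id), ("port_no", port_no)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} {value} does not fit in one octet")
    return struct.pack("!BBBB", 0, 0, dp_id, port_no)
```

With one octet each, this layout cannot represent a dp or port number above 255, so the helper rejects those rather than truncating them.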

Test Chewie with mocking and calls to the individual message consuming/producing methods

#69 gets rid of the threads from the tests, next step is to totally mock out eventlet here.

I had a bit of a play and couldn't quite get it done (the way coroutines are declared in python is weird but eventlet/gevent/asyncio figure out how to do this), but what we need is basically to build our own hub with no scheduler, and have a method to step forward and bump everything once. We'll need to mock the calls to sleep and queue read etc so that they don't block, and ultimately we can step them forward until they're waiting on queues and then make the tests look like get_to_block -> add_message_to_queue -> step_forward -> assert_output
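
To show the shape of the idea, here is a toy version using plain generators instead of eventlet. StepHub and echo_worker are invented for the example and say nothing about how eventlet's hub works internally.

```python
from collections import deque


class StepHub:
    """A hub with no scheduler: tasks only advance when step() is called."""

    def __init__(self):
        self.tasks = []

    def spawn(self, generator):
        self.tasks.append(generator)

    def step(self):
        """Bump every task once, up to its next yield."""
        for task in list(self.tasks):
            try:
                next(task)
            except StopIteration:
                self.tasks.remove(task)


def echo_worker(inbox, outbox):
    """Toy message consumer: yields instead of blocking on an empty queue."""
    while True:
        if inbox:
            outbox.append(inbox.popleft().upper())
        yield


hub = StepHub()
inbox, outbox = deque(), []
hub.spawn(echo_worker(inbox, outbox))

hub.step()             # get_to_block: nothing queued, nothing produced
inbox.append("hello")  # add_message_to_queue
hub.step()             # step_forward
```

After the second step the test can assert on outbox directly, with no threads or sleeps involved.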

Go through TODO's and Clean up Code

Other EAP types

Only MD5 and TTLS are currently supported

Make the chewie tests only test chewie

The current Chewie tests end up re-testing the state machine.

We should mock out any references to the state machine and test just chewie's functionality. That is, make sure it passes messages to and from the state machines correctly and responds to port status messages from faucet. It's going to be somewhat abstract, but that's the correct way to test an object like this.
