kisom / pypcapfile

Pure Python library for handling libpcap savefiles.

Home Page: http://kisom.github.com/pypcapfile

License: ISC License

Python 100.00%

pypcapfile's Introduction

pypcapfile

pypcapfile is a pure Python library for handling libpcap savefiles.

Installing

The easiest way to install is from PyPI:
sudo pip install pypcapfile
Note that for pip, the package name is pypcapfile; in your code you will need to
import pcapfile.
Alternatively, you can install from source. Clone the repository, and run setup.py with
an install argument:
git clone git://github.com/kisom/pypcapfile.git
cd pypcapfile
./setup.py install
This does require the Python distutils to be
installed.

Introduction

The core functionality is implemented in pcapfile.savefile:

>>> from pcapfile import savefile
>>> testcap = open('test.pcap', 'rb')
>>> capfile = savefile.load_savefile(testcap, verbose=True)
[+] attempting to load test.pcap
[+] found valid header
[+] loaded 11 packets
[+] finished loading savefile.
>>> print(capfile)
little-endian capture file version 2.4
microsecond time resolution
snapshot length: 65535
linklayer type: LINKTYPE_ETHERNET
number of packets: 11
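
For orientation, the values printed above (byte order, version, snapshot length, link-layer type) correspond to fields of the 24-byte libpcap global header. The following is a minimal sketch that reads those fields with the standard struct module, independent of pypcapfile. The helper name parse_global_header is hypothetical and only microsecond-resolution files are handled; it is not how the library itself is implemented.

```python
import struct

# Magic number of a microsecond-resolution libpcap savefile.
PCAP_MAGIC = 0xA1B2C3D4


def parse_global_header(header):
    """Parse the 24-byte libpcap global header into a dict.

    Hypothetical helper for illustration; not part of pypcapfile.
    """
    if len(header) < 24:
        raise ValueError("global header must be at least 24 bytes")
    # Try both byte orders; the one that yields the magic number wins.
    for prefix, name in (("<", "little-endian"), (">", "big-endian")):
        (magic,) = struct.unpack(prefix + "I", header[:4])
        if magic == PCAP_MAGIC:
            break
    else:
        raise ValueError("not a microsecond-resolution pcap header")
    # Remaining fields: version major/minor, timezone offset,
    # timestamp accuracy, snapshot length, link-layer type.
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        prefix + "HHiIII", header[4:24]
    )
    return {
        "byte_order": name,
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,
    }


# Build a header matching the output above: little-endian, version 2.4,
# snapshot length 65535, link type 1 (LINKTYPE_ETHERNET).
sample = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
info = parse_global_header(sample)
print(info)
```

With a real capture, the first 24 bytes of the file opened in 'rb' mode would take the place of sample.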

You can take a look at the packets in capfile.packets:

>>> pkt = capfile.packets[0]
>>> pkt.raw()
<binary data snipped>
>>> pkt.timestamp
1343676707L
Right now there is very basic support for Ethernet and Wi-Fi frames and IPv4 packet
parsing.

Automatically decoding layers

The layers argument to load_savefile determines how many layers to
decode: the default value of 0 does no decoding, 1 decodes only the link
layer, and so on. For example, with no decoding:
>>> from pcapfile import savefile
>>> from pcapfile.protocols.linklayer import ethernet
>>> from pcapfile.protocols.linklayer import wifi
>>> from pcapfile.protocols.network import ip
>>> testcap = open('samples/test.pcap', 'rb')
>>> capfile = savefile.load_savefile(testcap, verbose=True)
[+] attempting to load samples/test.pcap
[+] found valid header
[+] loaded 3 packets
[+] finished loading savefile.
>>> eth_frame = ethernet.Ethernet(capfile.packets[0].raw())
>>> wifi_frame = wifi.WIFI(capfile.packets[1].raw())
>>> print(eth_frame)
ethernet from 00:11:22:33:44:55 to ff:ee:dd:cc:bb:aa type IPv4
>>> print(wifi_frame)
QoS data (sa: None, ta: 00:11:22:33:44:55, ra: ff:ee:dd:cc:bb:aa, da: None)
>>> ip_packet = ip.IP(eth_frame.payload)
>>> print(ip_packet)
ipv4 packet from 192.168.2.47 to 173.194.37.82 carrying 44 bytes
>>> ip_packet = ip.IP(wifi_frame.payload[0]['payload']) #if wifi_frame.category == 2 and wifi_frame.subtype == 8
>>> print(ip_packet)
ipv4 packet from 192.168.2.175 to 239.255.255.250 carrying 336 bytes

and this example:

>>> from pcapfile import savefile
>>> testcap = open('samples/test.pcap', 'rb')
>>> capfile = savefile.load_savefile(testcap, layers=1, verbose=True)
[+] attempting to load samples/test.pcap
[+] found valid header
[+] loaded 3 packets
[+] finished loading savefile.
>>> print(capfile.packets[0].packet.src)
00:11:22:33:44:55
>>> print(capfile.packets[0].packet.payload)
<hex string snipped>

and this example to pull the raw payload from every packet in a pcap file:

>>> from pcapfile import savefile
>>> import binascii

>>> testcap = open('samples/test.pcap', 'rb')
>>> capfile = savefile.load_savefile(testcap)
>>> for pkt in capfile.packets:
...     data = binascii.b2a_qp(pkt.raw())  # Do something here
...

and lastly:

>>> from pcapfile import savefile
>>> testcap = open('samples/test.pcap', 'rb')
>>> capfile = savefile.load_savefile(testcap, layers=2, verbose=True)
>>> print(capfile.packets[0].packet.payload)
ipv4 packet from 192.168.2.47 to 173.194.37.82 carrying 44 bytes
The IPv4 module (ip) currently only supports basic IP headers, i.e. it
doesn't yet parse options or add in padding.

The interface is still a bit messy.

Run Unit Tests

  • cd /path/pypcapfile
  • cp pcapfile/test/__main__.py .
  • python __main__.py

Future planned improvements

  • IP options parsing (END and NOP are supported)
  • IPv6 support
  • TCP options parsing
  • ARP support

TODO

  1. write unit tests
  2. add __repr__ method that shows all of the values of the fields in IP packets and Ethernet frames.

See also

Contributors

A list of the project's contributors may be found in the AUTHORS file.

pypcapfile's People

Contributors

asergi, cristiklein, dmitrikh, don42, dos1, douglaskastle, eclazi, hankchan, jchia, johnthagen, kisom, kivanccakmak, shealutton, stevepeak, tommyolofsson

pypcapfile's Issues

Error on loading pcap file.

Getting an error when calling sf = savefile.load_savefile('file.pcap'):

   __TRACE__('[+] attempting to load %s', (input_file.name, ))
AttributeError: 'str' object has no attribute 'name'
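
The README examples pass an open binary file object to load_savefile, while the call above passes a path string, which would explain the missing name attribute. A minimal sketch of a wrapper that accepts either form is shown here; open_capture is a hypothetical helper, not part of pypcapfile, and the temporary file only stands in for a real capture.

```python
import os
import tempfile


def open_capture(source):
    """Return a binary file object for `source`.

    Hypothetical helper, not part of pypcapfile: accepts either a
    filesystem path or an already opened binary file object.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        return open(source, "rb")
    return source


# Demonstration with a throwaway file standing in for a capture.
with tempfile.NamedTemporaryFile(suffix=".pcap", delete=False) as tmp:
    tmp.write(b"\xd4\xc3\xb2\xa1")
    path = tmp.name

with open_capture(path) as fobj:
    has_name = hasattr(fobj, "name")  # file objects carry .name, str does not
    first_bytes = fobj.read(4)
os.remove(path)
```

With pypcapfile installed, the returned object could then be handed to savefile.load_savefile in the same way as testcap in the README.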

Failing tests don't fail the build

When fixing #25 I noticed that one of the tests was failing on Python 3 due to a change in
the interface of the string function translate [1]. I fixed that, but then noticed that the builds
on Travis were passing all along. It seems that coverage does not fail the build on failing tests.
Is that intentional?

If not, it might be possible to run codecov with the --required option making it exit with -1 when it
fails [2]. Otherwise it would be necessary to run the test with a separate test runner.

[1] https://travis-ci.org/kisom/pypcapfile/jobs/309041427
[2] https://github.com/codecov/codecov-python/blob/0743daa83647f12ff31b84d07113d2c24c27b924/codecov/__init__.py#L213

Non-hexlify everything

Hello,

I recently needed to process a 2.3 GB pcap file, which initially took 186 s. Upon profiling, I realised that a lot of time was spent in hexlify/unhexlify, both on pcap's side as well as on my side. I felt this was both ugly and inefficient, so I checked what it means to eliminate all hexlification everywhere. The processing time decreased to 81 s, i.e., performance more than doubled. (see cristiklein@e29d084)

I could spend some time rebasing my branch on top of pypcapfile's current master, but before investing this time, I wanted to check whether you are okay with breaking the API for both a performance improvement and a more intuitive API.

Regards,
Cristian

Broken on Python 3.12

linklayer.py relies on a deprecated module that was removed in Python 3.12:

>>> from pcapfile import savefile
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dos/git/ViewSB/venv/lib/python3.12/site-packages/pcapfile/savefile.py", line 13, in <module>
    import pcapfile.linklayer as linklayer
  File "/home/dos/git/ViewSB/venv/lib/python3.12/site-packages/pcapfile/linklayer.py", line 7, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'

DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
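
The traceback does not show how linklayer.py uses imp, so the following is only a sketch of the usual replacement for loading a module by name: importlib.import_module from the standard library. The helper name load_module is hypothetical, and the actual fix in the library may need to look different.

```python
import importlib


def load_module(name, package=None):
    """Import a module by dotted name; return None if it is missing.

    Hypothetical stand-in for an imp.find_module / imp.load_module pair.
    """
    try:
        return importlib.import_module(name, package)
    except ImportError:
        return None


# Demonstrated on a standard-library module so the sketch runs anywhere.
mod = load_module("json")
missing = load_module("no_such_module_for_this_demo")
```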

Documentation update

Example given in "Automatically decoding layers" shows

eth_frame = ethernet.Ethernet(capfile.packets[0].raw())
wifi_frame = wifi.WIFI(capfile.packets[1].raw())
print(eth_frame)
ethernet from 00:11:22:33:44:55 to ff:ee:dd:cc:bb:aa type IPv4
print(wifi_frame)
QoS data (sa: None, ta: 00:11:22:33:44:55, ra: ff:ee:dd:cc:bb:aa, da: None)
ip_packet = ip.IP(eth_frame.payload)
print(ip_packet)

When tested on Python 3.9.x, the example requires an additional binascii.unhexlify call to create the IP packet from the Ethernet payload and the UDP datagram from the IP payload:
ip_packet = ip.IP(binascii.unhexlify(eth_frame.payload))
udp_frame = udp.UDP(binascii.unhexlify(ip_packet.payload))
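
The report suggests that payload attributes hold hex-encoded strings rather than raw bytes, which matches the "<hex string snipped>" output in the README. A small standard-library sketch of that round trip is shown here; the four sample bytes are made up for illustration and do not come from the sample capture.

```python
import binascii

# First bytes of an IPv4 header: version 4, IHL 5, then TOS and total length.
raw = b"\x45\x00\x00\x2c"

hex_payload = binascii.hexlify(raw)         # hex-encoded form of the bytes
restored = binascii.unhexlify(hex_payload)  # back to the raw bytes a parser needs

print(hex_payload)
print(restored == raw)
```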

timestamp_ms is misleading

The timestamp_ms member of pcap_packet is confusing. I think it should be renamed timestamp_us, as it is in microseconds.

UDP.payload is wrongly assumed to be NUL-terminated

UDP.payload is wrong if the payload contains a NUL (b'\x00') character.

>>> from pcapfile.protocols.transport import udp
>>> u = udp.UDP(b'\x00\x00\x00\x00\x00\x01\x00\x00\x00')
>>> u.payload    # Should be b'\x00'
b''

This is because of the use of c_char_p, which is for NUL-terminated strings, whereas UDP payloads in general can contain NUL characters. Why is c_char_p used for payload? What is the purpose of using ctypes and ctypes.Structure for the UDP class?
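
The truncation can be reproduced with ctypes alone, without pypcapfile. The sketch below reuses the datagram from the report and compares reading the payload through c_char_p with slicing the bytes object after the 8-byte UDP header; the slice is one possible alternative, not necessarily the fix the library would adopt.

```python
import ctypes

UDP_HEADER_LEN = 8
datagram = b"\x00\x00\x00\x00\x00\x01\x00\x00\x00"  # 8-byte header + 1 payload byte

# c_char_p models a NUL-terminated C string, so reading it back stops at
# the first NUL byte.
as_c_string = ctypes.c_char_p(datagram[UDP_HEADER_LEN:]).value

# Slicing the bytes object keeps every payload byte, NULs included.
as_slice = datagram[UDP_HEADER_LEN:]

print(as_c_string, as_slice)
```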

Don't throw general Exception

The library should only throw specific exceptions and not the "base" Exception, since that makes
it difficult to differentiate where an exception comes from.

I usually define one Exception class as a base exception of the package and specific exceptions inherit from that.

Are there any problems with this implementation?
I would do the change if there aren't.
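
A minimal sketch of the layout described above: one package-level base exception with specific exceptions derived from it. PcapFileError, InvalidPacketError and check_magic are hypothetical names for illustration. UnknownMagicNumber is a name that appears in a traceback elsewhere on this page, but how the library defines it is not shown there.

```python
class PcapFileError(Exception):
    """Hypothetical base class for every error raised by the package."""


class UnknownMagicNumber(PcapFileError):
    """The savefile header does not start with a supported magic number."""


class InvalidPacketError(PcapFileError):
    """Hypothetical: a packet could not be decoded."""


def check_magic(magic):
    # Illustrative check only; the real loader's logic is not shown here.
    if magic not in (0xA1B2C3D4, 0xD4C3B2A1):
        raise UnknownMagicNumber("No supported Magic Number found")
    return True


try:
    check_magic(0x0A0D0D0A)
except PcapFileError as exc:  # catches any package-specific error
    caught = type(exc).__name__
```

Callers can then catch the base class to handle anything raised by the package, or a subclass to handle one failure mode.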

Cannot load pcap files split by editcap

I have a large pcap file so I split it by editcap. However when I use pypcapfile to load the split file, it returns an exception

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\anaconda3\envs\network\lib\site-packages\pcapfile\savefile.py", line 130, in load_savefile
    header = _load_savefile_header(input_file)
  File "d:\anaconda3\envs\network\lib\site-packages\pcapfile\savefile.py", line 105, in _load_savefile_header
    raise UnknownMagicNumber("No supported Magic Number found")
pcapfile.UnknownMagicNumber: No supported Magic Number found

How can I read split pcap files by this tool?
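
One possible cause, which is an assumption and not confirmed in the report, is that the split files were written in pcapng format, which starts with a different signature than a classic libpcap savefile. The sketch below inspects the first four bytes of a file to tell the two apart. sniff_capture_format is a hypothetical helper and not part of pypcapfile.

```python
import struct

# pcapng files start with a Section Header Block whose type reads the
# same in both byte orders.
PCAPNG_BLOCK_TYPE = 0x0A0D0D0A
PCAP_MAGICS = {
    0xA1B2C3D4: "pcap (microsecond timestamps)",
    0xA1B23C4D: "pcap (nanosecond timestamps)",
}


def sniff_capture_format(first_four):
    """Guess the capture format from the first four bytes of a file."""
    if len(first_four) < 4:
        return "too short"
    (as_le,) = struct.unpack("<I", first_four[:4])
    (as_be,) = struct.unpack(">I", first_four[:4])
    if as_le == PCAPNG_BLOCK_TYPE:
        return "pcapng"
    for value in (as_le, as_be):
        if value in PCAP_MAGICS:
            return PCAP_MAGICS[value]
    return "unknown"


print(sniff_capture_format(b"\x0a\x0d\x0d\x0a"))
```

For a file on disk, the argument would be the result of reading four bytes from the file opened in 'rb' mode. If the result is pcapng, editcap's -F option can select classic pcap as the output format instead.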

Access ARP packet

How can I access ARP packets from a pcap file?

I am accessing IP packets using:

ip_packet = ip.IP(binascii.unhexlify(ethernet.Ethernet(packet.raw()).payload))

Issue with lazy loading and IP Packet Parsing

When I try to use lazy loading for the packets and iterate through them the assertion below is hit:
assert ((magic >> 4) == 4 and (magic & 0x0f) > 4), 'not an IPv4 packet.'

Here's the traceback:

Traceback (most recent call last):
  File "pcap_speed_test/pcap_test_suite.py", line 55, in <module>
    read_pcap()
  File "pcap_speed_test/pcap_test_suite.py", line 45, in read_pcap
    sss(capfile.packets)
  File "pcap_speed_test/pcap_test_suite.py", line 15, in sss
    print(ip.IP(binascii.unhexlify(ethernet.Ethernet(next_packet.raw()).payload)))
  File "lib/python3.5/site-packages/pcapfile/protocols/network/ip.py", line 34, in __init__
    (magic & 0x0f) > 4), 'not an IPv4 packet.'
AssertionError: not an IPv4 packet.

PCAP file works fine without lazy set. Seems to work fine if I take out the assertion.
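
The quoted assertion tests the first byte of the payload: the high nibble must be 4 (the IP version) and the low nibble, the header length in 32-bit words, must be greater than 4. A sketch of the same test as a non-raising check follows, which a caller could use to skip frames that do not carry IPv4. The helper name looks_like_ipv4 is hypothetical, and the sketch does not explain why lazy loading behaves differently.

```python
def looks_like_ipv4(payload):
    """Return True if the first byte has version 4 and an IHL of at least 5."""
    if not payload:
        return False
    first = payload[0]
    version = first >> 4   # high nibble: IP version
    ihl = first & 0x0F     # low nibble: header length in 32-bit words
    return version == 4 and ihl > 4


print(looks_like_ipv4(b"\x45\x00\x00\x2c"))
```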

timestamp only gives the second mark

Is there a reason why timestamp only gives the timestamp in seconds?

Is it possible to add a simple API call that combines timestamp with timestamp_us/timestamp_ms?

Seems unnecessary for the user to add these values together to get a full timestamp.
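
Until such a call exists, the combination can be done in a few lines on the caller's side. The sketch below assumes the sub-second attribute holds microseconds, as another issue on this page states. The function names are hypothetical, and the seconds value is the one shown in the README.

```python
from datetime import datetime, timezone


def full_timestamp(seconds, microseconds):
    """Combine whole seconds and a microsecond remainder into one float."""
    return seconds + microseconds / 1_000_000


def as_datetime(seconds, microseconds):
    """Same combination as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=microseconds
    )


ts = full_timestamp(1343676707, 250000)
dt = as_datetime(1343676707, 250000)
print(ts, dt.isoformat())
```

With a loaded capture, the two arguments would come from a packet's timestamp attribute and its timestamp_us (or timestamp_ms) attribute.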
