
railslove / cmxl


your friendly MT940 SWIFT file parser for bank statements

Home Page: http://railslove.com

License: MIT License

Ruby 100.00%
mt940 mt942 fintech parser bank-statements banking swift

cmxl's Introduction


Cmxl - your friendly ruby MT940 parser

At Railslove we build a lot of financial applications and work on integrating applications with banks and banking functionality. Our goal is to build simple solutions for what often looks complicated.

Cmxl is a friendly and extensible MT940 bank statement file parser that helps you extract data from bank statement files.

What is MT940 & MT942?

MT940 (MT = Message Type) is the SWIFT standard for the electronic transfer of bank statement files. When integrating with banks, you often get MT940 or MT942 files as the interface. For more information, have a look at the different SWIFT message types.

At some point in the future, MT940 files should be replaced by newer XML documents, but banking institutions are slow, so MT940 will stick around for a while.

Requirements

Cmxl is a pure ruby parser and has no dependency on native extensions.

  • Ruby (current officially supported distributions)

Installation

Add this line to your application's Gemfile:

gem 'cmxl'

And then execute:

$ bundle

Or install it yourself as:

$ gem install cmxl

Usage

Simple usage:

# Configuration:

# statement divider regex to split the individual statements in one file - the default is standard and should be good for most files
Cmxl.config[:statement_separator] = /\n-.\n/m

# do you want an error to be raised when a line cannot be parsed? default is true
Cmxl.config[:raise_line_format_errors] = true

# try to strip the SWIFT header data. This strips everything until the actual first MT940 field. (if parsing fails, try this!)
Cmxl.config[:strip_headers] = true


# Statement parsing:

statements = Cmxl.parse(File.read('mt940.txt'), :encoding => 'ISO-8859-1') # parses the file and returns an array of statement objects. Please note: if no encoding is given Cmxl tries to guess the encoding from the content and converts it to UTF-8.
statements.each do |s|
  puts s.reference
  puts s.generation_date
  puts s.opening_balance.amount
  puts s.closing_balance.amount
  puts s.sha # SHA of the statement source - could be used as an identifier (see: https://github.com/railslove/cmxl/blob/master/lib/cmxl/statement.rb#L49-L55)

  s.transactions.each do |t|
    puts t.information
    puts t.description
    puts t.entry_date
    puts t.funds_code
    puts t.credit?
    puts t.debit?
    puts t.sign # -1 if it's a debit; 1 if it's a credit
    puts t.name
    puts t.iban
    puts t.sepa
    puts t.sub_fields
    puts t.reference
    puts t.bank_reference
    # ...
  end
end
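The statement separator is an ordinary Ruby regular expression passed to String#split, so it is worth checking it against your own file. The plain-Ruby sketch below (no cmxl needed) looks only at the pattern /\n-.\n/m as printed in the configuration above, independent of whatever line-ending normalization Cmxl.parse or your gem version may apply first. The "." must consume exactly one character between the dash and the newline, which is the \r of a CRLF file. TOLERANT_SEPARATOR is a hypothetical alternative, not a Cmxl default.

```ruby
DEFAULT_SEPARATOR  = /\n-.\n/m      # the pattern shown in the configuration above
TOLERANT_SEPARATOR = /\r?\n-\r?\n/  # hypothetical: accepts CRLF and LF-only files

crlf = ":20:FIRST\r\n:25:123\r\n-\r\n:20:SECOND\r\n:25:456"
lf   = crlf.delete("\r")

# CRLF: "." consumes the "\r", so the statements are split
# (the first chunk keeps its trailing "\r").
p crlf.split(DEFAULT_SEPARATOR)
# LF only: there is no character between "-" and "\n", so nothing is split.
p lf.split(DEFAULT_SEPARATOR).size
# The tolerant pattern splits both variants.
p lf.split(TOLERANT_SEPARATOR)
p crlf.split(TOLERANT_SEPARATOR)
```

If your statements come back as one big chunk, a separator mismatch like the LF case above is a likely culprit.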

Every object responds to to_h, which lets you easily convert the data to a hash. Every object also responds to to_json, which lets you represent the statements as JSON with your favorite JSON library.

A note about encoding and file weirdnesses

You will probably encounter encoding issues (hey, you are building banking applications!). We try to handle encoding and format weirdnesses as much as possible. If no encoding is passed, we try to guess the encoding of the data and convert it to UTF-8. If you do run into encoding issues, you can pass encoding options to Cmxl.parse(<string>, <options hash>); it accepts the same options as String#encode. If that fails, try to modify the file before you pass it to the parser, and please create an issue.
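The conversion described above boils down to String#encode with a source encoding plus an options hash. A minimal plain-Ruby sketch, assuming the options are forwarded as keyword arguments with an explicit double splat (**), which is also the pattern that avoids Ruby 2.7's "Using the last argument as keyword parameters is deprecated" warning. to_utf8 is a hypothetical helper, not part of Cmxl.

```ruby
# Hypothetical helper: convert raw statement bytes to UTF-8.
# source_encoding is what the bank used (e.g. 'ISO-8859-1'); any remaining
# options are handed to String#encode as keyword arguments.
def to_utf8(data, source_encoding, **encode_options)
  data.encode('UTF-8', source_encoding, **encode_options)
end

raw  = "Caf\xE9\r\n".b # bytes as read from an ISO-8859-1 file
text = to_utf8(raw, 'ISO-8859-1', universal_newline: true)
p text          # => "Café\n"
p text.encoding # => #<Encoding:UTF-8>
```

universal_newline: true is a standard String#encode option that turns CRLF and CR line endings into LF.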

MT940 SWIFT header data

Cmxl currently does not support parsing the SWIFT headers (like {1:F01AXISINBBA ....). If your file comes with these headers, try the strip_headers configuration option to strip everything except the actual MT940 fields.

Cmxl.config[:strip_headers] = true
Cmxl.parse(...)
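If you would rather pre-process the file yourself: in a SWIFT FIN message the MT940 fields sit in text block 4, which opens with {4: and is closed by a line reading -}. A rough sketch, assuming exactly that layout; block4_bodies is a hypothetical helper, not something Cmxl provides.

```ruby
# Hypothetical pre-processing helper, not part of Cmxl: returns the body of
# text block 4 ({4: ... -}) for every SWIFT message found in the input.
SWIFT_BLOCK4 = /\{4:\r?\n(.*?)\r?\n-\}/m

def block4_bodies(raw)
  raw.scan(SWIFT_BLOCK4).map(&:first)
end

raw = "{1:F01AXISINBBAXXXXXXXXXXXXX}{2:I940XXXXXXXXAXXXN}{4:\n" \
      ":20:MT940/78274\n:28C:3/1\n:60F:C160201INR0,00\n:62F:C160527INR0,00\n-}"
bodies = block4_bodies(raw)
puts bodies.first # the bare MT940 fields, starting with ":20:"
```

Each body could then be passed to Cmxl.parse on its own.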

MT942 data

Cmxl is now also capable of parsing MT942 data. Just pass the data and the parser will identify the type automatically.

first_statement = Cmxl.parse(File.read('mt940.txt'), :encoding => 'ISO-8859-1').first
puts first_statement.mt942?
#=> false

first_statement = Cmxl.parse(File.read('mt942.txt'), :encoding => 'ISO-8859-1').first
puts first_statement.mt942?
#=> true

p first_statement.vmk_credit_summary.to_h
#=> { type: 'credit', entries: 1, amount: 9792.0, currency: 'EUR' }

p first_statement.vmk_debit_summary.to_h
#=> { type: 'debit', entries: 0, amount: 0.0, currency: 'EUR' }

first_statement.transactions # same as for MT940
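The credit and debit summaries above presumably come from the MT942 :90C: and :90D: fields (number and sum of entries: an entry count, a three-letter currency and an amount with a decimal comma). A plain-Ruby sketch of that decoding, assuming this layout; parse_summary is hypothetical and only mirrors the hash shape shown above.

```ruby
# Hypothetical decoder for the content of an MT942 :90C: / :90D: field,
# e.g. "1EUR9792," = 1 entry, currency EUR, amount 9792,00.
SUMMARY_90 = /\A(?<entries>\d{1,5})(?<currency>[A-Z]{3})(?<amount>\d{1,12},\d{0,2})\z/

def parse_summary(modifier, content)
  match = SUMMARY_90.match(content) or raise ArgumentError, "Unexpected :90#{modifier}: content #{content.inspect}"
  {
    type: modifier == 'C' ? 'credit' : 'debit',
    entries: match[:entries].to_i,
    amount: match[:amount].tr(',', '.').to_f, # decimal comma -> decimal point
    currency: match[:currency]
  }
end

p parse_summary('C', '1EUR9792,')
p parse_summary('D', '0EUR0,')
```

Float mirrors the output shown above; for bookkeeping, BigDecimal avoids binary rounding surprises.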

Custom field parsers

Because many banks implement the MT940 format slightly differently, one of the design goals of this library is to make the individual field parsers customizable. Every line gets parsed by a dedicated field parser. Here is how to write your own parser:

# simply create a new parser class inheriting from Cmxl::Field
class MyFieldParser < Cmxl::Field
  self.tag = 42 # define which MT940 tag your parser can handle. This automatically registers your parser and overrides any existing parser for that tag
  self.parser = /(?<world>.*)/ # the regex to parse the line. Use named captures to access the matched values.

  def upcased
    self.data['world'].upcase
  end
end

my_field_parser = MyFieldParser.parse(":42:hello from mt940")
my_field_parser.world #=> hello from mt940
my_field_parser.upcased #=> HELLO FROM MT940
my_field_parser.data #=> {'world' => 'hello from mt940'} - data is the accessor to the regexp matches
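To make "register by tag" and "named captures become readers" concrete, here is a tiny self-contained stand-in written from the description above. It is not Cmxl's implementation: MiniField, its REGISTRY hash and its LINE pattern are illustrative assumptions.

```ruby
# Illustrative stand-in only, not Cmxl's code.
class MiniField
  REGISTRY = {} # tag (String) => parser class
  LINE = /\A:(?<tag>\d{2})(?<modifier>[A-Z])?:(?<content>.*)\z/m

  class << self
    attr_accessor :parser # per-subclass regex with named captures

    # Assigning a tag registers the class; a later class with the same tag replaces it.
    def tag=(tag)
      REGISTRY[tag.to_s] = self
    end

    # Looks up the registered class for the line's tag and lets it parse the content.
    def parse(line)
      m = LINE.match(line) or raise ArgumentError, "Wrong line format: #{line.inspect}"
      klass = REGISTRY.fetch(m[:tag]) { raise ArgumentError, "No parser for tag #{m[:tag]}" }
      klass.new(m[:content])
    end
  end

  attr_reader :data

  def initialize(content)
    match = self.class.parser.match(content) or raise ArgumentError, "Wrong line format: #{content.inspect}"
    @data = match.named_captures # e.g. {"world" => "hello from mt940"}
  end

  # Exposes each named capture as a reader method (e.g. #world).
  def method_missing(name, *args)
    key = name.to_s
    args.empty? && data.key?(key) ? data[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    data.key?(name.to_s) || super
  end
end

class MyFieldParser < MiniField
  self.tag = 42
  self.parser = /(?<world>.*)/

  def upcased
    data['world'].upcase
  end
end

field = MiniField.parse(':42:hello from mt940')
p field.class   # MyFieldParser, found through the tag registry
p field.world
p field.upcased
```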

Parsing issues? - please create an issue with your file

The MT940 format often looks different across banks and countries. In particular, the loosely defined fields are often used for custom bank data. If you have a file that cannot be parsed, please open an issue. We hope to build a parser that handles most files.

ToDo

  • collect MT940 files from different banks and use them as example for specs
  • better header data handling

Looking for other Banking and EBICS tools?

Maybe these are also interesting for you.

Contributing

Automated tests: We use rspec to test Cmxl. Simply run rake to execute the whole test suite.

  1. Fork it ( http://github.com/railslove/cmxl/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Credits and other parsers

Cmxl is inspired by and borrows ideas from the mt940_parser by the great people at betterplace.

other parsers:


Built with love by Railslove and some amazing people.
Released under the MIT License.

Railslove builds FinTech products, if you need support for your project we are happy to help. Please contact us at [email protected].

cmxl's People

Contributors

bkues-zuora, bumi, joesouthan, mkilling, namxam, olleolleolle, prometh07, uepsilon, yoyostile


cmxl's Issues

processing problem

I recently had problems with MT940, so I upgraded to 1.1.0, and since then I have had issues (but it may also be the bank changing the format).
MT940 piece:

:61:180627D79,NMSCXXXX3550//MA-20-00084395
28/06/1812:15 PIZZA HUT MA3550
:86:XXXX3550        /TYPE/631/PAYM CARTE

result:

#<Cmxl::Fields::Transaction:0x0000000007ad8c78 @tag="61", @modifier=nil, @source="180627D79,NMSCXXXX3550//MA-20-00084395\n28/06/1812:15 PIZZA HUT MA3550", @data={"date"=>"180627", "entry_date"=>nil, "storno_flag"=>"", "funds_code"=>"D", "currency_letter"=>nil, "amount"=>"79,", "swift_code"=>"NMSC", "reference"=>"XXXX3550//MA-20-", "bank_reference"=>nil, "supplementary"=>"00084395"}, @match=#<MatchData "180627D79,NMSCXXXX3550//MA-20-00084395" date:"180627" entry_date:nil storno_flag:"" funds_code:"D" currency_letter:nil amount:"79," swift_code:"NMSC" reference:"XXXX3550//MA-20-" bank_reference:nil supplementary:"00084395">, @details=#<Cmxl::Fields::StatementDetails:0x0000000007ad8570 @tag="86", @modifier=nil, @source="XXXX3550        /TYPE/631/PAYM CARTE", @data={"transaction_code"=>"XXX", "details"=>"X3550        /TYPE/631/PAYM CARTE", "seperator"=>"X"}, @match=#<MatchData "XXXX3550        /TYPE/631/PAYM CARTE" transaction_code:"XXX" details:"X3550        /TYPE/631/PAYM CARTE" seperator:"X">>>

Why do I get such reference and bank_reference values? Is this the proper result of processing?
I would expect a reference of "XXXX3550", bank_reference "MA-20-00084395" and supplementary "28/06/1812:15 PIZZA HUT MA3550"

I use Cmxl.config[:statement_separator] = /\r?\n-\r?\n(?:[^:]*\r?\n)+/m

Lines with non-digit tags should be parsed too.

Scenario: I'm trying to parse a file with a header containing a line with a custom, non-digit tag, e.g. :NS:some-description.

Available options:

  1. Ignore errors by setting Cmxl.config[:raise_line_format_errors] = false
  2. Strip headers.

None of these is helpful. However, I think there are some solutions to the problem:

  1. Change regex used to match tags in self.parse(line) inside field.rb
  2. Allow ignoring lines that start with certain tags or meet other criteria.

I can make a PR if you want.
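Both proposals fit in a few lines of plain Ruby: a line pattern whose tag accepts capital letters as well as digits, plus a list of tags to skip. TAGGED_LINE, parse_line and ignored_tags are hypothetical names, not Cmxl's field.rb regex or a Cmxl option.

```ruby
# Hypothetical line pattern: a two-character tag of digits or capital letters
# (":20:", ":NS:"), an optional one-letter modifier (":60F:"), then the content.
TAGGED_LINE = /\A:(?<tag>[0-9A-Z]{2})(?<modifier>[A-Z])?:(?<content>.*)\z/m

# Returns nil for ignored tags so the caller can drop them with #compact.
def parse_line(line, ignored_tags: [])
  match = TAGGED_LINE.match(line) or raise ArgumentError, "Wrong line format: #{line.inspect}"
  return nil if ignored_tags.include?(match[:tag])

  { tag: match[:tag], modifier: match[:modifier], content: match[:content] }
end

p parse_line(':NS:some-description')
p parse_line(':60F:C160201INR0,00')
p parse_line(':NS:some-description', ignored_tags: ['NS'])
```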

Negative Debit

I have a case like this: :61:1908150815D-104,12NMSCNONREF//010F214191270328

It fails to parse correctly, probably because of the D-104,12. I'm not certain whether that line is valid, though; I guess it comes down to whether the - is allowed. Perhaps you've seen such a case before?

Statement and sequence numbers aren't parsed correctly when they lack leading zeros

The current regex to parse the statement and sequence numbers doesn't follow the swift specs as declared here.

The examples :28C:235/1 and :28C:235/1 provided in the spec will not match because the regex expects 5 digits for the statement and 3 to 5 digits for the sequence number. The sequence number should be optional and the regex should allow shorter statement and sequence numbers. I would suggest:

/(?<statement_number>\d{1,5})(?:\/(?<sequence_number>\d{1,5}))?/
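The suggested pattern is easy to check in isolation. The sketch below wraps it in \A and \z anchors (an addition, so that a six-digit number is rejected instead of partially matched); statement_number is a hypothetical helper.

```ruby
# The regex suggested above, anchored to the whole :28C: content.
STATEMENT_NUMBER_28C = /\A(?<statement_number>\d{1,5})(?:\/(?<sequence_number>\d{1,5}))?\z/

def statement_number(content)
  match = STATEMENT_NUMBER_28C.match(content)
  match && { statement_number: match[:statement_number], sequence_number: match[:sequence_number] }
end

p statement_number('235/1')     # short numbers, no filling zeros
p statement_number('00235/001') # zero-padded numbers still work
p statement_number('235')       # sequence number omitted
p statement_number('123456/1')  # too long: nil
```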

rchardet19 upgrade

Is there any specific dependency on the rchardet19 gem, or could it be replaced with the latest rchardet?

It also contains support for Ruby 1.9.

https://github.com/jmhodges/rchardet

If we use e.g. the git gem, which has rchardet as a dependency, some constants conflict with the rchardet19 version.

gems/rchardet19-1.3.7/lib/rchardet19.rb:59: warning: already initialized constant CharDet::VERSION
gems/rchardet-1.8.0/lib/rchardet/version.rb:2: warning: previous definition of VERSION was here

I know it's only a warning, but I'd like to resolve it.

Unencrypted secret in codebase

- name: CC_TEST_REPORTER_ID
value: 149f0d20e17ace00c44be432a4194bc441ba805da89a7708fca0f4a2c3f2aed7

In the worst case, someone might send differing data to CodeClimate. That aside, it may be annoying if all forks post to the same CodeClimate project.

rchardet19 dependency breaks under ruby 3.3

Ruby 3.3 changes the parameters of Regexp.new, which in turn breaks rchardet19's universal detector, as it uses an incompatible parameter style.

Can the rchardet19 dependency be changed to rchardet, as the former seems to have been abandoned in 2014?

transactions entry_date is not the actual transaction date

I actually don't know if our source file is valid, but we have a file (real file, coming from a bank) that has opening_date and closing_date set at 2019/12/30, but one transaction with an individual date of 2019/12/31.

The whole case shouldn't be possible, but here cmxl's parsed "entry_date" is "1230" and the "date" field is "191231", which is the correct date. Not only is the entry_date field wrong (I guess it takes its value from the global opening/closing dates), it also lacks the year.

For now, is it safe to use "date" instead of "entry_date"?

BNP Paribas MT940

They add some header lines in front of each statement.
I am not sure whether I am allowed to attach the file here.
The line parser cannot handle these lines, as they have no :tag:.

1601 25V3241A1XAXXX00001
0000 30BMCIMAMCXXXX00001
940 02
:20:BMCI
...

My first idea was to prefix the file contents with "\r\n-\r\n" and use /\r?\n-\r?\n([^:].*\r?\n)+/m as the statement separator, which should consume this header, but I still get a wrong line format error...
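A workaround that avoids a clever separator is to split first and clean each chunk afterwards. A plain-Ruby sketch, assuming each statement's real data starts with a :20: line and statements are separated by a line containing only "-"; SEPARATOR_LINE and clean_statements are hypothetical, not Cmxl options.

```ruby
# Hypothetical pre-processing, not a Cmxl option: split the file on lines that
# contain only "-", then drop everything before the first ":20:" line of each
# chunk (the untagged bank header lines).
SEPARATOR_LINE = /^-\r?$\n?/

def clean_statements(text)
  text.split(SEPARATOR_LINE)
      .map { |chunk| chunk.sub(/\A.*?(?=^:20:)/m, '') }
      .reject { |chunk| chunk.strip.empty? }
end

raw = "1601 25V3241A1XAXXX00001\r\n0000 30BMCIMAMCXXXX00001\r\n940 02\r\n" \
      ":20:BMCI\r\n:25:123\r\n-\r\n"
p clean_statements(raw)
```

Each cleaned chunk could then be passed to Cmxl.parse on its own. A chunk without any :20: line is left unchanged by the sub call.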

Deprecated Warning: Using the last argument as keyword parameters is deprecated

Hey Bumi,

first of all: great job with cmxl :)

I was just about to play around with cmxl and received the following warning:

~/.rvm/gems/ruby-2.7.1@cmxl-test/gems/cmxl-1.4.6/lib/Cmxl.rb:39: warning: Using the last argument as keyword parameters is deprecated

I'm using ruby 2.7.1 and Cmxl 1.4.6.
For testing purposes I used your example from the Simple usage part of the README with the fixture file mt940-iso8859-1.txt.

Best wishes

Field 61: Statement Line cannot parse transactions of type 'S'

There are three acceptable codes for transaction type ('S', 'N', and 'F') in Field 61 of MT940, but the parser in Cmxl::Fields::Transaction only handles two of them ('N' and 'F').

See this link for more information about the acceptable codes.

I believe this issue can be resolved simply by adding this missing third letter to the swift_code group of the regex, so that it becomes as follows.

%r{^(?<date>\d{6})(?<entry_date>\d{4})?(?<storno_flag>R?)(?<funds_code>[CD]{1})(?<currency_letter>[a-zA-Z])?(?<amount>\d{1,12},\d{0,2})(?<swift_code>(?:S|N|F).{3})(?<reference>NONREF|(.(?!\/\/)){,16}([^\/]){,1})((?:\/\/)(?<bank_reference>[^\n]{,16}))?((?:\n)(?<supplementary>.{,34}))?$}

Here is an example line that is not currently handled correctly.

:61:1911181118CR653,00S445328556-76501096
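The proposal can be checked without installing anything by running the issue's pattern against the example line in plain Ruby. PROPOSED_61 is the regex from the issue verbatim; RESTRICTED_61 is a hypothetical copy with swift_code limited to N or F, standing in for the current behaviour described above.

```ruby
# :61: pattern proposed in the issue (swift_code accepts S, N or F).
PROPOSED_61 = %r{^(?<date>\d{6})(?<entry_date>\d{4})?(?<storno_flag>R?)(?<funds_code>[CD]{1})(?<currency_letter>[a-zA-Z])?(?<amount>\d{1,12},\d{0,2})(?<swift_code>(?:S|N|F).{3})(?<reference>NONREF|(.(?!\/\/)){,16}([^\/]){,1})((?:\/\/)(?<bank_reference>[^\n]{,16}))?((?:\n)(?<supplementary>.{,34}))?$}
# Same pattern with swift_code restricted to N or F.
RESTRICTED_61 = Regexp.new(PROPOSED_61.source.sub('(?:S|N|F)', '(?:N|F)'))

content = '1911181118CR653,00S445328556-76501096' # the example line without ":61:"

match = PROPOSED_61.match(content)
p match[:swift_code] # transaction type S plus three characters
p match[:reference]
p RESTRICTED_61.match(content) # no match when S is not allowed
```

Note that with this pattern the CR in the example is read as funds_code C followed by currency_letter R.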

Line break within transaction, colon right after newline

Hi, I’m currently having problems with Deutsche Bank. The error message I’m getting is:

Cmxl::Field::LineFormatError: Wrong line format: ":08 Karten?25nr. 5355999999999975  Origi?26nal 49,00 USD 1 EUR/1,\r\n12385?27 USD  Entgelt 0,44 EUR?30DEUTDEDBFRA?31DE1950070024000402\r\n0480?32DEUTSCHE BANK"

The transaction that’s causing the issue is:

:61:190425D44,04NMSCNONREF
:86:106?109075/658?20EREF+000000000193592204?21MREF+CN3R3U?22CRED+DE7
600200000132558?23SVWZ+STARTER//8449273399/US?24 22-04-2019T03:46
:08 Karten?25nr. 5355999999999975  Origi?26nal 49,00 USD 1 EUR/1,
12385?27 USD  Entgelt 0,44 EUR?30DEUTDEDBFRA?31DE1950070024000402
0480?32DEUTSCHE BANK

Note the linebreak and the :08 right after the newline, which does not indicate a new section but is just a part of the transaction details that has been inconveniently split.

Unfortunately I don’t know right now if this is a configuration issue from my side or a case that Cmxl currently cannot handle...
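One way to handle this case is to start a new field only when a line begins with a complete tag, i.e. a colon, two digits, an optional letter and a closing colon. ":08 Karten" has a space instead of the closing colon, so it stays a continuation line. A plain-Ruby sketch under that assumption; TAG_START and logical_fields are hypothetical, not Cmxl's splitter.

```ruby
# Hypothetical splitter: a physical line starts a new field only if it begins
# with a complete tag such as ":61:" or ":60F:". Everything else is appended
# to the previous field as a continuation line.
TAG_START = /\A:\d{2}[A-Z]?:/

def logical_fields(text)
  text.split(/\r?\n/).each_with_object([]) do |line, fields|
    if fields.empty? || line.match?(TAG_START)
      fields << line
    else
      fields.last << "\n" << line
    end
  end
end

text = ":61:190425D44,04NMSCNONREF\r\n" \
       ":86:106?109075/658?20EREF+000000000193592204?21MREF+CN3R3U?22CRED+DE7\r\n" \
       "600200000132558?23SVWZ+STARTER//8449273399/US?24 22-04-2019T03:46\r\n" \
       ":08 Karten?25nr. 5355999999999975  Origi?26nal 49,00 USD 1 EUR/1,\r\n" \
       "12385?27 USD  Entgelt 0,44 EUR?30DEUTDEDBFRA?31DE1950070024000402\r\n" \
       "0480?32DEUTSCHE BANK"
fields = logical_fields(text)
p fields.size # the :61: line and one :86: field including ":08 Karten..."
```

A continuation line that happens to begin with a complete tag would still be split, so this only narrows the problem.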

add EREF reader to transaction

The sepa['eref'] field contains the end-to-end ID of SEPA transactions.
Since this is a widely used identifier (it is also used to match the transaction with a debit/credit), we should add an easy reader to the transaction class.
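Such a reader mostly needs the EREF+ value out of the structured :86: text. The plain-Ruby sketch below assumes the German-style layout visible in the Deutsche Bank example in these issues: SEPA keywords such as EREF+, MREF+ and SVWZ+ spread over the purpose subfields ?20 to ?29 and ?60 to ?63, possibly wrapped across lines. sepa_values, eref and the keyword list are illustrative assumptions; Cmxl's own sepa hash may be built and keyed differently.

```ruby
# Illustrative keyword list (an assumption, not exhaustive).
SEPA_KEYS = %w[EREF KREF MREF CRED DEBT SVWZ ABWA ABWE].freeze

# Hypothetical helper: returns a hash such as {"EREF" => "...", "SVWZ" => "..."}.
def sepa_values(details)
  flat = details.delete("\r\n") # undo line wrapping
  purpose = flat.scan(/\?(2\d|6[0-3])([^?]*)/).map(&:last).join # keep ?20-?29, ?60-?63
  keys = SEPA_KEYS.join('|')
  # Each value runs until the next "KEY+" marker or the end of the text.
  purpose.scan(/(#{keys})\+(.*?)(?=(?:#{keys})\+|\z)/).to_h
end

# The proposed convenience reader, written as a plain function.
def eref(details)
  sepa_values(details)['EREF']
end

details = "106?109075/658?20EREF+000000000193592204?21MREF+CN3R3U?22CRED+DE7\r\n" \
          "600200000132558?23SVWZ+STARTER//8449273399/US"
p eref(details)
```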

Question/Discussion: Use of sha value for transactions

I used the value returned by the sha method of statements and transactions to find them in a database.
This works fine for me, but a few weeks ago I thought I had lost some transactions in my database.
In fact they were all there, but their sha hashes were identical.
There are 2 cases:

First case:
Debit transfer with same day, same amount, same receiver account only transaction information differs (invoice number)
Sha hash is the same because all fields in :61 are identical. The difference is in :86
My quick fix is to build my own sha from :61 and the information from :86.

Second case:
Credit transfer, all values identical. The sender accidentally made the same transaction twice on the same day. This case is really rare but happened in the real world. My fix for this: I also built my own hash and added an increment to the raw transaction data (source).

So my questions to discuss:

Are the hashes meant to be used to identify transactions?

If yes, should cmxl handle these rare cases, or should the software that uses cmxl handle them?
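The workaround described in both cases can be sketched in plain Ruby: hash the raw :61: and :86: text together and append an occurrence counter, so two otherwise identical bookings on one statement still get different IDs. transaction_ids is a hypothetical helper, not Cmxl's sha method.

```ruby
require 'digest'

# Hypothetical helper: one SHA-256 hex ID per [raw_61, raw_86] pair. The
# counter distinguishes repeated identical pairs within the same list.
def transaction_ids(pairs)
  seen = Hash.new(0)
  pairs.map do |line61, line86|
    base = "#{line61}\n#{line86}"
    seen[base] += 1
    Digest::SHA256.hexdigest("#{base}\n#{seen[base]}")
  end
end

pairs = [
  ['190815D10,00NTRFNONREF', 'Invoice 1'],
  ['190815D10,00NTRFNONREF', 'Invoice 2'], # same :61:, different :86:
  ['190815D10,00NTRFNONREF', 'Invoice 1']  # exact duplicate of the first
]
p transaction_ids(pairs).uniq.size # three distinct IDs
```

The counter ties an ID to the order in which the bank lists transactions: re-importing the same statement yields the same IDs, but a corrected re-delivery in a different order would not.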

support for Mt942 fields

missing tags are:

  • 13 - Date/Time Indication
  • 34 - Floor Limit Indicator
  • 90 - Number and Sum of Entries
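Two of these fields are small enough to sketch as standalone decoders, assuming the usual SWIFT layouts: :13D: as YYMMDDHHMM followed by a +HHMM or -HHMM UTC offset, and :34F: as currency, optional D/C mark and amount. parse_13d and parse_34f are hypothetical helpers, not Cmxl field classes.

```ruby
require 'time'

# :13D: Date/Time Indication, e.g. "1407241015+0200".
def parse_13d(content)
  Time.strptime(content, '%y%m%d%H%M%z')
end

# :34F: Floor Limit Indicator, e.g. "EURD0," or "EUR1500,50".
FLOOR_LIMIT_34F = /\A(?<currency>[A-Z]{3})(?<mark>[DC])?(?<amount>\d{1,12},\d{0,2})\z/

def parse_34f(content)
  m = FLOOR_LIMIT_34F.match(content) or raise ArgumentError, "Unexpected :34F: content #{content.inspect}"
  { currency: m[:currency], mark: m[:mark], amount: m[:amount].tr(',', '.').to_f }
end

p parse_13d('1407241015+0200')
p parse_34f('EURD0,')
```

Field 90 (number and sum of entries) follows the same "count, currency, amount" idea with C/D in the tag modifier.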

Chargeback transfers fail to be parsed

In case of chargebacks there is additional information in the :61: fields (OCMT and CHGS data).
The current regular expression completely fails in parsing these :61: lines because of that additional data.

Changing the regular expression of Cmxl::Fields::Transaction to
/^(?<date>\d{6})(?<entry_date>\d{4})?(?<storno_flag>R?)(?<funds_code>[CD]{1})(?<currency_letter>[a-zA-Z])?(?<amount>\d{1,12},\d{0,2})(?<swift_code>(?:N|F).{3})(?<reference>NONREF|.{0,16})((?:\/\/)(?<bank_reference>[^\r\n]*))?((?:[\r\n])?((?:\/OCMT\/)(?<ocmt>[^\/]*)(?:\/)(?:\/CHGS\/)(?<chgs>[^\/]*)(?:\/)))?/i
fixes that and additionally gives the OCMT and CHGS fields.

The changed part is: the whole bank reference group is now optional and can contain any characters except for CR-LF. After that there may be an additional block separated by CR-LF containing /OCMT/3a15num with an optional slash at the end followed by /CHGS/3a15num with an optional slash at the end.

Tag 86 at the end does not get parsed

I have :86: at the end of a statement. Per https://www.sepaforcorporates.com/swift-for-corporates/account-statement-mt940-file-format-overview/ this tag is to be treated as meta data at the statement level:
Tag 86 – Information to Account Owner
Optional – 6x65x
Additional information about the statement as a whole

While debugging the parser, I found that this tag gets associated as meta data with tag 64 (Available Balance), which immediately precedes it and does not have an add_meta_data method. I believe this information should be added as meta data at the statement level. In my case the value is ":86:/OSDR/HSBCPLPW", which is the SWIFT code or BIC.

When the entry date is a month after a statement in january, its year is wrongly reduced by one

As of commit d692be7 there is a check in place for statements spanning over a year boundary. If a statement is parsed, that has an entry date in february and a date in january, the entry_date is wrongly set back to an earlier year. The following field 61 gets parsed with an entry_date of 2018-02-01 but should be 2019-02-01.

:61:1901310201DR1,6NMSCNONREF//XXXXXXXXXX

This happens, because there is the assumption that the entry_date is always before the date and not the other way around, but many banks date their monthly statements like that.

Instead of checking whether entry_date.month is bigger than date.month, as a simple fix I suggest checking whether entry_date.month is bigger than date.month + 6. The same should be checked the other way around: if entry_date.month is smaller than date.month - 6, then add a year to entry_date, to catch cases with a year boundary in between, such as:

:61:1812310101DR3498,06NTRFNONREF//XXXXXXXXXX

A more sophisticated fix would be to make sure that the date lies between the opening and closing dates (60F and 62F) of the statement and, if not, move it there. This would require access to the statement and therefore cannot be done while parsing the 61 field without coupling it to the Statement class.

Perhaps it's easier not to guess the year at all and to let the client provide their own year-determination function? The current behaviour could stay in place as the default for backwards compatibility.
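The simple fix suggested above can be sketched as a standalone function: start from the value date's year and move the MMDD entry date one year back or forward only when the months are more than six apart. resolve_entry_date is hypothetical and not Cmxl's current logic.

```ruby
require 'date'

# Hypothetical helper: value_date is the full :61: date (YYMMDD), entry_mmdd
# the optional four-digit entry date (MMDD) that carries no year.
def resolve_entry_date(value_date, entry_mmdd)
  month = Integer(entry_mmdd[0, 2], 10)
  day = Integer(entry_mmdd[2, 2], 10)
  year = value_date.year
  if month > value_date.month + 6
    year -= 1 # e.g. value date in January, entry date in the previous December
  elsif month < value_date.month - 6
    year += 1 # e.g. value date 31 December, entry date 1 January of the next year
  end
  Date.new(year, month, day)
end

p resolve_entry_date(Date.strptime('190131', '%y%m%d'), '0201') # :61:1901310201...
p resolve_entry_date(Date.strptime('181231', '%y%m%d'), '0101') # :61:1812310101...
```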

Parsing issue in field.rb:51- 'Wrong line format'

Error message:
/Users/xxxx/.rvm/gems/ruby-2.2.1/gems/cmxl-0.1.3/lib/cmxl/field.rb:51:in `parse': Wrong line format: "{1:F01AXISINBBAXXXXXXXXXXXXX}{2:I940XXXXXXXXAXXXN}{4:" (Cmxl::Field::LineFormatError)

Perhaps the statement_separator I'm using doesn't match my file:

Cmxl.config[:statement_separator] = /\n-.\n/m

Here is the MT940 file I'm trying to parse and convert:

{1:F01AXISINBBAXXXXXXXXXXXXX}{2:I940XXXXXXXXAXXXN}{4:
:20:MT940/78274
:25:xxxxxxxxxxxxxx
:28C:3/1
:60F:C160201INR0,00
:61:1602080208CR1338474,92NMSC89962273
:86:-TX BRN-REF NO.0234FIR1600187 USD 19995/RLZ           
:61:1602120212DR390000,00NCHK73401
:86:-CHQ73401   TX BRN-CLG-CHQ PAID TO SANTHANA                      
:61:1603050305DR27300,00NCHK73403
:86:-CHQ73403   TX BRN-CLG-CHQ PAID TO SANTHANA                      
:61:1604060406DR39000,00NCHK73404
:86:-CHQ73404   TX BRN-CLG-CHQ PAID TO SANTHANA                      
:61:1604150415DR33249,00NCHK73405
:86:-CHQ73405   TX BRN-CLG-CHQ PAID TO CHENNAI AIRCONDITIONERS       
:61:1604280428DR5000,00NCHK73406
:86:-CHQ73406   TX VERVE FINANCIAL SERVICES PVT LTD                  
:61:1604300430DR392751,00NCHK73407
:86:-CHQ73407   TX BY SALARY                                         
:61:1605040504DR35100,00NCHK73408
:86:-CHQ73408   TX BRN-CLG-CHQ PAID TO P SANTHANA GOPALA KRISHNA     
:61:1605070507DR5000,00NCHK73402
:86:-CHQ73402   TX BRN-CLG-CHQ PAID TO RELIANCE COMMUNICATI          
:61:1605090509DR76917,00NMSC315222
:86:-TX INB/NEFT/AXIR161301965349/Reliance Comm/Reliance B
:61:1605120512CR1322831,02NMSC96582848
:86:-TX BRN-REF NO.0234FIR1600665 USD 19995/RLZ           
:61:1605170517DR61425,00NTRF7335278
:86:-TX INB/IFT/ANITHA SURESH/Admin Reimbursement         
:61:1605250525DR38000,00NCHK73409
:86:-CHQ73409   TX VERVE FINANCIAL SERVICES                          
:62F:C160527INR1557563,94
-}

Statement Field generation_date NoMethodError: undefined method `date' for nil:NilClass

First of all thanks for sharing the code.

I updated cmxl version in my project from 0.2.0 to 1.4.1. Then one of my tests parsing a mt940 is failing.
I quickly found why this happens: generation_date in the statement reads from field :20 or :13.
In my case, my bank (Commerzbank) provides field :20 with only the reference; no extra values are provided.
Field :13 is not provided at all because it is mt942.

def generation_date
  field(20).date || field(13).date
end

Call of field(13) gives nil and then method date fails with

NoMethodError: undefined method `date' for nil:NilClass

:20 exists in mt940 and mt942 so even if there is no generation_date provided in that field field(20).date will not fail. It only returns nil. But then field(13).date is called and gives error mentioned above.

Method generation_date should check if field :13 is provided before call date on it.
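The proposed guard maps directly onto Ruby's safe navigation operator (&., Ruby 2.3+). A self-contained sketch with hypothetical stub classes standing in for Cmxl's statement and field objects:

```ruby
# Hypothetical stubs standing in for Cmxl's field and statement objects.
FieldStub = Struct.new(:tag, :date)

class StatementStub
  def initialize(fields)
    @fields = fields
  end

  # Returns the first field with the given tag, or nil when the tag is absent.
  def field(tag)
    @fields.find { |f| f.tag == tag }
  end

  # &. returns nil instead of raising NoMethodError when a field is missing,
  # so a statement without a :13: field yields nil rather than crashing.
  def generation_date
    field(20)&.date || field(13)&.date
  end
end

p StatementStub.new([FieldStub.new(20, nil)]).generation_date
```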
