
beangulp's Introduction

beancount: Double-Entry Accounting from Text Files

A double-entry bookkeeping computer language that lets you define financial transaction records in a text file, read them into memory, and generate a variety of reports from them; it also provides a web interface.

Documentation can be read at:

https://beancount.github.io/docs/

Documentation authoring happens on Google Docs, where you can contribute by requesting access or commenting on individual documents. An index of all source documents is available here:

http://furius.ca/beancount/doc/index

There is a mailing-list dedicated to Beancount; please post questions there so that others can share in the responses. More general discussions about command-line accounting also occur on the Ledger mailing-list, so you might be interested in that group as well.

You can obtain the source code from the official Git repository on Github:

See the Installing Beancount document for more details.

There are three versions:

  • Version 3 (branch master): The next version of Beancount, in development since June 2020. This is unstable; you should use version 2 below. The scope of changes is described in this document.
  • Version 2 (branch v2): The current stable version of Beancount, in maintenance mode as of July 2020. This was a complete rewrite of the first version, introducing a number of constraints, a new grammar, and much more. Use this now.
  • Version 1 (branch v1): The original version of Beancount. Development on this version halted in 2013. It was intended to be similar to, and partially compatible with, Ledger. Do not use this.

Tickets can be filed on the Github project page:

https://github.com/beancount/beancount/issues

Copyright (C) 2007-2024 Martin Blais. All Rights Reserved.

This code is distributed under the terms of the "GNU GPLv2 only". See COPYING file for details.

Beancount has found itself being useful to many users, companies, and foundations since I started it around 2007. I never ask for money, as my intent with this project is to build something that is useful to me first, as well as for others, in the simplest, most durable manner, and I believe in the genuinely free and open stance of Open Source software. Though its ends are utilitarian - it is about doing my own accounting in the first order - it is also a labor of love and I take great pride in it, pride which has pushed me to add the polish so that it would be usable and understandable by others. This is one of the rare areas of my software practice where I can let my desire for perfection and minimalism run untamed from the demands of time and external constraints.

Many people have asked where they can donate for the project. If you would like to give back, you can send a donation via Wise (preferably):

https://wise.com/share/martinb4019

or PayPal at:

https://www.paypal.com/paypalme/misislavski

Your donation is always appreciated in any amount, and while the countless hours spent on building this project are impossible to match, the impact of each donation is much larger than its financial import. I truly appreciate every person who offers one; software can be a lonely endeavour, and those donations as well as words of appreciation keep reminding me of the positive impact my side projects can have on others. I feel gratitude for all users of Beancount.

Thank you!

Martin Blais <[email protected]>

beangulp's People

Contributors

blaggacao, blais, dnicolodi, doriath, floriskruisselbrink, jamessan, johannesjh, josephw, jpluscplusm, kubauk, llpamies, matz-e, maxwell-k, patbakdev, siddhantgoel, stefanor, tarioch, tbm, tohojo, wzyboy, yagebu


beangulp's Issues

Don't consume all provided arguments in new ingestion flow

Original report by Zahan Malkani (Bitbucket: Zahan, GitHub: zahanm).


An issue with the implementation in
https://bitbucket.org/blais/beancount/commits/0c66d90c2173fd5fcc83d10c8297800d70a81c9b
which itself addresses the questions raised in beancount/beancount#75
is this:

you can't provide custom arguments to your own import script, because once you call ingest(..), it errors out on any arguments it doesn't recognize.

A simple fix would be to use parse_known_args instead, which I'll do in a PR shortly, once I figure out how to make a PR on Bitbucket.
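The proposed fix is easy to illustrate with argparse alone: parse_known_args() hands back the leftover arguments instead of erroring out on them. This is a minimal sketch, not beangulp's actual argument handling, and the flag names are made up.

```python
import argparse

# Sketch of the proposed fix: parse_known_args() returns the recognized
# namespace plus the unrecognized leftovers, instead of exiting with
# "unrecognized arguments" the way parse_args() would.
parser = argparse.ArgumentParser()
parser.add_argument('--downloads', default='.')

args, remaining = parser.parse_known_args(
    ['--downloads', '/tmp', '--my-custom-flag', 'value'])

print(args.downloads)   # /tmp
print(remaining)        # ['--my-custom-flag', 'value']
```

The caller's own parser can then consume `remaining` for its custom options.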

Let CSV importer support tab-separated files

Would it be possible for the CSV importer to support tab-separated files as well? Currently the 'text/csv' MIME type seems to be hardcoded.

While I could monkey-patch it with self.remap['mime'] = [re.compile('text/tab-separated-values')] in my subclass's __init__(), that would be rather ugly.

Should we have a list of beangulp importers?

I have created a repository with three beangulp importers. They are definitely not useful for everyone, but maybe to a few. Should there be a list somewhere, so that community efforts to write importers can be combined?

Move the OFX importer to the examples folder?

It has been suggested before, but I don't know how many users depend on it being importable from beancount.ingest or beangulp now. Moving it to the examples would still keep the code available, and we could keep maintaining and testing the basic functionality, while making it clear that it is not supposed to be a general and complete solution and that users are expected to copy the code into their importers.

ingest.cache._FileMemo should be public

Original report by Balázs Keresztury (Bitbucket: belidzs, GitHub: belidzs).


Implementing an importer.ImporterProtocol usually involves method parameters of type ingest.cache._FileMemo, but this class is currently protected.

In my opinion this class should be public since it is used extensively outside of the beancount.ingest package and it also causes code inspection errors when static typing is employed to enhance IDE code inspection (and type safety of course).

#!python

def identify(self, file: cache._FileMemo) -> bool:
    return True

Is the support for multiple importers matching the same document useful?

Currently more than one importer can positively identify the same document: all of them are run in sequence to extract entries, and the first importer that matches and returns a non-None account for the document is used to file it.

Is this really desired?

I think this was coded this way to let different importers perform different operations on the same files (i.e., one importer does the extraction and another provides information for document filing). However, I think it takes a quite contrived example to find a case where this saves a significant amount of complexity in the importers. It is also confusing and error-prone for users: raising an error when more than one importer identifies the same document would be much better.

Create a section classifier for new transactions

Original report by Martin Blais (Bitbucket: blais, GitHub: blais).


There's nothing like that in the Beancount codebase. I've thought about building something to automatically insert imported transactions in the right "section" (I personally use org-mode, where each section corresponds to an institution and its related group of accounts) but it's unclear whether that would generalize.

I think you could turn this into a simple classification problem. Given some syntax for splitting up an input file into sections (e.g., some regular expression matching on a title or separator), you now have groups of transactions and inputs. Somehow reduce this to a simple model for classifying which section an incoming transaction matches with highest probability and insert it there. Or more appropriately - since transactions are imported in groups - find the section that best matches all the transactions in the imported files and insert at the end there.

On Sun, Mar 11, 2018 at 12:28 PM, Michael Droogleever [email protected] wrote:
I believe it is against the design of beancount, but is there any existing code which attempts to add transactions to an existing beancount file. Assuming the entries in the file are grouped by asset account, it would need to append the entry to the subsection of entries all from the same account.
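The classification idea above can be sketched with a crude bag-of-words overlap score, assuming the input file has already been split into sections. All names here are illustrative; nothing like this exists in the Beancount codebase.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a deliberately crude feature extractor."""
    return re.findall(r'[a-z]+', text.lower())

def best_section(sections, new_text):
    """Pick the section whose existing text shares the most tokens with
    the incoming transaction: a minimal nearest-section classifier."""
    new = Counter(tokens(new_text))
    def score(body):
        seen = Counter(tokens(body))
        return sum(min(n, seen[t]) for t, n in new.items())
    return max(sections, key=lambda title: score(sections[title]))

# Hypothetical sections, keyed by their org-mode style titles.
sections = {
    'Bank of Foo': 'Assets:Foo:Checking groceries rent payroll',
    'Broker Bar': 'Assets:Bar:Brokerage BUY HOOL dividend',
}
print(best_section(sections, '2018-03-11 * "BUY 10 HOOL" dividend'))
# → Broker Bar
```

A real implementation would score a whole imported group at once and insert at the end of the winning section, as described above.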

Expose csv.Dialect in beancount.ingest.importers.csv.Importer

Original report by Johannes Harms (Bitbucket: johannesjh, GitHub: johannesjh).


I have been playing around with the CSV importer provided in beancount.ingest.importers.csv.Importer, trying to import CSV like the following example data mimicking my bank's csv format:

DE40100100100000012345;Paying the rent;04.09.2017;04.09.2017;-800,00;EUR
DE40100100100000012345;Transfering accumulated savings to other account;05.09.2017;05.09.2017;-2500,00;EUR
DE40100100100000012345;Payroll;04.09.2017;04.09.2017;2000,00;EUR

One specialty of the above CSV is that it uses semicolons instead of commas as column separators. My suggestion: this could easily be configured by passing a csv.Dialect to the beancount csv importer, e.g. by adding an additional parameter to its __init__ method, as follows:

class Importer(regexp.RegexpImporterMixin, importer.ImporterProtocol):

    def __init__(self, config, account, currency, regexps,
                 institution=None,
                 debug=False,
                 csv_dialect: Union[str, csv.Dialect] = 'excel'):

I have tried it out already, which means I can contribute a merge request. (I'll push it in a few minutes).
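For reference, a self-contained sketch of what such a dialect could look like for the semicolon-separated sample above. The class name and settings are illustrative, not part of beancount.

```python
import csv
import io

# Hypothetical dialect matching the bank's semicolon-separated export.
class SemicolonDialect(csv.Dialect):
    delimiter = ';'
    quotechar = '"'
    doublequote = True
    lineterminator = '\r\n'
    quoting = csv.QUOTE_MINIMAL

data = io.StringIO(
    'DE40100100100000012345;Paying the rent;04.09.2017;04.09.2017;-800,00;EUR\n')
row = next(csv.reader(data, dialect=SemicolonDialect))
print(row[1])  # Paying the rent
print(row[4])  # -800,00
```

Note the amounts still use a decimal comma, so a parse_amount override would be needed in addition to the dialect.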

how to extend precanned csv import extract method to add csv source to __source__ metadata

Original report by Jeff Mondoux.


I have a custom importer for my bank's csv statements; this importer inherits from the beancount csv importer, in which I override the extract() method as such:

def extract(self, file, existing_entries=None):
    mapped_account = self.file_account(file)
    entries = super().extract(file, existing_entries)
    for entry in entries:
        entry.meta['__source__']='source'
    return entries

What I can't figure out with my limited Python abilities is how to add the raw csv line to the __source__ metadata field so that it can be displayed by the fava import GUI. I want to avoid rolling my own csv importer entirely, as the generic csv importer provided by beancount does what I need for the most part. I know I could re-read the csv file a second time to append the data, but is this the best or only way?
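One possible approach, sketched below with a stand-in entry object, assumes the extracted entries carry a 'lineno' metadata key pointing into the source file, and reads the file once to attach the raw lines by number. Whether the generic CSV importer's lineno matches the raw file exactly should be verified for your version.

```python
import os
import tempfile

def attach_source_lines(entries, filepath):
    """Sketch: reuse the 'lineno' metadata that extracted entries carry
    to look up the raw CSV line, instead of re-parsing per entry.
    Assumes entry.meta['lineno'] is a 1-based line number into the file."""
    with open(filepath) as f:
        lines = f.read().splitlines()
    for entry in entries:
        lineno = entry.meta.get('lineno')
        if lineno and 1 <= lineno <= len(lines):
            entry.meta['__source__'] = lines[lineno - 1]
    return entries

# Tiny demonstration with a stand-in for a beancount directive.
class Entry:
    def __init__(self, meta):
        self.meta = meta

tmp = tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False)
tmp.write('Date;Text;Amount\n01.02.2020;Rent;-800,00\n')
tmp.close()
entries = attach_source_lines([Entry({'lineno': 2})], tmp.name)
print(entries[0].meta['__source__'])  # 01.02.2020;Rent;-800,00
os.unlink(tmp.name)
```

This keeps the single extra file read in one place, after super().extract() has run.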

Deduplication is not stable

Imagine a setup with:

documents/foo.csv
documents/bar.csv

And a static importer such as:

class StaticImporter(Importer):
  """No matter the file, identify it and yield the same transaction."""
  def identify(self, filepath):
    return True
  def account(self, filepath):
    return 'Dummy'
  def extract(self, filepath):
    return [DUP_TRANSACTION]

This situation is a simplified version of CSV reports with overlapping dates, something that happens if you're not careful to do CSV exports at specific time boundaries (e.g. export Jan-June.csv, then March-Sept.csv).

If you assume the following out.beancount:

;; -*- mode: beancount -*-

**** documents/foo.csv

DUP_TRANSACTION

**** documents/bar.csv

; DUP_TRANSACTION

(where the second transaction has been marked as a duplicate and is commented out), then running extract -o out.beancount -e out.beancount has the following output:

;; -*- mode: beancount -*-

**** documents/foo.csv

; DUP_TRANSACTION

**** documents/bar.csv

DUP_TRANSACTION

What was marked as a duplicate before becomes a non-duplicate, and vice-versa.

Ideally, this behavior should be stable, so that re-importing data is idempotent.
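One conceivable direction, sketched with illustrative names rather than beangulp's actual API, is to key duplicate detection on a deterministic content hash, so that which occurrence gets marked no longer depends on document iteration order.

```python
import hashlib

def entry_key(date, narration, amounts):
    """Deterministic content key for a transaction, independent of
    which source document it came from (shape is illustrative)."""
    blob = '|'.join([date, narration] + sorted(amounts)).encode()
    return hashlib.sha256(blob).hexdigest()

def mark_duplicates(extracted, existing_keys):
    """Keep the first occurrence of each key (in a fixed document
    order) and mark every later one as a duplicate."""
    seen = set(existing_keys)
    out = []
    for key, entry in extracted:
        out.append((entry, key in seen))
        seen.add(key)
    return out

k = entry_key('2020-01-01', 'Rent', ['-800.00 EUR'])
result = mark_duplicates([(k, 'txn-from-foo.csv'), (k, 'txn-from-bar.csv')], [])
print(result)  # first kept, second flagged as duplicate
```

As long as the documents are processed in a fixed (e.g. sorted) order, repeated runs produce the same marking.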

Thanks!

csv importer crashes on Python 3.8

Original report by Toke Høiland-Jørgensen.


When trying to use the csv ingester with Python 3.8, I get a crash with a traceback ending in this:

  File "/usr/lib/python3.8/site-packages/beancount/ingest/importers/csv.py", line 162, in __init__
    super().__init__(**kwds)
  File "/usr/lib/python3.8/site-packages/beancount/ingest/importers/mixins/identifier.py", line 65, in __init__
    super().__init__(**kwds)
  File "/usr/lib/python3.8/site-packages/beancount/ingest/importers/mixins/filing.py", line 30, in __init__
    super().__init__(**kwds)
TypeError: object.__init__() takes exactly one argument (the instance to initialize)

It seems it is no longer possible to blindly pass args up to __init__ like that.

I’ve locally patched the init method of FilingMixin to do this instead, which seems to work:

        if isinstance(super(), importer.ImporterProtocol):
            super().__init__(**kwds)

idea: Tool to manipulate Beancount ledgers

We already discussed a bean-insert command to insert transactions (or better directives) replacing a marker in an existing ledger. I think that a bean-sort command that sorts entries in different ways would also be useful (for example, it would be a solution for #10). Each tool is really just a few lines of code and I don't want to have an unbound proliferation of little command line utilities.

What about introducing a "Beancount Swiss army knife" command line tool that has commands for all these operations? Where should it live? In the beancount or in the beangulp repository? How should it be named? I like bean-slice but I don't think there will ever be a slicing command and bean-slice sort and bean-slice insert do not seem very natural. Better ideas?

How to override "HEADER" in bean-extract?

Original report by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Hi,

in extract.py I could see this comment:

# The format for the header in the extracted output.
# You may override this value from your .import script.
HEADER = ';; -*- mode: beancount -*-\n'

However, defining HEADER in the .import script does not seem to work. It appears that output.write(HEADER) in extract.py writes the hard-coded header directly to the output.
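The likely mechanism is Python's module-global lookup: a bare `HEADER = ...` in the .import script binds a new name in the script's own namespace, while extract.py keeps reading its own global. Rebinding the attribute on the module itself (`from beancount.ingest import extract; extract.HEADER = ...`) may work as a workaround, though this is untested here. The stand-in module below demonstrates the mechanics without requiring beancount.

```python
import types

# Stand-in for beancount.ingest.extract, to show why the override fails.
extract = types.ModuleType('extract')
extract.HEADER = ';; -*- mode: beancount -*-\n'

def write_output(output):
    # Equivalent of output.write(HEADER) inside extract.py: the lookup
    # goes through the extract module's namespace, not the caller's.
    output.append(extract.HEADER)

HEADER = ';; custom\n'   # new name in *this* namespace; has no effect
out = []
write_output(out)        # still appends the default header

extract.HEADER = ';; custom\n'   # rebinding the module attribute works
out2 = []
write_output(out2)
```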

Encoding not detected in Cache.head

Original report by Anonymous.


The head method was not detecting the encoding.

#!diff
diff -r cdb1de0bfb8a src/python/beancount/ingest/cache.py
--- a/src/python/beancount/ingest/cache.py	Sat Jun 11 01:24:37 2016 -0400
+++ b/src/python/beancount/ingest/cache.py	Tue Jun 14 17:00:53 2016 +0200
@@ -84,8 +84,11 @@
       A converter function.
     """
     def head_reader(filename):
-        with open(filename) as file:
-            return file.read(num_bytes)
+        with open(filename, 'rb') as file:
+            rawdata = file.read(num_bytes)
+            detected = chardet.detect(rawdata)
+            encoding = detected['encoding']
+            return rawdata.decode(encoding)
     return head_reader

Additionally, correct a small typo:

#!diff
diff -r cdb1de0bfb8a src/python/beancount/ingest/identify.py
--- a/src/python/beancount/ingest/identify.py	Sat Jun 11 01:24:37 2016 -0400
+++ b/src/python/beancount/ingest/identify.py	Tue Jun 14 17:00:53 2016 +0200
@@ -15,7 +15,7 @@
 from beancount.ingest import cache
 
 
-# The format for the seciton titles in the extracted output.
+# The format for the section titles in the extracted output.
 # You may override this value from your .import script.
 SECTION = '**** {}'
 

Thank you!

csv.Importer fails for csv files with CR characters

Subject of the issue

When implementing a custom beancount.ingest.importers.csv.Importer, an error is thrown for csv files with CR newline characters.

Your environment

  • Python 3.8
  • beancount-2.3.1.dev0

Steps to reproduce

  1. Take a CSV file that is known to be working with bean-extract and a custom beancount.ingest.importers.csv.Importer
  2. Save the CSV file with CR newline characters instead of CRLF
  3. Try to import the csv file with bean-extract, using the beancount.ingest.importers.csv.Importer

Expected behaviour

The CSV file should be imported as usual without any errors.

Actual behaviour

The following error is thrown:

ERROR:root:Importer importers.mybank.mybank_csv.MyBankCsvImporter: "Assets:MyBank:Checkings".extract() raised an unexpected error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/beancount-2.3.1.dev0-py3.8-linux-x86_64.egg/beancount/ingest/extract.py", line 183, in extract
new_entries = extract_from_file(
File "/usr/lib/python3.8/site-packages/beancount-2.3.1.dev0-py3.8-linux-x86_64.egg/beancount/ingest/extract.py", line 67, in extract_from_file
new_entries = importer.extract(file, **kwargs)
File "/usr/lib/python3.8/site-packages/beancount-2.3.1.dev0-py3.8-linux-x86_64.egg/beancount/ingest/importers/csv.py", line 212, in extract
iconfig, has_header = normalize_config(
File "/usr/lib/python3.8/site-packages/beancount-2.3.1.dev0-py3.8-linux-x86_64.egg/beancount/ingest/importers/csv.py", line 389, in normalize_config
has_header = csv.Sniffer().has_header(head)
File "/usr/lib/python3.8/csv.py", line 395, in has_header
header = next(rdr) # assume first row is header
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Workaround

Converting the csv file to have CRLF newline characters works as a workaround but since my financial institution exports the files with CR, this seems like something that could be handled by the existing csv.Importer.
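Until the importer handles this itself, one workaround is to normalize the line endings before the csv module ever sees the text (the csv module's documentation also recommends opening files with newline=''). A minimal sketch:

```python
import csv
import io

# A CR-only (old Mac style) export like the one described above.
raw = 'Date,Amount\r2020-01-01,-5.00\r'

# Normalize CRLF and bare CR to LF before feeding the csv module;
# this is a user-side workaround, not beancount's own behavior.
normalized = raw.replace('\r\n', '\n').replace('\r', '\n')
rows = list(csv.reader(io.StringIO(normalized)))
print(rows)  # [['Date', 'Amount'], ['2020-01-01', '-5.00']]
```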

Deduplicator doesn't support entries with price and inferred amount

I have this in my ledger:

2016-09-25 * "Hooli Vest Event"
  Assets:US:Schwab:HOOL               1 HOOL {786.9 USD}
  Income:Salary:Hooli:GSU     CHF @ 0.968892 USD
  Assets:US:Hooli:Unvested:C123456  -1 HOOL.UNVEST
  Expenses:Hooli:Vested              1 HOOL.UNVEST

Exception:

  Traceback (most recent call last):
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/__init__.py", line 86, in _extract
      entries = extract.extract_from_file(importer, filename, existing_entries)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/extract.py", line 44, in extract_from_file
      entries = importer.deduplicate(entries, existing_entries)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/importer.py", line 167, in deduplicate
      return extract.mark_duplicate_entries(entries, existing, window, self.cmp)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/extract.py", line 125, in mark_duplicate_entries
      if compare(entry, target):
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/importer.py", line 145, in cmp
      return compare(a, b)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/similar.py", line 101, in __call__
      amounts1 = self.cache[id(entry1)] = amounts_map(entry1)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/similar.py", line 151, in amounts_map
      amounts[key] += posting.units.number
  TypeError: unsupported operand type(s) for +=: 'decimal.Decimal' and 'type'
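The traceback suggests amounts_map() adds posting.units.number even when the amount was elided and left for Beancount to infer (so the "number" is a placeholder, not a Decimal). A defensive variant, using a deliberately simplified (currency, number) posting shape for illustration, would skip non-Decimal numbers:

```python
from collections import defaultdict
from decimal import Decimal

def amounts_map(postings):
    """Sketch of a guard for beangulp.similar.amounts_map: skip any
    posting whose number is not an actual Decimal (e.g. beancount's
    MISSING sentinel on postings with an inferred amount)."""
    amounts = defaultdict(Decimal)
    for currency, number in postings:
        if not isinstance(number, Decimal):
            continue  # elided amount; nothing numeric to accumulate
        amounts[currency] += number
    return dict(amounts)

# `type` stands in for the non-Decimal placeholder seen in the traceback.
postings = [('USD', Decimal('786.90')), ('USD', type)]
print(amounts_map(postings))  # {'USD': Decimal('786.90')}
```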

Same-day transactions in incorrect order, CSV importer

I have two transactions in a CSV file with the same date (the CSV file is in descending order) and when I import it they are in the wrong order.

I looked at the CSV importer and extract() correctly sets is_ascending to False, reverses the transactions and returns the entries in the correct order.

So something else must mess up the order. Where should I look?

Test case

bean-extract config.py fidor.csv

Result:

2020-09-30 * "Aktivitätsbonus"
  Assets:Current:Fidor  5.00 EUR

2020-09-30 * "Kontofuehrung"
  Assets:Current:Fidor  -5.00 EUR

2020-10-19 * "Gutschrift; Absender: Martin Michlmayr"
  Assets:Current:Fidor  1.00 EUR

The first two transactions should be swapped.

fidor.csv

Datum;Beschreibung;Beschreibung2;Wert
19.10.2020;Gutschrift;Absender: Martin Michlmayr;1,00
30.09.2020;Aktivitätsbonus;;5,00
30.09.2020;Kontofuehrung;;-5,00

config.py

import os
import sys

sys.path.append(os.path.dirname(__file__))

import fidor

CONFIG = [
    fidor.FidorImporter('Assets:Current:Fidor', r'^fidor\.'),
]

fidor.py

"""
Importer for Fidor
"""

import csv
import os
import re

from beancount.core.number import D
import beancount.ingest.importers
from beancount.ingest.importers.csv import Col


class FidorImporter(beancount.ingest.importers.csv.Importer):
    """
    Importer for Fidor
    """

    def __init__(self, account, file_pattern):
        class FidorDialect(csv.Dialect):
            delimiter = ";"
            quoting = csv.QUOTE_NONE
            escapechar = '\\'
            doublequote = False
            skipinitialspace = True
            lineterminator = '\r\n'

        self.file_pattern = file_pattern
        fidor_dialect = FidorDialect()
        super().__init__({
            Col.DATE: 'Datum',
            Col.NARRATION: 'Beschreibung',
            Col.NARRATION2: 'Beschreibung2',
            Col.AMOUNT: 'Wert',
        },
                         account,
                         'EUR', [
                             '^Datum;Beschreibung;Beschreibung2;Wert$',
                         ],
                         csv_dialect=fidor_dialect,
                         dateutil_kwds={'dayfirst': True})

    def identify(self, file):
        if file.mimetype() != "text/csv":
            return False

        if re.search(self.file_pattern, os.path.basename(file.name)):
            return True

        return False

    def parse_amount(self, string):
        """The method used to create Decimal instances. You can override this."""
        return D(string.replace(',', '.'))

same code base for fetching prices and importing transactions?

Original report by Johannes Harms (Bitbucket: johannesjh, GitHub: johannesjh).


Could / should we use the same code base for fetching prices and importing transactions? Using importers for importing downloaded prices could help reduce duplicate code. The only thing missing is to extend the importer base classes to support fetching (of not only prices, but also transactions).

@blais: I would be glad to hear your thoughts on this, including the previous design rationale to split price-fetching and transaction-importing into separate modules.
@seltzered: I am posting this as follow-up to beancount/beanprice#2 "Bean-price: support fetch over range of dates", because I did not want to take the other issue off-topic.

Motivation:
My personal observation is that importing prices is very similar to importing transactions.

  • Fetching: While prices are usually fetched automatically, I found this is not always possible. In a similar way, transactions could be fetched automatically, but that's not always possible (or worth the effort).
  • Importing (identifying, filing, extracting): These steps are nearly identical for prices and transactions.

Example: I wrote this importer for yahoo prices:

  1. Fetching: As mentioned above, automatic fetching is not always easy. In this case, I found it easier to manually fetch prices by scraping the HTML table using artoo.js.
  2. Importing: The code snippet below illustrates that it makes perfect sense to import prices using the import functionality.
#!python

"""
Imports prices from CSV that was scraped from yahoo finance
"""
# pylint: disable=C0411,C0330


from decimal import DecimalException
import csv
import logging
from typing import Dict, Iterable, NamedTuple

from beancount.core.amount import Amount
from beancount.core.data import Price, new_metadata, sorted as sorted_entries
from beancount.core.number import D
from beancount.ingest.cache import _FileMemo
from beancount.ingest.importer import ImporterProtocol
from beancount.ingest.importers.csv import Col
from beancount.ingest.importers.mixins.identifier import IdentifyMixin
from beancount.utils.date_utils import parse_date_liberally

logger = logging.getLogger(__name__)  # pylint: disable=C0103

Row = NamedTuple(
    "Row", [("file_name", str), ("line_number", int), ("data", Dict)]
)


class PricesImporter(ImporterProtocol):
    """Imports prices from CSV"""

    def __init__(self, **kwargs):  # pylint: disable=R0913
        """
        Initializes the importer.
        """
        # gets required arguments:
        self.columns = kwargs.pop("columns")
        self.commodity = kwargs.pop("commodity")
        self.currency = kwargs.pop("currency")

        # gets optional arguments:
        self.debug = kwargs.pop("debug", False)
        self.csv_dialect = kwargs.get("csv_dialect", None)
        self.dateutil_kwds = kwargs.get("dateutil_kwds", None)
        super().__init__(**kwargs)

    def extract(self, file: _FileMemo, existing_entries=None):
        """Extracts price entries from CSV file"""
        rows = self.read_lines(file.name)
        price_entries = sorted_entries(self.get_price_entries(rows))
        return price_entries

    def read_lines(self, file_name: str) -> Iterable[Row]:
        """Parses CSV lines into Row objects"""
        with open(file_name) as file:
            reader = csv.DictReader(file, dialect=self.csv_dialect)
            for row in reader:
                yield Row(file_name, reader.line_num, row)

    def get_price_entries(self, lines: Iterable[Row]) -> Iterable[Price]:
        """Converts Row objects to beancount Price objects"""
        for line in lines:
            try:
                self.validate_line(line)
                meta = self.build_metadata(line.file_name, line.line_number)
                date = self.parse_date(line.data[self.columns[Col.DATE]])
                amount = self.parse_amount(line.data[self.columns[Col.AMOUNT]])
                amount_with_currency = Amount(amount, self.currency)
                yield Price(  # pylint: disable=E1102
                    meta, date, self.commodity, amount_with_currency
                )
            except (ValueError, DecimalException, AssertionError) as exception:
                logger.warning(
                    "Skipped CSV line due to %s exception at %s line %d: %s",
                    exception.__class__.__name__,
                    line.file_name,
                    line.line_number,
                    line.data,
                )

    def validate_line(self, row):
        """Validates CSV rows. If invalid, an AssertionError is thrown."""
        data = row.data
        assert data[self.columns[Col.AMOUNT]]

    def build_metadata(self, file_name, line_number):
        """Constructs beancount metadata"""
        line_number = str(line_number)
        return new_metadata(
            file_name,
            line_number,
            {"source_file": file_name, "source_line": line_number}
            if self.debug
            else None,
        )

    def parse_date(self, date_str):
        """Parses the date string"""
        return parse_date_liberally(date_str, self.dateutil_kwds)

    def parse_amount(self, amount_str):  # pylint: disable=R0201
        """Parses an amount string to decimal"""
        return D(amount_str)


class YahooFinancePricesImporter(IdentifyMixin, PricesImporter):
    """
    Imports CSV scraped from finance.yahoo.com

    Usage:

    Scrape historical prices using artoo.js, for example from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE

    artoo.scrapeTable('table[data-test="historical-prices"]', {
      headers: 'th',
      done: artoo.saveCsv
    })

    Then run this importer to convert the scraped csv file to beancount prices.
    """

    def __init__(self, **kwargs):
        kwargs.setdefault(
            "columns", {Col.DATE: "Date", Col.AMOUNT: "Adj Close**"}
        )
        self.matchers = [
            ("content", r"Date,Open,High,Low,Close\*,Adj Close.*")
        ]
        super().__init__(**kwargs)


class TecdaxImporter(YahooFinancePricesImporter):
    """
    Imports CSV scraped from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE
    """

    def __init__(self, **kwargs):
        kwargs.setdefault("commodity", "TECDAX")
        kwargs.setdefault("currency", "EUR")
        super().__init__(**kwargs)

In my opinion, the above code illustrates that prices and transactions could use the same import process. I would therefore like to propose: Let's use importers for importing downloaded prices. And let's extend the importer base classes to support fetching of not only prices, but also transactions.

sort all entries when extracting multiple files

Original report by Yh G (Bitbucket: guoyh, GitHub: guoyh).


I found that when using bean-extract to extract multiple files, the output entries are grouped by file and sorted only within each file's section; I checked extract.py and confirmed this.
Isn't it a little disorderly to copy all these unsorted entries into a beancount file as-is? Or maybe it is intentionally designed like this, since there seem to be no issues discussing it.
However, I'd appreciate it if all the entries from multiple files could be printed in time sequence, so would it be possible to add an option for this to bean-extract?
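The requested behavior amounts to merging the per-file lists and sorting globally by date, which beancount.core.data.sorted() can do for real directives; a dependency-free stand-in sketch:

```python
import datetime
from operator import attrgetter
from typing import NamedTuple

# Minimal stand-in for a beancount directive, just enough to sort.
class Entry(NamedTuple):
    date: datetime.date
    narration: str

# Entries as bean-extract currently emits them: one block per file.
per_file = {
    'a.csv': [Entry(datetime.date(2020, 3, 1), 'late')],
    'b.csv': [Entry(datetime.date(2020, 1, 1), 'early')],
}

# The proposed option: flatten and sort globally by date.
merged = sorted(
    (e for entries in per_file.values() for e in entries),
    key=attrgetter('date'))
print([e.narration for e in merged])  # ['early', 'late']
```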

Pass importer results to next importer in bean-extract

Original report by Christoph Sarnowski (Bitbucket: csarn, GitHub: csarn).


I'm writing a couple of importers for personal use, and I am missing this feature.

Rationale:
bean-extract runs all found (and supported) documents through their importer's extract method in one call.
It also has a mechanism to flag duplicate transactions, but only if an existing beancount file is given.
Duplicate transactions happen close to each other in time, so it will be very common that both sides of a duplicate pair are imported at the same time. Imagine I transfer money from one bank account to another, and I download the CSVs from both banks. Now bean-extract will find both sides of this transaction, but it can't detect them as duplicates.

So I would like to see bean-extract run the importer for one file, append the transactions to the list of existing entries, and then pass this updated list to the next importer run.

In case this behavior is not universally useful, I'd suggest adding a command-line switch to bean-extract to enable it.

Any comments or suggestions? Would you accept this feature?
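The proposed flow can be sketched as a loop that feeds each importer's output back in as "existing entries" for the next run. The importer and function names are stand-ins for the real beangulp API.

```python
def extract_all(importers_and_files, existing_entries):
    """Run each importer in turn, letting later runs see the entries
    extracted by earlier ones (sketch of the requested behavior)."""
    entries = list(existing_entries)
    results = []
    for importer, filename in importers_and_files:
        new = importer.extract(filename, existing_entries=entries)
        results.append((filename, new))
        entries.extend(new)   # the next importer sees these as existing
    return results

class DummyImporter:
    """Always extracts the same transfer; drops it when already seen."""
    def extract(self, filename, existing_entries=None):
        txn = ('2020-01-01', 'transfer 100 EUR')
        if txn in (existing_entries or []):
            return []          # detected as a duplicate, for brevity
        return [txn]

imp = DummyImporter()
results = extract_all([(imp, 'bank_a.csv'), (imp, 'bank_b.csv')], [])
print(results)  # the second file yields nothing: its side was already seen
```

A real version would mark rather than drop duplicates, but the data flow is the same.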

ingest/importers/fileonly.py test_match failing

Original report by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


This is linked to beancount/beancount#211.
Believed to be a system specific issue.

#!python

================================================================================= FAILURES ==================================================================================
__________________________________________________________________________ TestFileOnly.test_match __________________________________________________________________________

self = <beancount.ingest.importers.fileonly_test.TestFileOnly testMethod=test_match>, filename = '/tmp/tmp3aqvnoxf'

    @unittest.skipIf(not file_type.magic, 'python-magic is not installed')
    @test_utils.docfile
    def test_match(self, filename):
        """\
            DATE,TYPE,REF #,DESCRIPTION,FEES,AMOUNT,BALANCE
            2014-04-14,BUY,14167001,BOUGHT +CSKO 50 @98.35,7.95,-4925.45,25674.63
            2014-05-08,BUY,12040838,BOUGHT +HOOL 121 @79.11,7.95,-9580.26,16094.37
            """
        importer = fileonly.Importer(
            ['Filename: .*te?mp.*',
             'MimeType: text/plain',
             'Contents:\n.*DATE,TYPE,REF #,DESCRIPTION,FEES,AMOUNT'],
            'Assets:BofA:Checking',
            basename='bofa')
        file = cache._FileMemo(filename)
>       self.assertTrue(importer.identify(file))
E       AssertionError: False is not true

fileonly_test.py:35: AssertionError
--------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
text/x-Algol68

Separate business logic from command-line controller

Summary

The current implementation mixes the business logic of identify, extract, and archive operations with the command-line controller, which makes the code difficult to reuse in different contexts. I would like to propose separating the business logic from the command-line controller.

What is the issue?

The user is unable to control how they interact with the beangulp importer framework on the command line.

For example, I was trying to call the extract and archive commands from a separate click command that jointly carried out these two operations on a specific importer. I ran into some difficulties using Context.invoke() and came across this Stack Overflow post: Call another click command from a click command. The answer posted by @shevron states:

I would suggest modifying your implementation so that you keep the original functions undecorated and create thin click-specific wrappers for them ... This might seem redundant but in fact is probably the right way to do it: one function represents your business logic, the other (the click command) is a "controller" exposing this logic via command line.

Separating the business logic from the command-line controller in the beangulp framework seems sensible. By decoupling the business logic from the command-line interface, users can reuse the importer framework in different contexts without being tied to the specific command-line interface.

Proposal

Refactor the code in the __init__.py file to extract the business logic of identify, extract, and archive operations into separate functions. This separation will allow users to interact with the importer framework in different ways and give users stronger control over how they integrate the importer framework into their own applications.

  • Extract business logic from the _identify function into a separate function
  • Extract business logic from the _extract function into a separate function
  • Extract business logic from the _archive function into a separate function
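The proposed split can be sketched as follows. This is illustrative only: argparse is used to keep the example self-contained, but the same shape applies to thin click wrappers, and identify_logic is a hypothetical name, not existing beangulp API:

```python
# Sketch of separating business logic from the CLI controller.
# The importer protocol (an object with .identify()) is assumed.
import argparse

def identify_logic(importers, filepath):
    """Business logic: return the importers that claim the file.
    Reusable from any context, with no CLI dependency."""
    return [imp for imp in importers if imp.identify(filepath)]

def main(importers, argv=None):
    """Controller: a thin wrapper exposing the logic on the CLI."""
    parser = argparse.ArgumentParser()
    parser.add_argument('filepath')
    args = parser.parse_args(argv)
    for imp in identify_logic(importers, args.filepath):
        print(type(imp).__name__)
```

With this shape, a separate command that jointly extracts and archives can call the logic functions directly instead of fighting with Context.invoke().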

in beangulp/examples/importers/csvbank.py column date conflicts with method date()

It creates a Column named date, but the class already inherits a method named date() from csvbase.Importer. So when CSVMeta.__new__() creates the column dictionary, it sees date as a method and leaves it out. When it comes time to extract, there's no date column in the row and it raises an exception.

As a side note, this scheme of using introspection is pretty hard to trace through. Perhaps a more straightforward scheme, like a dictionary mapping each column name to a function that extracts a value from the raw row tuple, might be better. It would certainly be easier to grok.
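A minimal sketch of the suggested dictionary-based scheme; the field names and row layout here are illustrative, not the csvbase internals:

```python
# Illustrative sketch of an explicit column mapping: output field names
# map to extractor functions over the raw row tuple. No metaclass
# introspection, so a field named 'date' cannot clash with a method.
from datetime import datetime

COLUMNS = {
    'date': lambda row: datetime.strptime(row[0], '%d-%m-%Y').date(),
    'payee': lambda row: row[1],
    'amount': lambda row: float(row[2]),
}

def parse_row(row):
    """Apply each extractor to the raw row tuple."""
    return {name: fn(row) for name, fn in COLUMNS.items()}
```
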

Standard OFX importer

I am migrating from beancount 2 to beangulp, and I see that there is no standard OFX importer (there is one in the examples but not one in the library).

If I implemented it based on the one in the examples and added tests, would such a PR be approved (or was there a reason to omit it from the new library)? If it's OK, do you have any preferences for the libraries I should use or the features it should include?

csvbase calculates balance wrong with multiple sameday transactions

The new csvbase importer has the option to automatically insert a balance assertion based on a balance column in the input data.

Right now it adds a balance assertion one day after the last transaction, but if there are multiple transactions on that day it incorrectly picks the wrong transaction to take the balance from.

With this input (let’s call it test.csv)

date,payee,narration,amount,balance
01-01-2022,Shop,This is just an expense,-24.85,124.85
01-01-2022,Shop,Some other expense,-15.00,109.85
01-01-2022,Employer,Finally got my paycheck,450.00,409.85

And this configuration (config.py)

import beangulp
from beangulp.importers import csvbase

class Importer(csvbase.Importer):
    date = csvbase.Date('date', '%d-%m-%Y')
    payee = csvbase.Column('payee')
    narration = csvbase.Column('narration')
    amount = csvbase.Amount('amount')
    balance = csvbase.Amount('balance')

    def __init__(self, account, currency, flag='*'):
        super().__init__(account, currency, flag)
    
    def identify(self, filepath: str) -> bool:
        return True

CONFIG = [
    Importer(account='Assets:Current:SNS', currency='EUR'),
]

if __name__ == '__main__':
    ingest = beangulp.Ingest(CONFIG)
    ingest()

The result of executing python3 config.py extract test.csv is:

;; -*- mode: beancount -*-

**** /Users/floris/Documents/Boekhouding/beancount/csv-balance/test.csv

2022-01-01 * "Shop" "This is just an expense"
  Assets:Current:SNS  -24.85 EUR

2022-01-01 * "Shop" "Some other expense"
  Assets:Current:SNS  -15.00 EUR

2022-01-01 * "Employer" "Finally got my paycheck"
  Assets:Current:SNS  450.00 EUR

2022-01-02 balance Assets:Current:SNS                              124.85 EUR

The last line is incorrect and should read

2022-01-02 balance Assets:Current:SNS                              409.85 EUR

(There is another issue, my bank actually adds the balance before the current transaction in the csv, but I can work around that using a custom column definition that combines the amount and the balance columns to come up with the balance after. For the bug here this is not relevant)
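A minimal sketch of a fix, under the assumption that rows appear in file order, so among same-day rows the last one carries the closing balance (stand-in tuples, not the csvbase internals):

```python
# Pick the balance of the LAST row carrying the latest date, not just
# any row on that date, and assert it one day later.
import datetime

def closing_balance(rows):
    """rows: (date, amount, balance) tuples in file order.
    Return (assertion_date, balance) for the balance directive."""
    last_date = max(d for d, _, _ in rows)
    balance = [b for d, _, b in rows if d == last_date][-1]
    return last_date + datetime.timedelta(days=1), balance
```

With the three rows from test.csv above, this yields the 409.85 EUR balance on 2022-01-02 rather than 124.85.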

Make importers easier to debug

Original report by Martin Blais (Bitbucket: blais, GitHub: blais).


When an importer doesn't identify, it can be a little puzzling to some users.
We should provide more debugging info: perhaps list the regexps, and find the right amount of "noise" to emit (currently there is too little).

What is the right way to use cvsbase.Importer along with IdentifyMixin and FilingMixin?

Stupid question from a python newbie. I am trying to write something like this:

class MyImporter(IdentifyMixin, FilingMixin, csvbase.Importer):
    def __init__(self, account, currency, **kwargs):
        super().__init__(account=account, currency=currency, **kwargs)

I noticed that the extract() method returns None, which makes me think the inheritance from csvbase isn't really working. Looking into the constructors of each class, I found something interesting.

  • csvbase.Importer takes positional arguments but no **kwargs.
  • both mixins take **kwargs but no positional arguments (*args).

I have tried other combinations unsuccessfully.

Option to manually set encoding for file cache

Original report by Chenxing Luo (Bitbucket: chazeon, GitHub: chazeon).


In an example UTF-8 CSV which contains Chinese characters and emoji (a typical Venmo statement), chardet does not detect the charset correctly (it is recognized as Windows-1252, Turkish), and it is difficult to set the charset manually. It would be nice to allow setting the charset manually, since it is normally known to the user.

Example as attached.

Make duplicate detection configurable

Original report by Jakob Schnitzer (Bitbucket: yagebu, GitHub: yagebu).


I'm using Beancount's import mechanism to import the transactions of my main accounts (which are provided as CSV files by my bank). This works quite well in general and is really a step up from doing it all by hand.

Most of the time duplicates aren't a problem, but sometimes I have already typed out by hand some transactions that I'm about to import. Since my importer doesn't automatically assign accounts (which I do by hand on Fava's import page), these don't get recognized as duplicates, as the check for duplicates seems to be quite strict.

In my case, the "perfect" duplicate check would only check the amount posted to the checking account that I'm importing the transactions for. So it would be nice if the duplicate check could be configured (and if a "check only amount posted to a single account" would be shipped with Beancount).

I could implement this, if you agree that it would be a useful addition. I would probably add a duplicate_check(entry1, entry2) to importer.ImporterProtocol.
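A sketch of what such a configurable check might look like, using stand-in dicts rather than beancount Transaction objects; duplicate_check and its semantics are the proposal here, not existing API:

```python
# Proposed "check only the amount posted to a single account" comparator.
# Entries are illustrative dicts: {'date': ..., 'postings': [(account, units), ...]}.

def amount_on_account(entry, account):
    """Sum of posting amounts on the given account."""
    return sum(units for acc, units in entry['postings'] if acc == account)

def duplicate_check(entry1, entry2, account):
    """Consider two entries duplicates if they post the same amount to
    the imported account on the same date, regardless of the other legs."""
    return (entry1['date'] == entry2['date']
            and amount_on_account(entry1, account)
                == amount_on_account(entry2, account))
```

This would flag a hand-typed transaction as a duplicate of an imported one even when the counter-account differs.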

Is it safe to modify ledger entries in place?

Entries are named tuples and are thus immutable, but they contain mutable objects: lists and dictionaries. Is it advisable to modify these in place? The current deduplication code, for example, does this to add a __duplicate__ metadata field:

        mod_entries = []
        for entry in new_entries:
            if id(entry) in duplicate_set:
                marked_meta = entry.meta.copy()
                marked_meta[DUPLICATE_META] = True
                entry = entry._replace(meta=marked_meta)
            mod_entries.append(entry)

which requires copying the entries, recreating the entries list, and iterating over it one extra time.

The meta dictionary, however, is mutable, so this could be written as:

        for entry in new_entries:
            if id(entry) in duplicate_set:
                entry.meta[DUPLICATE_META] = True

and some more refactoring could get rid of the extra iteration.

This, of course, breaks if someone has been reusing the same dictionary for the metadata of more entries. Should this be considered an error (it is possible to check the refcount number of the dictionary to detect this) or should in-place modification of entries be avoided?

Add comments to exported transactions

I couldn't find any documentation suggesting that this is possible, hence creating this ticket. Not everything about a transaction can/should be encapsulated in metadata/tags/links - I would want to generate a comment containing some of the details about the transaction.

Directory validation code

One of the functionalities I built a long time ago and let rot a bit is a script that would cross check the documents directory hierarchy against the chart of accounts implied by the ledger for validity.

In practice, the chart of accounts evolves over time, which may include renamings, and this means that the directory hierarchy also needs to be updated. You almost always forget to do that in practice. I have a script somewhere that cross-checks them against each other for that specific reason.

This script, or a new, improved version of it, belongs in beangulp.
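A minimal sketch of such a cross-check, assuming the usual layout where each account maps to a directory path (Assets:US:BofA → Assets/US/BofA); validate_documents is a hypothetical name:

```python
# Report directories under the documents root that correspond to no
# account in the ledger's chart of accounts (e.g. stale renamed dirs).
import os

def validate_documents(root, accounts):
    """Return relative directory paths under root that match neither an
    account nor a prefix of one."""
    expected = {acc.replace(':', os.sep) for acc in accounts}
    stale = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == '.':
            continue  # the root itself
        is_prefix = any(e.startswith(rel + os.sep) for e in expected)
        if rel not in expected and not is_prefix:
            stale.append(rel)
    return stale
```
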

Automatic insertion

To speed up my weekly import I am planning to implement the automatic insertion of extracted transactions to my main ledger.

I would be happy to add it as feature to beangulp, but I would first like to discuss if what I am trying to do is right.

I was thinking about following design:

  • create new command, e.g. insert
  • it would get two arguments: the directory where imported files are located (the same way as the extract command) and the main ledger file
  • do almost the same thing as extract command but insert the transactions in the main ledger file
  • the main difference to extract would be to change the per file header to include the file name that file command generates instead of absolute path to the file
  • another argument would be format for a line before which we should add imported transactions (e.g. ; IMPORT: {account}).

Another option would be to somehow include the account name and filing_filename in the output from extract, and then have insert command be responsible for just merging those two files.
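The marker-based insertion from the first design could be sketched like this; insert_before_marker is a hypothetical helper, and the marker format follows the ; IMPORT: {account} example above:

```python
# Insert extracted transactions into the main ledger text immediately
# before a per-account marker line such as "; IMPORT: Assets:Checking".

def insert_before_marker(ledger_text, marker, new_text):
    """Return ledger_text with new_text inserted before the marker line,
    or appended at the end if the marker is absent."""
    lines = ledger_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == marker:
            lines.insert(i, new_text)
            return ''.join(lines)
    # No marker found: append as a fallback.
    return ledger_text + new_text
```
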

Mimetype for QFX files has changed

Original report by Ethan Glasser-Camp (Bitbucket: glasserc, GitHub: glasserc).


Oops, sorry, hit Enter too soon on this one..

The built-in QFX importer, on Fedora 28, no longer recognizes my QFX files. I believe this is because the mimetype reported for these files is now application/vnd.intu.qfx instead of application/x-ofx.

Unable to understand how to use beangulp to import transactions

With the overarching goal of moving my workflow from beancount v2 to v3, I have managed to build the beancount master branch using Bazel. From what I have followed so far, beangulp is the replacement for beancount.ingest, but I am lost in figuring out how to use it. There is no top-level app such as beangulp.py, and the older candidates, i.e. bean-extract (and the associated tools bean-identify and bean-file), appear to have been obsoleted.
The beangulp document on Google Docs did not help me figure out how to run beangulp.
I would appreciate it if anyone could provide a short paragraph of details on how this is to be set up. I am not intending to appear as a harsh critic; I understand that this is bleeding edge, that documentation may not be up to date, and that the developers have other primary jobs. I am just looking for a decent description to jump-start transitioning my v2-era importers to v3, and at the moment I am feeling lost.

Allow importers to specify formatting of postings

Original report by Martin Michlmayr (Bitbucket: tbm13, GitHub: tbm).


By default, bean-extract generates transactions like this:

2019-12-07 * "Anonymous" "Donation"
  Assets:Receivable   100.00 USD
  Income:Donations   -100.00 USD

I know I can just pipe it into bean-format, but I'm wondering if it would be possible to allow importers to format the postings, including spacing and amounts? (I know I can do amounts by just rounding to the precision I want.)

In an ideal world, this would be per-posting: I like to align different postings with different spacing depending on their nature.

Good idea? Bad idea? Maybe I should just pipe the output to a script that formats it (that's actually what I do now).

Importers should be able to output postings with @@

Original report by Kamal Marhubi (Bitbucket: kamalmarhubi, GitHub: kamalmarhubi).


When importing foreign currency transactions from my credit card, I am extracting the foreign currency price from the memo, e.g. 137.00 USD @@ 156.22 CAD. I'm trying to automate this with an importer.

As near as I can tell, this is currently impossible. The parser discards this info, and it is not explicit anywhere in the internals. If I compute the price in the importer, it spits out postings with a really ugly precision in the price.

Ideally, I could import them as I have the data available in the source.

CSV importer can't support non-ASCII characters

Original report by Li Dongchao (Bitbucket: DongchaoLee).


When using the CSV importer to extract a CSV file containing non-ASCII characters such as Chinese words, it may raise UnicodeDecodeError at beancount/ingest/cache.py, line 92, in head_reader:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 8190-8191: unexpected end of data

In my situation, the three bytes at positions 8190-8192 constitute one character, so this exception occurs.
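One way to make a head reader tolerate a multi-byte character split at the read boundary is an incremental decoder, sketched here (this is not the actual cache.py implementation):

```python
# Decode up to num_bytes of the file head, dropping a trailing partial
# character instead of raising UnicodeDecodeError.
import codecs

def read_head(filepath, num_bytes=8192, encoding='utf-8'):
    # final=False tells the decoder an incomplete trailing byte
    # sequence may be continued later, so it is held back, not an error.
    decoder = codecs.getincrementaldecoder(encoding)()
    with open(filepath, 'rb') as f:
        return decoder.decode(f.read(num_bytes), final=False)
```
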

What was naming issue with beangulp.importers.csv?

I can't find what issue prompted the rename of beangulp.importers.csv to beangulp.importers.csv_importer. The latter is a mouthful and a tautology. I think it was something related to Bazel, but I can't find the details. I would like to rename the importer back to its original name before releasing beangulp, unless there are compelling reasons not to.

Add mechanism to allow large files for ingest

Original report by Martin Michlmayr (Bitbucket: tbm13, GitHub: tbm).


I have a 32 MB CSV file (it contains all the invalid transactions that were attempted, in addition to the valid ones that actually went through). bean-extract doesn't like this because:

# A file size beyond which we will simply ignore the file. This is used to skip
# large files that are commonly co-present in a Downloads directory.
FILE_TOO_LARGE_THRESHOLD = 8*1024*1024

Would it be possible to add an option for individual importers to override the size limit (my importer copes with the 32 MB file just fine) or does this have side effects (caching, etc)?

Deduplication across several input files

Imagine a setup with:

documents/foo.csv
documents/bar.csv

And a static importer such as:

class StaticImporter(Importer):
  """No matter the file, identify it and yield the same transaction."""
  def identify(self, filepath):
    return True
  def account(self, filepath):
    return 'Dummy'
  def extract(self, filepath):
    return [DUP_TRANSACTION]

This situation is a simplified version of CSV reports with overlapping dates, something that happens if you're not careful to do CSV exports at specific time boundaries (e.g. export Jan-June.csv, then March-Sept.csv).

Running beangulp.Ingest (extract -o out.beancount) will create the following out.beancount with duplicates:

;; -*- mode: beancount -*-

**** documents/foo.csv

DUP_TRANSACTION

**** documents/bar.csv

DUP_TRANSACTION

I believe this happens because the _extract code only compares dups against already existing entries, but not against entries newly extracted from the other files returned by walk().

I think it could instead accumulate the newly extracted entries as it goes, calling extract.extract_from_file(importer, filename, accumulated_existing_entries) with the entries from earlier files included, so that duplicates across files are detected.

Thanks!
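The proposed accumulation can be sketched as follows; extract_from_file's signature is taken from the description above, and the surrounding loop is the suggestion:

```python
# Run extraction over the walked files in sequence, so entries from
# earlier files count as "existing" for the later ones.
def extract_all(importer, filenames, existing_entries, extract_from_file):
    accumulated = list(existing_entries)
    output = []
    for filename in filenames:
        new_entries = extract_from_file(importer, filename, accumulated)
        output.append((filename, new_entries))
        accumulated.extend(new_entries)  # later files see these as existing
    return output
```
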

Semantics of guess_file_type()

Currently, guess_file_type() has some strange properties:

  • it may raise an exception if python-magic is not installed (I think the idea was to raise an exception instructing the user to install the package, but a missing raise makes the code fail with a generic exception),
  • it does some manual filename extension matching, instead of injecting new mime type mappings into the mimetypes module,
  • it determines the file type from the file extension through the mimetypes module, except when it doesn't and instead uses the file content via the magic module, without the user having a way to know which method will be used to provide the answer.

In the effort toward simplification, I would like to revisit this interface.

The easiest thing would be to make python-magic a required dependency and always use content-based detection. However, it is a trivial dependency only on most Linux distributions, where libmagic is part of a basic installation. On Windows there isn't really an easy way to install the libmagic dependency, and on macOS it requires installation via some means other than pip.

The next best thing would be not to fall back to content-based detection. But in that case the function becomes a trivial wrapper around mimetypes. I therefore suggest adding a beangulp.mimetypes module that does something like

import mimetypes
mimetypes.add_type(...)

and deprecate the beangulp.file_type module in favor of the standard library mimetypes module. I think that content-based mimetype detection should be an explicit opt-in, and the magic interface is easy to use.
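For illustration, such a module could register the financial file types that have come up in this tracker; the exact list of registrations is hypothetical:

```python
# Sketch of a beangulp.mimetypes shim: inject extra mappings into the
# standard library's registry, then use mimetypes.guess_type() as usual.
import mimetypes

mimetypes.add_type('application/x-ofx', '.ofx')
mimetypes.add_type('application/vnd.intu.qfx', '.qfx')
mimetypes.add_type('application/vnd.intu.qbo', '.qbo')

filetype, _encoding = mimetypes.guess_type('statement.qfx')
```
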

Alternatively, guess_file_type() could always use content-based detection when it is available, and fall back to filename extension based detection otherwise.
