Comments (13)

johannesjh commented on June 19, 2024

hi,

I think what you want to achieve is to apply the @PredictPostings() decorator to your existing importers in order to enhance them with machine learning. The easiest way to do this is to apply the decorator directly to your existing importer classes, which in your case are defined in the __init__.py files.

For example, chase/__init__.py before applying the decorator:

from beancount.ingest import importer

class Importer(importer.ImporterProtocol):
    ...  # existing importer methods (identify, extract, ...)

...and after applying the decorator:

@PredictPostings()
class Importer(importer.ImporterProtocol):
    ...  # existing importer methods, unchanged

That's it.

Note: You may prefer other ways of applying the decorators so that you can still unit-test the undecorated importer classes. But the simple solution above is sufficient to get you started.
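One such alternative, sketched under assumptions: writing @X() above a class is equivalent to Cls = X()(Cls), so you can leave the importer class undecorated and apply the decorator in your import configuration instead. This only keeps the original class clean if the decorator returns a new class rather than modifying the one it receives. The hypothetical with_predictions decorator below does that by returning a subclass; check how the real decorator behaves before relying on this.

```python
from typing import Any, List, Optional


def with_predictions(cls: type) -> type:
    """Hypothetical class decorator standing in for @PredictPostings().

    It returns a new subclass, so the class passed in stays undecorated.
    """

    class Wrapped(cls):
        def extract(self, file: Any, existing_entries: Optional[List[Any]] = None) -> List[Any]:
            entries = super().extract(file, existing_entries)
            # A real implementation would add predicted postings here;
            # this sketch only tags each entry so the effect is visible.
            return [("predicted", entry) for entry in entries]

    Wrapped.__name__ = Wrapped.__qualname__ = "Smart" + cls.__name__
    return Wrapped


class Importer:
    """Plain, undecorated importer that unit tests can exercise directly."""

    def extract(self, file, existing_entries=None):
        return [f"entry from {file}"]


# Apply the decorator in the import configuration, not on the class definition.
SmartImporter = with_predictions(Importer)

print(Importer().extract("chase.csv"))       # ['entry from chase.csv']
print(SmartImporter().extract("chase.csv"))  # [('predicted', 'entry from chase.csv')]
```

Unit tests can target Importer directly, while the import configuration uses SmartImporter.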

from smart_importer.

johannesjh commented on June 19, 2024

...for your import configuration, this roughly translates to:

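from smart_importer import apply_hooks, PredictPayees, PredictPostings

# chase and paypal are your own importer packages
import chase
import paypal

chase_importer = chase.Importer(...)
paypal_importer = paypal.Importer(...)

CONFIG = [
    apply_hooks(chase_importer, [PredictPostings(), PredictPayees()]),
    apply_hooks(paypal_importer, [PredictPostings(), PredictPayees()]),
]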

gety9 commented on June 19, 2024

@johannesjh

Thank you for your reply.

Do I need to run any additional commands to train the model?

Right now I am using
bean-extract -e at.beancount at.import ../Downloads/ > temp.beancount
Is this command enough, i.e., will it use at.beancount as training data?

johannesjh commented on June 19, 2024

No additional commands needed. Data from at.beancount are used for training.

Technically, the decorator wraps the importer's extract(self, file, existing_entries=None) method. When the extract method is invoked through bean-extract or fava, the importer grabs the existing_entries and uses them as training data.
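A toy sketch of that mechanism, not smart_importer's actual code: the hypothetical class decorator predict_postings_sketch replaces extract with a wrapper that calls the original method, trains on existing_entries, and fills in a missing counter-posting. Txn is a simplified stand-in for a beancount Transaction, and the "model" is a plain payee-to-account lookup in place of the library's real machine-learning model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, replace
from functools import wraps
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical, simplified stand-in for a beancount Transaction."""
    payee: str
    accounts: Tuple[str, ...]


def train(existing_entries: List[Txn]) -> Dict[str, str]:
    """Toy 'model': payee -> most frequent counter-account in the ledger."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for txn in existing_entries:
        for account in txn.accounts[1:]:  # skip the first (imported) account
            counts[txn.payee][account] += 1
    return {payee: c.most_common(1)[0][0] for payee, c in counts.items()}


def predict_postings_sketch(cls):
    """Hypothetical class decorator that wraps cls.extract."""
    original_extract = cls.extract

    @wraps(original_extract)
    def extract(self, file, existing_entries: Optional[List[Txn]] = None):
        imported = original_extract(self, file, existing_entries)
        # Train on whatever ledger the caller passed in (bean-extract -e ...),
        # freshly on every call.
        model = train(existing_entries or [])
        return [
            replace(t, accounts=t.accounts + (model[t.payee],))
            if len(t.accounts) == 1 and t.payee in model
            else t
            for t in imported
        ]

    cls.extract = extract
    return cls


@predict_postings_sketch
class PaypalImporter:
    def extract(self, file, existing_entries=None):
        # Pretend the file contained one single-posting transaction.
        return [Txn("GODADDY.COM", ("Assets:Paypal:IS",))]


ledger = [Txn("GODADDY.COM", ("Assets:Paypal:IS", "Expenses:Business:IS:Hosting"))]
print(PaypalImporter().extract("paypal.csv", existing_entries=ledger))
```

Because the wrapper trains from the existing_entries argument on every call, it keeps no state between files.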

johannesjh commented on June 19, 2024

Did it work? Can we close this issue?

gety9 commented on June 19, 2024

@johannesjh

I've made it work. Surprisingly, for some import files it works perfectly, but for others not at all; examples are below.

I am still getting a warning, and I'm not sure whether it's important:
bean-extract -e at.beancount at.import ../Downloads/ > tmp181116.beancount

2 [main] python3.6m 15556 child_info_fork::abort: address space needed by '_superlu.cpython-36m-x86_64-cygwin.dll' (0x400000) is already occupied
/usr/lib/python3.6/site-packages/sklearn/externals/joblib/_multiprocessing_helpers.py:38: UserWarning: [Errno 11] Resource temporarily unavailable.  joblib will operate in serial mode
  warnings.warn('%s.  joblib will operate in serial mode' % (e,))
/usr/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp

I have 5 Chase import files and 2 PayPal import files: https://puu.sh/C31l7/851b3a0da3.png
For some import files, like this PayPal one (DownloadAT), I am getting perfect results: https://puu.sh/C31fH/1edb32c280.png
For others, like this PayPal one (DownloadIS), I am not getting correct results (https://puu.sh/C31ax/bbb87441a1.png): for some reason it puts 3 accounts in each transaction.
I'm not sure what the reason for this is.

So, in my case, almost all predictions are right for some import files and wrong for others. The "wrongness" is that the model puts extra accounts into the transactions; for example, almost every predicted transaction for this import file has 4 accounts in it: https://puu.sh/C32nU/7e18db510f.png

gety9 commented on June 19, 2024

It seems I found the pattern: the first file an importer processes gets very accurate predictions.
But for all subsequent import files handled by the same importer, the predictions are incorrect.
For example, let's say we have 5 Chase import files:
ChaseXXX1_Activity_20181115.CSV
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV

Then predictions for ChaseXXX1_Activity_20181115.CSV will be correct, and those for all the other files incorrect, including ChaseXXX2_Activity_20181115.CSV.

But if I delete ChaseXXX1_Activity_20181115.CSV, so that we now have 4 files:
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV

then predictions for ChaseXXX2_Activity_20181115.CSV will be correct, and those for the others incorrect.

If we use several importers and have the following import files:
ChaseXXX1_Activity_20181115.CSV
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV
PaypalAT.CSV
PaypalIS.CSV

then predictions for ChaseXXX1_Activity_20181115.CSV and PaypalAT.CSV will be correct, and those for all the others incorrect.

Could you suggest what the problem is and how it could be solved?

As per your suggestions, I've applied the smart importers like this:

johannesjh commented on June 19, 2024

Hm, difficult to say.

  • What do you mean by saying the predictions are incorrect? In which way are they incorrect?
  • Are there any differences between the CSV files, regarding their content (e.g., are they for the same account or for different accounts), regarding what training data should be used, and regarding your expectation about correct vs. incorrect predictions?
  • How do you start the import, e.g., through beancount's command-line API or through fava? When you start the import, do you tell the program to import several files at once?

For now I can only guess, but here is one possible explanation: is it possible that your program (beancount or fava) re-uses importer instances when it is told to import several files? Such re-use would make perfect sense for regular importers, but smart importers could end up using false training data.

EDIT: Note that I have always imported just one file with each importer, and I never experienced such problems.
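A deliberately buggy, hypothetical sketch of how such re-use could produce exactly the "first file right, later files wrong" pattern (this is not smart_importer's code): the object created by @CachingPredictor() exists once per decorated class, so it is shared by every instance and every file. Anything it caches while importing the first file leaks into all later ones. CachingPredictor and the made-up file_account naming scheme are illustrative assumptions.

```python
class CachingPredictor:
    """Hypothetical decorator that (wrongly) keeps state between extract calls."""

    def __init__(self):
        # One decorator object per class definition: shared by all instances
        # and by every file that class ever imports.
        self.cached_account = None

    def __call__(self, cls):
        original_extract = cls.extract
        predictor = self

        def extract(importer, file, existing_entries=None):
            entries = original_extract(importer, file, existing_entries)
            if predictor.cached_account is None:
                # BUG: computed from the first file only, then reused.
                predictor.cached_account = importer.file_account(file)
            return [(entry, predictor.cached_account) for entry in entries]

        cls.extract = extract
        return cls


@CachingPredictor()
class ChaseImporter:
    def file_account(self, file):
        # e.g. "ChaseXXX2_Activity_20181115.CSV" -> "Assets:ChaseXXX2"
        return "Assets:" + file.split("_")[0]

    def extract(self, file, existing_entries=None):
        return [f"txn from {file}"]


importer = ChaseImporter()
print(importer.extract("ChaseXXX1_Activity_20181115.CSV"))  # paired with Assets:ChaseXXX1 (correct)
print(importer.extract("ChaseXXX2_Activity_20181115.CSV"))  # still Assets:ChaseXXX1 (wrong)
```

Deriving everything from the arguments of each extract call, and storing nothing on the shared decorator object, avoids the leak. In this sketch even a fresh ChaseImporter() would get the stale account, because the state lives on the class-level decorator rather than on the instance.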

gety9 commented on June 19, 2024

1 "What do you mean by saying the predictions are incorrect? In which way are they incorrect?"

It's completely off. Here is an example:

If the correct transaction is

2017-10-20 * "GODADDY.COM" "Order: #38070"
  Assets:Paypal:IS                                              -19.95 USD
  Expenses:Business:IS:Hosting

then the prediction might be

2017-10-20 * "GODADDY.COM" "Order: #38070"
  Assets:Paypal:AT                                              
  Expenses:Business:AT:Advertisement
  Assets:Paypal:IS                                              -19.95 USD

2 "Are there any differences between the CSV files, regarding their content (e.g., are they for the same account or for different accounts), regarding what training data should be used, and regarding your expectation about correct vs. incorrect predictions?"

The files are pretty much the same; what matters is the order in which they appear in the Downloads folder.
If a file is processed (predicted) incorrectly, then once I rename it so that it comes first, or import only that file, it is processed (predicted) correctly.

3 "How do you start the import, e.g., through beancount's commandline api or through fava? When you start the import, do you tell the program to import several files at once?"

I am using the command line, for example:
bean-extract -e at.beancount at.import ../Downloads/ > tmp181126.beancount

"do you tell the program to import several files at once?"
Yes, I import several at once (usually 7 files). at.beancount looks like this:

4 "I can, for now, only guess, but one idea for an explanation is this: Is it possible that your program (beancount or fava) re-uses importer instances when it is told to import several files? Such re-use would make perfect sense for regular importers, but smart importers could end up using false training data."

That's what I think too. It works correctly when I place a single file in the Downloads folder. I just wanted to make it work with all 7 files, but it's OK, not a big deal; I will just import them one at a time.

johannesjh commented on June 19, 2024

Thank you for sharing this information. I think we now have sufficiently narrowed down the problem: False predictions when importing several files at once.

Next steps: This will need some debugging to confirm the suspicion that importer instances are cached and re-used, which leads to false training data being used for the predictions.
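One low-tech way to do that debugging, as a sketch with hypothetical helper names (trace_calls, CALL_LOG): temporarily wrap each importer instance in the import config so that every extract call records which object handled which file and how many existing entries it received. The same id appearing for several files confirms that the instance is re-used.

```python
import functools
import sys

CALL_LOG = []  # one (importer id, file, number of existing entries) per call


def trace_calls(importer):
    """Hypothetical debugging helper: patch one importer instance's extract."""
    original_extract = importer.extract  # bound method of this instance

    @functools.wraps(original_extract)
    def extract(file, existing_entries=None):
        record = (id(importer), file, len(existing_entries or []))
        CALL_LOG.append(record)
        # stderr, so the trace doesn't end up in the redirected .beancount output.
        print("extract called:", record, file=sys.stderr)
        return original_extract(file, existing_entries)

    importer.extract = extract
    return importer


class DemoImporter:
    def extract(self, file, existing_entries=None):
        return []


imp = trace_calls(DemoImporter())
imp.extract("ChaseXXX1_Activity_20181115.CSV", existing_entries=[])
imp.extract("ChaseXXX2_Activity_20181115.CSV", existing_entries=[])
```

In a real config you would wrap the importer instances you already list in CONFIG and then run bean-extract as usual.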

gety9 commented on June 19, 2024

@johannesjh hi,

So it should work correctly using hooks? Could you please explain how to apply the hooks to my sample file described in the OP, #77 (comment)?

yagebu commented on June 19, 2024

@gety9: Instead of applying the decorators to the importer classes, you should apply the hooks to importer instances, as outlined in the README.
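A simplified, hypothetical sketch of the idea behind applying hooks to instances (apply_hooks_sketch and TagWithAccount are illustrative names, and the hook signature here is an assumption; the README documents the real apply_hooks): extract is patched on one instance only, and each instance gets its own freshly created hook objects, so nothing is shared with the class or with other importers.

```python
from functools import wraps


class TagWithAccount:
    """Hypothetical hook: post-processes the entries one importer produced."""

    def __call__(self, importer, file, imported_entries, existing_entries):
        return [(entry, importer.account) for entry in imported_entries]


def apply_hooks_sketch(importer, hooks):
    """Patch extract on this instance only, running each hook in order."""
    unpatched_extract = importer.extract

    @wraps(unpatched_extract)
    def extract(file, existing_entries=None):
        entries = unpatched_extract(file, existing_entries)
        for hook in hooks:
            entries = hook(importer, file, entries, existing_entries)
        return entries

    importer.extract = extract  # instance attribute; the class is untouched
    return importer


class Importer:
    def __init__(self, account):
        self.account = account

    def extract(self, file, existing_entries=None):
        return [f"txn from {file}"]


CONFIG = [
    apply_hooks_sketch(Importer("Assets:Chase"), [TagWithAccount()]),
    apply_hooks_sketch(Importer("Assets:Paypal"), [TagWithAccount()]),
]
```

Because the class itself stays undecorated, a plain Importer(...) remains available for unit tests.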

gety9 commented on June 19, 2024

@johannesjh @yagebu thank you guys!
