Comments (13)
hi,
I think what you want to achieve is to apply the @PredictPostings()
decorator to your existing importers in order to enhance them with machine learning. The easiest way to do this is to apply the decorator directly to your existing importer classes, which in your case are defined in the __init__.py
files.
For example, in chase/__init__.py,
before applying the decorator:
class Importer(importer.ImporterProtocol):
# ...
...and after applying the decorator:
@PredictPostings()
class Importer(importer.ImporterProtocol):
# ...
That's it.
Note: You may prefer alternative ways of applying the decorator so that you can unit-test the undecorated importer classes. But the simple solution above is enough to get you started.
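To make the alternative concrete, here is a toy sketch of the two styles (the decorator below is a hypothetical stand-in, not smart_importer's actual PredictPostings): wrapping the class manually at config time keeps the original class undecorated for unit tests.

```python
# Toy sketch (not smart_importer code): decorating at class definition vs.
# wrapping manually at config time, which keeps the original testable.

def predict_postings_sketch(cls):
    """Stand-in for smart_importer's PredictPostings class decorator."""
    class Decorated(cls):
        smart = True  # marker so we can tell the two classes apart
    Decorated.__name__ = cls.__name__
    return Decorated

class Importer:
    """An undecorated importer class, as you would unit-test it."""
    smart = False

# Applied at config time instead of via @-syntax at class definition:
SmartImporter = predict_postings_sketch(Importer)

print(Importer.smart)       # False: the original stays testable
print(SmartImporter.smart)  # True: the config uses the enhanced class
```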
from smart_importer.
...for your import configuration, this roughly translates to:
from smart_importer import apply_hooks, PredictPayees, PredictPostings

chase_importer = chase.Importer(...)
paypal_importer = paypal.Importer(...)
CONFIG = [
    apply_hooks(chase_importer, [PredictPostings(), PredictPayees()]),
    apply_hooks(paypal_importer, [PredictPostings(), PredictPayees()]),
]
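To see the idea behind apply_hooks, here is a minimal toy sketch (names and behavior are illustrative, not smart_importer's actual implementation): each hook gets a chance to post-process the entries an importer extracts.

```python
# Toy sketch of what an apply_hooks-style helper conceptually does.
# Illustrative only: smart_importer's real hooks are classes such as
# PredictPostings, and the real entries are beancount directives.

def apply_hooks_sketch(importer, hooks):
    """Wrap importer.extract so each hook can post-process its entries."""
    original_extract = importer.extract

    def extract(file, existing_entries=None):
        entries = original_extract(file, existing_entries)
        for hook in hooks:
            entries = hook(entries, existing_entries)
        return entries

    importer.extract = extract  # patch the instance, not the class
    return importer

class ToyImporter:
    def extract(self, file, existing_entries=None):
        return ["txn:" + file]

# A trivial hook that tags every extracted entry:
tag_hook = lambda entries, existing: [e + "#tagged" for e in entries]

imp = apply_hooks_sketch(ToyImporter(), [tag_hook])
print(imp.extract("chase.csv"))  # ['txn:chase.csv#tagged']
```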
from smart_importer.
Thank you for your reply.
Do I need to run any additional commands to train the model?
Currently I am using: bean-extract -e at.beancount at.import ../Downloads/ > temp.beancount
Or is this command enough, and it will use at.beancount
as the training data?
from smart_importer.
No additional commands are needed. The data from at.beancount
is used for training.
Technically, the decorator wraps the importer's extract(self, file, existing_entries=None) method. When extract is invoked through bean-extract or fava, the importer grabs the existing_entries and uses them as training data.
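As a rough, hypothetical sketch of that mechanism (names are illustrative; this is not smart_importer's real internals), the wrapper intercepts extract() and sees the existing entries before the newly extracted ones are enhanced:

```python
# Hypothetical sketch of a PredictPostings-style wrapper around extract();
# illustrative only, not smart_importer's actual code.

class PredictSketch:
    """Class decorator intercepting extract(self, file, existing_entries)."""

    def __call__(self, importer_cls):
        original_extract = importer_cls.extract

        def extract(importer_self, file, existing_entries=None):
            # existing_entries come from `bean-extract -e at.beancount ...`
            # and would serve as training data in a real implementation.
            training_data = existing_entries or []
            new_entries = original_extract(importer_self, file, existing_entries)
            # ... a real implementation would fit a model on training_data
            # and attach predicted postings to new_entries here ...
            return new_entries

        importer_cls.extract = extract
        return importer_cls

@PredictSketch()
class ToyImporter:
    def extract(self, file, existing_entries=None):
        return ["txn from " + file]

result = ToyImporter().extract("chase.csv", existing_entries=["old txn"])
print(result)  # ['txn from chase.csv']
```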
from smart_importer.
Did it work? Can we close this issue?
from smart_importer.
I've made it work. Surprisingly, for some import files it works perfectly, but for others not at all; I will provide examples below.
I am still getting a warning, not sure if it's an important one:
bean-extract -e at.beancount at.import ../Downloads/ > tmp181116.beancount
2 [main] python3.6m 15556 child_info_fork::abort: address space needed by '_superlu.cpython-36m-x86_64-cygwin.dll' (0x400000) is already occupied
/usr/lib/python3.6/site-packages/sklearn/externals/joblib/_multiprocessing_helpers.py:38: UserWarning: [Errno 11] Resource temporarily unavailable. joblib will operate in serial mode
warnings.warn('%s. joblib will operate in serial mode' % (e,))
/usr/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
I have 5 Chase import files and 2 PayPal import files: https://puu.sh/C31l7/851b3a0da3.png
For some import files, like this PayPal one (DownloadAT) https://puu.sh/C31fH/1edb32c280.png, I am getting perfect results.
For others, like this PayPal one (DownloadIS), I am not getting usable results https://puu.sh/C31ax/bbb87441a1.png; for some reason it puts 3 accounts in each transaction.
I am not sure what the reason for this is.
So basically, in my case, nearly all predictions in some import files are right, and in others they are wrong. The "wrongness" is that the model puts extra accounts into transactions; for example, nearly all transaction predictions for this import file have 4 accounts in them: https://puu.sh/C32nU/7e18db510f.png
from smart_importer.
It seems I have found the pattern: the first file the importer processes gets very accurate predictions,
but for all subsequent import files handled by the same importer, the predictions are incorrect.
For example, say we have 5 Chase import files:
ChaseXXX1_Activity_20181115.CSV
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV
then predictions for ChaseXXX1_Activity_20181115.CSV will be correct, and for all other files incorrect, including ChaseXXX2_Activity_20181115.CSV.
But if I delete ChaseXXX1_Activity_20181115.CSV, so that we have 4 files:
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV
then predictions for ChaseXXX2_Activity_20181115.CSV will be correct, and for the others incorrect.
If we use several importers and have the following import files:
ChaseXXX1_Activity_20181115.CSV
ChaseXXX2_Activity_20181115.CSV
ChaseXXX3_Activity_20181115.CSV
ChaseXXX4_Activity_20181115.CSV
ChaseXXX5_Activity_20181115.CSV
PaypalAT.CSV
PaypalIS.CSV
then predictions for ChaseXXX1_Activity_20181115.CSV and PaypalAT.CSV will be correct, and for all others incorrect.
Could you suggest what the problem is and how it could be solved?
As per your suggestion, I've applied the smart importers like this
from smart_importer.
Hm, difficult to say.
- What do you mean by saying the predictions are incorrect? In which way are they incorrect?
- Are there any differences between the CSV files, regarding their content (e.g., are they for the same account or for different accounts), regarding what training data should be used, and regarding your expectation about correct vs. incorrect predictions?
- How do you start the import, e.g., through beancount's commandline api or through fava? When you start the import, do you tell the program to import several files at once?
I can, for now, only guess, but one idea for an explanation is this: Is it possible that your program (beancount or fava) re-uses importer instances when it is told to import several files? Such re-use would make perfect sense for regular importers, but smart importers could end up using false training data.
EDIT, note:
I have always imported just one file with each importer, and I have never experienced such problems.
from smart_importer.
1. "What do you mean by saying the predictions are incorrect? In which way are they incorrect?"
They are completely off. Here is an example:
If the correct transaction is:
2017-10-20 * "GODADDY.COM" "Order: #38070"
Assets:Paypal:IS -19.95 USD
Expenses:Business:IS:Hosting
then the prediction can be:
2017-10-20 * "GODADDY.COM" "Order: #38070"
Assets:Paypal:AT
Expenses:Business:AT:Advertisement
Assets:Paypal:IS -19.95 USD
2. "Are there any differences between the CSV files, regarding their content (e.g., are they for the same account or for different accounts), regarding what training data should be used, and regarding your expectation about correct vs. incorrect predictions?"
The files are pretty much the same; it's the order they appear in the Downloads folder that matters.
If a file is processed (predicted) incorrectly, then once I rename it so that it goes first, or import only that file, it is processed (predicted) correctly.
3. "How do you start the import, e.g., through beancount's commandline api or through fava? When you start the import, do you tell the program to import several files at once?"
I am using the command line, for example:
bean-extract -e at.beancount at.import ../Downloads/ > tmp181126.beancount
"Do you tell the program to import several files at once?"
Yes, extracting several at once (usually 7 files); at.beancount looks like this
4. "I can, for now, only guess, but one idea for an explanation is this: Is it possible that your program (beancount or fava) re-uses importer instances when it is told to import several files? Such re-use would make perfect sense for regular importers, but smart importers could end up using false training data."
That's what I think too. It works correctly when I place one file in the Downloads folder. I just wanted to make it work with all 7 files, but it's OK; not a big deal, I will just import them one at a time.
from smart_importer.
Thank you for sharing this information. I think we have now sufficiently narrowed down the problem: false predictions when importing several files at once.
Next steps: This will need some debugging to confirm the suspicion that importer instances are cached and re-used, which leads to false training data being used for the predictions.
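A toy illustration of the suspected failure mode (purely hypothetical; it only demonstrates the mechanism, not smart_importer's actual code): if a re-used importer instance caches its trained model after the first file, every later file is predicted with stale state from the first file's context.

```python
# Toy illustration of the suspected failure mode: an importer instance that
# caches a trained model on first use serves stale predictions afterwards.
# (Hypothetical; not smart_importer code.)

class CachingPredictor:
    def __init__(self):
        self._model = None  # filled in lazily on the first extract() call

    def extract(self, file, existing_entries=None):
        if self._model is None:
            # Trained once, in the context of the *first* file only.
            self._model = "model trained during " + file
        return (file, self._model)

imp = CachingPredictor()
first = imp.extract("ChaseXXX1_Activity_20181115.CSV")
second = imp.extract("ChaseXXX2_Activity_20181115.CSV")
print(first)   # model matches the file it was trained on
print(second)  # the second file still sees the first file's model: stale
```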
from smart_importer.
@johannesjh hi,
So it should work correctly using hooks? Could you please explain how to apply them to my sample file described in OP #77 (comment)?
from smart_importer.
@gety9: Instead of applying the decorators to the importer classes, you should apply the hooks to importer instances as outlined in the README.
from smart_importer.
@johannesjh @yagebu thank you guys!
from smart_importer.