codereverser / casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

License: MIT License

Python 100.00%
cams karvy kfintech cas mutual-funds mutual-fund-portfolio pdf-parser consolidated-account-statements parser capital-gains

casparser's Introduction

CASParser

Parse Consolidated Account Statement (CAS) PDF files generated from CAMS/KFINTECH

casparser also includes a command line tool with the following analysis tools

  • summary - Print portfolio summary
  • (BETA) gains - Print capital gains report (summary and detailed)
    • with option to generate csv files for ITR in schedule 112A format

Installation

pip install -U casparser

with faster PyMuPDF parser

pip install -U 'casparser[fast]'

Note: Enabling this dependency could result in licensing changes. Check the License section for more details

Usage

import casparser
data = casparser.read_cas_pdf("/path/to/cas/file.pdf", "password")

# Get data in json format
json_str = casparser.read_cas_pdf("/path/to/cas/file.pdf", "password", output="json")

# Get transactions data in csv string format
csv_str = casparser.read_cas_pdf("/path/to/cas/file.pdf", "password", output="csv")

Data structure

{
    "statement_period": {
        "from": "YYYY-MMM-DD",
        "to": "YYYY-MMM-DD"
    },
    "file_type": "CAMS/KARVY/UNKNOWN",
    "cas_type": "DETAILED/SUMMARY",
    "investor_info": {
        "email": "string",
        "name": "string",
        "mobile": "string",
        "address": "string"
    },
    "folios": [
        {
            "folio": "string",
            "amc": "string",
            "PAN": "string",
            "KYC": "OK/NOT OK",
            "PANKYC": "OK/NOT OK",
            "schemes": [
                {
                    "scheme": "string",
                    "isin": "string",
                    "amfi": "string",
                    "advisor": "string",
                    "rta_code": "string",
                    "rta": "string",
                    "open": "number",
                    "close": "number",
                    "close_calculated": "number",
                    "valuation": {
                      "date": "date",
                      "nav": "number",
                      "value": "number"
                    },
                    "transactions": [
                        {
                            "date": "YYYY-MM-DD",
                            "description": "string",
                            "amount": "number",
                            "units": "number",
                            "nav": "number",
                            "balance": "number",
                            "type": "string",
                            "dividend_rate": "number"
                        }
                    ]
                }
            ]
        }
    ]
}

Notes:

  • Transaction type can be any value from the following
    • PURCHASE
    • PURCHASE_SIP
    • REDEMPTION
    • SWITCH_IN
    • SWITCH_IN_MERGER
    • SWITCH_OUT
    • SWITCH_OUT_MERGER
    • DIVIDEND_PAYOUT
    • DIVIDEND_REINVESTMENT
    • SEGREGATION
    • STAMP_DUTY_TAX
    • TDS_TAX
    • STT_TAX
    • MISC
  • dividend_rate is applicable only for DIVIDEND_PAYOUT and DIVIDEND_REINVESTMENT transactions.
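The structure above can be walked with ordinary dict and list access. The following is a minimal sketch, assuming the parsed result is available as a plain dict shaped like the layout above (for example via json.loads on the output="json" string). The sample data and the helper names valuation_by_amc and count_transaction_types are illustrative and not part of casparser.

```python
from collections import Counter, defaultdict
from decimal import Decimal

# Hand-written sample that follows the documented layout; not real CAS output.
sample = {
    "folios": [
        {
            "folio": "0000000 / 00",
            "amc": "Example AMC",
            "schemes": [
                {
                    "scheme": "Example Fund - Growth",
                    "close": "7.350",
                    "valuation": {"date": "2020-12-22", "nav": "30.3779", "value": "223.28"},
                    "transactions": [
                        {"date": "2019-08-14", "type": "SWITCH_IN", "units": "7.350"},
                    ],
                }
            ],
        }
    ]
}


def valuation_by_amc(data):
    """Sum valuation.value over all schemes, grouped by the folio's AMC."""
    totals = defaultdict(Decimal)
    for folio in data["folios"]:
        for scheme in folio["schemes"]:
            totals[folio["amc"]] += Decimal(str(scheme["valuation"]["value"]))
    return dict(totals)


def count_transaction_types(data):
    """Count how often each transaction type occurs across all schemes."""
    counts = Counter()
    for folio in data["folios"]:
        for scheme in folio["schemes"]:
            counts.update(txn["type"] for txn in scheme["transactions"])
    return dict(counts)
```

Decimal is used instead of float so that monetary values keep the precision printed in the statement.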

CLI

casparser also comes with a command-line interface that prints a summary of the parsed portfolio and can save the parsed data as json, csv, or a summary table.

Usage: casparser [-o output_file.json|output_file.csv] [-p password] [-s] [-a] CAS_PDF_FILE

  -o, --output FILE               Output file path. Saves the parsed data as json or csv
                                  depending on the file extension. For other extensions, the
                                  summary output is saved. [See note below]

  -s, --summary                   Print Summary of transactions parsed.
  -p PASSWORD                     CAS password
  -a, --include-all               Include schemes with zero valuation in the
                                  summary output
  -g, --gains                     Generate Capital Gains Report (BETA)
  --gains-112a ask|FY2020-21      Generate Capital Gains Report - 112A format for
                                  a given financial year - Use 'ask' for a prompt
                                  from available options (BETA)
  --force-pdfminer                Force PDFMiner parser even if MuPDF is
                                  detected

  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

CLI examples

# Print portfolio summary
casparser /path/to/cas.pdf -p password

# Print portfolio and capital gains summary
casparser /path/to/cas.pdf -p password -g

# Save parsed data as a json file
casparser /path/to/cas.pdf -p password -o pdf_parsed.json

# Save parsed data as a csv file
casparser /path/to/cas.pdf -p password -o pdf_parsed.csv

# Save capital gains transactions in csv files (pdf_parsed-gains-summary.csv and
# pdf_parsed-gains-detailed.csv)
casparser /path/to/cas.pdf -p password -g -o pdf_parsed.csv

Note: the casparser CLI chooses the output format from the extension of the output file; json and csv are handled specially [-o file.json / file.csv]

  1. json - complete parsed data is exported in json format (including investor info)
  2. csv - Summary info is exported in csv format if the input file is a summary statement or if a summary flag (-s/--summary) is passed as argument to the CLI. Otherwise, full transaction history is included in the export. If -g flag is present, two additional files '{basename}-gains-summary.csv', '{basename}-gains-detailed.csv' are created with the capital-gains data.
  3. any other extension - The summary table is saved in the file.
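The extension rules above can be sketched as a small lookup. This is an illustration of the documented behaviour only, not casparser's actual implementation: the function name planned_outputs is hypothetical, and treating the extension case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List, Tuple


def planned_outputs(output_path: str, gains: bool = False) -> Tuple[str, List[str]]:
    """Return the export kind and the files that would be written for -o."""
    path = Path(output_path)
    suffix = path.suffix.lower()  # assumption: extension match ignores case
    if suffix == ".json":
        return "json", [str(path)]
    if suffix == ".csv":
        files = [str(path)]
        if gains:
            # '{basename}-gains-summary.csv' and '{basename}-gains-detailed.csv'
            files.append(str(path.with_name(f"{path.stem}-gains-summary.csv")))
            files.append(str(path.with_name(f"{path.stem}-gains-detailed.csv")))
        return "csv", files
    # Any other extension: the summary table is saved in the file.
    return "summary", [str(path)]
```

For example, planned_outputs("pdf_parsed.csv", gains=True) lists the main csv file followed by the two gains files named in the CLI examples.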

ISIN & AMFI code support

Since v0.4.3, casparser includes support for identifying ISIN and AMFI code for the parsed schemes via the helper module casparser-isin. If the parser fails to assign ISIN or AMFI codes to a scheme, try updating the local ISIN database by

casparser-isin --update

If it still fails, please raise an issue at casparser-isin with the failing scheme name(s).

License

CASParser is distributed under MIT license by default. However enabling the optional dependency mupdf/fast would imply the use of PyMuPDF / MuPDF and hence the licenses GNU GPL v3 and GNU Affero GPL v3 would apply. Copies of all licenses have been included in this repository. - IANAL

Resources

  1. CAS from CAMS
  2. CAS from Karvy/Kfintech

casparser's People

Contributors

abhishekjain-qb, codereverser, deepsourcebot, dependabot[bot], isaac-philip, kaushiksk

casparser's Issues

Issue when there is space character inside RTA code #14

Screenshot 2023-04-03 at 8 32 14 AM

The SUMMARY_ROW_RE regex fails when there are RTA codes like those in the second and third rows. It ends up reading the folio number along with the first three characters of the RTA code, resulting in a null response when the ISIN check is done.

Franklin Schemes not Returned

Hi Team

The Franklin Mutual Fund house changed their registrar to CAMS.
Therefore the data in PDF statement may have also changed
The Latest CAS statement is showing the Franklin Schemes but data is not returned by casparser package.

Plans to support parsing of CAS generated by CDSL?

Hi, checking whether there are plans to support parsing of CAS generated by CDSL, as it is much richer in info (it contains all stock holdings along with mutual funds).

If there are no near-term plans, has there been any effort in this direction? I could pitch in, or start from somewhere if some work has already been done.

Support for Multiple pdf files

Hi,

Do you have support for multiple PDF files on your roadmap?

For eg: I have 2 different reports - one from Karvy and other from Cams.

I can run the parser twice and see the results.

But, in the end I would like to see my complete portfolio in one place.

So, running two scripts with command-line output and then combining the results could be done away with if the parser supported multiple PDFs.

I know each pdf can have a different password so that needs to be handled as well.

Or you want the parser to be agnostic to this and the person running the code should handle it at their end ?

Let me know.

Thanks!
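Until something like this exists in the parser, the combining step can be done on the caller's side. Below is a minimal sketch, assuming each statement has already been parsed separately (each with its own password) into a dict shaped like the Data structure section, for example via json.loads on the output="json" string. The helpers merge_folios and total_valuation are hypothetical names and the sample values are made up.

```python
from decimal import Decimal


def merge_folios(*statements):
    """Concatenate the "folios" lists of several parsed statements."""
    merged = []
    for statement in statements:
        merged.extend(statement.get("folios", []))
    return merged


def total_valuation(folios):
    """Add up valuation.value over every scheme in the given folios."""
    total = Decimal("0")
    for folio in folios:
        for scheme in folio["schemes"]:
            total += Decimal(str(scheme["valuation"]["value"]))
    return total


# Hand-written stand-ins for two parsed statements, e.g. one from CAMS and
# one from Karvy; real data would come from two separate read_cas_pdf calls.
cams = {"folios": [{"folio": "1", "amc": "AMC A", "schemes": [{"valuation": {"value": "100.50"}}]}]}
karvy = {"folios": [{"folio": "2", "amc": "AMC B", "schemes": [{"valuation": {"value": "200.25"}}]}]}

combined = merge_folios(cams, karvy)
```

This sketch does not de-duplicate folios that appear in both statements; that would need a key such as folio number plus scheme.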

Print NAV also in the table

Hi,

Currently table is printing something like below
https://raw.githubusercontent.com/codereverser/casparser/main/assets/demo.jpg

However, the NAV as on the statement date is available in the PDF.

If the NAV is printed, we can easily multiply it with 'close calculated' to get the final value of the fund.
I know you recently added the fund value...

This is important as the only thing changing here daily is NAV and if the SIPs are still going out, then even close calculated changes but that change is less frequent.

Thoughts ?

Parsing issue in cas In case of Bonus.

We have received a CAS in which there was a transaction related to bonus, in which some transactions contained NAV with it and some did not. so we are facing the issue like this image.

image

Issue with Amount alone record

In this case, there was just an extra amount credited with zero quantity, and the parser put the amount in the quantity field instead of the amount field. 320 went into qty field and not amount.

I have attached the CAMS statement of last month. And the full statement from the fund house.
Folio
Folio-full

Issue in Franklin Templeton Segregated Units

Hello team,

Franklin Templeton created few Segregated Portfolio's for some stressed Mutual Funds. The data is read incorrectly in some cases. In the example given below - there are 2 Segregation records - one for qty 215931.176, and second for qty 0.008, but the parser scans the 2nd one as qty 215931.184 ...

{"scheme": "Franklin India Credit Risk Fund- Segregated Portfolio 1 (8.25% Vodafone Idea Ltd-10JUL20-Growth Plan)",
"advisor": "ICICIRON",
"rta_code": "FTI880", "type": "DEBT", "rta": "CAMS", "isin": "INF090I01TJ6", "amfi": "147954", "open": "0.000", "close": "0.000", "close_calculated": "215931.176", "valuation": {"date": "2020-07-17", "value": "0.00", "nav": "0.0818"},
"transactions": [
{"date": "2020-01-24", "description": "Creation of units - Segregated Portfolio\t\t215,931.176", "amount": "0", "units": "215931.176", "nav": "0", "balance": "215931.176", "type": "SEGREGATION", "dividend_rate": null},
{"date": "2020-01-24", "description": "Creation of units - Segregated Portfolio\t\t0.008", "amount": "0", "units": "215931.184", "nav": "0", "balance": "215931.184", "type": "SEGREGATION", "dividend_rate": null},
{"date": "2020-06-15", "description": "Payment - Units Extinguished", "amount": "-1338.33", "units": "-16360.996", "nav": "0.0818", "balance": "199570.188", "type": "REDEMPTION", "dividend_rate": null},
{"date": "2020-07-10", "description": "Payment - Units Extinguished", "amount": "-16324.84", "units": "-199570.188", "nav": "0.0818", "balance": "0.000", "type": "REDEMPTION", "dividend_rate": null}]}
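A row like the second SEGREGATION entry can be spotted mechanically: each transaction's units should equal the change in balance since the previous row. The following is a minimal sketch of such a check run on the units and balance values quoted above. The function name inconsistent_rows is hypothetical and this is not part of casparser.

```python
from decimal import Decimal


def inconsistent_rows(opening, transactions):
    """Return indexes of rows whose units differ from the change in balance."""
    flagged = []
    previous = Decimal(opening)
    for index, txn in enumerate(transactions):
        balance = Decimal(txn["balance"])
        if Decimal(txn["units"]) != balance - previous:
            flagged.append(index)
        previous = balance
    return flagged


# units/balance pairs copied from the parsed output quoted in this issue
reported = [
    {"units": "215931.176", "balance": "215931.176"},
    {"units": "215931.184", "balance": "215931.184"},
    {"units": "-16360.996", "balance": "199570.188"},
    {"units": "-199570.188", "balance": "0.000"},
]

suspect = inconsistent_rows("0.000", reported)
```

Here only the second row (index 1) is flagged: the balance rose by 0.008 while the parsed units read 215931.184.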

decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>] exception when parsing IDCW payout transaction

Thanks for your script. It is very useful. I am trying to build some automation on top of this to analyze my MF investments.
Got this exception when parsing the amount of an IDCW payout transaction. Let me know if I can collect any more debug info to help. I will also see if I can debug further.

Versions used

casparser==0.4.6
python 3.8.3

Traceback

File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\parsers\__init__.py", line 35, in read_cas_pdf
processed_data = process_cas_text("\u2029".join(partial_cas_data.lines))
File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\process\__init__.py", line 28, in process_cas_text
return process_detailed_text(text)
File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\process\cas_detailed.py", line 167, in process_detailed_text
amt = Decimal(m.group(3).replace(",", "_").replace("(", "-"))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

Screenshot of the transaction causing the issue

I ran in debug mode and noticed that it fails when attempting to parse the following transaction in pdf

image

pyCharm debug screenshot

image

Some more debug info

I got into the console on the debugger and found that we have a "." in the m.group(3) instead of probably the "amount" number?

m.group(3)
'.'

m.group(3).replace(",", "_").replace("(", "-")
'.'

amt = Decimal(m.group(3).replace(",", "_").replace("(", "-"))

Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.3.5\plugins\python-ce\helpers\pydev_pydevd_bundle\pydevd_exec2.py", line 1, in Exec
def Exec(exp, global_vars, local_vars=None):
File "", line 1, in
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
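The failure reproduces with Decimal(".") alone, so one defensive option is to treat an unparseable amount as missing rather than letting the exception escape. The following is a minimal sketch under that assumption. parse_amount is a hypothetical helper, not the fix adopted by casparser, and it strips commas and parentheses instead of mirroring the replace chain in the traceback.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: str) -> Optional[Decimal]:
    """Parse a statement amount such as '1,234.50' or '(202.98)'.

    Returns None when the text is not a number (for example a lone '.').
    """
    text = raw.strip().replace(",", "")
    negative = text.startswith("(")  # parentheses mark a negative amount
    text = text.strip("()")
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    return -value if negative else value
```

Whether a missing amount should become None, zero, or a parse error is a design choice for the caller; returning None keeps the bad row visible instead of silently turning it into 0.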

CAMS CAS Parsing support for Dividend payout transaction

Hi,

CAMS CAS has Dividend Payout transactions like below.

CAMS_CAS1

TRANSACTION_RE doesn't match since "units", "nav" and "balance" columns are missing in these entries.

Can the parser be updated to handle the Dividend Payout transaction? Not sure if Karvy CAS has similar format.
Also, the Dividend amount may need to be negated for XIRR calculations.

Issue in parsing Schemes names without "Advisor"

This parser is really helpful.

In my CAMS CAS, there are a few Scheme entries without "Advisor" like below

CAMS_CAS

The parser skips those particular schemes due to "SCHEME_RE" not matching.
Can something be done regarding this?

Parsed closing units not the same as calc_close

So, I have a unique problem. My CAS has two discrete entries for the same folio number, due to a switch from the regular plan to the direct plan. The switch-in is listed before the switch-out. What ends up happening is that the program parses the closing unit balance of 7.350 first, and the closing unit balance of 0.00 from the next entry overwrites it. So the parsed closing unit balance is 0.00 and the calc_close is 7.350. This ends up as an error in the CLI version but raises no error in the normal call. I saw your TODO comment about adding this validation as well, so here's something you can test it against.

If you don't mind me suggesting options, you could maybe add them up instead of replacing, or include calc_close in the dict too.

Also, here is the parsed data of the pdf for your convenience:

'Folio No: 0000000 / 00\t\tPAN: XXXXX0000X\t\tKYC: OK PAN: OK', 'GD65-IDFC Low Duration Fund-Growth-(Direct Plan) (Advisor: INA000000000)\t\tRegistrar : CAMS', 'Opening Unit Balance: 0.000', '14-Aug-2019\t\tNORMAL SWITCH - From IDFC Low Duration Fund-Gr-(Reg Pln)-BSE -\t\t202.98\t\t7.350\t\t27.6156\t\t7.350', 'Closing Unit Balance: 7.350\t\tNAV on 22-Dec-2020: INR 30.3779\t\tValuation on 22-Dec-2020: INR 223.28', '"Entry Load: Nil - Exit Load : Nil W.E.F 29/June/2012 . Please refer the Offer Document / Addendum issued from time to time"', 'Folio No: 0000000 / 00\t\tPAN: XXXX0000X\t\tKYC: OK PAN: OK', 'G65-IDFC Low Duration Fund-Growth-(Regular Plan) (Advisor: ARN-000000)\t\tRegistrar : CAMS', 'Opening Unit Balance: 0.000', '18-Jun-2019\t\tPurchase\t\t200.00\t\t7.424\t\t26.9379\t\t7.424', '19-Jun-2019 ***Address Updated from KRA Data***', '19-Jun-2019 ***Registration of Nominee***', '14-Aug-2019\t\tSwitch Out - To IDFC Low Duration Fund-Gr-(Dir Pln)-BSE -\t\t(202.98)\t\t(7.424)\t\t27.3406\t\t0.000', '30-Sep-2020 ***Address Updated from KRA Data***', 'Closing Unit Balance: 0.000\t\tNAV on 22-Dec-2020: INR 29.9852\t\tValuation on 22-Dec-2020: INR 0.00', '"Entry Load: Nil - Exit Load : Nil W.E.F 29/June/2012 . Please refer the Offer Document / Addendum issued from time to time"
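Since the library call raises no error here, a caller can run the comparison itself. The following is a minimal sketch, assuming the parsed result is a dict shaped like the Data structure section, which lists both close and close_calculated for each scheme. close_mismatches is a hypothetical helper, and the sample reproduces the 0.000 versus 7.350 situation described above.

```python
from decimal import Decimal


def close_mismatches(data):
    """List (folio, scheme) pairs whose parsed close differs from close_calculated."""
    mismatches = []
    for folio in data["folios"]:
        for scheme in folio["schemes"]:
            parsed = Decimal(str(scheme["close"]))
            calculated = Decimal(str(scheme["close_calculated"]))
            if parsed != calculated:
                mismatches.append((folio["folio"], scheme["scheme"]))
    return mismatches


# Hand-written sample mirroring the overwritten closing balance
parsed_statement = {
    "folios": [
        {
            "folio": "0000000 / 00",
            "schemes": [
                {
                    "scheme": "IDFC Low Duration Fund-Growth-(Direct Plan)",
                    "close": "0.000",
                    "close_calculated": "7.350",
                }
            ],
        }
    ]
}

problems = close_mismatches(parsed_statement)
```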

I have attached relevant screen shots of all, below:
CAS-SCAN
CLI-ERROR-SS
CLI-SS
RESULT-DICT

Issue in parsing

Hello,
I have been following this repository and working with it. In the current CAMS version (v3.4 live-10014), the annotation for redemption has changed.
image

Kfintech report parsing always results in error

Hi,

While this is parsing the CAMs report correctly, I have had no luck getting it to parse the Kfintech report.

I generated it from the url you mention in your Readme.

Get the report and I am able to open it fine.

When I run it,

➜ casparser (main) ✗ casparser karvy2.pdf
Enter PDF password:
Error parsing pdf file :: Error parsing CAS header

Thanks.

Advisor Details are missing

In the advisor field, only ARN is coming. ARN number is required to identify the advisor associated with it.

Unable to parse negative unit balance

There seems to be an assumption that unit balance is never negative. While this assumption seems reasonable, I have a statement in which unit balance is shown as negative (some slight rounding error by AMC). This causes parsing to fail. I believe the fix is simply applying the same logic to unit balance as is applied to units.

See screenshot below for example where it fails.
image

Use quotes for delimiters / use semicolons for separators when generating CSV

Some scheme descriptions have commas in them:

***IDCW @ Rs.2.95000000 per unit  (TDS :138.70, TDS Rate: 7.50%)***
Redemption less TDS, STT
Lateral Shift Out less TDS, STT
Redemption Less STT -BSE - - UTR # CITIN24422132375 , less STT

These cause a problem when reading the CSV file.

Possible solutions:

  1. Use double quotes to delimit the fields
  2. Use semicolons as the separator instead of commas.
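For the first option, Python's standard csv module already quotes any field that contains the delimiter, so descriptions with commas survive a write and read round trip. The following is a minimal sketch of that behaviour. The helpers to_csv and from_csv are illustrative, the amounts are placeholders, and nothing here is claimed to be how casparser writes its csv output.

```python
import csv
import io

# Descriptions taken from the examples above; amounts are placeholders.
rows = [
    ["date", "description", "amount"],
    ["2020-01-01", "Redemption less TDS, STT", "100.00"],
    ["2020-01-02", "Lateral Shift Out less TDS, STT", "200.00"],
]


def to_csv(records, quote_all=False):
    """Serialise rows; fields containing commas are quoted automatically."""
    buffer = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    csv.writer(buffer, quoting=quoting).writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse csv text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


text = to_csv(rows)
```

Reading the file back with a csv-aware reader, rather than splitting each line on commas, is what makes the quoted fields parse correctly.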

parser not working, resulting in exclamation mark / error rather than checked

I generated a consolidated report from CAS - CAMS + KFintech at https://www.camsonline.com/Investors/Statements/Consolidated-Account-Statement

Executing the casparser cli utility does not return successfully.

Is this expected ?

Please note the Error and the exclamation marks in the image below.

casparser_snippet_error

Command executed,

$ casparser <filename>.pdf -p '<$password$>'

File Type details,

File Type : FileType.CAMS
CAS Type : CASFileType.DETAILED

also, is up-to-date,

(.venv_py310) iceman@pop-os ~/D/M/Statements> casparser-isin --update
2023-08-31 00:26:23,325 - INFO - Fetching remote isin db metadata
2023-08-31 00:26:24,283 - INFO - Local db version  : 2023.8.18
2023-08-31 00:26:24,283 - INFO - Remote db version : 2023.8.18
2023-08-31 00:26:24,283 - INFO - casparser-isin database is already upto date

Error while generating capital gains report with Dividend payout scheme

Getting the following error

File "\lib\site-packages\casparser\analysis\gains.py", line 192, in merge_transactions
merged_transactions[dt].units += txn["units"]
TypeError: unsupported operand type(s) for +=: 'decimal.Decimal' and 'NoneType'

Dividend payout transactions have nothing in the "Units" column as shown in screenshot below (Only "Amount" column)
image
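The TypeError comes from adding None to a Decimal. One way to avoid it is to count a missing units value as zero when merging. The following is a minimal sketch of that idea with a simplified per-date merge. merge_units_by_date is a hypothetical stand-in for the library's merge_transactions, and whether payout rows should instead be skipped entirely is left open.

```python
from collections import defaultdict
from decimal import Decimal


def merge_units_by_date(transactions):
    """Sum units per date, counting a missing (None) units value as zero."""
    merged = defaultdict(Decimal)
    for txn in transactions:
        merged[txn["date"]] += txn["units"] or Decimal("0")
    return dict(merged)


# Illustrative rows: a purchase and a dividend payout with no units
example = [
    {"date": "2021-01-01", "type": "PURCHASE", "units": Decimal("10.000")},
    {"date": "2021-01-01", "type": "DIVIDEND_PAYOUT", "units": None},
]

merged_units = merge_units_by_date(example)
```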

[CAMS CAS]Issue in folio parsing when PAN data unavailable

Hi,

Folio is not getting parsed in below case. Transactions are getting mapped to previously parsed folio.
image

Below are details of pdf elements and lines for debug

[28.93000030517578, 93.44519805908203, 553.7244873046875, 103.9654769897461, 'Date\t\tTransaction\t\tAmount\t\tUnits\t\tPrice\t\tUnit']
[358.6300048828125, 102.31519317626953, 566.5147705078125, 124.0643310546875, '(INR)\t\t(INR)\t\tBalance\nKYC: OK']
[28.93000030517578, 113.60517120361328, 99.20275115966797, 124.12545013427734, 'Folio No: 99999999']

'Date\t\tTransaction\t\tAmount\t\tUnits\t\tPrice\t\tUnit'
'Folio No: 99999999\t\t(INR)\t\t(INR)\t\tBalance\nKYC: OK'

HeaderParseError: Error parsing CAS header


HeaderParseError Traceback (most recent call last)
in ()
----> 1 json_str = data = casparser.read_cas_pdf("33220217220210621ZFBF290265631DC70CPIMBCP130542292.pdf", "abcd1234")

2 frames
/usr/local/lib/python3.7/dist-packages/casparser/process.py in parse_header(text)
17 if m:
18 return m.groupdict()
---> 19 raise HeaderParseError("Error parsing CAS header")
20
21

HeaderParseError: Error parsing CAS header

Code:
json_str = data = casparser.read_cas_pdf("33220217220210621ZFBF290265631DC70CPIMBCP130542292.pdf", "xyz")

Feature Request: MF category and sub-category

HI Team

First of all many thanks for the great package your team has created.

I am author of repo and using your package to parse the cas pdf for my project.

I have a requirement to classify funds by type (debt/equity) and subtype (such as large cap/small cap).

Would it be possible to integrate this feature into your package?

Group capital gains by PAN

CAS pdf files are generated primarily based on the email address and may occasionally contain multiple PAN numbers depending upon the filters used during the generation. To handle such cases, the capital gains report should have an extra column for the PAN number and preferably group the entries based on it.
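The grouping step can be sketched on the parsed data, since each folio in the documented structure carries a PAN field. This is a minimal illustration, assuming a dict shaped like the Data structure section. folios_by_pan is a hypothetical helper and the PAN values are placeholders.

```python
from collections import defaultdict


def folios_by_pan(data):
    """Group folio numbers by the PAN recorded on each folio."""
    groups = defaultdict(list)
    for folio in data["folios"]:
        groups[folio["PAN"]].append(folio["folio"])
    return dict(groups)


# Placeholder PANs; a real statement may mix several PANs under one email.
statement = {
    "folios": [
        {"folio": "1", "PAN": "AAAAA0000A"},
        {"folio": "2", "PAN": "BBBBB1111B"},
        {"folio": "3", "PAN": "AAAAA0000A"},
    ]
}

grouped = folios_by_pan(statement)
```

A capital gains report could then be produced once per group, or carry the group key as its extra PAN column.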

Duplicated transaction

This is a bug in pdfminer/mupdf but I thought It would be useful to document (since the implications are somewhat critical if you rely on the output of casparser).

If you have pages that look like this across page boundaries, it seems to count the transaction at the start of page two in the previous page as well. For me, it counts the *** Stamp Duty*** transaction at the start of the second page twice (once as part of the previous page 4, and again for the actual first time it is encountered, in page 5).

parsingbug

My guess is the mediabox (used by pdfminer to determine page boundaries) of the page is larger than necessary and extends into the second one.

Long mutual fund folio scheme name is not fully read

For long mutual fund scheme names that span more than one row, only the first row is being read.
Example name:
"""
My long mutual fund scheme name ELSS -
Direct growth plan
"""
Only first row will be read: "My long mutual fund scheme name ELSS -"

casparser.exceptions.CASParseError: Unable to parse investor data

while running casparser, it is giving following error:

data = casparser.read_cas_pdf("CAMS_pranshu766.pdf", "pranshu766")
Deprecation: 'getTextPage' removed from class 'Page' after v1.19.0 - use 'get_textpage'.
Traceback (most recent call last):
File "", line 1, in
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/__init__.py", line 33, in read_cas_pdf
partial_cas_data = cas_pdf_to_text(filename, password)
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/mupdf.py", line 213, in cas_pdf_to_text
investor_info = parse_investor_info(page_dict)
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/mupdf.py", line 145, in parse_investor_info
raise CASParseError("Unable to parse investor data")
casparser.exceptions.CASParseError: Unable to parse investor data

Please help!

Make TransactionType a str enum instead of int

Currently, when the detailed summary is exported, the transaction type with key "type" consists of the string version of the TransactionType Enum.

"type": txn_type.name,

While this is the right design, if someone wants to reuse the TransactionType Enum elsewhere (like I am) on the exported data, this becomes a slight nuisance, as Json parsers like pydantic will not automatically parse the string into the Enum (as TransactionType is an Enum of ints).
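The request can be illustrated with a string-valued enum. The following is a minimal sketch, not casparser's actual TransactionType: it defines only a few of the members listed in the README notes, and shows that such an enum can be rebuilt from the exported string and serialises to the same string.

```python
import json
from enum import Enum


class TransactionType(str, Enum):
    """Illustrative subset of the documented transaction types."""

    PURCHASE = "PURCHASE"
    PURCHASE_SIP = "PURCHASE_SIP"
    REDEMPTION = "REDEMPTION"
    DIVIDEND_PAYOUT = "DIVIDEND_PAYOUT"
    MISC = "MISC"


# Value lookup works directly on the exported string ...
parsed = TransactionType("REDEMPTION")

# ... and because members are also str instances, json.dumps emits the plain string.
exported = json.dumps({"type": TransactionType.PURCHASE})
```

With an int-valued enum, TransactionType("REDEMPTION") would raise ValueError, and the caller would need a name lookup such as TransactionType["REDEMPTION"] instead.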

Print the Valuation of the Fund as well

Thank you for a great parser.

Took me a few tries to understand which report the parser will pick up as Cams is offering many ( so confusing! :-)

Finally, it worked for me.

The table printed is great.
But, what as a user I would also like to see is the total valuation of my fund.

This is only giving the Open/Close rates but the cams file also has:
Valuation on 06-Nov-2020 INR XX,XXX.XX

Can we pick that up too ?

Thanks.

Different exception for when the password is incorrect

In the current code CASParseError("Incorrect PDF password!") is raised when the password is wrong.

raise CASParseError("Incorrect PDF password!")

So you have to do ugly things like:

try:
    read_cas_pdf("pdf", "password")
except CASParseError as err:
    if err.args:
        if 'incorrect pdf password' in err.args[0].lower():
            raise InvalidPasswordError
    
    raise

One possible solution could be to create a separate Exception for wrong password inheriting from CASParseError. Or a code attribute could be set in the CASParseError class, whose value could be like incorrect_password(or something else depending on the context where it is raised) which you can check for when handling the exception.

If you don't have the bandwidth, I can make a PR for the same this weekend.
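The first suggestion, a subclass of CASParseError, could look like the sketch below. It uses a local stand-in for casparser.exceptions.CASParseError so that it runs on its own. IncorrectPasswordError, open_pdf, and classify are hypothetical names, and the password check is simulated.

```python
class CASParseError(Exception):
    """Local stand-in for casparser.exceptions.CASParseError."""


class IncorrectPasswordError(CASParseError):
    """Hypothetical subclass raised only when the PDF password is wrong."""


def open_pdf(password, expected="secret"):
    """Simulated reader that fails on a wrong password."""
    if password != expected:
        raise IncorrectPasswordError("Incorrect PDF password!")
    return "ok"


def classify(password):
    """Show that callers can branch on the exception type, not its message."""
    try:
        open_pdf(password)
    except IncorrectPasswordError:
        return "wrong-password"
    except CASParseError:
        return "parse-error"
    return "ok"
```

Because the new class inherits from CASParseError, existing code that catches CASParseError keeps working unchanged.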

Unable to fetch amfi values of recently renamed funds.

Adding the schemes json snippet for reference :
"schemes": [
{
"scheme": "HSBC Medium Duration Fund - Regular Growth (Formerly",
"advisor": "N/A",
"rta_code": "OLRCBG",
"rta": "CAMS",
"isin": null,
"amfi": null,
"type": "N/A",
"open": "xxx",
"close": "xxx",
"valuation": {
"date": "2023-01-12",
"nav": "16.9027",
"value": "2637.70"
},
"transactions": []
}

Unable to parse investor data

This is the code I am using to get the parsed data

import casparser


def main():
    data = casparser.read_cas_pdf('./demo2/JUL2020_AA03773313_TXN.pdf', 'FVXPK2945F', output="json")
    # data = casparser.read_cas_pdf('./demo2/MAR2021_AA06997817_TXN.pdf', password='BCDPJ0121K', force_pdfminer=True)
    print()


if __name__ == '__main__':
    main()

and this is what the error is


Traceback (most recent call last):
  File "/home/usharab/.local/lib/python3.8/site-packages/casparser/parser.py", line 163, in read_cas_pdf
    investor_info = parse_investor_info(layout, *page.mediabox[2:])
  File "/home/usharab/.local/lib/python3.8/site-packages/casparser/parser.py", line 55, in parse_investor_info
    raise CASParseError("Unable to parse investor data")
casparser.exceptions.CASParseError: Unable to parse investor data

The version of casparser I am using is '0.2.1' and before this version I was using version '0.5.3' and that version gave the same error. Can anyone guide me what could be the issue?

I have also tried force_pdfminer too and that also returned the same error

CLI works fine but I can't call read_cas_pdf in code

I am trying to use your library. Followed all steps as listed on your pypi page. But it always shows the error
module casparser has no attribute.
data = casparser.read_cas_pdf('/home/path.pdf', 'pwd') AttributeError: module 'casparser' has no attribute 'read_cas_pdf'

Code:
import casparser
data = casparser.read_cas_pdf('/home/path.pdf', 'pwd')

Great work regardless thank you.
