nissl-lab / toxy
.NET text extraction framework
License: Apache License 2.0
Hi,
I am able to import an .xls file in my C# project, but when I try to import an .xlsx file (into a DataTable) I get the following reference issues.
When I run the test [TestParseTextFromWord] in [Word2003ParserTest], I get the following exception:
System.IO.FileLoadException : Could not load file or assembly 'NPOI, Version=2.1.3.1, Culture=neutral, PublicKeyToken=0df73ec7942b34e1' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
at Toxy.Parsers.Word2003TextParser.Parse()
at Toxy.Test.Word2003ParserTest.TestParseTextFromWord() in Word2003ParserTest.cs: line 18
The NPOI referenced in the Toxy project is Version=2.2.0.0, yet the error says it needs Version=2.1.3.1. What is the problem?
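Recently, in my spare time, I migrated a large application framework to .NET Core 2.1. The changes were much smaller than I expected: apart from the MVC part, which Microsoft rewrote and which took more work to adapt, everything else went fairly smoothly, and startup time dropped to half of what it was.
Does Toxy plan to support .NET Core?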
Hi,
If you want to add support for Outlook MSG files, you can use this library: https://github.com/Sicos1977/MSGReader
Greetings,
Kees van Spelde
Hello out there, :)
Trying to read a *.xls file with frozen pane(s) results in:
ICSharpCode.SharpZipLib.Zip.ZipException: 'Wrong Local header signature: 0xE011CFD0'
In detail:
ICSharpCode.SharpZipLib.Zip.ZipException
HResult=0x80131600
Message=Wrong Local header signature: 0xE011CFD0
Source=ICSharpCode.SharpZipLib
StackTrace:
at ICSharpCode.SharpZipLib.Zip.ZipInputStream.GetNextEntry()
at NPOI.OpenXml4Net.Util.ZipInputStreamZipEntrySource..ctor(ZipInputStream inp)
at NPOI.OpenXml4Net.OPC.ZipPackage..ctor(Stream in1, PackageAccess access)
at NPOI.OpenXml4Net.OPC.OPCPackage.Open(Stream in1)
at NPOI.Util.PackageHelper.Open(Stream is1)
at NPOI.XSSF.UserModel.XSSFWorkbook..ctor(Stream is1)
at Exel_with_NAPI.Form1.Form1_Load(Object sender, EventArgs e) in K:\Projects\Software\Software Test\Excel with Napi\Exel with NAPI\Form1.vb:line 28
at System.EventHandler.Invoke(Object sender, EventArgs e)
at System.Windows.Forms.Form.OnLoad(EventArgs e)
at System.Windows.Forms.Form.OnCreateControl()
at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
at System.Windows.Forms.Control.CreateControl()
at System.Windows.Forms.Control.WmShowWindow(Message& m)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
at System.Windows.Forms.Form.WmShowWindow(Message& m)
at System.Windows.Forms.Form.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
It would be nice to be able to import files with frozen panes, or to have the panes unfrozen automatically on import.
People always forget to unfreeze them before saving.
Have a sunny weekend ...
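One observation on the trace above, offered as a possible explanation and not a confirmed diagnosis: the failing call is the `XSSFWorkbook` constructor, which expects a ZIP-based .xlsx package, and the value 0xE011CFD0 is what the first four bytes of an OLE2 compound file (D0 CF 11 E0, the container used by .xls) look like when read as a little-endian integer. That suggests the .xls file was handed to the .xlsx reader, in which case the frozen panes may be unrelated. A minimal sketch for checking the container type before choosing a reader follows; `ContainerFormat` and `ContainerSniffer` are hypothetical names used only for this illustration and are not part of NPOI or Toxy.

```csharp
using System;
using System.IO;

// Hypothetical helper types for illustration; not part of NPOI or Toxy.
public enum ContainerFormat { Unknown, Ole2, Zip }

public static class ContainerSniffer
{
    // First four bytes of an OLE2 compound file (.xls, .doc, .msg).
    // Read as a little-endian Int32 they give 0xE011CFD0.
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0 };

    // First four bytes of a ZIP local file header (.xlsx, .docx).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Peeks at the first four bytes and restores the stream position.
    public static ContainerFormat Detect(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));

        long start = stream.Position;
        byte[] header = new byte[4];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream reached early
            read += n;
        }
        stream.Position = start;

        if (read < header.Length) return ContainerFormat.Unknown;
        if (StartsWith(header, Ole2Magic)) return ContainerFormat.Ole2;
        if (StartsWith(header, ZipMagic)) return ContainerFormat.Zip;
        return ContainerFormat.Unknown;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With a check like this, an `Ole2` result would point to `HSSFWorkbook` and a `Zip` result to `XSSFWorkbook`. NPOI's `WorkbookFactory.Create(stream)` is intended to make the same choice automatically, which may be simpler if it is available in the NPOI version in use.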
I have an Excel file that contains 5 sheets, of which:
• 2 are hidden (they are dialog/user-interface sheets: DlgView1, DlgView2).
• 3 worksheets (WrkSht1, WrkSht2, WrkSht3) are visible and contain data with formulas and external links.
The workbook also contains a VBA module called Module1.bas with some logic. When I check the number of worksheets using the workbook.NumberOfSheets property, it shows 5, but when I iterate over all the worksheets in code, I get the results below:
Hello,
I am working on an NPOI project and need to know how to access external links in .xls and .xlsx files. I can work with every other kind of object, but not with external links. When I read a cell formula in an .xls file, I get only the external file name, not the path it points to. In an .xlsx file, I don't get the external file name at all, only a numeric placeholder such as [1] or [2]. Can anyone tell me how to get at external links using NPOI for .xls and .xlsx files? I looked at the Apache POI counterpart, which has a method called getExternalLinks() for this, but I could not find an equivalent in NPOI. Thank you.
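Can you give me a few examples of how npp ** ? I am patriotic too, but I did not think it was **.
Zhihu may be the ** beacon of the ** internet, but it still cannot get around keyword filtering, and it has been summoned for talks by the government many times. I don't blame it, so let's talk here instead.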
Just tested with a docx document and it doesn't extract any text from the header. Is it possible to fix?
It was removed and I cannot find how to parse RTF files. Is it possible?
How can EPUB files be extracted? .epub is a popular extension used for electronic publications.
I am trying to find a way to extract text from an instance of System.IO.Stream (not only FileStream), but Toxy can only extract text when the source is a file. In practice, the source may vary. Could you add stream support to ParseContext? Let me know if you run into any technical problems; we could work them out together.
@tonyqus Could you offer a solution? If Toxy cannot be used on .NET Core, we will have to swap in Java's Tika instead.
I cannot parse any newer PDF files because PDFSharp throws an error: it has a bug that prevents it from reading PDF versions above 1.5. Is there anything we can do to fix this?
The PDF document parser can't properly handle paragraph and page counts: the Paragraphs array contains lines instead of paragraphs, and the total page count is zero.