spring-projects / spring-batch-extensions
Spring Batch Extensions
Could you please publish the latest version to the maven repo?
I am using the Excel extension in one of my projects and I am facing an issue that was fixed in #90; however, the latest version in Maven (0.1.0) does not contain the fix.
Thanks!
A workbook with different sheets can't be read if the target objects aren't of the same type.
I don't know for sure if this is an issue with this plugin, or whether it is caused by something I am doing / not doing. Details are here: http://stackoverflow.com/questions/29127028/grails-spring-batch-excel-reader-only-reads-file-once.
When I read an XLSX file, it works the first time, but subsequent attempts to parse the same file (by restarting Grails) just mean the job continues indefinitely. I need to reboot the computer to be able to rerun the job (and then it will only run once before hanging again).
From a DataBinder binder, say I were to register a custom editor by calling:
binder.registerCustomEditor(String.class, new StringTrimmerEditor(true))
It tried to read this particular row as a Long; hence an IllegalArgumentException leading to the aforementioned BinderException.
@mdeinum Please add a comment here when the module is ready for a release.
Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.
Thank you for making a good library.
But it has one drawback: it cannot read Excel file resources provided as a FileSystemResource.
In the current version, if you pass a FileSystemResource to the openExcelFile(final Resource resource) method and execute it, it throws an exception, because FileInputStream doesn't support mark/reset and isn't wrapped as a PushbackInputStream.
@Override
protected void openExcelFile(final Resource resource) throws Exception {
workbookStream = resource.getInputStream();
if (!workbookStream.markSupported() && !(workbookStream instanceof PushbackInputStream)) {
throw new IllegalStateException("InputStream MUST either support mark/reset, or be wrapped as a PushbackInputStream");
}
this.workbook = WorkbookFactory.create(workbookStream);
this.workbook.setMissingCellPolicy(Row.CREATE_NULL_AS_BLANK);
}
However, this check is unnecessary, because the WorkbookFactory.create(workbookStream) method already wraps the InputStream in a PushbackInputStream when it doesn't support mark/reset:
public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
    if (!inp.markSupported()) {
        inp = new PushbackInputStream(inp, 8);
    }
    if (POIFSFileSystem.hasPOIFSHeader(inp)) {
        return new HSSFWorkbook(inp);
    } else if (POIXMLDocument.hasOOXMLHeader(inp)) {
        return new XSSFWorkbook(OPCPackage.open(inp));
    } else {
        throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
    }
}
So I think it would be more useful to remove the unnecessary validation check in the openExcelFile(final Resource resource) method, so that Excel files can be read through a FileSystemResource, which is used frequently in batch environments.
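To illustrate the point with plain JDK classes only, here is a minimal sketch of a hypothetical helper (MarkSupport is not part of spring-batch-excel or POI) that makes any stream safe for header sniffing. It wraps streams that lack mark/reset, such as FileInputStream, in a BufferedInputStream. Note that PushbackInputStream itself reports markSupported() == false, which is why the original check accepts it as a separate case.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical helper, not part of spring-batch-excel or POI. */
final class MarkSupport {
    private MarkSupport() {}

    /** Returns the stream itself if it supports mark/reset, otherwise a buffered wrapper that does. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes (e.g. a file signature), then rewinds so they can be read again. */
    static byte[] peek(InputStream in, int n) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(n);
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int read = in.read(buf, total, n - total);
                if (read < 0) {
                    break; // stream shorter than n bytes
                }
                total += read;
            }
            in.reset(); // rewind to the marked position
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

POI performs a similar peek at the first bytes to decide between the OLE2 (.xls) and OOXML (.xlsx) formats, which is why the stream must be rewindable in the first place.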
If the fields of the target object do not have the same names as the headers in the Excel file, an org.springframework.beans.NotWritablePropertyException is thrown.
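One workaround, assuming the column names are taken from the header row and bound to bean properties by name, is to normalize each header into a camelCase property name before binding. The helper below is a hypothetical sketch, not part of the library:

```java
import java.util.Locale;

/** Hypothetical helper: turns an Excel header such as "Last Name" into a property name such as "lastName". */
final class HeaderNames {
    private HeaderNames() {}

    static String toPropertyName(String header) {
        // Split on anything that is not a letter or digit (spaces, underscores, dashes, ...).
        String[] words = header.trim().split("[^A-Za-z0-9]+");
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // e.g. produced by a leading separator
            }
            if (sb.length() == 0) {
                sb.append(word.toLowerCase(Locale.ROOT)); // first word starts lowercase
            } else {
                sb.append(Character.toUpperCase(word.charAt(0)))
                  .append(word.substring(1).toLowerCase(Locale.ROOT));
            }
        }
        return sb.toString();
    }
}
```

With this, a header such as "Last Name" maps to a lastName property.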
Find out what is available and give a brief overview of how to keep the cost of BigQuery read operations down.
Dear,
Is it possible to add this project to the Maven repository? https://mvnrepository.com/search?q=spring-batch-excel&p=2
Thanks,
Regards,
Brandon
I have a use case where I would like AbstractExcelItemReader to ignore a number of empty lines.
When trying to read an Excel file larger than 10 MB, an error occurs:
"Unexpected error Tried to allocate an array of length 162,386,364, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()"
This issue is to move the current item reader/writer for Neo4j in Spring Batch to this extension repository.
The problem is described in this Stack Overflow thread. Can anyone offer a suggestion as soon as possible? Thanks!
In my file, dates are in DD/MM/YYYY format, and when Spring/POI reads the data, org.apache.poi.ss.usermodel.DataFormatter is used; in its performDateFormatting method, the dateFormat parameter has the pattern M/d/yy.
Is there a way to force the date pattern when reading?
My RowMapper configuration is:
<bean id="caricaAnagraficheReader" class="org.springframework.batch.extensions.excel.poi.PoiItemReader" scope="step">
<property name="resource" value="file:#{batchParameters.genericBatchParameters.allegatoNomeCompleto}" />
<property name="linesToSkip" value="1" />
<property name="rowMapper">
<bean class="it.blue.batch.portali.components.CaricaAnagraficheRowMapper" />
</property>
</bean>
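If the raw numeric cell value is available (Excel stores dates as serial day numbers), a mapper could format it with an explicit pattern instead of relying on the default M/d/yy. The following is a sketch under that assumption, using only java.time; ExcelDates is a hypothetical name, and the sketch assumes the 1900 date system and ignores the time-of-day fraction:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: formats an Excel serial date (1900 date system) as dd/MM/yyyy. */
final class ExcelDates {
    // Serial 61 is 1900-03-01; counting from 1899-12-30 absorbs Excel's fictitious 1900-02-29.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final DateTimeFormatter DD_MM_YYYY = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    private ExcelDates() {}

    static LocalDate fromSerial(long serial) {
        if (serial < 61) {
            // Before March 1900 the offset differs because Excel treats 1900 as a leap year.
            throw new IllegalArgumentException("serials before 1900-03-01 are not supported: " + serial);
        }
        return EPOCH.plusDays(serial);
    }

    static String formatSerial(long serial) {
        return fromSerial(serial).format(DD_MM_YYYY);
    }
}
```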
Columns in Excel files being batch processed may contain numbers or dates. Currently all rows are parsed into String[] (see Sheet line 47). spring-batch-excel interfaces should use generics to support other representations of a row, such as Object[].
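A minimal sketch of what a generic design could look like. All names here (RowSheet, TypedRowMapper, ObjectArraySheet) are hypothetical and do not exist in spring-batch-excel:

```java
import java.util.List;

/** Hypothetical: a sheet abstraction parameterized by the row representation R. */
interface RowSheet<R> {
    int getNumberOfRows();

    R getRow(int rowNumber);
}

/** Hypothetical: a row mapper whose input type follows the sheet's row type. */
interface TypedRowMapper<R, T> {
    T mapRow(R row);
}

/** Hypothetical: keeps typed cell values (String, Double, LocalDate, ...) instead of converting them to String. */
final class ObjectArraySheet implements RowSheet<Object[]> {
    private final List<Object[]> rows;

    ObjectArraySheet(List<Object[]> rows) {
        this.rows = rows;
    }

    @Override
    public int getNumberOfRows() {
        return rows.size();
    }

    @Override
    public Object[] getRow(int rowNumber) {
        return rows.get(rowNumber).clone(); // defensive copy of the row array
    }
}
```

Existing String[]-based code would then simply be the RowSheet<String[]> case.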
Hey. I'm getting this error using your extension library.
When I execute the job the first time, it works just fine, but the second time it gives me this error:
@Bean
public PoiItemReader<ActivosExcel> excelReader() throws MalformedURLException {
PoiItemReader<ActivosExcel> reader = new PoiItemReader<>();
reader.setSaveState(false);
reader.setLinesToSkip(1);
reader.setResource(new UrlResource("file:\\file.xlsx"));
reader.setRowMapper(excelRowMapper());
return reader;
}
That is my reader. The complete error is:
Caused by: java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.validateSheetIndex(XSSFWorkbook.java:1527) ~[poi-ooxml-3.16.jar:3.16]
at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:1134) ~[poi-ooxml-3.16.jar:3.16]
at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:121) ~[poi-ooxml-3.16.jar:3.16]
at org.springframework.batch.item.excel.poi.PoiItemReader.getSheet(PoiItemReader.java:47) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
at org.springframework.batch.item.excel.AbstractExcelItemReader.openSheet(AbstractExcelItemReader.java:120) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
at org.springframework.batch.item.excel.AbstractExcelItemReader.doOpen(AbstractExcelItemReader.java:112) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.open(AbstractItemCountingItemStreamItemReader.java:144) ~[spring-batch-infrastructure-3.0.8.RELEASE.jar:3.0.8.RELEASE]
... 79 common frames omitted
Can you please help me? Or should I write a custom reader with POI? If so, can you give me an example of how to do it?
Thank you.
UPDATE: The first time I use it, it works (my .xlsx file only has one sheet), but the second time it doesn't, because the reader can't find a second sheet, so it throws that error.
I did a little test: I created another sheet in the file and it worked, but the problem in the code remains.
Where is the index being incremented? It should always be 0!
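The stack trace shows doOpen calling openSheet, which asks POI for sheet index 1 on the second run. That suggests the reader instance keeps its sheet counter between executions. The simplified model below (hypothetical, not the library's code) reproduces the failure when the counter is not reset in open(). If the reader bean is a singleton, declaring it with @StepScope, so that each step execution gets a fresh instance, is a common way to avoid such stale state.

```java
import java.util.List;

/** Hypothetical, simplified model of a multi-sheet reader; not spring-batch-excel code. */
final class MultiSheetReader {
    private final List<List<String>> sheets;
    private final boolean resetOnOpen;
    private int currentSheet;
    private int currentRow;

    MultiSheetReader(List<List<String>> sheets, boolean resetOnOpen) {
        this.sheets = sheets;
        this.resetOnOpen = resetOnOpen;
    }

    void open() {
        if (resetOnOpen) {
            currentSheet = 0; // the reset that makes the instance reusable
            currentRow = 0;
        }
        // Like getSheet(currentSheet): fails if the index was left past the last sheet.
        if (currentSheet >= sheets.size()) {
            throw new IllegalArgumentException(
                "Sheet index (" + currentSheet + ") is out of range (0.." + (sheets.size() - 1) + ")");
        }
    }

    /** Returns the next value, moving on to the next sheet when one is exhausted; null at the end. */
    String read() {
        while (currentSheet < sheets.size()) {
            List<String> sheet = sheets.get(currentSheet);
            if (currentRow < sheet.size()) {
                return sheet.get(currentRow++);
            }
            currentSheet++; // this is where the index grows past the last sheet
            currentRow = 0;
        }
        return null;
    }
}
```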
The method is missing or has been removed from spring-batch-excel. Could you please explain why it was removed and what the alternative is?
Upgrade the following
The doPageRead() method of ElasticsearchItemReader stops when the ES query returns null.
So normally, if there are 100 items to retrieve with a page size of 50, doPageRead() is called three times: the first call retrieves 50 items, the second retrieves the next 50, and on the third the query returns null, so doPageRead() stops.
Here, however, the query keeps retrieving the first 50 items indefinitely, even though the SearchQuery is paginated.
I will find a solution, then share it here.
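For reference, here is a hypothetical sketch of the paging contract described above, in which the page index must advance after each query. If it never advances, every call re-runs the query for the first page, which matches the symptom. PagedReader and its query function are illustrative assumptions, not the ElasticsearchItemReader API:

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical page-based reader; the query function maps (page, pageSize) to results. */
final class PagedReader<T> {
    private final BiFunction<Integer, Integer, List<T>> query;
    private final int pageSize;
    private int page;
    private int queryCount;

    PagedReader(BiFunction<Integer, Integer, List<T>> query, int pageSize) {
        this.query = query;
        this.pageSize = pageSize;
    }

    /** Returns the next page, or null once the query comes back empty. */
    List<T> doPageRead() {
        List<T> results = query.apply(page, pageSize);
        queryCount++;
        if (results == null || results.isEmpty()) {
            return null;
        }
        page++; // without this, every call would re-run the query for page 0
        return results;
    }

    int queryCount() {
        return queryCount;
    }
}
```

With 100 items and a page size of 50 this issues exactly three queries, as in the description.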
Support for JExcelAPI should be dropped, since it is an abandoned project (no release since 2009, with serious bugs remaining).
This way we could simplify the API, remove some of the abstraction, and make it dedicated to POI. We could then also consider creating an ItemWriter for writing Excel files instead of only reading them.
Apache POI is currently at version 3.15; we should support that version.
The issue is that the CustomMapper is reading empty rows. Even after deleting the rows in Excel, it still reads them.
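Rows that were cleared in Excel (rather than deleted) can still be present in the file, for example because they carry formatting, so they show up as rows whose cells are all blank. A hypothetical check like the one below could detect such rows; in Spring Batch, an ItemProcessor that returns null filters the item out, so the check could live there.

```java
/** Hypothetical helper: a row counts as empty if it is missing or every cell is null or whitespace. */
final class BlankRows {
    private BlankRows() {}

    static boolean isBlank(String[] row) {
        if (row == null) {
            return true;
        }
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false; // found real content
            }
        }
        return true;
    }
}
```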
For consistency with other projects in the portfolio, the master branch should be renamed to main.
AbstractExcelItemReader ignores empty rows in between filled rows. It would help if this behavior were configurable, since it causes incorrect metadata to be provided for the row number.
Issue Description:
Suppose rows 1 to 5 contain data, rows 6 and 7 are empty, and row 8 has data (screenshot attached for reference). The reader would return row 8 as the 6th row, which may be a problem if the exact row number of a cell needs to be determined.
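A minimal sketch of the requested behavior, under the assumption that rows are available as a list of String[] (null for a missing row). Empty rows are skipped, each returned row keeps its real 1-based spreadsheet row number, and the number of consecutive empty rows tolerated before treating the data as finished is configurable. NumberedRows is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: skips empty rows but preserves the real spreadsheet row numbers. */
final class NumberedRows {
    static final class NumberedRow {
        final int rowNumber; // 1-based, as shown in Excel
        final String[] values;

        NumberedRow(int rowNumber, String[] values) {
            this.rowNumber = rowNumber;
            this.values = values;
        }
    }

    private NumberedRows() {}

    static List<NumberedRow> read(List<String[]> rows, int maxConsecutiveEmpty) {
        List<NumberedRow> result = new ArrayList<>();
        int emptyRun = 0;
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            if (isEmpty(row)) {
                if (++emptyRun > maxConsecutiveEmpty) {
                    break; // too many empty rows in a row: treat as end of data
                }
                continue;
            }
            emptyRun = 0;
            result.add(new NumberedRow(i + 1, row));
        }
        return result;
    }

    private static boolean isEmpty(String[] row) {
        if (row == null) {
            return true;
        }
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

With the example from the description (rows 6 and 7 empty) and a limit of 2, the last row is reported as row 8 rather than row 6.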
The following code is used to read the Excel sheets:
StreamingXlsxItemReader.java:
protected void openExcelFile(Resource resource, String password) throws Exception {
try {
File file = resource.getFile();
this.pkg = OPCPackage.open(file, PackageAccess.READ);
} catch (FileNotFoundException var4) {
this.inputStream = resource.getInputStream();
this.pkg = OPCPackage.open(this.inputStream);
}
XSSFReader reader = new XSSFReader(this.pkg);
this.initSheets(reader, this.pkg);
}
PoiItemReader.java:
protected void openExcelFile(Resource resource, String password) throws Exception {
try {
File file = resource.getFile();
this.workbook = WorkbookFactory.create(file, password, false);
} catch (FileNotFoundException var4) {
this.inputStream = resource.getInputStream();
this.workbook = WorkbookFactory.create(this.inputStream, password);
}
this.workbook.setMissingCellPolicy(MissingCellPolicy.CREATE_NULL_AS_BLANK);
}
It's nice that there is a fallback that attempts to use resource.getInputStream(), but I ran into a problem with this spring-cloud project, which uses a GoogleStorageResource: the exception it throws is an UnsupportedOperationException, which isn't handled by the code above. Please see here:
https://github.com/spring-attic/spring-cloud-gcp/blob/main/spring-cloud-gcp-storage/src/main/java/org/springframework/cloud/gcp/storage/GoogleStorageResource.java#L244
To fix this, I'm wondering whether it makes sense to check whether the Resource is a file and, if so, call getFile(), and otherwise use getInputStream(). It would look like this:
try {
if(resource.isFile()) {
File file = resource.getFile();
this.pkg = OPCPackage.open(file, PackageAccess.READ);
} else {
this.inputStream = resource.getInputStream();
this.pkg = OPCPackage.open(this.inputStream);
}
} catch (Exception ex) {
throw new IllegalArgumentException("Unable to read data from resource", ex);
}
XSSFReader reader = new XSSFReader(this.pkg);
this.initSheets(reader, this.pkg);
For the POI item reader API: if the initial rows are null and linesToSkip is applied, the reader skips that many lines, but once it finds rows with non-null values it takes the number of columns from row 0. That should not be the case; it should take the column count from that row.
//PoiSheet
@Override
public int getNumberOfColumns() {
if (numberOfColumns < 0) {
numberOfColumns = this.delegate.getRow(0).getLastCellNum();
}
return numberOfColumns;
}
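A hypothetical sketch of the suggested behavior, modeling the sheet as a list of String[] rows where null stands for a missing row: use the configured header row if there is one, otherwise the first non-null row at or after linesToSkip, instead of always calling getRow(0) (POI returns null for a row that is not defined, so getLastCellNum() on it then fails).

```java
import java.util.List;

/** Hypothetical helper; a negative headerRowIndex means "no header row configured". */
final class ColumnCount {
    private ColumnCount() {}

    static int numberOfColumns(List<String[]> rows, int headerRowIndex, int linesToSkip) {
        if (headerRowIndex >= 0) {
            String[] header = headerRowIndex < rows.size() ? rows.get(headerRowIndex) : null;
            if (header == null) {
                throw new IllegalStateException("header row " + headerRowIndex + " is missing");
            }
            return header.length;
        }
        // No header configured: use the first row that actually exists after the skipped lines.
        for (int i = Math.max(linesToSkip, 0); i < rows.size(); i++) {
            if (rows.get(i) != null) {
                return rows.get(i).length;
            }
        }
        return 0; // sheet has no rows
    }
}
```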
Typo: springframewor is used instead of springframework.
Sample code:
var result = rs.getCurrentRow();
var actualDate = result[5]; // returns "5/29/19"; the year has been shortened from "2019" to "19"
var expected = "5/29/2019";
assertEquals(expected, actualDate);
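As a sketch of a workaround in a custom row mapper, assuming the value arrives as the string "5/29/19", it can be parsed with the two-digit-year pattern and re-formatted with a four-digit one using java.time. DateReformat is a hypothetical helper; note that the yy pattern in DateTimeFormatter resolves years into the range 2000 to 2099:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: re-formats a date string from one pattern to another. */
final class DateReformat {
    private DateReformat() {}

    static String reformat(String value, String fromPattern, String toPattern) {
        // "yy" parses relative to the year 2000, so "19" becomes 2019 (and "99" becomes 2099).
        LocalDate date = LocalDate.parse(value, DateTimeFormatter.ofPattern(fromPattern));
        return date.format(DateTimeFormatter.ofPattern(toPattern));
    }
}
```

For example, reformat("5/29/19", "M/d/yy", "M/d/yyyy") yields "5/29/2019", and the same call with "dd/MM/yyyy" as the target yields "29/05/2019".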
I'm using the Spring Batch Excel extension to read an Excel (.xlx) file. I cloned the source, ran mvn install, and added the dependency to my Spring Boot project. I also added Apache poi-ooxml.
My Excel file has simple data:
Id  Last Name  First Name
3   Aguinaldo  Emilio
4   Aquino     Melchora
5   Dagohoy    Francisco
6   Luna       Antonio
7   Jacinto    Emilio
This is my Student class:
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private long id;
@NotBlank(message = "{NotBlank.student.lastName}")
private String lastName;
@NotBlank(message = "{NotBlank.student.firstName}")
private String firstName;
private LocalDateTime createdAt;
// getters, setters
}
I created a utility class whose method does the actual reading of the Excel file:
public class ExcelUtils {
public static <T> ItemReader<T> excelToItemReader(Path file, Class<T> clazz) throws Exception {
PoiItemReader<T> reader = new PoiItemReader<>();
reader.setLinesToSkip(1);
System.out.println("File Name: " + file.toString()); // Displays: File Name: uploads/excel/<Excel file selected to import>
Resource resource = new FileSystemResource(file);
System.out.println("File exists? " + resource.exists()); // Displays: File exists? true
reader.setResource(resource);
reader.setRowMapper(excelRowMapper(clazz));
return reader;
}
private static <T> RowMapper<T> excelRowMapper(Class<T> clazz) {
BeanWrapperRowMapper<T> rowMapper = new BeanWrapperRowMapper<>();
rowMapper.setTargetType(clazz);
return rowMapper;
}
}
After uploading the files, I select a file to import its data into my database:
@PostMapping("/import")
public String importStudents(@RequestParam String fileName, RedirectAttributes redirectAttributes) throws Exception {
ItemReader<Student> studentItemReader = ExcelUtils.excelToItemReader(storageService.load(fileName), Student.class);
Student student = studentItemReader.read();
if (student != null) {
System.out.println("Student has data.");
studentService.save(student);
} else {
System.out.println("Student is null");
throw new Exception("Student is null");
}
redirectAttributes.addFlashAttribute("message", "You successfully imported students data from " + fileName + "!");
return "redirect:/students";
}
I don't understand why student is null when no error is logged in the console at all.
Hey,
Is there any way to skip the first column of an Excel file while reading it in the batch?
Thank you.
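Inside a custom row mapper, one option is to drop the leading column from the current row before mapping the rest, assuming the row is available as a String[] (as returned by RowSet.getCurrentRow()). The helper below is a hypothetical sketch:

```java
import java.util.Arrays;

/** Hypothetical helper for a custom row mapper: drops the first n columns of a row. */
final class Columns {
    private Columns() {}

    static String[] dropLeading(String[] row, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        // Clamp so that dropping more columns than exist yields an empty row instead of an exception.
        return Arrays.copyOfRange(row, Math.min(n, row.length), row.length);
    }
}
```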
At the moment, the switch statement that handles the various cell types does not support invalid (error) cells. This is a problem for data sets that are generated incorrectly, because a single bad field makes the whole sheet impossible to parse. At the moment it fails with
Cannot handle cells of type 5
for these fields (5 is POI's error cell type). Could we add a case to support this type?
This is the field type I'm referring to:
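A hypothetical sketch of a conversion that tolerates error cells instead of rejecting the whole sheet. The integer codes mirror the legacy POI cell type constants, where 5 is the error type (for example a #DIV/0! or #N/A cell); the marker string returned for errors is an arbitrary choice:

```java
/** Hypothetical sketch; the codes mirror the legacy POI Cell.CELL_TYPE_* constants. */
final class CellText {
    static final int NUMERIC = 0, STRING = 1, FORMULA = 2, BLANK = 3, BOOLEAN = 4, ERROR = 5;

    private CellText() {}

    static String toText(int cellType, Object value) {
        switch (cellType) {
            case NUMERIC:
            case STRING:
            case FORMULA: // value is assumed to be the already evaluated result
            case BOOLEAN:
                return String.valueOf(value);
            case BLANK:
                return "";
            case ERROR:
                // Return a marker instead of failing the whole sheet.
                return "#ERROR";
            default:
                throw new IllegalArgumentException("Cannot handle cells of type " + cellType);
        }
    }
}
```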
PoiSheet always reads the number of columns from the first row. Especially when a RowNumberColumnNameExtractor is defined (with the headerRowNumber attribute set), it would make sense to read the number of columns from the row that contains the header.
Maybe the default should even be to read the number of columns from the current row being processed.
@mdeinum What do you think?
Is this project dead?
I see a lot of useful pull requests that have been ignored for several years.
How can we push this forward?
Update Spring Batch to most recent version.
@dgray16 Please add a comment here when the module is ready for a release.
Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.
DataFormatter enables the POI version of spring-batch-excel to read cell values as they appear in Excel (rather than returning the value with the type that Excel uses internally).
I would like to add this as an option to the PoiItemReader, so the user can choose to retrieve all values as Strings, exactly as they appear in Excel.
The reason is that I have numbers that I want to read as strings, which is currently not possible.
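A rough, simplified illustration of the difference: a numeric cell holding 12345 is stored as the double 12345.0, whereas Excel's General format typically displays it as "12345". The sketch below only strips trailing zeros; POI's DataFormatter, by contrast, applies the cell's actual format string. GeneralNumber is a hypothetical name:

```java
import java.math.BigDecimal;

/** Hypothetical, simplified stand-in for "General" number display; not POI's DataFormatter. */
final class GeneralNumber {
    private GeneralNumber() {}

    static String format(double value) {
        // Double.toString gives the shortest exact representation; strip ".0" and avoid exponent notation.
        return new BigDecimal(Double.toString(value)).stripTrailingZeros().toPlainString();
    }
}
```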
Since the xlsx format supports multiple, differently named sheets with different columns, such files need to be supported. This could be done by letting the user specify which sheet to read, either by its index (ID) or by its name.
Hi,
When I run the job (PoiItemReader) using SimpleAsyncTaskExecutor, I get the following exception:
Exception parsing Excel file (Because of null rows)
Whereas if I run the job normally (without SimpleAsyncTaskExecutor), I do not get any exceptions.
What could be the issue? Can someone help me out here?
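One plausible cause, offered as an assumption rather than a diagnosis: item readers of this kind keep cursor state and are generally not thread-safe, so several threads of the SimpleAsyncTaskExecutor may advance the same reader concurrently. Spring Batch provides SynchronizedItemStreamReader as a wrapper for such readers; the plain-Java sketch below (SynchronizedReader is hypothetical) only shows the underlying idea of serializing read() calls:

```java
import java.util.Iterator;

/** Hypothetical sketch: only one thread at a time may advance the non-thread-safe delegate. */
final class SynchronizedReader<T> {
    private final Iterator<T> delegate;

    SynchronizedReader(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    /** Returns the next item, or null when the delegate is exhausted. */
    synchronized T read() {
        return delegate.hasNext() ? delegate.next() : null;
    }
}
```

Each item is then handed to exactly one thread; the processing and writing can still run in parallel.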
I noticed that when restarting jobs after a failure, the AbstractExcelItemReader would begin reading from the first row of the spreadsheet. The doRead method simply calls rowSet.next() and ignores the attempt by AbstractItemCountingItemStreamItemReader to jump to the correct row. A solution that I found to work was to override the jumpToItem method in AbstractExcelItemReader and simply call rowSet.next() until we reach the correct row.
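A simplified model of the proposed fix (hypothetical code, not the library's AbstractExcelItemReader), with the row set represented as an Iterator over String[] rows:

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of a restartable row reader; not spring-batch-excel code. */
final class RestartableRowReader {
    private final List<String[]> rows;
    private Iterator<String[]> rowSet;

    RestartableRowReader(List<String[]> rows) {
        this.rows = rows;
    }

    void open() {
        rowSet = rows.iterator(); // a fresh cursor always starts at the first row
    }

    /** The proposed fix: advance the cursor until it is positioned at itemIndex. */
    void jumpToItem(int itemIndex) {
        for (int i = 0; i < itemIndex && rowSet.hasNext(); i++) {
            rowSet.next();
        }
    }

    /** Returns the next row, or null at the end of the sheet. */
    String[] read() {
        return rowSet.hasNext() ? rowSet.next() : null;
    }
}
```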
Bug description
In Neo4jItemReaderBuilder, startStatement(String startStatement) is required, but Neo4j itself has deprecated the START clause and throws an error when it is used. If it is not set, the application throws a BeanCreationException with the message java.lang.IllegalArgumentException: startStatement is required.
Environment
Spring Boot: 2.7.0
Kotlin: 1.6.10
Neo4j: 4.4.4
Steps to reproduce
@Bean
fun postReader(): ItemReader<Post> {
return Neo4jItemReaderBuilder<Post>()
.name("postReader")
.sessionFactory(getSessionFactory())
.startStatement("")
.matchStatement("(p:Post)")
.returnStatement("p")
.targetType(Post::class.java)
.pageSize(1000)
.build()
}
Expected behavior
startStatement() should be optional, not mandatory.
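A sketch of query assembly in which START is optional, assuming the reader builds its Cypher query by concatenating the configured clauses. CypherQuery is hypothetical and leaves out the paging clauses (ORDER BY, SKIP, LIMIT) that a real paging reader would append:

```java
import java.util.StringJoiner;

/** Hypothetical query assembly: START is optional, MATCH and RETURN are required. */
final class CypherQuery {
    private CypherQuery() {}

    static String build(String startStatement, String matchStatement, String returnStatement) {
        if (isBlank(matchStatement) || isBlank(returnStatement)) {
            throw new IllegalArgumentException("matchStatement and returnStatement are required");
        }
        StringJoiner query = new StringJoiner(" ");
        if (!isBlank(startStatement)) {
            query.add("START " + startStatement); // only emitted when explicitly configured
        }
        query.add("MATCH " + matchStatement);
        query.add("RETURN " + returnStatement);
        return query.toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

With the configuration from the report, build("", "(p:Post)", "p") yields "MATCH (p:Post) RETURN p".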