spring-projects / spring-batch-extensions

Spring Batch Extensions

spring-batch-extensions's Issues

Publish Spring Batch Excel 0.1.1 artifacts to maven

Could you please publish the latest version to the maven repo?
I am using the Excel extension in one of my projects and am facing an issue that was fixed in #90. However, the latest version on Maven, 0.1.0, does not contain the fix.

Thanks!

Mapping multiple sheets with multiple target objects

A workbook with different sheets can't be read if the target objects aren't of the same type.

Excel file only read once - subsequent parses hang

I don't know for sure if this is an issue with this plugin, or whether it is caused by something I am doing / not doing. Details are here: http://stackoverflow.com/questions/29127028/grails-spring-batch-excel-reader-only-reads-file-once.

When I read an XLSX file, it works the first time, but subsequent attempts to parse the same file (by restarting Grails) just mean the job continues indefinitely. I need to reboot the computer to be able to rerun the job (and then it will only run once before hanging again).

BeanWrapperRowMapper throws BinderException on trying to read basic String files.

From a DataBinder binder, say I were to register a custom editor by calling:

binder.registerCustomEditor(String.class, new StringTrimmerEditor(true))

It tried to read this particular row as a Long; hence an IllegalArgumentException leading to the aforementioned BinderException.

Release spring-batch-excel version 0.1.0

@mdeinum Please add a comment here when the module is ready for a release.

Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.

Make it possible to read FileSystemResource in PoiItemReader

Thank you for making a good library.
But it has one drawback: it cannot read Excel file resources using FileSystemResource.

In the current version, if you pass a FileSystemResource to the openExcelFile(final Resource resource) method and execute it, it throws an exception, because FileInputStream does not support mark/reset and is not wrapped in a PushbackInputStream.

@Override
protected void openExcelFile(final Resource resource) throws Exception {
    workbookStream = resource.getInputStream();

    if (!workbookStream.markSupported() && !(workbookStream instanceof PushbackInputStream)) {
        throw new IllegalStateException("InputStream MUST either support mark/reset, or be wrapped as a PushbackInputStream");
    }

    this.workbook = WorkbookFactory.create(workbookStream);
    this.workbook.setMissingCellPolicy(Row.CREATE_NULL_AS_BLANK);
}

But this check is unnecessary, because the WorkbookFactory.create(workbookStream) method already wraps the InputStream in a PushbackInputStream when it does not support mark/reset.

public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
    if (!inp.markSupported()) {
        inp = new PushbackInputStream(inp, 8);
    }

    if (POIFSFileSystem.hasPOIFSHeader(inp)) {
        return new HSSFWorkbook(inp);
    } else if (POIXMLDocument.hasOOXMLHeader(inp)) {
        return new XSSFWorkbook(OPCPackage.open(inp));
    } else {
        throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
    }
}

So I think it would be more useful to remove the unnecessary validation check in the openExcelFile(final Resource resource) method, so that Excel files can be read as a FileSystemResource, which is used frequently in batch environments.
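The mark/reset point can be checked with the JDK alone. The following is a minimal sketch, assuming only java.io; the class PeekableStreams and its methods are hypothetical names for illustration and are not part of spring-batch-excel or Apache POI. It applies the same wrapping step as the WorkbookFactory.create code quoted above and shows that the header bytes can then be inspected without being consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of spring-batch-excel or Apache POI.
final class PeekableStreams {

    // Same push-back buffer size as in the WorkbookFactory.create code quoted above.
    static final int HEADER_SIZE = 8;

    private PeekableStreams() {
    }

    // Wraps streams without mark/reset support so that their first bytes can be re-read.
    static InputStream ensurePeekable(InputStream in) {
        return in.markSupported() ? in : new PushbackInputStream(in, HEADER_SIZE);
    }

    // Reads up to HEADER_SIZE bytes from a stream returned by ensurePeekable, then restores them.
    static byte[] peekHeader(InputStream peekable) {
        try {
            byte[] header = new byte[HEADER_SIZE];
            if (peekable instanceof PushbackInputStream) {
                int read = peekable.readNBytes(header, 0, HEADER_SIZE);
                ((PushbackInputStream) peekable).unread(header, 0, read);
                return Arrays.copyOf(header, read);
            }
            peekable.mark(HEADER_SIZE);
            int read = peekable.readNBytes(header, 0, HEADER_SIZE);
            peekable.reset();
            return Arrays.copyOf(header, read);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // SequenceInputStream stands in for any stream that lacks mark/reset support.
        InputStream plain = new SequenceInputStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}),
                new ByteArrayInputStream(new byte[] {6, 7, 8, 9, 10}));
        InputStream peekable = ensurePeekable(plain);
        System.out.println(Arrays.toString(peekHeader(peekable)));
        System.out.println(Arrays.toString(peekHeader(peekable))); // same bytes again
    }
}
```

A FileInputStream reports markSupported() as false, so it would take the wrapping branch here, which is why the extra check in openExcelFile rejects it before WorkbookFactory gets the chance to wrap it.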

Upgrade to Spring Batch 4.3.3

Add mapping between different header names and target object fields

If the fields of the target object do not have the same names as the headers in the Excel file, org.springframework.beans.NotWritablePropertyException is thrown.

BigQuery Item Reader

Find out what is available and give a brief overview of how to keep BigQuery read operations cost-efficient.

JDBC / ODBC Driver - https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers
BigQuery Java library - https://cloud.google.com/bigquery/docs/running-queries

add project into maven repository

Dear,

Is it possible to add this project to the Maven repository? https://mvnrepository.com/search?q=spring-batch-excel&p=2

Thanks,

Regards,

Brandon

Investigate possibility to support Parquet

https://hadoop.apache.org/
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
https://cloud.google.com/bigquery/docs/exporting-data#parquet_export_details

Add ability to intentionally skip empty rows

I have a use case where I would like AbstractExcelItemReader to ignore a number of empty lines.

Not able to open excel files larger than 10MB

When trying to read an Excel file larger than 10MB, an error occurs:

"Unexpected error Tried to allocate an array of length 162,386,364, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()"

Add module for Neo4j

This issue is to move the current item reader/writer for Neo4j in Spring Batch to this extension repository.

spring batch unknown table unknown-table-batch-job-seq-in-field-list

The problem is described in this stackoverflow thread, can anyone give a suggestion as soon as possible? thx!

Investigate possibility to support multiple types of writers

Load Job - Implemented
Write API - TO BE REVIEWED - https://cloud.google.com/bigquery/docs/write-api
Streaming - TO BE REVIEWED - https://cloud.google.com/bigquery/streaming-data-into-bigquery
JDBC / ODBC Driver - TO BE REVIEWED - https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers

Date format when reading

In my file, dates are in DD/MM/YYYY format. When Spring/POI reads the data, org.apache.poi.ss.usermodel.DataFormatter is used, and in its performDateFormatting method the dateFormat parameter has the pattern M/d/yy.

Is there a way to force the date pattern when reading?

My RowMapper configuration is

<bean id="caricaAnagraficheReader" class="org.springframework.batch.extensions.excel.poi.PoiItemReader" scope="step">
    <property name="resource" value="file:#{batchParameters.genericBatchParameters.allegatoNomeCompleto}" />
    <property name="linesToSkip" value="1" />
    <property name="rowMapper">
        <bean class="it.blue.batch.portali.components.CaricaAnagraficheRowMapper" />
    </property>
</bean>
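If the pattern cannot be changed on the reader side, one possible workaround is to re-parse the formatted text inside the RowMapper. The following is a minimal sketch, assuming the cell value arrives as a String in the M/d/yy pattern described above; DateCells and toDayMonthYear are hypothetical names and not part of spring-batch-excel.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration; not part of spring-batch-excel.
final class DateCells {

    // Pattern in which the value is assumed to arrive (as reported in the issue).
    private static final DateTimeFormatter AS_READ = DateTimeFormatter.ofPattern("M/d/yy", Locale.ROOT);

    // Pattern the file author actually wants.
    private static final DateTimeFormatter WANTED = DateTimeFormatter.ofPattern("dd/MM/yyyy", Locale.ROOT);

    private DateCells() {
    }

    // Converts e.g. "5/29/19" to "29/05/2019"; two-digit years are read as 2000 to 2099.
    static String toDayMonthYear(String cellText) {
        LocalDate date = LocalDate.parse(cellText.trim(), AS_READ);
        return date.format(WANTED);
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("5/29/19")); // prints 29/05/2019
    }
}
```

Note that a two-digit year loses the century, so this only works for dates between 2000 and 2099; reading the underlying date value from the cell would avoid that limitation.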

Support parsing rows to other data types than String[]

Columns in Excel files being batch processed may contain numbers or dates. Currently all rows are parsed into String[] (see Sheet line 47). spring-batch-excel interfaces should use generics to support other representations of a row, such as Object[].

Sheet index (1) is out of range (0..0)

Hey. I'm having this error using your extension library.

The first time I execute the job, it works just fine, but the second time I execute it, it gives me that error.

@Bean
public PoiItemReader<ActivosExcel> excelReader() throws MalformedURLException {
    PoiItemReader<ActivosExcel> reader = new PoiItemReader<>();
    reader.setSaveState(false);
    reader.setLinesToSkip(1);
    reader.setResource(new UrlResource("file:\\file.xlsx"));
    reader.setRowMapper(excelRowMapper());
    return reader;
}

that is my Reader there... the complete error is:

Caused by: java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.validateSheetIndex(XSSFWorkbook.java:1527) ~[poi-ooxml-3.16.jar:3.16]
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:1134) ~[poi-ooxml-3.16.jar:3.16]
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:121) ~[poi-ooxml-3.16.jar:3.16]
	at org.springframework.batch.item.excel.poi.PoiItemReader.getSheet(PoiItemReader.java:47) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.excel.AbstractExcelItemReader.openSheet(AbstractExcelItemReader.java:120) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.excel.AbstractExcelItemReader.doOpen(AbstractExcelItemReader.java:112) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.open(AbstractItemCountingItemStreamItemReader.java:144) ~[spring-batch-infrastructure-3.0.8.RELEASE.jar:3.0.8.RELEASE]
	... 79 common frames omitted

Can you please help me? Or should I write a custom reader with POI? If so, can you give me an example of how to do it?

Thank you.

UPDATE: The first time I use it, it works (my .xlsx file has only 1 sheet), but the second time it doesn't, because the reader looks for a second sheet and doesn't find one, so it throws that error.

I did a little test: I created another sheet in the file and it worked, but the problem in the code remains.

Where is the index being incremented? It should always be 0!
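One plausible explanation, consistent with the stack trace but not confirmed here, is that the reader is a singleton bean whose sheet index keeps its value between executions. The following is a simplified sketch of that effect in plain Java. SheetCursorDemo is a hypothetical stand-in, not the actual AbstractExcelItemReader, and the resetOnClose flag is an assumption used only to contrast the two behaviours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a multi-sheet reader; not the actual AbstractExcelItemReader.
final class SheetCursorDemo {

    private final List<List<String>> sheets;
    private final boolean resetOnClose;
    private int currentSheet = 0;

    SheetCursorDemo(List<List<String>> sheets, boolean resetOnClose) {
        this.sheets = sheets;
        this.resetOnClose = resetOnClose;
    }

    // Reads all rows from the current sheet onwards, advancing the sheet index.
    List<String> readAll() {
        if (currentSheet >= sheets.size()) {
            throw new IllegalArgumentException("Sheet index (" + currentSheet
                    + ") is out of range (0.." + (sheets.size() - 1) + ")");
        }
        List<String> rows = new ArrayList<>();
        while (currentSheet < sheets.size()) {
            rows.addAll(sheets.get(currentSheet));
            currentSheet++;
        }
        return rows;
    }

    // With resetOnClose, the next execution starts again at sheet 0.
    void close() {
        if (resetOnClose) {
            currentSheet = 0;
        }
    }

    public static void main(String[] args) {
        List<List<String>> workbook = Arrays.asList(Arrays.asList("row 1", "row 2"));
        SheetCursorDemo reader = new SheetCursorDemo(workbook, true);
        System.out.println(reader.readAll());
        reader.close();
        System.out.println(reader.readAll()); // second execution on the same instance
    }
}
```

With resetOnClose set to false, the second readAll() on the same instance fails with the same "Sheet index (1) is out of range (0..0)" message for a one-sheet workbook. It would also explain why adding a second sheet to the file made one more run succeed.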

DefaultRowset method getColumnValue is Missing

The method is missing or has been removed from spring-batch-excel. Could you explain why it was removed and what the alternative is?

Upgrade checkstyle plugin to 3.2.0

Upgrade the following

maven-checkstyle-plugin -> 3.2.0
Checkstyle -> 10.5.0
Spring Java Format -> 0.0.35

ElasticsearchItemReader keeps grabbing data indefinitely because of the implementation of the doPageRead() method.

The doPageRead() method of ElasticsearchItemReader stops only when the ES query returns null.

So normally, if there are 100 items to retrieve with a page size of 50, doPageRead() is called three times: the first call retrieves 50 items, the second retrieves the other 50, and on the third the query returns null, so doPageRead() stops.

Here, however, the query keeps retrieving the first 50 items indefinitely, even though the SearchQuery is paginated.

I will find a solution, then share it here.
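The expected paging behaviour can be sketched without Elasticsearch. In this minimal example an in-memory list plays the role of the index; PagedReadDemo and its fields are hypothetical names and not the actual ElasticsearchItemReader. The key assumption is that the page offset must advance on every call, otherwise the first page is returned on each call and the loop never ends.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for a paged reader; the list plays the role of the Elasticsearch index.
final class PagedReadDemo {

    private final List<Integer> source;
    private final int pageSize;
    private int page = 0;
    int pageReads = 0; // number of doPageRead() calls, exposed for inspection

    PagedReadDemo(List<Integer> source, int pageSize) {
        this.source = source;
        this.pageSize = pageSize;
    }

    // Returns the next page, or null once the offset has moved past the data.
    Iterator<Integer> doPageRead() {
        pageReads++;
        int from = page * pageSize;
        if (from >= source.size()) {
            return null;
        }
        int to = Math.min(from + pageSize, source.size());
        page++; // without this increment the first page would be returned forever
        return source.subList(from, to).iterator();
    }

    // Drains the reader page by page until doPageRead() signals the end with null.
    List<Integer> readAll() {
        List<Integer> items = new ArrayList<>();
        Iterator<Integer> pageItems;
        while ((pageItems = doPageRead()) != null) {
            pageItems.forEachRemaining(items::add);
        }
        return items;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        PagedReadDemo reader = new PagedReadDemo(data, 50);
        System.out.println(reader.readAll().size() + " items in " + reader.pageReads + " page reads");
    }
}
```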

Remove JXL support

JXL support should be removed, since JExcelAPI is an abandoned project (no release since 2009, with serious bugs remaining).

This way we could simplify the API and remove some of the abstraction and make it dedicated for POI. We could then also consider creating an ItemWriter for writing out excel files instead of only reading.

Update Apache POI

Apache POI is currently at version 3.15; we should support that version.

reading empty rows

The issue is that CustomMapper reads empty rows. Even after the rows are deleted in Excel, it still reads them.

Rename master branch to main

For consistency with other projects from the portfolio, the master branch should be renamed to main.

Rename master branch to main
Update CI build descriptors
Update Contributor Guidelines
Check open PRs

Resources:

https://github.com/github/renaming

AbstractExcelItemReader ignores a number of empty rows in between filled rows. It would help if this behavior were made configurable, since it causes incorrect metadata to be reported for the row number.

AbstractExcelItemReader ignores a number of empty rows in between filled rows. It would help if this behavior were made configurable, since it causes incorrect metadata to be reported for the row number.
Issue Description:
Suppose the first 5 rows contain data, the 6th and 7th rows are empty, and the 8th row has data (screenshot attached for reference). The reader would return the 8th row as the 6th, which may be a problem if the exact row number of a cell has to be determined.
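The scenario can be illustrated with plain Java. The sketch below skips empty rows but keeps each row's original 1-based position, so the 8th row is still reported as row 8. NumberedRows and NumberedRow are hypothetical names, and representing a row as String[] is an assumption for the example, not the reader's internal model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of spring-batch-excel.
final class NumberedRows {

    // Pairs a row's cells with its 1-based position in the sheet.
    static final class NumberedRow {
        final int rowNumber;
        final String[] cells;

        NumberedRow(int rowNumber, String[] cells) {
            this.rowNumber = rowNumber;
            this.cells = cells;
        }
    }

    private NumberedRows() {
    }

    // A row counts as empty when it is null or every cell is null or blank.
    static boolean isEmpty(String[] row) {
        if (row == null) {
            return true;
        }
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Skips empty rows while preserving the sheet position of the remaining ones.
    static List<NumberedRow> nonEmptyRows(List<String[]> sheet) {
        List<NumberedRow> result = new ArrayList<>();
        for (int i = 0; i < sheet.size(); i++) {
            String[] row = sheet.get(i);
            if (!isEmpty(row)) {
                result.add(new NumberedRow(i + 1, row));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> sheet = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            sheet.add(i == 6 || i == 7 ? new String[] {"", null} : new String[] {"data " + i});
        }
        for (NumberedRow row : nonEmptyRows(sheet)) {
            System.out.println(row.rowNumber + ": " + row.cells[0]);
        }
    }
}
```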

Issue with spring-batch-excel using Resource which might not have getFile() implemented and does not throw a FileNotFoundException exception

The following code is used to read the excel sheets:
StreamingXlsxItemReader.java:

    protected void openExcelFile(Resource resource, String password) throws Exception {
        try {
            File file = resource.getFile();
            this.pkg = OPCPackage.open(file, PackageAccess.READ);
        } catch (FileNotFoundException var4) {
            this.inputStream = resource.getInputStream();
            this.pkg = OPCPackage.open(this.inputStream);
        }

        XSSFReader reader = new XSSFReader(this.pkg);
        this.initSheets(reader, this.pkg);
    }

PoiItemReader.java:

    protected void openExcelFile(Resource resource, String password) throws Exception {
        try {
            File file = resource.getFile();
            this.workbook = WorkbookFactory.create(file, password, false);
        } catch (FileNotFoundException var4) {
            this.inputStream = resource.getInputStream();
            this.workbook = WorkbookFactory.create(this.inputStream, password);
        }

        this.workbook.setMissingCellPolicy(MissingCellPolicy.CREATE_NULL_AS_BLANK);
    }

It's nice that there is a fallback that attempts to use resource.getInputStream(), but I ran into a problem with this spring-cloud project, which uses a GoogleStorageResource. The exception it throws is UnsupportedOperationException, which isn't handled by the code above. Please see here:
https://github.com/spring-attic/spring-cloud-gcp/blob/main/spring-cloud-gcp-storage/src/main/java/org/springframework/cloud/gcp/storage/GoogleStorageResource.java#L244

To fix this, I am wondering whether it makes sense to check whether the Resource is a file and, if so, call getFile(), otherwise attempt to use getInputStream(). It would look like this:

    try {
        if (resource.isFile()) {
            File file = resource.getFile();
            this.pkg = OPCPackage.open(file, PackageAccess.READ);
        } else {
            this.inputStream = resource.getInputStream();
            this.pkg = OPCPackage.open(this.inputStream);
        }
    } catch (Exception ex) {
        throw new IllegalArgumentException("Unable to read data from resource", ex);
    }

    XSSFReader reader = new XSSFReader(this.pkg);
    this.initSheets(reader, this.pkg);

Lines To skip not working properly

With the POI item reader, if the initial rows are null and linesToSkip is set, the reader skips that many lines. But once it finds a row with non-null values, it still takes the number of columns from row 0. That should not be the case; it should use the column count of that row.

// PoiSheet
@Override
public int getNumberOfColumns() {
    if (numberOfColumns < 0) {
        numberOfColumns = this.delegate.getRow(0).getLastCellNum();
    }
    return numberOfColumns;
}

maven group is incorrect

springframewor instead of springframework

spring-batch-extensions/spring-batch-elasticsearch/pom.xml

Line 5 in 6e6f2dc

<groupId>org.springframewor.batch</groupId>

When extracting data from the RowSet (rs) year in date cell is shortened

sample code:

var result = rs.getCurrentRow();
var actualDate = result[5]; // returns "5/29/19": the year has been shortened from "2019" to "19"
var expected = "5/29/2019";
assertEquals(expected, actualDate);

Add support for resetting the currentSheet index to 0 upon doClose()

Object (POJO) is getting null

I'm using Spring Batch Excel Extension to read Excel (.xlx) file. I cloned the source and did mvn install and added the dependency to my Spring Boot project. I also added Apache poi-ooxml.

My Excel file has simple data:

Id  Last Name   First Name
3   Aguinaldo   Emilio
4   Aquino      Melchora
5   Dagohoy     Francisco
6   Luna        Antonio
7   Jacinto     Emilio

This is my Student class:

@Entity
public class Student {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;
    @NotBlank(message = "{NotBlank.student.lastName}")
    private String lastName;
    @NotBlank(message = "{NotBlank.student.firstName}")
    private String firstName;
    private LocalDateTime createdAt;

    // getters, setters
}

I created a utility class whose method does the actual reading of the Excel file:

public class ExcelUtils {
    public static <T> ItemReader<T> excelToItemReader(Path file, Class<T> clazz) throws Exception {
        PoiItemReader<T> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1);
        System.out.println("File Name: " + file.toString()); // Displays: File Name: uploads/excel/<Excel file selected to import>
        Resource resource = new FileSystemResource(file);
        System.out.println("File exists? " + resource.exists()); // Displays: File exists? true
        reader.setResource(resource);
        reader.setRowMapper(excelRowMapper(clazz));
        return reader;
    }

    private static <T> RowMapper<T> excelRowMapper(Class<T> clazz) {
        BeanWrapperRowMapper<T> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(clazz);
        return rowMapper;
    }
}

After uploading the files, I would select a file to import its data to my database:

@PostMapping("/import")
public String importStudents(@RequestParam String fileName, RedirectAttributes redirectAttributes) throws Exception {
    ItemReader<Student> studentItemReader = ExcelUtils.excelToItemReader(storageService.load(fileName), Student.class);
    Student student = studentItemReader.read();
    if (student != null) {
        System.out.println("Student has data.");
        studentService.save(student);
    } else {
        System.out.println("Student is null");
        throw new Exception("Student is null");
    }

    redirectAttributes.addFlashAttribute("message", "You successfully imported students data from " + fileName + "!");

    return "redirect:/students";
}

I don't understand why student is null when no error is logged in the console at all.

skip a column

Hey,
is there any method to skip the first column of an excel file while reading it in the batch ?
thank you

Parsing #NA fields within excel sheet

At the moment the switch statement that deals with the various cell types does not support invalid fields. This is an issue with data sets that are generated incorrectly, because the whole sheet cannot be parsed if a single field is invalid. At the moment it returns "Cannot handle cells of type 5" for these fields. Could a case be added to the switch to support this type?

This is the field type I'm referring to.

Upgrade to Spring Batch 4.3.5

Investigate possibility to repeat failed jobs

https://cloud.google.com/bigquery/docs/managing-jobs#repeating_a_job

Release spring-batch-neo4j version 0.1.0

PoiSheet reads number of columns always from the first row

PoiSheet always reads the number of columns from the first row.

Especially when a RowNumberColumnNameExtractor is defined (with the headerRowNumber attribute set) it would make sense to read the number of columns from the row that has the header.

And maybe the default should be to read the number of columns for the current row that is processed.

@mdeinum What do you think?
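The difference between the two strategies is easy to see on a ragged sheet. The following is a minimal sketch, assuming rows are modelled as a String[][] in which missing rows are null; ColumnCounts and numberOfColumns are hypothetical names, not the PoiSheet API.

```java
// Hypothetical helper for illustration; not the PoiSheet implementation.
final class ColumnCounts {

    private ColumnCounts() {
    }

    // Returns the width of the given row, or 0 when that row is missing.
    static int numberOfColumns(String[][] rows, int rowNumber) {
        if (rowNumber < 0 || rowNumber >= rows.length || rows[rowNumber] == null) {
            return 0;
        }
        return rows[rowNumber].length;
    }

    public static void main(String[] args) {
        String[][] rows = {
            null,                                  // row 0: missing
            {},                                    // row 1: empty
            {"Id", "Last Name", "First Name"},     // row 2: header
            {"3", "Aguinaldo", "Emilio"}           // row 3: data
        };
        int headerRowNumber = 2;
        System.out.println("from first row:  " + numberOfColumns(rows, 0));
        System.out.println("from header row: " + numberOfColumns(rows, headerRowNumber));
    }
}
```

Reading the count from the header row gives 3 here, whereas reading it from the first row gives 0 (and in a real sheet a missing first row could just as well cause a NullPointerException).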

Upgrade surefire/failsafe to version 3.0.0-M7

Is project abandoned?

Is this project dead?
I see a lot of useful pull requests that have been ignored for several years.
How can we push this forward?

Update Spring Batch

Update Spring Batch to most recent version.

Upgrade maven-compiler-plugin to 3.10.1

Release spring-batch-bigquery version 0.1.0

@dgray16 Please add a comment here when the module is ready for a release.

Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.

Support DataFormatter in spring-batch-excel POI implementation

DataFormatter enables the POI implementation in spring-batch-excel to read cell values as they appear in Excel (rather than returning the value with the type that Excel used internally).

I would like to add this as an option to the PoiItemReader - so the user can choose to retrieve all values as Strings and just in the way they appear in Excel.

The reason is that I am having numbers that I want to be read as strings. But this is currently not possible.

Need of reading one particular sheet.

Since the xlsx format supports storing multiple tabs named differently and with different columns there is a need for supporting such files. It could be done by giving the user an ability to specify which sheet to read from by adding the Id or Name.

Exception when running in Async

Hi,
When I am running the job (PoiItemReader) using Simple Async Executor, I am getting the following exception -
Exception parsing Excel file (Because of null rows)
Whereas if I run the job normally (without SimpleAsyncTaskExecutor), I do not get any exceptions.

What could be the issue? Can someone help me out here?

Use reader current row count from failed execution to get correct row on restart

I noticed that when restarting jobs after a failure, the AbstractExcelItemReader would begin reading from the first row in the spreadsheet. The doRead method simply calls rowSet.next() and ignores the AbstractItemCountingItemStreamItemReader's attempt to jump to the correct row. A solution that I found to work was to override the jumpToItem method in AbstractExcelItemReader and simply call rowSet.next() until we reach the correct row.
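The proposed fix can be sketched in isolation. In this minimal example an Iterator stands in for the row set; RestartableRowReader is a hypothetical class that borrows the method name jumpToItem from the description above and is not the library implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for a row-based reader; not the actual AbstractExcelItemReader.
final class RestartableRowReader {

    private final Iterator<String> rowSet;

    RestartableRowReader(List<String> rows) {
        this.rowSet = rows.iterator();
    }

    // Advances past the rows already processed by the failed execution.
    void jumpToItem(int itemIndex) {
        for (int i = 0; i < itemIndex && rowSet.hasNext(); i++) {
            rowSet.next();
        }
    }

    // Returns the next row, or null when the sheet is exhausted.
    String read() {
        return rowSet.hasNext() ? rowSet.next() : null;
    }

    public static void main(String[] args) {
        RestartableRowReader reader =
                new RestartableRowReader(Arrays.asList("row 1", "row 2", "row 3", "row 4"));
        reader.jumpToItem(2);              // two rows were read before the failure
        System.out.println(reader.read()); // prints row 3
    }
}
```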

startStatement() should not be required in Neo4jItemReader

Bug description
In Neo4jItemReaderBuilder, startStatement(String startStatement) is required, but Neo4j itself has deprecated the START clause and throws an error when it is used. If it is not set, the application throws a BeanCreationException with the message java.lang.IllegalArgumentException: startStatement is required.

Environment
Spring Boot: 2.7.0
Kotlin: 1.6.10
Neo4j: 4.4.4

Steps to reproduce

@Bean
fun postReader(): ItemReader<Post> {
    return Neo4jItemReaderBuilder<Post>()
        .name("postReader")
        .sessionFactory(getSessionFactory())
        .startStatement("")
        .matchStatement("(p:Post)")
        .returnStatement("p")
        .targetType(Post::class.java)
        .pageSize(1000)
        .build()
}

Expected behavior
startStatement() should be optional, not mandatory.
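What "optional" could mean for query assembly is sketched below in Java. This is a hypothetical illustration, not the Neo4jItemReader's actual query generation; CypherQuerySketch and build are invented names, and paging clauses are left out for brevity.

```java
// Hypothetical query assembly for illustration; not the Neo4jItemReader implementation.
final class CypherQuerySketch {

    private CypherQuerySketch() {
    }

    // Emits the START clause only when a non-blank start statement is supplied.
    static String build(String startStatement, String matchStatement, String returnStatement) {
        StringBuilder query = new StringBuilder();
        if (startStatement != null && !startStatement.trim().isEmpty()) {
            query.append("START ").append(startStatement.trim()).append(' ');
        }
        query.append("MATCH ").append(matchStatement);
        query.append(" RETURN ").append(returnStatement);
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(null, "(p:Post)", "p")); // prints MATCH (p:Post) RETURN p
    }
}
```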
