Comments (6)
Another question...
If one file is huge, e.g. millions of lines, and I'm importing it using a batch size of, let's say, 1000, what happens if the connector dies before the entire file has been processed?
It seems that when I restart the connector, it will read the entire file again, starting from the beginning, without skipping the lines that were already sent to Kafka.
from kafka-connect-spooldir.
You're on it. It's been missing offset management for a while. I'm working on a few changes and refactors to cover this and some better schema handling. A new release will be coming soon.
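For anyone following along, here is a rough sketch of the idea behind resuming from a recorded offset. It is not the connector's actual code: `ResumableLineReader` and its methods are hypothetical names, and it uses only the Java standard library. In a real Kafka Connect source task, the line count would normally travel in the source offset map attached to each `SourceRecord`, and on restart the task would look up the last committed value through `context.offsetStorageReader()`. Here the offset is simply passed to the constructor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of kafka-connect-spooldir.
final class ResumableLineReader implements AutoCloseable {
    private final BufferedReader reader;
    private long linesRead; // lines consumed so far, including skipped ones

    // committedLines: number of lines already delivered before the restart
    // (assumed to come from wherever the offset was stored).
    ResumableLineReader(Path file, long committedLines) throws IOException {
        this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        // Skip the lines that were already sent; stop early if the file is shorter.
        while (linesRead < committedLines && reader.readLine() != null) {
            linesRead++;
        }
    }

    // Returns up to batchSize lines; an empty list means end of file.
    List<String> nextBatch(int batchSize) throws IOException {
        List<String> batch = new ArrayList<>();
        String line;
        while (batch.size() < batchSize && (line = reader.readLine()) != null) {
            batch.add(line);
            linesRead++;
        }
        return batch;
    }

    // The value to record as the offset after the most recent batch.
    long offset() {
        return linesRead;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

With something like this, a restart after two delivered lines would construct `new ResumableLineReader(file, 2)` and continue from the third line instead of re-sending the whole file. Skipping by line count still rereads the skipped part of the file; storing a byte position would avoid that, at the cost of more bookkeeping.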
Perfect, thanks. I'm working on it as well... I managed to implement the offset management, but I'm struggling to implement the cleanup of the folder in a nice way.
Any updates on this subject? I am investigating ways to import from an application that continually appends data to a CSV file, and your connector seems like a perfect match (the application lets go of the file when it is renamed, so the automatic renaming to a .PROCESSING file is ideal). However, the inability to resume from the recorded offset is a showstopper for us.
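To illustrate the renaming step mentioned above, a minimal sketch of claiming a file by renaming it with a `.PROCESSING` suffix might look like the following. `ProcessingMarker` and `claim` are hypothetical names, and this is an assumption about the general technique, not the connector's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of kafka-connect-spooldir.
final class ProcessingMarker {
    // Renames "data.csv" to "data.csv.PROCESSING" in the same directory and
    // returns the new path. ATOMIC_MOVE asks the file system to perform the
    // rename as a single step; it throws AtomicMoveNotSupportedException if
    // the file system cannot do that.
    static Path claim(Path input) throws IOException {
        Path claimed = input.resolveSibling(input.getFileName() + ".PROCESSING");
        return Files.move(input, claimed, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Because the rename stays within one directory, it is usually a cheap metadata operation, so the writing application sees the original name disappear rather than a half-copied file.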
@gbehrmann Sorry, I missed your comment. I've had this pull request out there but haven't received much feedback: #17
This was fixed with #17.