Comments (23)
This is probably related to #29816 which was fixed in #30969. Can you verify this on 2.56.0?
from beam.
@je-ik Hi, thanks for the tip, but I just tried upgrading to 2.56 and I am still seeing the same error. I am able to get it to work if I set my parallelism to 2, but no other value works, which poses issues with autoscaling on AWS. I also notice that the Flink UI is showing me this:
No Watermark (Watermarks are only available if EventTime is used)
I'm seeing this on the Watermarks tab for each subtask. Any advice you could give would be very helpful.
from beam.
Can you provide all the command-line flags, which you pass to the runner, please?
from beam.
I am running it through AWS Managed Flink, so it is something of a black box. However, the only pipeline option I am passing, in addition to my application-specific options, is --runner=FlinkRunner.
After reading the linked issue, I was able to get it to work locally by using the beam_fn_api experiment and upgrading to 2.56, but I'm not really sure what that is doing.
I also noticed that this experiment adds a bunch of operators and results in higher backpressure and lower performance, which means it's most likely not a viable solution.
from beam.
Strange. It seems the fix is not working in your case. Can you double-check that you run with 2.56.0 (e.g. that no dependency brings in an older Beam version, shading, etc.)? Other than that, it might help to set the log level to DEBUG and investigate the logs around FlinkSourceSplit and SplitEnumerator.
from beam.
I will take a look to ensure no earlier version is being brought in. I was seeing this log:
May 07, 2024 9:51:04 AM org.apache.beam.runners.flink.translation.wrappers.streaming.io.source.FlinkSourceReaderBase notifyNoMoreSplits INFO: Received NoMoreSplits signal from enumerator.
I also just reran and saw this: INFO: Adding splits [FlinkSourceSplit{splitIndex=0, beamSource=org.apache.beam.sdk.io.aws2.kinesis.KinesisSource@6de059f3, splitState.isNull=true, checkpointMark=null}]
Does this give any indication of the issue?
from beam.
Not really, but it seems you are running the correct 2.56.0 version. The NoMoreSplits signal just indicates that there is indeed no more work. However, that should result in emission of the final watermark and should not hold the watermark. Could you patch your Beam version to add more logs? Ideally where the reader emits/computes the watermark - e.g.
from beam.
I am actually seeing the watermarks work now in the Flink runner on the web UI. I am also seeing the idle tasks from my source reader get finished, which I believe is ideal. However, I am still not getting the logs that occur when my window gets triggered unless beam_fn_api is enabled. Is there something else I need to be doing to get the window to trigger? This works without issue in Dataflow and DirectRunner.
from beam.
I am actually seeing the watermarks work now in the flink runner on the web UI. And also seeing the idle tasks from my source reader get finished which I believe is ideal.
Yes, that is how the fix should work.
However, I am still not getting the logs that occur when my window gets triggered unless beam_fn_api is enabled. Is there something else I need to be doing to get the window to trigger?
Can you try setting autoWatermarkInterval?
from beam.
Where is autoWatermarkInterval set in Beam? Is this a pipeline option, or is it set in the Kinesis reader somewhere?
from beam.
Pipeline option. E.g. --autoWatermarkInterval=100
from beam.
that worked, thank you so much for your help!
from beam.
Hi, I'm facing a similar issue with Beam 2.56.0, Flink 1.16.3 and the Java SDK.
The problematic pipeline has parallelism 4 and has many slowly updating global window side inputs (it uses unbounded GenerateSequence, as in the patterns) that are updated every hour. A watermark is not emitted (I can see "No Watermark (Watermarks are only available if EventTime is used)" in the Flink UI for the tasks that follow) until all source subtasks emit a message. This is a major issue, since it's not feasible to generate impulses more frequently and the pipeline is not able to make any progress.
Another source is Kafka, and the input topics have 3 partitions. I can see that one of the subtasks becomes FINISHED after some time and only 3 subtasks are active. Some of them receive very few messages, and it is normal for some partitions not to receive data for some time; this is especially true in the testing environment.
I've tried setting --autoWatermarkInterval=100, but it did not have any effect. There are no other Beam-specific properties set.
Any ideas how to fix this? Should I downgrade the Beam version to work around the problem?
from beam.
After all side inputs emit their first message then watermarks are emitted correctly?
from beam.
@je-ik thanks for a quick response!
What I meant is that only when all subtasks of the source emit a record (see 'Records sent' in the image below) can I see a watermark on the next operator in the UI. Each side input is independent in this respect.
Below you can see an example of a source that does not emit a watermark, since only 2 subtasks emitted a record.
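
To make the observation above concrete, here is a minimal sketch of how a downstream operator's watermark is commonly derived: as the minimum over all upstream subtasks that are not marked idle. This is a simplified model, not code from Beam or Flink; the class name, the method name and the NO_WATERMARK sentinel are hypothetical and exist only for illustration.

```java
import java.util.OptionalLong;

// Simplified, hypothetical model of watermark combination; not Beam or Flink code.
final class WatermarkModel {
    /** Sentinel meaning "this subtask has not reported any watermark yet". */
    static final long NO_WATERMARK = Long.MIN_VALUE;

    /**
     * Combined watermark seen by a downstream operator: the minimum over all
     * upstream subtasks that are not marked idle. Returns empty if every
     * subtask is idle or if the minimum is still the NO_WATERMARK sentinel.
     */
    static OptionalLong combine(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (idle[i]) {
                continue; // idle subtasks do not hold the watermark back
            }
            anyActive = true;
            min = Math.min(min, watermarks[i]);
        }
        if (!anyActive || min == NO_WATERMARK) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(min);
    }

    public static void main(String[] args) {
        // Four source subtasks; only the first two have emitted records so far.
        long[] wm = {1_000L, 2_000L, NO_WATERMARK, NO_WATERMARK};

        // No subtask is marked idle: the two silent subtasks hold the watermark back.
        System.out.println(combine(wm, new boolean[] {false, false, false, false}));

        // The silent subtasks are marked idle: the watermark can advance to 1000.
        System.out.println(combine(wm, new boolean[] {false, false, true, true}));
    }
}
```

In this model, a subtask that never emits and is never marked idle keeps the combined watermark empty, which matches the symptom of "No Watermark" until every subtask has sent a record.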
from beam.
Understood, this is probably unrelated to the issue reported here, can you please create another one? It would be best if you could provide a simple pipeline that exhibits the behavior you observe.
from beam.
Thanks, I'll create a separate issue.
What do you think about the issue with KafkaIO?
The topic has 3 partitions. As you can see from the screenshot below, it received just one record in one of the partitions. In this case the watermark was not propagated further.
I've tried setting --autoWatermarkInterval=100, but it did not have any effect. There are no other Beam-specific properties set.
from beam.
No, I don't think it is related to KafkaIO, more likely some subtlety related to refactoring of Flink runner sources, see #25525
from beam.
@je-ik Do you have any suggestions for how to work around the issue when there are several idle Kafka partitions/topics in the topology that hold back overall progress? It used to work with Beam 2.45.0 and Flink 1.15.0, but it seems that the behavior has changed since then and it does not work as expected in Beam 2.56.0 and Flink 1.16.3.
from beam.
I don't know if there are any workarounds, as the described behavior seems to be an (unknown) bug. It needs further investigation. Could you please provide a simplified pipeline that is affected by this?
from beam.
I am experiencing what I believe is much the same issue, using FixedWindow when running on the FlinkRunner.
I am also using processing time, and from checking the trace logs I can tell there's something wrong with the watermarks:
WatermarkHold.addHolds: element hold at 2024-05-19T07:52:59.999Z is on time for key:aaa-bbb-ccc; window:[2024-05-19T07:52:00.000Z..2024-05-19T07:53:00.000Z); inputWatermark:-290308-12-21T19:59:05.225Z; outputWatermark:-290308-12-21T19:59:05.225Z
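
For readers interpreting the log line above: the inputWatermark and outputWatermark values appear to be Beam's minimum timestamp, which would mean the watermark has never advanced. The sketch below reproduces that instant with the JDK only, assuming the minimum is Long.MIN_VALUE microseconds truncated to milliseconds (as for BoundedWindow.TIMESTAMP_MIN_VALUE); the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; it does not depend on Beam.
final class MinTimestamp {
    /**
     * Assumption: Beam's minimum timestamp is Long.MIN_VALUE microseconds,
     * truncated to milliseconds.
     */
    static Instant beamMinTimestamp() {
        long millis = TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE);
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        // Compare the printed instant with the watermark values in the log line.
        System.out.println(beamMinTimestamp());
    }
}
```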
from beam.
@yardenbm do you have idle partitions, or do all partitions contain data at all times?
from beam.
I tested various versions of Apache Beam with Apache Flink 1.15.4, and the issue started to appear in 2.52.0. Apache Beam 2.46.0 - 2.51.0 do not have this issue with Flink 1.15.4. Hopefully this helps. Meanwhile, I'll try to create a minimal example that reproduces the problem.
from beam.