Comments (9)
@smclay thanks for reporting this.
In the email exchanged we had about this I mentioned that simply we should check for the presence of the subscriber ID on the database and act differently depending on that result (i.e., succeed?). But now that I think about it a bit more, we should probably have the full picture before deciding how to proceed with this one.
The deeper problem I remember you are having is that your logfiles are getting flooded with these messages. This means there is a counterpart to this server that is trying to create this subscription without ever giving up. That counterpart is probably this:
ngas/src/ngamsServer/ngamsServer/ngamsSrvUtils.py
Lines 138 to 141 in 8fa1c39
This check sits inside a loop that won't finish until all subscriptions are successfully made.
Subscription creation is currently successful only if a new subscription is effectively added to the database. The question is then what to do if a subscription with the same ID comes in. Here are some options:
- Assume that because the ID is the same as an existing subscription, the subscription parameters must be the same, and simply return successfully (200).
- Like above, but double-check that the subscription parameters are exactly the same. If they are, declare a success, if they are not declare an error (400).
- Like above, but return a specific success code different from 200 when parameters are the same.
- If a subscription ID is already taken just fail, but with a different error (some 4xx specific code).
I don't find options 1 and 2 too attractive because they declare a success even when a new subscription was not created, which I see as a bit of a contradiction in the semantics of the command; option 1 in particular is too simplistic, as it assumes too much. And if we go for option 3 or 4 we'd need to change the code I quoted above to take into account the new return code and adjust itself.
Thoughts? There are probably other alternatives I haven't thought of.
from ngas.
@rtobar I tried to look at the subscriber side logs but unfortunately they have already been deleted.
I am unsure but I believe when the subscriber shuts down it sends an UNSUBSCRIBE command to the subscription publisher. Is that correct? Therefore in the event of a crash or messy shutdown the subscription entry will probably not be removed. On restarting the subscriber it will send a new SUBSCRIBE command which should result in an identical request to the entry already stored in the database. I think this is probably the most likely scenario for duplicate requests which should be quite harmless.
Therefore I think I agree that options 3 and 4 are better solutions. The subscriber can log a warning when option 3 occurs. In the event of option 4, I would suggest a notification email is sent to alert the NGAS administrator. If the administrator is expecting data to flow but the subscription is rejected it is probably better to inform them so they can act on it otherwise they may not notice and the problem could go undetected for hours or days.
from ngas.
@smclay I've pushed a few changes to the issue-50
branch. I've implemented both the changes in the SUBSCRIBE
command (returning different HTTP codes when a subscription already exists) and in the automatic-subscription-on-server-startup logic that should look into these HTTP codes and react accordingly. In particular, an error is logged, an email is sent, and no further attempts are done to create the faulty subscription.
Could you give these changes a go? I've added unit tests for the changes in SUBSCRIBE
, but haven't yet tried automatically testing the rest (it will require some more effort), so if you could try to reproduce the situation and check that things are working it would be great.
from ngas.
@rtobar thanks for the update. I will build new packages and deploy over the weekend. Hopefully I will have some test results next week.
from ngas.
@rtobar I have carried out tests. For the most part it is working well. If the subscription already exists I get the following log messages from the subscriber host...
2021-05-31T12:05:49.721 [ 7901] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:151 Different subscription with ID '%s' already exists, giving up
2021-05-31T12:05:49.723 [ 7901] [SUBSCRIBER] [ INFO] ngamsLib.ngamsNotification#_sendNotifMsg:121 Sending Notification Message to: [email protected]. Subject: aat-ngas-6:8001: Automatic subscription cannot be created
2021-05-31T12:05:49.760 [ 7901] [SUBSCRIBER] [ INFO] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:164 Successfully subscribed to aat-ngas-5:8001 with url=http://aat-ngas-6.hq.eso.org:8001/QARCHIVE
2021-05-31T12:05:49.761 [ 7901] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:171 Error while adding subscription, will try later
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ngamsServer/ngamsSrvUtils.py", line 168, in _create_remote_subscriptions
subscriptions.remove(subscrObj)
ValueError: list.remove(x): x not in list
There appears to be two bugs. The log message "151 Different subscription with ID '%s' already exists, giving up" has not been formatted properly and there is a ValueError
exception that could probably be caught and handled.
On the publisher side we get the following log messages...
2021-05-31T12:05:49.656 [11060] [ R-0] [ INFO] ngamsServer.ngamsServer#handleHttpRequest:1806 Handling HTTP request: client_address=('134.171.18.32', 57492) - method=GET - path=|SUBSCRIBE?subscr_id=aat-ngas-6%3A8001&priority=1&url=http%3A%2F%2Faat-ngas-6.hq.eso.org%3A8001%2FQARCHIVE&start_date=2021-05-31T12%3A05%3A49.646|
2021-05-31T12:05:49.658 [11060] [ R-0] [ INFO] ngamsServer.ngamsCmdHandling#_get_module:81 Received command: SUBSCRIBE
2021-05-31T12:05:49.662 [11060] [ R-0] [ INFO] ngamsServer.commands.subscribe#handleCmd:117 Creating subscription for files >= 2021-05-31T12:05:49.646
2021-05-31T12:05:49.701 [11060] [ R-0] [ INFO] ngamsServer.ngamsServer#send_status:350 Returning status FAILURE with message Different subscription with ID 'aat-ngas-6:8001' existed and HTTP code 409
2021-05-31T12:05:49.705 [11060] [ R-0] [ INFO] ngamsServer.ngamsServer#send_data:320 Sending 369 bytes of data of type text/xml and headers {}
2021-05-31T12:05:49.705 [11060] [ R-0] [ INFO] ngamsServer.ngamsServer#handleHttpRequest:1833 Total time for handling request: (GET, SUBSCRIBE ,, ): 0.050 [s]
I also receive an email from the subscriber...
Notification Message:
NGAS attempted to create an automatic subscription with ID=aat-ngas-6:8003 to obtain data from aat-ngas-5:8003, but the remote server already has a subscription registered with the same ID, but different details.
Instead of retrying to create this subscription over and over, this server will give up now. To fit this either remove the remote subscription, or change the ID of the subscription to be created in the local server configuration.
Note: This is an automatically generated message
There is a typo 'fit' -> 'fix' in the email message.
from ngas.
@rtobar I noticed that shutting down the subscriber removes the subscriptions from the ngas_subscribers
table. This happens also when the subscriber ID was already taken. I think this bahviour is probably the best solution.
from ngas.
@smclay thanks for the detailed report. I've addressed all the points you mentioned (typo in email, unformatted log statement, ValueError
issue) while maintaining the behavior at shutdown. If you pull the issue-50
branch you should see the new commit with the fixes. Let me know if things are correctly for you. Hopefully this time I got things right, and if so I'll merge back to the master
branch.
from ngas.
@rtobar I tested your latest changes today. Everything looks good...
2021-06-07T13:12:31.566 [31652] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:158 Different subscription with ID 'aat-ngas-6:8001' already exists, giving up
2021-06-07T13:12:31.568 [31652] [SUBSCRIBER] [ INFO] ngamsLib.ngamsNotification#_sendNotifMsg:121 Sending Notification Message to: [email protected]. Subject: aat-ngas-6:8001: Automatic subscription cannot be created
...
2021-06-07T13:12:41.601 [31652] [SUBSCRIBER] [ INFO] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:184 No Subscriptions established
I think we can now close this issue. Thank you for the improvements.
from ngas.
Thanks @smclay for double-checking the new changes, I've just merged the new issue-50
branch into master
.
from ngas.
Related Issues (20)
- XML response contains invalid link to NgamsStatus DTD HOT 7
- ngamsServer and ngamsDaemon use different PID files HOT 1
- NGAMS python client --server option bug HOT 1
- Checksum calculation mismatch failure on python 3 HOT 3
- Improve path computation fixed in #80
- Deprecated function calls in src/ngamsCore/ngamsLib HOT 4
- ngamsDapiStatus attribute __ioTime: Integer vs. Float HOT 1
- ngamsCore/ngamsLib/logutils.py inconsistent types of local variable HOT 1
- ngamsCore, function isoTime2Secs() mixes Integers and Floats HOT 1
- ngamsUDT/ngamsUDTSender.py not Python3 ready HOT 1
- ngamsUtils/ngamsUtils/ngasUtilsLib.py, function decrypt_access_code() is broken HOT 2
- There is no six.input() function HOT 1
- base64 decoding of DB password is incorrect Python3 code HOT 1
- Insufficient error checks in get_db_parameters() function HOT 2
- more Python3-related changes to ngamsUtils/ngamsUtils/ngasXSyncTool.py HOT 1
- Python3-related fixes to ngamsUtils/ngamsUtils/ngasVerifyCloning.py HOT 1
- Python3-related issues in ngamsUtils/ngamsUtils/ngasCheckFileCopies.py HOT 2
- Defunct (Janitor) process after OFFLINE,ONLINE (or INIT) HOT 2
- QUERY/files command does not support UseFileIgnore=false HOT 4
- Backward compatible checksum variant HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ngas.