The objective was to save admins and technical personnel the time spent manually checking RSS feed aggregators. This program is not meant to cover every use case; rather, it serves as a starting point with a fundamental approach to automating the process.
See the script automation with cron in action:
Here is an example RSS feed that is periodically updated with firmware releases.
This serves as an example; the RSS XML file can come from any provider, as long as it follows the RSS/Atom namespace format.
What is RSS? RSS (RDF (Resource Description Framework) Site Summary, or Really Simple Syndication) is, in simple terms, a dialect of XML. All RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) website.
RSS is a web feed that allows users and applications to access updates to websites in a standardized, computer-readable format. Subscribing to RSS feeds allows a user to keep track of many different websites in a single news aggregator, which constantly monitors sites for new content, removing the need for the user to check them manually.
Websites usually use RSS feeds to publish frequently updated information, such as blog entries, news headlines, episodes of audio and video series, or for distributing podcasts. An RSS document (called "feed", "web feed", or "channel") includes full or summarized text, and metadata, like publishing date and author's name. RSS formats are specified using a generic XML file.
These machine-readable metadata feeds are what make automation possible: the information can be processed and handled with greater efficiency and certainty.
To work with RSS feed metadata, we need a bit of understanding of how the RSS XML structure is constructed.
At the top level, an RSS document is a <rss> element with a mandatory attribute called version, which specifies the version of RSS that the document conforms to. If it conforms to this specification, the version attribute must be 2.0.
Subordinate to the <rss> element is a single <channel> element, which contains information about the channel (metadata) and its contents.
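A minimal RSS 2.0 document makes this hierarchy concrete. The sketch below (all values are placeholders) parses one with the standard library's ElementTree; lxml exposes a compatible API:

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 document (placeholder values) showing the
# <rss> -> <channel> -> <item> hierarchy described above.
RSS_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Firmware Feed</title>
    <link>https://example.com/firmware</link>
    <description>Firmware release announcements</description>
    <lastBuildDate>Wed, 10 Apr 2024 18:00:00 GMT</lastBuildDate>
    <item>
      <title>ExampleProduct 1.0.0</title>
      <pubDate>Wed, 10 Apr 2024 17:55:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(RSS_SAMPLE)
print(root.tag, root.get("version"))   # rss 2.0
print(root.findtext("channel/title"))  # Example Firmware Feed
print([item.findtext("title") for item in root.findall(".//item")])
```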
XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.
Generally, the common namespaces provided by the W3C are sufficient for common use cases. The namespace topic is beyond the scope of this article; if you are interested in knowing more, see the articles below.
https://www.disobey.com/detergent/2002/extendingrss2/
https://validator.w3.org/feed/docs/howto/declare_namespaces.html
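As a quick illustration (using a hypothetical feed, not one of the providers above), here is how a prefixed namespace such as Atom's is declared on the root element and then queried; the same prefix-to-URI mapping works with lxml's xpath() via its namespaces argument:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed mixing plain RSS elements with the Atom namespace,
# declared via xmlns:atom on the root element.
FEED = b"""<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example Feed</title>
    <atom:link href="https://example.com/rss.xml" rel="self"/>
  </channel>
</rss>"""

root = ET.fromstring(FEED)
# Map a prefix to the namespace URI so queries can use "atom:" paths.
NS = {"atom": "http://www.w3.org/2005/Atom"}
self_link = root.find("channel/atom:link", NS)
print(self_link.get("href"))  # https://example.com/rss.xml
```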
The following Python XML parsing libraries were considered:
- atoma
- lxml
- ElementTree
- minidom
I tried several of the XML parsers listed above and ended up with lxml: it is compatible with the well-known ElementTree API, its documentation covers most common use cases, and it additionally supports the xpath() method.
Here are example code snippets comparing atoma and lxml usage:
```python
import datetime
import re

import atoma
from lxml import etree

# indices of the <item> elements to extract (example values)
mylist = [0, 1, 3]

# parse the same feed with both libraries
afeed = atoma.parse_rss_file('rss_files/firmware.xml')
efdx = etree.parse('rss_files/firmware.xml', etree.XMLParser())
itemtitle = efdx.xpath(".//item")

def lxmlreleaseprod(j):
    myklist = []
    for i in j:
        # serialize the <item> element back to a string
        myset = etree.tostring(itemtitle[i], pretty_print=True).decode()
        # capture the <title> line, then strip the tags
        k_myset = re.findall(r'^<title>.*<\/title>$', myset, re.M)
        pk_myset = re.sub(r'<title>|<\/title>', '', k_myset[0])
        myklist.append(pk_myset)
    return myklist

def atomareleasedprod(k):
    myklist = []
    for i in k:
        myklist.append(afeed.items[i].title)
    return myklist

print(*lxmlreleaseprod(mylist), sep=', ', end=' ')
print('lxml', datetime.datetime.now())
print(*atomareleasedprod(mylist), sep=', ', end=' ')
print('atoma', datetime.datetime.now())
```
Output:
FortiSOAR 7.5.0, FortiAP-W2 7.2.4, FortiPAM 1.3.0 lxml 2024-04-10 18:00:03.717363
FortiSOAR 7.5.0, FortiAP-W2 7.2.4, FortiPAM 1.3.0 atoma 2024-04-10 18:00:03.717363
- pull the latest feed XML into memory
- compare the latest build release date against the local date minus 1 day (GMT-7)
- comparison check: if the latest build release date matches today's date, there is a new release; if not, there are no new releases as of today
- if the check returns False, stop further processing and wait for the next schedule (24 hours later, at 7:59 am GMT+8)
- if the check returns True, proceed to download the latest copy of the XML file
- parse the XML file's metadata content
- perform checks and extract the title metadata
- compile the metadata into JSON format and send it via the webhook URL
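The steps above can be sketched as a single daily run. The function names below are illustrative placeholders (not from the original script) for the real check/extract/notify code shown later, with stub bodies so the control flow is visible:

```python
def new_release_today() -> bool:
    """Steps 1-3: compare the feed's lastBuildDate with today's date."""
    return True  # stub; the real check fetches the feed's header bytes

def extract_release_titles() -> list:
    """Steps 5-7: parse the XML and pull each released product <title>."""
    return ["ExampleProduct 1.0.0"]  # stub

def send_webhook(title: str) -> None:
    """Step 8: compile JSON and POST it to the Teams webhook URL."""
    print(title, "released")

def run_once() -> None:
    # cron triggers this once per day; bail out early when the
    # check returns False (no new release)
    if not new_release_today():
        return
    for title in extract_release_titles():
        send_webhook(title)

run_once()
```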
Load the RSS XML file into memory with niquests, limiting the byte range pulled from the RSS URL to the top-level <rss> and <channel> elements, without the <item> contents.
```python
import datetime
import io
import re

import niquests

# wrap the check in a function (the name here is illustrative)
def new_release_today():
    todaydate = datetime.datetime.now()
    pre_date = todaydate.strftime('%e %b %Y')
    # client-side HTTP Range header: ask the server for only the
    # interesting byte range; adjust based on the <channel> section
    headers = {'Range': 'bytes=1-418'}
    # send the HTTP GET request with the URL and headers
    r = niquests.get('https://support.fortinet.com/rss/firmware.xml', headers=headers)
    # iterate over the in-memory data (418 bytes)
    for line in io.StringIO(r.text):
        # check whether the line contains "lastBuildDate"
        resultss = re.search(r'lastBuildDate', line)
        # re.search returns None when there is no match
        if resultss is not None:
            # remove the newline
            one_line = line.replace("\n", "")
            # check whether lastBuildDate equals today's date
            if pre_date.lstrip() == " ".join(one_line.split(' ')[1:4]):
                return True
            else:
                return False
    # no lastBuildDate found in the fetched range
    return False
```
With the day's releases isolated, perform extraction on each product's <title> metadata and add it to a list for further processing later.
```python
import re

from lxml import etree

def extract_release_info():
    # initiate an empty list for the product items
    k_release_info = []
    # parse the XML file
    fdx = etree.parse('rss_files/firmware.xml', etree.XMLParser())
    # set up the search path for "item" elements
    itemtitle = fdx.xpath(".//item")
    # extract each child group as a string
    for i in get_today_released_firmware():
        myset = etree.tostring(itemtitle[i], pretty_print=True).decode()
        # find the product title for each released product
        k_myset = re.findall(r'^<title>.*<\/title>$', myset, re.M)
        # strip the XML tags from the product title
        pk_myset = re.sub(r'<title>|<\/title>', '', k_myset[0])
        # append each released product item to the prep list
        k_release_info.append(pk_myset)
    # return the captured product items as a list
    return k_release_info
```
XML <item> content
As a basic security measure, the webhook URL is moved into a separate .ini config file, which is then added to the .gitignore file (more info about configparser below).
These steps are meant to encourage you to practice basic security measures and merely demonstrate one way to separate secret info; configparser is just one of the available methods.
Here is a discussion about that topic.
https://stackoverflow.com/questions/25501403/storing-the-secrets-passwords-in-a-separate-file
```ini
[webhook.prod.url]
# forticlient webhook url
fct_ems_webhook_url = https://example.webhook.office.com/webhookb2/<hash-key>/IncomingWebhook/<hash-key>
```
Perform a regex match to capture each respective product name. Then convert the captured data into JSON format: a Python dict is not the same as a JSON string, so the dict must be serialized into JSON (string) type with json.dumps(). (If you need to load JSON back into Python as a dict, use the json module's json.loads() function.) Once the data is prepared as JSON, the text message can be sent via webhook into the respective Teams channel.
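For instance, the dict-to-JSON round trip looks like this (the payload text is illustrative):

```python
import json

# dict -> JSON string: this is what gets POSTed as the webhook body
payload = {'text': 'FortiWeb 7.4.3 released'}
body = json.dumps(payload)
print(body)        # {"text": "FortiWeb 7.4.3 released"}
print(type(body))  # <class 'str'>

# JSON string -> dict, for the reverse direction
restored = json.loads(body)
print(restored['text'])  # FortiWeb 7.4.3 released
```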
JSON strings always support Unicode. The JSON-to-Python type mapping:

| JSON | Python |
|---|---|
| string | str |
| number | int/float |
| object | dict |
| array | list |
| boolean | bool |
| null | None |
One of the challenges encountered during the testing phase was that the connection to the Teams webhook server would sometimes time out or be unreachable, raising exceptions.ConnectTimeout. The workaround was to add a timeout option, giving the request more time before it actually expires.
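A sketch of how that timeout can be handled defensively. The exception class below is a local stand-in for niquests' ConnectTimeout so the snippet runs without the library or a network, and the URL, payload, and timeout value are illustrative:

```python
import json

class ConnectTimeout(Exception):
    """Stand-in for niquests.exceptions.ConnectTimeout."""

def send_webhook(post, url, payload, timeout=150):
    # Pass a generous timeout through to the HTTP client and swallow
    # connect timeouts so one slow webhook doesn't crash the cron run.
    try:
        return post(url, data=json.dumps(payload),
                    headers={"content-type": "application/json"},
                    timeout=timeout)
    except ConnectTimeout:
        return None

# Simulate a webhook server that never answers.
attempts = []
def flaky_post(url, **kwargs):
    attempts.append(url)
    raise ConnectTimeout

result = send_webhook(flaky_post, "https://example.webhook.invalid/hook", {"text": "hi"})
print(result, len(attempts))  # None 1
```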
```python
import configparser
import json
import re

import niquests

config = configparser.ConfigParser()
config.read('config/mysecret.ini')
# set up the respective Teams channel webhook links
# forticlient webhook url
fct_ems_webhook_url = config['webhook.prod.url']['fct_ems_webhook_url']
# fortiweb webhook url
fwb_webhook_url = config['webhook.prod.url']['fwb_webhook_url']
# fortimanager webhook url
fmg_webhook_url = config['webhook.prod.url']['fmg_webhook_url']
# fortisiem webhook url
siem_webhook_url = config['webhook.prod.url']['siem_webhook_url']
# set up the HTTP header for the JSON data type
headers = {"content-type": 'application/json'}
# if a title matches a condition, send the respective webhook message
for prod in extract_release_info():
    # filter to match the respective products
    if re.match(r'FortiADC|FortiWeb|FortiDDoS', prod):
        # serialize the dict into JSON with json.dumps; add a timeout
        # to handle slow server responses
        mydata = {'text': prod + ' released'}
        send_post_webhook = niquests.post(fwb_webhook_url, data=json.dumps(mydata), headers=headers, timeout=150)
```
There has been discussion about whether to use a Python scheduler or the system scheduler. Here are the discussion topics to help you decide which approach suits your use case:
https://stackoverflow.com/questions/31189783/schedule-time-and-date-to-run-script
https://stackoverflow.com/questions/68862565/running-scheduled-task-in-python
For this particular example, I use Linux cron as the scheduler.
Example logging output from webhook_output.log:
no release update - 2024-04-08 09:00:01.412303
no release update - 2024-04-09 09:00:01.936488
FortiAP-U 7.0.3, FortiWeb 7.4.3 released on 2024-04-10 10:39:10.573758
Here is the guide on how to enable a Teams channel webhook.
Clone the repository, then create and activate a virtual environment and install the dependencies:

```shell
git clone https://github.com/scheehan/rss_update_teams_webhook
cd rss_update_teams_webhook
python -m venv .venv
```

Windows:

```shell
.\.venv\Scripts\activate
```

Linux & Unix:

```shell
source .venv/bin/activate
```

```shell
pip install -r requirements.txt
```

Schedule the script with cron:

```shell
crontab -e
```

```shell
59 7 * * * /usr/bin/python /var/automation/rss_update_teams_webhook/rss_tracker_with_webhook.py >> /tmp/webhook_output.log 2>&1
```