RedactKit is a CLI tool to redact and un-redact sensitive data from multiple log files on Government Standard Image Build (GSIB) devices.
What is it about?
GovTech GIG, as the Infrastructure Engineer Capability Center, provides functional leadership to WOG. As part of this initiative, there are tools and processes that Agencies can leverage for their technical operations work. This tool addresses the common issue GIG has faced during the Co-sourcing model with vendors and product principals.
Scenario:
- When we seek support from product principals, there may be instances where we will need to send logs with sensitive internal IP addresses, URLs, email addresses, SOE IDs etc. to them.
- Engineers then need to manually inspect and redact such data, which is time-consuming and error-prone.
- This tool lets engineers automate the process, saving time and reducing operational overhead and errors.
Why use a tool?
- To redact sensitive data like internal IP addresses, emails, domain names, hostnames and SOE-IDs before sending logs to product principals for troubleshooting.
- Sure, you can use sed and grep to redact sensitive data. But the original data is lost.
- RedactKit CLI tokenizes the sensitive data for later un-redaction if you need to deep dive into certain parts of the log file during troubleshooting.
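The difference between a one-way sed/grep replacement and reversible tokenization can be sketched as follows. This is a minimal illustration, not RedactKit's actual implementation: the function names `redact` and `unredact`, the simplified patterns, and the choice of UUID tokens are all assumptions made for the example.

```python
import re
import uuid

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text):
    """Replace each sensitive value with a random token and remember the mapping."""
    mapping = {}  # token -> original value, needed for un-redaction
    seen = {}     # original value -> token, so repeats reuse the same token

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            token = uuid.uuid4().hex
            seen[value] = token
            mapping[token] = value
        return seen[value]

    for pattern in PATTERNS.values():
        text = pattern.sub(substitute, text)
    return text, mapping


def unredact(text, mapping):
    """Restore the original values from the saved token mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


line = "login failed for alice@example.gov.sg from 10.12.0.7"
redacted, mapping = redact(line)
restored = unredact(redacted, mapping)
```

Because the mapping is kept separately from the redacted log, the redacted file can be shared while the mapping stays internal, and `restored` equals the original `line`.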
RedactKit is a Python-based command-line tool that automates the redaction of common sensitive data from log files. It can be used on GSIB via PowerShell, where engineers can redact and un-redact sensitive log data.
The core redaction engine redacts the following data types from your log files. It is extensible to other types of data through user-defined regular expressions.
- SG NRIC (M Series not included yet)
- Credit cards
- URLs
- Emails
- IPv4
- IPv6
- Base64
- SOE-ID
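As a rough sketch of how a regex-driven engine can be extended with user-defined patterns, the example below merges a built-in pattern table with custom entries. The names `BUILTIN`, `build_engine`, and `redact_line`, the `TICKET-` pattern, and the bracketed placeholders are hypothetical and are not RedactKit's real API or configuration format. The NRIC pattern is simplified: it checks the shape only (S, T, F or G prefix, seven digits, one letter) and does not validate the checksum.

```python
import re

# Hypothetical built-in patterns, simplified for illustration.
BUILTIN = {
    "nric": r"\b[STFG]\d{7}[A-Z]\b",          # shape only, no checksum validation
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}


def build_engine(custom=None):
    """Compile the built-in patterns plus any user-defined ones."""
    patterns = dict(BUILTIN)
    patterns.update(custom or {})
    return {name: re.compile(rx) for name, rx in patterns.items()}


def redact_line(line, engine):
    """Replace every match with a labelled placeholder and count the targets."""
    count = 0
    for name, pattern in engine.items():
        line, n = pattern.subn(f"[{name.upper()}]", line)
        count += n
    return line, count


# A user-defined data type is registered by supplying its regular expression.
engine = build_engine({"ticket": r"\bTICKET-\d+\b"})
result, targets = redact_line("S1234567D raised TICKET-42 from 192.168.1.10", engine)
```

Here `result` is `[NRIC] raised [TICKET] from [IPV4]` and `targets` is 3. The placeholders in this sketch are one-way; it only illustrates pattern registration, not the tokenization used for un-redaction.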
Saves time. Focus on what matters. Here is a sample redaction run on a log file with over 10k lines. An engineer going through it manually could take about 6 hours.
[+] Redacted 10072 targets...
[+] Redacted results saved to ./redacted_test.txt
[+] Estimated total words : 29052
[+] Estimated total minutes saved : 388
[+] Estimated total man hours saved : 6
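The formula behind these estimates is not stated here. The sample figures are, however, consistent with a manual review speed of roughly 75 words per minute, rounded up to whole minutes and then down to whole hours. The sketch below reproduces the arithmetic under that assumption; `WORDS_PER_MINUTE` and `estimate_time_saved` are hypothetical names and the rate is inferred from the sample output, not taken from the tool.

```python
import math

WORDS_PER_MINUTE = 75  # assumption inferred from the sample figures above


def estimate_time_saved(total_words, wpm=WORDS_PER_MINUTE):
    """Return (minutes, whole hours) a manual review would take at the given speed."""
    minutes = math.ceil(total_words / wpm)
    hours = minutes // 60
    return minutes, hours


minutes, hours = estimate_time_saved(29052)
```

With 29052 words this gives 388 minutes and 6 whole hours, matching the sample run.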
The tool is available now on Software Center as a Government Standard Software Package (GSSP): GSSP_Python310 RedactKit_0.1.2.
Agency IT reps can opt to list it in WOG App Library for their respective agency's use.
- Original Ideation by Benjamin Quek
- Senior Infrastructure Engineer - LinkedIn
- Improved and expanded with more features by Oaker Min
- Infrastructure Engineer - LinkedIn
The RedactKit CLI also has an upstream open source version on GitHub. You can get involved here: PyRedactKit