linkedinattic / dmarc-msys
This set of scripts in Lua implements DMARC policy checking and reporting for the Message Systems MTA products, a popular extendable commercial MTA.
I've wondered whether the loop could be improved a bit by avoiding multiple seeks in the large hash table, like this:
...
local t = explode(".", domain);
local found = false;
--# '#t' raises an error when t is nil, so test for nil first
if t ~= nil and #t > 0 then
  --# both variables hold strings, so there is nothing to preallocate
  local seekdomain = t[#t];
  local orgDomain = nil;
  --# the control variable of a numeric 'for' is already local to the loop
  for j = #t, 0, -1 do
    local psltag = psl[seekdomain]; --# one lookup per pass in the large global hash table
    if psltag then
      if psltag == "" then
        if j >= 1 then
          orgDomain = t[j-1].."."..seekdomain;
        else
          orgDomain = nil;
        end
      elseif psltag == "*" then
        if j >= 2 then
          orgDomain = t[j-2].."."..t[j-1].."."..seekdomain;
        else
          orgDomain = nil;
        end
      elseif psltag == "!" then
        orgDomain = seekdomain;
      else
        orgDomain = seekdomain;
      end
      found = true;
    elseif found then
      break;
    end
    if j >= 1 then
      seekdomain = t[j-1].."."..seekdomain;
    end
  end
end
...
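For comparison, the lookup can be sketched in Python with one dictionary probe per label. This is only an illustration under stated assumptions, not the code from dmarc.lua: the tiny PSL dictionary, the tag values ("t" for an ordinary suffix, "*" for a wildcard rule, "!" for an exception rule) and the function name org_domain are all hypothetical.

```python
# Hypothetical miniature PSL: suffix -> rule tag.
# "t" = ordinary suffix, "*" = wildcard rule, "!" = exception rule.
PSL = {"com": "t", "uk": "t", "co.uk": "t", "ck": "*", "www.ck": "!"}


def org_domain(domain, psl=PSL):
    """Return the organizational domain, or None if domain is itself a public suffix."""
    labels = domain.lower().rstrip(".").split(".")
    suffix_len = 1  # default rule: an unlisted TLD counts as a public suffix
    for n in range(1, len(labels) + 1):
        tag = psl.get(".".join(labels[-n:]))  # one dictionary probe per label
        if tag == "!":
            suffix_len = n - 1  # exception: the suffix is the rule minus its leftmost label
            break
        if tag == "*":
            suffix_len = n + 1  # wildcard: any single label to the left is part of the suffix
        elif tag is not None:
            suffix_len = n
    if len(labels) <= suffix_len:
        return None
    # The organizational domain is the public suffix plus one more label.
    return ".".join(labels[-(suffix_len + 1):])
```

With this sample table, org_domain("a.b.example.co.uk") gives "example.co.uk" and org_domain("a.www.ck") gives "www.ck".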
I don’t think this is supposed to be a DKIM result of ‘neutral’, which is what the Authentication-Results headers from LinkedIn report. This message should have been reported as ‘none’.
ref: http://tools.ietf.org/html/rfc5451#section-2.4.1
Issue is here: https://github.com/linkedin/dmarc-msys/blob/master/dmarc.lua#L141
The sieve function reports the domain from i= if that tag is present; otherwise it reports the domain from d=.
This makes it impossible to report the correct DKIM status for valid signatures that use different domains in d= and i=.
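To make the difference concrete, here is a minimal Python sketch that pulls both domains out of a DKIM-Signature value, so the d= domain can be reported even when i= is present. The function names and the sample signature are hypothetical; the tag parsing is deliberately simplified and does not validate the signature.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature value into a {tag: value} dict (simplified)."""
    tags = {}
    for item in header_value.split(";"):
        name, sep, value = item.partition("=")
        if sep:
            # Drop folding whitespace inside the value; good enough for this sketch.
            tags[name.strip()] = "".join(value.split())
    return tags


def dkim_domains(header_value):
    """Return (signing domain from d=, identity domain from i=)."""
    tags = parse_dkim_tags(header_value)
    signing_domain = tags.get("d", "").lower()
    identity = tags.get("i")
    # When i= is absent it defaults to "@" + d=, so both domains are the same.
    identity_domain = identity.rpartition("@")[2].lower() if identity else signing_domain
    return signing_domain, identity_domain


sig = "v=1; a=rsa-sha256; d=example.com; i=@mail.example.com; s=sel; b=abc="
print(dkim_domains(sig))  # the d= domain comes first, the i= domain second
```

DMARC alignment is evaluated against the d= domain, which is why reporting the i= domain in its place loses information.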
Line 277 reads:
t = explode(".",domain);
To make getOrgDomain thread safe, and a little faster, keep this table out of the global namespace:
local t = explode(".",domain);
The same issue applies to seekdomain and orgDomain, so declare them local before the loop at line 281. The loop variable j needs no declaration, because the control variable of a numeric for loop is always local to the loop.
--# both variables hold strings, so there is no table to preallocate
local seekdomain;
local orgDomain;
Performance-wise, 'psl[seekdomain]' may be a relatively expensive Lua operation, considering the PSL has 6500+ entries, so I would prefer another lookup method such as a CDB or SQLite data source, which can also be cached. See separate issue...
I'd prefer to load the PSL into a CDB data source, so I use the Python snippet below to generate a CDB input file. Hopefully this is faster than repeated lookups in a large Lua hash table :)
The CDB data source is created by running these commands (from a makefile):
${path_cdbtables}/psl.cdb: psl
	@cat psl | ${python} psl2cdb.py | ${plocal}/cdbmake ${path_cdbtables}/psl.cdb ${path_cdbtables}/tmp.$$$$
	@ls -lrt ${path_cdbtables}/
The 'psl' file (from http://publicsuffixlist.org) and psl2cdb.py are distributed through ecconfigd and kept under svn; each MTA node then runs make every 5 minutes.
psl2cdb.py:
# Python 2 script: reads the PSL on stdin, writes cdbmake input on stdout
import sys
from encodings.idna import ToASCII

for line in sys.stdin:
    # skip comment lines and blank lines
    if line[0:2] != "//" and line[:-1] != "":
        if line[0:1] == "!":
            domain = line[1:-1].decode('string_escape')
            val = "!"  # indicate negation (exception rule)
        elif line[0:2] == "*.":
            domain = line[2:-1].decode('string_escape')
            val = "*"  # indicate wildcard
        else:
            domain = line[0:-1].decode('string_escape')
            val = "t"  # indicate TLD :)
        domain = domain.decode('UTF-8')
        compd = domain.split(".")
        newline = ""
        # convert each label to its IDNA (ASCII) form
        for atom in compd:
            newline = newline + ToASCII(atom) + "."
        kl = len(newline[:-1])
        dl = len(val)
        # using python 2.4 output formatting; record format is +klen,dlen:key->data
        print '+%d,%d:%s->%s' % (kl, dl, newline[:-1], val)
# cdbmake input ends with an empty line
print ''
Remember to add the CDB data source to your ecelerity.conf with the desired cache size, TTL and path:
Datasource "publicsuffixlist" {
  cache_size = "8192"
  cache_life = "1800"
  uri = ( "cdb://psl.cdb" )
}
Test-and-set operations in a parallel execution environment need to be atomic. So I suggest changing this piece of v1.17:
...
function getOrgDomain(domain)
  if psl==nil then
    psl=loadpsl();
  end
...
to something like this:
...
local DMARC_PSL_mutex = '_DMARC_PSL_mutex';
function getOrgDomain(domain)
  msys.lock(DMARC_PSL_mutex);
  if psl==nil then
    psl=loadpsl();
  end
  msys.unlock(DMARC_PSL_mutex);
...
or possibly loading the PSL from a module:init function and then removing these three lines of test-and-set code :)
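The same pattern, sketched in Python for readers who want to try it outside the MTA. This is an illustration only: load_psl is a hypothetical stand-in for loadpsl(), and threading.Lock plays the role of the named mutex. The check is repeated under the lock so that only one thread performs the load, while later calls skip the lock entirely.

```python
import threading

_psl = None
_psl_lock = threading.Lock()


def load_psl():
    # Hypothetical stand-in for the real loader; returns a tiny table.
    return {"com": "t"}


def get_psl():
    """Load the PSL at most once, even when called from several threads."""
    global _psl
    if _psl is None:            # fast path: no locking once the table is loaded
        with _psl_lock:
            if _psl is None:    # re-check: another thread may have loaded it meanwhile
                _psl = load_psl()
    return _psl
```

Loading the table once at startup, as suggested above, avoids the lock altogether and is the simpler option when the runtime offers an init hook.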
A DMARC record's rua target may specify the maximum report size accepted at the target URI. The reporting code should be aware of this and, at a minimum, strip the option before using the URI or its domain. A more complete solution would of course honor the size limit.
Question:
Do you consider and enforce the rua size limit, as given in "rua=mailto:[email protected]!25m"? Do you enforce an alternate maximum?
Or do you ignore it?
Or do records with this option fail?
Answer from Franck:
I ignore it,
and I think it would fail at the moment:
https://github.com/linkedin/dmarc-msys/blob/master/dmarc_report.py#L85
because the code would extract the email address as [email protected]!25m, which is an address with the domain tomki.com!25m, unrelated to the organizational domain or to _report._dmarc….
It would fail only for the email addresses that are set like this.
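A minimal Python sketch of stripping the size option before the address is used. It is not the code from dmarc_report.py; the function name split_rua and the sample address are hypothetical. The suffix syntax assumed here is the one DMARC defines: "!" followed by digits and an optional unit of k, m, g or t (powers of 1024).

```python
import re

# Multipliers for the optional unit letter.
_UNITS = {"": 1, "k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}


def split_rua(uri):
    """Return (uri without size suffix, max size in bytes or None)."""
    uri = uri.strip()
    base, sep, suffix = uri.rpartition("!")
    match = re.fullmatch(r"(\d+)([kmgt]?)", suffix, re.IGNORECASE) if sep else None
    if match is None:
        return uri, None  # no size option present
    size = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return base, size


print(split_rua("mailto:reports@example.com!25m"))
```

The first element can then be handed to the existing address extraction, and the second compared with the size of the generated report.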
RFC 5965 requires the Arrival-Date field to use the same date-time format as an email Date header (http://pretty-rfc.herokuapp.com/RFC5322#datetime), and this is not one:
Arrival-Date: 2014-04-29 02:05:14 UTC
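As a sketch of one possible fix, Python's standard library can emit the RFC 5322 form directly. The function name arrival_date_header is hypothetical, and the input format is assumed to be exactly the one shown above, always in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def arrival_date_header(timestamp):
    """Convert 'YYYY-MM-DD HH:MM:SS UTC' into an RFC 5322 Arrival-Date line."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S UTC")
    aware = parsed.replace(tzinfo=timezone.utc)  # the input is assumed to be UTC
    return "Arrival-Date: " + format_datetime(aware)


print(arrival_date_header("2014-04-29 02:05:14 UTC"))
# prints: Arrival-Date: Tue, 29 Apr 2014 02:05:14 +0000
```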
RFC 3462 requires the whole report message to have the multipart/report content type.
Moreover, RFC 5965 has further requirements on the content types of the parts: for example, the second MIME part should have the message/feedback-report content type, and that part has three required fields: Feedback-Type, User-Agent, and Version.
Finally, RFC 6591 adds several requirements, one of which is to set Feedback-Type to auth-failure.
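The structure described above can be sketched with Python's email package. This is a skeleton under stated assumptions, not a complete report: the function name and its arguments are hypothetical, only the three required fields are written, and the other fields that RFC 6591 calls for are left out.

```python
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_auth_failure_report(explanation, user_agent, original_headers):
    """Build a multipart/report skeleton with a message/feedback-report part."""
    # Whole message: multipart/report; report-type=feedback-report
    report = MIMEMultipart("report", **{"report-type": "feedback-report"})

    # Part 1: human-readable explanation
    report.attach(MIMEText(explanation, "plain"))

    # Part 2: machine-readable fields (only the three required ones here)
    fields = (
        "Feedback-Type: auth-failure\r\n"
        "User-Agent: %s\r\n"
        "Version: 1\r\n" % user_agent
    )
    feedback = MIMEBase("message", "feedback-report")
    feedback.set_payload(fields)
    report.attach(feedback)

    # Part 3: headers of the message that failed authentication
    report.attach(MIMEText(original_headers, "rfc822-headers"))
    return report
```

Checking get_content_type() on the result and on each part is a quick way to confirm the layout before adding the remaining fields.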