linkedinattic / dmarc-msys

This set of Lua scripts implements DMARC policy checking and reporting for the Message Systems MTA products, a popular, extensible commercial MTA.

Lua 74.64% Python 25.36%

Contributors

franckhlmartin

dmarc-msys's Issues

PSL hash table perf.

I've wondered whether your loop could be improved a bit by avoiding multiple lookups in the large hash table, like this:

...
local t = explode(".",domain);
local found = false;
local orgDomain = nil;
--# the length operator raises an error on nil, so check t before taking #t
if t ~= nil and #t >= 1 then
  local seekdomain = t[#t];  --# start with the TLD
  --# Lua tables are 1-based: t[1] is the leftmost label, t[#t] the TLD
  for j=#t,1,-1 do
    local psltag = psl[seekdomain]; --# search through large global hash table
    if psltag then
      if psltag=="" then
        --# normal rule: the org domain is one more label
        if j>=2 then
          orgDomain=t[j-1].."."..seekdomain;
        else
          orgDomain=nil;
        end
      elseif psltag=="*" then
        --# wildcard rule: the org domain is two more labels
        if j>=3 then
          orgDomain=t[j-2].."."..t[j-1].."."..seekdomain;
        else
          orgDomain=nil;
        end
      else
        --# "!" exception rule (and any other tag): seekdomain itself
        orgDomain=seekdomain;
      end
      found=true;
    elseif found then
      break;  --# no longer rule matches, keep the last result
    end
    if j>=2 then
      seekdomain=t[j-1].."."..seekdomain;
    end
  end
end
...
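To check the walk outside the MTA, here is a minimal Python sketch of the same TLD-to-leftmost search. It assumes that the PSL has been loaded into a dict mapping each rule, without its `*.` or `!` prefix, to a tag: `""` for a normal rule, `"*"` for a wildcard and `"!"` for an exception, matching the tags tested in the Lua loop above. `org_domain` and `SAMPLE_PSL` are hypothetical names, and the sample table is a tiny made-up subset, not the real list.

```python
def org_domain(domain, psl):
    """Return the organizational domain of `domain`, or None if it is a public suffix.

    Hypothetical helper. `psl` maps a rule (without "*." or "!") to a tag:
    "" = normal rule, "*" = wildcard rule, "!" = exception rule.
    """
    labels = domain.lower().rstrip(".").split(".")
    result = None
    found = False
    # Walk from the TLD towards the leftmost label, one dict lookup per level.
    for j in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[j:])
        tag = psl.get(suffix)
        if tag is None:
            if found:
                break  # no longer rule matches; keep the last result
            continue
        found = True
        if tag == "":
            # normal rule: org domain is the suffix plus one label
            result = ".".join(labels[j - 1:]) if j >= 1 else None
        elif tag == "*":
            # wildcard rule: org domain is the suffix plus two labels
            result = ".".join(labels[j - 2:]) if j >= 2 else None
        else:
            # exception rule: the suffix itself is the org domain
            result = suffix
    return result


# Hypothetical sample table, not the real PSL.
SAMPLE_PSL = {"com": "", "uk": "", "co.uk": "", "ck": "*", "www.ck": "!"}

print(org_domain("mail.example.co.uk", SAMPLE_PSL))  # example.co.uk
print(org_domain("foo.bar.ck", SAMPLE_PSL))          # foo.bar.ck
print(org_domain("www.ck", SAMPLE_PSL))              # www.ck
print(org_domain("com", SAMPLE_PSL))                 # None
```

The number of table lookups is bounded by the number of labels in the domain, not by the size of the list. The early `break` stops the walk once a longer suffix no longer matches.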

ec_dkim_domains does not always report the domain in d=

The sieve function reports the domain in i= if that tag is present; otherwise it reports the domain in d=.

This prevents reporting the correct DKIM status for valid signatures that use different domains in d= and i=.
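Independently of how ec_dkim_domains behaves, the d= domain can be read directly from the DKIM-Signature header. Below is a minimal Python sketch, assuming the raw header value is available as a string. `dkim_tags` and `signing_domains` are hypothetical helpers, and signature verification is out of scope.

```python
import re


def dkim_tags(header_value):
    """Split a DKIM-Signature header value into a {tag: value} dict (hypothetical helper).

    Tags are separated by ";". All whitespace, including folded lines, is
    removed from values: a simplification that is fine for d= and i=.
    """
    tags = {}
    for item in header_value.split(";"):
        if "=" not in item:
            continue  # skip empty trailing items
        name, _, value = item.partition("=")
        tags[name.strip()] = re.sub(r"\s+", "", value)
    return tags


def signing_domains(header_value):
    """Return (d_domain, i_domain); i_domain is None when i= is absent."""
    tags = dkim_tags(header_value)
    d_domain = tags["d"].lower()
    i_value = tags.get("i")
    # i= is "[local-part]@domain"; keep only the domain part
    i_domain = i_value.rpartition("@")[2].lower() if i_value else None
    return d_domain, i_domain


sig = ("v=1; a=rsa-sha256; d=example.com; s=sel1;\r\n"
       "\ti=bounces@mail.example.com; h=from:to; bh=abc=; b=def=")
print(signing_domains(sig))  # ('example.com', 'mail.example.com')
```

Keeping both values lets the DKIM status be reported against d=, while i= stays available when a sender needs it.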

PSL: more thread safety/performance

Line 277 reads:

t = explode(".",domain);

To make the getOrgDomain function thread-safe and faster, keep this table out of the global namespace by declaring it local:

local t = explode(".",domain);

The same issue applies to seekdomain and orgDomain, so declare them local before the loop at line 281. The loop variable j needs no declaration, because a numeric for loop already makes its control variable local to the loop body. Both variables hold strings, so pre-allocating tables for them does not avoid any rehashing; a plain declaration is enough:

local seekdomain;
local orgDomain;

Performance-wise, 'psl[seekdomain]' could be a relatively expensive Lua operation, considering the PSL has more than 6,500 entries, so I would prefer another lookup method, such as a CDB or SQLite data source, which can also be cached. See separate issue...

PSL CDB data source

I prefer to load the PSL into a CDB data source, so I use the Python snippet below to generate a CDB input file. Hopefully this is faster than looping over lookups in a large Lua hash table :)

The CDB data source is created by running these cmds (from a makefile):

${path_cdbtables}/psl.cdb: psl
	@cat psl | ${python} psl2cdb.py | ${plocal}/cdbmake ${path_cdbtables}/psl.cdb ${path_cdbtables}/tmp.$$$$
	@ls -lrt ${path_cdbtables}/

The 'psl' file (from http://publicsuffixlist.org) and psl2cdb.py are distributed through ecconfigd and kept under SVN; each MTA node then runs make every 5 minutes.

psl2cdb.py:

--- Python snippet ---

import sys
from encodings.idna import ToASCII

# read PSL lines from stdin and emit one cdbmake record per rule
for line in sys.stdin:
    # skip comment lines and blank lines
    if line[0:2] != "//" and line[:-1] != "":
        if line[0:1] == "!":
            domain = line[1:-1].decode('string_escape')
            val = "!"  # indicate negation
        elif line[0:2] == "*.":
            domain = line[2:-1].decode('string_escape')
            val = "*"  # indicate wildcard
        else:
            domain = line[0:-1].decode('string_escape')
            val = "t"  # indicate TLD :)
        domain = domain.decode('UTF-8')
        compd = domain.split(".")
        newline = ""
        for atom in compd:
            newline = newline + ToASCII(atom) + "."
        kl = len(newline[:-1])
        dl = len(val)
        # using python 2.4 output formatting
        print '+%d,%d:%s->%s' % (kl, dl, newline[:-1], val)

# cdbmake expects an extra empty line after the last record
print ''

--- end of snippet
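The snippet above is Python 2 code (it uses the `print` statement and the `string_escape` codec). Below is a Python 3 sketch of the same conversion, written as a hypothetical `psl_to_cdb` function so it can be tried without a pipe. It uses the built-in `idna` codec, which applies the same per-label ToASCII conversion, and emits "!" for exception rules, "*" for wildcard rules and "t" for normal rules. The sample input lines are made up.

```python
def psl_to_cdb(lines):
    """Convert PSL rule lines into cdbmake records (hypothetical helper).

    Returns a list of "+klen,dlen:key->data" strings; keys are IDNA-encoded.
    """
    records = []
    for line in lines:
        stripped = line.strip()
        # skip blank lines and // comments
        if not stripped or stripped.startswith("//"):
            continue
        rule = stripped.split()[0]  # a PSL rule ends at the first whitespace
        if rule.startswith("!"):
            domain, val = rule[1:], "!"  # exception (negation) rule
        elif rule.startswith("*."):
            domain, val = rule[2:], "*"  # wildcard rule
        else:
            domain, val = rule, "t"      # normal rule
        # the "idna" codec converts each dot-separated label to ASCII
        key = domain.encode("idna").decode("ascii")
        records.append("+%d,%d:%s->%s" % (len(key), len(val), key, val))
    return records


# Made-up sample lines, not taken from the real list.
sample = ["// comment\n", "\n", "com\n", "*.ck\n", "!www.ck\n", "bücher.example\n"]
for record in psl_to_cdb(sample):
    print(record)
```

To feed cdbmake from a pipe, pass `sys.stdin` as `lines` and print one record per line followed by one extra empty line, as the original snippet does.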

Remember to add the CDB data source to your ecelerity.conf with the desired cache size, TTL and path:

Datasource "publicsuffixlist" {
  cache_size = "8192"
  cache_life = "1800"
  uri = ( "cdb://psl.cdb" )
}

Secure against multiple threads loading the PSL

Test-and-set operations in parallel execution environments need to be atomic. So I suggest either changing this piece of v.1.17:

...
function getOrgDomain(domain)
  if psl==nil then
    psl=loadpsl();
  end
  ...

to something like this:

...
local DMARC_PSL_mutex = '_DMARC_PSL_mutex';

function getOrgDomain(domain)
  msys.lock(DMARC_PSL_mutex);
  if psl==nil then
    psl=loadpsl();
  end
  msys.unlock(DMARC_PSL_mutex);
  ...

or load the PSL from a module:init function and then remove these three test-and-set lines of code :)
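The same once-only initialisation can be illustrated in Python with double-checked locking. The fast path reads the cached table without taking the lock, and only callers that arrive before the table is published contend for it. This sketch shows the pattern only and does not use the Momentum `msys.lock` API; `load_psl`, `get_psl` and the tiny table are hypothetical.

```python
import threading

_psl = None
_psl_lock = threading.Lock()
load_count = 0  # how many times the (expensive) load actually ran


def load_psl():
    """Hypothetical stand-in for the real PSL loader."""
    global load_count
    load_count += 1
    return {"com": "", "ck": "*", "www.ck": "!"}


def get_psl():
    """Return the PSL table, loading it exactly once across threads."""
    global _psl
    table = _psl
    if table is None:              # fast path: no lock once the table exists
        with _psl_lock:
            if _psl is None:       # re-check: another thread may have loaded it
                _psl = load_psl()  # publish only after the load has finished
            table = _psl
    return table


threads = [threading.Thread(target=get_psl) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(load_count)  # 1
```

Unlike locking on every call, the lock here is only taken while the table is still missing. Loading at module init time, as suggested above, removes the check altogether.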

rua size-option awareness

A DMARC record's RUA target may specify a maximum report size accepted at the target URI. The reporting code should be aware of this and minimally strip the option before attempting to make use of the URI or URI-domain. A more complete solution obviously would make use of the data in the option.

Question:
Do you consider and enforce the rua size limitation flag, such as the one in "rua=mailto:[email protected]!25m"? Do you enforce an alternate maximum?
Or do you ignore it?
Or do records with this option fail?

Answer from Franck:
I ignore it

and I think it would fail at the moment
https://github.com/linkedin/dmarc-msys/blob/master/dmarc_report.py#L85

because the code would extract the email address as [email protected]!25m, which is an address with the domain tomki.com!25m, unrelated to the organizational domain or to _report._dmarc….

It would fail only for the email addresses that are set like this.
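Stripping the option before the address is extracted could look like the Python sketch below. `parse_rua` is a hypothetical helper, not code from dmarc_report.py. It follows the RFC 7489 syntax `URI [ "!" size [ k / m / g / t ] ]` and treats the units as powers of two, as RFC 7489 specifies. It relies on commas and "!" inside the URI itself being percent-encoded, as RFC 7489 requires, so splitting on them is unambiguous.

```python
import re

# Hypothetical parser for rua=/ruf= values: URI [ "!" 1*DIGIT [ k / m / g / t ] ]
_SIZE_RE = re.compile(r"^(?P<uri>.+?)(?:!(?P<num>\d+)(?P<unit>[kmgt])?)?$",
                      re.IGNORECASE)
_UNITS = {"": 1, "k": 2 ** 10, "m": 2 ** 20, "g": 2 ** 30, "t": 2 ** 40}


def parse_rua(value):
    """Split a rua= tag value into (uri, max_bytes) pairs.

    max_bytes is None when the URI carries no size limit.
    """
    result = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        m = _SIZE_RE.match(item)  # always matches a non-empty item
        uri, num = m.group("uri"), m.group("num")
        if num is None:
            result.append((uri, None))
        else:
            unit = (m.group("unit") or "").lower()
            result.append((uri, int(num) * _UNITS[unit]))
    return result


for uri, limit in parse_rua("mailto:dmarc@example.com!25m, mailto:agg@example.net"):
    print(uri, limit)
# mailto:dmarc@example.com 26214400
# mailto:agg@example.net None
```

The returned URI can then be used for the domain checks, and the limit can either be enforced or logged and ignored.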

The generated report is not RFC compliant

RFC-3462 requires reports to have the multipart/report content type for the whole report message.

Moreover, RFC-5965 places further requirements on the content types of the parts: e.g. the second MIME part must have the message/feedback-report content type and must contain three required fields: Feedback-Type, User-Agent, and Version.

Finally, RFC-6591 has several requirements one of them is to set the Feedback-Type to auth-failure.
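The MIME layout these RFCs describe can be sketched with Python's standard `email` package. The sketch builds a multipart/report container with the report-type=feedback-report parameter, followed by a human-readable text/plain part, a message/feedback-report part and a text/rfc822-headers part holding the original headers. `build_auth_failure_report` is a hypothetical helper, and the field values (including the User-Agent string) are illustrative only. Check RFC 6591 for the full set of fields an auth-failure report must carry.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_auth_failure_report(original_headers, fields):
    """Build a multipart/report message (hypothetical helper).

    `original_headers` is the header block of the failed message;
    `fields` is an ordered list of (name, value) feedback fields.
    """
    # Whole message: multipart/report; add_header turns the
    # report_type keyword into the report-type parameter.
    report = MIMEMultipart("report", report_type="feedback-report")

    # Part 1: human-readable description.
    report.attach(MIMEText("This is an authentication failure report.\n", "plain"))

    # Part 2: machine-readable fields as message/feedback-report.
    feedback = Message()
    feedback["Content-Type"] = "message/feedback-report"
    feedback.set_payload("".join("%s: %s\n" % (n, v) for n, v in fields))
    report.attach(feedback)

    # Part 3: the headers of the original message.
    report.attach(MIMEText(original_headers, "rfc822-headers"))
    return report


msg = build_auth_failure_report(
    "From: sender@example.com\nSubject: test\n",
    [("Feedback-Type", "auth-failure"),
     ("User-Agent", "example-reporter/0.1"),  # illustrative value
     ("Version", "1"),
     ("Auth-Failure", "dmarc")],
)
msg["Subject"] = "DMARC failure report"
print(msg.as_string())
```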
