Failure for ~200k messages (imap2maildir issue, open)

viric commented on June 24, 2024
Failure for ~200k messages

from imap2maildir.

Comments (13)

viric commented on June 24, 2024

I notice that in checkmessage(), turbo mode runs an SQL SELECT query for every candidate message to check whether that message is already there. That is a lot of work; I think it would be far better to load the list into memory in an appropriate searchable structure and do the check there.
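
A minimal sketch of the in-memory check suggested here, not imap2maildir's actual code. The table name `seen`, its `uid` column, and the helper `load_seen_uids` are assumptions made for illustration. One query loads every stored UID into a Python set, and each later membership test needs no SQL round trip.

```python
import sqlite3

def load_seen_uids(conn):
    """Read every stored UID once and return them as a set."""
    # Hypothetical schema: a table `seen` with an integer `uid` column.
    return {row[0] for row in conn.execute("SELECT uid FROM seen")}

# Demo on an in-memory database standing in for the real cache file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen (uid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO seen (uid) VALUES (?)", [(u,) for u in (1, 2, 5)])

seen = load_seen_uids(conn)
candidates = [1, 2, 3, 4, 5, 6]
# Set membership replaces one SELECT per candidate UID.
missing = [u for u in candidates if u not in seen]
```

A set of a few hundred thousand integers should fit comfortably in memory on most machines, though that is an expectation rather than something measured here.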

rtucker commented on June 24, 2024

I've run into a couple of cases where a specific message is "corrupted" on Gmail's end, and trying to fetch it via IMAP fails. In simpleimap.py, putting a try/except around the get_summary_by_uid call should reveal the IMAP UID it is choking on:

try:
    summ = self.__parent.get_summary_by_uid(u)
except Exception:
    # Report the UID that failed, then re-raise so the traceback is kept.
    print "uid", u
    raise

Once you have that, it should be possible to delete the offending message.

imap2maildir should be doing a better job of handling errors such as these. And yes, it is doing a SQL query for each UID... I don't remember why I did it that way, but I think memory consumption was a concern. On second thought, it shouldn't take THAT much memory, and it would likely improve performance a lot. :-) Good catch.

viric commented on June 24, 2024

Gmail simply closes the socket after so much inactivity during the first stage of turbo mode.

Once the list of UIDs is held in memory and the check is done there, instead of with one SQL query per UID, I think turbo mode will work great.

I'm trying without turbo mode, but Gmail disconnects me before I can reach even 15% of my mail.
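
If the disconnects really are caused by an idle socket, one possible workaround is to send an IMAP NOOP from time to time while the long local work runs. The sketch below is an assumption-laden illustration, not something imap2maildir does: the `KeepAlive` class, its `interval` default, and the injectable `clock` are hypothetical. The only real API it relies on is the `noop()` method of an `imaplib` connection. Whether this avoids Gmail's disconnect is untested.

```python
import time

class KeepAlive(object):
    """Call conn.noop() when at least `interval` seconds have passed."""

    def __init__(self, conn, interval=60.0, clock=time.time):
        self.conn = conn          # any object with a noop() method
        self.interval = interval  # seconds of silence tolerated (a guess)
        self.clock = clock        # injectable so the logic can be tested
        self.last = clock()

    def tick(self):
        """Call this inside the long loop; returns True if NOOP was sent."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.conn.noop()
            self.last = now
            return True
        return False
```

Calling `tick()` once per loop iteration is cheap, because it only touches the network when the interval has elapsed.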

rtucker commented on June 24, 2024

Well.

On my gmail mailbox of ~145,000 messages,
Last night's run: about 3.75 hours
With a cache: 7 minutes, 22 seconds

Pull in the latest HEAD and let me know how that works for you.

viric commented on June 24, 2024

I just tried it. With turbo mode, using the old maildir directory that already held some messages, I got:

Exception!  Clearing locks and safing database.
Traceback (most recent call last):
  File "./imap2maildir", line 536, in <module>
    main()
  File "./imap2maildir", line 517, in main
    seencache=seencache)
  File "./imap2maildir", line 435, in copy_messages_by_folder
    for i in folder.Summaries(search=search):
  File "/home/llbatlle/tmp/imap2maildir/simpleimap.py", line 357, in Summaries
    summ = self.__parent.get_summary_by_uid(u)
  File "/home/llbatlle/tmp/imap2maildir/simpleimap.py", line 256, in get_summary_by_uid
    '(UID ENVELOPE RFC822.SIZE INTERNALDATE)')
  File "/nix/store/hd089201zv5fb1lqdxscv194snnynplj-python-2.7/lib/python2.7/imaplib.py", line 753, in uid
    typ, dat = self._simple_command(name, command, *args)
  File "/nix/store/hd089201zv5fb1lqdxscv194snnynplj-python-2.7/lib/python2.7/imaplib.py", line 1060, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/nix/store/hd089201zv5fb1lqdxscv194snnynplj-python-2.7/lib/python2.7/imaplib.py", line 890, in _command_complete
    raise self.abort('command: %s => %s' % (name, val))
imaplib.abort: command: UID => socket error: unterminated line

I am not very good at Python, so sorry if I can't go into more detail about the code. :)
I will try again with a newly created maildir.

rtucker commented on June 24, 2024

Well, at least it should be faster to test :-)

I just pushed a patch that will spit out the UID it choked on. Once you have that UID, you can try firing up Python and seeing if you can figure out what's wrong with the message:

import simpleimap
server = simpleimap.Server(hostname='imap.gmail.com', username='[email protected]', password='blah').Get()
server.select('[Gmail]/All Mail')
server.uid('FETCH', 376544, '(RFC822)')

... would spit out message uid 376544. Try the neighboring messages (presumably 376543 and 376545) as well. You can also try:

    server.uid('FETCH', 376544, '(UID ENVELOPE RFC822.SIZE INTERNALDATE)')

to see what that does, since that's what it is trying to do when it crashes.

imap2maildir could easily ignore this exception and have it continue on, but I think understanding why it is happening will be a very good thing.
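
As a rough illustration of the "ignore and continue" option, the sketch below records failing UIDs instead of stopping. It is not the patch that was pushed: `collect_summaries` and its `fetch_summary` callback are hypothetical names. It relies only on the fact that `imaplib.IMAP4.abort` is a subclass of `imaplib.IMAP4.error`.

```python
import imaplib

def collect_summaries(fetch_summary, uids):
    """Fetch a summary per UID, recording failures instead of stopping."""
    summaries, failed = [], []
    for uid in uids:
        try:
            summaries.append(fetch_summary(uid))
        except imaplib.IMAP4.error as exc:
            # IMAP4.abort derives from IMAP4.error, so both land here.
            failed.append((uid, str(exc)))
    return summaries, failed
```

One caveat: after an `abort` the connection is usually no longer usable, so a real version would probably need to reconnect before trying the next UID.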

Thanks! -rt

viric commented on June 24, 2024

Here you have it:

>>> server.uid('FETCH', 165982, '(RFC822)')
('OK', [('43816 (UID 165982 RFC822 {5523}', 'Delivered-To: [email protected]\r\nReceived: by 10.142.169.1 with SMTP id r1cs178792wfe;\r\n        Sun, 28 Sep 2008 07:49:53 -0700 (PDT)\r\nReceived: by 10.115.23.19 with SMTP id a19mr4311058waj.133.1222613393492;\r\n        Sun, 28 Sep 2008 07:49:53 -0700 (PDT)\r\nReturn-Path: \r\nReceived: from n16a.bullet.sp1.yahoo.com (n16a.bullet.sp1.yahoo.com [69.147.64.121])\r\n        by mx.google.com with SMTP id t1si2136057poh.13.2008.09.28.07.49.52;\r\n        Sun, 28 Sep 2008 07:49:52 -0700 (PDT)\r\nReceived-SPF: pass (google.com: domain of sentto-9862331-5848-1222613385-viriketo=gmail.com@returns.groups.yahoo.com designates 69.147.64.121 as permitted sender) client-ip=69.147.64.121;\r\nDomainKey-Status: good\r\nAuthentication-Results: mx.google.com; spf=pass (google.com: domain of sentto-9862331-5848-1222613385-viriketo=gmail.com@returns.groups.yahoo.com designates 69.147.64.121 as permitted sender) smtp.mail=sentto-9862331-5848-1222613385-viriketo=gmail.com@returns.groups.yahoo.com; domainkeys=pass [email protected]\r\nComment: DomainKeys? 
See http://antispam.yahoo.com/domainkeys\r\nDomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=lima; d=yahoogroups.com;\r\n\tb=LSlgVDUGFtooqe064kt32c5atqJ2pBA+7kklkoqGGl95lG8xCcl8wjfXI6G5C61jPvg4vE0TWl1f2ZdNkYh5Xeade6B9I0le2BqDz8bMtZLINLIKi8XRYyp1pFTQEyGw;\r\nReceived: from [69.147.65.171] by n16.bullet.sp1.yahoo.com with NNFMP; 28 Sep 2008 14:49:45 -0000\r\nReceived: from [66.218.67.109] by t13.bullet.mail.sp1.yahoo.com with NNFMP; 28 Sep 2008 14:49:45 -0000\r\nX-Yahoo-Newman-Id: 9862331-m5848\r\nX-Sender: [email protected]\r\nX-Apparently-To: [email protected]\r\nX-Received: (qmail 68424 invoked from network); 28 Sep 2008 14:49:42 -0000\r\nX-Received: from unknown (66.218.67.96)\r\n  by m45.grp.scd.yahoo.com with QMQP; 28 Sep 2008 14:49:42 -0000\r\nX-Received: from unknown (HELO mail.libertysurf.net) (213.36.80.105)\r\n  by mta17.grp.scd.yahoo.com with SMTP; 28 Sep 2008 14:49:42 -0000\r\nX-Received: from aliceadsl.fr (192.168.10.57) by mail.libertysurf.net (8.0.015)\r\n        id 482DC6AA00F031DC for [email protected]; Sun, 28 Sep 2008 16:49:42 +0200\r\nMessage-Id: \r\nX-Sensitivity: 3\r\nTo: "=?iso-8859-1?Q?tradukado?=" \r\nX-XaM3-API-Version: 3.2 R18 (B34 pl1)\r\nX-type: 0\r\nX-SenderIP: 91.171.195.43\r\nX-Originating-IP: 213.36.80.105\r\nX-eGroups-Msg-Info: 1:12:0:0:0\r\nFrom: "[email protected]?=" \r\nX-Yahoo-Profile: jorgos_esperanto\r\nSender: [email protected]\r\nMIME-Version: 1.0\r\nMailing-List: list [email protected]; contact [email protected]\r\nDelivered-To: mailing list [email protected]\r\nList-Id: \r\nPrecedence: bulk\r\nList-Unsubscribe: \r\nDate: Sun, 28 Sep 2008 16:49:42 +0200\r\nSubject: =?iso-8859-1?Q?Re:[tradukado]_verboj_por_tabulaj_sportoj_(surftabulo,\r\n\t_negxtabulo,_rultabulo,_ktp)?=\r\nReply-To: [email protected]\r\nX-Yahoo-Newman-Property: groups-email-tradt-m\r\nContent-Type: text/plain; charset=ISO-8859-1\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nOni jam delonge neplu biciklumas au gitarludas sed biciklas=0D\r\nkaj 
gitaras (kvankam ne mem estas biciklo au gitaro) kaj=0D\r\npraktikas bicikladon kaj gitaradon, ^cu ne ? ; nu kial ne ? =0D\r\n=0D\r\n^Ciu elektu mem kaj la popolo decidos tion, kion akcepti...=0D\r\n=0D\r\nJs.=0D\r\n=0D\r\ntradukado, 28 Sep 2008 : verboj por tabulaj sportoj=0D\r\n(surftabulo, negxtabulo, rultabulo, ktp)=0D\r\n=0D\r\nSaluton,=0D\r\nkiel vi verbe esprimus la diversajn X-tabulan sportojn, ekz=0D\r\nuzon de=0D\r\nsurftabulo, negxtabulo, rultabulo, ktp?=0D\r\n1. simple verbigu la substantivon, kompreneble!=0D\r\nsurftabuli, negxtabuli, rultabuli, ...  Do "Li X-tabulas."=0D\r\n2. ne ne, tia verba formo de "tabul-" sensencas aux sugestas=0D\r\nke la=0D\r\nsubjekto ESTAS tia tabulo, do necesas aldoni -um al la=0D\r\nsubstantivo:=0D\r\nsurftabulumi, negxtabulumi, rultabulumi, ... Do "Li X-tabulumas"=0D\r\n3. ne eblas verbigi tiel, oni bezonas uzi ian verbon kun la=0D\r\nsubstantivo: rajdi surftabulon, gliti sur negxtabulo, veturi=0D\r\nper rultabulo, ... Do "Li iras per X-tabulo" aux "Li iras=0D\r\nX-tabule" ktp=0D\r\n4. io alia...?=0D\r\nKiel oni nomu la agadojn substantive?=0D\r\n1. surftabulado, negxtabulado, rultabulado, ...=0D\r\n2. surftabulumado, negxtabulumado, rultabulumado, ...=0D\r\n3. surftabulrajdado, negxtabulglitado, rultabulveturado, ...=0D\r\n4. io alia...?=0D\r\ndankon,    russ=0D\r\n\r\n\r\n\r\n---------------------- ALICE N=B01 de la RELATION CLIENT 2008*-------------=\r\n-------\r\nD=E9couvrez vite l\'offre exclusive ALICE BOX! En cliquant ici http://abonne=\r\nment.aliceadsl.fr Offre soumise =E0 conditions.*Source : TNS SOFRES / BEARI=\r\nNG POINT. Secteur Fournisseur d.Acc=E8s Internet\r\n\r\n\r\n\r\n------------------------------------\r\n\r\nYahoo! 
Groups Links\r\n\r\n<*> To visit your group on the web, go to:\r\n    http://groups.yahoo.com/group/tradukado/\r\n\r\n<*> Your email settings:\r\n    Individual Email | Traditional\r\n\r\n<*> To change settings online go to:\r\n    http://groups.yahoo.com/group/tradukado/join\r\n    (Yahoo! ID required)\r\n\r\n<*> To change settings via email:\r\n    mailto:[email protected]=20\r\n    mailto:[email protected]\r\n\r\n<*> To unsubscribe from this group, send an email to:\r\n    [email protected]\r\n\r\n<*> Your use of Yahoo! Groups is subject to:\r\n    http://docs.yahoo.com/info/terms/\r\n\r\n'), ' FLAGS (\\Seen))'])

The main problem looks to be the Subject: line having a \r\n\t in the middle.

The relevant information from RFC 2822 is in section 2.2.3. In short:

"""
The process of moving from this folded multiple-line
representation of a header field to its single line
representation is called "unfolding". Unfolding is
accomplished by simply removing any CRLF that is
immediately followed by WSP. Each header field should
be treated in its unfolded form for further syntactic
and semantic evaluation.
"""
(I took this reference from http://bugs.python.org/issue504152)
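
The unfolding rule quoted above can be sketched in a few lines. The helper name `unfold` and the shortened sample subject are made up for illustration; the regular expression removes a CRLF only when a space or tab follows it, and leaves that whitespace in place.

```python
import re

# CRLF immediately followed by WSP (space or tab); the WSP itself is kept.
_FOLD = re.compile(r"\r\n(?=[ \t])")

def unfold(header_text):
    """Return the header with RFC 2822 folding removed."""
    return _FOLD.sub("", header_text)

folded = ("Subject: verboj por tabulaj sportoj (surftabulo,\r\n"
          "\tnegxtabulo, rultabulo, ktp)\r\n")
unfolded = unfold(folded)
```

The final CRLF is not followed by whitespace, so it survives as the normal end of the header line.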

viric commented on June 24, 2024

Sorry, I now see that this is a problem in imaplib, still present in Python 2.7 and Python 3.
I'll have to work around it somehow.

viric commented on June 24, 2024

I had the chance to investigate the issue further. My mailbox has messages from one specific person whose long Subjects violate RFC 2822: instead of being folded with CRLF + WSP, the subject is folded with only LF + WSP. That affects parsing of the ENVELOPE response, because imaplib works with readline(), and readline() treats either \n or \r\n as the end of a line.
I wrote a patch for imaplib so I can keep downloading. When it finds a line ending in \n (not \r\n), it concatenates the next line and removes the \n\t sequence.
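
The idea behind that workaround can be sketched as follows. This is not the actual imaplib patch, which is not shown in the thread: `read_logical_line` is a hypothetical helper working on any binary file-like object. It also differs in one detail, since it drops only the bare LF and keeps the following whitespace, as RFC 2822 unfolding would.

```python
import io

def read_logical_line(fp):
    """Read one line, joining continuations that follow a bare LF."""
    line = fp.readline()
    # A proper IMAP line ends in CRLF; a bare LF means the line was cut
    # in the middle of a badly folded header.
    while line.endswith(b"\n") and not line.endswith(b"\r\n"):
        nxt = fp.readline()
        if not nxt:
            break  # end of input: return what we have
        line = line[:-1] + nxt  # drop the bare LF, keep the WSP
    return line

stream = io.BytesIO(b"Subject: a long\n\tsubject\r\nNext: x\r\n")
first = read_logical_line(stream)
second = read_logical_line(stream)
```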

rtucker commented on June 24, 2024

Cool! I, unfortunately, haven't had a chance to look at this yet but that's probably where I was headed.

I am not opposed to working around bugs in imaplib.py using simpleimap.py... see the SimpleImapSSL class for an example of this. The process of getting a bug fixed in the Python library is very slow, and then it has to actually make it onto people's systems via Debian/Ubuntu/RHEL/CentOS/. And yes, there are more than a few such bugs.

viric commented on June 24, 2024

Once I succeed in getting all my Gmail mail, I'll try to write up something worth submitting for that bug.

viric commented on June 24, 2024

Ouch - my quick hack worked for the case I had, but I hit a new one that is harder to defeat. It also fails in the Python library, not in your code:
Date: Sat, 12 Aug 2006 21:07:54 +0400
Subject: [EK-MASI] =?koi8-r?B?IkFydG8ga2FqIGFrdGl2ZWNvIg0KDQojRWtvdG9waW8gMjAwNiBaYWplanhv?=
=?koi8-r?B?dmEgU2xvdmFraW8j?=
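
Decoding the base64 payloads of this Subject shows one thing that stands out: the first encoded word contains two CRLF sequences. That may be what upsets line-based parsing, but this is a guess and the thread does not confirm it. The sketch below uses only the standard `base64` and `re` modules; the helper name `decode_b64_words` is made up for illustration.

```python
import base64
import re

# Matches RFC 2047 encoded words that use the "B" (base64) encoding.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

def decode_b64_words(header_value):
    """Return (charset, raw_bytes) for each base64 encoded word."""
    return [(charset, base64.b64decode(payload))
            for charset, payload in ENCODED_WORD.findall(header_value)]

subject = ("[EK-MASI] =?koi8-r?B?IkFydG8ga2FqIGFrdGl2ZWNvIg0KDQojRWtvdG9w"
           "aW8gMjAwNiBaYWplanhv?=\n"
           "=?koi8-r?B?dmEgU2xvdmFraW8j?=")
words = decode_b64_words(subject)
has_embedded_crlf = b"\r\n" in words[0][1]
```

Printing `words` with `repr()` makes the embedded line breaks visible without letting them break the terminal output.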

rtucker commented on June 24, 2024

Niiiice!

See my comment on Issue #10 -- having the "raw" response from the IMAP server helps with testing the weird ones.
