
rtucker / imap2maildir


Backs up an IMAP mailbox to a maildir. Useful for backing up mail stored on free webmail providers, etc.

Home Page: http://blog.hoopycat.com/index.php/2009/07/04/imap2maildir-a-tool-for-mirroring-imap-t

License: MIT License

Python 40.30% HTML 59.70%

imap2maildir's People

Contributors

aaptel, dhellmann, marknenadov, rtucker

imap2maildir's Issues

TypeError in parseFetch

When trying imap2maildir for the first time, I find that it first downloads many (perhaps all) of the messages and then crashes with this exception:

 TypeError: expected string or buffer

If I repeat it, then it crashes immediately (although if new messages have appeared in the meantime it might download them, I haven't checked). I am running commit 9d8a6ef, under Python v 2.7.2, on Ubuntu 11.10, the connection is IMAP over SSL.

I'd send a more complete log and a stack trace if I knew where to post it, but in short, the problem is that the "text" parameter to __simplebase.parseFetch is a tuple rather than a string. the value is:

text[0] == '182 (UID 183 RFC822.SIZE 3297 INTERNALDATE "08-Dec-2011 16:40:03 +0100" ENVELOPE ("Thu, 8 Dec 2011 16:40:03 +0100" {79}'

text[1] == (the subject line of one of my emails)

This tuple is the first "data" element returned from the FETCH command in get_summary_by_uid. My reading of the imaplib documentation makes me expect a tuple in this situation (although the unbalanced parentheses in text[0] surprise me).

Finally, I can't get away with simple tricks like sending data[0][0] or ' '.join(data[0]) to parseFetch; there seems to be some nontrivial change in format. My imaplib.py says its version is 2.58.

Cheers.
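
For context on the tuple: imaplib returns a (header, payload) tuple whenever the server sends an IMAP literal (the trailing {79} above), and the rest of the response follows as further list items. A minimal sketch of stitching those pieces back into one balanced string is below. The helper name flatten_fetch_data is hypothetical and is not part of imap2maildir; it uses str items as in the Python 2.7 report, so on Python 3 the bytes would need decoding first.

```python
import re

# A literal announcement such as "{79}" at the very end of a header piece.
LITERAL_MARKER = re.compile(r'\{\d+\}$')

def flatten_fetch_data(data):
    """Join imaplib FETCH response pieces into a single string.

    Tuple items are (header ending in '{n}', literal payload). The payload
    is re-inserted as a quoted string so a parser sees balanced text.
    """
    parts = []
    for item in data:
        if isinstance(item, tuple):
            header, literal = item
            header = LITERAL_MARKER.sub('', header)
            # Escape backslashes and quotes so the quoted form stays valid.
            escaped = literal.replace('\\', '\\\\').replace('"', '\\"')
            parts.append(header + '"' + escaped + '"')
        else:
            parts.append(item)
    return ''.join(parts)

# Made-up sample shaped like the data described in this issue.
sample = [
    ('1 (UID 2 ENVELOPE ("date" {5}', 'a "b"'),
    ' NIL)',
    ')',
]
print(flatten_fetch_data(sample))
```

Whether parseFetch accepts the re-quoted form depends on how it handles escaped quotes, so treat this as a starting point rather than a fix.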

Better UID handling

Per the RFC: http://tools.ietf.org/html/rfc3501#section-2.3.1.1

  1. The combination of mailbox name, uidvalidity value, and uid is unique on a server
  2. If uidvalidity increments, uids are no longer guaranteed to be as they were for a lower uidvalidity
  3. uids are assigned in a strictly ascending order

I think this could be used to improve uid handling a lot.
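
One way these three properties could be used, as a rough sketch only: key the seen-message cache on (mailbox, uidvalidity, uid) and ask the server only for UIDs above the highest one already stored. The table layout and the function names open_cache and next_uid_range are assumptions for illustration, not the schema imap2maildir actually uses.

```python
import sqlite3

def open_cache(path=':memory:'):
    """Create a cache keyed on the triple the RFC guarantees to be unique."""
    db = sqlite3.connect(path)
    db.execute(
        'CREATE TABLE IF NOT EXISTS seen ('
        ' mailbox TEXT, uidvalidity INTEGER, uid INTEGER, mailfile TEXT,'
        ' PRIMARY KEY (mailbox, uidvalidity, uid))'
    )
    return db

def next_uid_range(db, mailbox, uidvalidity):
    """Return a UID set covering only messages newer than the cache.

    Because UIDs ascend strictly, everything new has a UID above the
    highest cached one. A changed uidvalidity matches no rows, so the
    whole mailbox is requested again.
    """
    row = db.execute(
        'SELECT MAX(uid) FROM seen WHERE mailbox=? AND uidvalidity=?',
        (mailbox, uidvalidity),
    ).fetchone()
    highest = row[0] or 0
    # Note: per RFC 3501 a range "n:*" always includes the last message,
    # even when n is above every assigned UID, so callers should still
    # skip UIDs they already hold.
    return '%d:*' % (highest + 1)

db = open_cache()
db.execute("INSERT INTO seen VALUES ('INBOX', 1, 183, 'cur/example')")
print(next_uid_range(db, 'INBOX', 1))   # same uidvalidity
print(next_uid_range(db, 'INBOX', 2))   # uidvalidity changed
```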

Search criteria problem

I tried "FROM foo" and "KEYWORD bar" but got UID errors. Are search criteria of more than one word not implemented?

(Tried all (?) kinds of quoting to get the second word in, with no success.)

/Peter Svanberg, Sweden
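
A possible angle, not verified against the imap2maildir source: imaplib expects each search key and value as a separate argument, so a single string like "FROM foo" may need splitting before it is passed on. The sketch below uses a hypothetical helper, split_search_criteria, to do that; values containing spaces are re-quoted because imaplib sends arguments as given.

```python
import shlex

def split_search_criteria(criteria):
    """Split a criteria string into separate IMAP SEARCH arguments.

    'FROM foo' becomes ['FROM', 'foo']; a quoted multi-word value is kept
    as one argument and wrapped in IMAP double quotes again.
    """
    args = []
    for token in shlex.split(criteria):
        if ' ' in token:
            token = '"%s"' % token.replace('\\', '\\\\').replace('"', '\\"')
        args.append(token)
    return args

print(split_search_criteria('FROM foo'))
print(split_search_criteria('SUBJECT "hello world" SEEN'))
```

With an open imaplib connection named conn (assumed here), the result could be passed as conn.uid('SEARCH', *split_search_criteria('FROM foo')).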

Missing Chat Logs

I am pretty sure this doesn't download chatlogs. Is it by design? How hard would it be to get the chat logs as well?

Thanks a lot.

Bad quote_re

I noticed the quote_re failed for a letter I have. It has the quoted text:

"blablabla"

I updated quote_re in simpleimap.py to get it working to this:
quoted_re = re.compile(r'^"((?:[^"]|(?:\)|"|))"')
while you had:
quoted_re = re.compile(r'^"((?:[^"]|"|)
?)"')

Do you think it will work?

Notice that I did not understand the last '?' you had so I removed it.
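
For comparison, here is a commonly used pattern for an IMAP-style quoted string that allows backslash escapes inside the quotes. It is a sketch only: it is not the regex from simpleimap.py, and the names QUOTED_RE and read_quoted are made up for this example.

```python
import re

# Opening quote, then any run of (non-quote, non-backslash characters or a
# backslash followed by any one character), then the closing quote.
QUOTED_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"')

def read_quoted(text):
    """Return (unescaped value, remaining text) for a leading quoted string."""
    m = QUOTED_RE.match(text)
    if m is None:
        raise ValueError('no quoted string at start of %r' % text)
    # Drop the escaping backslash in front of each escaped character.
    value = re.sub(r'\\(.)', r'\1', m.group(1))
    return value, text[m.end():]

print(read_quoted(r'"bla \"bla\" bla" NIL'))
print(read_quoted(r'"ends with backslash \\" NIL'))
```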

"Hole" in mailbox not being properly handled

rtucker@arrogant-bastard:~$ /home/rtucker/bin/imap2maildir -m 1 -c /home/rtucker/etc/imap2maildir.conf -v
Opening sqlite3 database '/home/rtucker/Backups/Gmail/.imap2maildir.sqlite'
Attempting SSL connection to imap.gmail.com:993
* OK Gimap ready for requests from 74.67.182.52 v32if332749wah.16
Connected to 209.85.147.109:993 using a SSL connection.
Selected mailbox [Gmail]/All Mail, message count: 89582
Preparing to fetch batch from 31414 to 31414
Num: 31414 UID: 35174 Size:    7878 Date:      "22-May-2006 17:14:43 +0000" Msgid:<25722210.1148318082509.JavaMail.evite@www3>
List from 31414 to 31414 complete, 1 received, 7878 bytes found
High water mark from last: seq 31414.
Retrieving messages from 31414 to 31415...
Preparing to fetch batch from 31414 to 31415
Traceback (most recent call last):
  File "/home/rtucker/bin/imap2maildir", line 514, in <module>
    main()
  File "/home/rtucker/bin/imap2maildir", line 486, in main
    dict = retreive_imap_message_listchunk(imap, rangestart, rangeend)
  File "/home/rtucker/bin/imap2maildir", line 136, in retreive_imap_message_listchunk
    raise IOError("Unexpected response from IMAP server: %s" % `response`)
IOError: Unexpected response from IMAP server: ('NO', ['Some messages could not be FETCHed (Failure)'])

in manual testing:

>>> imap.fetch(31415,'(UID ENVELOPE RFC822.SIZE INTERNALDATE)')
('NO', ['Some messages could not be FETCHed (Failure)'])

this is going to be a fun one, isn't it?
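
One possible workaround, sketched under assumptions: when a range FETCH comes back as NO, retry the range one sequence number at a time and record the ones the server refuses. The function name fetch_range_tolerant and the FakeImap stub are hypothetical; the stub only imitates the ('NO', [...]) reply shown above so the example runs without a server.

```python
SUMMARY_PARTS = '(UID ENVELOPE RFC822.SIZE INTERNALDATE)'

def fetch_range_tolerant(imap, start, end, parts=SUMMARY_PARTS):
    """Fetch start..end, falling back to per-message fetches on failure.

    Returns (data, failed), where failed lists the sequence numbers the
    server would not return.
    """
    typ, data = imap.fetch('%d:%d' % (start, end), parts)
    if typ == 'OK':
        return data, []
    good, failed = [], []
    for seq in range(start, end + 1):
        typ, data = imap.fetch(str(seq), parts)
        if typ == 'OK':
            good.extend(data)
        else:
            failed.append(seq)
    return good, failed

class FakeImap(object):
    """Stand-in for an imaplib connection with a hole at message 31415."""
    def fetch(self, message_set, parts):
        if '31415' in message_set:
            return 'NO', ['Some messages could not be FETCHed (Failure)']
        return 'OK', ['%s (UID 1)' % message_set]

print(fetch_range_tolerant(FakeImap(), 31414, 31415))
```

This trades one round trip for many when a batch fails, so it would probably only be worth doing for the batch that hit the error.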

Turbo mode skipping emails it shouldn't skip

I have some folders on my IMAP servers where imap2maildir's turbo mode is too eager and skips ("turbos") new emails it has not downloaded yet. It is very annoying. I have to use non-turbo mode on those folders, and it takes a long time to run.

$ imap2maildir -c imap2md.conf -v --create -d archive/main -r INBOX/main -s ALL
Opening sqlite3 database 'archive/main/.imap2maildir.sqlite'
Synchronizing 833 messages from gwmail.emea.novell.com:INBOX/main to /home/aaptel/mails/archive/main...
TURBO MODE ENGAGED!
Populating hash cache...
Hash cache: 2224 hashes
Populating uid cache...
Uid cache: 1405 uids
FINISHED: Turboed 833, handled 0, copied 0 (0 bytes), last UID was 0

I've added a few print statements here and there (simpleimap.py Summaries)

def Summaries(self, search='ALL'):
    """ Summaries
    """

    if self.__turbo:
        self.__parent.select(self.__folder, readonly=True)
        for u in self.Uids(search=search):
            if not self.__turbo(u):
                print(' not TURBO u= '+repr(u))
                try:
                    summ = self.__parent.get_summary_by_uid(u)
                    if summ:
                        yield summ
                except Exception:
                    logging.exception("Couldn't retrieve uid %s", u)
                    continue
            else:
                print('     TURBO u= '+repr(u)) ### print 833 times here
                # long hangtimes can suck
                self.__keepaliver()
                self.__turbocounter += 1
    else:
        for s in self.__parent.get_summaries_by_folder(self.__folder, self.__charset, search):
            yield s

And in imap2maildir check_message()

elif uid:
    print("u=%s uid"%uid) ### prints 833 times here
    if str(uid) in seencache.uids:
        mailfile = seencache.uids[str(uid)]
    else:
        c.execute('select mailfile from seenmessages where uid=?', (uid,))
        row = c.fetchone()
        if row:
            log.debug("Cache miss on uid %s" % uid)
            mailfile = row[0]
        else:
            return False

I'm not sure what is happening but it is super annoying >:(

Invalid mailbox name

I'm trying to access a Dovecot IMAP maildir, but no matter what I specify as the remote folder I get the error "Exception: folder Maildir: Mailbox doesn't exist: Maildir".

--remote-folder="Maildir"

In my installation I have the following setup:

/var/vmail/[email protected]/Maildir/

In that there is cur/ new/ tmp/ .INBOX.Drafts/ .INBOX.Sent/ .INBOX.Trash/ .Queue/

How do I fetch all that then?

Gmail requires localized --remote-folder="[Gmail]/All Mail" parameter

At first I thought I was confronted with the UK trademark bug mentioned in the README, because I received the Exception: folder [Gmail]/All Mail: [NONEXISTENT] Unknown Mailbox: [Gmail]/All Mail (Failure) error (despite being in France).

However after checking the real name of the folder via Thunderbird (via right click on folder > Properties), I noticed it was translated in my language on the server as well.

So I had to use --remote-folder="[Gmail]/Tous les messages" for the script to work. Again, a warning about the necessity to use the translated name when using non-English accounts may come in handy :)

Cheers

Warn that it must be run with python2

Some distros, such as Arch, use Python 3 by default. Maybe add a small warning to the README stating that the program must be run with python2, so that oblivious users such as myself won't be left staring at the many errors it spits out :)

Better handling of large mailboxes (make -m moot)

Right now, there's a per-run limit, because each message being checked requires that much more RAM. Making this more smart would be a good idea, so it could make it all the way through a mailbox without using a lot of RAM.

A way to do this might be to wrap things in a loop and do 1000 messages at a time. Or, using a temp file to store the FETCH results from the server and then iterating through there could do the trick.
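
The "1000 messages at a time" idea could look roughly like this. The generator name batches is made up for this sketch, and the loop body that would fetch and store each range is left out.

```python
def batches(first, last, size=1000):
    """Yield inclusive (start, end) ranges covering first..last."""
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1

# Each range would be fetched and written out before the next one is
# requested, so memory use is bounded by the batch size.
for start, end in batches(1, 2500):
    print('would fetch %d:%d' % (start, end))
```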

Couldn't retrieve uid b'116'

Attempting to download a regular IMAP folder I'm getting the following traceback:

Synchronizing 23 messages from hostname:INBOX to /home/marius/hostname/username...
ERROR:root:Couldn't retrieve uid b'116'
Traceback (most recent call last):
  File "/home/marius/src/imap2maildir/simpleimap.py", line 445, in Summaries
    summ = self.__parent.get_summary_by_uid(u)
  File "/home/marius/src/imap2maildir/simpleimap.py", line 303, in get_summary_by_uid
    return self.parse_summary_data(data)
  File "/home/marius/src/imap2maildir/simpleimap.py", line 334, in parse_summary_data
    combined_data = ' '.join(data)
TypeError: sequence item 0: expected str instance, bytes found
FINISHED: Turboed 0, handled 0, copied 0 (0 bytes), last UID was 0
INFO:main:FINISHED: Turboed 0, handled 0, copied 0 (0 bytes), last UID was 0

The command I'm using is:

$ ./imap2maildir.py -H hostname -u username -d -r "INBOX" --create
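
The b'116' and "expected str instance, bytes found" wording suggests the script is running under Python 3, where imaplib returns bytes instead of str. A minimal sketch of decoding before joining is below; join_fetch_data is a hypothetical helper, the UTF-8 choice is an assumption, and tuple items (literals) are not handled here.

```python
def join_fetch_data(data, encoding='utf-8'):
    """Join response pieces with spaces, decoding bytes items first."""
    pieces = []
    for item in data:
        if isinstance(item, bytes):
            # 'replace' avoids a crash on bytes that are not valid UTF-8.
            item = item.decode(encoding, 'replace')
        pieces.append(item)
    return ' '.join(pieces)

print(join_fetch_data([b'1 (UID 116', b')']))
```

This only addresses the one join in the traceback; other Python 2 assumptions in the code would likely surface next.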

Abstract out the IMAP stuff into a simpleimap module

There's been some demand for a simpleimap module, which would present an interface that's more like the mailbox module. Ripping out the IMAP stuff into its own module would go a long way towards doing this, and would make future IMAP tools suck less.

Work around Google marking fetched messages as read

From an e-mail thread:

I've run into the same thing... I don't think it is "intentionally" being marked as read by the script or imaplib, but since the message is being downloaded, Google is marking it as read. I first noticed this around September of last year, if I recall correctly.

It has been awhile since I poked at it, but I think I tried two different approaches to work around this:

  1. Remember the original flags for the message, fetch the message, then set the flags back to the way they were.
  2. Hide the problem by only considering messages marked as "SEEN"

I ended up with the second solution (--search=CRITERIA, default to "SEEN"), in commit a708c65. This seems to work well with my workflow (run it nightly to back up the day's read messages), but revisiting the solution would be a very good idea. -rt
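
A third option that is not mentioned in the thread, offered as an untested sketch: RFC 3501 defines BODY.PEEK[] as a form of BODY[] that does not implicitly set the \Seen flag, and selecting the mailbox with readonly=True is another way to avoid flag changes. The function name and the RecordingImap stub below are hypothetical; the stub just records the command so the example runs offline.

```python
def fetch_without_marking_read(imap, uid):
    """Fetch a full message by UID using BODY.PEEK[] so \\Seen is not set."""
    typ, data = imap.uid('FETCH', str(uid), '(BODY.PEEK[])')
    if typ != 'OK':
        raise IOError('FETCH failed for uid %s: %r' % (uid, data))
    # data[0] is a (header, payload) tuple; the payload is the message.
    return data[0][1]

class RecordingImap(object):
    """Stand-in connection that records commands and returns canned data."""
    def __init__(self):
        self.commands = []

    def uid(self, command, *args):
        self.commands.append((command,) + args)
        return 'OK', [(b'1 (UID 7 BODY[] {5}', b'hello'), b')']

imap = RecordingImap()
print(fetch_without_marking_read(imap, 7))
print(imap.commands)
```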

Failure for ~200k messages

Hello,

After four or five interrupted runs, I ended up having downloaded about 47k of the 200k messages:
$ ls gmailbackup/new/ | wc -l
43815

I run the command, to grab all the messages:
$ python imap2maildir -u xxxxxx -r "[Gmail]/Tots els missatges" -s ALL --create -v -d gmailbackup

and on every run, I'm asked for the password, and then it goes:
Opening sqlite3 database 'gmailbackup/.imap2maildir.sqlite'
Synchronizing 199663 messages from imap.gmail.com:[Gmail]/Tots els missatges to /home/llbatlle/tmp/rtucker-imap2maildir-fa0abe3/gmailbackup...
TURBO MODE ENGAGED!
Exception! Clearing locks and safing database.
Traceback (most recent call last):
  File "imap2maildir", line 495, in <module>
    main()
  File "imap2maildir", line 476, in main
    search=options.search)
  File "imap2maildir", line 396, in copy_messages_by_folder
    for i in folder.Summaries(search=search):
  File "/home/llbatlle/tmp/rtucker-imap2maildir-fa0abe3/simpleimap.py", line 357, in Summaries
    summ = self.__parent.get_summary_by_uid(u)
  File "/home/llbatlle/tmp/rtucker-imap2maildir-fa0abe3/simpleimap.py", line 256, in get_summary_by_uid
    '(UID ENVELOPE RFC822.SIZE INTERNALDATE)')
  File "/nix/store/qlmlvbsgb3q8iqlhkc7j8m6f9z71sbd6-python-2.6.5/lib/python2.6/imaplib.py", line 753, in uid
    typ, dat = self._simple_command(name, command, *args)
  File "/nix/store/qlmlvbsgb3q8iqlhkc7j8m6f9z71sbd6-python-2.6.5/lib/python2.6/imaplib.py", line 1060, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/nix/store/qlmlvbsgb3q8iqlhkc7j8m6f9z71sbd6-python-2.6.5/lib/python2.6/imaplib.py", line 890, in _command_complete
    raise self.abort('command: %s => %s' % (name, val))
imaplib.abort: command: UID => socket error: unterminated line

I cannot download anymore. It takes quite a lot of time until the error appears. Can it be that gmail disconnects due to an inactivity timeout?
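
Whether an inactivity timeout is the cause is not established here, but if the connection is being dropped, one general approach is to catch imaplib's abort exception, reconnect, and retry. The sketch below is an assumption-laden illustration: with_reconnect, connect, and action are hypothetical names, and connect would need to log in and select the folder again.

```python
import imaplib
import time

def with_reconnect(connect, action, attempts=3, delay=0):
    """Run action(imap); on a dropped connection, reconnect and retry.

    connect() must return a ready-to-use connection. imaplib raises
    IMAP4.abort for socket-level failures such as the one in the traceback.
    """
    imap = connect()
    for attempt in range(1, attempts + 1):
        try:
            return action(imap)
        except imaplib.IMAP4.abort:
            if attempt == attempts:
                raise
            time.sleep(delay)
            imap = connect()

# Offline demonstration: the first call fails, the second succeeds.
calls = {'connect': 0, 'action': 0}

def fake_connect():
    calls['connect'] += 1
    return object()

def flaky_action(imap):
    calls['action'] += 1
    if calls['action'] == 1:
        raise imaplib.IMAP4.abort('socket error: unterminated line')
    return 'ok'

result = with_reconnect(fake_connect, flaky_action)
print(result, calls)
```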

Handling \\ in the subject line

First off THANK YOU for putting this together. I'm extraordinarily happy with the results.

I did want to fire off a quick note that emails with a subject line containing "\\" are failing, presumably because it isn't handled during parsing.

I made a quick change to simpleimap.py in order to find the offending messages and simply changed their tag to get them out of the folder I was syncing.

On line 75:
raise ValueError('Unexpected parenthesis at pos %(pos)d text %(text)s' % {'pos':pos, 'text': text})

The addition of the text variable shows enough information about the message to find it and move it or delete it for now. I only had two of these messages so I didn't get into the code deep enough to come up with a resolution to the problem.

Trouble pulling down all mail from Outlook.com IMAP?

It seems that by default, we assume the IMAP account is Gmail:

Opening sqlite3 database '/Users/victorhooi/Documents/melmail/.imap2maildir.sqlite'
Exception!  Clearing locks and safing database.
Traceback (most recent call last):
  File "./imap2maildir", line 577, in main
    seencache=seencache)
  File "./imap2maildir", line 464, in copy_messages_by_folder
    outdict['total'] = len(folder)
  File "/Users/victorhooi/code/imap2maildir/simpleimap.py", line 390, in __len__
    raise Exception('folder %s: %s' % (self.__folder, data[0]))
Exception: folder [Gmail]/All Mail: [TRYCREATE] Specified mailbox does not exist.
Traceback (most recent call last):
  File "./imap2maildir", line 596, in <module>
    main()
  File "./imap2maildir", line 577, in main
    seencache=seencache)
  File "./imap2maildir", line 464, in copy_messages_by_folder
    outdict['total'] = len(folder)
  File "/Users/victorhooi/code/imap2maildir/simpleimap.py", line 390, in __len__
    raise Exception('folder %s: %s' % (self.__folder, data[0]))
Exception: folder [Gmail]/All Mail: [TRYCREATE] Specified mailbox does not exist.

This should probably be documented somewhere in the README.

In my case, I'm trying to pull down an Outlook.com address - I can use --remote-folder="FOLDER" to pull down specific folders, but there doesn't seem to be any way to pull down all folders and email?

Is it possible to enable imap2maildir to enumerate all folders from the root, and pull them down? This seems like it might be a useful feature. Apologies if I've somehow missed it.
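
For what it's worth, imaplib can list the folders on a server, so enumeration outside the tool is possible. The sketch below parses LIST response lines and skips folders flagged \Noselect; the function name selectable_folders and the FakeImap stub are hypothetical, the regex covers only the simple quoted-name form, and folder names arrive in the server's own encoding.

```python
import re

# "(flags) delimiter name", for example: (\HasNoChildren) "/" "INBOX"
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?:"[^"]*"|NIL) (?P<name>.*)')

def selectable_folders(imap):
    """Return the names of all folders that can be selected."""
    typ, lines = imap.list()
    if typ != 'OK':
        raise IOError('LIST failed: %r' % (lines,))
    names = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode('utf-8', 'replace')
        if not isinstance(line, str):
            continue  # literal tuples are not handled in this sketch
        m = LIST_LINE.match(line)
        if m is None:
            continue
        flags = [flag.lower() for flag in m.group('flags').split()]
        if '\\noselect' in flags:
            continue
        names.append(m.group('name').strip('"'))
    return names

class FakeImap(object):
    """Stand-in connection returning a canned LIST response."""
    def list(self):
        return 'OK', [
            b'(\\HasNoChildren) "/" "INBOX"',
            b'(\\Noselect \\HasChildren) "/" "[Gmail]"',
            b'(\\HasNoChildren \\All) "/" "[Gmail]/All Mail"',
        ]

print(selectable_folders(FakeImap()))
```

Each returned name could then be passed to a separate run with --remote-folder, one destination maildir per folder.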
