kosat / telegram-messages-dump

Command-line tool to dump message history of a Telegram chat.

License: MIT License

Python 93.80% Shell 6.20%
telegram cli dump utility python travis-ci

telegram-messages-dump's Introduction

Telegram Messages Dump

This is a simple console tool for dumping message history from a Telegram chat into a jsonl, csv or plain text file.

Installation

From PyPI:

pip install telegram-messages-dump

From sources:

Fetch the latest sources with git:

git clone https://github.com/Kosat/telegram-messages-dump.git

Then run directly from sources

cd telegram-messages-dump
python -m telegram_messages_dump

Or run after installing locally

python setup.py install
telegram-messages-dump

Binaries:

Binaries for Linux, Windows and macOS are available in the Releases section.

Usage

Mandatory parameters are <chat_name>, e.g. @Python, @CSharp or the title of a dialogue as seen in the UI, and <phone_num>, a telephone number. The phone number is needed for authentication and is not stored anywhere. After the first successful authorization, the tool creates a telegram_chat_dump.session file containing the auth token. The information from this file is reused in subsequent runs. If this is not the desired behaviour, use the -cl flag to delete the session file on exit.

Note1: You can use a multi-word Telegram dialogue title like so: --chat="Telegram Geeks", with double quotes. However, when using a multi-word title (rather than @channel_name), you need to join the channel first. Only then will you be able to dump it. This way you can dump private dialogues which don't have a @channel_name.

Note2: For private channels you can also pass an invitation link as the chat name, e.g. --chat="https://t.me/joinchat/XXXXXYYYYZZZZZ". IMPORTANT: It only works when you (the logged-in user) have already joined the private chat that the invitation link corresponds to.

telegram-messages-dump -c <chat_name> -p <phone_num> [-l <count>] [-o <file>] [-cl]

Where:
    -c,  --chat     Unique name of a channel/chat. E.g. @python.
    -p,  --phone    Phone number. E.g. +380503211234.
    -o,  --out      Output file name or full path. (Default: telegram_<chatName>.log)
    -e,  --exp      Exporter name. text | jsonl | csv (Default: 'text')
      ,  --continue Continue previous dump. Supports optional integer param <message_id>.
    -l,  --limit    Number of the latest messages to dump, 0 means no limit. (Default: 100)
    -cl, --clean    Clean session sensitive data (e.g. auth token) on exit. (Default: False)
    -v,  --verbose  Verbose mode. (Default: False)
      ,  --addbom   Add BOM to the beginning of the output file. (Default: False)
    -h,  --help     Show this help message and exit.

Incremental/Continuous mode

After dumping messages into an output file, telegram-messages-dump also creates a meta file with the latest (biggest) message id that was successfully saved into an output file. For instance, if messages with ids 10..100 were saved in output file, the metafile will contain the "latest_message_id": 100 record in it.
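As a rough illustration of how the saved id could be read back, here is a minimal sketch. It assumes the meta file is a JSON object holding the "latest_message_id" key quoted above; the actual on-disk format is not spelled out here, and the helper name read_latest_message_id is hypothetical.

```python
import json
import os
import tempfile


def read_latest_message_id(meta_path):
    """Return the saved latest message id, or None if no meta file exists.

    Assumption: the meta file is a JSON object with a "latest_message_id" key.
    """
    if not os.path.exists(meta_path):
        return None
    with open(meta_path, encoding="utf-8") as f:
        meta = json.load(f)
    return meta.get("latest_message_id")


# Demonstration with a throwaway meta file (the path is made up for the example).
demo_dir = tempfile.mkdtemp()
demo_meta = os.path.join(demo_dir, "xyz.txt.meta")
with open(demo_meta, "w", encoding="utf-8") as f:
    json.dump({"latest_message_id": 100}, f)

latest_id = read_latest_message_id(demo_meta)
missing_id = read_latest_message_id(os.path.join(demo_dir, "absent.meta"))
```

Returning None for a missing file lets a caller decide between a fresh dump and an explicit --continue=<message_id> run.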

  • If you want to update an existing dump file, use the --continue option without a parameter value. In this case telegram-messages-dump will read the latest message id from a meta file. In the sample below it will be C:\temp\xyz.txt.meta:
    telegram-messages-dump -p... -oC:\temp\xyz.txt  --continue
    
    In this case telegram-messages-dump will look for the C:\temp\xyz.txt.meta file and try to incrementally update the contents of C:\temp\xyz.txt with new messages.

Note: In incremental mode, when a metafile exists, --exp and --chat are taken from the metafile and must NOT be specified explicitly as parameters. The --limit setting must be omitted.

  • Otherwise, if you DON'T have a metafile or want to ignore it, you can still open your dump file and find the last message's id at the bottom of the file and then specify it explicitly as --continue=<LAST_MSG_ID> command, along with the correct --exp and --chat that were used to generate the existing dump file.
    telegram-messages-dump -p... -oC:\temp\xyz.txt --continue=100500 --exp=jsonl --chat=@geekschat
    

In both aforementioned cases, telegram-messages-dump will open the existing C:\temp\xyz.txt file and append the newer messages that were posted in the Telegram chat since the message with id 100500 was created.

Note1: There must be an = sign between the --continue option name and the integer message id.

Note2: In incremental mode without a metafile, --out, --exp and --chat must be specified explicitly as parameters. The --limit setting must be omitted.

Notes

  • This tool relies on Telethon - a Telegram client implementation in Python.

Plugins

Output format is managed by exporter plugins. Currently there are three exporters available: text, jsonl and csv. Exporters reside in the ./exporters subfolder. Basically, an exporter is a class that implements three methods, among them:

  • format(...) that extracts all necessary data from a message and stringifies it.
  • begin_final_file(...) that allows an exporter to write a preamble to a resulting output file.

To use a custom exporter, place your .py file with a class implementing those 3 methods into the ./exporters subfolder and specify its name in the --exp <exporter_name> setting.

Note1: the class name MUST exactly match the file name of its .py file. This very same name is used as an argument for the --exp setting.
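To give a feel for the shape of such a plugin, here is a loose sketch of an exporter class. The argument lists are elided above as (...), so the signatures used here, the attribute names msg.id and msg.message, and the exporter name shortline are all hypothetical and should be checked against the bundled exporters before use. Only the two methods named above are sketched.

```python
import io
from types import SimpleNamespace


class shortline:
    """Hypothetical exporter; it would live in ./exporters/shortline.py.

    The class name matches the file name, as Note1 requires.
    """

    def format(self, msg):
        # Assumption: `msg` exposes `id` and `message` attributes.
        text = (msg.message or "").replace("\n", " ")
        return f"[{msg.id}] {text}"

    def begin_final_file(self, resulting_file):
        # Assumption: the tool passes a writable text file object.
        resulting_file.write("# dump produced by the 'shortline' exporter\n")


# Stand-alone demonstration with a fake message object.
exporter = shortline()
out = io.StringIO()
exporter.begin_final_file(out)
out.write(exporter.format(SimpleNamespace(id=42, message="hello\nworld")) + "\n")
demo_output = out.getvalue()
```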

Note2: in the .vscode subfolder you can find the default settings that I use for debugging this project.

License

This project is licensed under the MIT license.

telegram-messages-dump's People

Contributors

foxcpp, janclarin, kosat, teoretic6, wowkin2

telegram-messages-dump's Issues

Mixed message_ids

I want to retrieve the history of a channel that has 22K messages.
When I dump it with this tool, the message ids are sometimes out of order.
I tried to get the last 2200 messages and got them in the following order:

200 items: #18854-19087
1000 items: #20243-21342
1000 items: #19088-20242

3500 messages:

500: #17269-17925
1000: #20243-21342
1000: #19088-20242
1000: #17928-19087

ERROR:Uncaught exception occured. RPCError 406: UPDATE_APP_TO_LOGIN (caused by SendCodeRequest)

When running against any public channel the script (installed with pip install telegram-messages-dump) can't dump anything:

Try to load exporter 'text.py'...  OK!
INFO:Initializing session...
Connecting to Telegram servers...
INFO:Connecting to 149.154.167.51:443/TcpFull...
INFO:Connection to 149.154.167.51:443/TcpFull complete!
Initial connection failed.
First run. Sending code request...
ERROR:Uncaught exception occured. RPCError 406: UPDATE_APP_TO_LOGIN (caused by SendCodeRequest)
0 messages were successfully written in the resulting file. Done!

Save history of groups without public link

It would be nice to be able to save messages from groups without public link (private groups), something like this:

~> python3 -m telegram_messages_dump -c "group name here" -p ... -l 0

but:

__main__.py: error: Chat name must start with "@"

-l 0 takes 0 messages, not all messages from the group

I tried running "telegram-messages-dump --chat="----" -p +3----- -e csv -l 0" and it reports:

Dumping 0 messages into "telegram_--aDeZaBA.log" file ...
Merging results into an output file.
INFO:Writing a new metadata file.
0 messages were successfully written in the resulting file. Done!

Uncaught exception occured

Hello,

Could you please correct this error:

ERROR:Uncaught exception occured. Cannot cast NoneType to any kind of InputPeer.
0 messages were successfully written in the resulting file. Done!

This is a private channel. I tried 3 different ones and each throws the same error. I created a test channel with dummy messages and also got the same error.

Best regards

Can't dump user messages

I'm not able to dump user messages.

image

It finds the user, and seemingly gets the list of messages, but it then loops Invoking GetHistoryRequest until I manually interrupt the process.

I am able to do chats/channels without any issues.

More robust error handling

  • Clean temporary files in case of error
~> ls /tmp
*snip*
 tmp05du3ixs
 tmp0n247tf3
 tmp0vvu04yo
 tmp1q2n_ihv
 tmp3kj7o7gi
 tmp62emsoxb
 tmp6wj05ie0
 tmp7v_8i3p7
 tmpa1rkn8yq
 tmpa48ct2z3
 tmpbz8527p6
 tmpe8u4ckfp
 tmpg76f675h
 tmph2d3ne26
 tmpi5jva1i6
 tmpj8ccctns
 tmpj8reg5p8
 tmpjo436qv3
 tmpkdep2ist
 tmpl4gpptbd
 tmplu0xi49a
 tmpm73f50ww
 tmpojemi7nr
 tmppgw7_h38
 tmpq2_x5se2
 tmpqxhyt0ui
 tmp_uqgjofc
 tmpusg7_s9m
 tmpux0hldrk
 tmpuzuebz81
 tmp_wgk9u3o
 tmpyrd71m5j
 tmpzmjqufsj
 tmpznp5vvhf
  • Retry X times in case of RPC error (sometimes I get "Telegram is having internal issues now, retry later"). Would be useful when dumping full history of big chats.

  • Provide more user-friendly error message instead of scary stacktrace.

--
Thanks for such a good dumper, finally we can replace broken and abandoned telegram-history-dump ;)
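
The retry request in the list above could be sketched roughly as follows. This is a generic helper in plain Python, not code from telegram-messages-dump or Telethon; the names retry and flaky_request, and all parameters, are hypothetical.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, pausing longer after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < attempts:
                # Linear backoff: wait delay, then 2*delay, then 3*delay, ...
                sleep(delay * attempt)
    raise last_error


# Stand-in for a request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Telegram is having internal issues now, retry later")
    return "ok"


# sleep is replaced with a no-op so the demonstration does not actually wait.
result = retry(flaky_request, attempts=5, exceptions=(ConnectionError,), sleep=lambda s: None)
```

Restricting `exceptions` to the error types that are known to be transient keeps genuine bugs from being retried silently.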

Export usernames

Hi,
Is it possible to export displayed usernames (together with Telegram ids)?
Thanks,

idk

I did not receive the OTP.

I always get this error:

Nobody is using this username, or the username is unacceptable. If the latter,
it must match r"[a-zA-Z][\w\d]{3,30}[a-zA-Z\d]"').

Are you familiar with it?
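
For reference, the pattern quoted in the error message can be tried out directly. The sketch below assumes the whole name (without a leading @) has to match; whether the library anchors the pattern exactly this way is an assumption, and looks_like_username is a hypothetical helper.

```python
import re

# Pattern copied verbatim from the error message above.
USERNAME_RE = re.compile(r"[a-zA-Z][\w\d]{3,30}[a-zA-Z\d]")


def looks_like_username(name):
    """Return True if `name` (with any leading @ removed) fits the quoted pattern."""
    return USERNAME_RE.fullmatch(name.lstrip("@")) is not None


checks = {
    name: looks_like_username(name)
    for name in ["python", "@CSharp", "Telegram Geeks", "abc", "1python"]
}
```

Under this reading a name needs at least five characters, must start with a letter, and cannot contain spaces, so a multi-word dialogue title never qualifies as a username.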

I get an error when running telegram-messages-dump

I'm trying to run telegram-messages-dump on Windows 10. I usually clone and install python. But I get the error shown in the image when running the telegram-messages-dump command.

Any ideas of what I may be doing wrong, or is it an issue with windows?

image

phone banned

My phone was banned shortly after using this to download messages

Consider removing UTF-8 Byte Order Mark

The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted from another stream that contained an optional BOM. The standard also does not recommend removing a BOM when it is there, so that round-tripping between encodings does not lose information, and so that code that relies on it continues to work. The IETF recommends that if a protocol either (a) always uses UTF-8, or (b) has some other way to indicate what encoding is being used, then it "SHOULD forbid use of U+FEFF as a signature."

-- https://en.wikipedia.org/wiki/Byte_order_mark

Additionally, a BOM can cause issues in Unicode-unaware programs (and even in aware ones: json.loads fails if a BOM is present).
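
The json.loads behaviour mentioned above can be reproduced with a short sketch using only the standard library. The sample payload is made up for the illustration.

```python
import json

payload = '{"id": 1}'
with_bom = "\ufeff" + payload  # text that still carries a leading BOM character

try:
    json.loads(with_bom)
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

# Decoding the raw bytes with "utf-8-sig" drops a leading BOM if one is present.
raw = with_bom.encode("utf-8")
parsed = json.loads(raw.decode("utf-8-sig"))
```

Readers of a dump written with a BOM can therefore open the file with encoding="utf-8-sig" as a workaround.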

TypeError after some time

 $ python3 -m telegram_messages_dump -c @... -p ... -l 0                                                                                    
Initializing session...
Connecting to Telegram servers...
Chat name @... resolved into channel id=...
Dumping all messages into "telegram_....log" file ...
/usr/lib/python3.6/site-packages/telethon/telegram_client.py:1092: UserWarning: get_message_history is deprecated, use get_messages instead
  'get_message_history is deprecated, use get_messages instead'
Processing messages with ids 604580-604481 ...
Processing messages with ids 604480-604376 ...
Processing messages with ids 604375-604275 ...
Processing messages with ids 604274-604166 ...
Processing messages with ids 604165-604057 ...
Processing messages with ids 604055-603948 ...
Processing messages with ids 603947-603844 ...
Processing messages with ids 603843-603741 ...
Processing messages with ids 603740-603638 ...
Processing messages with ids 603637-603528 ...
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/fox/Code/python/telegram-messages-dump/telegram_messages_dump/__main__.py", line 6, in <module>
    run.main()
  File "/home/fox/Code/python/telegram-messages-dump/telegram_messages_dump/run.py", line 28, in main
    TelegramDumper(os.path.basename(__file__), settings).run()
  File "/home/fox/Code/python/telegram-messages-dump/telegram_messages_dump/telegram_dumper.py", line 82, in run
    count = self.dump_messages_in_file(chat)
  File "/home/fox/Code/python/telegram-messages-dump/telegram_messages_dump/telegram_dumper.py", line 215, in dump_messages_in_file
    with tempfile.TemporaryFile(mode='w+', encoding='utf-8', delete=False) as tf:
TypeError: TemporaryFile() got an unexpected keyword argument 'delete'

According to https://docs.python.org/3.6/library/tempfile.html, there is no "delete" argument in the tempfile.TemporaryFile function, but it is present in tempfile.NamedTemporaryFile.
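
A minimal sketch of the substitution suggested by that observation: the same call with tempfile.NamedTemporaryFile, which does accept delete=False. The written content and variable names are illustrative only, and this is not necessarily how the project fixed it.

```python
import os
import tempfile

# NamedTemporaryFile accepts delete=False, so the file survives the with-block.
with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8", delete=False) as tf:
    tf.write("message line\n")
    temp_path = tf.name

still_exists = os.path.exists(temp_path)
with open(temp_path, encoding="utf-8") as f:
    content = f.read()

# With delete=False the caller is responsible for cleaning up.
os.remove(temp_path)
```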

Latest osx binary is not useable

Running the osx binary from the latest release (0.3.5) raises the error below:

$ ./telegram-messages-dump
Fatal Python error: initfsencoding: unable to load the file system codec
zipimport.ZipImportError: can't find module 'encodings'

Current thread 0x00007fff97c8b380 (most recent call first):
[1]    66361 abort      ./telegram-messages-dump

This seems to be a problem with pyinstaller, which is said to be incompatible with Python 3.7 in this issue: pyenv/pyenv#1095

I tried to build the binary with 3.6.7 on my mac, and the result works properly. Maybe you should restrict the python version to 3.6.X in the travis build script?

ERROR:Uncaught exception occured. 'Chat' object has no attribute 'username'

Hello!

I get this strange error and can't figure out what is causing it:
"ERROR:Uncaught exception occured. 'Chat' object has no attribute 'username'"

I successfully received and entered the verification code from Telegram.
I'm using a multi-word chat name.

Python 3.6.5
Linux Mint 18.3 (if it matters)

screenshot from 2018-07-21 22-30-56

Any ideas on what is causing it?

Using with proxy

I cannot connect to Telegram server unless using a proxy. However, currently there is no way to specify a proxy.

Try to load exporter 'text.py'...  OK!
INFO:Initializing session...
DEBUG:Using selector: EpollSelector
Connecting to Telegram servers...
INFO:Connecting to 149.154.167.51:443/TcpFull...
DEBUG:Connection attempt 1...
WARNING:Attempt 1 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
DEBUG:Connection attempt 2...
WARNING:Attempt 2 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
DEBUG:Connection attempt 3...
WARNING:Attempt 3 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
DEBUG:Connection attempt 4...
WARNING:Attempt 4 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
DEBUG:Connection attempt 5...
WARNING:Attempt 5 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
DEBUG:Connection attempt 6...
WARNING:Attempt 6 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
ERROR:Uncaught exception occured. Connection to Telegram failed 6 time(s)
DEBUG:Make sure there are no temp files left undeleted.
INFO:Session data cleared.
0 messages were successfully written in the resulting file. Done!

embedded link messages

Hello, is there any way to get the link hidden behind the embedded text? I tried to read common.py but couldn't figure out how to extract the URL from the raw message.

any help would be appreciated, thanks. :)

Stops with a limit=0

I tried to export all messages of a chat I have. It stopped working after this log:

Processing messages with ids 44150-42271 ...
Processing messages with ids 42266-41564 ...
Processing messages with ids 41545-41525 ...

This also looks like a way smaller range of ids than all previous logs.

Also, when I Ctrl+C now, it doesn't write anything to the file, even though the message says it does. The file is created, but completely empty with a size of 0.

Increase telethon version to avoid exceptions

I ran into an issue with the currently used Telethon version, 0.18.3:

Traceback (most recent call last):
  File "...\telegram_messages_dump\telegram_dumper.py", line 80, in run
    chatObj = self._getChannel()
  File "...\telegram_messages_dump\telegram_dumper.py", line 173, in _getChannel
    dialogs = self.get_dialogs(limit=None)
  File "...\venv\lib\site-packages\telethon-0.18.3-py3.6.egg\telethon\telegram_client.py", line 619, in get_dialogs
    dialogs = UserList(self.iter_dialogs(*args, **kwargs))
  File "C:\Python36\lib\collections\__init__.py", line 1039, in __init__
    self.data = list(initlist)
  File "...\venv\lib\site-packages\telethon-0.18.3-py3.6.egg\telethon\telegram_client.py", line 600, in iter_dialogs
    yield Dialog(self, d, entities, messages)
  File "...\venv\lib\site-packages\telethon-0.18.3-py3.6.egg\telethon\tl\custom\dialog.py", line 69, in __init__
    self.draft = Draft(client, dialog.peer, dialog.draft)
  File "...\venv\lib\site-packages\telethon-0.18.3-py3.6.egg\telethon\tl\custom\draft.py", line 32, in __init__
    self._text = markdown.unparse(draft.message, draft.entities)
AttributeError: 'DraftMessageEmpty' object has no attribute 'message'

It has already been investigated and fixed in later releases (the latest is 1.0.3).
See discussion: LonamiWebs/Telethon#844

Socket error: timed out

I had an issue with a timeout error.
If you run into something similar, increase the connection timeout from 10 (the default) to 40 in telegram_dumper.py.
The solution comes from LonamiWebs/Telethon#801 (comment).

If anybody else has the same issue, comment here. Maybe we'll need to create a pull request for that.

Error on start (missing module)

 $ python3 -m telegram_messages_dump                                                                                                                                             
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/fox/Code/labyrinth-project/telegram-messages-dump/telegram_messages_dump/__main__.py", line 5, in <module>
    from . import run
  File "/home/fox/Code/labyrinth-project/telegram-messages-dump/telegram_messages_dump/run.py", line 21, in <module>
    from telegram_messages_dump.telegram_dumper import TelegramDumper
  File "/home/fox/Code/labyrinth-project/telegram-messages-dump/telegram_messages_dump/telegram_dumper.py", line 14, in <module>
    from telethon.errors.rpc_errors_420 import FloodWaitError
ModuleNotFoundError: No module named 'telethon.errors.rpc_errors_420'
Version Information
  • Python 3.6.4
  • Telethon 0.17

Incremental dumps

It would be nice for telegram-messages-dump to be able to update existing dumps instead of reprocessing everything (which is very useful if you want to "backup" your chat's history).

Something like this can work:

  1. Read latest message ID from dump file.
  2. Save and append everything after this ID.

Problems:

  • What to do with machine-unfriendly exporters?
  • -l option behavior? Maybe "dump X messages after latest message ID"?

CSV mode, usernames are without quotes

Hi,
First of all, thanks for such a useful utility!

Now, when I tried to export some chat history, I found that one username has ',' in it, and the usernames field is not surrounded by quotes. Can you please change the CSV writer to put quotes around the username field?

Thanks,
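
The quoting behaviour asked for here can be illustrated with Python's standard csv module. This is a stand-alone sketch with made-up row values, not the project's csv exporter.

```python
import csv
import io

buffer = io.StringIO()
# QUOTE_ALL wraps every field in quotes; the default, QUOTE_MINIMAL,
# would quote only fields that contain the delimiter or quote character.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow([101, "Doe, John", "hello"])
line = buffer.getvalue()

# Reading the line back shows the comma stays inside the username field.
rows = list(csv.reader(io.StringIO(line)))
```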
