thephm / message_md

Classes to hold messages and convert them to Markdown, plus supporting methods.

License: Apache License 2.0

Python 100.00%

message_md's Introduction

message_md

Code to hold Messages and convert them to Markdown.

Also includes all the supporting classes for Person, Group, Setting, String, Config, Attachments so the client code only needs to deal with the app-specific parsing of message files.

Configuration

Read the guide to learn how to configure the library.

Command line options

Any app that uses this library will inherit these command line options.

IMPORTANT: By default, the beginning date for parsing is today. This makes it easier (and faster) to get results and confirm everything is working, because you only have to look at a day's worth of messages. Once you're ready to parse everything, use something like `-b 1970-01-01` to get all the messages.

| Argument | Alternate | Description |
|---|---|---|
| `-c` | `--config` | Folder where the configuration files are |
| `-s` | `--sourceFolder` | Folder where the message file is |
| `-f` | `--file` | The filename of the file containing all of the messages to be converted |
| `-o` | `--outputFolder` | Where the resulting Markdown files will go |
| `-l` | `--language` | UI language, defaults to English |
| `-m` | `--mySlug` | Which person in the config file is me e.g. `bob` |
| `-d` | `--debug` | Print debug messages |
| `-b` | `--begin` | The date from which to start converting |
| `-i` | `--imap` | IMAP server address |
| `-r` | `--folders` | IMAP folders to retrieve from |
| `-e` | `--email` | Email address to retrieve from |
| `-p` | `--password` | Email password |
| `-x` | `--max` | The maximum number of messages to process |
| `-a` | `--add` | Add people to the output even if not in `people.json` config file |
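
As a rough illustration of how a few of these options could be declared, here is a minimal `argparse` sketch. It is not the library's actual code: the function name `build_parser` and the subset of options shown are assumptions made for this example. Only the flags themselves and the "begin date defaults to today" behaviour come from the text above.

```python
import argparse
from datetime import date


def build_parser():
    """Hypothetical parser covering a few of the options listed above."""
    parser = argparse.ArgumentParser(description="Convert messages to Markdown")
    parser.add_argument("-c", "--config", help="folder where the configuration files are")
    parser.add_argument("-s", "--sourceFolder", help="folder where the message file is")
    parser.add_argument("-o", "--outputFolder", help="where the Markdown files will go")
    parser.add_argument("-d", "--debug", action="store_true", help="print debug messages")
    # Default to today so a first run only covers one day's worth of messages.
    parser.add_argument(
        "-b", "--begin",
        type=date.fromisoformat,
        default=date.today(),
        help="date (YYYY-MM-DD) from which to start converting",
    )
    parser.add_argument("-x", "--max", type=int, help="maximum number of messages to process")
    return parser


# Parse everything since 1970, but stop after 100 messages.
args = build_parser().parse_args(["-b", "1970-01-01", "-x", "100"])
print(args.begin, args.max)  # 1970-01-01 100
```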

License

Apache License 2.0

message_md's Issues

Add option to include messages from people not in `config.json`

Added the `-a` (or `--add`) command line option and the `create-people` config file option, which add messages for people who are not in the config.json file (only for signal_sql_lite for now).

Add an "ignore list" of people I don't want to convert messages for

Mostly salespeople on LinkedIn

Why? So I don't get "person not found" for people I know about and don't want added/tracked

Similar to .gitignore
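
One way this could look, as a sketch only: a plain-text file with one slug per line, where blank lines and `#` comments are skipped as in `.gitignore`. The file format, the function names `parse_ignore_list` and `should_convert`, and the example slugs are all assumptions for illustration; nothing like this exists in the library yet.

```python
def parse_ignore_list(text):
    """Return the set of slugs listed in a .gitignore-style text."""
    slugs = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments, as .gitignore does.
        if not line or line.startswith("#"):
            continue
        slugs.add(line.lower())
    return slugs


def should_convert(slug, ignored):
    """True unless the person's slug is on the ignore list."""
    return slug.lower() not in ignored


ignored = parse_ignore_list("# LinkedIn salespeople\npushy-seller\n\nAnother-Seller\n")
print(should_convert("bob-loblaw", ignored))    # True
print(should_convert("pushy-seller", ignored))  # False
```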

Move some files into resources subfolder

The MIME types and strings aren't really config items; they're more like resources, so create a resources folder and move them there

Add a config folder option

Need a way to point the script to where all of the config files are located (settings.json, strings.json, people.json, groups.json, MIMETypes.json)

Add a -c <folderName> (or --config) option

Create an audit trail of messages processed

Could be as simple as

date, time, source_slug, destination_slug, group_slug, message_id, message_date, message_time, result_code

Where:

date and time are when the message was processed
source_slug is the person who sent the message or blank if not found
destination_slug is who it was sent to, or blank if to a group or not found
group_slug is the group it was sent to, or blank if to a single person or not found
message_date and message_time are the actual date and time the message was sent
result_code set to 0 if no error or some non-zero value if an error e.g. failed to find the person

Could also include the source_filename which might be helpful to remember which file was processed.
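
A minimal sketch of such a log, assuming a CSV file and messages represented as plain dictionaries. The function name `append_audit_row` and the dictionary keys are hypothetical; the column names are the ones proposed above.

```python
import csv
import io
from datetime import datetime

AUDIT_FIELDS = [
    "date", "time", "source_slug", "destination_slug", "group_slug",
    "message_id", "message_date", "message_time", "result_code",
]


def append_audit_row(stream, message, result_code, now=None):
    """Write one audit row for a processed message (a dict in this sketch)."""
    now = now or datetime.now()
    writer = csv.DictWriter(stream, fieldnames=AUDIT_FIELDS)
    writer.writerow({
        "date": now.strftime("%Y-%m-%d"),
        "time": now.strftime("%H:%M:%S"),
        # Slugs that were not found are written as blanks.
        "source_slug": message.get("source_slug", ""),
        "destination_slug": message.get("destination_slug", ""),
        "group_slug": message.get("group_slug", ""),
        "message_id": message.get("id", ""),
        "message_date": message.get("date", ""),
        "message_time": message.get("time", ""),
        "result_code": result_code,
    })


# io.StringIO stands in for a file opened with mode "a" and newline="".
log = io.StringIO()
message = {
    "source_slug": "bob-loblaw", "destination_slug": "me",
    "id": "42", "date": "2023-11-03", "time": "20:09",
}
append_audit_row(log, message, result_code=0)
print(log.getvalue())
```

Because each row carries the message ID, a later run could read the log back and skip anything already processed.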

This way I could process messages interactively, which I think will be needed for email. Over the years, I manually copy/pasted many emails into my DB, so I don't want to have duplicates.

Having a log would allow me to process in batches when I have time and not have to remember where I left off.

Similarly, people wouldn't have to remember the last time they ran one of the tools that uses this library. Right now, they have to use -b YYYY-MM-DD to set the date from when to begin processing.

Need to support message archives that have timestamps new to old

LinkedIn exports messages differently from other services: the messages are ordered newest to oldest. Need to add a flag to handle this scenario
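
A sketch of what such a flag could do, assuming the parsed messages are held in a list. The function name `in_chronological_order` and the `newest_first` parameter are made up for this example.

```python
def in_chronological_order(messages, newest_first=False):
    """Return the messages oldest-first, reversing a newest-first export."""
    ordered = list(messages)  # copy so the caller's list is left untouched
    if newest_first:
        ordered.reverse()
    return ordered


# (timestamp, text) pairs in the order a newest-first export would list them.
export = [
    ("2023-11-03 20:09", "third"),
    ("2023-11-02 09:15", "second"),
    ("2023-11-01 08:00", "first"),
]
for timestamp, text in in_chronological_order(export, newest_first=True):
    print(timestamp, text)
```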

Add the ability to convert messages from a specific date forward

Add an option -b YYYY-MM-DD to convert messages from, and including, that date forward

With Signal and SMS I would delete the messages/conversations with people once they've been exported to Markdown but with LinkedIn, there's no easy way to delete messages. Plus, it's handy to keep messages in the native service anyway.

Refactor to move parameter parsing into config.py

Command line parameter parsing should've been in the config

Eventually the clients to the library should be adding their own parameters because right now there are client specific parameters in this generic library (e.g. "imap-server")

Add the newer parameters into `settings.json`

Add the following to the settings file and update the docs with info:

    "my-slug": "spongebob",
    "imap-server": "",
    "email-folders": "",
    "email-account": "",
    "max-messages": 5000

Add the group slug to `tags` in frontmatter for group messages

It would be helpful to include the slug for a group in a chat message so I can find all of the chats with that group more easily. Right now I'd have to parse the people field.

For example, I have a Signal group with my 4 sisters we call the "Bottom of the 9" since we are the bottom 5 of 9 kids :)

---
tags: [chat, bot9]
---
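
A small sketch of building that `tags` line, assuming the group slug is available when the frontmatter is written. The function name `frontmatter_tags` is hypothetical.

```python
def frontmatter_tags(group_slug=None):
    """Build the `tags` line, adding the group slug for group chats."""
    tags = ["chat"]
    if group_slug:
        tags.append(group_slug)
    return "tags: [" + ", ".join(tags) + "]"


print(frontmatter_tags("bot9"))  # tags: [chat, bot9]
print(frontmatter_tags())        # tags: [chat]
```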

Add in the docs

Migrate the related docs from signal_md to this repo

Add backquote around `#text` so they don't show up as tags

Any text with # will end up being interpreted as a tag in Obsidian which pollutes my tags
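
One possible approach, sketched with a regular expression. The pattern and the function name `quote_hashtags` are assumptions for this example; the pattern deliberately skips a `#` that follows a word character, a backquote, or a `/`, so text that is already quoted and most URL fragments are left alone.

```python
import re

# A '#' followed by word characters, unless it is preceded by a word
# character, a backquote, or '/' (already quoted text, URL fragments).
HASHTAG = re.compile(r"(?<![\w`/])#(\w+)")


def quote_hashtags(text):
    """Wrap #words in backquotes so they are not picked up as tags."""
    return HASHTAG.sub(r"`#\1`", text)


print(quote_hashtags("Loved the #sunset at the lake"))
# Loved the `#sunset` at the lake
```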

A converted message had the wrong people label

A converted message had the wrong people label in it. My personal export showed bob-loblaw, but it should have been someone else who was in the config.json file and had a valid linkedin-id

---
tags: [chat]
people: [bob-loblaw, me]
date: 2023-11-03
time: 20:09
service: linkedin
---

The bob-loblaw entry had a blank linkedin-id in the config.json file, so maybe that's why

{"person-slug": "bob-loblaw", "first-name": "Bob", "last-name": "Loblaw", "number": "6135551212", "linkedin-id": ""},
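
If the blank ID is indeed the cause, a lookup that refuses to match on an empty ID would avoid it. The sketch below is a guess at the fix, not the library's code: the function name `find_person_by_linkedin_id`, the `someone-else` slug, and the list-of-dicts representation are assumptions.

```python
def find_person_by_linkedin_id(people, linkedin_id):
    """Return the first person whose linkedin-id matches, or None."""
    # A blank ID must never match: otherwise a person with an empty
    # "linkedin-id" would match any message whose sender ID is missing.
    if not linkedin_id:
        return None
    for person in people:
        if person.get("linkedin-id") == linkedin_id:
            return person
    return None


people = [
    {"person-slug": "bob-loblaw", "linkedin-id": ""},
    {"person-slug": "someone-else", "linkedin-id": "theidhere"},
]
print(find_person_by_linkedin_id(people, "theidhere")["person-slug"])  # someone-else
print(find_person_by_linkedin_id(people, ""))                          # None
```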

Display people who are not found

It's important to know who is not in config\people.json but in the LinkedIn export file so I can add them. In the future this could be automated and/or just use the LinkedIn contacts file.

Allow for multiple LinkedIn IDs

Maybe do this for all services since people could have multiple Twitter accounts.

Why? I notice that as people change their LinkedIn profiles the ID can change in the URL, e.g. theidhere

https://www.linkedin.com/in/theidhere/

Retrieving a Person by their LinkedIn ID doesn't work

getPersonByLinkedInId() doesn't actually find the person

Creating a bunch of "empty" message files

It may be specific to the LinkedIn export but I'm seeing a large number of files

May be because the group conversations aren't implemented in the LinkedIn script

Create an additional file if there's one existing for the date

Currently, when exporting messages, if there's an existing YYYY-MM-DD.md file, the export moves on to the next date.

This was the case because there was no option to specify the date from which to start, and I didn't want messages to be overwritten, nor appended to existing files, duplicating messages in a dated output file.

Instead, create an additional dated file with a number, incremented for each additional file

For example, if 2023-12-10.md exists, create 2023-12-10 - 1.md. If 2023-12-10 - 1.md exists, create 2023-12-10 - 2.md and so on.
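
The naming rule could be sketched like this. The function name `next_available_path` is hypothetical; the ` - N` suffix follows the example above.

```python
import tempfile
from pathlib import Path


def next_available_path(folder, day):
    """Return `day.md`, or `day - N.md` with the lowest unused N."""
    candidate = Path(folder) / f"{day}.md"
    counter = 1
    while candidate.exists():
        candidate = Path(folder) / f"{day} - {counter}.md"
        counter += 1
    return candidate


# Try it in a throwaway folder that already holds 2023-12-10.md.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "2023-12-10.md").touch()
    print(next_available_path(folder, "2023-12-10").name)  # 2023-12-10 - 1.md
```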

This way, the user will have the messages and be able to manually de-dup the files. If I was smarter, I'd check the contents of the existing file and modify it by injecting the messages but that's overkill for me 😂

Redact passwords

If there is `password: blah`, `password blah`, `password is blah`, `pwd: blah`, `pwd blah` or `pwd is blah`, then change `blah` to `*****`. Also handle the capitalized forms `Password` and `Pwd`.
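
A regular-expression sketch of that rule. The pattern and the function name `redact_passwords` are assumptions for this example, and matching is case-insensitive to cover `Password` and `Pwd`. Note that it will over-redact ordinary phrases such as "password reset", so treat it as a starting point only.

```python
import re

# "password" or "pwd", then ":", " is ", or plain whitespace, then the secret.
PASSWORD = re.compile(r"\b(password|pwd)(\s*:\s*|\s+is\s+|\s+)\S+", re.IGNORECASE)


def redact_passwords(text):
    """Replace the word following a password marker with five asterisks."""
    return PASSWORD.sub(r"\g<1>\g<2>*****", text)


print(redact_passwords("The wifi password is blah"))  # The wifi password is *****
print(redact_passwords("Pwd: blah"))                  # Pwd: *****
```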