`--` in titles (denote issue, closed)

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024
`--` in titles

from denote.

Comments (57)

pprevos avatar pprevos commented on September 14, 2024 3

Thanks for the elaborate response. The beauty of free software is that we can use it as we please. So there is no point discussing what is better, we should focus on possibility and user choice.

File names are limited because of typographic restrictions. I write prose, not code, so filenames won't suffice.

EXIF is a great tool in my use case. Image viewing software accesses this data when displaying files, PDF readers use it etc.

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024 1

On the other hand, you have at least one user who has a use for -- in titles, a desire which is not equivalent to wanting the other tokens in titles precisely because it does not introduce parsing ambiguity.

You are right. If there is no parsing ambiguity, there should not be a restriction. I agree with you.

I did not go the extra mile, for simplicity and because that was already how it worked. I will check how we can remove this restriction as well. The only real challenge with that is the Dired fontification (the way file names are displayed in a Dired buffer). While we don't have to parse file names with regexps for our internal functions, Dired fontification requires the use of regexps. This is hard to configure right. Hopefully, we can have -- in file names without breaking it.

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

Just to be sure, you would like to have multiple titles or simply that "--" be allowed in the only title that is currently supported?

Also note that the current development version of Denote collapses multiple consecutive "-" in the title to a single one directly in denote-sluggify.
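
As a rough illustration of that collapsing step (a sketch only, not Denote's actual `denote-sluggify` code; the name `my/demo-collapse-hyphens` is made up for this example), a single regexp replacement is enough to turn any run of hyphens into one:

```elisp
(defun my/demo-collapse-hyphens (title)
  "Return TITLE with each run of two or more hyphens collapsed to one.
Hypothetical helper for illustration; not part of Denote."
  (replace-regexp-in-string "-\\{2,\\}" "-" title))

;; A title typed with "--" loses it once a rule like this runs.
(my/demo-collapse-hyphens "artist--album---track")
```

With a rule like this in the pipeline, a title typed as `artist--album` cannot reach the file name with its `--` intact, which is why the collapsing matters for this request.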

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Just to be sure, you would like to have multiple titles or simply that "--" be allowed in the only title that is currently supported?

Simply the latter is sufficient, either is fine (the former would probably force slight code changes on my end but I don't think I mind very much).

Also note that the current development version of Denote collapses multiple consecutive "-" in the title to a single one directly in denote-sluggify.

Thanks for the heads up. I hope by next release it'll support what I want or it'll be easy enough to advise. But I'll carry a full redefinition in my config if I have to.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

I just noticed that in the current main branch head, denote-title-regex seems to stop at the -- if a second one exists in the file name. What was the reason/motivation for that?

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

There have been a lot of changes to remove most restrictions on file names. In the next release, file names will be allowed to contain any characters and sluggification will be optional. The defaults will stay the same.

The only restrictions that will remain concern the tokens ("--", "==", "__") that Denote needs in order to distinguish between the file name components. Unfortunately, in your case, as I reworked the internal functions and variables, this means that "--" is removed from the title part of the file name. This was also the case before, through the default sluggification (that could not be configured).

In principle, it would be possible to implement our internal functions to check only for the first occurrence of "--" in a title, thus allowing for it to be present in a filename's title. I had thought about that, but then it led me to think about the other tokens. What if someone wants to have "__" (double underscores) in a filename's title? How would Denote be able to know when keywords start?

I did not want to complicate things, so I just made sure not to allow that. By the way, "--" should still be allowed in the front matter though, because there are no rules there.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

What if someone wants to have "__" (double underscores) in a filename's title?

Have you ever had such a user? Did their deeply confused desire persist once you said to them "How would Denote be able to know when keywords start?"

On the other hand, you have at least one user who has a use for -- in titles, a desire which is not equivalent to wanting the other tokens in titles precisely because it does not introduce parsing ambiguity.


(that could not be configured)

Sad but true. Thankfully, function advice go brrrrr

By the way, "--" should still be allowed in the front matter though, because there are no rules there.

Of course, but music and ebooks don't have front matter, and that's like 90% of my current uses of -- in titles. (In general the front matter is secondary to me. It's a nice value add that it's there, but I mainly use Denote for the naming convention, the existing code that helps me rename files according to that convention, and the not-yet-explored hope that the linking support is nice.)

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

It sounds like you feel allowing "--" in the filename title is an arbitrary special case to permit one token while the others are not permitted. And I hear that, it's very relatable. I feel against that kind of thing when designing code as well.

I just see this as an artifact of chosen perspective/frame. To me it can instead seem equally like an arbitrary special case to disallow "--" even though it's just one substring out of all the substrings that don't introduce parsing ambiguity.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Thank you, that's great to hear! It means a lot and is very helpful if this ends up supported in-package.

For the Dired regexp, I'll try to give that a helpful think.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Is main branch in this repo current for the fontification regexp?

denote/denote.el

Lines 3296 to 3303 in abbe772

    (defvar denote-faces--file-name-regexp
      (concat "\\(?11:[\t\s]+\\|.*/\\)?"
              "\\(?1:[0-9]\\{8\\}\\)\\(?10:T\\)\\(?2:[0-9]\\{6\\}\\)"
              "\\(?:\\(?3:==\\)\\(?4:[^.]*?\\)\\)?"
              "\\(?:\\(?5:--\\)\\(?6:[^.]*?\\)\\)?"
              "\\(?:\\(?7:__\\)\\(?8:[^.]*?\\)\\)?"
              "\\(?9:\\..*\\)?$")
      "Regexp of file names for fontification.")

If so then this seems like it would just work as-is. Since the signature can't contain --, there's no issue with group 4 greedy matching parts of the title, and then the title's group 6 would greedily match any extra occurrences of --.

..aaand just tested it, yeah it works as-is.

Edit: I just realized you got non-greedy match in those groups. Okay so I don't know why it works, but it seemed to in my brief testing.

Edit 2: oh, I think it works because group 6 ends up having no choice but to include any -- nested in the title: that's the only way the later groups can "reach" what they match.
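
For anyone who wants to check that reasoning outside Dired, here is a small sketch that runs the same regexp against a plain string with `string-match`. The names `my/demo-file-name-regexp` and `my/demo-title-and-keywords` are made up for this example; the regexp itself is copied from the snippet quoted above.

```elisp
(defconst my/demo-file-name-regexp
  (concat "\\(?11:[\t\s]+\\|.*/\\)?"
          "\\(?1:[0-9]\\{8\\}\\)\\(?10:T\\)\\(?2:[0-9]\\{6\\}\\)"
          "\\(?:\\(?3:==\\)\\(?4:[^.]*?\\)\\)?"
          "\\(?:\\(?5:--\\)\\(?6:[^.]*?\\)\\)?"
          "\\(?:\\(?7:__\\)\\(?8:[^.]*?\\)\\)?"
          "\\(?9:\\..*\\)?$")
  "Copy of the fontification regexp quoted above, under a made-up name.")

(defun my/demo-title-and-keywords (file-name)
  "Return (TITLE KEYWORDS) as captured by groups 6 and 8 for FILE-NAME.
Return nil if the regexp does not match at all."
  (when (string-match my/demo-file-name-regexp file-name)
    (list (match-string 6 file-name)
          (match-string 8 file-name))))

;; The non-greedy title group has to keep growing past the inner "--",
;; because nothing after it can match until "__" or "." is reached.
(my/demo-title-and-keywords "20240314T132456--foo--bar__tag.md")
```

If the explanation in Edit 2 is right, the title group should come back as `foo--bar` and the keywords group as `tag`.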

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

I dug a little and it seems that we don't even have to use regexps for fontification. From the help buffer for variable font-lock-keywords:

MATCHER can be either the regexp to search for, or the
function name to call to make the search (called with one
argument, the limit of the search; it should return non-nil, move
point, and set ‘match-data’ appropriately if it succeeds; like
‘re-search-forward’ would).
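
A minimal sketch of what such a function matcher can look like, assuming nothing beyond what the docstring says. The names `my/demo-double-dash-matcher` and `my/demo-try-matcher` are hypothetical; the matcher simply delegates to `re-search-forward`, which already satisfies the contract.

```elisp
(defun my/demo-double-dash-matcher (limit)
  "Search for the next \"--\" before LIMIT, like a font-lock function MATCHER.
Hypothetical example: on success it returns non-nil, moves point, and
sets the match data, simply by delegating to `re-search-forward'."
  (re-search-forward "--" limit t))

(defun my/demo-try-matcher (text)
  "Call the matcher once on TEXT and report what it did.
Return (MATCH-BEGINNING MATCH-END POINT), or nil if nothing matched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (my/demo-double-dash-matcher (point-max))
      (list (match-beginning 0) (match-end 0) (point)))))

(my/demo-try-matcher "20240314T132456--foo.md\n")
```

In a keywords list the function name takes the place of the regexp string, for example `(my/demo-double-dash-matcher (0 'font-lock-warning-face))`.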

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

By the way, I looked into it even more - did you know you don't have to fontify the whole thing with just one regexp?

You can actually split it up! Here's an (incomplete) example:

    (setq denote-faces-file-name-keywords
        (list
            (list
                (concat
                    "\\(?1:[0-9][0-9][0-9][0-9]\\)"
                    "\\(?2:[0-9][0-9]\\)"
                    "\\(?3:[0-9][0-9]\\)"
                    "\\(?7:T\\)"
                    "\\(?4:[0-9][0-9]\\)"
                    "\\(?5:[0-9][0-9]\\)"
                    "\\(?6:[0-9][0-9]\\)")
                '(1 'denote-faces-year)
                '(2 'denote-faces-month)
                '(3 'denote-faces-day)
                '(7 'denote-faces-delimiter)
                '(4 'denote-faces-hour)
                '(5 'denote-faces-minute)
                '(6 'denote-faces-second))
            (list
                "\\..*$"
                '(0 'denote-faces-extension))
            (list
                "__"
                '(0 'denote-faces-delimiter)
                (list "\\(.*?\\)\\([_.]\\)" nil nil
                    '(1 'denote-faces-keywords t)
                    '(2 'denote-faces-delimiter)))))

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

The cool thing about splitting it up is that

  1. [secret reason which will become obvious once I tell you I will soon hit Submit Issue on something titled "Denote names... without ID (cue ominous music)"]

  2. you can get more granular matching and fontification - for example, in the example above, every _ between keywords is also fontified as a delimiter.

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

Yes, I was thinking that we could also use multiple calls to font-lock-add-keywords instead of a single one with one big regexp. I will also look into calling it with a function instead of a regexp, as you mentioned earlier.

Our intentions are aligned! :)

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024
2. you can get more granular matching and fontification - for example, in the example above, every `_` between keywords is also fontified as a delimiter.

I also intend to change the delimiter between keywords to allow single underscores in keywords. I think this character is common enough that it is desirable to support it. I will propose to have keywords separated with "__" (double underscores), just like the token itself. But there are prerequisite changes to be done before to make it a seamless change for users.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

I will propose to have keywords separated with "__" (double underscores), just like the token itself.

!!! Yes!! I have in the past wanted exactly this! It enables using _ to separate words within a tag and - to separate words within a title! Which in turn

  • makes it easier to visually tell multi-word tags apart from titles, and
  • reconciles multi-word tags with reliably being able to search for a word in just tags or in just titles.

I do feel a slight aesthetic aversion to it, but I think that might be just/mainly because I got used to the naming as it is.

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024
1. [secret reason which will become obvious once I tell you I will soon hit Submit Issue on something titled "Denote names... without ID (cue ominous music)"]

You have not hit the "Submit issue" button, but I gave it a thought. We want to be able to reorder the elements of the file name, and there will probably be a new denote-file-name-components and its value will look like '(identifier signature title keywords). This new variable will control the order of the filename components. I now think that it will be nice that if an element is missing, it will not be part of a new note's file name. I asked myself if we could also make the identifier optional and I think the answer is yes, with the limitation that a note without an identifier cannot be linked to.

This will be similar to how denote-prompts works.
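
To make the idea concrete, here is a purely hypothetical sketch of how an ordered component list could drive file name construction. None of these `my/demo-` names exist in Denote, and the real `denote-file-name-components` may end up working differently; the only things taken from this thread are the example value and the rule that a missing element is left out of the name.

```elisp
(defvar my/demo-file-name-components '(identifier signature title keywords)
  "Hypothetical stand-in for the proposed `denote-file-name-components'.")

(defun my/demo-component-prefix (component)
  "Return the delimiter that introduces COMPONENT in a file name."
  (pcase component
    ('identifier "")
    ('signature "==")
    ('title "--")
    ('keywords "__")))

(defun my/demo-format-file-name (fields)
  "Build a file name (without extension) from FIELDS.
FIELDS is an alist of (COMPONENT . STRING).  Components are emitted in
`my/demo-file-name-components' order, and any component that is missing
or empty in FIELDS is skipped entirely."
  (mapconcat
   (lambda (component)
     (let ((value (alist-get component fields)))
       (if (and value (not (string= value "")))
           (concat (my/demo-component-prefix component) value)
         "")))
   my/demo-file-name-components
   ""))

(my/demo-format-file-name '((identifier . "20240314T132456")
                            (title . "foo--bar")
                            (keywords . "tag")))
```

Dropping the `identifier` entry from the alist yields a name that starts directly with the `--` of the title, which is the no-identifier case discussed above.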

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

The cool thing about splitting it up is that

1. [secret reason which will become obvious once I tell you I will soon hit Submit Issue on something titled "Denote names... without ID (cue ominous music)"]

2. you can get more granular matching and fontification - for example, in the example above, every `_` between keywords is also fontified as a delimiter.

I checked how font-locking works, gave this more thought, and I don't think we can split it up (or use a function). Font-locking works by searching the buffer for occurrences of the regexp and fontifying them accordingly. We really only want to fontify a line that looks like a valid Denote file name. We don't want to fontify everything that starts with "--", for example, if it is not inside a note's file name. Maybe there still is a better way, but I am short on ideas.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

I think you can solve that by nesting the anchoring one more level.

See how in my example above, the __ match has an additional match for the actual keyword field anchored after it?

I suspect you can anchor all of that after the datetime match (untested, just showing the idea):

    (setq denote-faces-file-name-keywords
        (list
            (list
                (concat
                    "\\(?1:[0-9][0-9][0-9][0-9]\\)"
                    "\\(?2:[0-9][0-9]\\)"
                    "\\(?3:[0-9][0-9]\\)"
                    "\\(?7:T\\)"
                    "\\(?4:[0-9][0-9]\\)"
                    "\\(?5:[0-9][0-9]\\)"
                    "\\(?6:[0-9][0-9]\\)")
                '(1 'denote-faces-year)
                '(2 'denote-faces-month)
                '(3 'denote-faces-day)
                '(7 'denote-faces-delimiter)
                '(4 'denote-faces-hour)
                '(5 'denote-faces-minute)
                '(6 'denote-faces-second)
                (list
                    "\\..*$"
                    '(0 'denote-faces-extension))
                (list
                    "__"
                    '(0 'denote-faces-delimiter)
                    (list "\\(.*?\\)\\([_.]\\)" nil nil
                        '(1 'denote-faces-keywords t)
                        '(2 'denote-faces-delimiter))))))

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

This gets entangled with #280 because I think if you allow ID to be optional, then it's harder to know what's a valid denoted file name. Or rather many, probably even most, regular file names in the wild can be interpreted as a denoted name with just a title per the proposal in that issue.

Maybe that just means you don't fontify the files unless they are a linkable denoted name, and that's reasonable enough.

Or maybe you just don't add support for ID being optional, so that you can give users the consistent experience that any denoted name is fontified.

A good middle ground would be if we can get a regex or function that matches just the file part of a denote buffer. And then have that as the top level matcher. So only -- within the file name would get fontified.

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

That is nice! I will test it. Thanks!

Have you started the process of copyright assignment?

jeanphilippegg avatar jeanphilippegg commented on September 14, 2024

I cannot get nesting to work. I am testing with:

    (setq denote-faces-file-name-keywords
          `(("\\(?1:[0-9]\\{8\\}\\)\\(?10:T\\)\\(?2:[0-9]\\{6\\}\\)"
             (1 'denote-faces-date)
             (10 'denote-faces-delimiter)
             (2 'denote-faces-time)
             ("\\..*$"
              (0 'denote-faces-keywords)))))

The identifier is fontified, but the nested extension is not.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

@jeanphilippegg do you see any fontification errors in *Messages* buffer (C-h e)?

Also at a glance to me your quoting looks different than mine. First thing I'd try to change about your example is add a ' in that inner level: '(0 'denote-faces-keywords) - I know, it's perverse, but I've noticed that each level I nest seems to require an additional '. I suspect font-lock is recursively hitting every level with an eval or something.

(And on that note, I am rapidly developing an intense and deep hatred of font-lock and whoever was responsible for its design. Besides every level of matching apparently requiring more quoting, the thing I particularly despise is that twice now it has caused me to lose some work by locking up all of Emacs: despite being built on the by-default single-threaded and blocking base that is Emacs, it has no protections against infinite loops or blocking indefinitely. ...oh, locked up yet again, but I know to just save before every fontify try now.)

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

In the meantime, behold!

Function matcher, which matches the whole file name!

    (progn  ;; progn to help interactive reset with `eval-last-sexp`
    (setq fufl 1)  ;; font lock infinite blocking loop guard
    (defun fufl (end)  ;; backronym left as exercise to reader
        (setq fufl (1+ fufl)) ;; you see where this goes
        (when (> fufl 5)  ;; desperate times -> measures
            (defun fufl (_) nil))  ;; shut it down
        (message "koz1 %S %S" (point) end)  ;; where are we?
        (while (not (dired-file-name-at-point))
            (dired-next-line 1))
        (dired-move-to-filename)
        (set-match-data
            (list
                (point-marker)
                (prog2
                    (end-of-line)
                    (point-marker)
                    (forward-line 1))))
        (message "koz2 %S" (match-data))
        (point))
    )  ;; eval-last-sexp here to reset

    (setq denote-faces-file-name-keywords
        (list
            (list
                'fufl
                (list 0 ''denote-faces-year))))  ;; years of R&D

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Ohhhh, ignore my prior advice. I missed a detail. When a matcher is nested ("anchored") in another matcher, it's supposed to also have two Lisp forms after the matcher (from the font-lock-keywords docstring):

(MATCHER PRE-MATCH-FORM POST-MATCH-FORM MATCH-HIGHLIGHT ...)

So I think (untested, since I'm focused on testing my own thing) the fix for your example is just:

 (setq denote-faces-file-name-keywords `(("\\(?1:[0-9]\\{8\\}\\)\\(?10:T\\)\\(?2:[0-9]\\{6\\}\\)"
    (1 'denote-faces-date)
    (10 'denote-faces-delimiter)
    (2 'denote-faces-time)
    ("\\..*$"
+    nil  ;; pre-match-form
+    nil  ;; post-match-form
     (0 'denote-faces-keywords)))))
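
The four-element anchored form can also be tried outside Dired. The following is a minimal sketch, assuming stock font-lock faces instead of the Denote ones; `my/demo-anchored-keywords` and `my/demo-fontify-anchored` are made-up names. It fontifies a temporary buffer and returns the propertized text so the `face` properties can be inspected.

```elisp
(defvar my/demo-anchored-keywords
  '(("[0-9]\\{8\\}T[0-9]\\{6\\}"      ; parent matcher: the identifier
     (0 'font-lock-constant-face)
     ("\\.[^.\n]*$"                   ; anchored matcher: the extension
      nil                             ; pre-match-form
      nil                             ; post-match-form
      (0 'font-lock-type-face))))
  "Hypothetical keywords: fontify an extension only after an identifier.")

(defun my/demo-fontify-anchored (text)
  "Return TEXT fontified with `my/demo-anchored-keywords'."
  (with-temp-buffer
    (insert text)
    ;; Second element t means keywords-only (no syntactic fontification).
    (setq-local font-lock-defaults '(my/demo-anchored-keywords t))
    (font-lock-ensure)
    (buffer-string)))

;; Only the first line has an identifier, so only its extension
;; should receive a face.
(my/demo-fontify-anchored "20240314T132456--foo.md\nplain-file.txt\n")
```

Because the extension matcher only runs after the identifier has matched on the same line, `.txt` on the second line is left alone, which addresses the worry about fontifying text outside Denote file names.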

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Ohhhh, ignore my prior advice. I missed a detail.

Actually, haha, it looks like I didn't miss it in my first example; it was just so out of view and non-obvious that even I ended up missing it when coming back to it now (note the two nil tucked at the end of the line):

(list "\\(.*?\\)\\([_.]\\)" nil nil
    ...)

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Yyeeeessssss!

    (defun dired-filename-search-forward (_bound)
        (while (not (dired-file-name-at-point))
            (dired-next-line 1))
        (dired-move-to-filename)
        (set-match-data
            (list
                (point-marker)
                (prog2
                    (end-of-line)
                    (point-marker)
                    (forward-line 1))))
        (point))
    (defconst date-t-time-regex
        (concat
            "\\(?1:[0-9][0-9][0-9][0-9]\\)"
            "\\(?2:[0-9][0-9]\\)"
            "\\(?3:[0-9][0-9]\\)"
            "\\(?7:T\\)"
            "\\(?4:[0-9][0-9]\\)"
            "\\(?5:[0-9][0-9]\\)"
            "\\(?6:[0-9][0-9]\\)"))
    (defconst point-to-match-beginning-form
        '(progn
            (goto-char (match-beginning 0))
            (point)))
    (defconst point-to-match-end-form
        '(progn
            (goto-char (match-end 0))
            (point)))
    (setq denote-faces-file-name-keywords
        `((dired-filename-search-forward
           (,date-t-time-regex
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (1 'denote-faces-year)
               (2 'denote-faces-month)
               (3 'denote-faces-day)
               (7 'denote-faces-delimiter)
               (4 'denote-faces-hour)
               (5 'denote-faces-minute)
               (6 'denote-faces-second))
           ("\\..*$"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-extension))
           ("==\\|--\\|__"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-delimiter))
           ("==\\([^.]*?\\)\\(--\\|__\\|\\.\\|$\\)"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (1 'denote-faces-signature))
           ("--\\([^.]*?\\)\\(==\\|__\\|\\.\\|$\\)"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (1 'denote-faces-title))
           ("__\\([^.]*?\\)\\(==\\|--\\|\\.\\|$\\)"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (1 'denote-faces-keywords)))))

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Okay, so a little explanation on that last comment.

Not a complete, robust solution, just a working demo of a function matcher and nested matchers. Basic denoted names work.

  1. It uses a function matcher to match Dired filenames. A function matcher has an interface a lot like re-search-forward. It receives a "bound" argument to limit how far it goes. (I'm not using this above, but maybe there are edge cases where I should. For example, I didn't test with subdirs expanded in the Dired buffer.)

  2. The big benefit of using a function matcher is that all the other regexps are scoped within the filename match.

    • Looking forward to #280 - how would we know where to start fontifying the title if the ID and -- become optional? Being scoped inside the file name should make that much easier.
    • All other matches become simpler. For example we can just match --, and it won't affect other Dired fields like rw-r--r--.
    • Other neat effects become easy. For example, the above will fontify all occurrences of YYYYMMDDTHHMMSS in the file name, not just the first one.

mentalisttraceur avatar mentalisttraceur commented on September 14, 2024

Victory!!!

I've wrangled fontification to do what I want, and hopefully this gives you ample examples.


Behold!

Notes (jarring colors for demo/testing purposes):

Screenshot_20240314-164709

No-ID (#280) and ---in-title names:

Screenshot_20240314-164139

Note in particular how the "inner" delimiters are fontified: in titles, - is dimmed; in tags, _ is dimmed; in signatures, = is dimmed. This is a personal choice of course, but it took some serious wrangling to make font-lock do this: I needed both anchored (nested) matches and multiple top-level function matchers.


The code!

The big picture

    (setq denote-faces-file-name-keywords
        `((dired-filename-search-forward
           (,date-t-time-regex
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (1 'denote-faces-year)
               (2 'denote-faces-month)
               (3 'denote-faces-day)
               (7 'denote-faces-delimiter)
               (4 'denote-faces-hour)
               (5 'denote-faces-minute)
               (6 'denote-faces-second))
           ("\\..*$"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-extension))
           ("==\\|--\\|__"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-delimiter)))
          (denote-dired-keywords-search-forward
           ("[^_]"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-keywords))
           ("_"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-delimiter)))
          (denote-dired-title-search-forward
           ("[^-]"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-title))
           ("-"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-delimiter)))
          (denote-dired-signature-search-forward
           ("[^=]"
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-signature))
           ("="
               ,point-to-match-beginning-form
               ,point-to-match-end-form
               (0 'denote-faces-delimiter)))))

The details

Here's what I figured out about matcher functions so far: the behavior and side-effects expected by font-lock are what re-search-forward does with t as the third argument:

  • matches must only be returned if they're fully between point and the bound argument;
  • on a match, point is moved to just after the full match, match data is set, and the new point position is returned;
  • otherwise point is left where it was before the call, match data is left as it was before the call, and nil is returned.

I have made significant effort to ensure every function introduced here follows that, but I'm only human, maybe I missed something. (Everything I implemented before ensuring this basically worked fine though, maybe even perfectly, so maybe things basically work without it.)
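
As a compact illustration of that contract, separate from the Dired-specific functions below, here is a sketch of a hand-written matcher that works in any buffer. Everything named `my/demo-` is hypothetical. It matches the text between the next `--` and the end of that line, refuses matches that would run past the bound, and restores point and match data on failure.

```elisp
(defun my/demo-after-dashes-search-forward (bound)
  "Match the text between the next \"--\" and the end of its line.
Hypothetical matcher written to the contract listed above: on success it
sets the match data, leaves point after the match, and returns point; on
failure it restores point and the match data and returns nil."
  (let ((start (point))
        (saved-match-data (match-data)))
    (if (and (search-forward "--" bound t)
             ;; Reject matches that would extend past BOUND.
             (<= (line-end-position) bound))
        (let ((beg (point)))
          (end-of-line)
          (set-match-data (list beg (point)))
          (point))
      (goto-char start)
      (set-match-data saved-match-data)
      nil)))

(defun my/demo-collect-matches (text)
  "Return every string the matcher finds in TEXT, in order."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (found)
      (while (my/demo-after-dashes-search-forward (point-max))
        (push (match-string 0) found))
      (nreverse found))))

(my/demo-collect-matches "a--foo\nb--bar\n")
```

Calling the matcher in a loop, as font-lock does, visits one match per line and then stops cleanly when nothing is left.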

First, an improved version of the Dired filename matcher from a couple comments ago:

    (defun dired-filename-search-forward (bound)
        (let ((start-of-search (point)))
            (while (and (not (eobp))
                        (or (eolp)
                            (not (dired-file-name-at-point))))
                (dired-next-line 1))
            (dired-move-to-filename)
            (if (and (not (eobp))
                     (>= (point) start-of-search))
                (let ((start-of-match (point-marker)))
                    (end-of-line)
                    (if (<= (point) bound)
                        (let ((end-of-match (point-marker)))
                            (set-match-data (list start-of-match end-of-match))
                            (point))
                        (goto-char start-of-search)
                        nil))
                (goto-char start-of-search)
                nil)))

↑ Compared to the previous version this is more complex because it implements the spec precisely. My initial version just lied about where the match ended (one character past the actual end, including the newline after the file name), which was a more convenient way to solve the possible infinite looping edge cases. Now we need those eolp and eobp checks. (I'm displeased by how this function is written but after briefly scouting catch/throw and so on, I couldn't figure out a cleaner way.)

Second, we build a few Denote-specific matchers. A lot of the logic is reused between title, keywords, and signature, and it factored out very neatly and naturally into simpler "*-search-forward" functions. Lower-level functions come first, so if you like reading big-to-small, start from the bottom.

They all basically just run the Dired file name matcher, and then narrow down the match inside what that returns. To start laying the groundwork for future order independence, I made them all just stop at any of the four delimiters (-- for title, __ for tags, == for signature, and . for the extension) or the end of the file name (for files without extensions, not shown in the above screenshots but of course those also work).

The tag and signature fields are parsed with identical logic, while the title field gets special handling to skip over the leading datetime, if any.

One of the cool things about using matcher functions is that you can match an arbitrary number of times within the same file name. So what you'll notice about each of these functions is that they only match one substring between two of Denote's delimiters. So 20140314T132456--foo--bar actually triggers the title matcher twice: first for foo and then for bar. (This is why the perverse case of a no-ID, title-first, yet-still-kept-the--- file name (--foo.md instead of either DATE--foo.md or foo.md) "just works" with no special handling - the no-ID+no-dashes leading title match branch is immediately stopped by the leading --, but then the matcher is checked again for everything that's left, and the ---prefixed branch catches it.) So we also get future-proofed for changes like allowing == in signatures (and if it goes that way, __ instead of _ for tag separation).

    (defun denote--delimiter-search-forward (end-of-name)
        (let ((start-of-match (point-marker)))
            (if (re-search-forward "==\\|--\\|__\\|\\." end-of-name t)
                (goto-char (match-beginning 0))
                (goto-char end-of-name))
            (let ((end-of-match (point-marker)))
                (set-match-data (list start-of-match end-of-match))))
        (point))

    (defun denote--field-search-forward (separator end-of-name)
        (when (re-search-forward separator end-of-name t)
            (denote--delimiter-search-forward end-of-name)))

    (defun denote--dired-search-forward (separator bound)
        (let ((old-match-data (match-data)))
            (if (dired-filename-search-forward bound)
                (progn
                    (goto-char (match-beginning 0))
                    (denote--field-search-forward separator (match-end 0)))
                (set-match-data old-match-data)
                nil)))

    (defun denote-dired-signature-search-forward (bound)
        (denote--dired-search-forward "==" bound))

    (defun denote-dired-keywords-search-forward (bound)
        (denote--dired-search-forward "__" bound))

    (defun denote-dired-title-search-forward (bound)
        (let ((old-match-data (match-data)))
            (if (dired-filename-search-forward bound)
                (let ((end-of-name (match-end 0)))
                    (goto-char (match-beginning 0))
                    (if (re-search-forward date-t-time-regex end-of-name t)
                        (denote--field-search-forward "--" end-of-name)
                        (denote--delimiter-search-forward end-of-name)))
                (set-match-data old-match-data)
                nil)))
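
The "one match per delimiter-bounded substring" behavior described above can be seen in isolation with plain strings. This is a simplified sketch, not the buffer-based matchers themselves: `my/demo-split-fields` is a made-up helper that reuses the same delimiter regexp and reports each delimiter together with the text that follows it.

```elisp
(defun my/demo-split-fields (name)
  "Return a list of (DELIMITER . FIELD) pairs found in NAME, left to right.
Each FIELD runs from just after DELIMITER to the next delimiter or to
the end of NAME.  Hypothetical helper for illustration only."
  (let ((delimiter-regexp "==\\|--\\|__\\|\\.")
        (pos 0)
        fields)
    (while (string-match delimiter-regexp name pos)
      (let* ((delimiter (match-string 0 name))
             (field-start (match-end 0))
             ;; The next delimiter, if any, ends this field.
             (field-end (or (string-match delimiter-regexp name field-start)
                            (length name))))
        (push (cons delimiter (substring name field-start field-end)) fields)
        (setq pos field-end)))
    (nreverse fields)))

(my/demo-split-fields "20140314T132456--foo--bar__tag.md")
```

The `--` delimiter shows up twice in the result, once for `foo` and once for `bar`, which mirrors the title matcher being triggered twice for the same file name.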

The regular expression to match the datetime is just what y'all are already using, modulo stylistic differences.

    (defconst date-t-time-regex
        (concat
            "\\(?1:[0-9][0-9][0-9][0-9]\\)"
            "\\(?2:[0-9][0-9]\\)"
            "\\(?3:[0-9][0-9]\\)"
            "\\(?7:T\\)"
            "\\(?4:[0-9][0-9]\\)"
            "\\(?5:[0-9][0-9]\\)"
            "\\(?6:[0-9][0-9]\\)"))

The pre-match-form and post-match-form are unchanged from earlier:

    (defconst point-to-match-beginning-form
        '(progn
            (goto-char (match-beginning 0))
            (point)))
    (defconst point-to-match-end-form
        '(progn
            (goto-char (match-end 0))
            (point)))

↑ these allow the nested anchored matchers to start from the beginning of the parent match (so from the beginning of the file that was matched) instead of from wherever the parent matcher left point. (And then to put the point back to the end to resume the match.) (Although I haven't actually followed the source code of font lock to see how it calls these forms, so this part of my knowledge is just going by the documentation and empirical observation.)
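
Here is a small self-contained sketch of that rewinding effect, using a plain regexp as the parent matcher and stock font-lock faces so that it runs without Dired or Denote. The names `my/demo-scoped-keywords` and `my/demo-fontify-scoped` are made up. The parent matches from the identifier to the end of the line; the pre-match-form moves point back to the start of that match so the anchored `--` matcher can see everything inside it.

```elisp
(defvar my/demo-scoped-keywords
  '(("[0-9]\\{8\\}T[0-9]\\{6\\}[^\n]*"   ; parent: identifier to end of line
     ("--"                               ; anchored: delimiters inside it
      ;; pre-match-form: rewind to the start of the parent match.  The
      ;; value returned is not beyond point, so font-lock falls back to
      ;; the end of the line as the search limit.
      (progn (goto-char (match-beginning 0)) (point))
      ;; post-match-form: put point back at the end of the parent match.
      (goto-char (match-end 0))
      (0 'font-lock-warning-face))))
  "Hypothetical keywords: fontify \"--\" only inside an identified name.")

(defun my/demo-fontify-scoped (text)
  "Return TEXT fontified with `my/demo-scoped-keywords'."
  (with-temp-buffer
    (insert text)
    (setq-local font-lock-defaults '(my/demo-scoped-keywords t))
    (font-lock-ensure)
    (buffer-string)))

(my/demo-fontify-scoped
 (concat "-rw-r--r-- 1 me me 0 Mar 14 20240314T132456--foo--bar.md\n"
         "-rw-r--r-- 1 me me 0 Mar 14 plain--file.txt\n"))
```

Both `--` occurrences inside the identified file name get the face, while the ones in `rw-r--r--` and in the name without an identifier are left untouched.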

But it's too good to be true!

Alas, there is one terrible, shameful flaw: when you implement letting people change the order of the denoted fields in the file name, the title, keywords, and signature matchers in denote-faces-file-name-keyword must be in reverse order. So for example, if the user puts signature last, the signature matcher would need to move before the tag matcher in the list.

I have spent hours trying different things, and every alternative which didn't have this problem led to feature regressions versus what this version accomplishes and enables. In particular, two issues I remember clearly:

  • if you use the "override" flag in the highlight matchers, it works but also suppresses Dired's mark colors;
  • every variant which didn't have these problems instead couldn't fontify past the first nested fontified delimiter in a field (so in the title foo-bar or foo--bar, you couldn't have the dashes fontified, or if you tried then bar wouldn't be fontified, which for me is just too big of a readability feature to pass up).

Other than that it seems to work great with everything I threw at it so far.

mentalisttraceur commented on September 14, 2024

My last comment's fontification is also slow enough to degrade UX in the edge cases, I'm realizing.

Not slow enough for a human to notice, but it's slow enough to get "caught" by commands that show or manipulate the buffer before font lock can finish.

For example, I have a command which opens the notes dired and then immediately runs consult-line in it.

With the above font lock setup, that consult-line call consistently caught the notes buffer mid-fontifying.

(Locally I fixed this by putting all of the code in that command starting with the consult-line call into a function called by run-with-idle-timer, with zero seconds as the delay, but I wouldn't want to ship something that forces that onto others.)
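That workaround might look roughly like the sketch below. The command name is made up, and it assumes both denote (for denote-directory) and consult (for consult-line) are installed. Whether font lock really finishes before the timer fires is an empirical observation from this thread, not something the code guarantees.

```elisp
(require 'denote)   ; provides the `denote-directory' user option
(require 'consult)  ; provides `consult-line'

(defun my/notes-dired-then-search ()
  "Open the notes directory in Dired, then start `consult-line' when idle."
  (interactive)
  (dired denote-directory)
  ;; A zero-second idle timer defers the call until Emacs is next idle,
  ;; rather than running it in the middle of this command.
  (run-with-idle-timer 0 nil #'consult-line))
```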

mentalisttraceur commented on September 14, 2024

Turns out font lock doesn't pass the end of the top-level match as the bound of the anchored match!

So for example, given a filename that ends in __foo_qux.md, even though the top-level denote-dired-keywords-search-forward matcher correctly matches just __foo_qux, the nested matcher gets all the way through the newline after the filename as the limit on how far it can match.

That explains the overruns I saw (the ones that made the title, keywords, and signature matchers order-dependent).

(I didn't even think to check for that earlier because to me it seems so overwhelmingly obvious that doing so would be the correct move. In retrospect it makes sense because it's meant for any "anchored" matches, not just "contained" matches.)
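Given that behavior, one way to keep a nested matcher inside its parent match is to have the pre-match form return the parent's end position: per the font-lock-keywords docstring, a PRE-MATCH-FORM value greater than point is used as the search limit (otherwise the limit is the end of the line). A minimal sketch, with hypothetical names and a simplified regexp standing in for the parent matcher:

```elisp
(defconst my/bounded-pre-form
  '(prog1 (match-end 0)               ; return value: parent's end, used as the search limit
     (goto-char (match-beginning 0))) ; side effect: start searching at the parent's beginning
  "Hypothetical PRE-FORM that confines an anchored matcher to the parent match.")

(defconst my/keywords-delimiter-keyword
  `("__[^.=\n]+"                      ; parent matcher: simplified keywords field
    ("_" ,my/bounded-pre-form nil     ; anchored matcher, bounded PRE-FORM, no POST-FORM
     (0 'shadow)))                    ; face for each underscore inside the field
  "Hypothetical keyword entry: fontify underscores only within the keywords field.")
```

With this pre-form, an underscore later on the same line (outside the matched field) should no longer be picked up by the nested matcher.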

jeanphilippegg commented on September 14, 2024

Thanks a lot for your explanations! I think I also understand how it works now. I will do tests on my side and try to get a decent result out of it.

Just a few comments:

  • You have made it so single "-" are fontified as delimiters in filenames. This is something that I don't think should be part of Denote because it is an arbitrary choice to impose on users. It is not a given that users choose to put dashes between words in titles. Did you know that, in the latest code, sluggification of all filename components can be disabled/modified? If you don't like "-"s in the titles of your files to be colored, you can change the sluggification to remove the part that replaces spaces with dashes in titles. This way you would not have color between words.
  • This is not a big issue, but your matcher functions apply in Dired buffers only. We also want fontification in backlinks buffers.
  • What happens between consult-line and fontification of Dired buffers?

Thanks for your thorough testing. The most intriguing parts, though, are the names of your personal notes. :P

mentalisttraceur commented on September 14, 2024

> I will do tests on my side and try to get a decent result out of it.

Cheers! In the meantime, I wrote an improvement that I believe solves both the order-dependence and will be faster (I hope significantly). Hopefully I can finish testing and share that shortly.

> This is something that I don't think should be part of Denote because it is an arbitrary choice to impose on users.

100% agree.

And I don't expect you to spend time or add complexity to support it as an option either. Cool if I can have it without all of the above in my config, but totally fine if that's what it takes.

> Did you know that, in the latest code, sluggification of all filename components can be disabled/modified?

Yep, you've mentioned. But I actually want the dashes, because I want my file names to also be really convenient to work with when I'm in eshell, zsh, and so on.

> your matcher functions apply in Dired buffers only. We also want fontification in backlinks buffers.

Oh! Good point. I don't use the backlinks feature (yet?) so I didn't even think about it. Well, so long as there's a clean way to know where each file starts and ends in the backlinks buffer, you can pick between that and the dired method based on the mode of the buffer I guess.

For start detection there's more reason to use Dired's own function, since Dired can have a somewhat flexible format depending on the configured ls switches, and it can have the details hidden or shown. That's why I reused Dired's function to move point to the start of the file name.

> What happens between consult-line and fontification of Dired buffers?

As far as I can tell, as soon as the Dired buffer is created, fontification starts. I suspect it tries to run as much "in the background" as it can by doing a little bit of fontifying and then yielding back to the command/event loop. Then when consult-line runs, it seems to somehow stop/preempt fontifying until it's done. I don't know if this is intentional or a side-effect of something else. So with the above approach, in that split second between (dired denote-directory) and (consult-line), my fontification approach doesn't get to finish (at least on my phones, I haven't checked on my laptops yet). So all lines are partially fontified, but only the first few are completely fontified.

Incidentally, it is because of this that I know that each top-level matcher gets to run over the entire buffer before the next one does. (As opposed to each one running on a chunk of the buffer first.)

> The most intriguing parts, though, are the names of your personal notes. :P

Thank you, thank you. I'm open to questions.

jeanphilippegg commented on September 14, 2024

I have been successful at making fontification work with file name components out of order (including the identifier). Any component can be absent as well (including identifier).

One issue that you probably had is that you first fontify the tokens (--, __, ==) and then fontify the components. This means that nested tokens are fontified first and, since you cannot fontify something twice, when your title matcher matches, there is already something fontified in it and it won't work. You should fontify tokens last, once you have already done the fontification of other components.

jeanphilippegg commented on September 14, 2024

@mentalisttraceur Have you started copyright assignment to the FSF? The process is simple and it would allow you to contribute with code directly.

mentalisttraceur commented on September 14, 2024

@jeanphilippegg yep (I technically started the process in May of last year, haha; today I finally printed+signed+scanned+PDF'ed+sent it back)

pprevos commented on September 14, 2024

> ... music and ebooks don't have front matter

A lot of digital file formats have room for meta data. I have played around with using exiftool as the front matter.

mentalisttraceur commented on September 14, 2024

> A lot of digital file formats have room for meta data. I have played around with using exiftool as the front matter.

Yes. But!

A system where each file hopefully has some file-format-dependent metadata, which needs to be handled by file-format-dependent code, that programs need to go out of their way to understand, is not a system I want to depend on for my life's documents.

Maybe one day, when the ecosystem is better. ugrep+ is a promising example in that direction. So is Recoll. I hope one day someone writes a tool to manipulate a standard/common subset of metadata for every format I might want to title, date, and tag (and whatever I end up using denoted filename signatures for). I hope one day every single app that lets you search for files in any way for any reason can also search the metadata of any possibly relevant file formats.

I do have some hope for that day. In many ways that's better.

Until that day comes, I'm putting everything that matters into file names - I want an approach that I can reliably use with the tools on-hand, that I can reliably and easily manipulate with code I can quickly write.

jeanphilippegg commented on September 14, 2024

> @jeanphilippegg yep (I technically started the process in May of last year, haha; today I finally printed+signed+scanned+PDF'ed+sent it back)

That's good news! Keep us posted.

pprevos commented on September 14, 2024

Changing front matter in digital files that use Exif is almost trivial. I am working on a package to do this. This is an example function that extracts the creation date from images, PDFs, etc. and generates a Denote ID (if no metadata is available, it falls back to the file-creation date):

    (require 'subr-x)  ; `if-let*' and `string-trim' on older Emacs versions

    (defun denote-explore--exiftool ()
      "Return non-nil if exiftool is available."
      (executable-find "exiftool"))

    (defun denote-explore--retrieve-attachment-identifier (file)
      "Extract creation date from FILE using exiftool or the file attributes."
      (if-let* ((exiftool (denote-explore--exiftool))
                (file-esc (shell-quote-argument (expand-file-name file)))
                (call (format "%s -s -s -s -a -CreateDate %s" exiftool file-esc))
                (exif-date (string-trim (shell-command-to-string call)))
                ;; Expect at least "YYYY:MM:DD HH:MM:SS" (19 characters),
                ;; otherwise the `substring' calls below would signal an error.
                (check (and (>= (length exif-date) 19)
                            (not (string-prefix-p "Warning:" exif-date))))
                ;; Turn "YYYY:MM:DD " into "YYYY-MM-DD " and keep "HH:MM:SS".
                (iso-date (concat
                           (replace-regexp-in-string
                            ":" "-"
                            (substring exif-date 0 11))
                           (substring exif-date 11 19))))
          (format-time-string denote-id-format (date-to-time iso-date))
        (denote--file-attributes-time file)))

Exiftool can equally be used to write metadata into Exif format. File names will always be a workaround to store data due to the inherent limitations and the additional Denote syntax.
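The date juggling in that function can also be done without any time parsing. Below is a minimal sketch under two assumptions: that exiftool prints CreateDate in the usual Exif form "YYYY:MM:DD HH:MM:SS", and that denote-id-format is left at Denote's default of "%Y%m%dT%H%M%S". The function name is hypothetical.

```elisp
(defun my/exif-date-to-denote-id (exif-date)
  "Convert EXIF-DATE (\"YYYY:MM:DD HH:MM:SS\") to a Denote-style identifier.
Return nil when EXIF-DATE does not start with a date in that form."
  (when (string-match
         (concat "\\`\\([0-9]\\{4\\}\\):\\([0-9]\\{2\\}\\):\\([0-9]\\{2\\}\\)"
                 " \\([0-9]\\{2\\}\\):\\([0-9]\\{2\\}\\):\\([0-9]\\{2\\}\\)")
         exif-date)
    ;; Glue the six captured number groups together around a literal "T".
    (concat (match-string 1 exif-date)
            (match-string 2 exif-date)
            (match-string 3 exif-date)
            "T"
            (match-string 4 exif-date)
            (match-string 5 exif-date)
            (match-string 6 exif-date))))

(my/exif-date-to-denote-id "2024:05:16 09:07:07") ; "20240516T090707"
```

Anything that is not a date, such as a "Warning:" line, simply yields nil, so the caller can fall back to the file attributes.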

mentalisttraceur commented on September 14, 2024

@pprevos sounds like a promising move in the right direction, thanks for contributing to a better ecosystem for us all!

That said, I disagree with this:

> File names will always be a workaround to store data due to the inherent limitations and the additional Denote syntax.

So far, the data I store in denoted names is the same information I would store in those names anyway. The constraints of denoted names which I follow are ones I would self-impose on my file naming anyway.

As a thought exercise: okay great, we can use exiftool to look into and change metadata of some files. Maybe it even provides search, maybe even recursive search within a directory to find all files whose metadata matches a certain regex (what if I want to match two different possibilities - either author field contains x or tags contains y, etc?). Even if it has all that (or ugrep+ does), what happens when I realize I would benefit from UX like Orderless to search that metadata? And so on and so on.

The ability to reuse the same things for as many different things as possible is one of the most powerful tools and design principles we have.

I personally don't want O(n) dependencies and code paths (where n is how many different file metadata formats exist in my files) just to manage one kind of metadata which has zero inherent coupling to the file type. If those dependencies and code paths are all concentrated in one tool/library, that's certainly better, but that's still more complexity, it's just better-managed complexity.

When I have to drop down to using a shell, I still have 100% of the power of denoted naming at my fingertips with a generic orthogonal tool like find or fd, because file names are universal, every improvement I write or find that works for file names is an improvement that works for all my files, forever, no matter what formats come and go or what use-case some working group deigned worth standardizing.

I can get the metadata out of a PNG or PDF with the right exiftool invocation or a hex dump and some persistence, but I can get the metadata out of a file name with my eyes. I have never been on a computer or phone that didn't have a way to see file names, but I have been on computers and phones that did not have exiftool.

I get orders of magnitude more value out of learning how to invoke find than exiftool, because the former pays off across all possible files (or anything with filesystem presence), while exiftool only pays off across the strict subset of files that have EXIF metadata.

When I started using denoted tags to store tag-like information, I automatically gained all the power of Orderless+consult-line, evil-ex-search, isearch, my customizations on top of those, and literally any future text-oriented search UX improvement, for searching that file metadata.

The next time I think of some improvement to my workflows or UX with that metadata, I won't need to learn or write anything about O(n) file formats or their tools and libraries, or convince O(n) devs of libraries/tools that some necessary enabling change upstream is worthwhile.

As long as file names remain a user-facing part of computing, I will want something like Denote's naming convention with the enhancements I currently use (-- allowed in titles, - allowed in tags, and datetime made optional), no matter how good the support for metadata inside the files gets.

Basically,

  1. If you wouldn't feel fine with me renaming every one of your files to a random UUID (and updating any links/shortcuts/references to match), that's a sign that metadata support isn't yet good enough to cover all the possible benefits we can get out of putting some of that information into the file name.

  2. Data in file names is not a workaround, it is a way to gain O(n) benefit for a tiny O(1) effort and complexity cost.
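To make point 2 concrete: finding every file with a given tag needs nothing beyond a regexp over names. A minimal sketch with hypothetical function names, assuming titles contain no underscores (as with the default sluggification) and tags are separated by single underscores:

```elisp
(defun my/denoted-keyword-regexp (keyword)
  "Return a regexp matching KEYWORD as one whole tag in a denoted file name."
  ;; A tag starts after an underscore and ends at the next tag (_), the
  ;; extension (.), a signature (==), or the end of the name.
  (concat "_" (regexp-quote keyword) "\\(?:[_.]\\|==\\|\\'\\)"))

(defun my/denoted-files-with-keyword (directory keyword)
  "Return files under DIRECTORY whose names carry KEYWORD as a tag."
  (directory-files-recursively directory (my/denoted-keyword-regexp keyword)))

;; Example call; the directory is a placeholder:
;; (my/denoted-files-with-keyword "~/notes" "emacs")
```

The same regexp works unchanged for any file type, which is the O(1) effort being argued for here.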

protesilaos commented on September 14, 2024

Just to note that I merged the pull request of @jeanphilippegg. Please test this and we are good to go!

mentalisttraceur commented on September 14, 2024

Thanks @protesilaos & @jeanphilippegg !

Sadly, I'm not sure how soon I'll be able to help with testing (it's a little more involved for me than just grabbing the new code, since by now I'm using my own reimplementations of most of the file naming stuff and just calling lower-level Denote APIs).

jeanphilippegg commented on September 14, 2024

Hello @mentalisttraceur,

I know you are using your own modified version of the code now. I have a couple of questions.

  1. What did you end up using for the fontification of your files in Dired? With your modifications, you reported experiencing performance issues. I am asking because I also made my own version of the fontification. It works great, but it also uses anchored highlighters.

  2. I think you have completed your FSF copyright assignment. Can you confirm? If we need it, do you agree that we use some of the code that you have shared in Denote's issues and pull requests? I don't think we will have to use it directly, but it can be useful.

Thank you!

mentalisttraceur commented on September 14, 2024

@jeanphilippegg

  1. My current fontification is here: https://github.com/mentalisttraceur/home/blob/fcde8fe56a630be4788be09b67ebc6097bf7a07f/.emacs.el#L4311-L4373 . It's more rigid/limited, and also fontifies a couple of things that are outside of Denote's scope, but it fixed the performance issues: I'm now doing everything as submatches of just one top-level Dired-aware pass instead of the four which I had previously.

  2. Yep! Copyright papers done (I was going to mention it as soon as I got around to finding which of these 10+ issues we discussed that in 😅). Yes, I agree that you can use any of the code I've shared in these issues/PRs.

jeanphilippegg commented on September 14, 2024

Thank you!

I will look at your current fontification in case I get inspired.

Also, I think most of your issues will be resolved soon. You might want to check Denote's code then!

mentalisttraceur commented on September 14, 2024

Cheers. If it helps, here's a visual of what I'm currently doing with my fontification:

[Screenshot: Screenshot_20240516-090707]

(This is a recurring "task file", kind of like a calendar event. The datetime (Denote ID) is when it's scheduled to start. The green is a compact ISO 8601 duration for how long the task/event is (this one's an hour), and the red is an ISO 8601* repetition for how often it happens (this one's once a day). Other stuff in the signature would still use Denote's signature face, but that's not shown here and I don't have any other signature uses for now.)

* (with the redundant+inconvenient separator after the "R[n]" omitted)

protesilaos commented on September 14, 2024

mentalisttraceur commented on September 14, 2024

@protesilaos re: diredfl:

I barely understand any of the moving parts here and I don't use diredfl, but my first thought is to try

  1. making sure Denote's font lock keywords are applied before diredfl's, and

  2. changing the diredfl-mode function (which uses (setq font-lock-defaults ...)) to use font-lock-add-keywords.

protesilaos commented on September 14, 2024

mentalisttraceur commented on September 14, 2024

About the keyword separator tangent (maybe time to split off a new issue for that?):

> I will propose to have keywords separated with "__" (double underscores), just like the token itself.

!!! Yes!! I have in the past wanted exactly this! It enables using _ to separate words within a tag and - to separate words within a title! Which in turn

  • makes it easier to visually tell multi-word tags apart from titles, and

  • reconciles multi-word tags with reliably being able to search for a word in just tags or in just titles.

I do feel a slight aesthetic aversion to it, but I think that might be just/mainly because I got used to the naming as it is.

The more I've thought about it and used denoted naming, the more I think I'd personally stick with the current way for my files: one _ delimiting each tag (and - to separate words within tags).

It helps me visually parse when

  1. words within one tag are at least as tightly bound as the words within a title (- is a "stronger" visual link than _), and when
  2. the individual tags are more tightly bound to each other than to the rest of the name (_ is tighter than __).

(I suspect this is optimizing for how human vision works, but either way, it seems more optimal for me.)

Secondary reasons:

  1. _ "hides" when text is highlighted;
  2. single underscore is more space-efficient (I often look at denoted filenames with multiple tags on phone screens where I get <60 characters per line).

So if you do go forward with double-underscore between every keyword, I guess I hope/recommend keeping it optional.

mentalisttraceur commented on September 14, 2024

I've now carefully read over the merged PR linked above: it looks good to me, and I'm confident it won't break anything on my end. So I think this issue can be closed. 👍

jeanphilippegg commented on September 14, 2024

> Cheers. If it helps, here's a visual of what I'm currently doing with my fontification:

That's a nice result! Here is what I have got as well:

[Screenshot: notes]

> The more I've thought about it and used denoted naming, the more I think I'd personally stick with the current way for my files: one _ delimiting each tag (and - to separate words within tags).

Actually, I had come to the same conclusion, for the same reasons! I have abandoned this idea and will not propose to change this.

mentalisttraceur commented on September 14, 2024

Re:

> Looks promising!

&

> That's a nice result!

Thank you both! Though I was just sharing that picture to clarify p-duration-regexp and r-repeat-p-duration-regexp in my current fontification, since those aren't part of Denote's naming and the visual is a lot more obvious than those dense regexps.

mentalisttraceur commented on September 14, 2024

I'm now testing/using this regexp change from the above PR:

    (defconst denote-title-regexp "--\\([^.]*?\\)\\(==.*\\|__.*\\|\\..*\\)*$")

I'll report if I see any issues but from initial testing it's working great.
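A quick way to check that regexp against names with -- in the title is sketched below. It uses a private copy of the pattern and made-up file names; the my/ names are hypothetical and not part of Denote.

```elisp
(defconst my/denote-title-regexp
  "--\\([^.]*?\\)\\(==.*\\|__.*\\|\\..*\\)*$"
  "Private copy of the title regexp quoted above, for experimenting.")

(defun my/denoted-title (file-name)
  "Return the title component of FILE-NAME, or nil if there is none."
  (when (string-match my/denote-title-regexp file-name)
    ;; Group 1 is the lazily matched title between the first "--" and
    ;; the next "==", "__", or "." that lets the rest of the name match.
    (match-string 1 file-name)))

(my/denoted-title "20240914T120000--foo--bar__tag1_tag2.md") ; "foo--bar"
(my/denoted-title "20240914T120000--foo--bar.md")            ; "foo--bar"
(my/denoted-title "20240914T120000__tag1.md")                ; nil
```

The first "--" is taken as the title token, and later "--" sequences stay inside the title because only "==", "__", and "." can end it.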

jeanphilippegg commented on September 14, 2024

@mentalisttraceur I just shared my version of the fontification in pull request #359 if you would like to give it a try. I don't know if it introduces any performance issues, but I also use a single top-level matcher.

jeanphilippegg commented on September 14, 2024

This issue can be closed because consecutive hyphens are permitted if the user chooses to modify/disable the default sluggification.
