
Comments (13)

bluetech commented on July 17, 2024

Hi David

For reference here's some background first, telling you stuff you probably already know.

The multiple keysym support which we have in libxkbcommon is single key -> multiple keysyms; the biggest use case for it I'd say is unicode combining characters (in case there isn't a precomposed one to use).
Compose is the "dual": multiple keys -> unicode string or keysym. This makes it possible to input many more keysyms than you'd want to have in a layout/shift level.

X does this entirely on the client side in the input method. For the X input method, all the supporting code is in libX11. On most systems the keysym sequence -> string mappings are defined in files at /usr/share/X11/locale/ . The particular file used is locale-dependent, but almost all of them are exactly like /usr/share/X11/locale/en_US.UTF-8/Compose :
http://cgit.freedesktop.org/xorg/lib/libX11/tree/nls/en_US.UTF-8/Compose.pre

You can define Multi_key with an XKB option like "compose:ralt". As key presses come in, the resulting keysyms are fed into some state machine which produces the string on the right when the appropriate sequence is matched.
Preferably there is some indication to the user that they are in the middle of a match. Otherwise it can be confusing when pressing e.g. ` (which produces, say, XKB_KEY_dead_macron) makes nothing happen.
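
To make the state-machine idea concrete, here is a toy sketch in C. Everything in it is hypothetical (the names, the tiny rule table, and the use of plain strings in place of real keysyms); it is not how libX11 or libxkbcommon implement it. It only shows how feeding one key at a time can report "in the middle of a match", which is the status an application would use to give the user some indication.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status values reported after each key. */
enum compose_status {
    COMPOSE_NOTHING,    /* key is not part of any sequence */
    COMPOSE_COMPOSING,  /* keys so far are a prefix of some sequence */
    COMPOSE_COMPOSED,   /* a full sequence matched; see result */
    COMPOSE_CANCELLED   /* a started sequence was broken */
};

#define MAX_SEQ 4

struct compose_rule {
    const char *seq[MAX_SEQ + 1]; /* NULL-terminated key names */
    const char *result;
};

/* Toy rule table; real rules would come from a Compose file. */
static const struct compose_rule rules[] = {
    { { "dead_acute", "u", NULL }, "ú" },
    { { "dead_acute", "e", NULL }, "é" },
    { { "Multi_key", "o", "c", NULL }, "©" },
};

struct compose_state {
    const char *pending[MAX_SEQ];
    size_t n_pending;
    const char *result; /* set only when the status is COMPOSE_COMPOSED */
};

static enum compose_status compose_feed(struct compose_state *st, const char *key)
{
    int was_composing = st->n_pending > 0;
    int is_prefix = 0;

    assert(st->n_pending < MAX_SEQ);
    st->result = NULL;
    st->pending[st->n_pending++] = key;

    for (size_t r = 0; r < sizeof(rules) / sizeof(rules[0]); r++) {
        size_t i = 0;
        while (i < st->n_pending && rules[r].seq[i] != NULL &&
               strcmp(rules[r].seq[i], st->pending[i]) == 0)
            i++;
        if (i < st->n_pending)
            continue;                  /* rule diverges from the pending keys */
        if (rules[r].seq[i] == NULL) { /* whole sequence matched */
            st->result = rules[r].result;
            st->n_pending = 0;
            return COMPOSE_COMPOSED;
        }
        is_prefix = 1;                 /* rule continues past the pending keys */
    }

    if (is_prefix)
        return COMPOSE_COMPOSING;
    st->n_pending = 0;
    return was_composing ? COMPOSE_CANCELLED : COMPOSE_NOTHING;
}
```

With a zero-initialized `struct compose_state`, feeding "dead_acute" reports COMPOSE_COMPOSING (the moment to show feedback), and feeding "u" next reports COMPOSE_COMPOSED with `result` set to "ú".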

The format itself is regular and easy to parse; here's the grammar:
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcPrs.c?id=6cb02b166361200da35ba14f52cd9aaa493eb0ea#n63
(I've never seen the MODIFIER part being used though).
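
As a rough illustration of how regular the format is, the following C sketch parses a single rule line of the form `<key> <key> ... : "string" keysym # comment`. It is a simplification under stated assumptions: the struct and function names are made up, and it ignores the MODIFIER part, `include` lines, and backslash escapes inside the quoted string, all of which the real grammar covers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 8
#define NAME_LEN 64

/* Hypothetical parsed form of one Compose rule. */
struct compose_line {
    char keys[MAX_KEYS][NAME_LEN]; /* left-hand side keysym names */
    int n_keys;
    char string[NAME_LEN];         /* quoted result, may be empty */
    char keysym[NAME_LEN];         /* result keysym name, may be empty */
};

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Returns 1 if the line is a rule this sketch understands, 0 otherwise. */
static int parse_compose_line(const char *p, struct compose_line *out)
{
    size_t len;

    assert(p != NULL && out != NULL);
    memset(out, 0, sizeof(*out)); /* also NUL-terminates every field */

    /* Left-hand side: one or more <name> tokens. */
    p = skip_ws(p);
    while (*p == '<') {
        const char *end = strchr(p, '>');
        if (end == NULL || out->n_keys == MAX_KEYS)
            return 0;
        len = (size_t)(end - p - 1);
        if (len == 0 || len >= NAME_LEN)
            return 0;
        memcpy(out->keys[out->n_keys++], p + 1, len);
        p = skip_ws(end + 1);
    }
    if (out->n_keys == 0 || *p != ':')
        return 0;

    /* Right-hand side: optional "string", then optional keysym name. */
    p = skip_ws(p + 1);
    if (*p == '"') {
        const char *end = strchr(p + 1, '"'); /* no escape handling */
        if (end == NULL)
            return 0;
        len = (size_t)(end - p - 1);
        if (len >= NAME_LEN)
            return 0;
        memcpy(out->string, p + 1, len);
        p = skip_ws(end + 1);
    }
    len = strcspn(p, " \t#\n"); /* stop at whitespace or a comment */
    if (len >= NAME_LEN)
        return 0;
    memcpy(out->keysym, p, len);

    return out->string[0] != '\0' || out->keysym[0] != '\0';
}
```

Fed the line `<dead_acute> <u> : "ú" uacute # LATIN SMALL LETTER U WITH ACUTE`, this fills in two key names, the string "ú" and the keysym name "uacute".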

But there are many different implementations of the concept. Some relevant ones are:

Gtk uses a hard-coded table derived at some point from libX11 Compose:
https://git.gnome.org/browse/gtk+/tree/gtk/gtkimcontextsimpleseqs.h

Qt5 has its own implementation, which does read libX11 Compose files:
https://qt.gitorious.org/qt/qtbase/source/1d039184543c3c1079a56e98ca22d9774166ed3f:src/plugins/platforminputcontexts/compose

Hard-coded table for Wayland input methods:
http://cgit.freedesktop.org/wayland/weston/tree/clients/weston-simple-im.c

from libxkbcommon.

bluetech commented on July 17, 2024

Now for support in libxkbcommon.

As seen above, most clients won't be using it. So clearly we shouldn't force it on the user. Luckily this support, I think, is mostly orthogonal to what libxkbcommon currently does, which is to produce keysyms. So if we want it in libxkbcommon we can put it in a separate helper library/header.

Also, any proper support for Compose would require some adjustments and support from the application. The issues you mentioned are some of what I mean, but different applications will want to handle those differently. So libxkbcommon shouldn't act as a middle man but just provide the boring stuff:

  1. Finding the appropriate Compose files: looking at the appropriate path, locale, ~/.XCompose, includes, etc. (Btw, since the Compose files are shipped with libX11 we'd probably want to move them to xkeyboard-config or similar).
  2. Parsing the file into some data structure which allows fast sequence matching and querying. Hopefully we can find something which doesn't use too much memory.
  3. (Optional) The state machine, which is fed keysyms and reports matches/prefixes or whatever else is of interest.
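
One possible shape for the data structure in step 2 is a trie keyed by keysym. The sketch below is an assumption for illustration only (a first-child/next-sibling layout with made-up names), not the structure the branch ended up with. The numeric keysym values are the ones defined in xkbcommon-keysyms.h for dead_acute (0xfe51), u (0x0075) and e (0x0065).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one keysym per node, siblings chained in a list. */
struct trie_node {
    uint32_t keysym;
    struct trie_node *next;  /* next alternative at the same depth */
    struct trie_node *child; /* first continuation of the sequence */
    const char *result;      /* non-NULL if a sequence ends here */
};

/* Finds or creates the child of parent for keysym; NULL if out of memory. */
static struct trie_node *trie_step_or_add(struct trie_node *parent, uint32_t keysym)
{
    struct trie_node *n;

    for (n = parent->child; n != NULL; n = n->next)
        if (n->keysym == keysym)
            return n;
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->keysym = keysym;
    n->next = parent->child;
    parent->child = n;
    return n;
}

/* Adds seq -> result; returns 1 on success, 0 if out of memory. */
static int trie_insert(struct trie_node *root, const uint32_t *seq,
                       size_t len, const char *result)
{
    struct trie_node *n = root;

    assert(root != NULL);
    for (size_t i = 0; i < len && n != NULL; i++)
        n = trie_step_or_add(n, seq[i]);
    if (n == NULL)
        return 0;
    n->result = result;
    return 1;
}

/* Follows seq from the root; NULL means no rule starts with these keysyms. */
static const struct trie_node *trie_walk(const struct trie_node *root,
                                         const uint32_t *seq, size_t len)
{
    const struct trie_node *n = root;

    for (size_t i = 0; i < len && n != NULL; i++) {
        const struct trie_node *c = n->child;
        while (c != NULL && c->keysym != seq[i])
            c = c->next;
        n = c;
    }
    return n;
}

/* Frees a sibling list and everything below it. */
static void trie_free(struct trie_node *n)
{
    while (n != NULL) {
        struct trie_node *next = n->next;
        trie_free(n->child);
        free(n);
        n = next;
    }
}
```

A node reached by `trie_walk()` with a NULL `result` and a non-NULL `child` is a prefix, which is what the state machine in step 3 would report as "still composing". Sibling lists keep nodes small at the cost of a linear scan per step, which is one way to trade lookup speed for memory.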

Basically providing the mechanism while still allowing the application/toolkit to do whatever it wants.
And this can all be entirely separate from the keymap handling part, or in a new library entirely. Once there's a decent API it should be relatively straightforward to code.

dvdhrm commented on July 17, 2024

Thanks a lot for the info. The format is indeed regular, should be easy to write a parser by hand. I'd really like to see that added to xkbcommon as otherwise everyone needs to implement it on their own (and I see no reason for that). Obviously, it should be optional. I was thinking of something like:

  • xkb_context looks up the compose-path additionally to include-paths. I don't see any real reason to not re-use the same context for keymaps and compose, do you?
  • An xkb_compose object to load a given compose-file, query it and iterate over the single rules
  • Maybe a state-machine that takes keysyms and returns keysyms. But it's fairly trivial so maybe not worth it.

I will be working on a rough API this week and send it to the list.
Thanks Ran!

bluetech commented on July 17, 2024

Oh so you'll be working on it, great :)

I had actually written a parser for this file format about a year ago, and your comment prompted me to brush it up and integrate with current libxkbcommon code. I've also added some boilerplate API in order to run some tests; it's based almost entirely on the xkb_keymap / xkb_state duo. Unfortunately I think it's still missing some stuff, and I never got to writing the data structure for the sequences (which ought to be the fun part). So it's just a constant-sized array now :)

Anyway I've put it on a branch now, feel free to do (or not) whatever you want with it.
https://github.com/bluetech/libxkbcommon/commits/compose

If you post something to the list, I'll take some time to review it.

dvdhrm commented on July 17, 2024

Awesome! I hate writing lexers/parsers and you obviously have a soft spot for it :) Will pick that up but might have to wait for the weekend.

bluetech commented on July 17, 2024

I had some time yesterday between exams, and Gatis reminded me about this. So had some fun hacking on this. I got the basic trie to work and wrote some tests. Needs lots of work, and the API is just provisional so I can see something happen :). But I'll try to iterate on this when I get some more time.

I put the branch in the main repo now:
https://github.com/xkbcommon/libxkbcommon/commits/compose

bluetech commented on July 17, 2024

OK, since I ran out of ideas for improving the API, I tried to use it with kmscon (as a "real life" example). It's here:
bluetech/kmscon@b61e07a
The relevant part is in uxkb_dev_process(). Doesn't look too bad to me, and works nicely as far as my testing goes.

There's still stuff I ignored (like modifiers in Compose files -- not sure if it's even worth a look) and non-UTF-8 Compose files (which would require parsing a bunch more libX11 files and messing with locales and iconv..).

dvdhrm commented on July 17, 2024

Some comments:

  • I kinda feel bad for having done nothing on it even though I promised.. Sorry!
  • Your example looks wrong: "<`> key (the dead key) and then may produce the symbol
    ú" shouldn't this be <´>?
  • Why is this picked by the locale? I know, X11 legacy.. but this seems weird to me. Shouldn't it be part of the RMLVO? Anyhow, I guess we cannot change it.
  • I dislike the *_get_one_sym() thing. We have the multiple-keysym API and I tried to use it consistently. No-one else does, I know.. but I do! For a proof-of-concept your patch seems fine, though.
  • https://bugs.freedesktop.org/show_bug.cgi?id=67167 << ugh? That's annoying.. but I'd prefer if we do that in kmscon instead of falling back to get_one_sym()..
  • Can you add xkb_compose_state_get_syms()?
  • What does xkb_compose_get_utf8() do? Is this because "Compose" files define UTF8 output? And you just try to map it to keysyms internally? What happens if the UTF8 output uses combining-characters? What is the maximum size to pass to this function? If there's no maximum, I'd prefer it if it returns an allocated zero-terminated buffer instead. Or use a state-internal buffer that is overwritten on each call to get_utf8()..

Otherwise the API looks really nice! It should be very easy to integrate into input-methods (if they don't want to do that themselves..) and it also provides an easy way for people that don't want input-methods.

Thanks a lot Ran!
David

bluetech commented on July 17, 2024

(replying from my mail client, hope it comes out fine...)

On Thu, Feb 13, 2014 at 03:08:26AM -0800, David Herrmann wrote:

Some comments:

  • I kinda feel bad for having done nothing on it even though I promised.. Sorry!
  • Your example looks wrong: "<`> key (the dead key) and then may produce the symbol
    ú" shouldn't this be <´>?

Unless I'm misunderstanding your question, I think it's fine. Here's the sequence:

<dead_acute> <u>                        : "ú"   uacute # LATIN SMALL LETTER U WITH ACUTE

And the us(intl) keymap has this:

key <TLDE> { [dead_grave, dead_tilde,         grave, asciitilde ] };
  • Why is this picked by the locale? I know, X11 legacy.. but this seems weird to me. Shouldn't it be part of the RMLVO? Anyhow, I guess we cannot change it.

Since the locale is baked into the file search procedure, and into the file format itself (with %L include-statement expansion), I must use it. Since I didn't want to do setlocale() or other tricks from within the library, I made it an explicit parameter to the functions. If you don't want to setlocale(), you can do what I did in the kmscon patch.
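
For readers unfamiliar with the %L expansion mentioned above, here is a small C sketch of expanding an include path. The function name and signature are hypothetical. Besides %L (the locale's Compose file), it also handles %H (home directory) and %S (system Compose directory), which the libX11 Compose file documentation describes; treating %% as a literal percent sign is an assumption of this sketch. The locale-dependent path is passed in explicitly, in the same spirit as making the locale an explicit parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expands %L, %H, %S and %% in an include path (hypothetical helper).
 * Returns 1 on success, 0 on an unknown escape, a missing value,
 * or if the result does not fit into out (including the NUL byte).
 */
static int expand_include(const char *in, const char *locale_file,
                          const char *home, const char *sysdir,
                          char *out, size_t size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *piece = in; /* default: copy one literal byte */
        size_t len = 1;

        if (*in == '%') {
            in++;
            switch (*in) {
            case 'L': piece = locale_file; break;
            case 'H': piece = home; break;
            case 'S': piece = sysdir; break;
            case '%': piece = "%"; break;
            default: return 0; /* also catches a trailing lone '%' */
            }
            if (piece == NULL)
                return 0;
            len = strlen(piece);
        }
        if (used + len >= size)
            return 0;
        memcpy(out + used, piece, len);
        used += len;
    }
    if (size == 0)
        return 0;
    out[used] = '\0';
    return 1;
}
```

For example, a user's ~/.XCompose that starts with `include "%L"` pulls in the locale's default rules before adding its own; with the sketch above, "%L" expands to whatever path the caller resolved for the chosen locale.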

But even though no one likes locales, I think just using the name is convenient, it provides a reasonable default without needing explicit configuration (and most map to en_US.UTF-8 anyway). I would actually not have considered it too unreasonable if the RMLVO used it as well...

If you want, you can add a configuration option, so the user can explicitly choose which file to use. For normal applications, you wouldn't need to, since ~/.XCompose takes priority. But I'm not sure if $HOME is relevant for kmscon.
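
The priority order can be sketched as a list of candidate paths to try in turn. This is a hedged illustration, not the library's lookup code: the function is hypothetical, the XCOMPOSEFILE override comes from the libX11 Compose documentation rather than from this thread, and the locale's Compose file is assumed to have been resolved by the caller already. An application without a meaningful $HOME would simply pass NULL for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes the n-th candidate Compose file path (0-based) into out.
 * Order: explicit override, then ~/.XCompose, then the locale's file.
 * Unset (NULL) sources are skipped. Returns 1 on success, 0 when there
 * is no n-th candidate or the path does not fit.
 */
static int compose_candidate(int n, const char *xcomposefile, const char *home,
                             const char *locale_file, char *out, size_t size)
{
    const char *base[3];
    const char *suffix[3];
    int count = 0;
    int written;

    assert(out != NULL);
    if (xcomposefile != NULL) { /* e.g. the value of $XCOMPOSEFILE */
        base[count] = xcomposefile;
        suffix[count++] = "";
    }
    if (home != NULL) {         /* the user's own rules take priority */
        base[count] = home;
        suffix[count++] = "/.XCompose";
    }
    if (locale_file != NULL) {  /* system default for the locale */
        base[count] = locale_file;
        suffix[count++] = "";
    }
    if (n < 0 || n >= count)
        return 0;

    written = snprintf(out, size, "%s%s", base[n], suffix[n]);
    return written >= 0 && (size_t)written < size;
}
```

A caller would loop over n = 0, 1, 2, ... and use the first candidate that can actually be opened.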

  • I dislike the *_get_one_sym() thing. We have the multiple-keysym API and I tried to use it consistently. No-one else does, I know.. but I do! For a proof-of-concept your patch seems fine, though.

Hmm, OK, so what we have now is:

single key -> single keysym (basic keymap)
single key -> multiple keysyms (extended xkbcommon keymaps)
multiple keys -> single keysym (basic compose)

What you're proposing is:

multiple keys -> multiple keysyms (extended xkbcommon compose)

That's very flexible :)
I suppose it certainly makes sense to have combining characters in Compose files, i.e., non-precomposed (sorry) unicode characters. Also the format naturally extends itself for that (just allow multiple keysyms on the right-hand side).

But:

  • I don't think it will ever be used.
  • There's already the utf8 thing with partly overlapping functionality.
  • Complicates the API, most people use the get_one_sym() variant or otherwise handle just the single-keysym case.
  • If we support it generically the trie will consume more memory.

So my feeling is it's not worth it, and can be added later if we want it. But I can add a get_syms() which always returns 1 or 0 for consistency, that makes sense. What do you think?

The RFC period on that one has ended I'm afraid :)
I still plan to fix the few keymaps which need it, so this becomes entirely a non-issue. For kmscon I would just ignore it, but it fitted nicely with the compose stuff, so I didn't see a reason not to. But you can do whatever you feel like here.

  • Can you add xkb_compose_state_get_syms()?
  • What does xkb_compose_get_utf8() do? Is this because "Compose" files define UTF8 output? And you just try to map it to keysyms internally? What happens if the UTF8 output uses combining-characters? What is the maximum size to pass to this function? If there's no maximum, I'd prefer it if it returns an allocated zero-terminated buffer instead. Or use a state-internal buffer that is overwritten on each call to get_utf8()..

Yes, each sequence can result in either a string, a keysym, or both. If there isn't a string I return the keysym's utf8 representation, but most of the time they have both. See Xutf8LookupString(3) for what Xlib does there. (There's no string -> keysym mapping though).

And yes, the string can be arbitrary; try adding this to your ~/.XCompose, run xterm and type 1:

<1> : "hello"

I'll fix the function some way.
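
Since the string can be arbitrarily long, one way to answer the "maximum size" question is the snprintf(3) convention: the function always returns the length it needs, so the caller can detect truncation or size a buffer exactly. The sketch below only illustrates that convention with hypothetical stand-ins (`fake_state`, `fake_get_utf8`); it makes no claim about how the function in the branch was eventually fixed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compose state holding an arbitrary result. */
struct fake_state {
    const char *composed;
};

/*
 * snprintf-style getter: writes at most size - 1 bytes plus a NUL into buf
 * and returns the number of bytes the full string needs (excluding the NUL).
 * A return value >= size means the output was truncated.
 */
static int fake_get_utf8(const struct fake_state *st, char *buf, size_t size)
{
    assert(st != NULL);
    return snprintf(buf, size, "%s", st->composed ? st->composed : "");
}

/* Caller-side pattern: query the length first, then allocate exactly enough. */
static char *fake_get_utf8_alloc(const struct fake_state *st)
{
    int need = fake_get_utf8(st, NULL, 0);
    char *buf;

    if (need < 0)
        return NULL;
    buf = malloc((size_t)need + 1);
    if (buf != NULL)
        fake_get_utf8(st, buf, (size_t)need + 1);
    return buf; /* caller frees */
}
```

This keeps allocation policy with the caller: a fixed stack buffer works for the common short case, and the returned length tells the caller when a `<1> : "hello"`-style rule needs more room.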

Otherwise the API looks really nice! It should be very easy to integrate into input-methods (if they don't want to do that themselves..) and it also provides an easy way for people that don't want input-methods.

Nah, don't think IMs would want to use this, they have their own. Maybe some lightweight fallback one (and they'd need more API for sure).

Thanks for the comments!

Ran

bluetech commented on July 17, 2024

Status update: the xkbcommon-compose implementation is complete, I've updated the https://github.com/xkbcommon/libxkbcommon/commits/compose branch.

Next I'll send it out for comments, and try to split the Compose files from libX11 to a separate repo.

dvdhrm commented on July 17, 2024

This looks really cool! I will play around with it later and report back if anything goes unexpected.

Regarding the compose files: Why not include them in xkeyboard-config? It's not really a keyboard configuration, but the compose files are pretty useless if used without keyboards.. so kinda co-dependent.

bluetech commented on July 17, 2024

Both xkeyboard-config and "xlocale-config" can conceivably be useful independently, so I wouldn't call them co-dependent. Also, the compose files must drag along some Xlib locale nonsense (and Xlib will depend on it). So I don't see either the Xlib or xkeyboard-config maintainers going for that. Path of least resistance...

bluetech commented on July 17, 2024

The branch has been merged now; should be a part of v0.5.0.
