arp242 / uni

Query the Unicode database from the commandline, with good support for emojis

License: MIT License

Go 71.48% Shell 3.20% HTML 2.62% Python 0.17% JavaScript 13.00% Awk 9.00% Vim Script 0.54%
go unicode emoji emoji-picker golang

uni's Introduction

uni queries the Unicode database from the commandline. It supports Unicode 15.1 (September 2023) and has good support for emojis.

There are four commands: identify codepoints in a string, search for codepoints, print codepoints by class, block, or range, and emoji to find emojis.

There are binaries on the releases page, and packages for a number of platforms. You can also run it in your browser.

Compile from source with:

% go install zgo.at/uni/v2@latest

which will give you a uni binary in ~/go/bin.

Integrations

  • dmenu, rofi, and fzf script at dmenu-uni. See the top of the script for some options you may want to frob with.

  • For a Vim command see uni.vim; just copy/paste it in your vimrc.

Usage

Note: the alignment is slightly off for some entries due to the way GitHub renders wide characters; in terminals it should be aligned correctly.

Identify

Identify characters in a string, as a kind of Unicode-aware hexdump:

% uni identify €
             Dec    UTF8        HTML       Name
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN

i is a shortcut for identify:

% uni i h€ý
             Dec    UTF8        HTML       Name
'h'  U+0068  104    68          &#x68;     LATIN SMALL LETTER H
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
'ý'  U+00FD  253    c3 bd       &yacute;   LATIN SMALL LETTER Y WITH ACUTE
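
Most of these columns can be derived from a rune with the Go standard library alone. The following is a minimal sketch, not uni's actual implementation: describe is a hypothetical helper, and the Name and HTML columns are left out because they need the Unicode data files.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// describe returns the codepoint, decimal value, and UTF-8 bytes of r,
// formatted like the codepoint, Dec, and UTF8 columns.
// Hypothetical helper for illustration only.
func describe(r rune) (cpoint, dec, utf8Hex string) {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	parts := make([]string, 0, n)
	for _, b := range buf[:n] {
		parts = append(parts, fmt.Sprintf("%02x", b))
	}
	return fmt.Sprintf("U+%04X", r), fmt.Sprintf("%d", r), strings.Join(parts, " ")
}

func main() {
	// Ranging over a string yields runes, not bytes.
	for _, r := range "h€ý" {
		cp, dec, u8 := describe(r)
		fmt.Printf("%q  %-7s %-6s %s\n", r, cp, dec, u8)
	}
}
```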

It reads from stdin:

% head -c5 README.md | uni i
     CPoint  Dec    UTF8        HTML       Name
'`'  U+0060  96     60          &grave;    GRAVE ACCENT [backtick, backquote]
'u'  U+0075  117    75          &#x75;     LATIN SMALL LETTER U
'n'  U+006E  110    6e          &#x6e;     LATIN SMALL LETTER N
'i'  U+0069  105    69          &#x69;     LATIN SMALL LETTER I
'`'  U+0060  96     60          &grave;    GRAVE ACCENT [backtick, backquote]

% echo 'U+1234 U+1111' | uni p
     CPoint  Dec    UTF8        HTML       Name
'ᄑ' U+1111  4369   e1 84 91    &#x1111;   HANGUL CHOSEONG PHIEUPH [P]
'ሴ'  U+1234  4660   e1 88 b4    &#x1234;   ETHIOPIC SYLLABLE SEE

You can use -compact (or -c) to suppress the header, and -format (or -f) to control the output format:

% uni i -f '%unicode %name' a€🧟
Unicode Name
1.1     LATIN SMALL LETTER A
2.1     EURO SIGN
10.0    ZOMBIE

If the format string starts with + it will automatically be prepended with the character, codepoint, and name:

% uni i -f +%unicode a€🧟
             Name                 Unicode
'a'  U+0061  LATIN SMALL LETTER A 1.1
'€'  U+20AC  EURO SIGN            2.1
'🧟' U+1F9DF ZOMBIE               10.0

You can add more advanced options with %(name flags); for example to generate an aligned codepoint to X11 keysym mapping:

% uni i -c -f '0x%(hex l:auto f:0): %(keysym l:auto q:":",) // %name' h€ý
0x6800: "h",        // LATIN SMALL LETTER H
0x20ac: "EuroSign", // EURO SIGN
0xfd00: "yacute",   // LATIN SMALL LETTER Y WITH ACUTE

See uni help for more details on the -format flag; this flag can also be added to other commands.

Search

Search description:

% uni search euro
             Dec    UTF8        HTML       Name
'₠'  U+20A0  8352   e2 82 a0    &#x20a0;   EURO-CURRENCY SIGN
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
'𐡷'  U+10877 67703  f0 90 a1 b7 &#x10877;  PALMYRENE LEFT-POINTING FLEURON
'𐡸'  U+10878 67704  f0 90 a1 b8 &#x10878;  PALMYRENE RIGHT-POINTING FLEURON
'𐫱'  U+10AF1 68337  f0 90 ab b1 &#x10af1;  MANICHAEAN PUNCTUATION FLEURON
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🏤' U+1F3E4 127972 f0 9f 8f a4 &#x1f3e4;  EUROPEAN POST OFFICE
'🏰' U+1F3F0 127984 f0 9f 8f b0 &#x1f3f0;  EUROPEAN CASTLE
'💶' U+1F4B6 128182 f0 9f 92 b6 &#x1f4b6;  BANKNOTE WITH EURO SIGN

The s command is a shortcut for search. Multiple words are matched individually:

% uni s globe earth
             Dec    UTF8        HTML       Name
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🌎' U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS
'🌏' U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA

Use shell quoting for more literal matches:

% uni s rightwards black arrow
             Dec    UTF8        HTML       Name
'➡'  U+27A1  10145  e2 9e a1    &#x27a1;   BLACK RIGHTWARDS ARROW
'➤'  U+27A4  10148  e2 9e a4    &#x27a4;   BLACK RIGHTWARDS ARROWHEAD
…

% uni s 'rightwards black arrow'
             Dec    UTF8        HTML       Name
'⮕'  U+2B95  11157  e2 ae 95    &#x2b95;   RIGHTWARDS BLACK ARROW

Add -or or -o to combine the search terms with "OR" instead of "AND":

% uni s -o globe milky
             Dec    UTF8        HTML       Name
'🌌' U+1F30C 127756 f0 9f 8c 8c &#x1f30c;  MILKY WAY
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🌎' U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS
'🌏' U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA
'🌐' U+1F310 127760 f0 9f 8c 90 &#x1f310;  GLOBE WITH MERIDIANS
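
The AND/OR behaviour, and the effect of shell quoting, can be sketched roughly as follows. This assumes plain case-insensitive substring matching on the name; matches is a hypothetical helper, and uni's real search may differ in its details.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether name contains the search words. With or=false
// every word must occur (AND); with or=true a single hit is enough (OR).
// Hypothetical helper; a quoted phrase arrives as one "word".
func matches(name string, words []string, or bool) bool {
	name = strings.ToUpper(name)
	for _, w := range words {
		found := strings.Contains(name, strings.ToUpper(w))
		if or && found {
			return true
		}
		if !or && !found {
			return false
		}
	}
	// AND: no word was missing. OR: no word was found.
	return !or
}

func main() {
	names := []string{"MILKY WAY", "EARTH GLOBE AMERICAS", "GLOBE WITH MERIDIANS"}
	for _, n := range names {
		fmt.Printf("%-22s and=%-5v or=%v\n", n,
			matches(n, []string{"globe", "earth"}, false),
			matches(n, []string{"globe", "milky"}, true))
	}
}
```

Under this model a quoted argument such as 'rightwards black arrow' is a single term, which is why it no longer matches BLACK RIGHTWARDS ARROW.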

Print

Print specific codepoints or groups of codepoints:

% uni print U+2042
             Dec    UTF8        HTML       Name
'⁂'  U+2042  8258   e2 81 82    &#x2042;   ASTERISM

Print a custom range; U+2042, U2042, and 2042 are all identical:

% uni print 2042..2044
             Dec    UTF8        HTML       Name
'⁂'  U+2042  8258   e2 81 82    &#x2042;   ASTERISM
'⁃'  U+2043  8259   e2 81 83    &hybull;   HYPHEN BULLET
'⁄'  U+2044  8260   e2 81 84    &frasl;    FRACTION SLASH [solidus]

You can also use hex, octal, and binary numbers: 0x2042, 0o20102, or 0b10000001000010.
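
As an illustration of how these notations relate, here is a small sketch that maps all of them to the same codepoint. parseCodepoint is a hypothetical helper and is not taken from uni's source; it assumes bare numbers and the U+ and U forms are hexadecimal, as described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// parseCodepoint accepts U+2042, U2042, and 2042 (all read as hex), plus the
// prefixed forms 0x…, 0o…, and 0b…. Hypothetical helper for illustration.
func parseCodepoint(s string) (rune, error) {
	lower := strings.ToLower(s)
	var (
		n   uint64
		err error
	)
	switch {
	case strings.HasPrefix(lower, "0x"), strings.HasPrefix(lower, "0o"), strings.HasPrefix(lower, "0b"):
		// Base 0 lets strconv pick the base from the prefix.
		n, err = strconv.ParseUint(lower, 0, 32)
	default:
		lower = strings.TrimPrefix(strings.TrimPrefix(lower, "u+"), "u")
		n, err = strconv.ParseUint(lower, 16, 32)
	}
	if err != nil {
		return 0, err
	}
	if n > unicode.MaxRune {
		return 0, fmt.Errorf("%q is beyond U+10FFFF", s)
	}
	return rune(n), nil
}

func main() {
	for _, s := range []string{"U+2042", "U2042", "2042", "0x2042", "0o20102", "0b10000001000010"} {
		r, err := parseCodepoint(s)
		fmt.Printf("%-18s %U %v\n", s, r, err)
	}
}
```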

General category:

% uni p Po
Showing category Po (Other_Punctuation)
             Dec    UTF8        HTML       Name
'!'  U+0021  33     21          &excl;     EXCLAMATION MARK [factorial, bang]
…

Blocks:

% uni p arrows 'box drawing'
Showing block Arrows
Showing block Box Drawing
             Dec    UTF8        HTML       Name
'←'  U+2190  8592   e2 86 90    &larr;     LEFTWARDS ARROW
'↑'  U+2191  8593   e2 86 91    &uarr;     UPWARDS ARROW
…

Print as table, and with a shorter name:

% uni p -as table box
Showing block Box Drawing
         0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
       ┌────────────────────────────────────────────────────────────────
U+250x │ ─   ━   │   ┃   ┄   ┅   ┆   ┇   ┈   ┉   ┊   ┋   ┌   ┍   ┎   ┏
       │
U+251x │ ┐   ┑   ┒   ┓   └   ┕   ┖   ┗   ┘   ┙   ┚   ┛   ├   ┝   ┞   ┟
       │
U+252x │ ┠   ┡   ┢   ┣   ┤   ┥   ┦   ┧   ┨   ┩   ┪   ┫   ┬   ┭   ┮   ┯
       │
U+253x │ ┰   ┱   ┲   ┳   ┴   ┵   ┶   ┷   ┸   ┹   ┺   ┻   ┼   ┽   ┾   ┿
       │
U+254x │ ╀   ╁   ╂   ╃   ╄   ╅   ╆   ╇   ╈   ╉   ╊   ╋   ╌   ╍   ╎   ╏
       │
U+255x │ ═   ║   ╒   ╓   ╔   ╕   ╖   ╗   ╘   ╙   ╚   ╛   ╜   ╝   ╞   ╟
       │
U+256x │ ╠   ╡   ╢   ╣   ╤   ╥   ╦   ╧   ╨   ╩   ╪   ╫   ╬   ╭   ╮   ╯
       │
U+257x │ ╰   ╱   ╲   ╳   ╴   ╵   ╶   ╷   ╸   ╹   ╺   ╻   ╼   ╽   ╾   ╿
       │

Or more compact table:

% uni p -as table box -compact
         0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
       ┌────────────────────────────────────────────────────────────────
U+250x │ ─   ━   │   ┃   ┄   ┅   ┆   ┇   ┈   ┉   ┊   ┋   ┌   ┍   ┎   ┏
U+251x │ ┐   ┑   ┒   ┓   └   ┕   ┖   ┗   ┘   ┙   ┚   ┛   ├   ┝   ┞   ┟
U+252x │ ┠   ┡   ┢   ┣   ┤   ┥   ┦   ┧   ┨   ┩   ┪   ┫   ┬   ┭   ┮   ┯
U+253x │ ┰   ┱   ┲   ┳   ┴   ┵   ┶   ┷   ┸   ┹   ┺   ┻   ┼   ┽   ┾   ┿
U+254x │ ╀   ╁   ╂   ╃   ╄   ╅   ╆   ╇   ╈   ╉   ╊   ╋   ╌   ╍   ╎   ╏
U+255x │ ═   ║   ╒   ╓   ╔   ╕   ╖   ╗   ╘   ╙   ╚   ╛   ╜   ╝   ╞   ╟
U+256x │ ╠   ╡   ╢   ╣   ╤   ╥   ╦   ╧   ╨   ╩   ╪   ╫   ╬   ╭   ╮   ╯
U+257x │ ╰   ╱   ╲   ╳   ╴   ╵   ╶   ╷   ╸   ╹   ╺   ╻   ╼   ╽   ╾   ╿

Emoji

The emoji command (shortcut: e) is the real reason I wrote this:

% uni e cry
	Name                      CLDR
🥹	face holding back tears  [admiration, angry, aw, aww, cry, embarrassed, feelings, grateful, gratitude, please, proud, resist, sad, sadness, tears of joy]
😢	crying face              [awful, feels, miss, sad, tear, triste, unhappy]
😭	loudly crying face       [bawling, sad, sob, tear, tears, unhappy]
😿	crying cat               [animal, crying cat face, face, sad, tear]
🔮	crystal ball             [fairy tale, fairytale, fantasy, fortune, future, magic, tool]

By default both the name and CLDR data are searched; the CLDR data is a list of keywords for an emoji; prefix with name: or n: to search on the name only:

% uni e smile
	Name                              CLDR
😀	grinning face                    [cheerful, cheery, happy, laugh, nice, smile, smiling, teeth]
😃	grinning face with big eyes      [awesome, happy, mouth, open, smile, smiling, smiling face with open mouth, teeth, yay]
…

% uni e name:smile
	Name                 CLDR
😼	cat with wry smile  [animal, cat face with wry smile, face, ironic]

As you can see, the CLDR is pretty useful, as "smile" only gives one result as most emojis use "smiling".

Prefix with group: to search by group:

% uni e group:hands
	Name                CLDR
👏	clapping hands     [applause, approval, awesome, congrats, congratulations, excited, good job, great, homie, nice, prayed, well done, yay]
🙌	raising hands      [celebration, gesture, hooray, praise, raised]
🫶	heart hands        [<3, love, love you]
👐	open hands         [hug, jazz hands, swerve]
🤲	palms up together  [cupped hands, dua, pray, prayer, wish]
🤝	handshake          [agreement, deal, meeting]
🙏	folded hands       [appreciate, ask, beg, blessed, bow, cmon, five, gesture, high 5, high five, please, pray, thank, thank you, thanks, thx]

Group and search can be combined, and group: can be abbreviated to g::

% uni e g:cat-face grin
	Name                             CLDR
😺	grinning cat                    [animal, face, mouth, open, smile, smiling cat face with open mouth]
😸	grinning cat with smiling eyes  [animal, face, grinning cat face with smiling eyes, smile]

Like with search, use -or to OR the parameters together instead of AND:

% uni e -or g:face-glasses g:face-hat
	Name                           CLDR
🤠	cowboy hat face               [cowgirl]
🥳	partying face                 [birthday, celebrate, celebration, excited, happy bday, happy birthday, hat, hooray, horn]
🥸	disguised face                [eyebrow, glasses, incognito, moustache, mustache, nose, person, spy, tache, tash]
😎	smiling face with sunglasses  [awesome, beach, bright, bro, chillin, cool, eye, eyewear, fly, rad, relaxed, shades, slay, smile, stunner, style, swag, swagger, win, winning, yeah]
🤓	nerd face                     [brainy, clever, expert, geek, gifted, glasses, intelligent, smart]
🧐	face with monocle             [classy, fancy, rich, stuffy, wealthy]

Apply skin tone modifiers with -tone:

% uni e -tone dark g:hands
	Name                                CLDR
👏🏿	clapping hands: dark skin tone     [applause, approval, awesome, congrats, congratulations, excited, good job, great, homie, nice, prayed, well done, yay]
🙌🏿	raising hands: dark skin tone      [celebration, gesture, hooray, praise, raised]
🫶🏿	heart hands: dark skin tone        [<3, love, love you]
👐🏿	open hands: dark skin tone         [hug, jazz hands, swerve]
🤲🏿	palms up together: dark skin tone  [cupped hands, dua, pray, prayer, wish]
🤝🏿	handshake: dark skin tone          [agreement, deal, meeting]
🙏🏿	folded hands: dark skin tone       [appreciate, ask, beg, blessed, bow, cmon, five, gesture, high 5, high five, please, pray, thank, thank you, thanks, thx]

The handshake emoji supports setting individual skin tones per hand since Unicode 14, but this isn't supported, mostly because I can't really think of a good CLI interface for setting this without breaking compatibility (there are some other emojis too, like "holding hands" and "kissing", where you can set both the gender and skin tone of both sides individually). Maybe for uni v3 someday.

The default is to display only the gender-neutral "person", but this can be changed with the -gender option:

% uni e -gender man g:person-gesture
	Name               CLDR
🙍‍♂️	man frowning      [annoyed, disappoint, disgruntled, disturbed, frustrated, gesture, irritated, not happy, person frowning, upset, woman frowning]
🙎‍♂️	man pouting       [disappoint, downtrodden, frown, gesture, grimace, person pouting, scowl, sulk, upset, whine, woman pouting]
🙅‍♂️	man gesturing NO  [exclude, forbidden, gesture, hand, no, nope, not, not a chance, person gesturing NO, prohibit, prohibited, woman gesturing NO]
🙆‍♂️	man gesturing OK  [exercise, gesture, hand, omg, person gesturing OK, woman gesturing OK]
💁‍♂️	man tipping hand  [fetch, gossip, hair flick, hair flip, help, information, person tipping hand, sarcasm, sarcastic, sassy, seriously, whatever, woman tipping hand]
🙋‍♂️	man raising hand  [gesture, hands, happy, I can help, i know, me, over here, person raising hand, pick me, question, raised, right here, woman raising hand]
🧏‍♂️	deaf man          [accessibility, deaf person, ear, hear]
🙇‍♂️	man bowing        [apology, beg, forgive, gesture, meditate, meditation, person bowing, pity, regret, sorry]
🤦‍♂️	man facepalming   [disbelief, exasperation, not again, oh no, omg, person, person facepalming, shock, smh]
🤷‍♂️	man shrugging     [doubt, dunno, i dunno, I guess, idk, ignorance, indifference, maybe, person, person shrugging, whatever, who knows]

Both -tone and -gender accept multiple values. -gender woman,man will display both the female and male variants, and -tone light,dark will display both a light and dark skin tone; use all to display all skin tones or genders:

% uni e -tone light,dark -gender f,m shrug
	Name                               CLDR
🤷🏻‍♂️	man shrugging: light skin tone    [doubt, dunno, i dunno, I guess, idk, ignorance, indifference, maybe, person, person shrugging, whatever, who knows]
🤷🏻‍♀️	woman shrugging: light skin tone  [doubt, dunno, i dunno, I guess, idk, ignorance, indifference, maybe, person, person shrugging, whatever, who knows]
🤷🏿‍♂️	man shrugging: dark skin tone     [doubt, dunno, i dunno, I guess, idk, ignorance, indifference, maybe, person, person shrugging, whatever, who knows]
🤷🏿‍♀️	woman shrugging: dark skin tone   [doubt, dunno, i dunno, I guess, idk, ignorance, indifference, maybe, person, person shrugging, whatever, who knows]
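
Under the hood these variants are codepoint sequences: the base emoji, an optional skin tone modifier (U+1F3FB to U+1F3FF), and for gesture emojis like "person shrugging" a zero-width joiner followed by a gender sign and a variation selector. A rough sketch, with compose and both lookup tables being hypothetical names rather than uni's own; other emojis (professions, couples) use different sequences that this does not cover.

```go
package main

import "fmt"

const (
	zwj  = '\u200D' // ZERO WIDTH JOINER
	vs16 = '\uFE0F' // VARIATION SELECTOR-16 (emoji presentation)
)

// Skin tone modifiers, U+1F3FB..U+1F3FF.
var tones = map[string]rune{
	"light": 0x1F3FB, "mediumlight": 0x1F3FC, "medium": 0x1F3FD,
	"mediumdark": 0x1F3FE, "dark": 0x1F3FF,
}

// Gender signs as used in ZWJ sequences: MALE SIGN and FEMALE SIGN.
var genders = map[string]rune{"man": 0x2642, "woman": 0x2640}

// compose builds base + tone + ZWJ + gender sign + VS16. An unknown or
// empty tone or gender simply skips that part. Hypothetical helper.
func compose(base rune, tone, gender string) string {
	seq := []rune{base}
	if t, ok := tones[tone]; ok {
		seq = append(seq, t)
	}
	if g, ok := genders[gender]; ok {
		seq = append(seq, zwj, g, vs16)
	}
	return string(seq)
}

func main() {
	fmt.Println(compose(0x1F44F, "dark", ""))       // clapping hands: dark skin tone
	fmt.Println(compose(0x1F937, "light", "woman")) // woman shrugging: light skin tone
}
```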

Like print and identify, you can use -format:

% uni e g:cat-face -c -format '%(name): %(emoji)'
grinning cat: 😺
grinning cat with smiling eyes: 😸
cat with tears of joy: 😹
smiling cat with heart-eyes: 😻
cat with wry smile: 😼
kissing cat: 😽
weary cat: 🙀
crying cat: 😿
pouting cat: 😾

See uni help for more details on the -format flag.

JSON

With -as json or -as j you can output the data as JSON:

% uni i -as json h€ý
[{
	"aliases": "",
	"char":    "h",
	"cpoint":  "U+0068",
	"dec":     "104",
	"html":    "&#x68;",
	"name":    "LATIN SMALL LETTER H",
	"utf8":    "68"
}, {
	"aliases": "",
	"char":    "€",
	"cpoint":  "U+20AC",
	"dec":     "8364",
	"html":    "&euro;",
	"name":    "EURO SIGN",
	"utf8":    "e2 82 ac"
}, {
	"aliases": "",
	"char":    "ý",
	"cpoint":  "U+00FD",
	"dec":     "253",
	"html":    "&yacute;",
	"name":    "LATIN SMALL LETTER Y WITH ACUTE",
	"utf8":    "c3 bd"
}]

All the columns listed in -f will be included; you can use -f all to include all columns:

% uni i -as json -f all h€ý
[{
	"aliases": "",
	"bin":     "1101000",
	"block":   "Basic Latin",
	"cat":     "Lowercase_Letter",
	"cells":   "1",
	"char":    "h",
	"cpoint":  "U+0068",
	"dec":     "104",
	"digraph": "h",
	"hex":     "68",
	"html":    "&#x68;",
	"json":    "\\u0068",
	"keysym":  "h",
	"name":    "LATIN SMALL LETTER H",
	"oct":     "150",
	"plane":   "Basic Multilingual Plane",
	"props":   "",
	"refs":    "U+04BB, U+210E",
	"script":  "Latin",
	"unicode": "1.1",
	"utf16be": "00 68",
	"utf16le": "68 00",
	"utf8":    "68",
	"width":   "narrow",
	"xml":     "&#x68;"
}, {
	"aliases": "",
	"bin":     "10000010101100",
	"block":   "Currency Symbols",
	"cat":     "Currency_Symbol",
	"cells":   "1",
	"char":    "€",
	"cpoint":  "U+20AC",
	"dec":     "8364",
	"digraph": "=e",
	"hex":     "20ac",
	"html":    "&euro;",
	"json":    "\\u20ac",
	"keysym":  "EuroSign",
	"name":    "EURO SIGN",
	"oct":     "20254",
	"plane":   "Basic Multilingual Plane",
	"props":   "",
	"refs":    "U+20A0",
	"script":  "Common",
	"unicode": "2.1",
	"utf16be": "20 ac",
	"utf16le": "ac 20",
	"utf8":    "e2 82 ac",
	"width":   "ambiguous",
	"xml":     "&#x20ac;"
}, {
	"aliases": "",
	"bin":     "11111101",
	"block":   "Latin-1 Supplement",
	"cat":     "Lowercase_Letter",
	"cells":   "1",
	"char":    "ý",
	"cpoint":  "U+00FD",
	"dec":     "253",
	"digraph": "y'",
	"hex":     "fd",
	"html":    "&yacute;",
	"json":    "\\u00fd",
	"keysym":  "yacute",
	"name":    "LATIN SMALL LETTER Y WITH ACUTE",
	"oct":     "375",
	"plane":   "Basic Multilingual Plane",
	"props":   "",
	"refs":    "",
	"script":  "Latin",
	"unicode": "1.1",
	"utf16be": "00 fd",
	"utf16le": "fd 00",
	"utf8":    "c3 bd",
	"width":   "neutral",
	"xml":     "&#xfd;"
}]

This also works for the emoji command:

% uni e -as json -f all 'kissing cat'
[{
	"cldr":      "animal, eye, face, kissing cat face with closed eyes",
	"cldr_full": "animal, cat, eye, face, kiss, kissing cat, kissing cat face with closed eyes",
	"cpoint":    "U+1F63D",
	"emoji":     "😽",
	"group":     "Smileys & Emotion",
	"name":      "kissing cat",
	"subgroup":  "cat-face"
}]

All values are always a string, even numerical values. This makes things a bit easier/consistent as JSON doesn't support hex literals and such. Use jq or some other tool if you want to process the data further.
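
Because every value is a string, a consumer has to convert numeric fields itself. A minimal Go sketch of that step, assuming the output has been captured into a byte slice; decodeDec is a hypothetical helper and only the dec field is converted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// decodeDec parses uni-style JSON, where every value is a string, and
// returns the "dec" field of each entry converted to an int.
func decodeDec(data []byte) ([]int, error) {
	var rows []map[string]string
	if err := json.Unmarshal(data, &rows); err != nil {
		return nil, err
	}
	out := make([]int, 0, len(rows))
	for _, row := range rows {
		n, err := strconv.Atoi(row["dec"])
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	// Trimmed-down sample in the shape shown above.
	data := []byte(`[
		{"char": "h", "cpoint": "U+0068", "dec": "104"},
		{"char": "€", "cpoint": "U+20AC", "dec": "8364"}
	]`)
	nums, err := decodeDec(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(nums)
}
```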

ChangeLog

Moved to CHANGELOG.md.

Development

Re-generate the Unicode data with go generate unidata. Files are cached in unidata/.cache, so clear that if you want to update the files from remote. This requires zsh and GNU awk (gawk).

Alternatives

Note this is from ~2017/2018 when I first wrote this; I don't re-evaluate every program every year, and I don't go finding newly created tools every year either.

CLI/TUI

  • https://github.com/philpennock/character

    More or less similar to uni, but very different CLI, and has some additional features. Seems pretty good.

  • https://github.com/sindresorhus/emoj

    Doesn't support emojis sequences (e.g. MAN SHRUGGING is PERSON SHRUGGING + MAN, FIREFIGHTER is PERSON + FIRE TRUCK, etc); quite slow for a CLI program (emoj smiling takes 1.8s on my system, sometimes a lot longer), search results are pretty bad (shrug returns unamused face, thinking face, eyes, confused face, neutral face, tears of joy, and expressionless face ... but not the shrugging emoji), not a fan of npm (has 1862 dependencies).

  • https://github.com/Fingel/tuimoji

    Grouping could be better, doesn't support emojis sequences, only interactive TUI, feels kinda slow-ish especially when searching.

  • https://github.com/pemistahl/chr

    Only deals with codepoints, not emojis.

GUI

  • gnome-characters

    Uses Gnome interface/window decorations and won't work well with other WMs, doesn't deal with emoji sequences, I don't like the grouping/ordering it uses, requires two clicks to copy a character.

  • gucharmap

    Doesn't display emojis, just unicode blocks.

  • KCharSelect

    Many KDE-specific dependencies (106M). Didn't try it.

  • https://github.com/Mange/rofi-emoji and https://github.com/fdw/rofimoji

    Both are pretty similar to the dmenu/rofi integration of uni with some minor differences, and both seem to work well with no major issues.

  • gtk3 emoji picker (Ctrl+; or Ctrl+. in gtk 3.93 or newer)

    Only works in GTK, doesn't work with GTK_IM_MODULE=xim (needed for compose key), for some reasons the emojis look ugly, doesn't display emojis sequences, doesn't have a tooltip or other text description about what the emoji actually is, the variation selector doesn't seem to work (never displays skin tone?), doesn't work in Firefox.

    This is so broken on my system that I suspect I'm missing something that's needed for it to work.

  • https://github.com/rugk/awesome-emoji-picker

    Only works in Firefox; takes a tad too long to open; doesn't support skin tones.

Didn't investigate (yet)

Some alternatives people have suggested that I haven't looked at; make an issue or email me if you know of any others.

uni's People

Contributors

arp242, esdnm, m-cz, muesli, priner, rolandwalker, shuuji3

uni's Issues

Allow copying to clipboard

From HN comment https://news.ycombinator.com/item?id=21780608

echo -n "$selected_symbol" | xclip -i

And then fake a middle click to insert in the current application:

xdotool click 2

Also maybe send as direct input (xdotool can do this, but is apparently not reliable across all apps/characters).

Can also link to libX11, but this requires cgo and makes builds harder.

allow search to return all

I'd prefer to use search for listing emojis, as it includes characters that emoji leaves out. For example, what I was missing just now: '☐' U+2610 9744 e2 98 90 &#x2610; BALLOT BOX (Other_Symbol). emoji does not include this, presumably because it is not deemed an emoji, or is not in certain emoji codepoint blocks?

Zip with Windows release

I see here

https://github.com/arp242/uni/releases

that all releases are GZ, even for Windows. This isn't a huge deal, as you can get tools to extract GZ on Windows. However, to my knowledge they aren't built in, even with Windows 10.

So it might make sense to offer ZIP builds to Windows users instead.

Give control over which variation selectors to print

The U+FE0F and U+FE0E can control if a character is displayed in "character style" or "emoji style"; this matters for some characters that have representations in both:

↔︎ (fe0e, text)
↔️ (fe0f, emoji)

These things are kind of "hidden" and tricky, because many editors don't display them at all, including Vim, and different environments have different defaults. I lost quite a bit of time tracking down a problem with this a few weeks ago :-/

Either way, there should be a flag for the emoji command to say which variation selector to use. This also applies to the other commands, but those all assume there is just one codepoint now in various columns, so meh.
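
For illustration, choosing a presentation amounts to replacing the trailing selector. A small sketch under that assumption; withPresentation is a hypothetical name and not an existing uni function or flag.

```go
package main

import "fmt"

const (
	vs15 = '\uFE0E' // VARIATION SELECTOR-15: text presentation
	vs16 = '\uFE0F' // VARIATION SELECTOR-16: emoji presentation
)

// withPresentation strips a trailing variation selector from s, if any,
// and appends the requested one. Hypothetical helper for illustration.
func withPresentation(s string, emoji bool) string {
	r := []rune(s)
	if n := len(r); n > 0 && (r[n-1] == vs15 || r[n-1] == vs16) {
		r = r[:n-1]
	}
	if emoji {
		return string(append(r, vs16))
	}
	return string(append(r, vs15))
}

func main() {
	base := "\u2194" // LEFT RIGHT ARROW
	fmt.Printf("%+q\n", withPresentation(base, false))
	fmt.Printf("%+q\n", withPresentation(base, true))
}
```

Printing with %+q makes the otherwise invisible selectors show up as escapes.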

Show character codes in emoji mode

Thanks for an excellent tool!

I wish there were a direct way to show character codes in emoji mode. Right now we have:

% uni e switzerland
🇨🇭 flag: Switzerland  Flags  country-flag

and I have to copy & paste that character to uni i. Maybe add a verbose flag to get:

% uni e --verbose switzerland
🇨🇭 flag: Switzerland  Flags  country-flag
    '🇨'  U+1F1E8 127464 f0 9f 87 a8 🇨  REGIONAL INDICATOR SYMBOL LETTER C (Other_Symbol)
    '🇭'  U+1F1ED 127469 f0 9f 87 ad 🇭  REGIONAL INDICATOR SYMBOL LETTER H (Other_Symbol)
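
As a sketch of what such a flag would have to do, splitting an emoji into its codepoints is a plain rune iteration in Go. codepoints is a hypothetical helper; the proposed --verbose flag above is a suggestion in this issue, not an existing option.

```go
package main

import (
	"fmt"
	"strings"
)

// codepoints lists every codepoint in s in U+XXXX notation. Flags, skin
// tone variants, and ZWJ sequences all consist of more than one codepoint.
func codepoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	flag := "\U0001F1E8\U0001F1ED" // flag: Switzerland
	fmt.Println(flag, strings.Join(codepoints(flag), " "))
}
```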

Add aarch64 and arm binaries for Termux

Package: golang
Version: 2:1.13.5
Maintainer: Fredrik Fornwall @fornwall
Installed-Size: 325 MB
Depends: clang
Homepage: https://golang.org/
Download-Size: 53.3 MB
APT-Sources: https://termux.org/packages stable/main aarch64 Packages
Description: Go programming language compiler

Golang is 325 MB and I don't have that much free space (my Termux already occupies 3 GB 😢), so it would be great if you could add aarch64 and arm binaries. :)

`uni list` - `from` and `to` are identical

I have a strong suspicion the error happens here -

uni/uni.go

Lines 531 to 532 in 7084420

"from": fmt.Sprintf(fmtCp, b.Range[0]),
"to": fmt.Sprintf(fmtCp, b.Range[0]),

The "to" line should probably use b.Range[1], but I don't know enough about Go to be sure and PR it.

Thank you

installation fails with " t.Cleanup undefined (type *testing.T has no field or method Cleanup)"

hi :)

Env

go version go1.13.8 linux/amd64

Bug

running go build produces the following output and exits with a non-zero code

go: downloading zgo.at/zstd v0.0.0-20210322015326-ca7824321150
go: downloading zgo.at/zli v0.0.0-20210330134141-b5f2a73532d6
go: extracting zgo.at/zstd v0.0.0-20210322015326-ca7824321150
go: extracting zgo.at/zli v0.0.0-20210330134141-b5f2a73532d6
go: downloading golang.org/x/term v0.0.0-20210317153231-de623e64d2a6
go: extracting golang.org/x/term v0.0.0-20210317153231-de623e64d2a6
go: downloading golang.org/x/sys v0.0.0-20210315160823-c6e025ad8005
go: extracting golang.org/x/sys v0.0.0-20210315160823-c6e025ad8005
go: finding zgo.at/zstd v0.0.0-20210322015326-ca7824321150
go: finding zgo.at/zli v0.0.0-20210330134141-b5f2a73532d6
go: finding golang.org/x/term v0.0.0-20210317153231-de623e64d2a6
go: finding golang.org/x/sys v0.0.0-20210315160823-c6e025ad8005
# zgo.at/zli
../../.go/pkg/mod/zgo.at/[email protected]/test.go:84:3: t.Cleanup undefined (type *testing.T has no field or method Cleanup)

Give `print` an option to not sort characters (or make it not sort by default?)

$ uni print 43 42 41 43
     CPoint  Dec    UTF8        HTML       Name (Cat)
'A'  U+0041  65     41          A     LATIN CAPITAL LETT… (Uppercase_Let…)
'B'  U+0042  66     42          B     LATIN CAPITAL LETT… (Uppercase_Let…)
'C'  U+0043  67     43          C     LATIN CAPITAL LETT… (Uppercase_Let…)
'C'  U+0043  67     43          C     LATIN CAPITAL LETT… (Uppercase_Let…)

As you can see, the output from print sorts characters by codepoints rather than outputting them in the order given on the command line. This is a problem for scripted use, and it's inconsistent with how uni identify works. It also seems strange that the codepoints are sorted but duplicates are not removed.
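
To make the two consistent behaviours concrete: either keep the arguments exactly as given, or sort and also drop duplicates. A small sketch of the second option next to the untouched input; sortedUnique is a hypothetical helper, not part of uni.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedUnique returns a copy of cps sorted by codepoint with duplicates
// removed. The input slice is left untouched.
func sortedUnique(cps []rune) []rune {
	sorted := append([]rune(nil), cps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	res := make([]rune, 0, len(sorted))
	for i, r := range sorted {
		if i == 0 || r != sorted[i-1] {
			res = append(res, r)
		}
	}
	return res
}

func main() {
	args := []rune{0x43, 0x42, 0x41, 0x43}
	fmt.Printf("as given:      %q\n", args)
	fmt.Printf("sorted+unique: %q\n", sortedUnique(args))
}
```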

Support listing by derived properties

Unicode has a concept of "derived properties" (the canonical list is here) which are defined as a combination of other character classes. For example, the first entry in that list is:

# Derived Property: Math
#  Generated from: Sm + Other_Math

It would be nice to be able to pass these as arguments to print.
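
For this particular property, Go's unicode package already ships both source tables, so the combination can be sketched without any extra data. isMath is a hypothetical helper, and the tables reflect whichever Unicode version is bundled with the Go release in use, which may differ from uni's data.

```go
package main

import (
	"fmt"
	"unicode"
)

// isMath approximates the derived property Math, defined as
// Sm + Other_Math, using the range tables from the standard library.
func isMath(r rune) bool {
	return unicode.In(r, unicode.Sm, unicode.Other_Math)
}

func main() {
	for _, r := range "+^a" {
		fmt.Printf("%q math=%v\n", r, isMath(r))
	}
}
```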

Make v2 installable via "go install"

Currently, go install arp242.net/uni@latest installs the latest 1.x version. Also, https://pkg.go.dev/arp242.net/uni claims version 1.1.1 is the latest.

With the way Go module versioning works, I believe the module name should be updated to arp242.net/uni/v2 now, and this should be reflected in the go.mod. Possibly https://arp242.net/uni/v2 also needs to be enabled, not sure.

See https://golang.org/doc/modules/major-version, https://golang.org/doc/modules/release-workflow#breaking.

Search by utf8?

Recently had a problem with some code I copied from a coworker from Slack. For some reason, lines showed up as having been changed in git even though I couldn't see what was different. Put it through a hex editor and saw e2 80 8b. Went to my usual tool for this type of thing, FileFormat.info, typed that in, and it came up with the right answer, that there were zero-width spaces inserted.

I'd like to be able to use uni to search by utf8 text like that.
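
The lookup described here boils down to decoding the hex bytes as UTF-8. A minimal sketch of that step; decodeUTF8Hex is a hypothetical helper, not an existing uni command.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// decodeUTF8Hex turns a hex dump such as "e2 80 8b" into the runes it
// encodes. Whitespace between the bytes is optional.
func decodeUTF8Hex(s string) ([]rune, error) {
	b, err := hex.DecodeString(strings.Join(strings.Fields(s), ""))
	if err != nil {
		return nil, err
	}
	if !utf8.Valid(b) {
		return nil, errors.New("input is not valid UTF-8")
	}
	return []rune(string(b)), nil
}

func main() {
	runes, err := decodeUTF8Hex("e2 80 8b")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range runes {
		fmt.Printf("%U\n", r) // U+200B is ZERO WIDTH SPACE
	}
}
```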

FR: use CLDR Character Annotation “Keywords” too when searching characters/emoji

Unicode CLDR Character Annotations provide a list of keywords for some characters (especially emoji), intended to improve searching for them.

The remaining phrases are keywords (labels), separated by “|”. The keywords plus the words in the short name are typically used for search and predictive typing.
— CLDR Character Annotations description

I would like to suggest including these keywords too when searching for both Unicode characters and emojis.

A List of these annotations can be found here:
https://www.unicode.org/cldr/charts/36/annotations/romance.html

Computer-friendly character annotation data in XML for each language can be found here: https://github.com/unicode-org/cldr/tree/master/common/annotations

Test failures

With the new release (or current master), I get the following test failures:

--- FAIL: TestPrint (0.11s)
    --- FAIL: TestPrint/[-q_p_OtherPunctuation] (0.00s)
        uni_test.go:144: wrong # of lines
            out:  593
            want: 588
    --- FAIL: TestPrint/[-q_p_Po] (0.00s)
        uni_test.go:144: wrong # of lines
            out:  593
            want: 588
    --- FAIL: TestPrint/[-q_p_all] (0.11s)
        uni_test.go:144: wrong # of lines
            out:  33797
            want: 32841
--- FAIL: TestEmoji (0.01s)
    --- FAIL: TestEmoji/[e_-tone_mediumlight_bride] (0.00s)
        uni_test.go:217: wrong output
            out:  []string{""}
            want: []string{"👰🏼"}
--- FAIL: TestAllEmoji (0.01s)
    uni_test.go:248: different length: want 3078, got 3195
    uni_test.go:252: 
FAIL
FAIL	arp242.net/uni	0.160s
ok  	arp242.net/uni/isatty	0.004s
?   	arp242.net/uni/terminal	[no test files]
?   	arp242.net/uni/unidata	[no test files]
FAIL

Private use characters are output as replacement characters

When I use uni to query information on a private use character — such as the Apple logo, defined as U+F8FF on Apple computers — uni actually outputs U+FFFD, the replacement character. For example:

$ uni identify 
     CPoint  Dec    UTF8        HTML       Name (Cat)
'�'  U+F8FF  63743  ef a3 bf    &#xf8ff;   <Private Use, Last> (Private_Use)

The command above ends with U+F8FF, but the character output by uni is U+FFFD. Evidence:

$ uni identify  -c -f '%(char)' | hexdump -C
00000000  ef bf bd 0a                                       |....|
00000004

ef bf bd is the UTF-8 encoding for U+FFFD, not for U+F8FF.

clear whitespace at start of args

" search euro" versus "search euro" gives different results.

Basically, whitespace just needs to be removed from the start of args. A simple one, but it makes things a little less error-prone.
I wonder if fuzzing would have picked this up? Anyway:


$ uni  search euro
Usage: uni [command] [flags]

uni queries the unicode database. https://github.com/arp242/uni

Flags:
    -f, -format    Output format.
    -a, -as        How to print the results: list (default), json, or table.
    -c, -compact   More compact output.
    -r, -raw       Don't use graphical variants or add combining characters.
    -p, -pager     Output to $PAGER.
    -o, -or        Use "or" when searching instead of "and".

Commands:
    list           List blocks, categories, or properties.
    identify       Identify all the characters in the given strings.
    search         Search description for any of the words.
    print          Print characters by codepoint, category, or block.
    emoji          Search emojis.

Use "uni help" or "uni -h" for a more detailed help.
$ uni search euro
     CPoint  Dec    UTF8        HTML       Name (Cat)
'₠'  U+20A0  8352   e2 82 a0    &#x20a0;   EURO-CURRENCY SIGN (Currency_Symbol)
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN (Currency_Symbol)
'𐡷'  U+10877 67703  f0 90 a1 b7 &#x10877;  PALMYRENE LEFT-POINTING FLEURON (Other_Symbol)
'𐡸'  U+10878 67704  f0 90 a1 b8 &#x10878;  PALMYRENE RIGHT-POINTING FLEURON (Other_Symbol)
'𐫱'  U+10AF1 68337  f0 90 ab b1 &#x10af1;  MANICHAEAN PUNCTUATION FLEURON (Other_Punctuation)
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA (Other_Symbol)
'🏤' U+1F3E4 127972 f0 9f 8f a4 &#x1f3e4;  EUROPEAN POST OFFICE (Other_Symbol)
'🏰' U+1F3F0 127984 f0 9f 8f b0 &#x1f3f0;  EUROPEAN CASTLE (Other_Symbol)
'💶' U+1F4B6 128182 f0 9f 92 b6 &#x1f4b6;  BANKNOTE WITH EURO SIGN (Other_Symbol)

FZF integration

Adding an integration to FZF could make this tool replace emoji-fzf. I'm up for implementing this and opening a PR after exams finish.

identify: Uni is discarding Space, Tab and Newline characters

$ printf "a b\tc\n" | hexyl 
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 61 20 62 09 63 0a       ┊                         │a b_c_  ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

$ printf "a b\tc\n" | uni i
     CPoint  Dec    UTF8        HTML       Name (Cat)
'a'  U+0061  97     61          &#x61;     LATIN SMALL LETTER A (Lowercase_Letter)
'b'  U+0062  98     62          &#x62;     LATIN SMALL LETTER B (Lowercase_Letter)
'c'  U+0063  99     63          &#x63;     LATIN SMALL LETTER C (Lowercase_Letter)


Add a -v (version) flag

Hi Martin,

I stumbled upon Uni again, after using it quite a while ago. I was thinking, I wonder if that's been updated?

But uni -v just returns standard help, and not a release or version number, as per convention.

So this issue serves to say that -v is useful to users who come back to uni after a moderate period of time.

I may submit a PR for this, but just leaving this here in the meantime. Proost!

Custom columns

Thanks for writing uni, I find it very useful!

I've found myself trying to make an easy reference to Unicode-related data over time, such as

  • Vim digraphs
  • X11 keysyms

and more will probably be added in the future. I could put each one in Markdown format but it would be nice if they could integrate with uni.

Hardcoding these into the program itself would probably be feature creep, but a way to integrate different data would be really useful.
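To make the idea a bit more concrete, here is a sketch of how an external column could be loaded. The file format (one `codepoint<TAB>value` pair per line, `#` for comments) and the name `parseColumn` are invented for this example; uni does not define such a format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseColumn reads lines of the form "U+20AC<TAB>value" (the "U+" prefix
// is optional) and returns a lookup table from codepoint to value.
// The format is hypothetical, chosen only for this illustration.
func parseColumn(data string) (map[rune]string, error) {
	col := make(map[rune]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		cp, val, ok := strings.Cut(line, "\t")
		if !ok {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.ParseUint(strings.TrimPrefix(cp, "U+"), 16, 32)
		if err != nil {
			return nil, fmt.Errorf("bad codepoint %q: %w", cp, err)
		}
		col[rune(n)] = val
	}
	return col, sc.Err()
}

func main() {
	// "Eu" is one of Vim's digraphs for the euro sign.
	col, err := parseColumn("# Vim digraphs\nU+20AC\tEu\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%c %s\n", '€', col['€'])
}
```

A table like this could then be joined with uni's output by codepoint, which keeps the extra data outside the program itself.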

Use embedded SQLite?

Just a random idea I'll jot down: what if instead of using Go structs in unidata for all the codepoints etc. we would use a SQLite database compiled in the binary?

This would be more structured, easier to reuse, and most of all: easy to query. While the current CLI is convenient, sometimes you want to know things like "which characters are full-width"?

Need to look at the feasibility of this, and how it will affect performance, binary size, etc. Also SQLite requires cgo which makes the builds a bit harder, but this may be a nice use case for e.g. https://modernc.org/sqlite

Possible to add support for detecting fonts containing the specified character?

This might be outside the scope of what you have in mind, and there may not be sufficient Go libraries to implement it in a reasonable amount of time (it would probably also only work on Linux, or need different solutions for different platforms). Still, it would be really nice if uni could also detect fonts that contain the queried character, e.g. something like the Perl example from this Polybar wiki:

use strict;
use warnings;
use Font::FreeType;
my ($char) = @ARGV;
foreach my $font_def (`fc-list`) {
    my ($file, $name) = split(/: /, $font_def);
    my $face = Font::FreeType->new->face($file);
    my $glyph = $face->glyph_from_char($char);
    if ($glyph) {
        print $font_def;
    }
}

Just a thought..
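For what it's worth, a similar lookup can be sketched in Go without a font library by asking fontconfig directly, since `fc-list` accepts a `:charset=` pattern. This assumes `fc-list` is installed and on `PATH`, so it is limited to systems with fontconfig; the helper name `charsetPattern` is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"unicode/utf8"
)

// charsetPattern builds a fontconfig pattern that matches fonts covering r,
// e.g. ":charset=20ac" for the euro sign.
func charsetPattern(r rune) string {
	return fmt.Sprintf(":charset=%x", r)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fontsfor <character>")
		return
	}
	// Only the first rune of the argument is looked up.
	r, _ := utf8.DecodeRuneInString(os.Args[1])

	// Ask fontconfig for the family names of all matching fonts.
	out, err := exec.Command("fc-list", charsetPattern(r), "family").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "fc-list failed:", err)
		return
	}
	os.Stdout.Write(out)
}
```

Unlike the Perl version this does not open each font file itself; it relies on fontconfig's cached character sets, which is faster but reports only what fontconfig knows about.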
