Unicode source support about ciao (closed)

ciao-lang commented on August 23, 2024
Unicode source support

from ciao.

Comments (14)

jfmc commented on August 23, 2024

Couldn't this issue be handled by a Prolog linter (or another general-purpose tool) that warns about ambiguous character representations? I would not like to sacrifice Unicode source support everywhere.
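A linter of this kind could start from a simple heuristic. The sketch below, in Python for illustration only, flags identifiers whose letters come from more than one script. The helper names `scripts_of` and `lint_identifiers` are hypothetical, and approximating a letter's script by the first word of its Unicode name is an assumption, much cruder than the mixed-script detection described in Unicode Technical Standard #39.

```python
import unicodedata


def scripts_of(identifier: str) -> set:
    """Approximate the set of scripts used by the letters of an identifier.

    Heuristic (assumption): the first word of a letter's Unicode name,
    e.g. 'LATIN', 'CYRILLIC', 'GREEK'. Digits and '_' are ignored.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def lint_identifiers(identifiers):
    """Return one warning string per identifier that mixes scripts."""
    warnings = []
    for ident in identifiers:
        scripts = scripts_of(ident)
        if len(scripts) > 1:
            warnings.append(f"{ident!r} mixes scripts: {sorted(scripts)}")
    return warnings


if __name__ == "__main__":
    # The second name starts with a Latin C followed by Cyrillic letters.
    for warning in lint_identifiers(["конфликт", "C\u0432\u0435\u0442"]):
        print("warning:", warning)
```

Note that this heuristic would not flag a lone Cyrillic С used where a Latin C was intended, because that identifier uses a single script; catching that case needs confusable-character data, not just script information.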

In any case, Ciao's design can coexist (up to a point) with multiple de facto or de jure standards. Modules declared with module/3 accept a list of packages in the third argument, which can enable or disable many language extensions. I think this is a very reasonable approach to experimenting with the language and making it evolve in a controlled way. Ciao never gave up on standards and compatibility!

Note that programs using module/2 and compiled with the --iso flag are expected to be ISO compliant (any unexpected behavior can be considered a bug).

jfmc commented on August 23, 2024

Thank you! We now have some initial support that works with these examples. It is not in the repo yet; we'll commit it as soon as possible:

?- конфликт(О1, О2, б), цвет(О1, Ц, б).

О1 = 2,
О2 = 4,
Ц = 'синий' ? n

О1 = 4,
О2 = 2,
Ц = 'синий' ? n

At least some of the code in razbor can now be read (although it may not run due to incompatibilities with SWI).

mherme commented on August 23, 2024

jfmc commented on August 23, 2024

Thanks for the suggestion. Currently there is no UTF-8 support for identifiers. We could start with simple support (similar to http://lua-users.org/wiki/UnicodeIdentifers, which treats any code point >= 128 as alphanumeric), but it would not be able to distinguish between Prolog variables (uppercase) and constants (lowercase).
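To make that limitation concrete, here is a hypothetical Python sketch of the simple approach applied to Prolog's naming rule (a name starting with an uppercase letter or `_` is a variable, one starting with a lowercase letter is an atom). It assumes, as on the Lua page, that any code point >= 128 counts as alphanumeric; because case is then known only for ASCII, a Cyrillic capital letter ends up starting an atom. The function name `naive_token_kind` is made up for illustration.

```python
def naive_token_kind(token: str) -> str:
    """Classify an unquoted Prolog name token using the Lua-style rule.

    Every code point >= 128 is accepted as alphanumeric, but only ASCII
    letters have a known case, so a non-ASCII initial letter can never
    start a variable.
    """
    first = token[0]
    if first == "_" or "A" <= first <= "Z":
        return "variable"
    if "a" <= first <= "z" or ord(first) >= 128:
        return "atom"
    raise ValueError(f"not a name token: {token!r}")


if __name__ == "__main__":
    for tok in ["X", "foo", "\u041e1"]:  # the last one is Cyrillic 'О1'
        print(tok, "->", naive_token_kind(tok))
```

With this rule the Cyrillic variable `О1` from the example query would be read as an atom, which is exactly the problem described above.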

This report http://www.unicode.org/reports/tr31/#Case_and_Stability discusses how several programming languages deal with this problem. Perhaps porting some existing UTF-8 uppercase check code (e.g. https://golang.org/src/unicode/letter.go?s=5143:5168#L170) would be enough in our case.
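A case-aware alternative can inspect the Unicode general category of the first character, which is roughly what Go's `unicode.IsUpper` checks (category Lu). The Python sketch below is an illustration under stated assumptions, not Ciao's implementation: `token_kind` is a hypothetical name, treating titlecase (Lt) letters as uppercase is a choice made here, and letters without case (Lo, Lm, e.g. CJK ideographs) are reported as "caseless" because deciding their role is a policy question of the kind the TR31 section raises.

```python
import unicodedata


def token_kind(token: str) -> str:
    """Classify an unquoted Prolog name token by Unicode general category."""
    first = token[0]
    if first == "_":
        return "variable"
    category = unicodedata.category(first)
    if category in ("Lu", "Lt"):  # uppercase or titlecase letter (assumption)
        return "variable"
    if category == "Ll":  # lowercase letter
        return "atom"
    if category.startswith("L"):  # Lo, Lm: letters that have no case
        return "caseless"  # policy deliberately left open
    raise ValueError(f"not a name token: {token!r}")


if __name__ == "__main__":
    # Cyrillic 'О1', 'Ц', 'конфликт', and a CJK word.
    for tok in ["\u041e1", "\u0426", "конфликт", "\u53d8\u91cf"]:
        print(tok, "->", token_kind(tok))
```

Unlike the ASCII-only rule, this classifies the Cyrillic `О1` and `Ц` from the example query as variables and `конфликт` as an atom.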

We'd really appreciate some source code for testing whether support for Unicode identifiers behaves as expected (for both constants and variables).

suhr commented on August 23, 2024

Well, personally I'm more interested in Unicode operators than in Unicode names. Here is a usage example: https://github.com/razbor-rs/razbor/blob/types/doc/razbor.pl#L148.

I can test Cyrillic variable names though.

jfmc commented on August 23, 2024

Could you provide some examples of variable names? We're finishing some preliminary support for UTF-8 identifiers and we'd like to test it a little more before it is released. Thanks!

suhr commented on August 23, 2024

This is a translation of https://www.cpp.edu/~jrfisher/www/prolog_tutorial/2_1.html into Russian: https://gist.github.com/suhr/ddd5c66263e47dbddeb534ec0c665663

jfmc commented on August 23, 2024

We've pushed the changes that enable UTF-8 source code. I'm closing the issue, but please feel free to reopen it if needed.

triska commented on August 23, 2024

One important guarantee that is ensured by the Prolog ISO standard is that we can print the source code of a program (I mean physically print it, on a piece of paper) or take a screenshot, and see everything that is necessary to type it from the paper to obtain the exact same program.

For example, the standard disallows layout characters other than space in quoted tokens; otherwise, " \n" (with a real newline) would be indistinguishable from any other sequence of spaces.

Is this property ensured by the new UTF-8 source code support? For example, what about NO-BREAK SPACE (code point 160, U+00A0)?
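One way to check the property described here is to scan the text of each quoted token for characters that would not be distinguishable on paper. The Python sketch below uses a hypothetical helper, `suspicious_chars`, that is not part of any Prolog system. As a rough approximation (not the ISO definition of layout characters), it flags every separator (Z*) or "other" (C*) character except the plain space U+0020, which catches a real newline, NO-BREAK SPACE (U+00A0, category Zs), and invisible format characters such as ZERO WIDTH SPACE (U+200B, category Cf).

```python
import unicodedata


def suspicious_chars(quoted_text: str) -> list:
    """Return (index, 'U+XXXX', name) for characters invisible on paper.

    quoted_text is assumed to be the content between the quotes of a
    single quoted token. Only the ASCII space U+0020 is allowed as
    whitespace; separators (Z*) and control/format characters (C*)
    are reported.
    """
    found = []
    for i, ch in enumerate(quoted_text):
        if ch == " ":
            continue
        if unicodedata.category(ch)[0] in ("Z", "C"):
            name = unicodedata.name(ch, "<unnamed>")  # controls have no name
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


if __name__ == "__main__":
    print(suspicious_chars("a b"))        # plain space only: nothing reported
    print(suspicious_chars("a\u00a0b"))   # NO-BREAK SPACE is reported
```

A reader or linter could reject or warn about any quoted token for which this list is non-empty.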

I am very interested in how you addressed this, because other systems are now also running into these questions. For example, please see: mthom/scryer-prolog#459

This is one of the implementation questions I would like to address in our upcoming WG17 meeting, to ensure portability between different Prolog systems. I hope you are interested in discussing this topic.

jfmc commented on August 23, 2024

Thanks, Markus. That is an interesting point, but I wonder whether it is even possible. See for example:

?- X = '\u0043', Y = '\u0421'.

X = 'C',
Y = 'С' ?

One is the Latin letter C (U+0043), the other the Cyrillic letter Es (U+0421). They look exactly the same on screen and on paper. Once Unicode is accepted in strings, quoted atoms, or program identifiers, funny things like:
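Python's standard `unicodedata` module makes the difference between the two letters visible. The sketch below uses a hypothetical helper, `describe`, to print the code point and Unicode name of each character, and also checks that NFKC normalization (the form Python itself applies to identifiers, per PEP 3131) does not merge them: they are distinct letters, not compatibility variants of each other.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


latin_c, cyrillic_es = "\u0043", "\u0421"
print(describe(latin_c))      # [('U+0043', 'LATIN CAPITAL LETTER C')]
print(describe(cyrillic_es))  # [('U+0421', 'CYRILLIC CAPITAL LETTER ES')]

# Compatibility normalization leaves the Cyrillic letter unchanged,
# so it does not make the two identifiers equal.
print(unicodedata.normalize("NFKC", cyrillic_es) == latin_c)  # False
```

Since normalization does not help, distinguishing such identifiers on paper requires either restricting the accepted characters or relying on confusable-detection data.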

?- С=a,C=b.

C = b,
С = a ?

may happen (this works if you copy and paste the text above into both SWI and Ciao). Similar behavior appears in Python:

>>> a="С";b="C"
>>> a==b
False

triska commented on August 23, 2024

In Scryer Prolog, I get:

?- С=a,C=b.

caught: error(syntax_error(unexpected_char),read_term/3:0)

SWI gave up support for the standard several years ago. I hope that Ciao does not go in the same direction, because that would mean that programs written for Ciao will not be portable to other Prolog systems.

pmoura commented on August 23, 2024

Porting Prolog programs is treading water. Writing portable code that doesn't require any porting and can be run as-is in any compliant system should be the goal.

pmoura commented on August 23, 2024

Hi Manuel,

"check compliance mechanically as much as possible"

I'm already running Logtalk's Prolog standards compliance suite (134 test sets; 1678 tests), automated using the current git versions of Ciao and Logtalk, thanks to the great work of @jfmc. There are currently a few bugs/roadblocks (one with a fix already available) that José expects to take care of as early as today.

mherme commented on August 23, 2024
