verbatim delimiter (pg issue, closed)

openwebwork commented on August 20, 2024
verbatim delimiter

from pg.

Comments (16)

dpvc commented on August 20, 2024

How about ASCII 127 (U+007F, DELETE), since the DELETE character is hard to get into a student's answer string (pressing delete causes an action rather than inserting the character)?

Alternatively, lib/Value/String.pm could be modified to select a delimiter character based on the content of the string it is typesetting. E.g., find the smallest n > 32 where chr(n) is not in the string to be delimited, and use that. One could split the string into an array of characters, sort it (discarding duplicates and anything below chr(33)), and find the first zero-based index i where the i-th character is not chr(i + 33), then use chr(i + 33) as the delimiter.

For example

sub verb {
  my $self = shift;
  my $string = shift;
  my $i = 33;                                         # starting ASCII character to look for
  my %has; @has{split(//, $string)} = ();             # hash with keys equal to the characters in the string
  my @c = sort {$a <=> $b} map {ord($_)} keys %has;   # sorted list of (unique) character numbers
  shift(@c) while @c && $c[0] < $i;                   # remove control characters and space
  $i++ while @c && shift(@c) == $i;                   # find first unused character number
  return "\\verb" . chr($i) . $string . chr($i);
}

should do it. This will usually end up with ! as the delimiter, followed by " if ! is used.

Alex-Jordan commented on August 20, 2024

I gather from this reference that ASCII 127 is also not OK to use in XML (or more specifically in an attribute which has the most restrictions in general.)

In PreTeXt we do something similar to what you describe: we choose a delimiter to replace the 0x85 with something friendlier to more output formats. (That happens after the XML validation takes place, so we can't repeat it with 0x1F.) There are a few complications, like * can't be used as the delimiter for \verb, and we do not want to use XML control characters. There is also the unlikely scenario to think about where every usable character is in the string to be typeset.

I'd be inclined to go that way, except I want to check about using some other Unicode character first. I don't understand the character encoding issues at much more than surface level. Is a time coming when we could just do something like the following?

#
#  Mark a string to be display verbatim
#
sub verb {shift; return "\\verb🕸️".(shift)."🕸️"}

I guess WeBWorK hardcopy would need to switch to use xelatex, but should it be moving that direction anyway to support more characters in PG problems?

dpvc commented on August 20, 2024

I gather ... that ASCII 127 is also not OK to use in XML.

OK, the ranges seemed to indicate that U+007F was OK, but the non-restricted list seems to indicate not. Too bad.

There are a few complications, like * can't be used as the delimiter

True. I suppose you could use

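@c = grep {$_ != 42} @c;                              # drop the star so it can't stall the scan
while ($i == 42 || (@c && shift(@c) == $i)) {$i++}    # skip 42 without consuming an entry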

to avoid the star.

and we do not want to use XML control characters

I had an earlier version that used foreach $i (33..126) {...}, but changed it. That would have limited it to ASCII characters, but you are right, this doesn't. A little more work could fix that.

There is the unlikely scenario to think about where every usable character is in the string to be typeset.

Yes, I thought of that, but no matter what you end up doing along these lines, that will be a possibility, so you are going to crash one way or another.
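A minimal sketch of the bounded scan discussed here, assuming candidates are limited to printable ASCII (33..126), that * is skipped, and that the caller decides what to do when nothing is free. The names pick_verb_delimiter and verb_with_scan are hypothetical and not part of PG, and the sketch does not check that every candidate is accepted as a \verb delimiter by every renderer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PG): return the first printable ASCII
# character, other than '*', that does not occur in $string, or undef
# when every candidate is already in use.
sub pick_verb_delimiter {
    my ($string) = @_;
    my %used = map { $_ => 1 } split(//, $string);
    for my $n (33 .. 126) {
        next if $n == 42;                 # '*' cannot be used as the delimiter
        my $char = chr($n);
        return $char unless $used{$char};
    }
    return undef;                         # every candidate is taken
}

# Hypothetical wrapper mirroring verb(): fails loudly instead of emitting
# TeX whose delimiter also occurs inside the string.
sub verb_with_scan {
    my ($string) = @_;
    my $d = pick_verb_delimiter($string);
    die "no free \\verb delimiter in 33..126\n" unless defined $d;
    return "\\verb" . $d . $string . $d;
}
```

Returning undef makes the "every usable character is taken" case explicit, so the failure is at least a deliberate one rather than malformed TeX.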

It looks like U+0085 is allowed, so chr(0x85) would have been a good choice for XML. But I guess Geoff's concern is that this would be a two-byte sequence in UTF-8, though I'm not sure what the problem is with that. It seems the change to chr(0x1F) was to keep it to a single byte.
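The size difference can be checked directly. This is a small sketch using Perl's core Encode module; the helper name utf8_byte_length is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes a single code point occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($code_point) = @_;
    return length(encode('UTF-8', chr($code_point)));
}

# Compare the delimiter candidates mentioned in this thread.
printf "U+%04X -> %d byte(s)\n", $_, utf8_byte_length($_)
    for (0x0D, 0x1F, 0x7F, 0x85);
```

Only U+0085 falls outside the 7-bit range, so it is the only one of the four that needs more than one byte.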

Another possibility would be to use U+000D (RETURN), which LaTeX will allow as a delimiter for \verb, and remove any \r characters in the string (or handle them differently). The \verb macro can't actually handle arbitrary strings; in fact, the string can't contain newlines, tabs are treated as spaces, and returns are ignored, so to really handle arbitrary strings, assuming you want \n (and \r) to be treated as line breaks, you would have to process them specially anyway.

The ArbitraryString context actually does handle \n (but not \r), via

sub quoteTeX {
  my $self = shift; my $s = shift;
  return $self->verb($s) unless $s =~ m/\n/;
  my @tex = split(/\n/,$s);
  foreach (@tex) {$_ = $self->verb($_) if $_ =~ m/\S/}
  "\\begin{array}{l}".join("\\\\ ",@tex)."\\end{array}";
}

so that \n produces line breaks (and can be displayed by MathJax). You could do something similar that handles both \n and \r as line breaks, or just removes \r and handles \n as above, then use \r as the delimiter in the verb() method. This would guarantee that it was a proper delimiter, and still gets you line breaks.
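As a rough sketch of that last option, assuming \r\n, \n, and a lone \r should all count as line breaks: the functions verb_cr and quote_tex_multiline below are hypothetical standalone stand-ins for the verb() and quoteTeX() methods, not the code in PG.

```perl
use strict;
use warnings;

# Hypothetical stand-in for verb(): \r is the delimiter, so any \r left in
# the string is removed first (it would otherwise end \verb early).
sub verb_cr {
    my ($s) = @_;
    $s =~ s/\r//g;
    return "\\verb\r$s\r";
}

# Hypothetical variant of quoteTeX: normalize line endings, then typeset
# each non-blank line with verb_cr inside a one-column array.
sub quote_tex_multiline {
    my ($s) = @_;
    $s =~ s/\r\n?/\n/g;                   # \r\n and lone \r both become \n
    return verb_cr($s) unless $s =~ m/\n/;
    my @tex = map { m/\S/ ? verb_cr($_) : $_ } split(/\n/, $s);
    return "\\begin{array}{l}" . join("\\\\ ", @tex) . "\\end{array}";
}
```

Normalizing before the split means no \r can reach verb_cr from this path; the extra removal inside verb_cr only matters when it is called directly.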

If using literal returns in the attributes is problematic (though it seems to work in my hand testing), then you could perhaps encode it as &#xD; in the attributes:

<tag attr="\verb&#xD;abc&#xD;">

This might be a possible solution.
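A sketch of how such an attribute value might be produced, assuming a double-quoted attribute. The carriage return U+000D is written as the character reference &#xD;, and tab and newline are encoded the same way because XML attribute-value normalization turns literal whitespace characters into spaces while leaving character references alone. The helper name xml_attr_escape is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a string for a double-quoted XML attribute,
# writing tab, newline and carriage return as character references.
sub xml_attr_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;                    # must run before other references are added
    $s =~ s/</&lt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/([\t\n\r])/sprintf('&#x%X;', ord($1))/ge;
    return $s;
}

# Build the attribute shown above from a \r-delimited \verb string.
print '<tag attr="', xml_attr_escape("\\verb\rabc\r"), '">', "\n";
```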

Alex-Jordan commented on August 20, 2024

One quick note. While an XML file can have \n and \r in the file, they are not allowed in attribute values. So it would be the same validation issue to use them as to use 0x1F.

dpvc commented on August 20, 2024

I couldn't find the specification for what's allowed in an attribute. Can you provide a link?

The best I can find is the definition for AttValue, which seems to indicate that there are no restrictions other than no literal & or <, but any other valid XML character. If you track down the meaning of Reference, these do seem to include the three special control characters, #x9, #xA, and #xD, the latter being the one I suggested. So I'm not sure where you are getting that they are not allowed in the attribute values.

If attributes are limited in what they can support, are you considering changing to a container whose content is the value instead? Essay answers and ones for ArbitraryString certainly can include newlines as part of the student and correct answers. How are these handled in your attributes?

Alex-Jordan commented on August 20, 2024

I think you are right. Sorry, I was mixing up memories. In July, Sean Fitzpatrick and I spent some time thinking about this, and characters \t, \n, and \r were singled out as the only characters in range 0--31 that were legal in XML. But we ruled them out as delimiters. I was mis-remembering why we ruled them out, attributing that (no pun intended) to illegal attribute values. But now I'm remembering it's because of the behavior of \verb. Testing on a simple .tex file, \t just plain doesn't work for a delimiter (it causes a compilation error). With the other two, it works in the sense that it doesn't throw an error. But it gobbles up space. I'm finding that This is \verb\nverbatim\n text comes out as "This is verbatimtext" losing the space that follows the second delimiter. That's with pdflatex and also xelatex, not sure about MathJax.

dpvc commented on August 20, 2024

I'm finding that This is \verb\nverbatim\n text comes out as "This is verbatimtext" losing the space that follows the second delimiter.

Try This is {\verb\nverbatim\n} text; that should preserve the space following it.

Alex-Jordan commented on August 20, 2024

Wonderful. That passes my testing too.

Of the proposals, using \r as the delimiter (and removing any \r from the string being typeset) sounds nicest to me.

Should I poll people for red flags? @mgage, @goehle, @taniwallach? I think you can skip reading this whole thread. The proposal is for this code in lib/Value/String.pm:

#
#  Mark a string to be display verbatim
#
sub verb {shift; return "\\verb".chr(0x1F).(shift).chr(0x1F)}

to become (where 0x0D = CR = carriage return = \r)

#
#  Mark a string to be display verbatim
#
sub verb {shift; return "\\verb".chr(0x0D).(shift).chr(0x0D)}

except it would also process the input string to strip any 0x0D that somehow ended up in the string. My understanding is that \r is never used alone as a line break (not since very old Mac OS's); Windows uses \r\n but stripping the \r will leave \n, which Windows etc. will still understand.

If no one sees a red flag, I will open a PR for this.

dpvc commented on August 20, 2024

You could include the braces in the verb() method as well; this won't hurt the output online or in hard copy, and that way, you don't need any special processing in PreTeXt.

Also, note that \n is not valid within \verb, so you will still want to do something about that if you intend to support newlines within String objects (the current String doesn't, but ArbitraryString does). If so, then the code I gave above that splits the string on \n and uses an array environment to get multi-line output could be used to make String completely general.

taniwallach commented on August 20, 2024

I read the thread.

For now, I would avoid using any multi-byte characters by default in LaTeX generated by WeBWorK unless they come in from UTF-8 encoded problems. Not everyone is using xelatex or something else that expects UTF-8 encoded TeX files, and such an approach is likely to cause trouble on many sites that are in no rush to support UTF-8.

I strongly support the proposal to change to \r as the delimiter, as that is certainly an option which should cause no or minimal issues on the TeX side of things, and should not cause any UTF-8 issues.

I do think it might be advisable to preprocess the string being put inside the verbatim \verb command to make sure there are no occurrences of \r (maybe just replace them by space characters) and to add the extra braces around the \verb block to prevent any unusual surprises.

I cannot speak to the history behind the choice of ASCII 31 as the new \verb delimiter beyond the minimal comment that Geoff left in 539406c and the fact that 0x8f is reserved by UTF-8 for continuation bytes (https://en.wikipedia.org/wiki/UTF-8), but I am guessing it was assumed that this character, which is valid in UTF-8, was not likely to ever appear in anything passed into the verbatim code.

BTW, I needed to add \catcode^_=12 into the hardcopyPreamble.tex files I made for XeLaTeX support with Hebrew, as the choice of 0x1F = ASCII 31 would otherwise trigger compilation problems. Using \r would avoid that.

Alex-Jordan commented on August 20, 2024

Writing up a PR for this.

Davide, the ArbitraryString snippet breaks the input string into an array split at instances of \n, and joins it back together using \\\\. For the current purpose, is there a reason not to do something like:

sub verb {
  shift;
  my $verb = shift;
  $verb =~ s/\r/ /g;          # replace any stray carriage return with a space
  $verb =~ s/\n/\\\\/g;       # replace each newline with \\
  return "{\\verb\r$verb\r}";
}

where a regex does the replacement of \n instead?

dpvc commented on August 20, 2024

is there a reason not to do something like ...

Yes. Because the string will be printed verbatim, the \\ will be shown as \\, not interpreted as a line break. That is why the ArbitraryString context goes through the work that it does to get a multi-line display. Also, while \\ will cause a line break anywhere in MathJax's math mode, that is not the case in actual LaTeX. So a multi-line structure is needed.

Alex-Jordan commented on August 20, 2024

Oh I see. I lost track of how ArbitraryString re-applies verb() to the split pieces.

Alex-Jordan commented on August 20, 2024

I opened #422.

taniwallach commented on August 20, 2024

@Alex-Jordan - I merged in #422 and unless you want to leave the issue open to track the multi-line case for future attention - I think this issue can be closed.

taniwallach commented on August 20, 2024

#424 patched the changes in #422 to use different delimiters in different contexts.

I'm closing the issue. We can address the special case of multi-line strings in the future, should the need arise.
