https://github.com/udapi/udapi-python/blob/master/udapi/block/write/textmodetrees.py#L20
We need a similar feature for printing a sentence as a tree. These trees are very useful for visualizing the data:
─┮
 ╰─┮ Gosto VERB root
   │ ╭─╼ de ADP mark
   ├─┾ levar VERB xcomp
   │ │ ╭─╼ a ADP case
   │ ├─┶ sério NOUN xcomp
   │ │ ╭─╼ o DET det
   │ │ ├─╼ meu DET det
   │ ╰─┾ papel NOUN obj
   │   │ ╭─╼ de ADP case
   │   ╰─┾ consultor NOUN nmod
   │     ╰─╼ encartado VERB acl
   ╰─╼ . PUNCT punct
Alternatives:
Convert the CoNLL-U file to TeX and compile it:
udapy write.Tikz attributes=form,lemma,upos < my.conllu > my.tex
If needed I can add more features to
https://github.com/udapi/udapi-python/blob/master/udapi/block/write/tikz.py
e.g. printing multiword tokens and some default colors.
Of course, for camera-ready pictures a bit of manual fine-tuning of the layout will be needed.
You can also try
udapy write.TextModeTrees color=1 < my.conllu | less -R
which produces the output shown above.
There is a button for SVG export and you can use
inkscape -D -z --file=image.svg --export-pdf=image.pdf --export-latex
to export it to PDF and TeX, which you can then include in LaTeX:
\begin{figure}
\centering
\def\svgwidth{\columnwidth}
\input{image.pdf_tex}
\end{figure}
Can I get LaTeX output from the second command:
udapy write.TextModeTrees color=1 < my.conllu | less -R
Yes, but without the colors:
echo '\begin{verbatim}' > my.tex
udapy write.TextModeTrees < my.conllu >> my.tex
echo '\end{verbatim}' >> my.tex
and then use
\input{my.tex}
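The three shell commands above can also be done in one step. The following is only a minimal sketch in Python; the function name wrap_verbatim is hypothetical and not part of Udapi. It takes the plain-text output of write.TextModeTrees (without color=1) and wraps it in a verbatim environment.

```python
def wrap_verbatim(tree_text: str) -> str:
    """Wrap plain-text tree output in a LaTeX verbatim environment.

    Hypothetical helper, not part of Udapi. It expects output produced
    without color=1, i.e. without ANSI escape sequences.
    """
    # Drop trailing newlines so \end{verbatim} follows the last tree line.
    body = tree_text.rstrip("\n")
    return "\\begin{verbatim}\n" + body + "\n\\end{verbatim}\n"


# Example with a two-line fragment of a text-mode tree.
example = "─┮\n ╰─╼ Dogs NOUN root\n"
print(wrap_verbatim(example), end="")
```

Depending on the LaTeX engine and font, the box-drawing characters may need a Unicode-capable setup to render correctly.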
It would not be difficult to write a subclass of write.TextModeTrees
which would use some LaTeX markup like \lemma{I}, \upos{PRON}
instead of the ANSI color codes. You could then define the colors and style yourself, e.g.
\def\lemma#1{\textcolor{red}{#1}}
If you are interested, I can implement it.
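To make the idea concrete, here is a rough sketch of how the attributes of one node could be turned into such markup. This is not the proposed subclass and uses no Udapi API; the names latex_escape and markup_node are hypothetical, and the node is represented as a plain dict for illustration.

```python
# Characters that LaTeX treats specially, mapped to their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def markup_node(attrs, names=("form", "upos", "deprel")):
    """Render the selected attributes as \\name{value} macros.

    `attrs` is a plain dict standing in for a node (an assumption made
    for this sketch, not how Udapi represents nodes).
    """
    return " ".join(
        "\\%s{%s}" % (name, latex_escape(attrs[name])) for name in names
    )


node = {"form": "I", "lemma": "I", "upos": "PRON", "deprel": "nsubj"}
print(markup_node(node, names=("lemma", "upos")))
```

Plain verbatim would not expand these macros, so the surrounding environment would have to be one that still interprets backslash commands (for example alltt).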
What I am really missing is a simple way to display a fragment of a sentence.
Now, I've added a Udapi block which deletes all nodes in a document
except for the subtrees matching a given condition, e.g.
udapy -s util.Filter subtree='node.upos == "NOUN"' < in.conllu > filtered.conllu
will print only the subtrees headed by nouns (roughly, noun phrases).
So you can use
udapy util.Filter subtree='node.form == "dog"' write.TextModeTrees < in.conllu
to get the subtree(s) headed by the word "dog", or
udapy util.Filter subtree='node.ord == 2 and node.root.address() == "3"' write.TextModeTrees < in.conllu
to get the subtree headed by the second word in the tree with sent_id = 3.
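For readers who want to see what "keep only the matching subtrees" means, here is a small sketch on a toy data structure. It does not use Udapi and is not the implementation of util.Filter; the Node class and filter_subtrees function are hypothetical. A node is kept if it, or any of its ancestors, satisfies the condition.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a dependency node (not a Udapi class)."""
    ord: int    # 1-based word order
    form: str
    upos: str
    head: int   # ord of the parent, 0 for the root word


def filter_subtrees(nodes, condition):
    """Keep nodes that lie in a subtree headed by a matching node."""
    by_ord = {n.ord: n for n in nodes}

    def in_matching_subtree(node):
        # Walk up towards the root; assumes a well-formed tree (no cycles).
        while node is not None:
            if condition(node):
                return True
            node = by_ord.get(node.head)
        return False

    return [n for n in nodes if in_matching_subtree(n)]


sentence = [
    Node(1, "The", "DET", 2),
    Node(2, "dog", "NOUN", 3),
    Node(3, "barked", "VERB", 0),
    Node(4, "loudly", "ADV", 3),
]
kept = filter_subtrees(sentence, lambda n: n.upos == "NOUN")
print([n.form for n in kept])
```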
Yet another alternative to Tikz, Html and TextModeTrees would be to
paste the CoNLL-U into the online Brat renderer
(e.g. click "edit" here http://universaldependencies.org/sandbox.html#pirate-example).
But then you would need to zoom, take a screenshot and include it as a bitmap (PNG) in LaTeX,
which is not optimal.
If needed I can implement write.Sdparse which would print something like
Dogs run
nsubj(run-2, Dogs-1)
which would allow easier manual editing than the CoNLL-U format.
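As an illustration of the target format, the following sketch converts one CoNLL-U sentence into lines like the ones above. It is plain Python, not the proposed write.Sdparse block; the function name conllu_to_sdparse is hypothetical, and skipping the root relation is an assumption made to match the example.

```python
def conllu_to_sdparse(sentence: str) -> str:
    """Convert one CoNLL-U sentence to 'deprel(head-i, dependent-j)' lines.

    Assumes every word has a numeric HEAD and a DEPREL. The root relation
    is left out, which is an assumption based on the example above.
    """
    rows = []
    for line in sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword tokens (1-2) and empty nodes (1.1)
        rows.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))

    forms = {ord_: form for ord_, form, _, _ in rows}
    # First line: the tokenized sentence; then one line per dependency.
    lines = [" ".join(form for _, form, _, _ in rows)]
    for ord_, form, head, deprel in rows:
        if head == 0:
            continue
        lines.append("%s(%s-%d, %s-%d)" % (deprel, forms[head], head, form, ord_))
    return "\n".join(lines)


conllu = (
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\trun\trun\tVERB\t_\t_\t0\troot\t_\t_"
)
print(conllu_to_sdparse(conllu))
```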