Remove HTML generation code? (aclpub, 7 comments, closed)

acl-org commented on July 28, 2024

Quite a few pieces of ACLPUB are dedicated to generation of HTML, but as far as I know …

Comments (7)

davidweichiang commented on July 28, 2024

I think that sounds good. But I'm still unsure how to get all the different pieces of the system to share code when possible.

(When I tag @desilinguist does he see it, or does he have to be connected to the repository somehow?)

mjpost commented on July 28, 2024

This functionality more properly belongs in acl-pub and could be removed from here, but we should check with the current round of pub chairs, at least, since we might be interrupting something. Tagging the ones whose handles I could find:

NAACL: @slukin, Alla Roskovskaya
ACL: @ivulic, @kgimpel, Douwe Kiela, Shay Cohen

slukin commented on July 28, 2024

@desilinguist and I just discussed this the other day actually:

For EMNLP 2018, he was expecting .csv files with the author index, session chair information, Anthology paper ID mappings, and abstracts. I do not know which script was used to create these .csv files for him, so we decided that this year Nitin would write his own scraper for the .html files that live in the generated cdrom/ part of the proceedings.tgz (authors.html, index.html, and program.html).

@davidweichiang, if there is another way to obtain this information and create these .csv files without using the .html files, it's fine from our perspective to remove that code, as long as the code that creates the .csv files can either run directly when we pull the proceedings.tgz or run after it has been pulled.

davidweichiang commented on July 28, 2024

Hm, the plot thickens. I think I would suggest (if @desilinguist has not written this scraper already) that these .csv files be generated straight from the db, meta, and abstracts files, which are plain text and simpler to deal with than HTML. That would smooth the road to removing HTML generation from ACLPUB.

However, the db, meta, and abstracts files will contain TeX control sequences. I would suggest, if it's practical, that Nitin's scraper use the same TeX-to-XML/HTML code that @mbollmann is writing for the Anthology. Is that possible?

(Currently, there are about a half-dozen pieces of code across three Git repositories that do this same task, and each covers a different subset of TeX. I'm hoping that they can be unified into one really good one.)
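To make the "generate the .csv files straight from the db file" idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that the db file consists of `KEY: value` records separated by blank lines, with `P` for the paper ID, `T` for the title, and a repeated `A` line per author; the real ACLPUB format may differ. The function names `parse_db` and `author_index_csv` are illustrative only, and TeX control sequences are passed through untouched (converting them is exactly the shared TeX-to-XML/HTML step discussed above).

```python
import csv
import io


def parse_db(text):
    """Parse hypothetical 'KEY: value' records separated by blank lines.

    Repeated keys (e.g. several 'A' author lines) are collected into lists.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not KEY: value pairs
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


def author_index_csv(records):
    """Build an author index CSV (author, paper_id, title), sorted by author.

    Titles and names are copied verbatim, so any TeX markup is left as-is.
    """
    rows = []
    for rec in records:
        paper_id = rec.get("P", [""])[0]
        title = rec.get("T", [""])[0]
        for author in rec.get("A", []):
            rows.append((author, paper_id, title))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["author", "paper_id", "title"])
    writer.writerows(sorted(rows))
    return out.getvalue()


if __name__ == "__main__":
    sample = "P: 1\nT: A Paper on Parsing\nA: Smith, Jane\nA: Doe, John\n\nP: 2\nT: Another Paper\nA: Doe, John\n"
    print(author_index_csv(parse_db(sample)))
```

Because the csv module quotes fields containing commas, "Last, First" author names survive intact, which is one reason a plain-text-to-CSV step is simpler than scraping HTML.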

slukin commented on July 28, 2024

Here is some information from Nitin: "The only .csv file I created via scraping was anthology-mapping.csv. The rest were provided to me by the EMNLP program chairs. I haven't started to work on any script yet." He also said that the anthology-mapping.csv was created by scraping one of these .html files.

So it seems we can move towards removing the HTML generation and instead generate the files directly in an easier-to-parse format, such as separate .csv files or a single YAML file (a suggestion from Nitin). For a concrete example, the .csv files that Nitin used for EMNLP are here: https://github.com/emnlp2018/emnlp2018.github.io/tree/master/scripts/data

desilinguist commented on July 28, 2024

I am the current owner of the acl-org organization so I see everything. Bwahahaha. Sorry :)

rrgerber commented on July 28, 2024

I would suggest leaving the HTML generation code in, since it is very useful for people to download the "all" target and check that everything is OK by looking at the CD-ROM directory.
