google-deepmind / ithaca

Restoring and attributing ancient texts using deep neural networks

License: Apache License 2.0

Python 17.79% Jupyter Notebook 82.11% Shell 0.10%

ithaca's Introduction

Restoring and attributing ancient texts using deep neural networks

Yannis Assael1,*, Thea Sommerschield2,3,*, Brendan Shillingford1, Mahyar Bordbar1, John Pavlopoulos4, Marita Chatzipanagiotou4, Ion Androutsopoulos4, Jonathan Prag3, Nando de Freitas1

1 DeepMind, United Kingdom
2 Ca’ Foscari University of Venice, Italy
3 University of Oxford, United Kingdom
4 Athens University of Economics and Business, Greece
* Authors contributed equally to this work


Ancient History relies on disciplines such as Epigraphy, the study of inscribed texts known as "inscriptions", for evidence of the thought, language, society and history of past civilizations. However, over the centuries many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. We present Ithaca, the first Deep Neural Network for the textual restoration, geographical and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow: its architecture focuses on collaboration, decision support, and interpretability.

Restoration of damaged inscription: this inscription (IG I³ 4B) records a decree concerning the Acropolis of Athens and dates to 485/4 BCE. (CC BY-SA 3.0, WikiMedia)

While Ithaca alone achieves 62% accuracy when restoring damaged texts, as soon as historians use Ithaca their performance leaps from 25% to 72%, confirming this synergistic research aid’s impact. Ithaca can attribute inscriptions to their original location with 71% accuracy and can date them with a distance of less than 30 years from ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in Ancient History. This work shows how models like Ithaca can unlock the cooperative potential between AI and historians, transformationally impacting the way we study and write about one of the most significant periods in human history.

Ithaca's architecture processing the phrase "δήμο το αθηναίων" ("the people of Athens"). The first 3 characters of the phrase were hidden and their restoration is proposed. In tandem, Ithaca also predicts the inscription’s region and date.

References

When using any of this project's source code, please cite:

@article{asssome2022restoring,
  title = {Restoring and attributing ancient texts using deep neural networks},
  author = {Assael*, Yannis and Sommerschield*, Thea and Shillingford, Brendan and Bordbar, Mahyar and Pavlopoulos, John and Chatzipanagiotou, Marita and Androutsopoulos, Ion and Prag, Jonathan and de Freitas, Nando},
  doi = {10.1038/s41586-022-04448-z},
  journal = {Nature},
  year = {2022}
}

Ithaca inference online

To aid further research in the field, we created an online interactive Python notebook, where researchers can query one of our trained models to get text restorations, visualise attention weights, and more.

Ithaca inference offline

Advanced users who want to perform inference using the trained model may want to do so manually using the ithaca library directly.

First, to install the ithaca library and its dependencies, run:

pip install .

Then, download the model via

curl --output checkpoint.pkl https://storage.googleapis.com/ithaca-resources/models/checkpoint_v1.pkl
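Where curl is not available, the same download can be sketched with the Python standard library. This is only an illustrative equivalent of the command above: the helper name `download_checkpoint` is hypothetical, and the URL and output file name are the ones given in the curl command.

```python
import urllib.request
from pathlib import Path

# URL copied from the curl command in this README.
CHECKPOINT_URL = (
    "https://storage.googleapis.com/ithaca-resources/models/checkpoint_v1.pkl"
)


def download_checkpoint(url: str = CHECKPOINT_URL, dest: str = "checkpoint.pkl") -> Path:
    """Download `url` to `dest`, unless a non-empty file already exists there."""
    target = Path(dest)
    if target.exists() and target.stat().st_size > 0:
        return target  # reuse the existing download instead of fetching again
    urllib.request.urlretrieve(url, str(target))
    return target
```

Calling `download_checkpoint()` with no arguments writes `checkpoint.pkl` to the current directory, matching the curl invocation.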

An example of using the library can be run via

python inference_example.py --input_file=example_input.txt

which will run restoration and attribution on the text in example_input.txt.

To run it with different input text, run

python inference_example.py --input="..."
# or using text in a UTF-8 encoded text file:
python inference_example.py --input_file=some_other_input_file.txt
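Because `--input_file` expects UTF-8, it can help to write the file with an explicit encoding rather than relying on the platform default. The sketch below is one way to do that; the helper `write_input_file` and the file name `my_input.txt` are hypothetical, and the sample text is copied verbatim from one of the issue reports on this page. Whether the script accepts a particular text (length, characters) is not checked here.

```python
import tempfile
from pathlib import Path

# Sample text copied verbatim from an issue report on this page.
SAMPLE_TEXT = "λ??ουσιν ἃ θέλου??ν λεγέ??σαν οὐ μέλι ??ι σὺ φίλι με συνφέ?ι σοι"


def write_input_file(text: str, path: Path) -> Path:
    """Write `text` to `path` as UTF-8, regardless of the platform default encoding."""
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical file name, placed in the system temp directory.
input_path = write_input_file(SAMPLE_TEXT, Path(tempfile.gettempdir()) / "my_input.txt")

# Print the command line that would use the file just written.
print(f"python inference_example.py --input_file={input_path}")
```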

The restoration or attribution JSON can be saved to a file:

python inference_example.py \
  --input_file=example_input.txt \
  --attribute_json=attribute.json \
  --restore_json=restore.json
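The layout of the saved JSON is not documented here, so a cautious first step is to inspect it without assuming any particular keys. The sketch below lists the top-level structure of a JSON file; `summarize_json` is a hypothetical helper, not part of the ithaca library.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Not an object at the top level: report the type of the whole document.
    return {"<top level>": type(data).__name__}
```

For example, `print(summarize_json("restore.json"))` after running the command above shows which fields are present before any further processing is written against them.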

For full help, run:

python inference_example.py --help

Dataset generation

Ithaca was trained on The Packard Humanities Institute’s "Searchable Greek Inscriptions" public dataset. The processing workflow for generating the machine-actionable text and metadata, as well as further details on the train, validation and test splits, can be found at I.PHI dataset.

Training Ithaca

See train/README.md for instructions.

License

Apache License, Version 2.0

ithaca's People

Contributors

bshillingford, hawkinsp, iassael

ithaca's Issues

Folium Tiles Water Color / Deprecated

Hi all

I noticed that the Folium team has removed the Stamen Watercolor map from their tiles. As a result, the map used for the geographical attribution no longer renders properly (tl;dr: the map is just a blank grey area).

The discussion about the removal by the Folium team can be found here.

Suggested quick-fix:

folium.Map(tiles="openstreetmap")

(I prefer the old watercolor style, so I will look more closely at the changelogs of the Folium and XYZ teams, but in the short term this will do the trick.)
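A slightly fuller version of the quick-fix might look like the sketch below. It assumes folium is installed; the helper names `map_kwargs` and `build_map` are hypothetical, the coordinates are an approximate location for Athens chosen only for illustration, and `"OpenStreetMap"` is folium's default tile set.

```python
def map_kwargs(location, zoom_start=6, tiles="OpenStreetMap"):
    """Collect keyword arguments for folium.Map using a tile set that is still bundled."""
    return {"location": list(location), "zoom_start": zoom_start, "tiles": tiles}


def build_map(location):
    """Create the map; folium is imported lazily so map_kwargs has no dependency."""
    import folium  # third-party package: pip install folium

    return folium.Map(**map_kwargs(location))


# Approximate coordinates of Athens (latitude, longitude), for illustration only.
ATHENS = (37.97, 23.72)
print(map_kwargs(ATHENS))
```

In a notebook, `build_map(ATHENS)` as the last expression of a cell displays the map.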

Understanding Ithaca's Evaluation/Checkpointing System

Hello all,

I'm seeking clarification on how Ithaca's evaluation/checkpointing system works.

From my understanding, the evaluate function should calculate the evaluation metrics and store the checkpoint's pickle file on disk. However, I'm uncertain about when this function is called.

Currently, when I execute the code, I notice that it just generates a log file containing the training loss and the accuracy. However, the log doesn't include the validation loss, and no checkpoint is produced.

Also when I try to run:
python3 experiment.py --config=config.py --jaxline_mode=eval --logtostderr
it says:
Checkpoint None invalid or already evaluated, waiting.

Thank you for your time and assistance.

Best regards,
Alessandro

add web demo/model to Huggingface

Hi, would you be interested in adding ithaca to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. There is already a deepmind organization on Hugging Face (https://huggingface.co/deepmind), or it can be under a personal account, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example Spaces with repos:
GitHub: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP

GitHub: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

Here are guides for adding Spaces, models, and datasets to your org:

How to add a Space: https://huggingface.co/blog/gradio-spaces
How to add a model: https://huggingface.co/docs/hub/adding-a-model
How to upload a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

ERROR installing ithaca

When I try to install ithaca, I get this error:

ERROR: Could not find a version that satisfies the requirement jaxlib>=0.1.37 (from chex) (from versions: none)
ERROR: No matching distribution found for jaxlib>=0.1.37
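When pip reports "from versions: none", it typically means the package index offers no distribution compatible with the running interpreter and platform. The report does not say which environment was used, so the sketch below only gathers the details that would be worth including in a follow-up; `environment_summary` is a hypothetical helper built on the standard library.

```python
import platform
import struct


def environment_summary():
    """Collect the interpreter and platform details that determine wheel compatibility."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        # 32 or 64: the pointer size of the running interpreter.
        "pointer_bits": struct.calcsize("P") * 8,
    }


for name, value in environment_summary().items():
    print(f"{name}: {value}")
```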

Colab Runtime Error - Jaxlib version mismatch (strikes again)

Hi all

When running the Colab Notebook, at the import step, the Jax and Jaxlib modules are causing an error.

RuntimeError: jaxlib is version 0.3.25, but this version of jax requires version >= 0.4.2.

We've had a similar issue around the 12th of September with Jax, Jaxlib and Flax. Probably an update in Colab or one of those dependencies has made this resurface.

In the meantime, users who are experiencing issues can circumvent this error using the following janky solution:

!pip install -qqq jax==0.4.2
!pip install -qqq jaxlib==0.4.2

Limitation: This will run the beam search on the CPU of the Colab, thus increasing inference times.
You can make the GPU kick in again by using instead:

!pip install -qqq jax==0.4.2
!pip install -qqq jaxlib==0.4.2
!pip install -qqq "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
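After reinstalling, the installed versions can be checked without importing jax, which is useful because the import itself is what raises the error. This is a minimal sketch using only the standard library; the helpers `parse_version`, `installed_version`, and `meets_minimum` are hypothetical, and the simple parser handles plain dotted versions rather than every version scheme.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.4.2' (or '0.4.2+cuda11') into a tuple of integers such as (0, 4, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed, minimum):
    """Check whether an installed version satisfies a minimum requirement."""
    return parse_version(installed) >= parse_version(minimum)


# The minimum below is the one quoted in the RuntimeError above.
for package in ("jax", "jaxlib"):
    version = installed_version(package)
    status = "not installed" if version is None else meets_minimum(version, "0.4.2")
    print(package, version, status)
```

With the versions from the error message, `meets_minimum("0.3.25", "0.4.2")` is `False`, which is the mismatch jax complains about.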

I have used the following examples to test inference using the aforementioned janky solutions:

ωρανια.   αστ?ρας ηρε?νησα σο??ι φρ?νι π??ρι τε εοι?ος   ουν?μα εχ?. λε??μαι δε η δι?ς ουρα?ιη.   ονε?του.

and

λ??ουσιν ἃ θέλου??ν λεγέ??σαν οὐ μέλι ??ι σὺ φίλι με συνφέ?ι σοι

Kind regards!

Colab Error - AttributeError: module 'jaxlib.pocketfft' has no attribute 'pocketfft'

Hi,

I noticed an error cropping up in the Colab referenced on this GitHub page, and in the version I've been using with my student.
The error originates in the "Imports" code cell of the reference Colab Notebook and is (probably) due to an upgrade in the Colab dependencies.

The error occurs when running:

import flax

The error:

AttributeError: module 'jaxlib.pocketfft' has no attribute 'pocketfft' 

Solution (a bit janky but seems to work):

!pip install jax==0.2.21
!pip install jaxlib==0.1.69

I still have to do some speed tests to check whether this approach has any adverse effects.
