Comments (6)
Good point.
In my case German would be interesting.
from lambeq.
Hi @nlpirate and @chirico85,
Our parser is based on the combinatory categorial grammar (CCG) formalism, which is a bit different from context-free grammar (CFG). While CFGs are generative, i.e. they produce valid sentences, CCG models are used to infer grammar trees from well-formed sentences. Hence, CCGs are parsable, which we leverage in our BobcatParser.
Bobcat works in two stages. First, we apply a BERT model to determine the most likely CCG types for each word. The outcome of that step is a weighted list of the k most likely types. After that, we apply a deterministic chart parser that aims to find the most probable CCG reduction tree from the possible word types. You can read more about our parser here.
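To make the two stages concrete, here is a minimal, self-contained sketch. It is not Bobcat's implementation and does not use lambeq: the `SUPERTAGS` table is a hypothetical stand-in for the BERT tagger's weighted k-best output (the probabilities are made up), and the chart parser only knows forward and backward application. Categories are written as nested tuples `(result, slash, argument)`, which is an assumption made for this example only.

```python
import math
from itertools import product

# Toy categories: atomic ones are strings, complex ones are
# (result, slash, argument) tuples. Purely illustrative.
IV = ("S", "\\", "NP")   # S\NP, intransitive verb
TV = (IV, "/", "NP")     # (S\NP)/NP, transitive verb

# Stage 1 stand-in: hypothetical k-best categories per word with
# made-up probabilities (a real tagger would predict these).
SUPERTAGS = {
    "Alice": [("NP", 0.9), ("N", 0.1)],
    "likes": [(TV, 0.8), (IV, 0.2)],
    "Bob": [("NP", 0.95), ("N", 0.05)],
}


def combine(left, right):
    """Apply forward or backward application; return None if neither fits."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]   # X/Y  Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]  # Y  X\Y  =>  X
    return None


def parse(words, supertags, goal="S"):
    """Stage 2: CKY-style chart parse keeping the best score per category.

    Returns (log_probability, tree) for the goal category, or None.
    """
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = {cat: (math.log(p), word) for cat, p in supertags[word]}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                pairs = product(chart[i, k].items(), chart[k, j].items())
                for (lcat, (lscore, ltree)), (rcat, (rscore, rtree)) in pairs:
                    cat = combine(lcat, rcat)
                    if cat is None:
                        continue
                    score = lscore + rscore
                    if cat not in cell or score > cell[cat][0]:
                        cell[cat] = (score, (ltree, rtree))
            chart[i, j] = cell
    return chart[0, n].get(goal)


result = parse(["Alice", "likes", "Bob"], SUPERTAGS)
print(result)
```

The chart search itself is deterministic; all the language-specific knowledge sits in the table of weighted types, which is the part that has to be learned from annotated data.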
As you can see, we use a statistical model for the first step, which needs to be trained on data. The data we use to train Bobcat is CCGbank, which is a translation of the Penn Treebank. Hence, to support multiple languages, we would need such a CCG bank for each language, which might require a lot of work.
However, if you don't require a fully comprehensive CCG parser, you can always create your own (deterministic) parser based on our abstract CCGParser class:
lambeq/lambeq/text2diagram/ccg_parser.py
Line 34 in 70a1fe8
I hope this helps!
from lambeq.
Also, DisCoPy supports CFGs (https://docs.discopy.org/en/0.5/discopy/grammar.cfg.html?highlight=cfg#module-discopy.grammar.cfg), so CFGs are also supported by lambeq. Furthermore, CCGs can express CFGs, i.e. they can also be used to generate sentences.
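As a rough illustration of what "generative" means here, the following sketch enumerates every sentence of a tiny CFG in plain Python. It deliberately does not use DisCoPy's API; the `RULES` grammar and the `generate` helper are hypothetical names made up for this example, and the approach assumes a non-recursive grammar so that the enumeration terminates.

```python
from itertools import product

# Hypothetical toy grammar: keys are nonterminals, each mapped to a list
# of right-hand sides. Any symbol that is not a key is a terminal word.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Alice"], ["Bob"]],
    "VP": [["V", "NP"]],
    "V": [["likes"], ["sees"]],
}


def generate(symbol, rules):
    """Yield every word list derivable from `symbol`.

    Assumes the grammar is non-recursive; otherwise this never terminates.
    """
    if symbol not in rules:
        yield [symbol]  # terminal: emit the word itself
        return
    for rhs in rules[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        expansions = [list(generate(s, rules)) for s in rhs]
        for parts in product(*expansions):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate("S", RULES)]
for sentence in sentences:
    print(sentence)
```

A hand-written grammar like this needs no training data, which is why it can be a practical starting point for a language that has no CCG bank yet.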
from lambeq.
I think there was an effort to create an Italian CCGBank in the past (Turin univ.?), but I'm not sure what happened with that project.
from lambeq.
Yes, indeed, it exists and is available on the site (tut-ccg), but the annotation is different from the one used in lambeq.
from lambeq.
Not sure if there is anything more to say here since, as @Thommy257 explained above, without an annotated corpus like CCGBank you can't train a statistical parser. However, we are very much interested in adding support to lambeq for languages other than English (and we welcome any community work towards this goal), so this issue will be converted into a Discussion to stay alive.
from lambeq.