Comments (3)
Thanks, I was thinking of opening a ticket for those :-D
I was also thinking of adding the president's new speech.
About Trilussa, we need to check: as a poet he wrote a lot in Roman dialect, which is not suitable for our needs.
In any case, we should look for content that reads like a discussion or is written in the first person, so journalism is perfect.
The file where Wikisource material should be added is: https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/MITADS/assets/wikisource_books.txt
The Italian Wikimedia community is discussing moving that new material to the website, so we have to wait a bit.
from deepspeech-italian-model.
I also checked Gutenberg; we can add these new books:
- 34983
- 49231
In 2020 Facebook released the cleaned Common Crawl data that they used to train the XLM-R model:
http://data.statmt.org/cc-100/
Italian dataset - 7.8G
http://data.statmt.org/cc-100/it.txt.xz
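Since the Italian file is large, it is probably easier to stream it than to decompress it on disk. Below is a minimal sketch of how that could look, assuming the archive has already been downloaded locally and contains plain UTF-8 text with one passage per line. The function name `iter_clean_lines` and the `min_words`/`max_words` thresholds are hypothetical, chosen only for illustration; they are not part of the MITADS scripts.

```python
import lzma
from typing import Iterator


def iter_clean_lines(path: str, min_words: int = 3, max_words: int = 30) -> Iterator[str]:
    """Stream an .xz text file and yield lines within a word-count range.

    Assumption: the file is UTF-8 text, one passage per line.
    The thresholds are illustrative defaults, not project settings.
    """
    # lzma.open in text mode decompresses lazily, so the whole
    # archive never has to fit in memory or be unpacked on disk.
    with lzma.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                # Skip blank lines (e.g. separators between documents).
                continue
            n_words = len(line.split())
            if min_words <= n_words <= max_words:
                yield line
```

Something like `for line in iter_clean_lines("it.txt.xz"): ...` would then feed the kept lines into whatever cleaning step comes next. Dialect or non-Italian content would still need a separate check.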