Comments (2)
The information is available here.
Minimum number of sentences:
- Band A: 750
- Band B: 2000
- Band C: 5000
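To make the thresholds concrete, here is a minimal sketch of checking a language's sentence count against its band. The numbers are the ones listed above; the names `MIN_SENTENCES`, `meets_minimum` and `missing_sentences` are hypothetical helpers for illustration only and are not part of the Common Voice codebase. It also assumes the minimum is inclusive.

```python
# Minimum number of sentences per band, as listed above.
MIN_SENTENCES = {"A": 750, "B": 2000, "C": 5000}


def meets_minimum(band: str, sentence_count: int) -> bool:
    """Return True if sentence_count reaches the band's minimum (assumed inclusive)."""
    return sentence_count >= MIN_SENTENCES[band.upper()]


def missing_sentences(band: str, sentence_count: int) -> int:
    """Return how many more sentences are needed to reach the band's minimum."""
    return max(0, MIN_SENTENCES[band.upper()] - sentence_count)


if __name__ == "__main__":
    # Example: a Band C language with 4200 collected sentences.
    print(meets_minimum("C", 4200))
    print(missing_sentences("C", 4200))
```

An unknown band raises a `KeyError`, which seems preferable to silently guessing a threshold.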
from common-voice.
These categories are primarily based on demographics, but they also seem to reflect the technical requirements that language models place on the scope and quality of datasets ("affect the health of the dataset for model training purposes").
Clearly, the more sentences the better. Still, could someone estimate the number of spoken sentences needed, so that datasets can also be categorized into three levels (e.g. basic, good, excellent) in terms of machine learning needs?