
paddlestroke / bilingual-ebook-maker


I created this script to automatically create bilingual ebooks from two ebooks in different languages.

License: GNU General Public License v3.0

Python 100.00%

bilingual-ebook-maker's Introduction

Bilingual-ebook-maker

I created this script to automatically create bilingual ebooks from two ebooks in different languages. A bilingual ebook shows one paragraph in one language, then the same paragraph in the other language. Books of this kind are great for learning a foreign language. The script needs the ebooks in .fb2 file format.

This task seems easy but is not, because different translations of the same book usually don't have exactly the same paragraphs, and the inner fb2 settings may not be arranged the same way.
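Since FB2 is an XML-based format, the paragraphs of a book can be pulled out with Python's standard library. The following is a minimal sketch, not the code used by this script: the function name extract_paragraphs and the sample text are made up for illustration, and it assumes the file declares the usual FictionBook 2.0 namespace on its root element.

```python
import xml.etree.ElementTree as ET

# Namespace commonly declared by FictionBook 2.0 files (assumption: check
# the root tag of your own file, some files may differ).
FB2_URI = "http://www.gribuser.ru/xml/fictionbook/2.0"


def extract_paragraphs(fb2_xml: str) -> list[str]:
    """Return the text of every non-empty <p> inside <body>, in document order."""
    root = ET.fromstring(fb2_xml)
    paragraphs = []
    for body in root.findall(f"{{{FB2_URI}}}body"):
        for p in body.iter(f"{{{FB2_URI}}}p"):
            # itertext() also collects text nested in inline tags like <emphasis>.
            text = "".join(p.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs


# Tiny hand-written FB2 fragment, for illustration only.
sample = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <section>
      <title><p>Chapter 1</p></title>
      <p>It was a quiet morning.</p>
      <p>- Hello, <emphasis>said</emphasis> Harry.</p>
    </section>
  </body>
</FictionBook>"""

print(extract_paragraphs(sample))
```

Note that the chapter title also comes back as a paragraph here, which is one example of how the inner structure of a file can shift the paragraph count between two editions.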

The main principle is that a book can be treated as a binary string if dialogue lines count as 1 and ordinary paragraphs count as 0. The script tries to match the two binary strings as well as possible. It's not perfect, but maybe you can improve it. In my tests, about 80-90% of the book is matched correctly.
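To illustrate the idea, here is a small sketch of encoding paragraphs as bits and lining up the two bit strings. It is not the matching algorithm of this script: it swaps in difflib.SequenceMatcher from the standard library, and the names to_bits, align and DIALOGUE_MARKERS, as well as the list of dialogue markers, are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Assumed dialogue markers (hyphen, en dash, em dash, opening guillemet,
# double quote); real books use different conventions per language.
DIALOGUE_MARKERS = ("-", "\u2013", "\u2014", "\u00ab", '"')


def to_bits(paragraphs, markers=DIALOGUE_MARKERS):
    """Encode each paragraph as '1' (dialogue) or '0' (narration)."""
    return "".join(
        "1" if p.lstrip().startswith(markers) else "0" for p in paragraphs
    )


def align(paragraphs_a, paragraphs_b):
    """Pair paragraphs whose dialogue/narration patterns line up."""
    bits_a, bits_b = to_bits(paragraphs_a), to_bits(paragraphs_b)
    matcher = SequenceMatcher(None, bits_a, bits_b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block says: bits_a[a:a+size] equals bits_b[b:b+size].
        for k in range(block.size):
            pairs.append((paragraphs_a[block.a + k], paragraphs_b[block.b + k]))
    return pairs


# Made-up example: the second translation has one extra dialogue line.
book_a = ["It was morning.", "- Hello!", "- Hi.", "He left."]
book_b = ["C'était le matin.", "- Bonjour !", "- Salut.", "- Ça va ?", "Il partit."]

for para_a, para_b in align(book_a, book_b):
    # Bilingual layout: one paragraph, then the same paragraph in the other language.
    print(para_a)
    print(para_b)
    print()
```

In this example the extra line in the second book is left unpaired and the closing paragraphs still match. Because the alignment only looks at the 0/1 pattern, it can also pair the wrong paragraphs when the patterns happen to coincide, which is consistent with the match rate being below 100%.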

The script is a mess, partly in French and without many comments. Sorry about that.

Requirements: the ebooks should be in .fb2 format, and Python must be installed.

Step 1: Set the book file names in the script, at line 380:

texteA = ouvrelelivre("garri.fb2")  # big str
texteB = ouvrelelivre("harry.fb2")  # big str

Step 2: Put the ebook files in the same folder as the script.

Step 3: Run the script.

Step 4: Hopefully it worked!

Note that the files LivreDesScores.txt and LivreDesScores2.txt give you some feedback on the process.

Enjoy.


bilingual-ebook-maker's Issues

Index out of range

Hi, I really appreciate your project; I've been looking for something similar for a long time, since I am into learning languages by reading ebooks in two languages.

However, I am encountering a problem, in particular:
IndexError: list index out of range.
It happens in this section:

balisesDesSectionsA = findSectionBalises(listeTexteA)
i=findBodyBalise(listeTexteA)
print("find body balise A : i = "+str(i))
while i<len(listeTexteA):
	ispORv = ispORvMethode(listeTexteA[i])
	isParole = 0
	if ispORv == 1:
		isParole = isParoleMethode(listeTexteA[i], strParoleTypeA)
		lenLigne=len(listeTexteA[i])
		DataA.append([i,ispORv,isParole,lenLigne])
	
	i+=1

balisesDesSectionsB = findSectionBalises(listeTexteB)
i=findBodyBalise(listeTexteB)
print("find body balise B : i = "+str(i))
while i<len(listeTexteB):
	ispORv = ispORvMethode(listeTexteB[i])
	isParole = 0
	if ispORv == 1:
		isParole = isParoleMethode(listeTexteB[i], strParoleTypeB)
		lenLigne=len(listeTexteB[i])
		DataB.append([i,ispORv,isParole,lenLigne])
		
	i+=1

LongueurTexteA = determineLongueurSTR(DataA)
LongueurTexteB = determineLongueurSTR(DataB)	
print( str(LongueurTexteA[0]) + " " +str(LongueurTexteA[1]) + "     " +  str(LongueurTexteB[0]) + " " +str(LongueurTexteB[1]) )	
print("datas len : " + str(len(DataA))+"  "  +str(len(DataB)))
print("Section numbers of A & B : " + str(len(balisesDesSectionsA))+"  "  +str(len(balisesDesSectionsB)))
iii=0
while iii<len(balisesDesSectionsA) and iii<len(balisesDesSectionsB):
	print("section "+ str(iii)+"  for A : "+ str(balisesDesSectionsA[iii][0]) + " tot=" + str(balisesDesSectionsA[iii][0][1]-balisesDesSectionsA[iii][1][1]) + " for B : " + str(balisesDesSectionsB[iii][0]) + " tot=" + str(balisesDesSectionsB[iii][0][1]-balisesDesSectionsB[iii][1][1]))
	iii+=1
	

Any ideas? Thx very much
