acecentre / textaloud

iOS app. Built in Swift. Reads out text - sentence by sentence, paragraph by paragraph or word by word

Home Page: https://testflight.apple.com/join/D8VRhWqr

License: GNU General Public License v3.0

Swift 4.76% Objective-C 20.96% C 15.58% C++ 58.66% JavaScript 0.04%
aac assistive-technology tts

textaloud's People

Contributors

gavinhenderson, taananas, willwade

textaloud's Issues

Caching of azure audio

Two reasons we would want to cache audio streamed.

  • It saves another request and hence $$$
  • many people would use the app for a presentation. They would have prepared their speech and won't change it after a certain point.

2 ideas

  • by default cache everything on any play.
  • provide an option near save button to cache all text.

NB: No need to have this option for Apple TTS.
Also, I realise this is quite fiddly to get right, but it would be useful if we can do it!
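
A rough sketch of the "cache everything on any play" idea, assuming the audio is keyed by voice plus text so that editing the text naturally misses the cache. `AudioCache` and its methods are hypothetical names, not existing app code. The file name comes from a hand-rolled FNV-1a hash because Swift's `hashValue` changes between launches; a cryptographic hash would be a safer choice if collisions are a worry.

```swift
import Foundation

/// Hypothetical on-disk cache for synthesised audio, keyed by voice and text.
struct AudioCache {
    let directory: URL

    /// Stable 64-bit FNV-1a hash. `hashValue` is avoided because it is
    /// seeded differently on every launch.
    static func fnv1a(_ string: String) -> UInt64 {
        var hash: UInt64 = 0xcbf29ce484222325
        for byte in string.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3
        }
        return hash
    }

    /// File location for a voice and text pair. Changing either gives a new file.
    func fileURL(voice: String, text: String) -> URL {
        let name = String(Self.fnv1a(voice + "\u{0}" + text), radix: 16)
        return directory.appendingPathComponent(name).appendingPathExtension("audio")
    }

    /// Returns previously stored audio, or nil when nothing has been cached yet.
    func cachedAudio(voice: String, text: String) -> Data? {
        try? Data(contentsOf: fileURL(voice: voice, text: text))
    }

    /// Stores audio after a successful online request.
    func store(_ audio: Data, voice: String, text: String) throws {
        try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
        try audio.write(to: fileURL(voice: voice, text: text), options: .atomic)
    }
}
```

On play, the app would check `cachedAudio` first and only call the online service (then `store`) on a miss.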

Autoscroll

If the text runs off the end of the view - it doesn't then autoscroll up to see the next parts of the text.

I think it should scroll up on reading so that the next part is readable.

Add preview speech button

In settings speech section is it possible to have a "Preview" button which reads a standard sentence out so the user can hear what it sounds like? E.g "I'm [NAME of Voice] - I sound like this. Do you like me?"

Live Read mode

Some people may want to speak and have access to the keyboard at the same time. Maybe this could be called "Speak as you type" mode. This means we would have to:

  • Provide an on/off toggle button in settings for "live" mode
  • When "Edit" is then selected, it would provide a "play" button during edit mode and speak word/para/sentence as the person types. (Would not imagine needing to show the reading type picker in this mode)
  • "Save" becomes "Done"

Something like

image

Check word / document formats it supports

From an end user

"The first word doc I did I couldn't add it in got error message "Failed to upload, try again" several times.

  • have a screenshot - can't work out where in the app to send this feedback to you the developers!
    Made a 2nd word doc, that worked ok.
    Tried the 1st word doc again and it did upload but it did line breaks where I hadn't had them. Weird!
    This might be because I'd originally done the text for this one in Notes and when I couldn't upload from Notes into the app I copied and pasted it into a Word doc.
    Be good to give instruction on whether you can use Notes or if you have to use Word?"

playing unspeakable element makes a crash

So I tried writing a random string of letters in Gujarati and pressing play. The first 3 characters or so worked fine but the last half didn't, and we got a crash.

Line 73 - getSentenceRangeForLocation "Thread 1: Fatal error: Range requires lowerBound <= upperBound"

Expected: To fail silently I guess
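
One way to get the "fail silently" behaviour is to check the bounds before forming the range. This is only a sketch: `safeSubstring` is a hypothetical helper, and it assumes the offsets are character offsets, which may not match what `getSentenceRangeForLocation` actually works with.

```swift
/// Returns the text between two character offsets, or nil when the offsets
/// are out of order or out of bounds, instead of trapping with
/// "Range requires lowerBound <= upperBound".
func safeSubstring(of text: String, from start: Int, to end: Int) -> Substring? {
    guard start >= 0, start <= end, end <= text.count else { return nil }
    let lower = text.index(text.startIndex, offsetBy: start)
    let upper = text.index(text.startIndex, offsetBy: end)
    return text[lower..<upper]
}
```

The caller can then skip playback when it gets nil rather than crash.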

Screenshot 2023-02-27 at 20 58 43

Screenshot 2023-02-27 at 20 58 00

bugonRead.mov

Saving - clarification

We have iCloud docs turned on in the developer settings. But what's it doing? When pressing save is there a file being saved to the users iCloud account? Can they edit it elsewhere?

Dealing with right to left languages

If you choose a language that is right to left (e.g. Arabic) rather than left to right (e.g. English), and by that I mean choosing a voice but also a keyboard, it does strange things. If it's an empty text area it does the right thing: it starts typing on the right side. But once you save that and edit again, it swaps.

Not sure how we fix. We could do some auto language detection (eg using data from something like this https://www.w3.org/International/questions/qa-scripts - but I feel iOS can do this already? So maybe we just need to make some changes) or maybe we just have a right to left - left to right toggle.

Nb. I used Arabic keyboard below

imageimage

Addition of help area in settings

Screenshot from signal as inspiration.

image

Buggy reading

"I have had a play with TextAloud but it seems quite buggy in that sometimes it refuses to read any text no matter which button I press and when I set it off reading in the middle of a long piece of text it did read it but it scrolled up and down the document very fast as it was reading in a peculiar way."

Preferred voices for languages

We have users who are multilingual

They will enter text using, say, an English keyboard, play that, clear it, then switch keyboards (using the globe icon) and write in their other language, e.g. Gujarati. Right now they would have to go into settings and change the voice in between changing keyboards. That is messy.

So what we need to do is, depending on the keyboard they enter the text with, automatically change the voice to the preferred one for that language. Can we do this?

Plus have a way of setting preferred languages.
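
A possible shape for the preferred-languages setting, as a sketch only. `PreferredVoices` and the voice identifiers are made up for illustration. On iOS the active keyboard's language tag is available from `UITextInputMode.primaryLanguage` (for example via `textView.textInputMode?.primaryLanguage`), which is what would be passed in here.

```swift
/// Hypothetical store of the user's preferred voice per language.
struct PreferredVoices {
    /// Keys are language tags such as "en-GB" or "gu"; values are voice identifiers.
    var voiceByLanguage: [String: String]
    /// Voice used when no preference matches.
    var fallbackVoice: String

    /// Picks a voice for the keyboard language: exact tag first,
    /// then the base language ("gu" for "gu-IN"), then the fallback.
    func voice(forKeyboardLanguage tag: String) -> String {
        if let exact = voiceByLanguage[tag] { return exact }
        let base = tag.split(separator: "-").first.map(String.init) ?? tag
        return voiceByLanguage[base] ?? fallbackVoice
    }
}
```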

Move help to bottom

Now it's sitting under language and speech. It looks weird that customisation is next.

  • Change Customisation heading to "Text area Customisation"
  • Move help to its own block under customisation

Drop text in settings

Think we can drop this now

"Note to change voice settings go to Settings ->
Accessibility -> Spoken Content -> Voices"

Will be available in our docs

Reformat azure voice names

We need a Regex to make the names similar to apple

Eg

CA-ca-VoiceNameneural

Should be "name (neural)"

It's useful to know which ones are neural and not.
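
A sketch of the kind of regex this could use. It assumes Azure short names follow a "locale-VoiceName" pattern with an optional "Neural" suffix (for example "en-GB-RyanNeural"); `displayName(forAzureVoice:)` is a hypothetical function name, and anything that doesn't fit the assumed pattern is returned unchanged.

```swift
import Foundation

/// Turns an Azure-style short name such as "en-GB-RyanNeural" into "Ryan (neural)".
/// Names that do not match the assumed pattern are returned unchanged.
func displayName(forAzureVoice shortName: String) -> String {
    // One or more "xx-" locale parts, then the voice name (group 1),
    // then an optional "neural" suffix (group 2), matched case-insensitively.
    let pattern = "^(?:[A-Za-z0-9]+-)+([A-Za-z0-9]+?)(neural)?$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]),
          let match = regex.firstMatch(in: shortName, options: [],
                                       range: NSRange(shortName.startIndex..., in: shortName)),
          let nameRange = Range(match.range(at: 1), in: shortName)
    else { return shortName }

    let name = String(shortName[nameRange])
    let isNeural = match.range(at: 2).location != NSNotFound
    return isNeural ? "\(name) (neural)" : name
}
```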

different languages support

Problem overview

Currently we don't have support for different languages. We need to fix this, because the application can be used by people from different countries.

Task

  • Implement automatic language detection using NLLanguageRecognizer (see its documentation).
  • Add a picker to pick custom voices for different languages.

[TRIAGE REQUIRED]

Description

It works well

NOTE: This was created from a form submission

Rate slider

Can we change rate of speaking for online and offline? I know we had a rate changer for offline but that's gone now.

I'm fine if we can only change rate for offline - or we just reflect what the system voice rate is set to. Online will be fixed when we support ssml.

Support Azure TTS

  • Alongside the iOS built-in voices we want to use an online system. There are lots, but we might as well start with one that is well documented: Microsoft Azure (and Ace Centre already pay $$ for Azure, so why not).
  • We need to:
      - Let the user pick a voice in settings, grouped by language.
      - Choose your voice.
      - Provide a voice picker from this system or iOS (and in the future other voice systems).
  • In the speak screen we want a way of "caching" the audio. By default it should cache the audio for the text, and any change to the text means rerunning the caching. This means storing an MP4 or other file format in the background, which will allow for offline access. Of course this only needs to be done for the online voice, but I guess it could work for system voices too.
  • Would need building into https://github.com/AceCentre/TextAloud/blob/main/TextAloud/Models/SpeechSynthesizer.swift
  • Would need building into https://github.com/AceCentre/TextAloud/blob/main/TextAloud/Models/SpeechSynthesizer.swift

Here are some starting points. It will need a CocoaPod from Microsoft, and you'll need our key. See details below.

  1. https://learn.microsoft.com/en-gb/azure/cognitive-services/speech-service/quickstarts/setup-platform?tabs=linux%2Cubuntu%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cios%2Cpypi&pivots=programming-language-swift
  2. https://learn.microsoft.com/en-gb/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=windows%2Cterminal&pivots=programming-language-swift

Sending text from notes

I can't easily get text from notes app. If we made it accessible via the share sheet to receive text would that solve it?

Combine speech sources list

I think most users will be confused if we ask them to pick which 'speech service', or even if they do understand they are more likely to want to be able to sort by 'All French Voices' than they are by 'All Apple Voices'.

I think it would be nicer if all the voices were combined into the single picker, although maybe a label at the end of the source might be helpful.

From a user perspective, you probably aren't overly fussed about the source of your voice, just the language and the voice.

Play "All" feature in string options dropdown

Right now, if nothing is selected and you press play, it plays all. That's fine, but it can be confusing if you tap on something and do want it all read aloud.

So I suggest having an "All" option alongside Word/Sentence/Paragraph.

If the device is offline and the voice is Azure, then only allow "Play all" as an option (as it will play the cached audio). This might fix a problem with the expected functionality of our caching system.
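
The offline rule could be as small as this. A sketch only: `PlayMode`, `VoiceSource` and `availableModes` are hypothetical names, and it assumes the app already knows whether the device is online.

```swift
enum PlayMode: CaseIterable { case all, word, sentence, paragraph }
enum VoiceSource { case apple, azure }

/// Modes to offer in the dropdown. Offline with an Azure voice only "All"
/// remains, since that is the only unit for which audio is cached.
func availableModes(source: VoiceSource, isOnline: Bool) -> [PlayMode] {
    if source == .azure && !isOnline { return [.all] }
    return PlayMode.allCases
}
```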

Dealing with being offline

There are some issues if your voice is an Azure voice and your device goes offline. Right now, it caches the whole audio file if you play the whole thing, but it won't cache sentences. It's too difficult to do this and I don't think we need to if we can support SSML (#2). But if you go offline and haven't cached the thing, or just want to play a word/sentence/para, it just won't. You don't get any alert or anything. Should we? Maybe a small banner that temporarily sits at the top and tells you that you are offline, but only if you swap to an Azure voice.

Tappable words

The current situation, tapping on words in the text editor view, is a bit of a workaround. To be clear, you can't really "tap" on words: it brings up the keyboard. If you dismiss the keyboard, the tapped word / cursor goes.

So options

I'll leave it open. Not sure if we can do anything better than what we have right now

In-app settings voice picker

Bearing in mind #1 that we will do in the future I think we need to make choosing a voice a bit easier and replicate the iOS system voice settings. Ie

image

Keyboard shortcuts

So this is what we need ideally

Speak type: 1: word, 2: sentence, 3: paragraph
Stop: Esc
Play/Pause: Space (I'm not sure if we can easily do "Pause" - we can't do that visually yet anyway)

Twice in a row.

Need to investigate this.

"Can't repeat the same sentence twice in a row - need to click elsewhere - either on to another sentence - which you won't really want if you're not wanting to say that - or you can click on an empty row if you've got one of these in your document. But you can't click on the bottom of the doc in the white space."

Sentence bounds.

image

So this says

TextAloud is a different kind of AAC app. It’s about playing text rather than writing text. bore da. Sut wyt ti?

Why is ". bore da." seen as part of the previous sentence, not as a stand-alone sentence?

[TRIAGE REQUIRED]

Description

I am reading/listening to a book that I imported in the app on my phone. It keeps crashing, and then goes back to a previous part that I have already heard 10 times.
I bought the paid version to read this book for a course. Please help me with this problem.

NOTE: This was created from a form submission

Play button - Usability issue

I think I need to clarify the play button. It's a bit confusing.

  • If you tap on a word and then press play, it will read the word/sentence or paragraph. That's fine.
  • But if nothing is selected it reads the entire page.

I think what it should do is

  • if a string element tapped on - it should read it as it currently does

  • And then find the next element and stop. (Or - we should provide a Next and Previous button to allow the user to step through each item)

  • If nothing selected read the first word/sentence/paragraph. Then move to the next and stop.

  • Pressing play again will read the next one.
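
The stepping behaviour proposed above can be sketched as a small cursor over the text units. `PlaybackCursor` is a hypothetical type, and splitting the text into words/sentences/paragraphs is assumed to happen elsewhere.

```swift
/// Hypothetical playback cursor over the text split into units
/// (words, sentences or paragraphs, depending on the chosen mode).
struct PlaybackCursor {
    let units: [String]
    /// Index of the unit that the next press of Play will read.
    private(set) var nextIndex = 0

    init(units: [String]) { self.units = units }

    /// Called when the user taps a unit; Play will start from it.
    mutating func select(_ index: Int) {
        guard units.indices.contains(index) else { return }
        nextIndex = index
    }

    /// Returns the unit to speak, then moves to the next one and stops.
    /// Returns nil once the end of the text has been reached.
    mutating func play() -> String? {
        guard nextIndex < units.count else { return nil }
        defer { nextIndex += 1 }
        return units[nextIndex]
    }
}
```

With nothing selected the cursor starts at the first unit, so repeated presses of Play walk through the text one unit at a time. A Previous button would just call `select` with an earlier index.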

Simulator.Screen.Recording.-.iPhone.14.Pro.-.2023-02-24.at.15.10.58.mp4

Pasting relatively long string caused it to fail.

Pasting text is definitely funky. Sometimes it pastes it twice. But in this case it really struggled.

Full log here

2023-02-27 23:36:55.071141+0000 TextAloud[59198:5781890] [claims] Upload preparation for claim 78F96489-3930-452F-9538-0C726F0A6FBC completed with error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c51770 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.071696+0000 TextAloud[59198:5781890] [claims] Claim 78F96489-3930-452F-9538-0C726F0A6FBC failed during preparing for uploading due to error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." 
UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c51770 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.420745+0000 TextAloud[59198:5781756] [claims] Upload preparation for claim CADB236A-EBB4-494D-A393-BA9B2ED81FF1 completed with error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c40840 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.421287+0000 TextAloud[59198:5781756] [claims] Claim CADB236A-EBB4-494D-A393-BA9B2ED81FF1 failed during preparing for uploading due to error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." 
UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c40840 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.522460+0000 TextAloud[59198:5781890] [claims] Upload preparation for claim A6500EC6-2A8A-4091-ADCD-4338260FEDA9 completed with error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c509f0 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.523043+0000 TextAloud[59198:5781890] [claims] Claim A6500EC6-2A8A-4091-ADCD-4338260FEDA9 failed during preparing for uploading due to error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." 
UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c509f0 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.624429+0000 TextAloud[59198:5782367] [claims] Upload preparation for claim BB0A3B69-1526-4470-B2A8-69237A161EF1 completed with error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c0a250 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} 2023-02-27 23:36:55.624959+0000 TextAloud[59198:5782367] [claims] Claim BB0A3B69-1526-4470-B2A8-69237A161EF1 failed during preparing for uploading due to error: Error Domain=NSCocoaErrorDomain Code=260 "The file “dbdc3267e451b4caf6b59972daad48652ceca071” couldn’t be opened because there is no such file." 
UserInfo={NSURL=file:///Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSFilePath=/Users/willwade-personal/Library/Developer/CoreSimulator/Devices/B4E7C980-0619-4FC9-BEA0-CB94CEAE849B/data/Library/Caches/com.apple.Pasteboard/eb77e5f8f043896faf63b5041f0fbd121db984dd/dbdc3267e451b4caf6b59972daad48652ceca071, NSUnderlyingError=0x600001c0a250 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} update

Bug on play

So if you tap, it speaks in the selected language voice just fine.

But if you press play, it's not consistent. It's playing the same voice.

The play button seems to be getting stuck too (it says stop when not playing and vice versa, but it doesn't look critical).

IMG_5883.MOV

Edit SSML/Non-SSML markers in-app (inputAccessoryView for SSML and non-SSML)

Reminder: This app is primarily designed to help people who can't speak present long streams of communication. Typically this can sound monotone. People control this by doing it sentence by sentence or paragraph - but we'd like to give people more control. E.g. within the text somehow indicate pauses, tone and expression.

Some voices support something called SSML. It's an XML markup language that tells the synthesiser to read the text differently. It's neat, but it's not supported by all voice engines, particularly not the built-in iOS engine. So for this we first need to detect which engine is being used and then provide a textView.inputAccessoryView with options. These options differ depending on whether the engine is SSML compatible or not. (https://daddycoding.com/2019/10/30/ios-tutorial-input-accessory-view/)

  • If the user chooses a voice with no SSML, we need to show the following options in the inputAccessoryView:
    Speech rate (so we change the rate for a portion of the text), speech volume, spelling mode (12345 gets read as 1 2 3 4 5, i.e. it puts spaces into the text), and silence (n ms). (NB: Wrise does this well - look at the pics at https://www.assistiveware.com/products/wrise ). Graphic markers would exist in the text to identify these elements, and behind the scenes it would have to create some format that the voice synthesiser reads and uses.

  • For SSML-compatible voices - provide a similar-looking inputAccessoryView - which does something different - creates SSML compatible XML (but only shows text and some graphic markers to individual). This would be neat - there are no apps that I'm aware of that allow you to mark up and play SSML marked up speech

  • See here for ideas https://ssml-editor.azurewebsites.net - or https://www.getwoord.com/ssml-editor or those from Microsoft, Google and Amazon, IBM (See their own product pages)
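
For the non-SSML case, the spelling mode mentioned above is the easiest one to sketch, since it only changes the text handed to the synthesiser. `spellOutDigits` is a hypothetical helper and only handles ASCII digits.

```swift
/// Inserts a space between adjacent digits so that "12345" is read as "1 2 3 4 5".
func spellOutDigits(in text: String) -> String {
    var result = ""
    var previousWasDigit = false
    for character in text {
        let isDigit = character.isASCII && character.isNumber
        if isDigit && previousWasDigit { result.append(" ") }
        result.append(character)
        previousWasDigit = isDigit
    }
    return result
}
```

In the app this would only be applied to the portion of text carrying the spelling marker, not to the whole document.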

Note: It may be that we choose NOT to support SSML as the key aspects of timing and rate are good. Which is fine - but going forward there are a lot more elements of SSML that are useful including eg. style. See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice

I'm not sure if this is really any easier - but we could just look to support Speech Markdown - see the JS library which we could use with JavascriptCore

There are a number of steps to get this done. Here's one idea

  • Be able to read in and play a SSML file
  • Be able to show markers from ssml in the app in a visual way
  • Be able to edit ssml
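
To make the "markers in the app, SSML behind the scenes" idea concrete, here is a sketch that renders a list of marked-up segments to an SSML string. `SpeechSegment`, `escapeXML` and `ssml(for:voice:language:)` are hypothetical names. The `speak`, `voice`, `break` and `prosody` elements are standard SSML; the voice name is passed in by the caller and is not checked.

```swift
import Foundation

/// Hypothetical in-app markers, rendered to SSML only for voices that support it.
enum SpeechSegment {
    case text(String)
    case pause(milliseconds: Int)
    case rate(String, text: String)   // rate is e.g. "slow", "fast" or "+20%"
}

/// Escapes characters that would otherwise break the XML.
func escapeXML(_ string: String) -> String {
    string.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
}

/// Builds one SSML document for the given segments.
func ssml(for segments: [SpeechSegment], voice: String, language: String) -> String {
    let body = segments.map { segment -> String in
        switch segment {
        case .text(let text):
            return escapeXML(text)
        case .pause(let milliseconds):
            return "<break time=\"\(milliseconds)ms\"/>"
        case .rate(let rate, let text):
            return "<prosody rate=\"\(escapeXML(rate))\">\(escapeXML(text))</prosody>"
        }
    }.joined()
    return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"\(escapeXML(language))\">"
        + "<voice name=\"\(escapeXML(voice))\">\(body)</voice></speak>"
}
```

The user would only ever see the text and the graphic markers; the same segment list could feed a different renderer for engines without SSML.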

Build and release for Apple TV

Michael raised this the other week: we come across users who have a large TV and want to use that instead of a smaller device. It strikes me we could do that fairly easily by adding the TV target.

Right to Left if keyboard changed OR some kind of flag in text file

We have a small problem. When switching between a right-to-left and a left-to-right language file, the direction only changes if the keyboard language changes too. If the user loads an English txt file (left to right) and then, say, an Urdu txt file (right to left) and doesn't change the keyboard, it won't change direction. Some users may not bring the keyboard up.

A couple of ideas.

If the txt file that is read in has a file name of a lang code that is RtoL - then use that. eg. urd_SomeFilename.txt => Urdu => Right To Left

or if the text file starts with a comment and lang code eg. //urd
use that?

Or maybe we just force it with //rtol or //ltor in the first line.
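
All three ideas can live in one lookup, sketched below. `TextDirection`, `rightToLeftCodes` and `direction(fileName:contents:)` are hypothetical names, and the list of language codes is illustrative, not complete. It returns nil when there is no hint, so the app can fall back to the current keyboard-based behaviour.

```swift
import Foundation

enum TextDirection { case leftToRight, rightToLeft }

/// Language codes treated as right-to-left in this sketch (illustrative, not complete).
let rightToLeftCodes: Set<String> = ["ar", "ara", "fa", "fas", "he", "heb", "ur", "urd"]

/// Resolves direction from, in order: an explicit "//rtol" or "//ltor" first line,
/// a "//<lang>" first line, then a "<lang>_" file-name prefix.
/// Returns nil when no hint is found.
func direction(fileName: String, contents: String) -> TextDirection? {
    let firstLine = contents.split(separator: "\n", maxSplits: 1,
                                   omittingEmptySubsequences: false).first ?? ""
    let trimmed = firstLine.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    if trimmed.hasPrefix("//") {
        let flag = String(trimmed.dropFirst(2))
        if flag == "rtol" { return .rightToLeft }
        if flag == "ltor" { return .leftToRight }
        if rightToLeftCodes.contains(flag) { return .rightToLeft }
    }
    if let underscore = fileName.firstIndex(of: "_"),
       rightToLeftCodes.contains(fileName[..<underscore].lowercased()) {
        return .rightToLeft
    }
    return nil
}
```

The flag line would also need stripping from the text before it is shown or spoken.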

Resizing of buttons

For eyegaze users or pointer users, there would be a need for a larger button size for things like play and maybe even a more straightforward way of navigating back and forth a sentence/word/para rather than using the standard cursor control. e.g. a Next and Previous button.

Buttons would need an amount of space around them to make it easier to target.

Crash on read

  • Use Azure voice
  • Load up a random file from iCloud docs. Something a couple of pages long
  • Loads the document.
  • And then tries to read and crashes

Screenshot 2023-02-18 at 21 12 06
