quachtina96 / scratch-vui

An accessible, audio user interface for Scratch.
In the linear project creation process, a child is asked:
"What do you want to call it [the project]?"
"What's the next step?"
They might not know.
Current behavior:
1)
"What do you want to call it [the project]?"
"I don't know"
// Scratch ignores anything that doesn't start with "Scratch."
2)
"What do you want to call it [the project]?"
"Scratch I don't know"
"Cool! When you say Scratch I don't know, I'll play the project. What's the first step?"
"Scratch I don't know"
"I heard you say I don't know. That doesn't match any Scratch commands."
Desired behavior:
"What do you want to call it [the project]?"
"Scratch I don't know"
"That's okay, before you leave the project, I'll ask you to name it. What's the first step?"
"Scratch I don't know"
"A random Scratch command you could use is: ______. <skippable explanation?>"
Someone who is very intentional or quiet might not want to have to say "Scratch" before every Scratch command.
Examples of Supported Interactions (from user):
"Stop making me say Scratch"
"Stop making me say Scratch before every Scratch command when I'm editing a project"
"Stop making me say Scratch when I'm editing a project"
"Only listen when I say Scratch"
"Always listen to me even if I don't say Scratch"
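One way to support these requests is a per-context flag for whether the trigger word is required. This is a minimal sketch under assumed names (`listenSettings`, `shouldHandle` are hypothetical, not the system's current implementation):

```javascript
// Hypothetical sketch: let the user relax the "Scratch" prefix requirement,
// either globally or only while editing a project.
const listenSettings = {
  requireTriggerGlobal: true,   // "Only listen when I say Scratch"
  requireTriggerInEditor: true  // "Stop making me say Scratch when I'm editing"
};

function shouldHandle(utterance, context) {
  const saidTrigger = /^scratch\b/i.test(utterance.trim());
  const required = context === 'editing'
      ? listenSettings.requireTriggerInEditor
      : listenSettings.requireTriggerGlobal;
  // Handle the utterance if it began with the trigger word,
  // or if the current context doesn't require the prefix.
  return saidTrigger || !required;
}
```

"Stop making me say Scratch when I'm editing a project" would then map to setting `listenSettings.requireTriggerInEditor = false`.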
"Play note fifty for zero point two five beats" is recognized as "play note 54.25 beats"
"Play note fifty for zero point two five beats" is also recognized as "play note 54 0.25 beats"
Basically, "for" is misrecognized as "four".
Some things we can try
This makes me think that there could be some way to probabilistically determine the most likely intended utterance, or the desired form of the speech... but how would you generalize this? How might I build a model for it?
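Short of a full probabilistic model, a narrow rule-based repair already covers this case. A sketch (the `repairPlayNote` helper is hypothetical) that undoes the "for" to "four" fusion in the two misrecognitions above:

```javascript
// Hypothetical sketch: undo the common "for" -> "four" misrecognition in
// "play note N for B beats". Assumes the recognizer fused "fifty" + "four"
// into a single number (54) and either attached or separated the decimal
// beat count. A real fix would weigh recognizer confidences instead.
function repairPlayNote(utterance) {
  // Case 1: "play note 54.25 beats"   -> "play note 50 for 0.25 beats"
  // Case 2: "play note 54 0.25 beats" -> "play note 50 for 0.25 beats"
  const m = utterance.match(/^play note (\d+)4 ?0?(\.\d+) beats$/);
  if (m) {
    // Drop the trailing "four", restore the tens digit, reinsert "for".
    return `play note ${m[1]}0 for 0${m[2]} beats`;
  }
  return utterance; // nothing to repair
}
```

This only generalizes to the "play note" pattern; the open question above is how to do this across the whole grammar.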
Indicating context changes
Indicating state
Indicating errors
e.g. When I make a project called "say my name in cat language" whose instructions are "play the meow sound", can I make a new project called "i really think you are a cat" whose instruction is "say my name in cat language 5 times"?
This also raises another question: should projects be able to be recursive? Thinking about recursive functions... useful recursive functions usually take inputs or operate on some variable that is maintained and updated outside of the recursive helper function. Currently, the projects do not provide a way for the user to provide input... but that would be doable!
In the same way that well documented code often has a README file and in-line comments, our system could support the same thing. The README may be analogous to the project page.
User: "Add a description to the project" or "Add a message to the viewer/user" or "add a project description" or "Here's what the user has to know" or "add project instructions"
Scratch: "Tell me about the project" or "What's the project description"
User: "This project is about ...."
so that ScratchNLP parses the instruction properly (it expects every instruction to end with "that's it"):
"if one plus one equals two, play the meow sound (that's it)"
One argument against (multicommand conditional case):
Those utterances could then be combined to create the right output? Is this the expected behavior, or one that is harder to understand?
Based on the work we did in ScratchNLP and also in lab 3 of 6.863, can we integrate that system into the Scratch VUI coding interface?
It seems like the perfect use case because it's where we want the flexibility and knowledge to lie. By being able to handle different parts of speech and questions, we can provide users with access to a set data model that can be really meaningful and intuitive.
Some things I anticipate/some questions to explore
commands to support
STAGE 1:
Talking to figure out something or wanting to rephrase is a natural occurrence in conversational speech. The system will need a way to handle this variation in a non-frustrating way.
IDEAS
When designing the scratch-vui system, I imagined that the projects would be modular and composable so that users could reuse their projects and create even more complex behavior. The way projects are activated in Scratch-VUI ("Scratch, <project_name>") frames projects as commands, or things that Scratch might do. That kind of makes sense... but it doesn't actually provide a framework for allowing users to provide input (when programming or when interacting with the project itself).
I propose a two-part solution.
There is more to figure out with actually modifying these values (plus considering how these fit into the Scratch project representation).
Unclear why this is happening, but the current user workaround is to press the space bar to toggle speech recognition on/off.
"play [artist]"
"play [song] by [artist]"
"play [song]"
maybe also support filters/sound editing
During project execution, only listen for words like
This is necessary because user speech is ambiguous... and right now we don't distinguish between possible parses; we pick the simplest parse.
questions
With an ambiguous description... how would the user know what to change?
in referencing project number, step number, and asset number
e.g. "scratch what is the first step"
Based on certain metrics for knowing how experienced a user is, I could introduce certain kinds of vocabulary as scaffolding and give the user the ability to skip this if they so desire.
One element of scaffolding I'd like to create is a sort of "hello" getting-started sequence.
design:
"Hi, I'm Scratch, a tool for you to build and interact with Scratch projects. Scratch projects are computer programs that you can play, interact with, and share. You can create Scratch projects by telling me instructions. I keep track of these instructions, and when you say the name of the project, I will follow the instructions step by step."
Maintain state for when Scratch was said by itself in the last utterance. For example:
"Scratch"
"How many projects do I have"
should get the response to "scratch how many projects do i have" instead of no response, since "How many projects do I have" did not begin with the trigger word.
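A minimal sketch of that state (the names here are hypothetical); the real system would run this before its usual trigger-word check:

```javascript
// Hypothetical sketch: remember when the previous utterance was just the
// trigger word, and treat the next utterance as if it had been prefixed
// with "Scratch".
let lastUtteranceWasBareTrigger = false;

function normalizeUtterance(utterance) {
  const text = utterance.trim();
  if (/^scratch$/i.test(text)) {
    lastUtteranceWasBareTrigger = true;
    return null; // wait for the follow-up command
  }
  if (lastUtteranceWasBareTrigger && !/^scratch\b/i.test(text)) {
    lastUtteranceWasBareTrigger = false;
    return 'Scratch ' + text; // carry the trigger over
  }
  lastUtteranceWasBareTrigger = false;
  return text; // downstream code still checks for the prefix as usual
}
```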
Problem:
You can name a project after an existing command,
e.g. project name = "scratch create a new project"; when you're done, you can't call the project, because "scratch create a new project" will always trigger project creation.
Expected Behavior:
The system responds, saying that you can't use that name and asks for a different name.
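One way to get the expected behavior is to check candidate names against the command triggers before accepting them. A sketch with a made-up trigger list (the real list would come from the system's registered ScratchAction triggers):

```javascript
// Hypothetical sketch: reject project names that would shadow a built-in
// command, so the user is asked for a different name instead.
const reservedTriggers = [
  /^create a new project$/,
  /^play (?:the )?project$/,
  /^how many projects do i have$/
];

function validateProjectName(name) {
  // Normalize the same way utterances are matched: lowercase, no prefix.
  const normalized = name.trim().toLowerCase().replace(/^scratch\s+/, '');
  if (reservedTriggers.some((trigger) => trigger.test(normalized))) {
    return "You can't use that name, because it's already a Scratch command. What else would you like to call the project?";
  }
  return null; // name is fine
}
```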
For new users, it's important to give guidance and have them hear Scratch-VUI out. However, this guidance can be long. We want the user to be empowered and engaged, and able to go at their own pace without having to hear Scratch-VUI out (especially if it's repetitive).
On audio cues
When the user gets a command wrong, they have to hear, over and over,
"I heard you say ____, that's not a Scratch command." This was initially designed assuming 2 failure modes:
some missing failure modes
3) you say something, but you know you messed up so you want to start over
4) you say something, but you're still trying to figure it out as you're talking...while you pause it processes what you said
4 can be addressed by only listening when the user says "scratch" or otherwise triggers listening. If this were a mobile app, I could imagine it working with a touch screen by being one big screen where the entire surface is a button.
On skippable speech synthesis
Inspired by T.E.D.: as you navigate through options, you're able to cut off the last thing being said (because of arrow keys and changes in focus). In our system, we are assuming a screenless experience, but that doesn't mean there can't be buttons. Maybe there can be a skip button or a "skip" cue.
Not sure if this is needed, but when the user is inside a project, editing it, they might want to test the project without having to "See Inside" again afterward. A "Test the project" command would save the user from having to "Play the project" and then "See Inside".
"When you've given an instruction that has more than one meaning, I will ask for clarification."
User says something....
Scratch says "I understand that as # different things. 1. [insert some unambiguous representation of the program] 2. [insert another representation that is not ambiguous]"
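A sketch of how that readback could be generated, assuming each candidate parse carries a plain-language `description` (a hypothetical shape, not the current parser output):

```javascript
// Hypothetical sketch: read each candidate interpretation back with a
// number, so the user can choose one by saying "the first one" or
// "number two".
function describeAmbiguity(parses) {
  const options = parses
      .map((parse, i) => `${i + 1}. ${parse.description}`)
      .join(' ');
  return `I understand that as ${parses.length} different things. ` +
      `${options} Which one did you mean?`;
}
```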
link to issue #35
Right now, the state machine maintains a set of contexts on the general system navigation level and on the project level. What about on a finer scale? Or in a context that touches on both the project and the general system?
Is my current implementation based on contexts too rigid?
For example, say we want to support the behavior of confirming a (dangerous) act before executing:
User: Scratch, delete the say hello project.
Scratch: Are you sure you want to delete the say hello project?
User: Yes.
User: Scratch, say hello.
Scratch: Like this? hello
User: No. Scratch, say jello.
Scratch: Like this? jello
User: Yes.
Scratch: Okay, what's the next step?
User: Scratch, how many projects do I have?
Scratch: You have 3 projects.
User: What are they called?
Scratch: Give me a compliment, big water bottle, and get ready for the dance party.
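One way to make confirmations like the delete exchange above cut across contexts is a small "pending question" layer that any context can use. This is a sketch with hypothetical names, not the current state machine:

```javascript
// Hypothetical sketch: whatever context the user is in, a dangerous action
// can park itself here and claim the next yes/no answer.
let pendingConfirmation = null;

function requestConfirmation(question, onYes) {
  pendingConfirmation = { question, onYes };
  return question; // spoken to the user
}

function handleYesNo(utterance) {
  if (!pendingConfirmation) return false; // not ours to handle
  const saidYes = /^(yes|yeah|yep)\b/i.test(utterance.trim());
  const action = pendingConfirmation;
  pendingConfirmation = null;
  if (saidYes) action.onYes();
  return true; // consumed the utterance either way
}
```

Usage would look like `requestConfirmation("Are you sure you want to delete the say hello project?", () => deleteProject("say hello"))`, where `deleteProject` stands in for the real deletion code.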
For Scratch command capabilities that only make sense in context, create tutorials that take advantage of example programs (from #38) and show how to create them step by step.
When someone is inside a project or creating a project, they might ask
and want to dive deeper
I want to explore workflows / ideal flows through the interface and document them...
// stopTalking
ScratchAction.General.stopTalking = new Action({
  "trigger": /stop talking|stop playing (?:the )?project|stop (?:the )?project/,
  "idealTrigger": "stop talking",
  "description": "skip what Scratch is saying"
});
Currently, this is difficult to resolve because speech synthesis will get picked up by speech recognition. To handle this, the microphone is turned off, but this means that the user will not be heard via voice if they try to interrupt Scratch.
Instead of saying "I don't know how to do that," give more helpful error messages at the lowest level of parsing that succeeded.
e.g.
User's goal: play the project called "give me water bottle"
Actual problem: Scratch doesn't have a project called "give me water bottle"
Scratch says: I heard you say scratch give me water bottle. I don't know how to do that.
Should Scratch say...
I heard you say scratch give me water bottle. There is no project called give me water bottle.
Even better, can we follow up with
Do you want to create one?
This is tricky because we want projects to be treated like commands... but not the other way around.
Ideas:
Examples:
1. do the following 10 times play the meow sound play the chomp sound thats it
2. listen and wait. if the speech is knock knock say who's there thats it
3. if the speech is knock knock say who's there thats it
4. when the project starts listen and wait and then if the speech is knock knock say who's there thats it thats it.
Looking at example 1 ("do the following 10 times..."), we see that we can generally collect commands until "thats it" and then send the whole block to ScratchNLP for the parse.
plan:
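That collection step could be sketched like this (the `onComplete` callback stands in for the hand-off to ScratchNLP and is hypothetical):

```javascript
// Hypothetical sketch: buffer the utterances of a multi-command block until
// the user says "that's it" (or "thats it"), then hand the joined block
// off for a single parse.
function makeBlockCollector(onComplete) {
  const buffer = [];
  return function collect(utterance) {
    buffer.push(utterance);
    // The block terminator doubles as the end of the last utterance.
    if (/\bthat'?s it\s*$/i.test(utterance)) {
      onComplete(buffer.join(' '));
      buffer.length = 0; // reset for the next block
    }
  };
}
```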
prototype.js:136 nevermind
scratch_instruction.js:59 Error: Scratch does not know how to 'nevermind'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say nevermind
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.
prototype.js:136 go back
scratch_instruction.js:59 Error: Scratch does not know how to 'go'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say go back
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.
prototype.js:136 I'm done
scratch_instruction.js:59 Error: Scratch does not know how to 'i'm'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say i'm done
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.
After they try n times, prompt them to ask for help based on what state they are in.
The value of n could also vary based on the state they are in.
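A sketch of the per-state threshold idea (the state names and limits here are made up):

```javascript
// Hypothetical sketch: count consecutive unrecognized commands; past the
// limit for the current state, the error message becomes a pointer to help.
const helpThresholds = { home: 3, insideProject: 2 };
let failureCount = 0;

function onUnrecognized(state) {
  failureCount += 1;
  const limit = helpThresholds[state] ?? 3;
  if (failureCount >= limit) {
    failureCount = 0;
    return "It sounds like you might be stuck. Try saying 'Scratch, help' to hear what you can do from here.";
  }
  return "That doesn't match any Scratch commands.";
}

function onRecognized() {
  failureCount = 0; // a successful command resets the streak
}
```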
The state transitions don't actually make sense...
When a user goes HOME --> Inside Existing Project, the project state is 'create'.
step and stop (step #)
the inside versus see inside
How might I verify that the grammar is actually helping the situation, or improve the grammar?
A ScratchNLP problem:
"What sounds do you have"/"What sounds do I have"/"What sounds do you know"
"Can you make the/a ___ sound"
direct match to name (pull request #27)
automatically map a kind of sound to the existing library (via tags)
lots of sounds... pick 3 random categories... [to implement, need categorization, need way to explore categories of sound]
Another idea:
utilize freesound api to search for sounds and return those
[The blank could be a SOUND_NAME or a description of what the sound is like. Could implement by using the Freesound API to search for and get particular sounds. Also consider synonyms for describing or searching sound.]
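For the Freesound idea, the search request could be built like this. The endpoint shape and `fields` parameter are assumptions to verify against the Freesound API v2 documentation, and the token is a placeholder:

```javascript
// Hypothetical sketch: build a Freesound text-search URL for a sound name
// or a description of what the sound is like. Check the Freesound API v2
// docs before relying on this exact shape.
function freesoundSearchUrl(query, apiToken) {
  const params = new URLSearchParams({
    query: query,            // e.g. "meow", or a description like "door slam"
    token: apiToken,         // Freesound API key (placeholder here)
    fields: 'id,name,previews'
  });
  return `https://freesound.org/apiv2/search/text/?${params}`;
}
```

Synonym handling could then happen one level up, by expanding the query ("kitty" -> "cat") before calling this.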