quachtina96 / scratch-vui Goto Github PK

View Code? Open in Web Editor NEW

1.0 1.0 1.0 4.62 MB

An accessible, audio user interface for Scratch.

JavaScript 95.71% HTML 3.69% Python 0.60%

scratch-vui's People

Contributors

Stargazers

Watchers

Forkers

annoogy

scratch-vui's Issues

Support "I don't know"

In the linear project creation process, a child is asked

"What do you want to call it[the project]?"
"What's the next step"

They might not know.....

Current behavior:
1)
"What do you want to call it[the project]?"
"I don't know"
// Scratch ignores anything that doesn't start with Scratch.
2)
"What do you want to call it[the project]?"
"Scratch I don't know"
"Cool! When you say Scratch I don't know, I'll play the project. What's the first step"
"Scratch I don't know"
"I heard you say I don't know. That doesn't match any Scratch commands"

Desired behavior:
"What do you want to call it[the project]?"
"Scratch I don't know"
"That's okay, before you leave the project, I'll ask you to name it. What's the first step?"
"Scratch I don't know"
"A random Scratch command you could use is: ______. <skippable explanation?>"

Change listening mode: Toggle requiring use to say "Scratch" before everything

Someone who is very intentional or quiet, might want to not have to say "Scratch" before every Scratch command.

Examples of Supported Interactions (from user):
"Stop making me say Scratch"
"Stop making me say Scratch before every scratch command when I'm editing a project"
"Stop making me say Scratch when I'm editing a project"
"Only listen when I say Scratch"
"Always listen to me even if I don't say Scratch"

Support more forms of questions

"What is happening?"
"What can I do here?" (in current state)
"What can I do here?" (in specific project)

Improve the speed at which programs run

only include the sounds that are necessary for running the project.
figure out how to not have to "start" semantic.py all the time?

Support "play project number 1"

User should be able stop Scratch program execution + Scratch speech.

stop program execution w/ keycontrols
stop speech w/ keycontrols
stop program execution w/ speech

Reduce number of misrecognized commands.

"Play note fifty for zero point two five beats" is recognized as "play note 54.25 beats"
"play note fifty for zero point two five beats is recognized as "play note 54 0.25 beats"

basically, for is misrecognized as "four"

Some things we can try

add Scratch command structures to the grammar
attempt to match rhymes when a comma cannot be parsed validly?

This makes me think that there could be some way to probabilistically determine the most likely speech or desired form of the speech... but how would you generalize this? how might i build a model for this?

Use audio cues (earcons) to confirm actions/indicate state.

Indicating context changes

see inside project
stop seeing inside

Indicating state

play project
project finishes playing

Indicating errors

hightom drum noise

Cache the project.json to reduce calls to server & make this faster/more efficient

Support use of projects as instructions to be made in other projects.

e.g. When I make a project called "say my name in cat language" and the instructions are "play the meow sound", Can I make a new project called "i really think you are a cat", and have its instruction be "say my name in cat language 5 times".

This also raises another question. Beyond using the should projects be able to be recursive? Thinking about recursive functions...useful recursive functions usually take inputs or operate on some variable that is maintained and updated outside of the recursive helper function. Currently, the projects do not provide a way for the user to provide input.... but that would be doable!

Support project descriptions!

In the same way that well documented code often has a README file and in-line comments, our system could support the same thing. The README may be analogous to the project page.

User: "Add a description to the project" or "Add a message to the viewer/user" or "add a project description" or "Here's what the user has to know" or "add project instructions"
Scratch: "Tell me about the project" or "What's the project description"
User: "This project is about ...."

speech recognition --> text saved as description; speech synthesis
users can record their own project descriptions

Should the system automatically insert "that's it" for simple conditional/loops?

so that ScratchNLP parses the instruction properly (it expects every instruction to end with thats' it)
"if one plus one equals two, play the meow sound (that's it)" .

One argument against (multicommand conditional case):

a user might not say "That's it" until the NEXT utterance.
"if one plus one equals two, play the meow sound"
"play the chomp sound"
"play the bing sound 2 times"
"that's it"
^ Using questioning / guidance can be a way to guide the flow of creating conditionals and loops (for multicommand conditionals"

That utterance could then be combined to create the right output? Is this expected behavior? Or one that is harder to understand.

Speech synthesis is recognized by speech recognition

Explore using syntactic-semantic rule structure to create a more flexible interface.

based on the work we did in ScratchNLP and also in lab 3 of 6.863, can we integrate that system into that of the scratch vui coding interface?

It seems like the perfect use case because it's where we want the flexibility and knowledge to lie. By being able to handle different parts of speech and questions, we can provide users with access to a set data model that can be really meaningful and intuitive.

Some things I anticipate/some questions to explore

handling punctuation
running on the client-side (javascript based version/port of the code? --> could be better in the long run for not having to have the server take care of the computation?
what would the semantic rules be? need to explore how lab 3 worked again to understand the data model that was being worked with
how would this rule interface interact with the state machine and the ability to match actions etc.

Give user feedback and properly set project when seeing inside specific project

Support state + context-based "What can I do now?", "What do I do"?

commands to support
STAGE 1:

Handle the variation caused by thinking out loud and children's speech

Talking to figure out something or wanting to rephrase is a natural occurrence in conversational speech. The system will need a way to handle this variation in a non-frustrating way.

IDEAS

remove filler words
ignore incomplete sentences in utterances/commands
"Scratch, hold on" (to pause any interpretation of what is being said). "Ready, Scratch?"

"play a loud sound" unsupported... also misinterpreted as an attempt to play a project

Allow users to create projects that take in inputs (basically functions).

When designing the scratch-vui system, I imagined that the projects would be modular and composable so that users could reuse their projects and create even more complex behavior. The way projects are activated in Scratch-VUI ("Scratch, <project_name>") frames projects as commands or things that Scratch might do. Which kind of makes sense... but doesn't actually provide a framework for allowing users to provide input (when programming and when interacting with the project itself)

I propose a two-part solution.

Explicitly provide the values at the time of the project being called/referenced, "Scratch, <project_name> with <variable_name> as (and ... and <variable_name> as )"
When running a project that wants inputs, Scratch will ask the user for the values if they were not provided when the project was being run.
3)) When creating a project that uses a project that wants inputs as an instruction, Scratch will ask the user for the values while the user builds up the project.

There is more to figure out with actually modifying these values (plus considering how these fit into the Scratch project representation.

fuzzy matching mishandles project names

Sometimes the mic doesn't turn on after Scratch is done talking

unclear why this is happening, but current user workaround is to press the space bar to toggle speech recognition on/off

Integrate Spotify experimental extension!

"play [artist]"
"play [song] by [artist]"
"play [song]"

maybe also support filters/sound editing

Microphone picks up speech from the project execution

During project execution, only listen for words like

"Scratch, pause"
"Scratch, stop"
"Scratch, start over"

When user says a Scratch command, Scratch should just execute it.

Provide unambiguous description for each step in project.

This is necessary because user speech --> ambiguous... and right now we don't distinguish between possible parses. we pick the simplest parse.

questions
with ambiguous description... how would the user know what to change?

Support use of ordinal words

in referencing project number, step number, and asset number

e.g. "scratch what is the first step"

'go home' provides no verbal feedback when already in the home state.

Add a "hello" getting started sequence for new users.

based on certain metrics for knowing how experienced a user is, I could introduce certain kinds of vocabulary as scaffolding and give the user the ability to skip this if they so desire.

one element of scaffolding I'd like to create is a sort of "hello" getting started sequence.
design:
" Hi, I’m Scratch, a tool for you to build and interact with Scratch projects. Scratch projects are computer programs that you can play, interact with, and share. You can create Scratch projects by telling me instructions. I keep track of these instructions and when you say the name of the project, I will follow the instructions step by step."

Trigger listening mode with "Scratch" alone

Maintain state for when Scratch was said by itself in the last utterance. For example:

"Scratch"
"How many projects do I have"
should get the response to "scratch how many projects do i have" instead of no response since "How many projects do I have" did not begin w/ the trigger word.

Validate Inputs

Problem:
you can name a project
e.g. project name = “scratch create a new project” and when you're done, you can’t call the project because “scratch create a new project” will always trigger the project creation.

Expected Behavior:
The system responds, saying that you can't use that name and asks for a different name.

Give users option to spend less time listening to ScratchVUI interface

For new users, it's important to give guidance and hear Scratch-VUI out (for guidance). However, things can be long. we want to user to be empowered and engaged and be able to go at their own pace without having to hear Scratch-VUI out (especially if its repetitive).

concise mode (different set of strings are used to communicate -- see the connectToVM_cypress branch for strings.js)
audio cues
skippable speech synthesis

On audio cues
when user gets a command wrong...they have to hear over and over...
"i heard you say ____, thats not a scratch command." this was initially designed assuming 2 failure modes:

the speech recognition mishears u
you say something that isn't a scratch command

some missing failure modes
3) you say something, but you know you messed up so you want to start over

4) you say something, but you're still trying to figure it out as you're talking...while you pause it processes what you said

4 can be addressed by only listening when user says "scratch" or triggers listening. If this were to be a mobile app, icld imagine it working with a touch screen by it being one big screen and the entire surface could be a button.

On skippable speech synthesis
inspired by T.E.D. as you navigate through options, you're able to cut off the last thing being said (Because of arrow keys and changes in focus). In our system, we are assuming a screenless experience, but doesn't mean there can't be buttons. maybe there can be a skip button or "skip" cue.

"Test the project" vs "Play the project" from inside the project.

Not sure if this is needed, but when the user is inside a project, editing a project they might want to test the project without having to "See Inside" again after. The "Test the project" command would make it so the user doesn't have to "Play the project" and then "see inside"

Clarify ambiguous parses of Scratch Commands

"When you've given an instruction that has more than one meaning, I will ask for clarification."

User says something....
Scratch says "I understand that as # different things. 1. [insert some representation of the program that is unambiguous. 2. [insert another representation that is not ambiguous]]

link to issue #35

Rethink the relationship between context and state in the system.

Right now, the state machine maintains a set of contexts on the general system navigation level and on the project level. What about on a finer scale? Or in a context that touches on both the project and the general system?

Is my current implementation based on contexts too rigid?

For example, say we want to support the behavior of confirming a (dangerous) act before executing:

Delete a project

User: Scratch, delete the say hello project.
Scratch: Are you sure you want to delete the say hello project?
User: Yes.

Confirm a step in the program

User: Scratch, say hello.
Scratch: LIke this? hello
User: No. Scratch, say jello.
Scratch: LIke this? jello
User: yes.
Scratch: Okay, what's the next step?

Understand pronouns as references to antecedents of the previous utterance.

User: Scratch, how many projects do I have?
Scratch: You have 3 projects.
User: What are they called?”
Scratch: Give me a compliment, big water bottle, and get ready for the dance party.

Create a guided tutorial for building projects based on available Scratch Commands.

For Scratch command capabilities that only make sense in context, create tutorials that take advantage of example programs (from #38) and show how to create them step by step.

Support ability to ask about Scratch Commands.

When someone is inside a project or creating a project, they might ask

what scratch commands are there?
what are the scratch commands ?

and want to dive deeper

what's an example?
(leading into tutorial mode)

I want to explore workflows / ideal flows through the interface and document them...

Allow user to interrupt Scratch via speech.

//stopTalking
ScratchAction.General.stopTalking = new Action({
"trigger":/stop talking playing|stop playing (?:the)? ?project|stop (?:the)? ?project/,
"idealTrigger":"stop the project",
"description":"skip what Scratch is saying"
});

Currently, this is difficult to resolve because speech synthesis will get picked up by speech recognition. To handle this, the microphone is turned off, but this means that the user will not be heard via voice if they try to interrupt Scratch.

scratch vui can't respond to greetings??

Support hardware extensions!

pico board
lego ev3
microbit
scratch go

Provide meaningful feedback.

instead of saying, "i don’t know how to do that” give more helpful error messages at the lowest level of parsing that succeeded,.

e.g.
user's goal: play project called give me water bottle
actual problem: scratch doesn't have a project called give me water bottle
scratch says: I heard you say scratch give me water bottle. I don’t know how to do that.

Should scratch say...
I heard you say scratch give me water bottle. There is no project called give me water bottle.

Even better, can we follow up with
Do you want to create one?

"play" trigger acts on Scratch commands instead of just Scratch VUI projects.

This is tricky because we want projects to be treated like commands... but not the other way around.
Ideas:

syntactically distinguish between playing a project and playing a sound. (Be more strict about how we play projects). "Play the (projectname) project"
check that the "project name" is actually a project name + if not, consider it to be a sound (validate that its a sound)
check that the "project name" is actually a project name + if so, ask the user whether they meant to play the project or play a sound ...

Support "What did I say?" "What did you say"

Breaking up long commands.

Examples:

do the following 10 times play the meow sound play the chomp sound thats it
listen and wait. if the speech is knock knock say who's there thats it
if the speech is knock knock say who's there thats it
when the project starts listen and wait and then if the speech is knock knock say who's there thats it thats it.

looking at example 1:
do the following 10 times. we see that we can generally...collect commands until "thats it"...and then send that to scratchNLP for the parse

plan:

scratchNLP will notify of partial parse by returning something :) ScratchInstruction.parse will understand that signal and build up until "thats it" or other end phrases like "end the if statement" "end if" "end loop"

Support more phrases for returning from the InsideProject state (editing flow)

prototype.js:136 nevermind
scratch_instruction.js:59 Error: Scratch does not know how to 'nevermind'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say nevermind
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.
prototype.js:136 go back
scratch_instruction.js:59 Error: Scratch does not know how to 'go'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say go back
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.
prototype.js:136 I'm done
scratch_instruction.js:59 Error: Scratch does not know how to 'i'm'
at Function.jsonToScratch (scratch_instruction.js:137)
at ScratchInstruction.getSteps (scratch_instruction.js:56)
at new ScratchInstruction (scratch_instruction.js:20)
at cstor.handleUtterance (scratch_project.js:121)
at ScratchProjectManager.handleUtterance (scratch_project_manager.js:188)
at cstor.handleUtterance (scratch_state_machine.js:67)
at SpeechRecognition.recognition.onresult (prototype.js:138)
scratch_project_manager.js:87 sayingI heard you say i'm done
scratch_project_manager.js:87 sayingThat doesn't match any Scratch commands.

Prompt user to ask for help when they've repeatedly failed to make Scratch do something

After they try n times, prompt them to ask for help based on what state they are in.
the 'n' times could vary also based on the state they are in.

Support recording of your own sounds and voice.

Rework the ScratchProject State Machine

The state transitions don't actually make sense....

When a user HOME --> Inside Existing Project, the project state is in 'create'

create a way to jump to the appropriate state (if creating a new project instance to represent an existing project)
reevaluate the states used.

Speech recognition often misrecognizes speech.

Speech Recognition Common Mishaps:

step and stop (step #)
the inside versus see inside

Idea:

for Web Speech API, you can define a grammar in the JSGF Format (https://www.w3.org/TR/jsgf/#16587)

How might I verify that the grammar is actually helping the situation? OR improve the grammar?

Consider using fuzzy matching for feedback. "did you mean fuzzy match?" http://glench.github.io/fuzzyset.js/
phonetic kid friendly dictionary by Andrew; (friendly-phonemes)

"I heard you say play a sound. That's not a Scratch command."

a scratch nlp problem

Expand the sound library + refine interface for recognizing sounds.

"What sounds do you have"/"What sounds do I have"/"What sounds do you know"

lots of sounds... here's a few plays sounds (pull request #27)

"Can you make the/a ___ sound"

direct match to name (pull request #27)
automatically map a kind of sound to the existing library (via tags)
lots of sounds... pick 3 random categories... [to implement, need categorization, need way to explore categories of sound]

Another idea:
utilize freesound api to search for sounds and return those
[the blank could be a SOUND_NAME or what the sound is like. could implement by using the freesoundapi to search for and get particular sounds. Also consider synonyms for describing or searching sound.