Kora

A voice assistant for Autodesk's Fusion 360 design software

Table of contents

  1. What Is Kora
  2. Packages
    1. Software Needed
    2. Preinstalled Packages
  3. Getting Started
  4. Explaining The Source Code
    1. Recording and Parsing Voice Commands
    2. User-Kora Interaction Logging
    3. Extrude Command
    4. Save Command
    5. Save As Command
    6. Rotate Command
  5. Next Steps
  6. Authors

What Is Kora

Kora is a proof-of-concept project that integrates a natural language processing library into Autodesk's Fusion 360 3D computer-aided design software. Kora is a speech-based virtual assistant for Fusion that lets users perform a subset of tasks within the product, such as saving a document, by verbally instructing it to do so. Kora decreases the time users need to achieve their goals within Fusion by offering an interface that runs in parallel with, and complements, the keyboard and mouse.

Kora works on Windows and macOS.

Packages

Software Needed

Since Autodesk's Fusion 360 runs Add-Ins in their own environment, all software packages needed to run Kora had to be bundled within the Kora Add-In itself. Luckily, this means the only software that needs to be installed separately is MongoDB. For this project we used v3.6.2, but any newer version should work.

A user also needs to create an account at https://wit.ai to obtain a Server Access Token.

Preinstalled Packages

The software packages that are already packaged within the Kora source code are:

  • PyAudio version 0.2.9, used for streaming audio from the user.
  • PortAudio, the native audio I/O library that PyAudio wraps; it is bundled so it does not need a separate install.
  • PyMongo version 3.6.1, the low-level driver that wraps the MongoDB API for Python.
  • MongoEngine version 0.15.0, a Document-Object Mapper for working with MongoDB from Python.

Getting Started

Make sure you first read the Packages section above.

  1. Install Kora into the Fusion 360 Add-Ins folder. The Add-Ins folder is generally in

     Windows: C:/Users/<user>/AppData/Roaming/Autodesk/"Autodesk Fusion 360"/API/AddIns

     Mac: /Users/<user>/Library/"Application Support"/Autodesk/"Autodesk Fusion 360"/API/AddIns

  2. Name the installed repository Kora. It is important that the installed repository is named Kora, to match the Add-In name.
  3. Go to wit.ai, get your Server Access Token, and paste it into WIT_AI_CLIENT_ACCESS_TOKEN in Kora/main/config.py (see the snippet after this list).
  4. In Fusion, click the ADD-INS dropdown in the top right of the ribbon, then click Scripts and Add-Ins...
  5. Click the Add-Ins tab at the top.
  6. Click Create.
  7. Choose Python as the language and enter Kora as the Add-In name.
  8. In Folder Location, browse to the Kora folder that you placed in the Add-Ins directory in step 1. Kora is now set up as an Add-In.
  9. Open a terminal window and type mongod to start the MongoDB daemon.
  10. Back in Fusion, double-click the Kora Add-In, then exit the Add-Ins menu.
  11. Click the Add-Ins tab at the top.
  12. Click Activate Kora.
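
For reference, the token assignment in Kora/main/config.py is a plain module-level constant. The exact file layout here is an assumption; the variable name comes from step 3 above:

```python
# Kora/main/config.py
# Replace the placeholder with the Server Access Token from your wit.ai app.
WIT_AI_CLIENT_ACCESS_TOKEN = "PASTE_YOUR_SERVER_ACCESS_TOKEN_HERE"
```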

Explaining The Source Code

Recording and Parsing Voice Commands

The recording and parsing of user voice commands occurs in the nlp module and is all done via the streamAudio function, which takes optional callbacks for when the beginning and end of a command are detected. The function begins by opening an audio stream using the PyAudio library and making an HTTP request asking Wit.ai to parse chunked audio data coming from the generator _gen, which handles recording from the stream. When _gen detects audio levels above a specified silence threshold, it starts recording and yielding chunks of audio to Wit.ai. When the audio levels fall below the threshold and remain there for a specified amount of time, the command is deemed complete and the end of the audio data is signalled. streamAudio then waits on the response from Wit.ai containing the parsed meaning of the streamed audio. Once the response is received, the PyAudio stream is closed and the response is returned from the function.
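
A minimal sketch of that flow, assuming 16-bit, 16 kHz mono audio and Wit.ai's chunked /speech endpoint; the threshold, timing constants, and callback names are illustrative, not the project's actual values:

```python
import audioop           # stdlib; used for a simple RMS loudness check
import pyaudio
import requests

RATE = 16000
CHUNK = 1024
SILENCE_THRESHOLD = 500   # assumed RMS cutoff for "speech detected"
MAX_SILENT_CHUNKS = 30    # assumed ~2 s of silence ends the command

def streamAudio(token, onCommandStart=None, onCommandEnd=None):
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)

    def _gen():
        recording = False
        silent_chunks = 0
        while True:
            chunk = stream.read(CHUNK)
            loud = audioop.rms(chunk, 2) > SILENCE_THRESHOLD
            if not recording:
                if not loud:
                    continue           # keep waiting for speech
                recording = True       # speech detected: command begins
                if onCommandStart:
                    onCommandStart()
            yield chunk
            silent_chunks = 0 if loud else silent_chunks + 1
            if silent_chunks > MAX_SILENT_CHUNKS:
                if onCommandEnd:
                    onCommandEnd()     # sustained silence: command complete
                return

    try:
        # requests sends a generator body as a chunked upload, so audio
        # streams to Wit.ai while the user is still speaking
        response = requests.post(
            "https://api.wit.ai/speech",
            headers={
                "Authorization": "Bearer " + token,
                "Content-Type": ("audio/raw;encoding=signed-integer;"
                                 "bits=16;rate=16000;endian=little"),
            },
            data=_gen())
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
    return response.json()   # parsed meaning of the spoken command
```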

User-Kora Interaction Logging

Inside of Kora/main/modules/logging is the relevant code. The file interaction.py contains the MongoEngine class that outlines how the user-Kora interaction document should be stored. The function logInteraction is the Python decorator responsible for actually storing the interaction document. It first calls mongoSetup to initiate the connection to the MongoDB daemon.

In fusion_execute_intent.py, the executeCommand function is decorated by logInteraction; it is called when Kora has a response back from Wit.ai and needs to determine which command to execute, then execute it. Before that happens, the JSON containing the Wit.ai response (plus some extra metadata) is routed through logInteraction, which extracts information from the JSON and then lets it continue on to executeCommand. When executeCommand returns, control returns to logInteraction, where the remaining fields needed for the interaction document are extracted. Finally, logInteraction inserts the interaction document into the MongoDB database.
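
A hedged sketch of how those pieces might fit together; the document fields, database name, and the paths into the Wit.ai JSON are assumptions rather than the project's actual schema:

```python
import functools
from datetime import datetime
from mongoengine import (BooleanField, DateTimeField, Document,
                         StringField, connect)

def mongoSetup():
    connect("kora")   # connect to the locally running mongod daemon

# interaction.py: outlines how an interaction document is stored
class Interaction(Document):
    utterance = StringField()      # what Wit.ai heard the user say
    intent = StringField()         # the intent Wit.ai extracted
    succeeded = BooleanField()
    timestamp = DateTimeField(default=datetime.utcnow)

def logInteraction(func):
    @functools.wraps(func)
    def wrapper(witResponse):
        mongoSetup()
        # extract what we can from the Wit.ai JSON before execution
        intents = witResponse.get("entities", {}).get("intent", [{}])
        doc = Interaction(utterance=witResponse.get("_text"),
                          intent=intents[0].get("value"))
        result = func(witResponse)         # run the actual command
        doc.succeeded = bool(result)       # fill the remaining fields
        doc.save()                         # insert into MongoDB
        return result
    return wrapper

@logInteraction
def executeCommand(witResponse):
    # fusion_execute_intent.py: dispatch on the parsed intent and run it
    ...
```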

Extrude Command

All of the commands are located in Kora/main/modules/fusion_execute_intent/tasks. The extrude function is given a string representing what the user said, the magnitude, and the units. extrude checks whether "down" appears in the sentence; if it does and the magnitude is positive, the magnitude is negated. Next, extrude converts the magnitude to the equivalent magnitude in centimeters (the API only accepts centimeters) if the units are not already centimeters. Then extrude scans through the selected profiles and faces and extrudes each by the given magnitude. If no profiles or faces are selected, Kora prompts the user to select the profile or face they would like extruded.
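
A hedged sketch of the extrude task against the Fusion 360 API; the unit table and the selection scan are simplified assumptions about the actual implementation:

```python
import adsk.core
import adsk.fusion

CM_PER_UNIT = {"centimeters": 1.0, "millimeters": 0.1, "inches": 2.54}

def extrude(sentence, magnitude, units):
    app = adsk.core.Application.get()
    ui = app.userInterface
    design = adsk.fusion.Design.cast(app.activeProduct)

    # "extrude down five millimeters" flips the direction
    if "down" in sentence and magnitude > 0:
        magnitude = -magnitude
    # the Fusion API only accepts centimeters
    distance = magnitude * CM_PER_UNIT.get(units, 1.0)

    # collect the currently selected profiles and faces
    targets = []
    selections = ui.activeSelections
    for i in range(selections.count):
        entity = selections.item(i).entity
        if adsk.fusion.Profile.cast(entity) or adsk.fusion.BRepFace.cast(entity):
            targets.append(entity)
    if not targets:
        ui.messageBox("Please select a profile or face to extrude.")
        return

    extrudes = design.rootComponent.features.extrudeFeatures
    for target in targets:
        ext_input = extrudes.createInput(
            target, adsk.fusion.FeatureOperations.NewBodyFeatureOperation)
        ext_input.setDistanceExtent(
            False, adsk.core.ValueInput.createByReal(distance))
        extrudes.add(ext_input)
```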

Save Command

The save function first checks whether the project has been saved before. If it hasn't, Kora prompts the user for a project name and then hands control off to saveAs. If the project has already been saved, i.e. it has a name, save simply saves the project.

Save As Command

The saveAs function first checks whether the call is coming from save. If it is, it creates a copy of the project and saves it under the supplied filename. Otherwise, saveAs first converts the supplied filename to camelCase and then creates a copy of the file and saves it under the camelCased filename.
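
A combined, hedged sketch of save and saveAs; the prompt text, save descriptions, target folder, and the _camelcase helper are assumptions about the actual implementation:

```python
import re
import adsk.core

def _camelcase(name):
    # hypothetical helper: "my new part" -> "myNewPart"
    words = re.split(r"\s+", name.strip())
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])

def saveAs(filename, fromSave=False):
    app = adsk.core.Application.get()
    doc = app.activeDocument
    # skip the camelCase conversion when save() already supplied a name
    name = filename if fromSave else _camelcase(filename)
    folder = app.data.activeProject.rootFolder
    # saveAs creates a copy of the document under the new name
    doc.saveAs(name, folder, "Saved by Kora", "")

def save():
    app = adsk.core.Application.get()
    doc = app.activeDocument
    if not doc.isSaved:
        # never saved before: ask for a name, then defer to saveAs
        name, cancelled = app.userInterface.inputBox(
            "What should this project be called?", "Kora", "")
        if not cancelled:
            saveAs(name, fromSave=True)
    else:
        doc.save("Saved by Kora")
```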

Rotate Command

The rotate function begins by converting the magnitude to radians if it is not already given in radians. The function then proceeds to determine the axis about which it should rotate the camera. If the rotation direction is left or right, it rotates about the vector that defines the camera's up direction. Otherwise, it rotates about a vector that is perpendicular both to the camera's up direction and to the vector that defines the camera's position relative to the origin of what it is viewing, i.e. its target. Before the rotation occurs, the axis of rotation is set to intersect the camera's target so that the camera rotates around its target.
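
A hedged sketch of the camera rotation using Fusion's viewport camera; the sign conventions for each direction are assumptions:

```python
import math
import adsk.core

def rotate(direction, magnitude, units):
    app = adsk.core.Application.get()
    viewport = app.activeViewport
    camera = viewport.camera

    # convert degrees to radians when needed
    angle = magnitude if units == "radians" else math.radians(magnitude)
    if direction in ("right", "down"):
        angle = -angle                     # assumed sign convention

    eye, target, up = camera.eye, camera.target, camera.upVector
    if direction in ("left", "right"):
        axis = up                          # yaw about the camera's up vector
    else:
        # pitch: perpendicular to both up and the target->eye vector
        view = target.vectorTo(eye)
        axis = up.crossProduct(view)

    # rotate about an axis passing through the target, so the camera
    # orbits what it is looking at
    rot = adsk.core.Matrix3D.create()
    rot.setToRotation(angle, axis, target)
    eye.transformBy(rot)
    up.transformBy(rot)
    camera.eye = eye
    camera.upVector = up
    viewport.camera = camera
```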

Next Steps

  1. Reduce Kora's latency. Right now it takes 3.5 seconds on average from when the user stops speaking until Fusion executes the command. This time is all on Wit.ai's side; Kora simply passes the audio on to Wit and waits for a response. The code is set up such that pivoting to a new natural language processor shouldn't be too difficult.
  2. Add a wake word to Kora. Instead of Kora always listening in the background, a "Kora" wake word would be much more user-friendly.

Authors

Developed by fischjer4, austinrow1, and stallkaj for their undergraduate senior project at Oregon State University.
