# SUSI.AI on Linux


This repository contains components to run SUSI.AI on the desktop or on a headless smart speaker together with the SUSI.AI Server. Functionality implemented here includes using the microphone to collect voice commands, converting between speech and text using components such as Deep Speech, Flite, PocketSphinx, IBM Watson and others, controlling the volume with voice commands, and providing a simple GTK interface. In order to use the JSON output of the SUSI.AI Server (written in Java), we use the SUSI.AI API Python Wrapper. The ultimate goal of the project is to enable users to install SUSI.AI anywhere: apart from desktops and smart speakers, on IoT devices, car systems, washing machines, and more.

The functionalities of the project are provided as follows:

  • Hotword detection works for hotword "Susi"
  • Voice detection for Speech to Text (STT) working with the Google Speech API and the IBM Watson Speech to Text API
  • Voice output for Text to Speech (TTS) working with Google Voice, IBM Watson TTS, Flite TTS
  • SUSI.AI response working through SUSI.AI API Python Wrapper

## Project Overview

The SUSI.AI ecosystem consists of the following parts:

```
 * Web Client and Content Management System for the SUSI.AI Skills - home of the SUSI.AI community
 |_ susi.ai            (React application, user account management for the CMS, a client for the susi_server at https://api.susi.ai, the content management system for SUSI skills)

 * Server back-end
 |_ susi_server        (the brain of the infrastructure, a server which computes answers from queries)
 |_ susi_skill_data    (the knowledge of the brain, a large collection of skills provided by the SUSI.AI community)

 * Android front-end
 |_ susi_android       (Android application which is a client for the susi_server at https://api.susi.ai)

 * iOS front-end
 |_ susi_iOS           (iOS application which is a client for the susi_server at https://api.susi.ai)

 * Smart Speaker - software to turn a Raspberry Pi into a personal assistant
 | Several sub-projects come together in this device
 |_ susi_installer     (framework which can install all parts on a RPi and desktops, and is also able to create SUSIbian disk images)
 |_ susi_python        (Python API for the susi_server at https://api.susi.ai or a local instance)
 |_ susi_server        (the same server as on api.susi.ai, hosted locally for maximum privacy; no cloud needed)
 |_ susi_skill_data    (the skills as provided by susi_server on api.susi.ai; pulled from the git repository automatically)
 |_ susi_linux         (a state machine in Python which uses susi_python and speech-to-text and text-to-speech functions)
 |_ susi.ai            (React application, the local web front-end with user account management, a client for the local deployment of the susi_server, the content management system for SUSI skills)
```

## Installation

susi_linux is normally installed via the SUSI Installer. In this case, helper programs for configuration, startup, and other tasks are available in `$HOME/SUSI.AI/bin` (under default installation settings).

In case of manual installation, the wrappers in the `wrapper` directory need to be configured to point to the respective installation directories and to the location of the `config.json` file.

## Setting up and configuring SUSI.AI on Linux / Raspberry Pi

Configuration is done via the file `config.json`, which normally resides in `$HOME/.config/SUSI.AI/config.json`.

The script `$HOME/SUSI.AI/bin/susi-config` is best used to query, set, and change the configuration of susi_linux. There is also a GUI interface to the configuration in `$HOME/SUSI.AI/bin/susi-linux-configure`.

The possible keys and values are listed by running `$HOME/SUSI.AI/bin/susi-config keys`.

Some important keys and possible values:

- `stt` is the speech to text service, one of the following choices:
    - `google` - use Google STT service
    - `watson` - IBM/Watson STT
    - `bing` - MS Bing STT
    - `pocketsphinx` - PocketSphinx STT system, working offline
    - `deepspeech-local` - DeepSpeech STT system, offline, WORK IN PROGRESS
- `tts` is the text to speech service, one of the following choices:
    - `google` - use Google TTS
    - `watson` - IBM/Watson TTS (login credential necessary)
    - `flite` - flite TTS service working offline
- `hotword.engine` selects the hotword detection engine:
    - `Snowboy` to use Snowboy
    - `PocketSphinx` to use PocketSphinx
- `wakebutton` controls whether an external wake button is used:
    - `enabled` to use an external wake button
    - `disabled` to disable the external wake button
    - `not available` for systems without a dedicated wake button

Other interfaces for configuration are available for Android and iOS.

Manual configuration is possible; the allowed keys in [`config.json`](config.json) are currently as follows (a sample file is shown after the list):
- `device`: the name of the current device
- `wakebutton`: whether a wake button is available or not
- `stt`: see above for possible settings
- `tts`: see above for possible settings
- `language`: language for STT and TTS processing
- `path.base`: directory where support files are installed
- `path.sound.detection`: sound file that is played when detection starts, relative to `data_base_dir`
- `path.sound.problem`: sound file that is played on general errors, relative to `data_base_dir`
- `path.sound.error.recognition`: sound file that is played on detection errors, relative to `data_base_dir`
- `path.sound.error.timeout`: sound file that is played when timing out waiting for spoken commands
- `path.flite_speech`: flitevox speech file, relative to `data_base_dir`
- `hotword.engine`: see above for possible settings
- `hotword.model`: (if hotword.engine = Snowboy) selects the model file for the hotword
- `susi.mode`: access mode to `accounts.susi.ai`, either `anonymous` or `authenticated`
- `susi.user`: (if susi.mode = authenticated) the user name (email) to be used
- `susi.pass`: (if susi.mode = authenticated) the password to be used
- `roomname`: free form description of the room
- `watson.stt.user`, `watson.stt.pass`, `watson.tts.user`, `watson.tts.pass`: credentials for IBM/Watson server for TTS and STT
- `watson.tts.voice`: voice name selected for IBM/Watson TTS
- `bing.api`: Bing STT API key
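
For illustration, a minimal `config.json` could look like the following. The values are examples only, not defaults, and the exact key layout should be verified against the output of `susi-config keys`:

```json
{
  "device": "RaspberryPi",
  "wakebutton": "disabled",
  "stt": "pocketsphinx",
  "tts": "flite",
  "language": "en-US",
  "hotword.engine": "Snowboy",
  "susi.mode": "anonymous",
  "roomname": "living room"
}
```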

For details concerning installation, setup, and operation on RaspberryPi, see
the documentation at [SUSI Installer](https://github.com/fossasia/susi_installer).



## Information for developers

This section is intended for developers.

### **Important:** Tests before making a new release

1. The hotword detection should have decent accuracy
2. SUSI Linux shouldn't crash when switching from online to offline and vice versa (failing as of now)
3. SUSI Linux should be able to boot offline when no internet connection is available (failing as of now)

### Roadmap

- Offline Voice Detection (if possible with satisfactory results)

### General working of SUSI

- SUSI.AI follows a finite state system for the code architecture.
- The Google TTS and STT services are used as defaults, but if the internet fails, a switch to the offline services PocketSphinx (STT) and Flite (TTS) is made automatically (see the sketch below).
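
A rough sketch of such a connectivity check; the probe helper below is illustrative, not the actual susi_linux internals:

```python
import http.client

def internet_available(host="api.susi.ai", timeout=3):
    """Probe the SUSI server; any connection failure counts as offline."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()
        return True
    except OSError:
        return False
    finally:
        conn.close()

# Mirror the documented fallback: Google services online,
# PocketSphinx (STT) and Flite (TTS) offline.
stt, tts = ("google", "google") if internet_available() else ("pocketsphinx", "flite")
```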


### Run SUSI Linux for development purposes

If installed via the SUSI Installer, systemd unit files are installed:
- `ss-susi-linux.service` for the user bus; as a regular user, use `systemctl --user start/enable ss-susi-linux`
- `ss-susi-linux@.service` for the system bus; as the `root` user, start a job for a specific user,
  independent of whether that user is logged in or not: `sudo systemctl start/enable ss-susi-linux@USER`

By default, it runs in _production_ mode, where log messages are limited to _error_ and _warning_ only.
In development, you may want to see more logs to help with debugging. You can switch to "verbose" mode in two ways:

1. Run it manually

- Stop the systemd service: `sudo systemctl stop ss-susi-linux`
- In a terminal, _cd_ to the `susi_linux` directory and run

```
python3 -m susi_linux -v
```

or repeat `v` to increase verbosity:

```
python3 -m susi_linux -vv
```


2. Change the command run by `systemd`

- Edit _/lib/systemd/system/ss-susi-linux.service_ and change the command in the `ExecStart` parameter:

```ini
ExecStart=/usr/bin/python3 -m susi_linux -v --short-log
```

- Reload the systemd daemon: `sudo systemctl daemon-reload`
- Restart the service: `sudo systemctl restart ss-susi-linux`
- Now you can read the logs via `journalctl`:
  - `journalctl -u ss-susi-linux`
  - or `journalctl -fu ss-susi-linux` to keep following the log as it is produced.

The `-v` option is the same as in the first method. The `--short-log` option excludes some info that is already provided by `journalctl`. For more about the logging feature, see this GitHub issue.


susi_linux's Issues

Documentation fix

While the READMEs of the Ubuntu start guide and the Raspberry Pi guide mention running `python3 main.py` after the installation process, no file named `main.py` exists. The correct way to run it is `python3 -m main`.

Add support for Home Automation

A needed feature in the Susi Hardware project is support for the Home Automation features in Susi.
Since we focus on open-source components, we can add support for the openHAB project (https://www.openhab.org/).
openHAB is an open-source home automation project and it already has support for the Raspberry Pi. The only challenge is to connect the openHAB API to the correct Susi action types.

For example, we can have an action type "turn_on" with a "device" argument of "fan" and a "location" argument of "living room", so that we can call the openHAB API for that particular device, as sketched below.

This is just a rough example. I am currently reading the documentation of the openHAB project and setting it up on my Raspberry Pi. I will write clear working directions in this issue once I am familiar with the openHAB API.
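
A minimal sketch of such a bridge, assuming openHAB's standard REST endpoint for posting item commands; the action shape and the item-naming scheme here are hypothetical:

```python
import requests

OPENHAB_ITEMS = "http://localhost:8080/rest/items"  # default openHAB REST endpoint

def handle_action(action):
    """Forward a hypothetical SUSI "turn_on" action to an openHAB item."""
    if action.get("type") != "turn_on":
        return
    # Hypothetical mapping: "living room" + "fan" -> item "LivingRoom_Fan"
    item = "{}_{}".format(action["location"].title().replace(" ", ""),
                          action["device"].title())
    # openHAB accepts plain-text commands POSTed to the item endpoint
    requests.post("{}/{}".format(OPENHAB_ITEMS, item),
                  data="ON", headers={"Content-Type": "text/plain"})

handle_action({"type": "turn_on", "device": "fan", "location": "living room"})
```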

Improve speed of the whole system

Currently the app is very slow; it takes considerable time between speech input and output. I dug down to find the cause and observed the same pattern on Alexa.
The main cause of the slowdown is that we handle things one by one: first we record the voice, then send it to Google for recognition; after getting the hypothesis text, we send it to the Susi server, which replies with text; then we synthesize speech either on the device or send it to Watson for synthesis, which can cause even more slowdown.

How does Amazon handle that?
Amazon Alexa devices send the recorded voice to Amazon's "Alexa Voice Service", which does all the processing and sends back a file with the speech output, which is then played.

Can we incorporate the same in Susi?
Yes, we can shift all the speech-processing work to the server and follow an Alexa-like approach. This also helps us think in a cross-language way: a web client can also record, using the web APIs for the microphone, and then play back the file sent by the server. Also, the smart mirror project https://magicmirror.builders/ has modules written primarily in JavaScript, so we get an advantage there too.

It can be done using a servlet on Susi Server dedicated to speech processing using other APIs.

@mariobehling @Orbiter what are your views on this?

Add option to start App in WebConnect mode i.e. with Web Socket connection open

Currently, the app by default starts in normal recognition and processing mode, and you need to make code changes to switch to the other configuration, i.e. WebConnect mode, where the module receives JSON responses over a WebSocket rather than making requests itself to the Susi Server.

So we need a "--webconnect" flag to start the app in Susi WebConnect mode; without that flag the default mode is used.

Add a config generator script

Currently, the config is provided explicitly in config.json, and the user needs to edit this file manually to configure SUSI Hardware. We can instead move config.json to config.json.sample and add a setup script that asks the user for some parameters and generates a config.json based on them, as sketched below.
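
A minimal sketch of such a generator; the prompted keys and defaults are illustrative only:

```python
import json

def generate_config(path="config.json"):
    """Prompt for a few settings and write a fresh config.json."""
    config = {
        "stt": input("STT service [google/watson/pocketsphinx]: ") or "google",
        "tts": input("TTS service [google/watson/flite]: ") or "google",
        "hotword.engine": input("Hotword engine [Snowboy/PocketSphinx]: ") or "Snowboy",
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    print("Wrote", path)

if __name__ == "__main__":
    generate_config()
```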

Connect Susi Hardware to Susi Webchat for demonstration

@Orbiter recommended connecting Susi Hardware to Susi Webchat, so that when we send a request via speech to our hardware device, we can also view the exact response of Susi in a webchat interface. This functionality is mainly for demonstration purposes, where users can see the exact query.

Exact requirement as stated on the Gitter channel (screenshot omitted).

@uday96 @rishiraj824 Can you suggest a possible implementation of it? How can it be achieved?

Add Documentation for detailed Raspberry Pi setup with screenshots

Documentation for the Raspberry Pi setup is still inaccurate, and a lot of hidden dependencies are missed that users then encounter at installation time.

All such dependencies need to be installed earlier and specified, so that installation is straightforward. For this, a guide with proper screenshots for each step and correct commands must be added.

I have started work on it and it will be completed by tomorrow morning (IST).

Add travis.yml file and include tests

Tests are not included yet and Travis is not working correctly; they need to be added.

The challenge here is to mock the voice activation and recognition and to test the other features.

Add Hotword Based Wakeup

Now that the prototype is almost ready, the next step is to add hotword-based wakeup using "Susi" as the hotword.

Use Rx Observable based approach for handling subscriptions

Currently, for passing data between various threads, we rely on a queue onto which input is pushed and which is checked via busy-waiting on the other thread.
This can be improved by using RxObservable-based Publisher/Observer approaches, which are preferred in modern programming practice.

It would be solved along with #62

Add Google Text to Speech Support

Currently, we support Flite and IBM Watson TTS. We can add Google TTS using the Google Translate API.
I did not add it previously because Google once pulled support for the TTS-via-Translate API using the known approach. The current way of handling it is a bit hacky, but since the speed and voice are pretty good, we can add it for as long as it works; it is not an officially documented API, so support may be pulled again.
If support is removed some day, we still have Flite and IBM Watson as fallback TTS.

The work for it is already done in PR #46; a sketch of one possible approach follows.
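
One way to use the Translate-based endpoint from Python is the gTTS package; the issue does not name a specific mechanism, so treating gTTS as the approach here is an assumption:

```python
from gtts import gTTS  # pip install gTTS -- wraps the Google Translate TTS endpoint

# Synthesize a short reply to an MP3 file; playback is handled separately.
tts = gTTS("Hey! I am fine, thank you.", lang="en")
tts.save("reply.mp3")
```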

Unable to install on Mac

Unable to install on Mac. I followed the README as mentioned, and I learned that it is not working on macOS. Please verify and cross-check the README before publishing it.

Loop through action types and display result for all actions

Actual Behaviour

Query "multi3". The sever response contains 3 action types but only result of first is displayed.
Also for map type only first action type is used.

Expected Behaviour

"actions": [
      {
        "type": "answer",
        "expression": "Singapore is a place with a population of 3547809. Here is a map: https://www.openstreetmap.org/#map=13/1.2896698812440377/103.85006683126556"
      },
      {
        "type": "anchor",
        "link": "https://www.openstreetmap.org/#map=13/1.2896698812440377/103.85006683126556",
        "text": "Link to Openstreetmap: Singapore"
      },
      {
        "type": "map",
        "latitude": "1.2896698812440377",
        "longitude": "103.85006683126556",
        "zoom": "13"
      }
    ]

If the response is of this type, then the client should loop through all the actions and display the results one after another, with the delay provided in the response. So, ideally, for this response:

"actions": [
      {
        "type": "answer",
        "expression": "Line 1"
      },
      {
        "type": "answer",
        "delay": 400,
        "expression": "Line 2"
      },
      {
        "type": "answer",
        "delay": 1000,
        "expression": "Line 3"
      }
    ]

the output should be:

  • 1st message: Line 1
  • 2nd message, 0.4 seconds after the 1st: Line 2
  • 3rd message, 1 second after the 1st: Line 3
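
A sketch of the intended client loop, assuming (as the expected output above suggests) that `delay` is measured in milliseconds from the first message; `display` is a stand-in for the real output path (TTS or screen):

```python
import time

def handle_actions(actions, display=print):
    """Render every answer action; "delay" is ms after the first message."""
    start = time.monotonic()
    for action in actions:
        if action.get("type") != "answer":
            continue  # "anchor", "map", ... would get their own handlers
        wait = action.get("delay", 0) / 1000.0 - (time.monotonic() - start)
        if wait > 0:
            time.sleep(wait)
        display(action["expression"])

handle_actions([
    {"type": "answer", "expression": "Line 1"},
    {"type": "answer", "delay": 400, "expression": "Line 2"},
    {"type": "answer", "delay": 1000, "expression": "Line 3"},
])
```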

SUSI for Meilix

We are trying to add SUSI to Meilix and faced this error: fossasia/meilix#81.
I can fix the first 3 errors by editing /usr/share/alsa/alsa.conf, but how do we fix the channel error? (screenshot omitted)

Enable Speech Output of other action types.

Currently, speech output is available only for the Answer action type. While this works on other platforms, in a completely headless system information about the other action types must be spoken as well.
This includes:

  • Speaking Table action type responses (first few outputs)
  • Speaking RSS action type responses (first few outputs)

No Audio Feedback

Currently, after the hotword is detected there is no feedback, like a small "ding" bell sound, to indicate that recognition has started and when it has ended, as on Google Home and Amazon Echo.
Such feedback is crucial for proper use.

Configuration interface for Susi

Susi Hardware will be a headless client, so there must be a configuration interface for it.
There are different options available. Some of them are:

  • Android App: add functionality to the current Android app to connect to Susi Hardware by scanning a QR code displayed on an I2C display connected to the Raspberry Pi, and configure it.
  • Web App: a web app can be made to configure Susi from any other device on the same network, without the need to plug a monitor into the Raspberry Pi.

Use better audio playback approach

Due to a problem in the Python PyAudio library with the Raspberry Pi sound card, audio playback hangs at times, which results in the audio thread hanging in that state forever without freeing up its resources.
We can use sox-based playback to fix this, as sketched below.
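
A minimal sketch of sox-based playback via a child process; the timeout value is illustrative, and `play` ships with the sox package:

```python
import subprocess

def play_sound(path, timeout=60):
    """Play an audio file with sox's `play` command.

    Running playback in a child process means a hung playback can be
    killed (or time out) without wedging the Python audio thread and
    leaking resources, unlike an in-process PyAudio stream.
    """
    subprocess.run(["play", "-q", path], check=True, timeout=timeout)
```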

Use Flite instead of Festival TTS

Currently, we are using Festival TTS. While Festival is excellent on a system with good processing power, it takes a considerable amount of time to synthesize speech with the English Female voice currently in use on a low-powered development board like the Raspberry Pi.

For example, to synthesize "Hey! I am fine, thank you", it takes 10 seconds!

We can use Flite instead.
Flite stands for Festival Lite, and it synthesizes the US English Female voice very fast; the same query takes a mere 1 second.

Thus, we should switch to the Flite speech synthesis engine.

Add support for Watson TTS Service

IBM Watson provides an online TTS service which has a more natural voice. Support for it can be added, with an option to choose between Flite and IBM Watson.

Switch to LiveSpeech module in PocketSphinx for Recognition of Hotword

Currently, the process of detecting the hotword goes as follows:

  • Open the input device with PyAudio
  • Get a chunk from the audio input stream
  • Use the PocketSphinx decoder class to recognize whether it contains the hotword

I followed the above procedure based on an answer on Stack Overflow. While it gets the job done, I found that the pocketsphinx Python library already contains a LiveSpeech module which can be used to detect a keyword from live speech.
On testing, I found better performance, so we should make the switch; see the sketch below.
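
The keyword-spotting pattern from the pocketsphinx-python documentation looks roughly like this; the hotword and threshold would need tuning for "Susi":

```python
from pocketsphinx import LiveSpeech

# LiveSpeech drives the PyAudio stream and the decoder internally and
# yields a phrase each time the keyphrase is spotted.
speech = LiveSpeech(lm=False, keyphrase="susi", kws_threshold=1e-20)
for phrase in speech:
    print("Hotword detected:", phrase.segments(detailed=True))
```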

Make a GUI for authentication

Currently, we only have a CLI to configure various settings and log in to SUSI.
We can add a GUI using the PyGTK library to make setting up and configuring SUSI Linux easier.

Best Operating System suggestion for Hardware Application

As per the current prototype of Susi Hardware, which has hotword detection and hands-free interaction, any operating system can be used for basic testing.
But for the final product, we must decide on and focus our work on a particular flavor of Linux, since that will help maintain uniformity in the project.

Current Options

Raspbian (Fork of Debian)

Pros

  • Excellent support for Raspberry Pi development board.
  • Most developers are familiar with Debian.

Cons

  • Focused on Raspberry Pi, ignores other development boards.
  • A lot of unneeded software preinstalled which uses RAM in the background.

Ubuntu Core

Pros

  • Support for many development boards.
  • Familiarity with Debian

Cons

  • Packages for the Festival TTS engine being used need to be compiled manually
  • Voice packs for TTS need to be installed manually

Arch Linux for ARM/ Arch Linux

Pros

  • Minimal. No extra software by default; the installation can be customized to suit Susi Hardware's specific needs.
  • Supports all development boards, like Orange Pi, Omega, Raspberry Pi
  • The Festival TTS application is available in the official repositories for all platforms (ARM/x86/x64), with voice packages as well. (HUGE)
  • A fully packaged solution can be distributed via the AUR (https://aur.archlinux.org/), making installation as easy as a single line. (HUGE)

Cons

  • Unfamiliarity of some developers with the Arch environment.
  • Relatively small community.
  • Some conventions differ from other operating systems, like Python 3 as the default, and it is a bleeding-edge distribution (always the latest version of all packages).

That being said, I strongly recommend using Arch Linux to develop the solution; however, I am open to other opinions as well.

Best Strategy to put together SUSI images

Susi Hardware depends on libraries like PocketSphinx which take a long time to install on a Raspberry Pi and similar under-powered development boards (since they need to be compiled from source).
We have some potential solutions to save end users from this trouble:

  • Like the Mycroft project (https://mycroft.ai), ship the solution as a compressed image with all dependencies included.
  • Compile packaged binaries of dependencies like PocketSphinx for the Raspberry Pi and distribute them. Users can then install them directly without building from source.

I am looking for better strategies and am currently working on the first.

No details in README

Is this some kind of hardware project for Susi? It would be great to add some details to the README.

Add support for choosing Hotword Detection Engine in Configuration

Currently, Snowboy hotword detection is used as the primary engine, and there is no option to switch to PocketSphinx directly via the configuration script.

On environments like Meilix (32-bit), where Snowboy is presently not supported, we can provide an option to fall back to PocketSphinx.

Switch to Audio Producer - Consumer based approach

Currently, the approach to audio processing isn't the best, and it needs to be handled in a better way.
Mycroft AI has implemented it very nicely, and the libraries and technologies they use exactly match ours, i.e. PocketSphinx for the hotword, and Google and other STT services for speech-to-text.

Link to the Mycroft approach

Please keep instructions on Readme.md

For now, Readme.md is not super long and we don't need separate documents. Please merge the setup info into the Readme.

Once other documents are required, please follow best practices. Keep docs in:

  • /docs

Keep images in:

  • /docs/images

Shift to State Machine Approach

The current approach to handling things is not very resilient or structured. We can improve the codebase by using a state machine paradigm, like the SUSI MagicMirror project.

A state machine defines valid transitions and manages them, ensuring that the system is always in a consistent state.

It can be achieved using the transitions library (https://github.com/pytransitions/transitions), which is a well-recognized library for managing state machines in Python; see the sketch below.
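
A minimal sketch with the transitions library; the states and triggers are illustrative, not the final design:

```python
from transitions import Machine  # pip install transitions

class SusiAssistant(object):
    """Bare model object; the Machine attaches state and trigger methods."""

states = ["idle", "recognizing", "busy"]
table = [
    {"trigger": "hotword_detected", "source": "idle", "dest": "recognizing"},
    {"trigger": "query_recognized", "source": "recognizing", "dest": "busy"},
    {"trigger": "answer_delivered", "source": "busy", "dest": "idle"},
    {"trigger": "error", "source": "*", "dest": "idle"},
]

susi = SusiAssistant()
machine = Machine(model=susi, states=states, transitions=table, initial="idle")

susi.hotword_detected()
assert susi.state == "recognizing"  # invalid triggers raise MachineError by default
```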

Add documentation for functions.

Currently, inline documentation of the app's functions is not present. This is needed for someone to understand the codebase better.
