GithubHelp home page GithubHelp logo

ashot72 / speech-to-text-to-image Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 0.0 2.48 MB

Generating texts from your voice then images form the texts

TypeScript 30.24% CSS 3.06% JavaScript 52.92% HTML 13.77%
chatgpt large-language-models llm speech-to-text speechtotext text-to-image texttoimage replicate stability-ai whisper whisper-ai

speech-to-text-to-image's Introduction

Speech to Text to Image Generation

I just built an app where you can record your voice and see the text extracted from your voice and the image generated.

I turn my audio into text using Whisper which is an OpenAI Speech Recognition Model that turns audio into text with up to 99% accuracy. Whisper is a speech transcription system form the creators of ChatGPT. Anyone can use it, and it is completely free. The system is trained on 680 000 hours of speech data from the network and recognizes 99 languages.

I generated images from texts using Replicate. Replicate runs machine learning models on the cloud. They have a library of open-source models that we can run with a few lines of code.

To get started.

       Clone the repository

       git clone https://github.com/Ashot72/Speech-to-Text-to-Image
       cd Speech-to-Text-to-Image

       Add your keys to .env file
       
       # installs dependencies
       npm install

       # to run locally
        npm start
      

Go to Speech To Text to Image Generation Video page

Go to Speech To Text to Image Generation description page

speech-to-text-to-image's People

Contributors

ashotik avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

speech-to-text-to-image's Issues

showing 401 error

Microsoft Windows [Version 10.0.22631.3155]
(c) Microsoft Corporation. All rights reserved.

C:\Users\rukmi>cd Speech-to-Text-to-Image

C:\Users\rukmi\Speech-to-Text-to-Image>npm start

[email protected] start
ts-node -r dotenv/config index.ts

App listening at http://localhost:3000
C:\Users\rukmi\Speech-to-Text-to-Image\node_modules\axios\lib\core\createError.js:16
var error = new Error(message);
^
Error: Request failed with status code 401
at createError (C:\Users\rukmi\Speech-to-Text-to-Image\node_modules\axios\lib\core\createError.js:16:15)
at settle (C:\Users\rukmi\Speech-to-Text-to-Image\node_modules\axios\lib\core\settle.js:17:12)
at IncomingMessage.handleStreamEnd (C:\Users\rukmi\Speech-to-Text-to-Image\node_modules\axios\lib\adapters\http.js:322:11)
at IncomingMessage.emit (node:events:530:35)
at IncomingMessage.emit (node:domain:488:12)
at endReadableNT (node:internal/streams/readable:1696:12)
at processTicksAndRejections (node:internal/process/task_queues:82:21) {
config: {
transitional: {
silentJSONParsing: true,
forcedJSONParsing: true,
clarifyTimeoutError: false
},
adapter: [Function: httpAdapter],
transformRequest: [ [Function: transformRequest] ],
transformResponse: [ [Function: transformResponse] ],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: {
Accept: 'application/json, text/plain, /',
'Content-Type': 'multipart/form-data; boundary=--------------------------603685013536462814738367',
'User-Agent': 'OpenAI/NodeJS/3.2.1',
Authorization: 'Bearer sk-YUqD8BTx5PMWc0YyzUQaT3Blbk111111111111'
},
method: 'post',
data: FormData {
_overheadLength: 263,
_valueLength: 9,
_valuesToMeasure: [Array],
writable: false,
readable: true,
dataSize: 0,
maxDataSize: 2097152,
pauseStreams: true,
_released: true,
_streams: [],
_currentStream: null,
_insideLoop: false,
_pendingNext: false,
_boundary: '--------------------------603685013536462814738367',
_events: [Object: null prototype],
_eventsCount: 1
},
url: 'https://api.openai.com/v1/audio/transcriptions'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype] {
abort: [Function (anonymous)],
aborted: [Function (anonymous)],
connect: [Function (anonymous)],
error: [Function (anonymous)],
socket: [Function (anonymous)],
timeout: [Function (anonymous)],
finish: [Function: requestOnFinish]
},
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: true,
_last: false,
chunkedEncoding: true,
shouldKeepAlive: true,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: null,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: true,
socket: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: 'api.openai.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 9,
connecting: false,
_hadError: false,
_parent: null,
_host: 'api.openai.com',
_closeAfterHandlingError: false,
_readableState: [ReadableState],
_writableState: [WritableState],
allowHalfOpen: false,
_maxListeners: undefined,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
timeout: 5000,
parser: null,
_httpMessage: null,
autoSelectFamilyAttemptedAddresses: [Array],
[Symbol(alpncallback)]: null,
[Symbol(res)]: [TLSWrap],
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: -1,
[Symbol(kHandle)]: [TLSWrap],
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: [Timeout],
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(shapeMode)]: true,
[Symbol(kCapture)]: false,
[Symbol(kSetNoDelay)]: false,
[Symbol(kSetKeepAlive)]: true,
[Symbol(kSetKeepAliveInitialDelay)]: 1,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object]
},
_header: 'POST /v1/audio/transcriptions HTTP/1.1\r\n' +
'Accept: application/json, text/plain, /\r\n' +
'Content-Type: multipart/form-data; boundary=--------------------------603685013536462814738367\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer sk-YUqD8BTx5PMWc0YyzUQaT3Blbk111111111111\r\n' +
'Host: api.openai.com\r\n' +
'Connection: keep-alive\r\n' +
'Transfer-Encoding: chunked\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: Agent {
_events: [Object: null prototype],
_eventsCount: 2,
_maxListeners: undefined,
defaultPort: 443,
protocol: 'https:',
options: [Object: null prototype],
requests: [Object: null prototype] {},
sockets: [Object: null prototype] {},
freeSockets: [Object: null prototype],
keepAliveMsecs: 1000,
keepAlive: true,
maxSockets: Infinity,
maxFreeSockets: 256,
scheduling: 'lifo',
maxTotalSockets: Infinity,
totalSocketCount: 1,
maxCachedSessions: 100,
_sessionCache: [Object],
[Symbol(shapeMode)]: false,
[Symbol(kCapture)]: false
},
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
joinDuplicateHeaders: undefined,
path: '/v1/audio/transcriptions',
_ended: true,
res: IncomingMessage {
_events: [Object],
_readableState: [ReadableState],
_maxListeners: undefined,
socket: null,
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
joinDuplicateHeaders: undefined,
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 401,
statusMessage: 'Unauthorized',
client: [TLSSocket],
_consuming: false,
_dumped: false,
req: [Circular *1],
_eventsCount: 4,
responseUrl: 'https://api.openai.com/v1/audio/transcriptions',
redirects: [],
[Symbol(shapeMode)]: true,
[Symbol(kCapture)]: false,
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 26,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
},
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: Writable {
_events: [Object],
_writableState: [WritableState],
_maxListeners: undefined,
_options: [Object],
_ended: true,
_ending: true,
_redirectCount: 0,
_redirects: [],
_requestBodyLength: 50044,
_requestBodyBuffers: [],
_eventsCount: 3,
_onNativeResponse: [Function (anonymous)],
_currentRequest: [Circular *1],
_currentUrl: 'https://api.openai.com/v1/audio/transcriptions',
[Symbol(shapeMode)]: true,
[Symbol(kCapture)]: false
},
[Symbol(shapeMode)]: false,
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kNeedDrain)]: true,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype] {
accept: [Array],
'content-type': [Array],
'user-agent': [Array],
authorization: [Array],
host: [Array]
},
[Symbol(errored)]: null,
[Symbol(kHighWaterMark)]: 16384,
[Symbol(kRejectNonStandardBodyWrites)]: false,
[Symbol(kUniqueHeaders)]: null
},
response: {
status: 401,
statusText: 'Unauthorized',
headers: {
date: 'Fri, 01 Mar 2024 16:04:31 GMT',
'content-type': 'application/json; charset=utf-8',
'content-length': '291',
connection: 'keep-alive',
vary: 'Origin',
'x-request-id': 'req_520720ec1c6057ec643d27dae6d89cf9',
'strict-transport-security': 'max-age=15724800; includeSubDomains',
'cf-cache-status': 'DYNAMIC',
'set-cookie': [Array],
server: 'cloudflare',
'cf-ray': '85da621ef9f92e8a-HYD',
'alt-svc': 'h3=":443"; ma=86400'
},
config: {
transitional: [Object],
adapter: [Function: httpAdapter],
transformRequest: [Array],
transformResponse: [Array],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: [Object],
method: 'post',
data: [FormData],
url: 'https://api.openai.com/v1/audio/transcriptions'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype],
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: true,
_last: false,
chunkedEncoding: true,
shouldKeepAlive: true,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: null,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: true,
socket: [TLSSocket],
_header: 'POST /v1/audio/transcriptions HTTP/1.1\r\n' +
'Accept: application/json, text/plain, /\r\n' +
'Content-Type: multipart/form-data; boundary=--------------------------603685013536462814738367\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer sk-YUqD8BTx5PMWc0YyzUQaT3Blbk111111111111\r\n' +
'Host: api.openai.com\r\n' +
'Connection: keep-alive\r\n' +
'Transfer-Encoding: chunked\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: [Agent],
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
joinDuplicateHeaders: undefined,
path: '/v1/audio/transcriptions',
_ended: true,
res: [IncomingMessage],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: [Writable],
[Symbol(shapeMode)]: false,
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kNeedDrain)]: true,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype],
[Symbol(errored)]: null,
[Symbol(kHighWaterMark)]: 16384,
[Symbol(kRejectNonStandardBodyWrites)]: false,
[Symbol(kUniqueHeaders)]: null
},
data: { error: [Object] }
},
isAxiosError: true,
toJSON: [Function: toJSON]
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.