Skip to content

old4ever/transcription-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Go HTTP Audio Server

This is a rudimentary Go HTTP server for recording, transcribing, and translating audio.

Features

  • Start audio recording using pw-record.
  • Stop audio recording.
  • Transcribe audio using the OpenAI Whisper API.
  • Translate audio using the OpenAI ChatCompletion API.

Requirements

  • Linux
  • Go
  • pw-record (PipeWire audio recorder)
  • OpenAI API Key (for Whisper and ChatCompletion models)

Installation

  1. Clone the repository.

  2. Install dependencies:

    go mod tidy
  3. Set up your environment variables:

    • Create a .env file in the project root.

    • Add your OpenAI API keys to the .env file:

      OPENAI_WHISPER_API_KEY=<YOUR_OPENAI_WHISPER_API_KEY>
      OPENAI_TRANSCRIBE_API_KEY=<YOUR_OPENAI_CHATCOMPLETION_API_KEY>
      

Usage

  1. Run the server:
    go run main.go
    The server will start on port 5757.
  2. API Endpoints:
    • POST /audio/start: Starts audio recording.
    • Response:
      • {"id": "<process_id>", "message": "Recording started", "file": "<output_file>"}
    • POST /audio/stop?id=<process_id>: Stops audio recording.
      • Query Parameter:
        • id: The process ID returned by /audio/start.
      • Response:
        • {"message": "Recording stopped", "file": "<file_name>"}
    • POST /audio/transcribe?filename=<filename>&lang=<language_code>: Transcribes audio using OpenAI Whisper.
      • Query Parameters:
        • filename: The path to the audio file to transcribe.
        • lang (Optional): The language of the audio (e.g., en for English, ru for Russian). If the language is not valid, it will be ignored.
      • Response:
        • {"message": "<transcription_text>"}
    • POST /audio/translate?input=<input_text>&prompt=<prompt_text>: Translates audio using OpenAI ChatCompletion.
      • Query Parameters:
        • input: The input text to translate.
        • prompt: The prompt to guide the translation model.
      • Response:
        • {"message": "<translation_text>"}

CORS Configuration

The server is configured to allow requests from http://localhost:3000. You can adjust the AllowOrigins configuration in main.go if needed.

Notes

  • The server uses pw-record for audio recording, ensure it is correctly configured in your system.
  • The removeTempFiles function is currently not working as expected and may require adjustments.
  • Error handling is rudimentary and logging could be improved.

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors