Agent skill

austn-tools

Generate content using austn.net AI services (TTS, images, etc.)

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/austn-tools

SKILL.md

Austn Tools Skill

Purpose

Access Austin's local GPU-powered AI services at austn.net for content generation:

  • Text-to-Speech (Chatterbox TTS)
  • Image Generation (ComfyUI)
  • Background Removal
  • Vector Tracing
  • Audio Stem Separation
  • 3D Tools
  • MIDI Generation

Available Services

1. Text-to-Speech (/tts)

URL: https://austn.net/tts/new
Backend: Chatterbox TTS on local GPU

⚠️ CRITICAL CONSTRAINT: 40-second maximum duration

  • Audio caps at 40 seconds regardless of text length
  • For longer content: split into multiple clips with separate share links
  • Estimate: ~100-120 words = ~40 seconds
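
The splitting advice above can be sketched as a small helper. This is a minimal sketch, not part of the service: the function name `split_for_tts` and the 100-word default (the conservative end of the estimate) are assumptions, and expression tags such as `[sigh]` are simply counted as words.

```python
import re

# Assumption: use the low end of the ~100-120 word estimate to stay under 40 seconds.
MAX_WORDS_PER_CLIP = 100


def split_for_tts(text: str, max_words: int = MAX_WORDS_PER_CLIP) -> list[str]:
    """Split text into chunks of at most max_words words, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the limit is hard-split by word count.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be submitted as its own clip, with its own share link.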

Parameters:

| Field | Description | Default |
|---|---|---|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |

Expression Tags (add inline to text):

  • [laughter] - Laughing
  • [giggle] - Giggling
  • [sigh] - Sighing
  • [gasp] - Gasping
  • [whisper] - Whispering
  • [cough] - Coughing
  • [clear_throat] - Throat clearing
  • [groan] - Groaning
  • [humming] - Humming
  • [UH], [UM] - Filler sounds
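
A quick pre-flight check can catch misspelled tags before text is submitted. The sketch below is illustrative only: `unknown_tags` is a hypothetical helper, and it assumes tags are case-sensitive exactly as listed above, which the service may or may not enforce.

```python
import re

# Tags copied from the list above; case sensitivity is an assumption.
KNOWN_TAGS = {
    "laughter", "giggle", "sigh", "gasp", "whisper", "cough",
    "clear_throat", "groan", "humming", "UH", "UM",
}


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in text that are not in the documented tag list."""
    found = re.findall(r"\[([^\[\]]+)\]", text)
    return [tag for tag in found if tag not in KNOWN_TAGS]
```

For example, `unknown_tags("Oops [laugh] [UM]")` reports `laugh`, which should probably be `[laughter]`.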

Example Text:

Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?

2. Image Generation (/images)

URL: https://austn.net/images/ai_generate
Backend: ComfyUI on local GPU

Parameters:

| Field | Description | Default |
|---|---|---|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery for 10 minutes | false |
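
When scripting several generations, it can help to keep these fields and their defaults in one place. The following is a minimal sketch: the `ImageParams` class and its `dimensions` helper are hypothetical and only mirror the form fields, they are not an API offered by the site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageParams:
    """Form values for the image generator, with the documented defaults."""
    prompt: str
    negative_prompt: str = "blurry, low quality, distorted"
    seed: Optional[int] = None  # None means: let the service pick a random seed
    size: str = "512x512"
    batch_size: int = 1
    publish: bool = False

    def dimensions(self) -> tuple[int, int]:
        """Parse a size string such as '512x512' into (width, height)."""
        width, height = self.size.lower().split("x")
        return int(width), int(height)
```

Setting an explicit `seed` is what makes a result reproducible across runs.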

3. Background Removal (/rembg)

URL: https://austn.net/rembg
Remove backgrounds from images.

4. Vector Tracing (/vtracer)

URL: https://austn.net/vtracer
Convert raster images to SVG vectors.

5. Audio Stems (/stems)

URL: https://austn.net/stems
Separate audio into vocal/instrument tracks.

6. 3D Tools (/3d)

URL: https://austn.net/3d
3D content generation.

7. MIDI Generation (/midi)

URL: https://austn.net/midi
Generate MIDI sequences.

Usage via Browser Automation

Because these services are web UIs, interact with them through browser automation:

TTS Generation

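```python
# Pseudocode: navigate/click/type stand in for your browser-automation tool's actions,
# and names like text_field refer to the page elements described under "Field Locations".

# 1. Navigate to TTS
navigate("https://austn.net/tts/new")

# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")

# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed

# 4. Click Generate Speech
click(generate_button)

# 5. Wait for audio, then download
```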

Image Generation

```python
# Pseudocode: navigate/click/type stand in for your browser-automation tool's actions,
# and names like prompt_field refer to the page elements described under "Field Locations".

# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")

# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")

# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch

# 4. Click Generate Image
click(generate_button)

# 5. Wait for result, download
```

Browser Automation Tips

Field Locations (approximate)

TTS Page (/tts/new):

  • Text input: Center of page, large textarea
  • Voice dropdown: Below text input
  • Advanced options checkbox: Below voice dropdown
  • Exaggeration slider: After checkbox expanded
  • CFG Weight slider: Below exaggeration
  • Generate button: Green button at bottom

Image Page (/images/ai_generate):

  • Prompt textarea: Top of form
  • Advanced options checkbox: Below prompt
  • Negative prompt: First advanced field
  • Seed input: Below negative prompt
  • Size dropdown: Below seed
  • Batch size dropdown: Below size
  • Generate button: Green button at bottom

Downloading Results

  • TTS: Audio player appears, right-click to save or use download button
  • Images: Image appears in result area, right-click to save

Integration with Video Pipeline

These tools combine well for autonomous video creation:

  1. Script → Write narration text
  2. TTS → Generate voiceover audio
  3. Images → Generate visuals/thumbnails
  4. Combine → Use ffmpeg or video editor

Example Workflow

1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
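
For the combine step, one common approach is to loop a still image for the length of the narration. The sketch below only builds the command; `build_ffmpeg_command` is a hypothetical helper, the file names are placeholders, and it assumes an ffmpeg build with libx264 and the native AAC encoder.

```python
from pathlib import Path


def build_ffmpeg_command(image: Path, audio: Path, output: Path) -> list[str]:
    """Build an ffmpeg command that shows a still image for the duration of an audio track."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loop", "1",            # repeat the single image as a video stream
        "-i", str(image),
        "-i", str(audio),
        "-c:v", "libx264",
        "-tune", "stillimage",   # x264 tuning for static content
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",             # stop when the audio ends
        str(output),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is on the PATH.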

Output Locations

Save generated content to:

  • Audio: content/audio/
  • Images: content/images/
  • Videos: content/videos/
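
A small helper keeps downloads landing in the right folder. This is a sketch under assumptions: `output_path` and the `kind` keys are hypothetical names, and the paths are taken relative to a project root that defaults to the current directory.

```python
from pathlib import Path

# Directories from the list above, keyed by a hypothetical media-kind label.
OUTPUT_DIRS = {
    "audio": Path("content/audio"),
    "image": Path("content/images"),
    "video": Path("content/videos"),
}


def output_path(kind: str, filename: str, root: Path = Path(".")) -> Path:
    """Return the destination path for a generated file of the given kind."""
    if kind not in OUTPUT_DIRS:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(OUTPUT_DIRS)}")
    return root / OUTPUT_DIRS[kind] / filename
```

Call `path.parent.mkdir(parents=True, exist_ok=True)` before writing if the directory may not exist yet.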

Service Status & Dependencies

| Service | Backend | Requires Local GPU |
|---|---|---|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |

Connection Details

  • Services route to local GPU via Tailscale
  • Image generation connects to 100.68.94.33:8188 (ComfyUI)
  • If generation fails with a "TCP connection" error, the backend server isn't running
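
Before starting a long image job, a plain TCP probe can tell whether the ComfyUI port is accepting connections. This is a minimal sketch: `backend_reachable` is a hypothetical helper, the 3-second timeout is an arbitrary choice, and the address is a Tailscale IP, so the check only makes sense from a machine on the same tailnet.

```python
import socket

# Address from the connection details above (Tailscale IP, ComfyUI port).
COMFYUI_HOST = "100.68.94.33"
COMFYUI_PORT = 8188


def backend_reachable(host: str = COMFYUI_HOST, port: int = COMFYUI_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result matches the "TCP connection" failure described above. A `True` result only shows the port is open, not that generation will succeed.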

Verified Working (2026-02-02)

  • ✅ TTS - Generated 8.4s audio in 6.9s
  • ❌ Images - Failed (ComfyUI server not running)

Notes

  • Services depend on Austin's local GPU being online
  • No API keys needed - it's Austin's own infrastructure
  • TTS offers a "Share Link" that stays valid for 7 days
  • Gallery publish is optional and temporary (10 min)
  • Large batches may take time depending on GPU load
