Austn Tools Skill

Purpose

Access Austin's local GPU-powered AI services at austn.net for content generation:

Text-to-Speech (Chatterbox TTS)
Image Generation (ComfyUI)
Background Removal
Vector Tracing
Audio Stem Separation
And more

Available Services

1. Text-to-Speech (`/tts`)

URL: https://austn.net/tts/new
Backend: Chatterbox TTS on local GPU

⚠️ CRITICAL CONSTRAINT: 40-second maximum duration

Audio caps at 40 seconds regardless of text length
For longer content: split into multiple clips with separate share links
Estimate: ~100-120 words = ~40 seconds
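The splitting rule above can be sketched as a small helper that groups whole sentences into clips under a word budget. This is only an illustration: the function name, the sentence-splitting regex, and the choice of 100 words (the conservative end of the ~100-120 word estimate) are assumptions, not part of the service.

```python
import re

# Assumption: use the low end of the ~100-120 word estimate as the per-clip budget.
MAX_WORDS_PER_CLIP = 100


def split_into_clips(text: str, max_words: int = MAX_WORDS_PER_CLIP) -> list[str]:
    """Group whole sentences into clips of at most max_words words each."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    clips: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # A single sentence longer than the budget is cut into word slices.
        while len(words) > max_words:
            if current:
                clips.append(" ".join(current))
                current = []
            clips.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new clip when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            clips.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        clips.append(" ".join(current))
    return clips
```

Each returned string can then be submitted as its own TTS request, producing one share link per clip. Inline expression tags count as words here, which errs on the side of shorter clips.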

Parameters:

| Field | Description | Default |
|---|---|---|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |
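For scripts that prepare TTS jobs ahead of time, the parameters above could be captured in a small validated record. The class name `TtsRequest` is hypothetical; only the field names, defaults, and 0-1 ranges come from the table.

```python
from dataclasses import dataclass


@dataclass
class TtsRequest:
    """Hypothetical container for the TTS form fields listed above."""

    text: str
    voice: str = "Default voice"
    exaggeration: float = 0.5
    cfg_weight: float = 1.0

    def __post_init__(self) -> None:
        # text is the only required field.
        if not self.text.strip():
            raise ValueError("text is required")
        # Both sliders are documented as ranging from 0 to 1.
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
```

Validating before opening the browser avoids spending a GPU job on a request the form would reject or clamp.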

Expression Tags (add inline to text):

[laughter] - Laughing
[giggle] - Giggling
[sigh] - Sighing
[gasp] - Gasping
[whisper] - Whispering
[cough] - Coughing
[clear_throat] - Throat clearing
[groan] - Groaning
[humming] - Humming
[UH], [UM] - Filler sounds

Example Text:

Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
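A quick pre-flight check can catch mistyped tags before a clip is generated. The sketch below assumes tags are case-sensitive and written exactly as listed above; the function name `unknown_tags` is made up for illustration.

```python
import re

# Tags copied from the list above; assumed to be case-sensitive.
KNOWN_TAGS = {
    "laughter", "giggle", "sigh", "gasp", "whisper", "cough",
    "clear_throat", "groan", "humming", "UH", "UM",
}


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in text that are not in the documented list."""
    found = re.findall(r"\[([A-Za-z_]+)\]", text)
    return [tag for tag in found if tag not in KNOWN_TAGS]
```

An empty result means every bracketed tag in the text matches the documented set.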

2. Image Generation (`/images`)

URL: https://austn.net/images/ai_generate
Backend: ComfyUI on local GPU

Parameters:

| Field | Description | Default |
|---|---|---|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery for 10 minutes | false |
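The image parameters can be modelled the same way when queuing several prompts. `ImageRequest` and its `dimensions` helper are hypothetical names; the defaults mirror the table, and `seed=None` is used here to stand for "let the service pick a random seed".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequest:
    """Hypothetical container for the image form fields listed above."""

    prompt: str
    negative_prompt: str = "blurry, low quality, distorted"
    seed: Optional[int] = None  # None: leave the seed field on its random default
    size: str = "512x512"
    batch_size: int = 1
    publish: bool = False

    def dimensions(self) -> tuple[int, int]:
        """Parse a WIDTHxHEIGHT size string into integers."""
        width, height = self.size.lower().split("x")
        return int(width), int(height)
```

Setting an explicit `seed` is what makes a result reproducible; leaving it unset gives a different image on each run.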

3. Background Removal (`/rembg`)

URL: https://austn.net/rembg
Remove backgrounds from images.

4. Vector Tracing (`/vtracer`)

URL: https://austn.net/vtracer
Convert raster images to SVG vectors.

5. Audio Stems (`/stems`)

URL: https://austn.net/stems
Separate audio into vocal/instrument tracks.

6. 3D Tools (`/3d`)

URL: https://austn.net/3d
3D content generation.

7. MIDI Generation (`/midi`)

URL: https://austn.net/midi
Generate MIDI sequences.

Usage via Browser Automation

Since these are web UIs, use browser automation to interact:

TTS Generation

```python
# Pseudocode: navigate, click, and type stand in for the browser
# automation tool's actions; they are not Python built-ins.

# 1. Navigate to TTS
navigate("https://austn.net/tts/new")

# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")

# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed

# 4. Click Generate Speech
click(generate_button)

# 5. Wait for audio, then download
```
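If the automation is driven from a script rather than an agent's browser tool, the same TTS steps might look like the following Playwright sketch. It assumes Playwright for Python is installed, and every selector in `SELECTORS` is a guess based on the page description (a large textarea, a "Generate Speech" button, an audio player); inspect the live page and adjust them before relying on this.

```python
from typing import Optional

TTS_URL = "https://austn.net/tts/new"

# Hypothetical selectors: not verified against the real page markup.
SELECTORS = {
    "text": "textarea",
    "generate": "text=Generate Speech",
    "audio": "audio",
}


def generate_tts(text: str, timeout_ms: int = 120_000) -> Optional[str]:
    """Submit text on the TTS page and return the audio element's src, if any."""
    # Imported here so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(TTS_URL)
            page.fill(SELECTORS["text"], text)
            page.click(SELECTORS["generate"])
            # Generation runs on the GPU backend, so allow a generous wait.
            page.wait_for_selector(SELECTORS["audio"], timeout=timeout_ms)
            return page.get_attribute(SELECTORS["audio"], "src")
        finally:
            browser.close()
```

The returned `src` value, when present, can be fetched separately to save the audio; whether it is a direct file URL is also an assumption.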

Image Generation

```python
# Pseudocode: navigate, click, and type stand in for the browser
# automation tool's actions; they are not Python built-ins.

# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")

# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")

# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch

# 4. Click Generate Image
click(generate_button)

# 5. Wait for result, download
```

Browser Automation Tips

Field Locations (approximate)

TTS Page (/tts/new):

Text input: Center of page, large textarea
Voice dropdown: Below text input
Advanced options checkbox: Below voice dropdown
Exaggeration slider: After checkbox expanded
CFG Weight slider: Below exaggeration
Generate button: Green button at bottom

Image Page (/images/ai_generate):

Prompt textarea: Top of form
Advanced options checkbox: Below prompt
Negative prompt: First advanced field
Seed input: Below negative prompt
Size dropdown: Below seed
Batch size dropdown: Below size
Generate button: Green button at bottom

Downloading Results

TTS: Audio player appears, right-click to save or use download button
Images: Image appears in result area, right-click to save

Integration with Video Pipeline

These tools combine well for autonomous video creation:

Script → Write narration text
TTS → Generate voiceover audio
Images → Generate visuals/thumbnails
Combine → Use ffmpeg or video editor

Example Workflow

1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
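Step 4 of the workflow could be scripted roughly as below. The sketch builds a standard ffmpeg invocation that keeps the recorded video stream as-is, encodes the narration to AAC, and stops at the shorter of the two inputs. The function names and file paths are placeholders, and it assumes `ffmpeg` is on the PATH.

```python
import subprocess


def build_mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that pairs a video track with a narration track."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video,           # input 0: screen recording
        "-i", audio,           # input 1: TTS narration
        "-map", "0:v:0",       # take video from the first input
        "-map", "1:a:0",       # take audio from the second input
        "-c:v", "copy",        # do not re-encode the video
        "-c:a", "aac",         # encode narration to AAC
        "-shortest",           # stop when the shorter stream ends
        output,
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

For example, `mux("content/videos/session.mp4", "content/audio/narration.wav", "content/videos/final.mp4")` would write the combined file, given that both inputs exist.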

Output Locations

Save generated content to:

Audio: content/audio/
Images: content/images/
Videos: content/videos/
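A small helper can keep saved files consistent with these directories. The timestamped filename pattern and the `output_path` function are assumptions added for illustration; only the three directory names come from the list above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Directory names taken from the list above.
OUTPUT_DIRS = {
    "audio": Path("content/audio"),
    "image": Path("content/images"),
    "video": Path("content/videos"),
}


def output_path(kind: str, stem: str, suffix: str,
                now: Optional[datetime] = None) -> Path:
    """Return a timestamped path under the directory for this content kind."""
    now = now or datetime.now()
    # Assumed naming scheme: YYYYMMDD-HHMMSS-<stem><suffix>
    return OUTPUT_DIRS[kind] / f"{now:%Y%m%d-%H%M%S}-{stem}{suffix}"
```

The helper does not create directories; call `path.parent.mkdir(parents=True, exist_ok=True)` before writing.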

Service Status & Dependencies

| Service | Backend | Requires Local GPU |
|---|---|---|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |

Connection Details

Services route to local GPU via Tailscale
Image generation connects to 100.68.94.33:8188 (ComfyUI)
If generation fails with "TCP connection" error, the backend server isn't running
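When running from a machine on the same Tailscale network, a plain TCP probe can tell whether the ComfyUI backend is listening before a generation is attempted. This is a minimal sketch; the function name and the 3-second timeout are assumptions, and a successful connection only shows the port is open, not that ComfyUI is healthy.

```python
import socket

# Address from the connection details above.
COMFYUI_HOST = "100.68.94.33"
COMFYUI_PORT = 8188


def backend_reachable(host: str = COMFYUI_HOST, port: int = COMFYUI_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result corresponds to the "TCP connection" failure described above: the backend server is not running or is not reachable from this machine.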

Verified Working (2026-02-02)

✅ TTS - Generated 8.4s audio in 6.9s
❌ Images - Failed (ComfyUI server not running)

Notes

Services depend on Austin's local GPU being online
No API keys needed - it's Austin's own infrastructure
TTS has "Share Link" that lasts 7 days
Gallery publish is optional and temporary (10 min)
Large batches may take time depending on GPU load
