
Journey to Building a Free Automated Story Machine

Okay folks, let’s talk about something seriously cool: the automated story machine. You might have seen concepts floating around, maybe even tried building one yourself and hit some roadblocks (I know I did initially!). The idea of automatically generating engaging stories, complete with visuals and voiceover, is incredibly appealing, but often seems complex or expensive.

Well, I recently went down the rabbit hole and built my own version. It achieves the same kind of results you might see from paid platforms, but with a twist – it’s built almost entirely on free, open-source software running right on my own computer. This gives you incredible flexibility and, importantly, costs virtually nothing to run once set up.

There’s a small catch: it requires setting up some tools locally using Docker. But trust me, once you have this foundation, it opens the door to *so many* other automation possibilities beyond just this project. In this post, I want to walk you through:

  • Why I decided to build this automated story creator locally.
  • The amazing open-source tools that power the whole thing.
  • A peek into the n8n workflow that ties it all together.
  • The kind of results you can expect (spoiler: they’re pretty neat!).

So grab a coffee, and let’s dive into the world of free automated video creation!

Why Go Local and Open Source for Automated Video?

You might be wondering, “Why bother with local setup when cloud services exist?” That’s a fair question! Tools like Creatomate (which the original video creator mentioned) offer powerful features, but they often come with hefty price tags, especially if you’re generating a lot of content or just experimenting.

I ran into limitations with cloud platforms quickly:

  • Cost: Subscription fees and per-video costs add up fast.
  • Limits: Hitting API request limits, storage caps (like Airtable’s attachment limits), or processing constraints can halt your creative flow.
  • Flexibility: You’re often tied to the specific features and models the platform offers.

Building it locally with open-source tools flips the script:

  • Cost-Effective: It’s FREE to run (aside from your electricity bill!). We’re using models like Google’s Gemini Flash via OpenRouter’s free tier for scripting and local tools for everything else.
  • No Limits (Almost): Your only real limits are your computer’s processing power and storage space. No more worrying about hitting arbitrary caps.
  • Ultimate Flexibility: Swap out models (like different Stable Diffusion checkpoints or Text-to-Speech voices), tweak parameters endlessly, and integrate with virtually anything using APIs.
  • Learning Opportunity: Setting this up is a fantastic way to learn about Docker, APIs, AI models, and automation workflows.

The initial setup takes a bit of time (maybe 20-30 minutes if you follow the guides), but the long-term benefits in cost savings and creative freedom are, in my opinion, totally worth it.

The Magic Ingredients: Our Open-Source Toolkit

This whole system relies on a suite of powerful, free tools working together. Most of these run neatly inside Docker, which is like having mini virtual containers for apps, keeping things organized and easy to manage. Here’s the lineup:

  • n8n (The Brain / Workflow Automation): Visually connects all the tools and APIs and orchestrates the entire process step by step. Free, self-hostable, and incredibly powerful.
  • BaseRow (Database / Input Management): An open-source alternative to Airtable. Stores story ideas, parameters (like voice and image count), and final video links. No pesky limits!
  • MinIO (Local S3 Storage): Provides Amazon S3-compatible object storage on your own machine. Stores generated audio and image files reliably.
  • Coqui TTS via API (Text-to-Speech): Generates natural-sounding voiceovers locally. It's lightweight (under 3 GB of RAM in my experience), incredibly fast, and offers voice mixing. The Docker setup used here wraps it in a fast API.
  • Stability Matrix / Stable Diffusion WebUI Forge (Image Generation): Stability Matrix makes installing different Stable Diffusion interfaces like Forge super easy. Forge provides a web UI and an API for generating images locally using models like Juggernaut and style enhancers (LoRAs).
  • NCA Toolkit (Video Processing API): A fantastic open-source toolkit (shout-out to Steve!) that handles video tasks via API calls: converting images to video clips, concatenating clips, adding audio, and fetching media info like duration. It's the backbone of the video assembly.
  • OpenRouter Free Tier (AI Script & Prompt Generation): Provides access to various LLMs via a unified API. We use it with free models like Gemini Flash to generate the story script and image prompts from our inputs.

A Note on Setup

Getting these running involves using Docker commands, which might seem intimidating if you’re new to it. However, the original video creator mentioned providing a document (link to be added when available) with step-by-step instructions. Most are simple one-line commands. The only non-Docker part is Stability Matrix, which is a straightforward installer.

Key Tip: When setting up MinIO, remember to set your bucket policy to ‘public’ so the n8n workflow can access the files via their URLs. Don’t worry, this usually only makes them accessible within your local network unless you’ve specifically configured external access.
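
If you’d rather script that policy than click through the MinIO console, here’s a minimal sketch using the official `minio` Python client. The endpoint, credentials, and bucket name are placeholders for whatever your Docker setup uses:

```python
import json
from minio import Minio  # pip install minio

# Placeholder endpoint/credentials -- match them to your Docker container's settings
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

bucket = "story-assets"  # hypothetical bucket name
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Anonymous read-only policy: lets n8n and the NCA Toolkit fetch files by URL
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["*"]},
        "Action": ["s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{bucket}/*"],
    }],
}
client.set_bucket_policy(bucket, json.dumps(policy))
```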

Another Key Tip: For the Stable Diffusion WebUI (Forge in this case), make sure to add the --api flag in the extra launch arguments within Stability Matrix. This enables the API that n8n needs to talk to.
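
A quick way to confirm the flag took effect is to hit one of the API’s read-only endpoints. This assumes Forge is running on its default port 7860:

```python
import requests

# If --api is active, this endpoint lists the installed checkpoints;
# without the flag it returns a 404.
resp = requests.get("http://127.0.0.1:7860/sdapi/v1/sd-models", timeout=10)
resp.raise_for_status()
print([model["title"] for model in resp.json()])
```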

The Automated Workflow: A Step-by-Step Look Inside n8n

Okay, so we have the tools. How does n8n actually make the magic happen? Here’s a simplified breakdown of the workflow I put together (you can grab the full workflow JSON from the original creator’s resources):

  1. Trigger & Fetch Data: The workflow starts (e.g., manually or on a schedule). It fetches a pending story idea from BaseRow using its awesome filtering feature (find rows where ‘Status’ contains ‘Pending’). It limits processing to one story at a time.
  2. Set Parameters: It grabs details like desired video length, TTS speed, and number of images from BaseRow, ensuring numerical values are correctly formatted for later calculations.
  3. Generate Script: It sends the topic, audience, characters, etc., to OpenRouter (using the free Gemini Flash model) via an LLM node configured with a specific prompt to generate the story script.
  4. Clean Script: A quick formatting step removes unwanted line breaks from the script to make it perfect for TTS.
  5. Parse Script: Another LLM node ensures the script is consistently formatted as a JSON object containing the story text. This improves reliability.
  6. Generate Audio (TTS): The cleaned script text is sent to the local Coqui TTS API. It uses the voice and speed settings from BaseRow and generates an MP3 audio file of the story narration (see the first sketch after this list).
  7. Upload Audio: The generated MP3 is uploaded to our local MinIO storage bucket using n8n’s S3 node. It gets a unique filename based on the BaseRow entry’s UUID.
  8. Get Audio Duration: The NCA Toolkit API is called to analyze the uploaded audio file and return its exact duration in seconds. This is crucial for timing the image animations!
  9. Generate Image Prompts: The story script goes back to OpenRouter (again, Gemini Flash) with instructions to generate a specific number of image prompts (matching the number set in BaseRow) based on the story beats. I also instructed it to add specific style keywords and Lora triggers (e.g., “art style painting, <lora_name>”) to maintain a consistent visual look.
  10. Split Prompts & Generate IDs: The list of prompts is split into individual items. For each prompt, a unique ID (UUID) is generated using n8n’s Crypto node – this will be used for image filenames.
  11. Prepare Image Data: For each prompt, it calculates the required duration for its video clip (Total Audio Duration / Number of Images). It also constructs the final URL where the image will be stored in MinIO.
  12. Generate Images (Loop): This is where the Stable Diffusion API comes in. The workflow loops through each prompt one by one to avoid overloading the GPU or timing out (see the second sketch after this list).
    • It sends the prompt, negative prompts, sampler settings, dimensions, etc., to the Stable Diffusion WebUI Forge API.
    • The API returns the generated image as a Base64 encoded string.
    • n8n converts this string into a binary image file (PNG).
    • The image file is uploaded to MinIO with its unique UUID filename.
  13. Create Video Clips from Images: Once all images are generated and uploaded, n8n sends requests to the NCA Toolkit. For each image URL in MinIO, it creates a short video clip (e.g., 22 seconds long, based on the calculated duration) with a subtle zoom effect (a rough sketch of these calls appears below).
  14. Combine Video Clips: All the individual video clip URLs are gathered. Another call to the NCA Toolkit uses the ‘concatenate’ endpoint to stitch these clips together in order, creating one silent video sequence.
  15. Add Audio to Video: The final step! A call to the NCA Toolkit’s ‘compose’ endpoint takes the combined silent video and the original TTS audio file from MinIO and merges them into the final MP4 video.
  16. Update BaseRow: The workflow updates the entry in BaseRow with the URL of the final video hosted on MinIO and changes the status to ‘Done’.
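
To make steps 6-8 and 11 concrete, here’s a rough Python sketch of what the n8n nodes do under the hood. The TTS route follows Coqui’s demo-server convention; your API wrapper may differ, and the ports, credentials, and bucket name are all placeholders:

```python
import io
import requests
from minio import Minio  # pip install minio requests

# Placeholder endpoints and credentials -- adjust to your local containers
TTS_URL = "http://localhost:5002/api/tts"   # Coqui-style demo-server route (assumed)
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

script_text = "Petal the rabbit, with her softest fur, took tiny hops..."
row_uuid = "1f2e3d4c"                        # in n8n this is the BaseRow entry's UUID

# Step 6: generate the narration (the demo server takes the text as a query param)
audio = requests.get(TTS_URL, params={"text": script_text}, timeout=300)
audio.raise_for_status()

# Step 7: upload the audio to MinIO through its S3-compatible API
data = audio.content
client.put_object("story-assets", f"{row_uuid}.wav",
                  io.BytesIO(data), length=len(data))

# Steps 8 and 11: the NCA Toolkit reports the narration's exact duration,
# and each image clip gets an even share of it
audio_duration = 150.0                       # seconds, example value from step 8
num_images = 6                               # from the BaseRow row
clip_seconds = audio_duration / num_images   # 25.0 s per image clip
```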

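And here’s the loop from step 12, talking to the Forge/A1111 txt2img endpoint (which really does hand back images as Base64 strings). The prompt text, LoRA name, and sampler settings below are illustrative only:

```python
import base64
import io
import uuid
import requests
from minio import Minio

SD_URL = "http://127.0.0.1:7860"   # Forge default port; requires the --api flag
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

prompts = [
    "art style painting, <lora:storybook:1> a rabbit hopping through flowers",
    "art style painting, <lora:storybook:1> a gnome in a moss-covered hat",
]

for prompt in prompts:
    image_id = str(uuid.uuid4())   # mirrors n8n's Crypto node filename step
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, text, watermark",
        "steps": 25,
        "width": 1024,
        "height": 576,
        "sampler_name": "DPM++ 2M",
    }
    # One request at a time, matching the n8n loop, so the GPU isn't overloaded
    resp = requests.post(f"{SD_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
    resp.raise_for_status()
    png_bytes = base64.b64decode(resp.json()["images"][0])  # Base64 -> binary PNG
    client.put_object("story-assets", f"{image_id}.png",
                      io.BytesIO(png_bytes), length=len(png_bytes))
```
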
And voilà! The entire process, from idea to finished video, happens automatically. In my testing, a ~2.5-minute video with 6 images took about 4-5 minutes to generate on my machine (Ryzen CPU, RTX 3060 GPU). Not bad at all for zero cost!
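
If you’re curious what the NCA Toolkit calls in steps 13-15 look like outside of n8n, here’s a rough sketch. I’m writing the routes and payload fields from memory, so treat every path below as an assumption and verify it against the toolkit’s documentation:

```python
import requests

NCA_URL = "http://localhost:8080"      # placeholder host/port for your container
HEADERS = {"x-api-key": "your-key"}    # the toolkit authenticates with an API key

def nca(path, payload):
    # NOTE: the route names and payload fields used below are assumptions
    resp = requests.post(f"{NCA_URL}{path}", json=payload,
                         headers=HEADERS, timeout=600)
    resp.raise_for_status()
    return resp.json()

# Step 13: one zooming clip per image (25 s each in the earlier example)
clip = nca("/v1/image/transform/video",
           {"image_url": "http://localhost:9000/story-assets/img-1.png",
            "length": 25})

# Step 14: stitch the silent clips together in order
video = nca("/v1/video/concatenate",
            {"video_urls": ["...clip-1.mp4", "...clip-2.mp4"]})

# Step 15: lay the narration over the combined video
final = nca("/v1/ffmpeg/compose",
            {"video_url": "...combined.mp4",
             "audio_url": "http://localhost:9000/story-assets/narration.wav"})
```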

The End Result: What Does It Look Like?

So, after all that setup and workflow magic, what do you get? Here’s a short excerpt from one generated story:

“Petal the rabbit, with her softest fur, took tiny hops into a field of flowers unlike any you’ve ever seen. Thistle The Gnome, with his pointy moss-covered hat, walked…”

The resulting video features:

  • AI-generated images in a consistent style (thanks to the model, LoRA, and prompt engineering).
  • Subtle zoom animations on each image to give a sense of motion.
  • A clear, pleasant AI-generated voiceover narrating the story.
  • Everything perfectly timed, running for the duration of the narration.

Is it Pixar quality? Of course not. But is it a charming, fully-formed animated story created entirely automatically for free? Absolutely! When I first saw the complete output, I was genuinely impressed by how well these open-source components meshed together.

Why Open Source Rocks for This Kind of Project

This project really highlights the power of the open-source ecosystem:

  • Community & Innovation: Tools like Stable Diffusion, Coqui TTS, and NCA Toolkit are built and improved upon by passionate communities.
  • Interoperability: Thanks to APIs, these disparate tools can be connected like building blocks.
  • Control & Ownership: You run the software, you control the data, you own the process.
  • Cost Savings: Eliminating subscription fees democratizes access to powerful technology.

Yes, there’s a learning curve, but the ability to build sophisticated automation pipelines without breaking the bank is a game-changer for creators, developers, and hobbyists.

Ready to Build Your Own Story Machine?

Setting up this automated story creation system was a rewarding experience. It proved that with a bit of technical curiosity and the amazing resources available in the open-source world, you can build incredibly powerful tools yourself, bypassing expensive commercial alternatives.

While I’ve focused on children’s stories here, you could adapt this workflow for countless other types of automated video content – explainers, summaries, faceless YouTube content, social media snippets, you name it!

If you decide to give this a try, remember to check the original creator’s resources for the detailed setup guide and the n8n workflow file. Don’t be afraid to tinker and experiment!

What do you think? Does the idea of a free, automated story generator excite you? What kind of videos would you create with a system like this? Let me know your thoughts in the comments below!

And if you found this interesting, maybe share it with a friend who loves automation or AI!
