Hosting a Bot for BlueSky on Google Cloud Run
Finding a place to host my personal projects is often more challenging than the project itself, mainly because I don’t want to spend money on it. So, I’m sharing the solution I found that works well for me.
But first, let me explain the problem I needed to solve: I have a bot that does the following:
- Reads a list of RSS feeds
- For each feed in the list, it reads the XML and extracts the items
- Compares the new items with a list of already published ones
- Publishes all the new items to BlueSky
- Updates the list of published items
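Stripped of the feed parsing and the posting itself, the dedup step at the heart of that loop can be sketched in a few lines of Java. The `FeedItem` record and method names here are illustrative, not the bot's real types:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the "compare new items with already published" step.
public class FeedDedup {
    // One entry extracted from a feed's XML.
    record FeedItem(String sourceId, String title, String url) {}

    // Keep only items whose URL has not been published yet.
    static List<FeedItem> newItems(List<FeedItem> fromFeed, Set<String> publishedUrls) {
        return fromFeed.stream()
                .filter(item -> !publishedUrls.contains(item.url()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FeedItem> feed = List.of(
                new FeedItem("example_feed", "Old post", "https://example.com/a"),
                new FeedItem("example_feed", "New post", "https://example.com/b"));
        List<FeedItem> fresh = newItems(feed, Set.of("https://example.com/a"));
        System.out.println(fresh.size()); // prints 1
    }
}
```

Everything that survives the filter gets posted to BlueSky and then appended to the published list.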
So, I needed two things:
- A place to run the bot
- A place to save the configuration and the list of published items
The Plan
Cloud Run is a serverless solution. It has many features, but the ones that matter to me are:
- Easy to configure
- Runs a Docker image with minimal setup
- Has a free tier
- Lets me mount a Cloud Storage bucket as a volume in the container
Just to be clear, I’m far from being an expert in infrastructure, cloud computing, or anything similar. Google Cloud’s greatest merit is that it was the first platform I managed to get working!
The Bot
To fit with what Cloud Run offers, here’s what I did:
- Set up a GitHub repository
- Run the bot in Docker
- The bot runs in hourly batches
- Configuration is a JSON file
- The list of posts from each feed is stored as an NDJSON file
The cost of Cloud Storage is much lower than setting up a dedicated database, and JSON files are more than enough for what the bot does. I did consider using SQLite, but that would be overkill, and I imagine running SQLite on a Cloud Storage bucket isn’t a great idea (though I still plan to try it in the future).
Docker
I’m using a multi-stage Dockerfile: one stage builds the application, and the other runs the jar file. The key here is defining a volume at /root/dev/json, where the bot’s files will be stored.
My application is in Java, but the same principle applies to any other language.
FROM eclipse-temurin:21 AS build_image
ENV APP_HOME=/root/dev/
RUN mkdir -p $APP_HOME/src/main/java
WORKDIR $APP_HOME
COPY app/build.gradle settings.gradle gradlew gradlew.bat $APP_HOME
COPY gradle $APP_HOME/gradle
# download dependencies
RUN ./gradlew build -x test --continue
COPY . .
RUN ./gradlew build
FROM eclipse-temurin:21-jre
WORKDIR /root/
COPY --from=build_image /root/dev/app/build/libs/app.jar .
RUN mkdir -p /root/dev/json
# Set environment variables
ENV JSON_PATH=/root/dev/json
# Use this to access the JSON files
VOLUME /root/dev/json
CMD ["java","-jar","app.jar"]
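Before pushing anything to Google Cloud, the image can be exercised locally by bind-mounting a folder at the path where the bucket will later be mounted. The image tag and local folder here are just examples:

```shell
# Build the image from the Dockerfile above.
docker build -t bsky-bot .

# Run it once, with a local ./json folder standing in for the bucket.
docker run --rm -v "$(pwd)/json:/root/dev/json" bsky-bot
```

If the bot behaves with the local folder, it should behave the same way when Cloud Run mounts the bucket at that path.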
The configuration file is source.json and has the following format:
[
{
"feedId": "example_feed",
"name": "Example Feed",
"rssUrl": "https://www.youtube.com/feeds/videos.xml?playlist_id=PLAYLIST_ID",
"feedExtractor": "YOUTUBE",
"bskyIdentity": "example.bsky.app",
"bskyPassword": "example-app-password"
}
]
The published items list is named {feedId}.ndjson and consists of JSON objects separated by newlines.
{"sourceId":"example_feed","title":"Some title 1","url":"https://www.youtube.com/watch?v=w5ebcowAJD8"}
{"sourceId":"example_feed","title":"Some title 2","url":"https://www.youtube.com/watch?v=UE-k4hYHIDE"}
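At this volume of writes, the history file can be handled with the Java standard library alone. A rough sketch of appending to and scanning such a file follows; it matches the line format above but is not the bot's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative NDJSON store: one published item = one JSON line in {feedId}.ndjson.
public class NdjsonStore {
    // Append a single JSON object as a new line, creating the file if needed.
    static void append(Path file, String jsonLine) {
        try {
            Files.writeString(file, jsonLine + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Check whether a URL was already published by scanning the lines.
    // A linear scan is fine at one or two new posts per feed per day.
    static boolean contains(Path file, String url) {
        try {
            if (!Files.exists(file)) return false;
            return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                    .anyMatch(line -> line.contains("\"url\":\"" + url + "\""));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the bucket is mounted as a plain directory, these reads and writes look like ordinary file I/O to the application.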
Cloud Run
In Google Cloud, here’s what you need to create:
- A build in Cloud Build for your Docker image
- A bucket in Cloud Storage to save your files
- A new Job in Cloud Run
Build
The first step is to connect Cloud Build with your GitHub repository. Just click on Connect Repository and follow the steps.
At the end, simply choose Dockerfile as the build type and select the Dockerfile you want to use.
Bucket
Probably the easiest part: Cloud Storage will already be enabled because of the previous step (that’s where your Docker images go).
For me, all I needed to do was:
- Create a new private bucket
- Upload my configuration file, source.json
Job
A Job in Cloud Run is a process that runs once and finishes. You create a new Job, select the Docker image you created, and configure the volume to mount the Cloud Storage bucket.
I chose the smallest instance size available in the cheapest region I could find, which means:
- 512MiB of RAM
- 1 vCPU
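If you prefer the CLI over the console, a Job with the bucket mounted can be created along these lines. All the names here are placeholders, and the two `--add-volume*` flags are the Cloud Storage volume-mount feature mentioned earlier:

```shell
# Create the Job, sized to the smallest instance, with the bucket
# mounted at the path the Dockerfile expects.
gcloud run jobs create bsky-bot \
  --image=REGION-docker.pkg.dev/PROJECT_ID/REPO/bsky-bot:latest \
  --region=REGION \
  --memory=512Mi --cpu=1 \
  --add-volume=name=json,type=cloud-storage,bucket=BUCKET_NAME \
  --add-volume-mount=volume=json,mount-path=/root/dev/json
```

With that in place, the bot sees the bucket's contents under /root/dev/json as if it were a local directory.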
Trigger
With the Job created, you can run it manually, but to schedule it to run periodically, you’ll need to create a Trigger.
For my bot, I set it up to run hourly, from 9 AM to 6 PM, Monday to Saturday. You do this using the cron syntax:
0 9-18 * * 1-6
And that’s it! The Job will run at the scheduled intervals, and the files will be read and saved in the bucket.
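Behind the scenes, the console's Trigger button creates a Cloud Scheduler job that calls the Cloud Run Jobs "run" endpoint. The equivalent CLI call looks roughly like this; the job name, project, region, and service account are placeholders:

```shell
# Schedule the Job hourly, 9 AM to 6 PM, Monday to Saturday.
# The service account needs permission to invoke the Cloud Run Job.
gcloud scheduler jobs create http bsky-bot-trigger \
  --schedule="0 9-18 * * 1-6" \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobs/bsky-bot:run" \
  --http-method=POST \
  --oauth-service-account-email=SA_EMAIL
```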
Some Considerations, and Costs
It’s worth noting that this solution works for me because of the specific characteristics of the application I’m running:
- It’s not interactive; it can run in batches
- It accesses the disk very little—each feed updates roughly once a day with one or two new posts
- Everything I need in terms of logic is contained within a single Docker image
Given all of that, the estimated monthly cost of this setup is an extravagant €0.10. So, if what you need fits within the same constraints I imposed on myself, I think it’s worth giving Google Cloud a try.
I use the bot to manage several accounts that repost videos from my favourite science channels on YouTube. You can check it out on this BlueSky list:
And here’s the source code on my GitHub: