Automaton is a scalable and fully automated system that can scrape multiple websites either once or repeatedly according to specified schedules.

## Technologies Used

Below is a list of the programming languages, frameworks, and libraries used.
### Backend
- Socket.io
- BullMQ
- PostgreSQL
- MongoDB
- Playwright
- Docker
- Redis
- Prisma
- OpenAI
### Frontend
- Next.js 19
- Tailwind
- Recharts
- shadcn/ui
## Features
- You can add a website and set the scraping priority (High, Medium, Low, None). The scraping frequency is defined in the frontend with a dropdown menu whose options correspond to predefined cron schedules (see the first sketch after this list).
- The results are immediately available after scraping and are plotted in a bar chart. Price information and labels are also displayed.
- The system has an AI Mode and a Test Mode. Switching to AI Mode enables scraping with ChatGPT, while Test Mode generates sample data.
- If recurring scrapes are set, scraping can be performed concurrently; i.e., multiple websites can be added and scraped at the selected intervals.
- The overview provides a simple status indicator showing when the last scrape took place. Each status is displayed in a different color and updates in real time when it changes (see the second sketch after this list).
- Scrape tasks can be selected and deleted at any time.
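As a rough illustration of how the frequency dropdown could map to schedules, here is a minimal sketch using BullMQ repeatable jobs with cron patterns. The names (`Frequency`, `CRON_PATTERNS`, `scrapeQueue`) and the concrete cron values are assumptions for illustration, not taken from the repository:

```typescript
import { Queue } from "bullmq";

// Hypothetical frequency options as offered by the frontend dropdown.
type Frequency = "hourly" | "daily" | "weekly";

// Illustrative mapping from dropdown values to cron patterns.
const CRON_PATTERNS: Record<Frequency, string> = {
  hourly: "0 * * * *", // at minute 0 of every hour
  daily: "0 6 * * *",  // every day at 06:00
  weekly: "0 6 * * 1", // every Monday at 06:00
};

// Redis connection matching the port exposed by the backend (6379).
const scrapeQueue = new Queue("scrape", {
  connection: { host: "127.0.0.1", port: 6379 },
});

// Enqueue a repeatable scrape job for one website.
async function scheduleScrape(url: string, frequency: Frequency) {
  await scrapeQueue.add(
    "scrape-website",
    { url },
    { repeat: { pattern: CRON_PATTERNS[frequency] } }
  );
}
```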
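The real-time status updates are the kind of thing typically pushed over Socket.io, which is part of the stack. A minimal sketch of what the server side could look like; the event name `status:update` and the payload shape are assumptions, not the repository's actual protocol:

```typescript
import { Server } from "socket.io";

// Standalone Socket.io server; the real service would attach it to its HTTP server.
const io = new Server(4000, {
  cors: { origin: "http://localhost:3000" }, // allow the frontend
});

// Hypothetical payload shape; the actual events may differ.
interface StatusUpdate {
  scrapeId: string;
  status: "pending" | "running" | "done" | "failed";
  finishedAt?: string;
}

// Push a status change to every connected client at once.
function broadcastStatus(update: StatusUpdate) {
  io.emit("status:update", update);
}
```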
## Prerequisites
The backend runs in multiple Docker containers. You need Docker to set up the environment.
- Docker for creating the containers
- Docker Compose for running the multiple containers together
- An OpenAI Assistant for AI Mode (you'll need access to the OpenAI API and must create an Assistant with custom prompts; see the sketch below); otherwise the application will only run in Test Mode with the AI switch turned off (the default).
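If you don't have an Assistant yet, you can create one once with the OpenAI Node SDK and copy its ID into `OPENAI_ASSISTANT_ID` (see the environment files below). This is only a sketch; the name, model, and instructions are placeholders, not the prompt Automaton actually uses:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// One-time setup: create an Assistant and store its ID in OPENAI_ASSISTANT_ID.
async function createAssistant() {
  const assistant = await openai.beta.assistants.create({
    name: "Automaton Scraper", // placeholder name
    model: "gpt-4o",           // any Assistants-capable model
    instructions:
      "Extract product names and prices from the provided page content.", // placeholder prompt
  });
  console.log("OPENAI_ASSISTANT_ID:", assistant.id);
}

createAssistant();
```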
The backend application containers expose the following ports:
- http://localhost:4000 (Scraper)
- http://localhost:4444 (Scheduler)
- http://localhost:5000 (Management)
- http://localhost:27017 (MongoDB)
- http://localhost:5432 (PostgreSQL)
- http://localhost:6379 (Redis)
You need to add a global `.env` in the `automaton/` folder and, in addition, two `.env` files in the `src/` folder of each service (scraper, management, and scheduler). Docker Compose reads its variables from the service-specific `.env.***` files.
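As a sketch, this wiring in `docker-compose.yml` could look like the following. The service names and paths here are assumptions based on the folder structure described above, not a copy of the repository's actual compose file:

```yaml
services:
  scraper:
    build: ./scraper
    env_file:
      - ./scraper/src/.env.scraper # Docker-specific variables
    ports:
      - "4000:4000"
  scheduler:
    build: ./scheduler
    env_file:
      - ./scheduler/src/.env.scheduler
    ports:
      - "4444:4444"
  management:
    build: ./management
    env_file:
      - ./management/src/.env.management
    ports:
      - "5000:5000"
```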
### Automaton repository

Create a `.env` file; these variables are used for setting up the PostgreSQL and MongoDB databases:

```env
MONGO_DB="mongodb"
MONGODB_DB=""
MONGODB_USER=""
MONGODB_PASS=""
MONGO_INITDB_ROOT_USERNAME=""
MONGO_INITDB_ROOT_PASSWORD=""
POSTGRES_PASSWORD=""
POSTGRES_USER=""
POSTGRES_DB=""
POSTGRES_HOST=""
```
### Scraper

Create a `.env` file:

```env
PORT=""
REDIS_HOST="127.0.0.1"
OPENAI_API_KEY=""
OPENAI_ASSISTANT_ID=""
OPENAI_THREAD_ID=""
MONGO_DB="127.0.0.1"
MONGODB_DB=""
MONGODB_USER=""
MONGODB_PASS=""
MANAGEMENT_HOST="127.0.0.1"
SCHEDULER_API_URL="http://localhost:4444"
```

Create a `.env.scraper` file:

```env
PORT=4000
REDIS_HOST="redis"
OPENAI_ASSISTANT_ID=""
SCHEDULER_API_URL="http://scheduler:4444"
MONGO_DB="mongodb"
MONGODB_DB=""
MONGODB_USER=""
MONGODB_PASS=""
MANAGEMENT_HOST="management"
```

The `.env` file targets local development (services on `127.0.0.1`), while `.env.scraper` is read inside Docker Compose, where hosts resolve to the Compose service names (`redis`, `scheduler`, `mongodb`, `management`).
### Scheduler

Create a `.env` file:

```env
PORT=4444
REDIS_HOST="127.0.0.1"
```

Create a `.env.scheduler` file:

```env
PORT=4444
REDIS_HOST="redis"
```
### Management

Create a `.env` file:

```env
DATABASE_URL=""
PORT=5000
SCHEDULER_API_URL="http://localhost:4444"
MONGO_DB="127.0.0.1"
MONGODB_DB=""
MONGODB_USER=""
MONGODB_PASS=""
```

Create a `.env.management` file:

```env
DATABASE_URL=""
PORT=5000
SCHEDULER_API_URL="http://scheduler:4444"
MONGO_DB="mongodb"
MONGODB_DB="cloudgpu"
MONGODB_USER=""
MONGODB_PASS=""
```
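`DATABASE_URL` is left empty above; since the stack uses Prisma, it presumably follows Prisma's standard PostgreSQL connection-string format, filled with the credentials from the global `.env`. For example (placeholder values; inside Docker, the host would be the Compose service name rather than `localhost`):

```env
DATABASE_URL="postgresql://myuser:mypassword@localhost:5432/mydb?schema=public"
```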
### Frontend

- The frontend application doesn't run in a container. You have to install its dependencies manually (see the installation steps below).

The frontend application exposes the port:
- http://localhost:3000 (frontend)
## Installation

Step-by-step instructions to install and run the project:
```bash
# Clone the repository
git clone https://github.com/toldpixel/automaton.git

# Navigate to the project directory
cd automaton

# Build the backend services
docker compose build

# Start the backend services
docker compose up

# Switch to the frontend folder
cd frontend

# Install the frontend dependencies
npm install # or your package manager

# Start the frontend
npm run dev
```

After a few seconds, you should see the "Scraper ready" signal in green. This means that a connection between your frontend and the scraper service has been established, and your scraper is connected to Redis, the scheduler, and the other dependent services.
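The "Scraper ready" indicator reflects the Socket.io connection between the frontend and the scraper service. Conceptually, the handshake amounts to this minimal socket.io-client sketch; the exact URL and event handling in the repository may differ:

```typescript
import { io } from "socket.io-client";

// Connect to the scraper service exposed on port 4000.
const socket = io("http://localhost:4000");

socket.on("connect", () => {
  // Rendered as the green "Scraper ready" signal in the UI.
  console.log("Scraper ready");
});

socket.on("disconnect", () => {
  console.log("Scraper disconnected");
});
```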
