This is the repository for the "Topic Modelling with Bert" course developed by Joy Lan and Aybuke Atalay, building on earlier versions created by Xan Cochran and Pedro Jacobetty. Once it is ready, this repository will contain all the material needed to attend this two-class course. The notebook (which we suggest you run via Edina Noteable) and the slides for the class can be downloaded from this repository.
Session 1: Thursday 2nd April 2026
- Session 1 will introduce the core concepts behind topic modelling, focusing on how text can be represented and analysed using BERT-based approaches. We will begin with the theoretical foundations and then move to a hands-on notebook, where we will run a BERTopic model on a sample dataset. The session will conclude with a brief look at the generated topics to give a first sense of the results.
Session 2: Friday 9th April 2026
- Session 2 will focus on interpreting and working with the outputs from BERTopic. We will explore how to examine topics in more detail, use visualisations, and discuss practical ways to improve topic quality. The session will also introduce more advanced extensions, including how embedding models can be fine-tuned to better suit specific datasets.
For this course, the instructor will use Python via Edina Noteable. If you use Google Colab, you do not need to install anything on your computer, but you will need to set it up (see below). You can also work locally on your own computer via Anaconda. Please note that if you use Noteable or run the notebook on your own machine, you may run into problems installing the libraries, and you may not have enough RAM to run it.
Open Google Colab: https://colab.research.google.com If you are not already logged in, you will be prompted to log in via Gmail.
- Go to the GitHub tab, paste the link to this repo and press Enter, then select the notebook you want to use
The Notebook contains paragraphs of explanatory text interspersed with grey cells containing code blocks. To run a code block and see the result:
- Place your cursor within the cell
- Click the 'Run' button on the top menu
- The results of running this code will appear below
- If the results don't appear immediately, check the icon in the browser tab. An egg-timer icon indicates that the code is still being processed.
- It is best to follow the Notebook from top to bottom as some code blocks will depend on results from previous cells
- You can edit code blocks yourself and run them to see the results of your changes
All material collected here is free to use but is covered by a License: 
To clear the results and run the code again you can use the 'Cell' menu on the top menu bar
- To clear the results of the current cell: Cell > Current Outputs > Clear
- To clear the results of all cells: Cell > All Output > Clear
!pip install bertopic
!pip install keyphrase-vectorizers
!pip install nbformat
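After running the install commands, it can help to confirm that the libraries are visible to the notebook before going further. The following is a minimal sketch using only the Python standard library. The mapping from pip names to import names is an assumption based on how these packages are normally imported, and `missing_packages` is a hypothetical helper, not part of the course notebook.

```python
import importlib.util

# Assumed mapping: pip package name -> module name used in `import ...`.
REQUIRED = {
    "bertopic": "bertopic",
    "keyphrase-vectorizers": "keyphrase_vectorizers",
    "nbformat": "nbformat",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found by the import system."""
    # Refresh import caches so packages installed a moment ago are noticed.
    importlib.invalidate_caches()
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required packages can be imported.")
```

If anything is reported as missing, re-run the corresponding install command and then run this cell again.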
Warning: unlike Colab, Noteable cannot run training on a GPU, so the notebook will run very slowly there.
- Go to https://noteable.edina.ac.uk/login
- Log in with your EASE credentials
- Select 'Standard Notebook (Python3)' as a personal notebook server and press start
- Click the 'Git' menu, and 'Clone a Repository'
- Copy and paste this repository URL, https://github.com/DCS-training/TopicModellingBert, into the Repository URL field. You do not need to fill in any other fields.
- Decide where to place the folder. By default, it is placed in your home directory
- Press 'Clone'. Congratulations, you have now pulled the content of the repository into your Noteable server space.
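Since the warning above is about GPU access, you may want to check from a notebook cell whether a GPU is actually visible in your environment. This is a rough sketch that assumes PyTorch is present (it is normally installed alongside BERTopic through its sentence-transformers dependency). `gpu_available` is a hypothetical helper, and it simply reports False if PyTorch cannot be imported.

```python
def gpu_available():
    """Return True if PyTorch is installed and reports a CUDA GPU, else False."""
    try:
        import torch  # assumed to be installed as a dependency of BERTopic
    except ImportError:
        # Without PyTorch we cannot query for a GPU, so report none.
        return False
    return bool(torch.cuda.is_available())


print("GPU available:", gpu_available())
```

A result of False means the embedding step will run on the CPU, which is where the slowdown mentioned above comes from.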
Python is great for general-purpose programming and is a popular language for scientific computing as well. Installing all of the packages required for these lessons individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (preferably Python 3.11 or higher).
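To check which Python version your installation actually provides, you can run a short script like the one below in a terminal or notebook cell. It is a minimal sketch using only the standard library. The `PREFERRED` threshold reflects the "3.11 or higher" preference stated above, and `python_version_status` is a hypothetical helper name.

```python
import sys

# Preferred minimum version, taken from the recommendation above.
PREFERRED = (3, 11)


def python_version_status(version_info=None, preferred=PREFERRED):
    """Classify a (major, minor, ...) version tuple as 'ok', 'old' or 'unsupported'."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major < 3:
        return "unsupported"  # Python 2 is not suitable for this course
    if (major, minor) < preferred:
        return "old"  # Python 3, but below the preferred version
    return "ok"


print("Python", sys.version.split()[0], "->", python_version_status())
```

A status of "old" means you have Python 3 but below the preferred version. The notebook may still run, but updating is the safer option.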
Windows - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for Windows.
- Double-click the executable and install Python 3 using MOST of the default settings. The only exception is to check the 'Make Anaconda the default Python' option.
macOS - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for macOS.
- Install Python 3 using all of the defaults for installation.
To start Jupyter Notebook, open the Anaconda Navigator and launch Jupyter Notebook.
If you wish, you can create a Python virtual environment and install all dependencies for the course by navigating to the course folder in the terminal and running sh setup.sh.
- Download the notebook to your machine
- Go to Upload
- Navigate to where you have downloaded your file
- Select Upload again
- Double-click on the uploaded file
!pip install bertopic
!pip install keyphrase-vectorizers