roostorg · mackenz-r · May 7, 2026
diff --git a/...uard/projects/tvec-policy/annex 1 - defining terrorism and violent extremism.md b/...uard/projects/tvec-policy/annex 1 - defining terrorism and violent extremism.md
@@ -0,0 +1,62 @@
+# Approaches to Defining Terrorism and Violent Extremism
+
+## Terrorism
+
+In the absence of an internationally agreed definition of terrorism, and given the low level of agreement on who is considered a terrorist actor, identification and classification of terrorist and violent extremist content (TVEC) can be based on an ideology-agnostic approach that focuses on terrorism as a method.
+
+Such an approach can build on legal definitions of crimes of terrorism, which focus on terrorism as a criminal act and commonly integrate the following elements[^1]:
+- Perpetration of a criminal act
+- Intent to threaten the population or to coerce a government or international governmental organisation.
+
+[^1]: The elements above are common to the legal definitions of terrorism in France (https://www.legifrance.gouv.fr/codes/id/LEGISCTA000006149845), Belgium (https://www.ejustice.just.fgov.be/cgi_loi/article.pl?language=fr&lg_txt=f&type=&sort=&numac_search=1867060850&cn_search=&caller=article&&view_numac=1867060850nx1867060850f#Art.137), and the European Union (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=LEGISSUM%3A4322328).
+
+Certain jurisdictions include a third element[^2]:
+- Aiming to advance a cause that is ideological, political, or religious, with certain jurisdictions also including a racial cause.
+
+[^2]: This element is common to the legal definitions of terrorism in the United Kingdom (https://www.legislation.gov.uk/ukpga/2000/11/section/1), Canada (https://laws-lois.justice.gc.ca/PDF/C-46.pdf), and Australia (https://www.ag.gov.au/national-security/australias-counter-terrorism-laws). 
+
+The UN customary definition also integrates a transnational element, which can also be found in certain national definitions of terrorism, including as a key factor for terrorist designation[^3]. However, this is jurisdiction-dependent, and not all legal approaches to defining terrorism or designating terrorist actors include this element.
+
+[^3]: See the UNODC's resources on defining terrorism: https://www.unodc.org/e4j/fr/terrorism/module-4/key-issues/defining-terrorism.html
+
+The Global Internet Forum to Counter Terrorism’s (GIFCT) map of Global Definitions of Terrorism offers an overview of key elements of legal definitions of terrorism globally, covering 64 definitions[^4].
+
+[^4]: See the GIFCT's map of Global Definitions of Terrorism: https://def-frameworks.gifct.org/global-definitions-of-terrorism/
+
+In addition to legal definitions of terrorist activities, certain countries and international organisations maintain terrorist designation lists[^5]. Such designation lists can serve as a basis to assess who is considered a terrorist actor in a certain jurisdiction. It should be noted, however, that designation lists are a financial sanctions and intelligence-sharing tool and were not designed for countering terrorist and violent extremist use of the internet. A number of criticisms (including from civil society organisations, counter-terrorism experts, and legal experts) have also been raised against terrorist designation lists over the years. Alternatively, certain jurisdictions can proscribe terrorist and violent extremist organisations on grounds related to terrorism[^6], violent extremism, threats to national security, or threats to the constitutional order. Proscriptions on the grounds of terrorism or violent extremism can also serve as an indicator of who is considered a terrorist or violent extremist actor in a jurisdiction. Depending on the jurisdiction, these proscriptions are designed as an administrative process prohibiting membership of and support for proscribed organisations, as in France. In other jurisdictions, they can also extend to a ban on associated imagery, as in Germany[^7].
+
+[^5]: Examples of jurisdictions maintaining designation lists at the national level include the US, UK, Australia, and Canada. At the international level, the UN and EU also maintain designation lists, which their members abide by.
+
+[^6]: This is the case in France for instance through the Interior Security Code (https://www.legifrance.gouv.fr/codes/section_lc/LEGITEXT000025503132/LEGISCTA000025505187/).
+
+[^7]: Germany Domestic Intelligence Services (2022), Right-wing extremism: symbols, signs and banned organisations (https://www.verfassungsschutz.de/SharedDocs/publikationen/EN/right-wing-extremism/2022-07-right-wing-extremism-symbols-and-organisations.pdf?__blob=publicationFile&v=12)
+
+Regarding the definition of terrorist content, few jurisdictions have one. The EU does provide such a definition in the Regulation on Addressing the Dissemination of Terrorist Content Online (2021/784, commonly referred to as EU-TCO). The EU-TCO definition of terrorist content applies to material that, in relation to the EU legal framework[^8]:
+- Incites the commission of a terrorist offence
+- Solicits a person or a group of persons to commit or contribute to the commission of a terrorist offence
+- Solicits a person or a group of persons to participate in the activities of a terrorist group
+- Provides instruction on the making or use of weapons or other specific methods or techniques for the purpose of committing or contributing to the commission of a terrorist offence
+- Constitutes a threat to commit a terrorist offence
+
+The UK Online Safety Act also defines “terrorism content”, which is considered “priority illegal content”, by reference to existing terrorism offences in UK law[^9]. New Zealand also has a framework for classifying content as TVEC through its Classification Office[^10].
+
+[^8]: See the EU TCO definition: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32021R0784
+[^9]: See Vaughan, Katy (2025), The UK’s Online Safety Act and “Terrorist Content”, in Vox-Pol (https://voxpol.eu/the-uks-online-safety-act-and-terrorist-content/)
+[^10]: See the New Zealand Classification Office resources at: https://www.dia.govt.nz/Countering-Violent-Extremism-Legislation-and-legal-process
+
+## Violent Extremism
+
+Similar to terrorism, there is no internationally agreed definition of violent extremism, and most jurisdictions do not provide a legal definition of violent extremism or related crimes.  
+
+Different scholars have attempted to provide a definition of violent extremism[^11]. Common to different conceptualisations of violent extremism is the notion of political violence, which can include terrorism. The proposed definitions share the following elements:
+- Promoting ideological, political, or religious aims
+- Advocating for or using violence to realise those aims
+- Tolerating, supporting, actively calling for, or directly using violence against civilians or critical civilian infrastructure.
+
+[^11]: Including: Bak, Tarp, and Liang, 2019; Berger, 2019; Lamphere-Englund & Thompson, 2024. 
+
+These elements are reflected in the definition used by UNESCO: “the beliefs and actions of people who support or use violence to achieve ideological, religious or political goals,” including “terrorism and other forms of politically motivated and sectarian violence”[^12].
+
+[^12]: UNESCO (2017), Preventing violent extremism through education: A guide for policymakers: https://unesdoc.unesco.org/ark:/48223/pf0000247764 
+
+Certain tech companies’ definitions of dangerous organisations, including violent extremist ones, further add the element of “non-state actors” to their approach to violent extremism.  
diff --git a/...eguard/projects/tvec-policy/annex 2 - terrorist and violent extremist actors.md b/...eguard/projects/tvec-policy/annex 2 - terrorist and violent extremist actors.md
@@ -0,0 +1,65 @@
+# Identifying Terrorist and Violent Extremist Actors
+
+The Christchurch Call Foundation encourages an approach to identifying terrorism and violent extremism that combines actor, content, and behaviour signals. Actor-level signals alone are often insufficient because of the constantly evolving threat landscape, the challenges of identifying emerging terrorist and violent extremist groups, and the possible politicization of designations.
+
+However, we recognize that teams with limited expertise in terrorism and violent extremism may find it helpful to refer to existing lists of terrorist and violent extremist actors when implementing this policy. In policy testing, we identified that the model will occasionally return a false negative due to a failure to recognize certain emerging terrorist and violent extremist groups. To solve this challenge, we recommend that teams implementing open-source reasoning models like gpt-oss-safeguard work directly with expert researchers and practitioners to support more accurate assessments of actor, content, and behaviour signals for this sensitive harm area.
+
+In addition to recommending dedicated engagement with subject matter experts to augment model capabilities, we offer the following resources to help teams navigate the landscape of terrorism and violent extremism.
+
+## Lists of Designated or Proscribed Terrorist Individuals and Organizations
+
+Designated and proscribed status differs across countries and there is no universally accepted list of terrorist and violent extremist organizations. As a starting point, teams looking for reference lists of terrorist and violent extremist organizations could refer to the following designations made by the United Nations, the European Union, and Five Eyes countries:
+
+- [UN Security Council: Consolidated List](https://main.un.org/securitycouncil/en/content/un-sc-consolidated-list)
+- [European Union: Terrorist List](http://data.europa.eu/eli/dec/2026/455/oj)
+- [Public Safety Canada: Currently Listed Entities](https://www.publicsafety.gc.ca/cnt/ntnl-scrt/cntr-trrrsm/lstd-ntts/crrnt-lstd-ntts-en.aspx)
+- [United Kingdom Home Office: List of Proscribed Terrorist Groups or Organizations](https://www.gov.uk/government/publications/proscribed-terror-groups-or-organisations--2/proscribed-terrorist-groups-or-organisations-accessible-version)
+- [Australian National Security: Listed Terrorist Organizations](https://www.nationalsecurity.gov.au/what-australia-is-doing/terrorist-organisations/listed-terrorist-organisations)
+- [New Zealand Police: Designated Terrorist Entities](https://www.police.govt.nz/advice/personal-community/counterterrorism/designated-entities)
+- [United States Department of State: Foreign Terrorist Organizations](https://www.state.gov/foreign-terrorist-organizations)
+
+Importantly, these lists should not be used as an exclusive indication of terrorism or violent extremism, but rather as one signal interpreted among many. Terrorist and violent extremist activity is not always organized under the umbrella of an official organization. Moreover, many of these lists do not capture increasingly hybridized threats, particularly those related to nihilistic violent extremism (NVE). We strongly encourage teams using this policy to use these resources only as additional context, rather than a single source of truth. 
+
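One way to treat a reference list as additional context rather than a verdict is sketched below in Python. This is an illustration under stated assumptions, not part of the policy or of gpt-oss-safeguard: `REFERENCE_LIST`, `find_listed_entities`, and `build_context_note` are hypothetical names, the entries are placeholders rather than real designations, and simple alias matching will miss misspellings, transliterations, and coded references.

```python
import re

# Hypothetical reference list: canonical name -> known aliases.
# Placeholder entries only; a real list would be curated with subject matter experts.
REFERENCE_LIST = {
    "Example Group A": ["Example Group A", "EGA"],
    "Example Group B": ["Example Group B", "Group B Front"],
}


def find_listed_entities(text, reference_list):
    """Return canonical names whose aliases appear in the text.

    Matching is case-insensitive and restricted to whole words, so a short
    alias such as "EGA" does not match inside a longer word like "megaphone".
    """
    matches = []
    for canonical, aliases in reference_list.items():
        for alias in aliases:
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                matches.append(canonical)
                break  # one alias hit is enough for this entity
    return matches


def build_context_note(matches):
    """Format matches as a note for a reviewer or a model prompt."""
    if not matches:
        return ""
    return "Reference-list matches (context only, not a verdict): " + ", ".join(matches)


sample = "A post praising the ega and its leaders"
print(build_context_note(find_listed_entities(sample, REFERENCE_LIST)))
```

In this sketch a match only produces a note that can be passed along with the content; the classification decision still rests on the content and behaviour signals described above.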
+## Specific Incidents and Perpetrators
+
+One of the important limitations of using the designation lists above is that they do not capture individuals who have acted alone to commit terrorist or violent extremist attacks. Increasingly, we observe that violent extremist communities online glorify individual perpetrators including by sharing their manifestos, discussing their tactics, and copying their aesthetics. Maintaining a singular list of relevant perpetrators is challenging because of inconsistency in the application of terrorism or violent extremism charges across jurisdictions and uncertainty about the motivations of lone actors.
+
+The Christchurch Call Foundation maintains a list of perpetrators and incidents where: 
+- we engaged our Crisis Response Protocol and/or
+- the attack involved references to the Christchurch terrorist.
+
+The names of perpetrators associated with incidents are provided below for reference when implementing and interpreting this policy. Importantly, this list only considers recent incidents (since the founding of the Christchurch Call in 2019) and is not a comprehensive list of terrorist and violent extremist incidents. In addition, we note that not all perpetrators on this list have been subject to terrorism or violent extremism charges or investigations. We offer this list as context for understanding perpetrators and incidents for which there may be serious risk of glorification online.
+
+| Date | Location of Incident | Incident | Perpetrator |
+|------|---------------------|----------|-------------|
+| 15/03/2019 | New Zealand | Christchurch terrorist attacks | Brenton Tarrant |
+| 27/04/2019 | USA | Chabad of Poway Synagogue attack | John Timothy Earnest |
+| 3/08/2019 | USA | Walmart shooting in El Paso, Texas | Patrick Crusius |
+| 10/08/2019 | Norway | Al-Noor Islamic Centre attack in Baerum | Philip Manshaus |
+| 9/10/2019 | Germany | Halle Synagogue attack | Stephan Balliet |
+| 9/02/2020 | Germany | Shisha bar shootings in Hanau | Tobias R. |
+| 20/05/2020 | USA | Mall shooting in Glendale, Arizona | Armando Hernandez |
+| 16/10/2020 | France | Murder of a French school teacher | Abdoullakh Anzorov |
+| 6/06/2021 | Canada | Vehicle ramming attack in London, Ontario | Nathan Veltman |
+| 19/08/2021 | Sweden | School stabbing in Eslöv | Hugo Jackson |
+| 14/05/2022 | USA | Supermarket shooting in Buffalo, New York | Payton Gendron |
+| 12/10/2022 | Slovakia | Shooting at Tepláreň Bar in Bratislava | Juraj Krajčík |
+| 6/05/2023 | USA | Mall shooting in Allen, Texas | Mauricio Garcia |
+| 26/08/2023 | USA | Dollar General Store shooting in Jacksonville, Florida | Ryan Palmeter |
+| 13/10/2023 | France | School stabbing attack in Arras | Mohammed Mogouchkov |
+| 12/08/2024 | Turkey | Tea stall knife attack near mosque | Arda Küçükyetim |
+| 16/12/2024 | USA | Abundant Life Christian School shooting | Natalie Lyn (Samantha) Rupnow |
+| 20/12/2024 | Germany | Vehicle ramming attack at a Christmas market in Magdeburg | Taleb Jawad al-Abdulmohsen |
+| 1/01/2025 | USA | Vehicle ramming attack in New Orleans, Louisiana | Shamsud-Din Bahar Jabbar |
+| 23/01/2025 | USA | Antioch High School shooting | Solomon Henderson |
+| 24/04/2025 | France | School stabbing in Nantes | Justin Polat |
+| 17/05/2025 | USA | Car bomb outside fertility clinic in Palm Springs, California | Guy Edward Bartkus |
+| 21/05/2025 | USA | Shooting outside Capital Jewish Museum in Washington, DC | Elias Rodriguez |
+| 27/08/2025 | USA | Annunciation Catholic School shooting | Robin (nee Robert) Westman |
+| 10/09/2025 | USA | Evergreen High School shooting | Desmond Holly |
+| 2/10/2025 | United Kingdom | Heaton Park Synagogue attack in Manchester | Jihad Al-Shamie |
+| 14/12/2025 | Australia | Bondi Beach attack | Sajid Akram and Naveed Akram |
+| 16/12/2025 | Russia | Stabbing attack at Uspenskaya school | Timofey K. |
+| 10/02/2026 | Canada | Tumbler Ridge Secondary School shooting | Jesse Van Rootselaar |
+| 24/03/2026 | United Kingdom | Arson attack in Golders Green | Harak Ashab al-Yamin al-Islamia |
+| 25/03/2026 | Italy | Stabbing attack at school in Trescore Balneario | Abanoud Youssef |
diff --git a/gpt-oss-safeguard/projects/tvec-policy/evaluation report.md b/gpt-oss-safeguard/projects/tvec-policy/evaluation report.md
@@ -0,0 +1,74 @@
+# Policy Evaluation Results: gpt-oss-safeguard
+
+**Last updated: 17 April 2026 · n=170**
+
+---
+
+## 1. Overall Metrics
+
+| Metric | Value | Target |
+|---|---|---|
+| Precision | **100.0%** | ≥90% | 
+| Recall | **90.5%** | ≥90% | 
+| F1 Score | **95.0%** | ≥90% | 
+| False Positive Rate | **0.0%** | ≤5% | 
+| False Negative Rate | **9.5%** | ≤10% | 
+| Specificity | **100.0%** | ≥90% | 
+| Cohen's Kappa | **0.831** | ≥0.75 | 
+| MCC | **0.843** | ≥0.80 | 
+| TVEC2 Recall *(Credible threat to life)* | **100.0%** | ≥95% |
+
+---
+
+## 2. Confusion Matrix
+
+|  | **Model: Invalid (remove)** | **Model: Valid (permit)** |
+|---|---|---|
+| **Actual: Invalid** | TP = 114 | FN = 12 |
+| **Actual: Valid** | FP = 0 | TN = 44 |
+
+`TP=114 · TN=44 · FP=0 · FN=12 · Total=170`
+
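The headline metrics in Section 1 can be reproduced from the four counts above. The following is a minimal Python sketch, not the project's evaluation code: the function name `binary_metrics` and the dictionary keys are illustrative, it treats "Invalid (remove)" as the positive class, and it assumes no denominator is zero.

```python
import math

# Counts copied from the confusion matrix; "positive" means content that should be removed.
TP, TN, FP, FN = 114, 44, 0, 12


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    observed = (tp + tn) / total
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (observed - expected) / (1 - expected)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "specificity": specificity,
        "kappa": kappa,
        "mcc": mcc,
    }


metrics = binary_metrics(TP, TN, FP, FN)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to the precision used in the table, this gives recall 90.5%, F1 95.0%, Cohen's kappa 0.831, and MCC 0.843.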
+---
+
+## 3. Per-Category Performance
+
+### Recall (and correct permit rate for valid categories)
+
+| Category | Metric | Target | Value | TP or TN / Total | Status |
+|---|---|---|---|---|---|
+| TVEC1 | Recall | ≥85% | **100%** | 15 / 15 | ✅ Pass |
+| TVEC2 | Recall | ≥95% | **100%** | 17 / 17 | ✅ Pass |
+| TVEC3 | Recall | ≥85% | **92%** | 12 / 13 | ✅ Pass |
+| TVEC4 | Recall | ≥85% | **85%** | 11 / 13 | ✅ Pass |
+| TVEC5 | Recall | ≥95% | **100%** | 14 / 14 | ✅ Pass |
+| TVEC6 | Recall | ≥85% | **71%** | 12 / 17 | ❌ Fail |
+| TVEC7 | Recall | ≥85% | **93%** | 13 / 14 | ✅ Pass |
+| TVEC8 | Recall | ≥85% | **87%** | 20 / 23 | ✅ Pass |
+| TVEC0 | Correct permit rate | ≥90% | **100%** | 30 / 30 | ✅ Pass |
+| TVEC.01 | Correct permit rate | ≥90% | **100%** | 14 / 14 | ✅ Pass |
+
+Note that recall targets vary across content categories. The content categories with the greatest potential harm (TVEC2 - Credible threat to life and TVEC5 - Instructional material to commit a terrorist attack) are held to a higher performance standard (recall ≥95%). Permissible content categories (TVEC0 - Non-TVEC and TVEC.01 - EDSA & Newsworthiness Exceptions) are also held to a higher performance standard (correct permit rate ≥90%), given the impact of over-removal on freedom of expression. All other categories are evaluated based on a target of ≥85% recall.
+
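The pass/fail column can be checked against the per-category counts with a short script. This is a sketch in Python under one stated assumption: that the table compares the whole-percent rounded value against the target, which is how the displayed figures appear to have been derived. The names `RESULTS` and `evaluate` are illustrative.

```python
# (correct, total, target_percent) per category, copied from the table above.
# "correct" is TP for violating categories and TN for permitted categories.
RESULTS = {
    "TVEC1": (15, 15, 85),
    "TVEC2": (17, 17, 95),
    "TVEC3": (12, 13, 85),
    "TVEC4": (11, 13, 85),
    "TVEC5": (14, 14, 95),
    "TVEC6": (12, 17, 85),
    "TVEC7": (13, 14, 85),
    "TVEC8": (20, 23, 85),
    "TVEC0": (30, 30, 90),
    "TVEC.01": (14, 14, 90),
}


def evaluate(results):
    """Return {category: (rounded_percent, passed)} for each category."""
    report = {}
    for category, (correct, total, target) in results.items():
        # Assumption: the comparison uses the whole-percent value shown in the table.
        percent = round(100 * correct / total)
        report[category] = (percent, percent >= target)
    return report


report = evaluate(RESULTS)
failing = [category for category, (_, passed) in report.items() if not passed]
print("Failing categories:", failing)
```

Under this rounding assumption only TVEC6 fails. Note that TVEC4 sits right at the boundary: 11 / 13 is about 84.6%, which meets the 85% target only after rounding to a whole percent.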
+---
+
+## 4. Failure Cluster Analysis
+
+| Failure Cluster | Categories Impacted | Count | Interpretation |
+|---|---|---|---|
+| Entity recognition gap | TVEC4, TVEC6 | 5 | Model fails when content references lesser known terrorist and violent extremist organisations. Addressing this gap requires fine-tuning or a supplementary reference list (like that provided in Annex 2). |
+| Threshold not met | TVEC6, TVEC7, TVEC8 | 4 | Model recognises potential harm but concludes it does not meet the threshold for terrorism or violent extremism, whereas experts on our team disagreed based on our interpretation and application of the policy. This misalignment affects content in the test dataset related to violent accelerationism and violence targeting women and immigrants. Future policy refinement could seek to close this gap. |
+| Framing confusion | TVEC3, TVEC4 | 2 | Operational content framed as advice, or material support framed as community events or political activities, was not flagged. In these examples, it appears the model was influenced by surface framing and overlooked details that provide important context. |
+| Public figures | TVEC8 | 1 | In one case, the model did not recognize a threat against a public political figure as incitement to terrorism or violent extremism. This could be improved through future policy refinement. |
+
+---
+
+## 5. Key Takeaways
+
+1. **Strong Performance on Permissible Speech:** In this limited test, the model produced no false positives, correctly identifying all permitted speech as non-violative. This is an encouraging result, as avoiding false positives was a core design priority — over-restriction risks chilling legitimate discourse and free expression.
+
+2. **Improving Entity Recognition:** Model performance could be improved if the policy is used alongside a reference list of terrorist and violent extremist actors (groups, individual perpetrators, and communities). Annex 2 provides a starting point for developing these lists. However, we strongly recommend that users implementing this policy work directly with subject matter experts in preventing and countering terrorism and violent extremism to develop fuller internal resources based on the latest available evidence.
+
+3. **EDSA Precision:** Finally, we note that the model struggles to apply TVEC.01 (the EDSA exceptions) accurately, with a precision rate of just 72.2%. Importantly, none of the false negatives in the dataset were permitted under the guise of an EDSA exception. This means that the lack of precision in this category is not permitting violative content - rather, it points to the difficulty of distinguishing between TVEC0 (non-TVEC content) and TVEC.01 (content which would otherwise be TVEC but is permitted due to its educational, documentary, scientific, or artistic purpose). This could be improved through future policy refinement.
+
+---