The Nonproliferation Cheminformatics Compliance Tool (NCCT) Proof-of-Concept

A complete proof of concept for a low-cost and accessible tool for frontline customs officers to identify controlled chemicals

By Joyce M. Abides • Stefano Costanzi • Greg Koblentz • Christina McAllister • Gabriel Savagner

Frontline officers responsible for border security and trade controls must quickly determine whether chemicals declared for export can be used as chemical warfare agents or precursors. A proposed web-based Nonproliferation Cheminformatics Compliance Tool (NCCT) would digitize and automate this complex and time-consuming task. It would also address vulnerabilities in the CW nonproliferation regime caused by manual cross-referencing of chemical export declarations against lists of chemicals of proliferation concern.

Chemical Weapons and the Non-Proliferation Regime

Chemical weapons (CW) remain an enduring challenge to international peace and security. International frameworks, such as the Chemical Weapons Convention (CWC), the Australia Group, and the Wassenaar Arrangement, have been put in place to prevent the proliferation of CW and their precursors. These frameworks contain lists of controlled chemicals (CW-control lists), including dual-use chemicals that have both beneficial and potentially CW-related applications. In theory, all a frontline customs officer needs to do to determine whether a shipment contains controlled chemicals is to check the export declaration’s list of chemicals against these control lists. In practice, the task is not so simple, for several reasons.

Enforcement Challenges

Synonyms: The same chemical can have several different synonyms (Figure 1), and a frontline officer might be unable to match one name used on the declaration to a different name used in a control list.

Figure 1. Pinacolyl alcohol can be identified with different synonyms, 11 of which are shown in the figure as an example (Source: SciFinder). Note: Not all synonyms of Pinacolyl alcohol are displayed.
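The synonym problem can be illustrated with a minimal sketch, assuming a hypothetical synonym table (`SYNONYMS`) in which every known name of a chemical points to one canonical key, here its CAS Registry Number. The table, the function names, and the handful of synonyms shown are illustrative only and are not the NCCT's actual data or implementation.

```python
import re

# Hypothetical synonym table: each known name maps to one canonical key
# (the CAS Registry Number). Only a few synonyms are shown.
SYNONYMS = {
    "pinacolyl alcohol": "464-07-3",
    "3,3-dimethylbutan-2-ol": "464-07-3",
    "3,3-dimethyl-2-butanol": "464-07-3",
    "tert-butyl methyl carbinol": "464-07-3",
}


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return re.sub(r"\s+", " ", name.strip().lower())


def canonical_id(name: str):
    """Return the canonical key for a declared name, or None if the name is unknown."""
    return SYNONYMS.get(normalize(name))


print(canonical_id("  Pinacolyl   Alcohol "))
print(canonical_id("3,3-Dimethyl-2-butanol"))
```

Both calls resolve to the same key, so a declaration that uses one name can be matched to a control list that uses another. A table like this only helps if it is comprehensive, which is why the proposed tool relies on a curated collection of synonyms.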

Chemical variants: Chemicals have numerous variants (different salts, tautomers, stereoisomers, isotopes, etc.) that reflect slight molecular differences (Figure 2). Because each variant has a different name and a different registry number [1], it is difficult for a frontline officer to tell that a registry number on a declaration form represents a variant of the chemical with a different registry number on the control list.

Figure 2. Examples of Pinacolyl alcohol variants (deprotonated [2], stereoisomer [3], and isotopically labeled [4]) with different registry numbers (Source: SciFinder). Note: Not all variants of Pinacolyl alcohol are displayed.

Multiple control lists: The international chemical nonproliferation regime consists of multiple control lists that overlap in some respects and differ significantly in others. While some countries develop a single integrated national control list, others do not, complicating the task of checking if a chemical is controlled.
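One way to picture an integrated control list is as a single index that records, for each registry number, every list that covers it. The sketch below assumes a hypothetical `CONTROL_LISTS` dictionary holding tiny illustrative excerpts of three lists; it is not a complete or authoritative copy of any list, and entries should always be verified against the official sources.

```python
from collections import defaultdict

# Hypothetical, heavily abbreviated excerpts keyed by CAS Registry Number.
# 464-07-3 is pinacolyl alcohol; 7719-09-7 is thionyl chloride.
CONTROL_LISTS = {
    "CWC Schedule 2": {"464-07-3"},
    "CWC Schedule 3": {"7719-09-7"},
    "Australia Group precursors": {"464-07-3", "7719-09-7"},
}


def build_index(lists):
    """Invert {list name: registry numbers} into {registry number: list names}."""
    index = defaultdict(set)
    for list_name, registry_numbers in lists.items():
        for rn in registry_numbers:
            index[rn].add(list_name)
    return dict(index)


INDEX = build_index(CONTROL_LISTS)


def lists_covering(rn: str):
    """Return the sorted names of all lists that cover a registry number."""
    return sorted(INDEX.get(rn, set()))


print(lists_covering("464-07-3"))
```

With a merged index, a single lookup answers the question "is this chemical controlled, and under which lists?" instead of requiring a separate check against each list.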

Families of chemicals: Some CW-control lists cover whole families of chemicals, defined by a common scaffold with variable chemical groups attached (Figure 3). This approach is very comprehensive and covers a potentially infinite number of highly toxic chemicals that may be synthesized in the future. However, it is time-consuming for a trained chemist to determine whether a particular chemical listed for export is part of the controlled family, and it is extremely difficult for a frontline officer without chemistry expertise to do so.

Figure 3. A snippet of CWC Schedule 1 from the OPCW website (https://www.opcw.org/chemical-weapons-convention/annexes/annex-chemicals/schedule-1). The first three entries (red boxes) describe three families of chemicals. The fourth entry is an individually listed chemical (green box).
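To show why a family entry is a rule rather than a list of names, the sketch below encodes the first Schedule 1 family, O-alkyl (≤C10, including cycloalkyl) alkyl (Me, Et, n-Pr or i-Pr) phosphonofluoridates, as a membership test. It is a deliberate simplification: it assumes the candidate has already been broken down into its two variable groups (the hypothetical `Phosphonofluoridate` record), whereas a real cheminformatics tool would test the full chemical structure against the family definition.

```python
from dataclasses import dataclass

# Constraints taken from the wording of the family entry.
ALLOWED_P_ALKYL = {"methyl", "ethyl", "n-propyl", "isopropyl"}
MAX_O_ALKYL_CARBONS = 10


@dataclass(frozen=True)
class Phosphonofluoridate:
    """Hypothetical, pre-parsed description of a candidate's variable groups."""
    o_alkyl_carbons: int  # carbon count of the O-alkyl (or cycloalkyl) group
    p_alkyl: str          # name of the alkyl group bonded to phosphorus


def in_family(candidate: Phosphonofluoridate) -> bool:
    """Check the candidate's variable groups against the family's constraints."""
    return (
        1 <= candidate.o_alkyl_carbons <= MAX_O_ALKYL_CARBONS
        and candidate.p_alkyl in ALLOWED_P_ALKYL
    )


# O-isopropyl methylphosphonofluoridate (sarin), the example given in the Schedule.
print(in_family(Phosphonofluoridate(o_alkyl_carbons=3, p_alkyl="methyl")))
```

Because the rule is expressed in terms of the scaffold and its permitted groups, it covers members of the family that have never been individually named or assigned a registry number.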

Extraneous information: Item descriptions in declarations and other customs documentation may include substantial information beyond the chemical name and/or registry number, and its significance may not always be clear to a non-specialist. In these cases, the frontline officer may be unable to identify the relevant chemical information and may search for the wrong information, or for just one of several relevant pieces of information.
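As a minimal sketch of automated extraction, the example below pulls candidate CAS Registry Numbers out of a free-text item description and keeps only those whose check digit is valid. The check-digit rule is the standard CAS one: the final digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sample description and the function names are made up for illustration.

```python
import re

# A CAS RN has 2-7 digits, a hyphen, 2 digits, a hyphen, and 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def has_valid_check_digit(first: str, second: str, check: str) -> bool:
    """Validate the CAS check digit: weighted digit sum (from the right) mod 10."""
    digits = (first + second)[::-1]
    total = sum(int(d) * weight for weight, d in enumerate(digits, start=1))
    return total % 10 == int(check)


def extract_cas_numbers(text: str):
    """Return every substring that looks like a CAS RN and passes the check digit."""
    return [
        m.group(0)
        for m in CAS_PATTERN.finditer(text)
        if has_valid_check_digit(*m.groups())
    ]


# Made-up item description with a CAS-like lot number that fails validation.
description = "Drums x12, PINACOLYL ALCOHOL 98% (CAS 464-07-3), lot 2022-11-4, UN1987"
print(extract_cas_numbers(description))
```

The check digit filters out strings that merely look like registry numbers, such as dates or lot codes, but it cannot confirm that a valid number refers to the chemical actually being shipped.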

Time and Cost: Given the high volume of chemical exports they are responsible for reviewing, frontline officers are under tremendous time pressure to review declarations and determine whether additional scrutiny is warranted. Given the cost to the industry of delaying shipments, frontline officers are also under pressure to hold shipments only if they have very good reason to do so.

To address these issues, the project team has proposed the development and wide adoption of a cheminformatics tool. The tool would automate the task of determining if a chemical is part of a CW-control list.

Why Cheminformatics?

A web-based cheminformatics tool, comprising a user-friendly interface and a database of chemical control lists enhanced with a comprehensive collection of synonyms, chemical variants, and chemical structures, can automate the task of assessing whether a declared chemical is covered by a CW-control list. Combined with the ability to extract key chemical identifying information from export declaration entries, this automation can help frontline officers overcome trade control enforcement challenges by:

  • Quickly and accurately identifying whether a chemical in a shipment falls under the control of various chemical weapons control lists.
  • Reducing the time pressure of processing thousands of chemical export declarations daily.
  • Relieving frontline officers without extensive training in chemistry of the task of manually identifying and extracting relevant information from declaration forms.
  • Incorporating other lists of controlled chemicals, beyond CW agents and precursors (e.g. explosives, narcotics, etc.) to provide frontline officers with a single tool that can handle all lists of controlled chemicals. This will allow the tool to have broader applicability.
  • Enhancing export control enforcement while facilitating legitimate trade and lawful civilian activities.

The NCCT Proof-of-Concept

To show how CW-control lists can be handled with cheminformatics, we developed the Nonproliferation Cheminformatics Compliance Tool proof-of-concept. The NCCT proof-of-concept consists of a database of chemical structures for all individual chemicals and families of chemicals covered by eight key CW-control lists (Figure 4). The database is implemented in, and runs through, commercial desktop-based cheminformatics software (ChemAxon’s Instant JChem).

CW-Control Lists

  1. CWC Schedules (Schedule 1, Schedule 2, and Schedule 3)
  2. Australia Group Chemical Weapons Precursors list
  3. Wassenaar Arrangement Munitions List 7 (ML7)
  4. European Union Council Regulation 36/2012 (Syria-related list)
  5. World Customs Organization (WCO) Strategic Trade Control Enforcement Implementation Guide (STCE)
  6. European Union Council Regulation 2022/879 (Restrictive measure for Russia)
  7. United Kingdom Statutory Instruments 2022 No. 689 (Sanctions against Russia)
  8. United States Dept. of Commerce Bureau of Industry and Security (Sanctions against Russia and Belarus)

Figure 4. The CW-control lists added to the NCCT database at this time.

Testing conducted in June, October, and November 2022 demonstrated that, under the right circumstances, the proof-of-concept can quickly convert an entered chemical name or registry number into a chemical structure and determine whether that structure matches any entry in the database (Figure 5).

Figure 5. A chemical can be entered into Instant JChem (IJC) in a variety of ways (i.e. CAS RN®, chemical name, or structural identifier). The input is converted to a 2D chemical structure which is standardized and checked against the database for structures that match. Alternatively, a chemical structure can be sketched through the IJC interface. The figure is from https://doi.org/10.1515/pac-2021-1107.
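The workflow in Figure 5 (resolve the input to a structure, standardize it, then look for a match) can be sketched in a few lines. This is a toy illustration and not how Instant JChem works internally: the `RESOLVER` and `DATABASE` tables are hypothetical, structures are represented as SMILES strings, standardization only strips the stereo markers of CH carbon centers, and matching is plain string equality, which works here only because the toy SMILES are written identically. A real tool would compare canonicalized structures.

```python
import re

# Hypothetical name/CAS RN -> SMILES table standing in for a name-to-structure engine.
RESOLVER = {
    "464-07-3": "CC(O)C(C)(C)C",
    "pinacolyl alcohol": "CC(O)C(C)(C)C",
}

# Hypothetical database of standardized structures and the lists that cover them.
DATABASE = {
    "CC(O)C(C)(C)C": ["CWC Schedule 2"],
}


def resolve(query: str) -> str:
    """Convert a name or CAS RN to SMILES; otherwise treat the input as SMILES."""
    return RESOLVER.get(query.strip().lower(), query.strip())


def standardize(smiles: str) -> str:
    """Toy standardization: drop stereo markers such as [C@H] and [C@@H]."""
    return re.sub(r"\[C@{1,2}H\]", "C", smiles)


def check(query: str):
    """Run the full resolve -> standardize -> match workflow for one query."""
    return DATABASE.get(standardize(resolve(query)), [])


print(check("Pinacolyl alcohol"))
print(check("C[C@H](O)C(C)(C)C"))  # a single stereoisomer entered as a structure
```

Standardizing before matching is what lets a stereoisomer, entered under its own identifier, be recognized as a variant of the listed chemical.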

Limitations of the NCCT Proof-of-Concept

Testing also demonstrated that the NCCT proof-of-concept cannot be directly employed by frontline officers. It has the following limitations:

  • It cannot automatically identify and extract relevant information from declaration forms.
  • It does not support automatic batch processing of multiple queries.
  • It lacks a robust engine for converting names and registry numbers to chemical structures. During testing, users relied on external databases (e.g. PubChem, ChemSpider, SciFinder, etc.) to retrieve the additional information needed to run a successful search in the tool.
  • Its interface is not user-friendly.
  • It is not a web-based application; it requires installing the Instant JChem software and downloading a database to a single defined user terminal.

Future Development

To achieve full envisioned functionality, the project team proposes to develop a field-deployable cheminformatics tool. The proposed cheminformatics tool is intended to be low-cost, accessible, and web-based. The tool will be capable of extracting relevant chemical identifying information from export declarations and will include customizable control lists enhanced with synonyms, chemical variants, and chemical structures along with automatic batch processing and improved language accessibility.

Stakeholders and Potential Users

Through this work, we intend to facilitate the missions of a wide range of relevant stakeholders. Feedback on the NCCT proof-of-concept gathered from key stakeholders has helped identify the following potential users and application areas for a cheminformatics tool:

  • Frontline officers working in the areas of border security, customs, homeland security, and export controls
  • Employees of chemical manufacturing, shipping, or logistics companies
  • Chemical security
  • Chemical terrorism response
  • First responders
  • Law enforcement
  • Policy change enabler (more comprehensive control through chemical families)

With the addition of control lists from other areas, the proposed tool will have broader applicability in pharmaceutical, narcotics, and explosives fields.

Project Team

This project is led by the Henry L. Stimson Center’s Partnerships in Proliferation Prevention Program (PPP). The Costanzi Research Group at the American University’s (AU) Department of Chemistry was involved in Phases 1 and 2 of the project and developed the proof-of-concept NCCT. We gratefully acknowledge financial support from Global Affairs Canada’s Threat Reduction Program.

Frequently Asked Questions (FAQs)

  • Why can’t customs officers just Google the Chemical Weapons Convention (CWC) schedules or other control lists to check whether a chemical is listed there?

Comparing chemicals listed on export declaration forms to CW-control lists (e.g. CWC Schedules) is not as simple as just looking for a match (refer to “Enforcement Challenges” above). Synonyms, chemical variants, and chemical families are among the factors that complicate efforts to enforce export control regimes by manually cross-checking declared chemicals against control lists.

  • Are you aware that the OPCW already has a chemical database available for frontline officer use?

Yes, we are aware. The OPCW Handbook on Chemicals and the Scheduled Chemicals Database is a cheminformatics tool based on the individual enumeration of known members of families of chemicals rather than the families themselves. It is a useful tool and we are considering incorporating this database into our proposed cheminformatics tool. In addition, our proposed tool will have a very accurate and robust engine for converting names or CAS RNs® into chemical structures and will automatically check whether a chemical matches any entry in the database, including entries that define families of chemicals. This will provide the added advantage of covering chemicals synthesized in the future and named in IUPAC form that may fall within the definition of a listed family of chemicals.

  • Will the tool work in foreign languages?

Our proposed web-based cheminformatics tool will accept chemical names in foreign languages. The tool’s interface will also be available in foreign languages.

  • Will I need to download the tool, or can I use an online version?

Our proposed cheminformatics tool will be a web-based application. 

  • How can I get involved in the development or field testing? Who do I contact?

[email protected]

Notes

  • 1
    The Chemical Abstracts Service (CAS, <https://www.cas.org>) Registry of the American Chemical Society is a collection of chemical substances, each of which is assigned a unique numeric identifier, the CAS Registry Number.
  • 2
    Deprotonated molecules are molecules in which a proton (i.e. hydrogen) is removed.
  • 3
    Stereoisomers are molecules that have the same structural formula but differ in 3D orientation of their atoms.
  • 4
    Isotopically labeled molecules are molecules in which one or more of the constituent atoms is substituted with a less common isotope of the same atom (isotopes are atoms with the same number of protons but different number of neutrons).
