
Building Consistency In Cybersecurity Investigations

Updated: Oct 17, 2024

Using Elemendar’s READ application to convert STIX outputs into an Advanced Information Model with GraphDB.

In a recent DASA project, Elemendar worked with its partners at Loughborough University to gain a better understanding of how to model and analyse Cyber Threat Intelligence (CTI) data. In an ‘industry-first’ experiment, STIX outputs from Elemendar’s READ application were modelled more comprehensively than conventional techniques allow, using an Advanced Information Model (AIM) built with the UK Government’s ‘Magmacore’ library.


The end result of the research was a combined data model, queryable across all these sources of information. This expands the opportunities for analysing relevant intelligence sources and provides a powerful new capability for threat identification and understanding, offering greater consistency of awareness across different sources of intelligence and a valuable aid to decision makers and risk analysts.





About READ and STIX


As an application, Elemendar’s READ applies state-of-the-art AI to transform unstructured CTI from diverse sources (including URLs, PDFs and free-form text) into structured, actionable data. In doing so, READ enables: comprehensive source integration (for a better understanding of potential risks); enhanced data structuring (improving how unstructured data is transformed into actionable intelligence); and advanced ML analysis (bespoke Elemendar technology that pinpoints threat patterns and relationships with a high degree of precision). For specific information on how to access READ through G-Cloud, please visit here.


A further specific benefit of READ is that it automates the formation of Structured Threat Information eXpression (STIX) data from unstructured CTI data sources. STIX is an OASIS-supported, standardised language that is integral to the CTI ecosystem, facilitating information sharing in the cybersecurity community.
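To make the STIX format concrete, the sketch below (using only Python's standard library) builds a minimal STIX 2.1 indicator object. The identifier, timestamps, name and pattern are hypothetical examples for illustration, not real READ output.

```python
import json

# A minimal, illustrative STIX 2.1 indicator (hypothetical values only).
# Real READ output carries many more properties and relationship objects.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": "indicator--d81f86b9-975b-4c0b-875e-810c5ad45a4f",
    "created": "2024-01-01T00:00:00.000Z",
    "modified": "2024-01-01T00:00:00.000Z",
    "name": "Suspicious domain",
    "pattern": "[domain-name:value = 'malicious.example.com']",
    "pattern_type": "stix",
    "valid_from": "2024-01-01T00:00:00Z",
}

# Serialise to the JSON form in which STIX objects are exchanged.
print(json.dumps(indicator, indent=2))
```

Each object carries its own type, a globally unique `id`, and the properties that define it; relationships between objects are themselves expressed as further STIX objects.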


STIX is a framework for expressing and exchanging cyber threat information, using JSON (JavaScript Object Notation) as its serialisation format. The language represents specific aspects of CTI through various ‘objects’ such as indicators, threat actors and campaigns, each with attributes, properties and relationships. It also includes a taxonomy and pattern language to ensure consistency and to express complex conditions related to CTI.

Converting relevant intelligence into STIX outputs


As part of this research project, the Elemendar CTI team took unstructured CTI reports and blog posts, applied READ to analyse them, and produced structured, machine-readable STIX outputs. Specifically, READ assessed STIX Objects (such as indicators, threat actors and campaigns), Properties (qualities relating to or defining the objects) and Relationships (how objects relate and connect). READ enabled the automated recognition and tagging of the relevant STIX objects and relationships, together with production of a mapping to summarise them visually.

As illustrated in the figure below:



An example of Elemendar’s READ application being used to analyse a CTI source. After producing the STIX entities, they were encapsulated in a STIX bundle: a single JSON container that combines both the source information and the relevant marking definitions. An important point to note is that this output contains both the source information and the analyst’s applied intelligence in one place, which is very valuable for ongoing analysis.
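The bundling step can be sketched as follows. The TLP:WHITE marking definition below uses the identifier fixed by the STIX 2.1 specification; the threat-actor object and the bundle `id` are hypothetical placeholders, not real analysis output.

```python
import json

# The standard STIX 2.1 TLP:WHITE marking definition (fixed id per spec).
tlp_white = {
    "type": "marking-definition",
    "spec_version": "2.1",
    "id": "marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9",
    "created": "2017-01-20T00:00:00.000Z",
    "definition_type": "tlp",
    "definition": {"tlp": "white"},
}

# A hypothetical source-derived object, marked with the TLP definition.
threat_actor = {
    "type": "threat-actor",
    "spec_version": "2.1",
    "id": "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "created": "2024-01-01T00:00:00.000Z",
    "modified": "2024-01-01T00:00:00.000Z",
    "name": "Example actor",
    "object_marking_refs": [tlp_white["id"]],
}

# The bundle: one JSON container holding the source-derived objects
# and the marking definition together.
bundle = {
    "type": "bundle",
    "id": "bundle--5d0092c5-5f74-4287-9642-33f4c354e56d",
    "objects": [tlp_white, threat_actor],
}
print(json.dumps(bundle, indent=2))
```

Keeping the marking definition inside the same container is what lets the bundle carry both the intelligence and its handling rules as a single shareable unit.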

Converting STIX into an Advanced Information Model


In the next phase of the investigation, the STIX outputs from READ were converted into an Advanced Information Model (AIM). Specifically, the High Quality Data Model (HQDM) designed by Dr Matthew West was used (for more details, see M. West, Developing High Quality Data Models. Burlington, MA: Morgan Kaufmann, 2011). In this experiment, the research team at Loughborough undertook a thorough evaluation of the STIX outputs generated from READ, which enabled the automated mapping of the relevant STIX outputs to the HQDM. First, a Terse RDF Triple Language (TTL) file was generated. TTL files are Resource Description Framework (RDF) compliant, which enables them to be uploaded to other platforms for exploitation. For the purposes of this research, GraphDB was chosen as the database platform to which the TTL file was uploaded and mapped to HQDM.
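As a rough sketch of the TTL generation step, the snippet below emits a Turtle fragment that maps a STIX object onto an HQDM-style class. The namespaces and the class name are illustrative placeholders, not the actual Magmacore/HQDM vocabulary.

```python
# Emit a small Turtle (TTL) fragment mapping a STIX object to an
# HQDM-style class. All URIs and class names below are illustrative
# placeholders, not the real Magmacore/HQDM vocabulary.
stix_id = "indicator--d81f86b9-975b-4c0b-875e-810c5ad45a4f"
subject = "ex:" + stix_id.replace("--", "_")

ttl = (
    "@prefix ex: <http://example.org/cti#> .\n"
    "@prefix hqdm: <http://example.org/hqdm#> .\n\n"
    f"{subject} a hqdm:thing ;\n"
    '    ex:stix_type "indicator" ;\n'
    f'    ex:stix_id "{stix_id}" .\n'
)
print(ttl)
```

Because the output is plain RDF, a fragment like this can be loaded into GraphDB or any other RDF-aware store without vendor-specific conversion.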


Processing the outputs in this way has expanded the opportunities for future onward analysis and interaction through querying, and visualisation through graph databases. Although the focus has been on using GraphDB, TTL files are vendor-neutral, allowing other platforms to be used for data exploration.
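To give a feel for the kind of querying this enables, the toy sketch below stands in for a graph database with an in-memory list of triples. The predicates and source names are invented for illustration only.

```python
# Toy in-memory triple store standing in for GraphDB, illustrating
# cross-source querying over a combined model. All names are invented.
triples = [
    ("ex:indicator_1", "ex:derived_from", "READ"),
    ("ex:actor_1", "ex:derived_from", "industry-insight"),
    ("ex:indicator_1", "ex:indicates", "ex:actor_1"),
]

def match(triples, predicate):
    """Return (subject, object) pairs whose predicate matches."""
    return [(s, o) for s, p, o in triples if p == predicate]

# Which entities came from which intelligence source?
print(match(triples, "ex:derived_from"))
# → [('ex:indicator_1', 'READ'), ('ex:actor_1', 'industry-insight')]
```

A real graph database answers the same shape of question over millions of triples, with the added ability to follow relationship chains across sources.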


What are the research benefits of this work?


In doing this work, the research team has delivered the following benefits:


  1. The research created a module that automated the conversion of STIX outputs to HQDM. This rapid conversion tool underpinned the model’s ability to handle rich datasets, particularly STIX bundles, which contain both source information and analytical insights, making them especially valuable to store in an AIM for future reference and use.

  2. Expanding underlying HQDM classes and objects to better support ongoing CTI investigations. As a result of this work, the HQDM itself was expanded with a series of new classes and types, meaning that further work using this process will not require the model to be updated. These classes can now form part of the basis of further research by the CTI-STIX user community.

  3. Proving the utility of a combined data model. Although not described in this post, the outputs from the CTI analysis explored here were also combined with two other sources of analytical investigation. Using a combined data model (HQDM), these three sources of analysis were brought together in the same queryable datastore using TTL file outputs in an rdf4j database (other tools, such as GraphDB, could also have been used for this activity).

  4. Providing a use case for how HQDM, as a 4D ontology, can be applied industrially. By completing this research, Elemendar and its partners have deepened the understanding of how HQDM can be used as a candidate technology alongside other industrial options such as IES4. This paves the way for improving standards and options for data consistency and information management.
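The cross-source question at the heart of the combined datastore could, as a sketch, be expressed in SPARQL. The prefix and predicate names below are hypothetical placeholders, not the vocabulary actually used in the project.

```python
# A hypothetical SPARQL query over the combined HQDM datastore, asking
# which entities were derived from which intelligence source.
# Prefix and predicate names are placeholders for illustration only.
query = """
PREFIX ex: <http://example.org/cti#>

SELECT ?entity ?source
WHERE {
    ?entity ex:derived_from ?source .
}
"""
print(query)
```

Both rdf4j and GraphDB accept standard SPARQL, which is part of what makes the TTL-based model vendor-neutral.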

Final deductions

Through this research, as well as preparing for the automation and modelling of STIX outputs, we have proven the utility of using an AIM (in this case the 4D ontology known as HQDM) to combine different sources of information from real-life use cases (specifically, analysis feeds from Elemendar’s READ application, industry insights and bespoke tabular data extraction tools developed by Elemendar). In doing this we have produced a combined data model that is queryable across all these sources of information: a powerful new capability to aid modern analysis. Please get in contact with the Elemendar team if you are interested in learning more or in having access to this model. For more information about READ and how it can help your organisation, please contact us by clicking the button below.



Acknowledgements

This blog post was authored by:

Chris Evett, Director of Strategy

Ragini Gurumurthy, Junior CTI Analyst.

