Multi-type relational clustering for enterprise cyber-security networks

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 149, pp. 172-178
Main Authors: Riddle-Workman, Elizabeth; Evangelou, Marina; Adams, Niall M.
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 01.09.2021
Elsevier Science Ltd
ISSN: 0167-8655, 1872-7344
Description
Summary:
• Propose a fast, novel hard clustering algorithm for multi-type relational data.
• Extend the popular NNDSVD method for the initialisation of our algorithm.
• Propose an internal clustering performance measure for assessing cluster similarity.

Several cyber-security data sources are collected in enterprise networks, providing relational information between different types of nodes in the network, namely computers, users and ports. This relational data can be expressed as adjacency matrices detailing inter-type relationships, which correspond to relations between nodes of different types, and intra-type relationships, which show relations between nodes of the same type. In this paper, we propose an extension of Non-Negative Matrix Tri-Factorisation (NMTF) to simultaneously cluster nodes based on their intra- and inter-type relationships. Existing NMTF-based clustering methods suffer from long computational times due to large matrix multiplications. In our approach, we enforce stricter cluster indicator constraints on the factor matrices to circumvent these issues. Additionally, to make our proposed approach less susceptible to variation in results due to random initialisation, we propose a novel initialisation procedure based on Non-Negative Double Singular Value Decomposition (NNDSVD) for multi-type relational clustering. Finally, a new performance measure suitable for assessing clustering performance on unlabelled multi-type relational data sets is presented. Our algorithm is assessed on both a simulated and a real computer network against standard approaches, showing its strong performance.
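
To make the idea of hard cluster indicator constraints concrete, the following is a minimal Python sketch for a single inter-type adjacency matrix R (for example, users by computers). It is not the authors' algorithm: it is a generic alternating scheme that approximates R by F S Gᵀ, where F and G are one-hot indicator matrices and S holds block means. It ignores intra-type relations and uses random rather than NNDSVD-based initialisation. All function names (`indicator`, `block_means`, `reconstruction_error`, `hard_tri_factorise`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def indicator(labels, k):
    """One-hot (hard) cluster indicator matrix of shape (n, k)."""
    F = np.zeros((labels.size, k))
    F[np.arange(labels.size), labels] = 1.0
    return F

def block_means(R, row_labels, col_labels, k_rows, k_cols):
    """S[a, b] = mean of R over rows in cluster a and columns in cluster b."""
    F = indicator(row_labels, k_rows)
    G = indicator(col_labels, k_cols)
    sums = F.T @ R @ G
    counts = np.outer(F.sum(axis=0), G.sum(axis=0))
    # Empty blocks keep a value of 0 instead of dividing by zero.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def reconstruction_error(R, row_labels, col_labels, S):
    """Squared Frobenius norm of R - F S G^T, without forming F or G."""
    approx = S[row_labels][:, col_labels]
    return float(((R - approx) ** 2).sum())

def hard_tri_factorise(R, k_rows, k_cols, n_iter=20, seed=0):
    """Alternate between updating S and reassigning rows and columns."""
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)  # random start (assumption, not NNDSVD)
    row_labels = rng.integers(k_rows, size=R.shape[0])
    col_labels = rng.integers(k_cols, size=R.shape[1])
    for _ in range(n_iter):
        # Reassign each row to the closest row prototype S G^T.
        S = block_means(R, row_labels, col_labels, k_rows, k_cols)
        row_protos = S[:, col_labels]                      # (k_rows, n_cols)
        d = ((R[:, None, :] - row_protos[None, :, :]) ** 2).sum(axis=2)
        row_labels = d.argmin(axis=1)
        # Reassign each column to the closest column prototype F S.
        S = block_means(R, row_labels, col_labels, k_rows, k_cols)
        col_protos = S[row_labels, :].T                    # (k_cols, n_rows)
        d = ((R.T[:, None, :] - col_protos[None, :, :]) ** 2).sum(axis=2)
        col_labels = d.argmin(axis=1)
    S = block_means(R, row_labels, col_labels, k_rows, k_cols)
    return row_labels, col_labels, S
```

Because F and G are one-hot, the products FᵀRG and F S Gᵀ reduce to grouped sums and index lookups, which is one way such constraints can avoid the large dense multiplications used by multiplicative-update NMTF. The result depends on the random start, which is the sensitivity that a deterministic initialisation such as NNDSVD is meant to reduce.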
DOI: 10.1016/j.patrec.2021.05.021