An up-to-date list is available on Google Scholar.

conferences & journals

2024

NeurIPS

SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Vijay Lingam, Atula Tejaswi, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Alex Dimakis, Eunsol Choi, Aleksandar Bojchevski, and Sujay Sanghavi

In Neural Information Processing Systems, NeurIPS 2024

Abs arXiv HTML PDF Code

Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights \(W\) and inject learnable matrices \(\Delta W\). These \(\Delta W\) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on \(\Delta W\) depends on the specific weight matrix \(W\). Specifically, SVFT updates \(W\) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
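
The update described in this abstract can be written out as follows. This is a sketch of the notation only: the index set \(\Omega\) and the coefficient symbols \(m_{ij}\) are labels introduced here for illustration and do not come from the listing.

\[
W = U \Sigma V^{\top}, \qquad
\Delta W = \sum_{(i,j) \in \Omega} m_{ij}\, u_i v_j^{\top}, \qquad
W' = W + \Delta W
\]

Here \(u_i\) and \(v_j\) are left and right singular vectors of the frozen \(W\), \(\Omega\) is a fixed sparse set of index pairs, and only the scalars \(m_{ij}\) are trained, so \(|\Omega|\) sets the number of trainable parameters.
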
ICML

Robust Yet Efficient Conformal Prediction Sets

Soroush H. Zargarbashi, Sadegh Mohammad Akhondzadeh, and Aleksandar Bojchevski

In International Conference on Machine Learning, ICML 2024

Abs arXiv HTML PDF Code Poster

Conformal prediction (CP) can convert any model’s output into prediction sets guaranteed to include the true label with any user-specified probability. However, like the model itself, CP is vulnerable to adversarial test examples (evasion) and perturbed calibration data (poisoning). We derive provably robust sets by bounding the worst-case change in conformity scores. Our tighter bounds lead to more efficient sets. We cover both continuous and discrete (sparse) data, and our guarantees hold for both evasion and poisoning attacks (on both features and labels).
ICLR

Conformal Inductive Graph Neural Networks

Soroush H. Zargarbashi and Aleksandar Bojchevski

In International Conference on Learning Representations, ICLR 2024

Abs arXiv HTML PDF Code Poster

Conformal prediction (CP) transforms any model’s output into prediction sets guaranteed to include (cover) the true label. CP requires exchangeability, a relaxation of the i.i.d. assumption, to obtain a valid distribution-free coverage guarantee. This makes it directly applicable to transductive node-classification. However, conventional CP cannot be applied in inductive settings due to the implicit shift in the (calibration) scores caused by message passing with the new nodes. We fix this issue for both cases of node and edge-exchangeable graphs, recovering the standard coverage guarantee without sacrificing statistical efficiency. We further prove that the guarantee holds independently of the prediction time, e.g. upon arrival of a new node/edge or at any subsequent moment.
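
For readers unfamiliar with CP, the standard split-conformal recipe that this line of work builds on can be sketched as below. This is the generic textbook procedure, not the inductive graph variant from the paper; the score `1 - p(y|x)` and the names `conformal_quantile` and `prediction_set` are illustrative assumptions.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the threshold score
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """Labels whose non-conformity score 1 - p(y|x) does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]

# Calibration: probability the model assigned to the TRUE label of each point.
cal_true_probs = [0.9, 0.8, 0.6, 0.95, 0.7, 0.85, 0.5, 0.75, 0.65]
cal_scores = [1.0 - p for p in cal_true_probs]
q = conformal_quantile(cal_scores, alpha=0.2)

# Test point: class probabilities predicted by the same model.
print(prediction_set([0.7, 0.25, 0.05], q))
```

The coverage guarantee of this recipe relies on the calibration and test scores being exchangeable, which is the assumption the abstract says breaks in the inductive setting.
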
ICLR

Rethinking Label Poisoning for GNNs: Pitfalls and Attacks

Vijay Lingam, Sadegh Mohammad Akhondzadeh, and Aleksandar Bojchevski

In International Conference on Learning Representations, ICLR 2024

Abs HTML PDF Code Poster

Node labels for graphs are usually generated using an automated process or crowd-sourced from human users. This opens up avenues for malicious users to compromise the training labels, making it unwise to blindly rely on them. While robustness against noisy labels is an active area of research, only a handful of papers address it for graph-based data. Moreover, the effects of adversarial label perturbations are sparsely studied. More critically, we reveal that the entire literature on label poisoning for GNNs is plagued by serious evaluation pitfalls, making it hard to conclude how robust GNNs are against label perturbations. After course-correcting the state of label poisoning attacks with our faithful evaluation, we identify a discrepancy in attack efficiency of 9% on average. Additionally, we introduce two new simple yet effective attacks that are significantly stronger (up to 8%) than the previous strongest attack. Our strongest proposed attack can be efficiently computed and is theoretically backed.

2023

NeurIPS

Hierarchical Randomized Smoothing

Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, and Stephan Günnemann

In Neural Information Processing Systems, NeurIPS 2023

Abs arXiv PDF Code Poster Talk

Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both certifiably robust to perturbations and accurate.
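
The two-level noising idea in this abstract can be illustrated with a minimal sketch. It assumes continuous features, Gaussian noise, and a toy classifier; the names `hierarchical_noise`, `smoothed_predict`, `p_select`, and `sigma` are hypothetical, and the sketch only computes the majority vote, not a robustness certificate.

```python
import random
from collections import Counter

def hierarchical_noise(x, p_select, sigma, rng):
    """Add Gaussian noise only to a randomly selected subset of entities (rows)."""
    noisy = []
    for row in x:
        if rng.random() < p_select:  # upper level: is this entity selected?
            row = [v + rng.gauss(0.0, sigma) for v in row]  # lower level: noise it
        noisy.append(list(row))
    return noisy

def smoothed_predict(classify, x, p_select, sigma, n_samples=200, seed=0):
    """Majority vote of the base classifier over randomly noised copies of x."""
    rng = random.Random(seed)
    votes = Counter(
        classify(hierarchical_noise(x, p_select, sigma, rng))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

def toy_classifier(x):
    """Toy base classifier: class 1 if the sum of all features is positive."""
    return int(sum(sum(row) for row in x) > 0)

x = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # three entities, two features each
print(smoothed_predict(toy_classifier, x, p_select=0.5, sigma=0.5))
```

Setting `p_select=1.0` recovers ordinary smoothing where every entity is noised, which makes the contrast with the partial scheme easy to see.
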
NeurIPS

Are GATs Out of Balance?

Nimrah Mustafa, Aleksandar Bojchevski, and Rebekka Burkholz

In Neural Information Processing Systems, NeurIPS 2023

Abs arXiv Code Poster

While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study focuses on the Graph Attention Network (GAT), a popular GNN architecture in which a node’s neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
ICML

Conformal Prediction Sets for Graph Neural Networks

Soroush H. Zargarbashi, Simone Antonelli, and Aleksandar Bojchevski

In International Conference on Machine Learning, ICML 2023

Abs HTML PDF Code Poster

Despite the widespread use of graph neural networks (GNNs) we lack methods to reliably quantify their uncertainty. We propose a conformal procedure to equip GNNs with prediction sets that come with distribution-free guarantees – the output set contains the true label with arbitrarily high probability. Our post-processing procedure can wrap around any (pretrained) GNN, and unlike existing methods, results in meaningful sets even when the model provides only the top class. The key idea is to diffuse the node-wise conformity scores to incorporate neighborhood information. By leveraging the network homophily we construct sets with comparable or better efficiency (average size) and significantly improved singleton hit ratio (correct sets of size one). In addition to an extensive empirical evaluation, we investigate the theoretical conditions under which smoothing provably improves efficiency.
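
The key idea of diffusing node-wise conformity scores can be sketched as a single neighbor-averaging step. This is an illustrative assumption about the form of the diffusion: the mixing weight `lam` and the function name `diffuse_scores` are hypothetical, and the paper's exact procedure may differ.

```python
def diffuse_scores(scores, neighbors, lam):
    """Mix each node's per-class scores with the mean of its neighbors' scores.

    scores:    dict node -> list of per-class conformity scores
    neighbors: dict node -> list of adjacent nodes
    lam:       mixing weight in [0, 1]; lam = 0 keeps the original scores
    """
    diffused = {}
    for v, s in scores.items():
        nbrs = neighbors.get(v, [])
        if not nbrs:  # isolated node: nothing to average over
            diffused[v] = list(s)
            continue
        mean = [sum(scores[u][c] for u in nbrs) / len(nbrs) for c in range(len(s))]
        diffused[v] = [(1 - lam) * a + lam * b for a, b in zip(s, mean)]
    return diffused

scores = {0: [0.8, 0.2], 1: [0.6, 0.4], 2: [0.1, 0.9]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(diffuse_scores(scores, neighbors, lam=0.5))
```

On a homophilous graph, neighbors tend to share a label, so averaging pulls a node's scores toward those of its likely same-class neighbors.
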
AISTATS

Probing Graph Representations

Sadegh Mohammad Akhondzadeh, Vijay Lingam, and Aleksandar Bojchevski

In International Conference on Artificial Intelligence and Statistics, AISTATS 2023

Abs arXiv Code

Today we have a good theoretical understanding of the representational power of Graph Neural Networks (GNNs). For example, their limitations have been characterized in relation to a hierarchy of Weisfeiler-Lehman (WL) isomorphism tests. However, we do not know what is encoded in the learned representations. This is our main question. We answer it using a probing framework to quantify the amount of meaningful information captured in graph representations. Our findings on molecular datasets show the potential of probing for understanding the inductive biases of graph-based models. We compare different families of models, and show that Graph Transformers capture more chemically relevant information compared to models based on message passing. We also study the effect of different design choices such as skip connections and virtual nodes. We advocate for probing as a useful diagnostic tool for evaluating and developing graph-based models.
ICLR

Unveiling the Sampling Density in Non-uniform Geometric Graphs

Raffaele Paolino, Aleksandar Bojchevski, Stephan Günnemann, Gitta Kutyniok, and Ron Levie

In International Conference on Learning Representations, ICLR 2023

Abs arXiv

A powerful framework for studying graphs is to consider them as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. Currently, the literature mostly focuses on uniform sampling and constant neighborhood radius. However, real-world graphs are likely to be better represented by a model in which the sampling density and the neighborhood radius can both vary over the latent space. For instance, in a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius. In this work, we first perform a rigorous mathematical analysis of this (more general) class of models, including derivations of the resulting graph shift operators. The key insight is that graph shift operators should be corrected in order to avoid potential distortions introduced by the non-uniform sampling. Then, we develop methods to estimate the unknown sampling density in a self-supervised fashion. Finally, we present exemplary applications in which the learnt density is used to 1) correct the graph shift operator and improve performance on a variety of tasks, 2) improve pooling, and 3) extract knowledge from networks. Our experimental findings support our theory and provide strong evidence for our model.
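
The geometric graph model in this abstract can be sketched in a few lines. The per-node radii and the symmetric rule "connect when the distance is below the larger of the two radii" are assumptions made here to get an undirected graph; the paper's precise definition may differ, and `geometric_graph` is a hypothetical name.

```python
import math

def geometric_graph(points, radii):
    """Connect nodes i < j when their distance is below max(radii[i], radii[j]).

    points: list of coordinate tuples sampled from the latent space
    radii:  one neighborhood radius per node (constant list = classic model)
    """
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < max(radii[i], radii[j]):
                edges.add((i, j))
    return edges

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0)]
print(geometric_graph(points, [0.2, 0.2, 0.2]))  # constant radius
print(geometric_graph(points, [0.2, 0.2, 1.5]))  # node 2 acts as a hub
```

Giving one node a larger radius turns it into a hub, which mirrors the social-network example in the abstract.
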
ICLR notable

Localized Randomized Smoothing for Collective Robustness Certification

Jan Schuchardt, Tom Wollschläger, Aleksandar Bojchevski, and Stephan Günnemann

In International Conference on Learning Representations, ICLR 2023

Abs arXiv

Models for image segmentation, node classification and many other tasks map a single input to multiple labels. By perturbing this single shared input (e.g. the image) an adversary can manipulate several predictions (e.g. misclassify several pixels). Collective robustness certification is the task of provably bounding the number of robust predictions under this threat model. The only dedicated method that goes beyond certifying each output independently is limited to strictly local models, where each prediction is associated with a small receptive field. We propose a more general collective robustness certificate for all types of models and further show that this approach is beneficial for the larger class of softly local models, where each output is dependent on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). The certificate is based on our novel localized randomized smoothing approach, where the random perturbation strength for different input regions is proportional to their importance for the outputs. Localized smoothing Pareto-dominates existing certificates on both image segmentation and node classification tasks, simultaneously offering higher accuracy and stronger guarantees.
AAAI Oral

Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks

Yihan Wu, Aleksandar Bojchevski, and Heng Huang

In Conference on Artificial Intelligence, AAAI 2023

Abs arXiv Code

A lot of theoretical and empirical evidence shows that flatter local minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is an emerging technique to efficiently and effectively find such minima. In AWP we minimize the loss w.r.t. a bounded worst-case perturbation of the model parameters, thereby favoring local minima with a small loss in a neighborhood around them. The benefits of AWP, and more generally the connections between flatness and generalization, have been extensively studied for i.i.d. data such as images. In this paper, we extensively study this phenomenon for graph data. Along the way, we first derive a generalization bound for non-i.i.d. node classification tasks. Then we identify a vanishing-gradient issue with all existing formulations of AWP and we propose a new Weighted Truncated AWP (WT-AWP) to alleviate this issue. We show that regularizing graph neural networks with WT-AWP consistently improves both natural and robust generalization across many different graph learning tasks and models.
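
The objective described in words above, minimizing the loss under a bounded worst-case perturbation of the parameters, is commonly written as the min-max problem below. The symbols \(\theta\) (parameters), \(\delta\) (perturbation), \(\rho\) (perturbation budget), and \(\mathcal{L}\) (training loss) are notation assumed here; this is the generic form, not the weighted truncated variant proposed in the paper.

\[
\min_{\theta} \; \max_{\lVert \delta \rVert \le \rho} \; \mathcal{L}(\theta + \delta)
\]

A minimizer of this objective must have low loss everywhere in a \(\rho\)-ball around it, which is how the formulation favors flat minima.
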

2022

NeurIPS

Are Defenses for Graph Neural Networks Robust?

Felix Mujkanovic, Simon Geisler, Stephan Günnemann, and Aleksandar Bojchevski

In Neural Information Processing Systems, NeurIPS 2022

Abs arXiv Code Talk

A cursory reading of the literature suggests that we have made a lot of progress in designing effective adversarial defenses for Graph Neural Networks (GNNs). Yet, the standard methodology has a serious flaw – virtually all of the defenses are evaluated against non-adaptive attacks leading to overly optimistic robustness estimates. We perform a thorough robustness analysis of 7 of the most popular defenses spanning the entire spectrum of strategies, i.e. aimed at improving the graph, the architecture, or the training. The results are sobering – most defenses show no or only marginal improvement compared to an undefended baseline. We advocate using custom adaptive attacks as a gold standard and we outline the lessons we learned from successfully designing such attacks. Moreover, our diverse collection of perturbed graphs forms a (black-box) unit test offering a first glance at a model’s robustness.
NeurIPS

Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks

Yan Scholten, Jan Schuchardt, Simon Geisler, Aleksandar Bojchevski, and Stephan Günnemann

In Neural Information Processing Systems, NeurIPS 2022

Abs arXiv Code Poster Talk

Randomized smoothing is one of the most promising frameworks for certifying the adversarial robustness of machine learning models, including Graph Neural Networks (GNNs). Yet, existing randomized smoothing certificates for GNNs are overly pessimistic since they treat the model as a black box, ignoring the underlying architecture. To remedy this, we propose novel gray-box certificates that exploit the message-passing principle of GNNs: We randomly intercept messages and carefully analyze the probability that messages from adversarially controlled nodes reach their target nodes. Compared to existing certificates, we certify robustness to much stronger adversaries that control entire nodes in the graph and can arbitrarily manipulate node features. Our certificates provide stronger guarantees for attacks at larger distances, as messages from farther-away nodes are more likely to get intercepted. We demonstrate the effectiveness of our method on various models and datasets. Since our gray-box certificates consider the underlying graph structure, we can significantly improve certifiable robustness by applying graph sparsification.
ICLR

Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness

Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski, and Stephan Günnemann

In International Conference on Learning Representations, ICLR 2022

Abs HTML PDF Talk

End-to-end (geometric) deep learning has seen first successes in approximating the solution of combinatorial optimization problems. However, generating data in the realm of NP-hard/-complete tasks brings practical and theoretical challenges, resulting in evaluation protocols that are too optimistic. Specifically, most datasets only capture a simpler subproblem and likely suffer from spurious features. We investigate these effects by studying adversarial robustness – a local generalization property – to reveal hard, model-specific instances and spurious features. For this purpose, we derive perturbation models for SAT and TSP. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound, allowing us to determine the true label of perturbed samples without a solver. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning. Although such robust solvers exist, we show empirically that the assessed neural solvers do not generalize well w.r.t. small perturbations of the problem instance.

2021

NeurIPS

Robustness of Graph Neural Networks at Scale

Simon Geisler, Thomas Schmidt, Hakan Şirin, Daniel Zügner, Aleksandar Bojchevski, and Stephan Günnemann

In Neural Information Processing Systems, NeurIPS 2021

Abs arXiv Code Talk

Graph Neural Networks (GNNs) are increasingly important given their popularity and the diversity of applications. Yet, existing studies of their vulnerability to adversarial attacks rely on relatively small graphs. We address this gap and study how to attack and defend GNNs at scale. We propose two sparsity-aware first-order optimization attacks that maintain an efficient representation despite optimizing over a number of parameters which is quadratic in the number of nodes. We show that common surrogate losses are not well-suited for global attacks on GNNs. Our alternatives can double the attack strength. Moreover, to improve GNNs’ reliability we design a robust aggregation function, Soft Median, resulting in an effective defense at all scales. We evaluate our attacks and defense with standard GNNs on graphs more than 100 times larger compared to previous work. We even scale one order of magnitude further by extending our techniques to a scalable GNN.
ICLR

Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks

Jan Schuchardt, Aleksandar Bojchevski, Johannes Gasteiger, and Stephan Günnemann

In International Conference on Learning Representations, ICLR 2021

Abs HTML PDF Code Poster Slides Talk

In tasks like node classification, image segmentation, and named-entity recognition we have a classifier that simultaneously outputs multiple predictions (a vector of labels) based on a single input, i.e. a single graph, image, or document respectively. Existing adversarial robustness certificates consider each prediction independently and are thus overly pessimistic for such tasks. They implicitly assume that an adversary can use different perturbed inputs to attack different predictions, ignoring the fact that we have a single shared input. We propose the first collective robustness certificate which computes the number of predictions that are simultaneously guaranteed to remain stable under perturbation, i.e. cannot be attacked. We focus on Graph Neural Networks and leverage their locality property - perturbations only affect the predictions in a close neighborhood - to fuse multiple single-node certificates into a drastically stronger collective certificate. For example, on the Citeseer dataset our collective certificate for node classification increases the average number of certifiable feature perturbations from 7 to 351.
AISTATS

Completing the Picture: Randomized Smoothing Suffers from the Curse of Dimensionality for a Large Family of Distributions

Yihan Wu, Aleksandar Bojchevski, Aleksei Kuvshinov, and Stephan Günnemann

In International Conference on Artificial Intelligence and Statistics, AISTATS 2021

Abs PDF Code Poster Talk

Randomized smoothing is currently the most competitive technique for providing provable robustness guarantees. Since this approach is model-agnostic and inherently scalable we can certify arbitrary classifiers. Despite its success, recent works show that for a small class of i.i.d. distributions, the largest \(\ell_p\) radius that can be certified using randomized smoothing decreases as \(O(1/d^{1/2-1/p})\) with dimension \(d\) for \(p > 2\). We complete the picture and show that similar no-go results hold for the \(\ell_2\) norm for a much more general family of distributions which are continuous and symmetric about the origin. Specifically, we calculate two different upper bounds of the \(\ell_2\) certified radius which have a constant multiplier of order \(\Theta(1/d^{1/2})\). Moreover, we extend our results to \(\ell_p\) (\(p > 2\)) certification with spherical symmetric distributions solidifying the limitations of randomized smoothing. We discuss the implications of our results for how accuracy and robustness are related, and why robust training with noise augmentation can alleviate some of the limitations in practice. We also show that on real-world data the gap between the certified radius and our upper bounds is small.

2020

ICML

Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More

Aleksandar Bojchevski, Johannes Gasteiger, and Stephan Günnemann

In International Conference on Machine Learning, ICML 2020

Abs arXiv Code Slides Talk

Existing techniques for certifying the robustness of models for discrete data either work only for a small class of models or are general at the expense of efficiency or tightness. Moreover, they do not account for sparsity in the input which, as our findings show, is often essential for obtaining non-trivial guarantees. We propose a model-agnostic certificate based on the randomized smoothing framework which subsumes earlier work and is tight, efficient, and sparsity-aware. Its computational complexity does not depend on the number of discrete categories or the dimension of the input (e.g. the graph size), making it highly scalable. We show the effectiveness of our approach on a wide variety of models, datasets, and tasks – specifically highlighting its use for Graph Neural Networks. So far, obtaining provable guarantees for GNNs has been difficult due to the discrete and non-i.i.d. nature of graph data. Our method can certify any GNN and handles perturbations to both the graph structure and the node attributes.
KDD Oral

Scaling Graph Neural Networks with Approximate PageRank

Aleksandar Bojchevski*, Johannes Gasteiger*, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, and Stephan Günnemann

In International Conference on Knowledge Discovery and Data Mining, KDD 2020

Abs arXiv Code Slides Talk

Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs resulting in significant speed gains while maintaining state-of-the-art prediction performance. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings. We demonstrate that PPRGo outperforms baselines in both distributed and single-machine training environments on a number of commonly used academic graphs. To better analyze the scalability of large-scale graph learning methods, we introduce a novel benchmark graph with 12.4 million nodes, 173 million edges, and 2.8 million node features. We show that training PPRGo from scratch and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph. We discuss the practical application of PPRGo to solve large-scale node classification problems at Google.
ALENEX

Group Centrality Maximization for Large-scale Graphs

Eugenio Angriman, Alexander van der Grinten, Aleksandar Bojchevski, Daniel Zügner, Stephan Günnemann, and Henning Meyerhenke

In Symposium on Algorithm Engineering and Experiments, ALENEX 2020

Abs arXiv Code

The study of vertex centrality measures is a key aspect of network analysis. Naturally, such centrality measures have been generalized to groups of vertices; for popular measures it was shown that the problem of finding the most central group is NP-hard. As a result, approximation algorithms to maximize group centralities were introduced recently. Despite a nearly-linear running time, approximation algorithms for group betweenness and (to a lesser extent) group closeness are rather slow on large networks due to high constant overheads. That is why we introduce GED-Walk centrality, a new submodular group centrality measure inspired by Katz centrality. In contrast to closeness and betweenness, it considers walks of any length rather than shortest paths, with shorter walks having a higher contribution. We define algorithms that (i) efficiently approximate the GED-Walk score of a given group and (ii) efficiently approximate the (proved to be NP-hard) problem of finding a group with highest GED-Walk score. Experiments on several real-world datasets show that scores obtained by GED-Walk improve performance on common graph mining tasks such as collective classification and graph-level classification. An evaluation of empirical running times demonstrates that maximizing GED-Walk (in approximation) is two orders of magnitude faster compared to group betweenness approximation and for group sizes ≤100 one to two orders faster than group closeness approximation. For graphs with tens of millions of edges, approximate GED-Walk maximization typically needs less than one minute. Furthermore, our experiments suggest that the maximization algorithms scale linearly with the size of the input graph and the size of the group.

2019

NeurIPS

Certifiable Robustness to Graph Perturbations

Aleksandar Bojchevski and Stephan Günnemann

In Neural Information Processing Systems, NeurIPS 2019

Abs arXiv Code Poster

Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.
ICML Oral

Adversarial Attacks on Node Embeddings via Graph Poisoning

Aleksandar Bojchevski and Stephan Günnemann

In International Conference on Machine Learning, ICML 2019

Abs arXiv Code Poster Slides Talk

The goal of network representation learning is to learn low-dimensional node embeddings that capture the graph structure and are useful for solving downstream tasks. However, despite the proliferation of such methods, there is currently no study of their robustness to adversarial attacks. We provide the first adversarial vulnerability analysis on the widely used family of methods based on random walks. We derive efficient adversarial perturbations that poison the network structure and have a negative effect on both the quality of the embeddings and the downstream tasks. We further show that our attacks are transferable since they generalize to many models and are successful even when the attacker is restricted.
ICLR

Predict then Propagate: Graph Neural Networks meet Personalized PageRank

Johannes Gasteiger, Aleksandar Bojchevski, and Stephan Günnemann

In International Conference on Learning Representations, ICLR 2019

Abs arXiv Code Poster

Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, for classifying a node these methods only consider nodes that are a few propagation steps away and the size of this utilized neighborhood is hard to extend. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct a simple model, personalized propagation of neural predictions (PPNP), and its fast approximation, APPNP. Our model’s training time is on par or faster and its number of parameters on par or lower than previous models. It leverages a large, adjustable neighborhood for classification and can be easily combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification in the most thorough study done so far for GCN-like models. Our implementation is available online.
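
The approximate propagation scheme (APPNP) can be sketched as a power iteration over the symmetrically normalized adjacency matrix with self-loops, with a teleport back to the initial predictions at every step. The function name `appnp_propagate` and the list-based graph representation are illustrative choices, and the final softmax and the neural network that produces `H` are omitted.

```python
import math

def appnp_propagate(H, edges, alpha=0.1, n_iter=10):
    """Iterate Z <- (1 - alpha) * A_hat @ Z + alpha * H, starting from Z = H.

    H:     per-node prediction vectors (list of lists), e.g. from an MLP
    edges: undirected edges as (u, v) index pairs
    alpha: teleport (restart) probability of personalized PageRank
    """
    n = len(H)
    nbrs = [{i} for i in range(n)]  # every node gets a self-loop
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = [len(s) for s in nbrs]  # degrees including the self-loop

    Z = [row[:] for row in H]
    for _ in range(n_iter):
        Z_new = []
        for i in range(n):
            row = [alpha * h for h in H[i]]  # teleport term
            for j in nbrs[i]:
                w = (1 - alpha) / math.sqrt(deg[i] * deg[j])  # entry of A_hat
                for c in range(len(row)):
                    row[c] += w * Z[j][c]
            Z_new.append(row)
        Z = Z_new
    return Z

H = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # path graph 0 - 1 - 2
print(appnp_propagate(H, [(0, 1), (1, 2)], alpha=0.1, n_iter=10))
```

Because `H` is computed once and propagation has no trainable weights, increasing `n_iter` enlarges the neighborhood without adding parameters, which matches the "large, adjustable neighborhood" described above.
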

2018

ICML Oral

NetGAN: Generating Graphs via Random Walks

Aleksandar Bojchevski*, Oleksandr Shchur*, Daniel Zügner*, and Stephan Günnemann

In International Conference on Machine Learning, ICML 2018

Abs arXiv Code Poster Slides

We propose NetGAN – the first implicit generative model for graphs able to mimic real-world networks. We pose the problem of graph generation as learning the distribution of biased random walks over the input graph. The proposed model is based on a stochastic neural network that generates discrete output samples and is trained using the Wasserstein GAN objective. NetGAN is able to produce graphs that exhibit well-known network patterns without explicitly specifying them in the model definition. At the same time, our model exhibits strong generalization properties, as highlighted by its competitive link prediction performance, despite not being trained specifically for this task. Being the first approach to combine both of these desirable properties, NetGAN opens exciting avenues for further research.
ICLR

Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking

Aleksandar Bojchevski and Stephan Günnemann

In International Conference on Learning Representations, ICLR 2018

Abs arXiv Code Poster

Methods that learn representations of nodes in a graph play a critical role in network analysis since they enable many downstream learning tasks. We propose Graph2Gauss – an approach that can efficiently learn versatile node embeddings on large scale (attributed) graphs that show strong performance on tasks such as link prediction and node classification. Unlike most approaches that represent nodes as point vectors in a low-dimensional continuous space, we embed each node as a Gaussian distribution, allowing us to capture uncertainty about the representation. Furthermore, we propose an unsupervised method that handles inductive learning scenarios and is applicable to different types of graphs: plain/attributed, directed/undirected. By leveraging both the network structure and the associated node attributes, we are able to generalize to unseen nodes without additional training. To learn the embeddings we adopt a personalized ranking formulation w.r.t. the node distances that exploits the natural ordering of the nodes imposed by the network structure. Experiments on real world networks demonstrate the high performance of our approach, outperforming state-of-the-art network embedding methods on several different tasks. Additionally, we demonstrate the benefits of modeling uncertainty – by analyzing it we can estimate neighborhood diversity and detect the intrinsic latent dimensionality of a graph.
AAAI

Bayesian Robust Attributed Graph Clustering: Joint Learning of Partial Anomalies and Group Structure

Aleksandar Bojchevski and Stephan Günnemann

In Conference on Artificial Intelligence, AAAI 2018

Abs PDF Code Poster

We study the problem of robust attributed graph clustering. In real data, the clustering structure is often obfuscated due to anomalies or corruptions. While robust methods have been recently introduced that handle anomalies as part of the clustering process, they all fail to account for one core aspect: since attributed graphs consist of two views (network structure and attributes), anomalies might materialize only partially, i.e. instances might be corrupted in one view but perfectly fit in the other. In this case, we can still derive meaningful cluster assignments. Existing works only consider complete anomalies. In this paper, we present a novel probabilistic generative model (PAICAN) that explicitly models partial anomalies by generalizing ideas of Degree Corrected Stochastic Block Models and Bernoulli Mixture Models. We provide a highly scalable variational inference approach with runtime complexity linear in the number of edges. The robustness of our model w.r.t. anomalies is demonstrated by our experimental study, outperforming state-of-the-art competitors.
BMC

LocText: Relation Extraction of Protein Localizations to Assist Database Curation

Juan Miguel Cejuela, Shrikant Vinchurkar, Tatyana Goldberg, Madhukar S. Prabhu Shankar, Ashish Baghudana, Aleksandar Bojchevski, Carsten Uhlig, André Ofner, Pandu Raharja-Liu, Lars Juhl Jensen, and others

BMC Bioinformatics 2018

Abs HTML Code

The subcellular localization of a protein is an important aspect of its function. However, the experimental annotation of locations is not even complete for well-studied model organisms. Text mining might aid database curators to add experimental annotations from the scientific literature. Existing extraction methods have difficulty distinguishing relationships between proteins and cellular locations co-mentioned in the same sentence. LocText was created as a new method to extract protein locations from abstracts and full texts. LocText learned patterns from syntax parse trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast (Saccharomyces cerevisiae), and thale cress (Arabidopsis thaliana). Examining 60 novel, text-mined annotations, we found that 65% (human), 85% (yeast), and 80% (cress) were correct. Of all validated annotations, 40% were completely novel, i.e. appeared neither in the annotations nor in the text descriptions of Swiss-Prot. LocText provides a cost-effective, semi-automated workflow to assist database curators in identifying novel protein localization annotations. The annotations suggested through text-mining would be verified by experts to guarantee high-quality standards of manually-curated databases such as Swiss-Prot.

2017

KDD

Robust Spectral Clustering for Noisy Data: Modeling Sparse Corruptions Improves Latent Embeddings

Aleksandar Bojchevski, Yves Matkovic, and Stephan Günnemann

In International Conference on Knowledge Discovery and Data Mining, KDD 2017

Abs HTML PDF Code Poster

Spectral clustering is one of the most prominent clustering approaches. However, it is highly sensitive to noisy input data. In this work, we propose a robust spectral clustering technique able to handle such scenarios. To achieve this goal, we propose a sparse and latent decomposition of the similarity graph used in spectral clustering. In our model, we jointly learn the spectral embedding as well as the corrupted data – thus, enhancing the clustering performance overall. We propose algorithmic solutions to all three established variants of spectral clustering, each showing linear complexity in the number of edges. Our experimental analysis confirms the significant potential of our approach for robust spectral clustering. Supplementary material is available at www.kdd.in.tum.de/RSC.
BioInf

nala: Text Mining Natural Language Mutation Mentions

Juan Miguel Cejuela, Aleksandar Bojchevski, Carsten Uhlig, Rustem Bekmukhametov, Sanjeev Kumar Karn, Shpend Mahmuti, Ashish Baghudana, Ankit Dubey, Venkata P Satagopam, and Burkhard Rost

Bioinformatics 2017

Abs HTML Code

The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions in natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% of mentions. For NL mentions the corresponding value shot up to 100% nala-only.

workshops

MLG

Is PageRank All You Need for Scalable Graph Neural Networks?

Aleksandar Bojchevski, Johannes Gasteiger, Bryan Perozzi, Martin Blais, Amol Kapoor, Michal Lukasik, and Stephan Günnemann

In International Workshop on Mining and Learning with Graphs, MLG 2019

Abs PDF Code

Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, efficiently utilizing them on web-scale data remains a challenge despite related advances in research. Most recently proposed scalable GNNs rely on an expensive recursive message-passing procedure to propagate information through the graph. We circumvent this limitation by leveraging connections between GNNs and personalized PageRank and we develop a model that incorporates multi-hop neighborhood information in a single (non-recursive) step. Our work-in-progress approach PPRGo is significantly faster than multi-hop models while maintaining state-of-the-art prediction performance. We demonstrate the strengths and scalability of our approach on graphs orders of magnitude larger than typically considered in the literature.
GEM

Dual-primal Graph Convolutional Networks

Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and Michael M Bronstein

In Graph Embedding and Mining Workshop, GEM 2019

Abs arXiv

In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows learning both vertex and edge features and generalizes the previous graph attention (GAT) model. We provide extensive experimental validation showing state-of-the-art results on a variety of tasks tested on established graph benchmarks, including CORA and Citeseer citation networks as well as MovieLens, Flixter, Douban and Yahoo Music graph-guided recommender systems.
R2L

Pitfalls of Graph Neural Network Evaluation

Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann

In Relational Representation Learning Workshop, R2L 2018

Abs arXiv Code

Semi-supervised node classification in graphs is a fundamental problem in graph mining, and the recently proposed graph neural networks (GNNs) have achieved unparalleled results on this task. Due to their massive success, GNNs have attracted a lot of attention, and many novel architectures have been put forward. In this paper we show that existing evaluation strategies for GNN models have serious shortcomings. We show that using the same train/validation/test splits of the same datasets, as well as making significant changes to the training procedure (e.g. early stopping criteria) precludes a fair comparison of different architectures. We perform a thorough empirical evaluation of four prominent GNN models and show that considering different splits of the data leads to dramatically different rankings of models. Even more importantly, our findings suggest that simpler GNN architectures are able to outperform the more sophisticated ones if the hyperparameters and the training procedure are tuned fairly for all models.
ICDMW

Anomaly Detection in Car-Booking Graphs

Oleksandr Shchur, Aleksandar Bojchevski, Mohamed Farghal, Stephan Günnemann, and Yusuf Saber

In International Conference on Data Mining Workshops, ICDM 2018

Abs HTML

The use of car-booking services has gained massive popularity in recent years – which has led to an increasing number of fraudsters who try to game these systems. In this paper we describe a framework for fraud detection in car-booking systems. Our core idea lies in casting this problem as an instance of anomaly detection in temporal graphs. Specifically, we use unsupervised techniques, such as dense subblock discovery, to detect suspicious activity. The proposed framework is able to adapt to the variations in the data inherent to the car-booking setting, and detects fraud with high precision. This work is performed in collaboration with Careem, where the described framework is currently being deployed in production.

theses

PhD

Machine Learning on Graphs in the Presence of Noise and Adversaries
Technical University of Munich, 2020
Abs

From protein interactions to social networks, complex systems of interlinked entities are endemic in a connected world and graphs are a powerful abstraction for capturing their structure. Accordingly, we have a rich literature of machine learning techniques for graph data to solve problems ranging from fraud detection to cancer classification. Since data is unreliable in reality, understanding the robustness of these techniques to noise and adversaries is critical. The contributions of this thesis deepen our understanding of robustness for three types of models: unsupervised, generative, and semi-supervised. First, we derive a noise-resilient variant of the classical spectral embedding, and we introduce Gaussian embeddings that represent nodes as distributions to capture uncertainty. Then we study the sensitivity of node embeddings to graph poisoning. Next, we develop a generative model that explicitly accounts for anomalies and detects clusters obfuscated by noise. Finally, we derive provable robustness guarantees. We propose the first certificate w.r.t. structure perturbations for a large class of PageRank-based models, and we derive a general certificate for discrete data applicable to any graph classifier.
MSc

Semi-supervised Learning for Biomedical Named-Entity Recognition
Technical University of Munich, 2015
Abs PDF Code Slides

As the volume of published research in the biomedical domain increases, the need for effective information extraction systems grows in parallel. In this context, the task of named-entity recognition (NER) is essential. NER is defined as the classification of words in free text that represent predefined categories such as genes, proteins or other entities. As a specific application of NER, the main focus of this thesis is the recognition of mutation mentions from the biomedical literature. More specifically, we aim to create a model able to recognize mutation mentions expressed in natural language. The current state-of-the-art method, tmVar, is only able to recognize a small subset of standard or semi-standard mentions. Our method both outperforms tmVar on those types of mentions and is also able to recognize natural language (NL) mentions. Previously no other method considered NL mutation mentions. The performance of NER machine learning models is intrinsically limited by the availability of high-quality annotated corpora. The construction of such corpora is costly – especially when expert annotators are required. In the biomedical domain, the difficulty of the task is even greater, since the number of possible named entities is higher and keeps growing with new discoveries. To combat the lack of large annotated corpora, we turn to the exploitation of large volumes of unlabeled text, applying a semi-supervised learning approach. Using techniques for unsupervised feature learning we aim to increase the performance of traditional NER models. More specifically, this thesis focuses on augmenting common conditional random field (CRF) approaches combined with novel word representation features learned from large bodies of biomedical text. Furthermore, using an active learning approach we extend an existing corpus of mutation mentions (IDP4) with additional NL mentions.
Finally, and in support of evaluating our semi-supervised learning approach, we develop a complete pipeline for biomedical named-entity recognition including preprocessing steps, feature generation, model learning and normalized predictions. Our extended corpus, NER tool and pipeline framework are all open sourced on GitHub.
BEng

Personality Prediction Based on Information from Social Networks
Ss. Cyril and Methodius University, 2013