Safro Research Group @ University of Delaware

Monday, August 30, 2021

Tutorial on the quantum approximate optimization algorithm, its applications and simulation

We recorded this tutorial for the IEEE International Conference on Quantum Computing and Engineering (QCE) 2020. It consists of four parts:

QAOA theory and quantum computing basics
Hands-on example of QAOA and MaxCut
Introduction to problem decomposition and solving large-scale problems with QAOA
Tensor networks and simulation of QAOA with classical computers

Thursday, February 25, 2021

Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

Combinatorial scientific computing in action! Our paper on accelerating 3D discrete fracture network flow simulations by multilevel graph partitioning is accepted in Mathematical Geosciences!

Hayato Ushijima-Mwesigwa, Jeffrey D. Hyman, Aric Hagberg, Ilya Safro, Satish Karra, Carl W. Gable, Gowri Srinivasan "Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations", accepted in Mathematical Geosciences, preprint at https://arxiv.org/abs/1902.08029, 2020

We present a topology-based method for mesh-partitioning in three-dimensional discrete fracture network (DFN) simulations that takes advantage of the intrinsic multi-level nature of a DFN. DFN models are used to simulate flow and transport through low-permeability fractured media in the subsurface by explicitly representing fractures as discrete entities. The governing equations for flow and transport are numerically integrated on computational meshes generated on the interconnected fracture networks. Modern high-fidelity DFN simulations require high-performance computing on multiple processors where performance and scalability depend partially on obtaining a high-quality partition of the mesh to balance workloads and minimize communication across all processors.

The discrete structure of a DFN naturally lends itself to various graph representations, which can be thought of as coarse-scale representations of the computational mesh. Using this concept, we develop a variant of the multilevel graph partitioning algorithm to partition the mesh of a DFN. We compare the performance of this DFN-based mesh-partitioning with standard multi-level graph partitioning using graph-based metrics (cut, imbalance, partitioning time), computation-based metrics (FLOPS, iterations, solver time), and total run time. The DFN-based partition and the mesh-based partition are comparable in terms of the graph-based metrics, but the DFN-based partition is obtained several orders of magnitude faster. The computation-based metrics show comparable performance between both methods so, in combination, the DFN-based partition is several orders of magnitude faster than the mesh-based partition.
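To make two of the graph-based metrics named above concrete, here is a minimal sketch of how edge cut and imbalance can be computed for a given partition. The graph, the two partitions, and the function names are hypothetical and are not taken from the paper; imbalance is defined here by one common convention (largest part size divided by the ideal size), which may differ from the paper's definition.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest part size divided by the ideal size n/k (1.0 is perfectly balanced)."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.values()) / ideal

# Hypothetical 6-vertex graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Partition that follows the triangle structure.
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
# Partition that ignores the structure (assumed for contrast only).
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 0}

for name, part in (("good", good), ("bad", bad)):
    print(name, "cut =", edge_cut(edges, part), "imbalance =", imbalance(part, 2))
```

A lower cut means less communication between processors, and an imbalance close to 1.0 means the work is spread evenly.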

ELRUNA: Network Alignment Algorithm

Our network alignment algorithm ELRUNA is accepted in ACM Journal of Experimental Algorithmics. It turns out that three relatively simple node similarity rules can successfully compete with several state-of-the-art algorithms and improve both the running time and alignment quality. You can get it at https://github.com/BridgelessAlexQiu/ELRUNA

Zirou Qiu, Ruslan Shaydulin, Xiaoyuan Liu, Yuri Alexeev, Christopher S. Henry, Ilya Safro "ELRUNA: Elimination Rule-based Network Alignment", accepted in ACM Journal of Experimental Algorithmics, preprint at https://arxiv.org/abs/1911.05486, 2020

Networks model a variety of complex phenomena across different domains. In many applications, one of the most essential tasks is to align two or more networks to infer the similarities between cross-network vertices and discover potential node-level correspondence. In this paper, we propose ELRUNA (Elimination rule-based network alignment), a novel network alignment algorithm that relies exclusively on the underlying graph structure. Under the guidance of the elimination rules that we defined, ELRUNA computes the similarity between a pair of cross-network vertices iteratively by accumulating the similarities between their selected neighbors. The resulting cross-network similarity matrix is then used to infer a permutation matrix that encodes the final alignment of cross-network vertices. In addition to the novel alignment algorithm, we also improve the performance of local search, a commonly used post-processing step for solving the network alignment problem, by introducing a novel selection method RAWSEM (Random walk based selection method) based on the propagation of the levels of mismatching (defined in the paper) of vertices across the networks. The key idea is to pass on the initial levels of mismatching of vertices throughout the entire network in a random-walk fashion. Through extensive numerical experiments on real networks, we demonstrate that ELRUNA significantly outperforms the state-of-the-art alignment methods in terms of alignment accuracy under lower or comparable running time. Moreover, ELRUNA is robust to network perturbations such that it can maintain a close-to-optimal objective value under a high level of noise added to the original networks. Finally, the proposed RAWSEM can further improve the alignment quality with fewer iterations compared with the naive local search method.

Thursday, February 11, 2021

Can we outperform Quantum Approximate Optimization Algorithm?

Check out our new paper:

Xiaoyuan Liu, Anthony Angone, Ruslan Shaydulin, Ilya Safro, Yuri Alexeev, Lukasz Cincio "Layer VQE: A Variational Approach for Combinatorial Optimization on Noisy Quantum Computers", preprint at https://arxiv.org/abs/2102.05566, 2021

We propose a hybrid quantum-classical algorithm, Layer Variational Quantum Eigensolver (L-VQE), inspired by the Variational Quantum Eigensolver (VQE). L-VQE is a heuristic approach to solve combinatorial optimization problems on near-term intermediate-scale quantum devices. We demonstrate the potential of the proposed approach by applying it to the problem of community detection, a famous problem in network science. Our large-scale numerical simulation study shows that L-VQE has the potential to outperform Quantum Approximate Optimization Algorithm (QAOA), and is more robust to sampling noise as compared with standard VQE approaches.

Literature-based knowledge discovery to accelerate COVID-19 research

Our new paper on customization of AGATHA knowledge discovery model for COVID-19 is out!

Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman, Ilya Safro

"Accelerating COVID-19 research with graph mining and transformer-based learning", 2021

https://www.biorxiv.org/content/10.1101/2021.02.11.430789v1

In 2020, the White House released the "Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset," wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. We present automated general-purpose hypothesis generation systems, AGATHA-C and AGATHA-GP, for COVID-19 research. The systems are based on graph mining and the transformer model. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains (in some domains up to 0.97 ROC AUC) in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing a domain-expert-curated study, we show that the systems are able to discover ongoing research findings such as the relationship between COVID-19 and the hormone oxytocin.

Wednesday, December 23, 2020

Reboot the Computing-Research Publication Systems

Many good points by Moshe Vardi on the current CS research publication system in Communications of the ACM

Reboot the Computing-Research Publication Systems: The virtualization of conferences due to COVID-19 has sharpened my conviction that the computing-research publication system is badly broken and in need of a serious reboot.

Saturday, December 12, 2020

Do we always need big data in text mining?

Do we always need big data in text mining? Can we filter it? Check out our new paper "Accelerating Text Mining Using Domain-Specific Stop Word Lists", accepted at IWBDR: https://arxiv.org/pdf/2012.02294.pdf

Text preprocessing is an essential step in text mining. Removing words that can negatively impact the quality of prediction algorithms or are not informative enough is a crucial storage-saving technique in text indexing and results in improved computational efficiency. Typically, a generic stop word list is applied to a dataset regardless of the domain. However, many common words are different from one domain to another but have no significance within a particular domain. Eliminating domain-specific common words in a corpus reduces the dimensionality of the feature space, and improves the performance of text mining tasks. In this paper, we present a novel mathematical approach for the automatic extraction of domain-specific words called the hyperplane-based approach. This new approach depends on the notion of a low-dimensional representation of a word in vector space and its distance from a hyperplane. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. We compare the hyperplane-based approach with other feature selection methods, namely χ² and mutual information. An experimental study is performed on three different datasets and five classification algorithms, and we measure the dimensionality reduction and the increase in the classification performance. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information. The computational time to identify the domain-specific words is significantly lower than mutual information.
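The geometric ingredient mentioned in the abstract, the distance of a word vector from a hyperplane, can be sketched as follows. This only illustrates the standard point-to-hyperplane distance formula (w·x + b) / ||w||. The word vectors, the hyperplane, and the function name are hypothetical, and how the paper turns such distances into a stop word list is not described here.

```python
import math

def hyperplane_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("normal vector must be nonzero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical 2-D word vectors and hyperplane; none of these values come from the paper.
w, b = (3.0, 4.0), -5.0
word_vectors = {"patient": (3.0, 4.0), "the": (0.6, 0.8), "gene": (0.0, 0.0)}

distances = {word: hyperplane_distance(v, w, b) for word, v in word_vectors.items()}
for word, d in distances.items():
    print(f"{word}: {d:+.3f}")
```

The sign tells on which side of the hyperplane a word lies, and the magnitude tells how far away it is.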

Thursday, December 10, 2020

Quantum Approximate Optimization Algorithm and Symmetry

Our new paper on QAOA symmetry and its predictability using classical symmetry is out! We analyze the connections between the symmetries of the objective function and the symmetries of QAOA dynamics, with applications to performance prediction, simulation and more: https://scirate.com/arxiv/2012.04713

Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, Ilya Safro "Classical symmetries and QAOA", 2020

We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques towards predicting QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.
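As a small classical illustration of what an objective function symmetry is, the sketch below checks by brute force that the MaxCut objective on a 4-cycle is unchanged by a global bit flip and by a rotation of the vertex labels. The graph and helper names are hypothetical. The code verifies only the classical invariance and does not simulate QAOA or its measurement probabilities.

```python
from itertools import product

def cut_value(edges, x):
    """MaxCut objective: number of edges whose endpoints get different bits."""
    return sum(1 for u, v in edges if x[u] != x[v])

# Hypothetical example graph: the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

def flip(x):
    """Global bit flip, a symmetry of the MaxCut objective on every graph."""
    return tuple(1 - b for b in x)

def rotate(x):
    """Shift every label by one position, an automorphism of the cycle."""
    return tuple(x[(i - 1) % n] for i in range(n))

all_strings = list(product((0, 1), repeat=n))
flip_ok = all(cut_value(edges, x) == cut_value(edges, flip(x)) for x in all_strings)
rotate_ok = all(cut_value(edges, x) == cut_value(edges, rotate(x)) for x in all_strings)
best = max(cut_value(edges, x) for x in all_strings)

print("invariant under bit flip:", flip_ok)
print("invariant under rotation:", rotate_ok)
print("maximum cut:", best)
```

Bitstrings related by such a symmetry are exactly the "states connected by such symmetries" that the abstract refers to.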

Friday, November 6, 2020

On improving nonlinear SVM scalability using multilevel frameworks

Our paper "AML-SVM: Adaptive Multilevel Learning with Support Vector Machines" is accepted at the 2020 IEEE International Conference on Big Data. https://arxiv.org/abs/2011.02592

The support vector machine (SVM) is one of the most widely used and practical optimization-based classification models in machine learning because of its interpretability and flexibility to produce high-quality results. However, big data poses a difficulty for the most sophisticated but relatively slow version of SVM, namely, the nonlinear SVM. The complexity of nonlinear SVM solvers and the number of elements in the kernel matrix increase quadratically with the number of samples in the training data. Therefore, both runtime and memory requirements are negatively affected. Moreover, the parameter fitting has extra kernel parameters to tune, which increases the runtime even further. This paper proposes an adaptive multilevel learning framework for the nonlinear SVM, which addresses these challenges, improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance. The integration of parameter fitting in the hierarchical learning framework and an adaptive process to stop unnecessary computation significantly reduce the running time while increasing the overall performance. The experimental results demonstrate reduced variance on prediction over validation and test data across levels in the hierarchy, and significant speedup compared to state-of-the-art nonlinear SVM libraries without a decrease in the classification quality.

Saturday, September 19, 2020

How to use NISQ quantum devices to solve large-scale problems?

Our paper on multilevel hybrid quantum-classical optimization "Multilevel Combinatorial Optimization Across Quantum Architectures" is accepted in ACM Transactions on Quantum Computing! Big kudos to the two lead authors, former students Hayato and Ruslan! https://arxiv.org/abs/1910.09985

Emerging quantum processors provide an opportunity to explore new approaches for solving traditional problems in the post-Moore's-law supercomputing era. However, the limited number of qubits makes it infeasible to tackle massive real-world datasets directly in the near future, leading to new challenges in utilizing these quantum processors for practical purposes. Hybrid quantum-classical algorithms that leverage both quantum and classical types of devices are considered one of the main strategies to apply quantum computing to large-scale problems. In this paper, we advocate the use of multilevel frameworks for combinatorial optimization as a promising general paradigm for designing hybrid quantum-classical algorithms. In order to demonstrate this approach, we apply this method to two well-known combinatorial optimization problems, namely, the Graph Partitioning Problem, and the Community Detection Problem. We develop hybrid multilevel solvers with quantum local search on D-Wave's quantum annealer and IBM's gate-model based quantum processor. We carry out experiments on graphs that are orders of magnitude larger than the current quantum hardware size, and we observe results comparable to state-of-the-art solvers in terms of quality of the solution.

Friday, August 21, 2020

New NSF grant to develop a simulator for quantum computing

NSF awarded a grant to develop a large-scale QAOA simulator! This is a collaborative project with Yuri Alexeev @ University of Chicago.

Generating biomedical scientific hypotheses with AGATHA

Accepted paper in the 29th ACM International Conference on Information and Knowledge Management (CIKM)

Sybrandt, Tyagin, Shtutman, Safro "AGATHA: Automatic Graph-mining and Transformer based Hypothesis Generation Approach", preprint at http://arxiv.org/pdf/2002.05635.pdf

Medical research is risky and expensive. Drug discovery requires researchers to efficiently winnow thousands of potential targets to a small candidate set. However, scientists spend significant time and money long before seeing the intermediate results that ultimately determine this smaller set. Hypothesis generation systems address this challenge by mining the wealth of publicly available scientific information to predict plausible research directions. We present AGATHA, a deep-learning hypothesis generation system that learns a data-driven ranking criterion to recommend new biomedical connections. We massively validate our system with a temporal holdout wherein we predict connections first introduced after 2015 using data published beforehand. We additionally explore biomedical sub-domains, and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types. Furthermore, we perform an ablation study to examine the aspects of our semantic network that most contribute to recommendation quality. Overall, AGATHA achieves best-in-class recommendation quality when compared to other hypothesis generation systems built to predict across all available biomedical literature. Reproducibility: All code, experimental data, and pre-trained models are available online: sybrandt.com/2020/agatha.

Sunday, July 12, 2020

Two papers on graph representation learning

Accepted papers at Workshop on Mining and Learning with Graphs co-located with ACM KDD 2020!

1) Ding, Zhang, Sybrandt, Safro "Unsupervised Hierarchical Graph Representation Learning by Mutual Information Maximization", 2020

Graph representation learning based on graph neural networks (GNNs) can greatly improve the performance of downstream tasks, such as node and graph classification. However, the general GNN models do not aggregate node information in a hierarchical manner, and can miss key higher-order structural features of many graphs. The hierarchical aggregation also enables the graph representations to be explainable. In addition, supervised graph representation learning requires labeled data, which is expensive and error-prone. To address these issues, we present an unsupervised graph representation learning method, Unsupervised Hierarchical Graph Representation (UHGR), which can generate hierarchical representations of graphs. Our method focuses on maximizing mutual information between "local" and high-level "global" representations, which enables us to learn the node embeddings and graph embeddings without any labeled data. To demonstrate the effectiveness of the proposed method, we perform the node and graph classification using the learned node and graph embeddings. The results show that the proposed method achieves comparable results to state-of-the-art supervised methods on several benchmarks. In addition, our visualization of hierarchical representations indicates that our method can capture meaningful and interpretable clusters.

2) Sybrandt, Safro "FOBE and HOBE: First- and High-Order Bipartite Embeddings", 2020

Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods utilized in these applications would be better served with specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between node neighborhoods. When sampling higher-order relationships, we reinforce similarities through algebraic distance on graphs. We also introduce ensemble embeddings to combine both into a "best of both worlds" embedding. The proposed methods are evaluated on link prediction and recommendation tasks and compared with other state-of-the-art embeddings. While all of them are highly beneficial in applications, we demonstrate that none of the considered embeddings is clearly superior (in contrast to what is claimed in many papers), and discuss the trade-offs present among them.

Wednesday, June 10, 2020

PhD thesis defense

Congratulations to Dr. Ruslan Shaydulin for successfully defending his Ph.D. thesis "Quantum and Classical Multilevel Algorithms for (Hyper)Graphs"! Ruslan will join Argonne National Lab on a Maria Goeppert Mayer (MGM) fellowship in August. This is the second defense in our lab this week!

Monday, June 8, 2020

PhD thesis defense

Congratulations to Dr. Ehsan Sadrfaridpour for successfully defending his Ph.D. thesis "Fast Machine Learning Algorithms for Massive Datasets with Applications in Biomedical Domain"! Ehsan will join Lowe's data science team this summer.

Friday, May 15, 2020

Our work in HPCwire

News coverage in HPCwire about our collaboration with Argonne on hybrid quantum-classical computing
Argonne Receives Two Awards from DARPA for Quantum Information Science

Tuesday, April 21, 2020

NSF grant to tackle COVID-19

Our team received an NSF grant to tackle COVID-19 using our AI hypothesis generation system AGATHA
Clemson news coverage: Artificial intelligence could aid in fight against COVID-19

Ilya Safro of @socclemson said that his team will soon roll out a new artificial intelligence system aimed at helping researchers explore the scientific literature as they strive for new discoveries to combat the novel coronavirus. https://t.co/TwQyZqEUra pic.twitter.com/fh3K8iAoWO
— Clemson Engineering, Computing & Applied Sciences (@ClemsonCECAS) April 21, 2020

Tuesday, April 14, 2020

Board game night

Board game night from quarantine! pic.twitter.com/6env2hICaa
— Justin Sybrandt, Ph.D. (@Justin_Sybrandt) March 28, 2020

Monday, April 13, 2020

Combining the best of quantum and classical computing

News coverage from Clemson: Combining the best of quantum and classical computing

Friday, April 3, 2020

From SoC@Clemson

Congrats to Dr. Ilya Safro on receiving funding from @DARPA to work on hybrid quantum-classical algorithms for combinatorial optimization problems, a 4 year collaborative research with @argonne. He is looking for Ph.D. students to join the Algorithms & Computational Science lab. pic.twitter.com/YfIYVc7HzO
— School of Computing (@socclemson) April 3, 2020

Thursday, March 26, 2020

PhD thesis defense

Congratulations to Dr. Justin Sybrandt for successfully defending his Ph.D. thesis "Exploiting Latent Features of Text and Graphs"! Justin will join Google Brain this summer.

Wednesday, February 19, 2020

New papers on biomedical NLP and hypothesis generation

New papers on biomedical NLP+hypothesis generation.

Sybrandt, Safro "CBAG: Conditional Biomedical Abstract Generation", http://arxiv.org/pdf/2002.05637.pdf,

Sybrandt, Tyagin, Shtutman, Safro "AGATHA: Automatic Graph-mining and Transformer based Hypothesis Generation" http://arxiv.org/pdf/2002.05635.pdf

Friday, January 17, 2020

Congratulations to PhD student Ruslan Shaydulin

Congratulations to PhD student Ruslan Shaydulin, advised by Dr. Ilya Safro, for receiving the 2020 Maria Goeppert Mayer fellowship for postdoctoral studies at Argonne National Lab. Ruslan will start in Fall 2020. He was selected among many candidates from top schools and different disciplines.

Saturday, October 19, 2019

How to find the best location for wireless charging lanes

Our paper is accepted in the Journal of Industrial and Management Optimization

Ushijima-Mwesigwa, Khan, Chowdhury, Safro "Optimal Installation for Electric Vehicle Wireless Charging Lanes", 2019

The emergence of electric vehicle wireless charging technology, where a whole lane can be turned into a charging infrastructure, leads to new challenges in the design and analysis of road networks. From a network perspective, a major challenge is determining the most important nodes with respect to the placement of the wireless charging lanes. In other words, given a limited budget, cities could face the decision problem of where to place these wireless charging lanes. With a heavy price tag, a placement without a careful study can lead to inefficient use of limited resources. In this work, the placement of wireless charging lanes is modeled as an integer programming problem. The basic formulation is used as a building block for different realistic scenarios. We carry out experiments using real geospatial data and compare our results to different network-based heuristics.

Reproducibility: all datasets, algorithm implementations and mathematical programming formulation presented in this work are available at https://github.com/hmwesigwa/smartcities.git

Monday, October 14, 2019

Congratulations to PhD student Justin Sybrandt for being selected among the top 12 of more than 3000 summer interns based on his achievements. Over the summer, Justin was an intern at Facebook working on Instagram.

Wednesday, September 25, 2019

Multistart Methods for Quantum Approximate Optimization

Best student paper award at IEEE HPEC2019 goes to our lab! Congratulations to the leading student Ruslan Shaydulin!

Ruslan Shaydulin, Ilya Safro, Jeffrey Larson "Multistart Methods for Quantum Approximate Optimization", accepted at IEEE High Performance Extreme Computing Conference (HPEC) 2019, preprint at https://arxiv.org/abs/1905.08768

Hybrid quantum-classical algorithms such as the quantum approximate optimization algorithm (QAOA) are considered one of the most promising approaches for leveraging near-term quantum computers for practical applications. Such algorithms are often implemented in a variational form, combining classical optimization methods with a quantum machine to find parameters to maximize performance. The quality of the QAOA solution depends heavily on the quality of the parameters produced by the classical optimizer. Moreover, multiple local optima in the space of parameters make the task harder for the classical optimizer. In this paper we study the use of a multistart optimization approach within a QAOA framework to improve the performance of quantum machines on important graph clustering problems. We also demonstrate that reusing the optimal parameters from similar problems can improve the performance of classical optimization methods, expanding on similar results for MAXCUT.
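The general idea of multistart optimization can be sketched on a toy problem: run a local optimizer from several starting points and keep the best result. The objective below is a hypothetical one-dimensional function with two local minima, not a QAOA landscape. The local optimizer is plain gradient descent and the starting points are a fixed grid, both chosen for simplicity and reproducibility rather than taken from the paper.

```python
def f(x):
    """Toy objective with two local minima (hypothetical, not a QAOA landscape)."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def local_search(x0, step=0.01, iters=5000):
    """Plain gradient descent; which local minimum it reaches depends on x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

def multistart(starts):
    """Run the local search from every start and keep the best result."""
    results = [local_search(x0) for x0 in starts]
    return min(results, key=f)

single = local_search(1.5)                       # gets stuck in the worse minimum
best = multistart([-2.0, -1.0, 0.0, 1.0, 2.0])   # fixed grid of assumed start points

print(f"single start: x = {single:.4f}, f = {f(single):.4f}")
print(f"multistart:   x = {best:.4f}, f = {f(best):.4f}")
```

A single run started at 1.5 settles in the shallower minimum, while the multistart run also explores the basin of the deeper one.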

Thursday, June 20, 2019

The best of both worlds: how to solve real problems on modern quantum computers

Our work on hybrid quantum-classical algorithms is featured in the news

The best of both worlds: how to solve real problems on modern quantum computers

Wednesday, June 19, 2019

Hybrid quantum-classical algorithms

Our paper on hybrid quantum-classical algorithms is featured in IEEE Computer, the June issue on quantum realism.

Ruslan Shaydulin, Hayato Ushijima-Mwesigwa, Christian F.A. Negre, Ilya Safro, Susan M. Mniszewski, Yuri Alexeev "Hybrid Approach for Solving Optimization Problems on Small Quantum Computers", IEEE Computer, vol. 52(6), pp. 18-26, 2019

Solving larger-sized problems is an important area of research in quantum computing. Designing hybrid quantum-classical algorithms is a promising approach to solving this. We discuss decomposition-based hybrid approaches for solving optimization problems and demonstrate them for applications related to community detection.

Thursday, May 23, 2019

Solving network community detection problem on quantum computers

Accepted paper in the journal Advanced Quantum Technologies

Shaydulin, Ushijima-Mwesigwa, Safro, Mniszewski, Alexeev "Network Community Detection On Small Quantum Computers", 2019, preprint at https://arxiv.org/abs/1810.12484

In recent years, a number of quantum computing devices with small numbers of qubits have become available. A hybrid quantum local search (QLS) approach that combines a classical machine and a small quantum device to solve problems of practical size is presented. The proposed approach is applied to the network community detection problem. QLS is hardware‐agnostic and easily extendable to new quantum computing devices as they become available. It is demonstrated to solve the 2‐community detection problem on graphs of sizes of up to 410 vertices using the 16‐qubit IBM quantum computer and D‐Wave 2000Q, and the performance of both devices is compared with the optimal solutions. The results herein demonstrate that QLS performs similarly in terms of quality of the solution and the number of iterations to convergence on both types of quantum computers and it is capable of achieving results comparable to state‐of‐the‐art solvers in terms of quality of the solution including reaching the optimal solutions.

Monday, May 13, 2019

PhD thesis proposal defense

Congratulations to Justin Sybrandt for successfully defending his PhD thesis proposal!

Monday, May 6, 2019

Synthetic planar graph generation

Accepted paper in Applied Network Science journal

Chauhan, Gutfraind, Safro "Multiscale planar graph generation", 2019, preprint at https://arxiv.org/abs/1802.09617

The study of network representations of physical, biological, and social phenomena can help us better understand their structure and functional dynamics as well as formulate predictive models of these phenomena. However, due to the scarcity of real-world network data owing to factors such as cost and effort required in collection of network data and the sensitivity of this data towards theft and misuse, engineers and researchers often rely on synthetic data for simulations, hypothesis testing, decision making, and algorithm engineering. An important characteristic of infrastructure networks such as roads, water distribution and other utility systems is that they can be (almost fully) embedded in a plane; therefore, to simulate these systems we need realistic networks that are also planar. While the currently available synthetic network generators can model networks that exhibit realism, they do not guarantee or achieve planarity. In this paper we present a flexible algorithm that can synthesize realistic networks that are planar. The method follows a multi-scale randomized editing approach generating a hierarchy of coarsened networks of a given planar graph and introducing edits at various levels in the hierarchy. The method preserves the structural properties with minimal bias including the planarity of the network, while introducing realistic variability at multiple scales.

Reproducibility: All datasets and algorithm implementation presented in this work are available at https://bit.ly/2CjOUAS

Monday, April 29, 2019

Designing scalable nonlinear support vector machines

Accepted paper in Machine Learning journal

Sadrfaridpour, Razzaghi, Safro, "Engineering fast multilevel support vector machines", 2019, preprint at https://arxiv.org/abs/1707.07657

The computational complexity of solving a nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data presents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality than linear kernels but introduce extra kernel and model parameters, which require computationally expensive fitting. This increases the quality but also reduces the performance dramatically. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc, which allows an easy integration with scientific computing tasks. The experimental results demonstrate significant speedup compared to the state-of-the-art nonlinear SVM libraries.

Reproducibility: our source code, documentation and parameters are available at https://github.com/esadr/mlsvm

Saturday, March 30, 2019

Finding influential nodes in networks with consumable resources

Accepted paper in Network Science journal

Ushijima-Mwesigwa, Khan, Chowdhury, Safro, "Centralities for Networks with Consumable Resources", 2019, preprint at https://arxiv.org/abs/1903.00642

Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic, and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures lies an assumption describing the manner in which traffic (of information, social actors, particles, etc.) flows through the network. For example, some measures only count shortest paths while others consider random walks. This paper considers a spreading process in which a resource necessary for transit is partially consumed along the way while being refilled at special nodes on the network. Examples include fuel consumption of vehicles together with refueling stations, information loss during dissemination with error-correcting nodes, and consumption of ammunition of military troops while moving. We propose generalizations of the well-known measures of betweenness, random-walk betweenness, and Katz centralities to take such a spreading process with consumable resources into account. In order to validate the results, experiments on real-world networks are carried out by developing simulations based on well-known models such as Susceptible-Infected-Recovered and congestion with respect to particle hopping from vehicular flow theory. The simulation-based models are shown to be highly correlated with the proposed centrality measures.
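To make the spreading process concrete, here is a minimal sketch of movement with a consumable resource. It is an illustration under stated assumptions, not the centrality measures from the paper: every hop is assumed to cost one unit, a walker starts with `capacity` units, and reaching a node in `refill_nodes` restores the full capacity. The function name and parameters are hypothetical.

```python
from collections import deque


def shortest_feasible_hops(adj, source, target, capacity, refill_nodes):
    """Fewest hops from source to target under a resource limit.

    Assumptions (illustrative only): each hop consumes one unit, the walker
    starts with `capacity` units, and arriving at a node in `refill_nodes`
    resets the resource to `capacity`. Returns None if no walk is feasible.
    """
    seen = {(source, capacity)}
    queue = deque([(source, capacity, 0)])
    while queue:
        node, fuel, hops = queue.popleft()
        if node == target:
            return hops
        if fuel == 0:
            continue  # stranded: nothing left to spend on another hop
        for nxt in adj[node]:
            nxt_fuel = capacity if nxt in refill_nodes else fuel - 1
            state = (nxt, nxt_fuel)
            if state not in seen:
                seen.add(state)
                queue.append((nxt, nxt_fuel, hops + 1))
    return None


# Direct route 0-1-2-3 has 3 hops; the detour 0-4-5-6-3 has 4 hops
# but passes through a refill node (5).
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 6],
       4: [0, 5], 5: [4, 6], 6: [5, 3]}
print(shortest_feasible_hops(adj, 0, 3, capacity=2, refill_nodes={5}))
```

With a capacity of 2 the direct route runs out of resource before reaching node 3, so the only feasible walk goes through the refill node. This is the kind of effect that shortest-path-based measures miss when they ignore consumption.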

Friday, March 29, 2019

Future of medicine - man or machine

Some thoughts about literature-based discovery and our automated biomedical hypothesis generation tool MOLIERE for Clemson World Magazine

Saturday, February 9, 2019

Does it help to charge the electric cars at intersections?

Accepted paper in Computer-Aided Civil and Infrastructure Engineering

Khan, Khan Chowdhury, Safro, Ushijima-Mwesigwa "Wireless Charging Utility Maximization and Intersection Control Delay Minimization Framework for Electric Vehicles"

This study presents the Wireless Charging Utility Maximization (WCUM) framework, which aims to maximize the utility of Wireless Charging Units (WCUs) for electric vehicle (EV) charging through the optimal WCU deployment at signalized intersections. Furthermore, the framework aims to minimize the control delay at all signalized intersections of the network. The framework consists of a two‐step optimization formulation, a dynamic traffic assignment model to calculate the user equilibrium, a traffic microsimulator to formulate the objective functions, and a global Mixed Integer Non‐Linear Programming (MINLP) optimization solver. An optimization problem is formulated for each intersection, and another for the entire network. The performance of the WCUM framework is tested using the Sioux Falls network. We perform a comparative study of 12 global MINLP solvers with a case study. Based on solution quality and computation time, we choose the Couenne solver for this framework.

https://onlinelibrary.wiley.com/doi/abs/10.1111/mice.12439

Sunday, February 3, 2019

Predicting bariatric surgery outcomes

Accepted paper in Annals of Operations Research

Razzaghi, Safro, Ewing, Sadrfaridpour, Scott "Predictive models for bariatric surgery risks with imbalanced medical datasets"

Bariatric surgery (BAR) has become a popular treatment for type 2 diabetes mellitus, which is among the most critical obesity-related comorbidities. Patients who have bariatric surgery are exposed to complications after surgery. Furthermore, the mid- to long-term complications after bariatric surgery can be deadly and increase the complexity of managing the safety of these operations and healthcare costs. Current studies on BAR complications have mainly used risk scoring for identifying patients who are more likely to have complications after surgery. However, these studies do not take into consideration the imbalanced nature of the data, where the size of the class of interest (patients who have complications after surgery) is relatively small. We propose the use of imbalanced classification techniques to tackle the imbalanced bariatric surgery data: synthetic minority oversampling technique (SMOTE), random undersampling, and ensemble learning classification methods including Random Forest, Bagging, and AdaBoost. Moreover, we improve classification performance by using Chi-squared, Information Gain, and Correlation-based feature selection techniques. We study the Premier Healthcare Database with a focus on the most frequent complications, including Diabetes, Angina, Heart Failure, and Stroke. Our results show that the ensemble learning-based classification techniques using any feature selection method mentioned above are the best approach for handling the imbalanced nature of the bariatric surgical outcome data. In our evaluation, we find a slight preference toward using the SMOTE method compared to the random undersampling method. These results demonstrate the potential of machine-learning tools as clinical decision support in identifying risks/outcomes associated with bariatric surgery and their effectiveness in reducing the surgery complications as well as improving patient care.
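For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The code below is a simplified illustration with hypothetical names and toy data, not the pipeline used in the paper; production work would normally rely on an established library implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by neighbor interpolation.

    `minority` is a list of equal-length numeric tuples with at least two
    entries. This is a simplified sketch of the SMOTE idea only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x, excluding x itself
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote(minority, n_synthetic=3):
    print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class, unlike simple duplication, which only repeats existing points.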

Monday, January 28, 2019

AI@Clemson

It was great discussing AI and text mining at the Clemson University research symposium on AI.

Saturday, January 12, 2019

Welcome new students!

Two new students, Zirou Qiu (MSc) and Korey Palmer (senior undergrad), are joining our research group.

Friday, January 11, 2019

Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning

Accepted paper at SIAM Multiscale Modeling and Simulation

Ruslan Shaydulin, Jie Chen, Ilya Safro "Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning", 2019, preprint at arXiv:1710.06552

Multilevel partitioning methods that are inspired by principles of multiscaling are the most powerful practical hypergraph partitioning solvers. Hypergraph partitioning has many applications in disciplines ranging from scientific computing to data science. In this paper we introduce the concept of algebraic distance on hypergraphs and demonstrate its use as an algorithmic component in the coarsening stage of multilevel hypergraph partitioning solvers. The algebraic distance is a vertex distance measure that extends hyperedge weights for capturing the local connectivity of vertices which is critical for hypergraph coarsening schemes. The practical effectiveness of the proposed measure and corresponding coarsening scheme is demonstrated through extensive computational experiments on a diverse set of problems. Finally, we propose a benchmark of hypergraph partitioning problems to compare the quality of other solvers.
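The following sketch gives a rough intuition for a relaxation-based vertex distance on a hypergraph. It is one plausible reading and not the paper's exact formulation: it assumes unit hyperedge weights, averages vertex values over each hyperedge, relaxes each vertex toward the mean of its incident hyperedges with a damping factor `omega`, and takes the largest gap over several random starting vectors. All names and parameter values are hypothetical.

```python
import random


def algebraic_distance(hyperedges, n_vertices, n_vectors=5,
                       n_sweeps=20, omega=0.5, seed=0):
    """Return dist(i, j): a relaxation-based distance between vertices.

    Sketch only: damped averaging over hyperedges with unit weights; the
    scheme in the paper may differ in weighting and normalization.
    """
    rng = random.Random(seed)
    incident = [[] for _ in range(n_vertices)]
    for e, verts in enumerate(hyperedges):
        for v in verts:
            incident[v].append(e)

    vectors = []
    for _ in range(n_vectors):
        x = [rng.uniform(-0.5, 0.5) for _ in range(n_vertices)]
        for _ in range(n_sweeps):
            # Value of a hyperedge: mean of its vertices.
            edge_val = [sum(x[v] for v in verts) / len(verts)
                        for verts in hyperedges]
            new_x = []
            for v in range(n_vertices):
                if incident[v]:
                    mean = sum(edge_val[e] for e in incident[v]) / len(incident[v])
                    new_x.append((1 - omega) * x[v] + omega * mean)
                else:
                    new_x.append(x[v])  # isolated vertex keeps its value
            x = new_x
        vectors.append(x)

    def dist(i, j):
        return max(abs(x[i] - x[j]) for x in vectors)

    return dist


# Two tight groups {0, 1, 2} and {3, 4, 5} joined by one small hyperedge.
dist = algebraic_distance([[0, 1, 2], [3, 4, 5], [2, 3]], n_vertices=6)
print(dist(0, 1), dist(0, 5))
```

Vertices that share the same hyperedges converge to nearly identical values after a few sweeps, while vertices in loosely connected groups stay apart for longer. A coarsening scheme can use such a signal to decide which vertices are safe to merge.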

Monday, November 26, 2018

Thesis defense

Congratulations to Varsha Chauhan for successfully defending her MSc thesis on planar graph generation!

Travel awards

Congratulations to Justin Sybrandt and Ruslan Shaydulin for receiving travel awards to present their papers at IEEE Big Data 2018 and APS 2018!

Thesis defense

Congratulations to Dr. Hayato Ushijima-Mwesigwa for successfully defending his Ph.D. thesis "Models for Networks with Consumable Resources"!

Community detection on NISQ devices

Accepted paper at the 3rd International Workshop on Post Moore's Era Supercomputing (PMES 2018)

Ruslan Shaydulin, Hayato Ushijima-Mwesigwa, Ilya Safro, Susan Mniszewski, Yuri Alexeev "Community Detection Across Emerging Quantum Architectures", preprint at arXiv:1810.07765, 2018

Sunday, November 25, 2018

Can we predict crimes in Chicago?

Our paper is accepted at IEEE Big Data 2018

Saroj K. Dash, I. Safro, Ravisutha S. Srinivasamurthy "Spatio-temporal prediction of crimes using network analytic approach", preprint at arXiv:1808.06241, 2018

A larger share of the population lives in urban areas today than at any time in human history, and this trend is expected to continue in the coming years. Studies report that nearly 80.7% of the total population of the USA lives in urban areas. By 2030, nearly 60% of the world's population will live in or move to cities. With the increase in urban population, it is important to keep an eye on criminal activities. Doing so allows governments to deploy intelligent policing systems, and hence many government agencies and local authorities have made crime data publicly available. In this paper, we analyze Chicago city crime data fused with other social information sources using network analytic techniques to predict criminal activity for the next year. We observe that as we add more layers of data that represent different aspects of society, the quality of prediction improves. Our models predict not only the total number of crimes for the whole city of Chicago but also the number of crimes for each crime type and for different regions of the city.

Saturday, November 24, 2018

Two papers accepted at IEEE Big Data 2018

Sybrandt, Carrabba, Herzog, Safro "Are Abstracts Enough for Hypothesis Generation?", arXiv:1804.05942

Sybrandt, Shtutman, Safro "Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking", arXiv:1802.03793