Safro Research Group @ University of Delaware

Thursday, January 4, 2024

Benchmarking Biomedical Literature-based Discovery and Hypothesis Generation Systems

Update: The paper is accepted for publication in BMC Bioinformatics.

We introduce a benchmarking framework Dyport for evaluating biomedical hypothesis generation (HG) and literature based discovery (LBD) systems. The evaluation of HG and LBD is still one of the major problems of these systems, especially when it comes to fully automated large-scale general purpose systems. For these, a massive assessment (that is normal in the machine learning and general AI domains) is often infeasible. One traditional evaluation approach is to make a system “rediscover” some of the landmark findings. However, this approach does not scale. Another traditional approach is to automatically discover some information in biomedical texts, train the system on historical data and test it on that "future" information. While this approach does scale well, the reliability and biomedical importance of the extracted test set are far from being illuminating.

Utilizing curated datasets, Dyport tests HG/LBD systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypothesis accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community.

Dyport is available at https://github.com/IlyaTyagin/Dyport

Paper: https://arxiv.org/pdf/2312.03303.pdf

Thursday, December 14, 2023

Ankit Kulshrestha's PhD Defense

Congratulations to Dr. Ankit Kulshrestha for successfully defending his Ph.D. thesis "A Machine Learning Approach To Improve Scalability and Robustness of Variational Quantum Circuits"! Ankit will join the quantum computing research group at Fujitsu Research USA.

Friday, September 1, 2023

Welcoming our new PhD students!

We're pleased to share that our group has expanded with the addition of two PhD students.

Bao Bach, joining from the Quantum Science and Engineering program, will be working on quantum machine learning and variational quantum algorithms.

Saeideh Valipour has started her journey with us through the Computer Science PhD program. Her research areas include text mining, machine learning, and biomedical data analysis.

Let's extend a warm welcome to Bao and Saeideh as they begin their research endeavors with our group.

Tuesday, August 29, 2023

Two papers are accepted at IEEE High-Performance and Extreme Computing

We are happy to announce that two of our papers have been accepted at the IEEE High-performance and Extreme Computing Conference (HPEC 2023).

"Hybrid Quantum-Classical Multilevel Approach for Maximum Cuts on Graphs" by Anthony Angone, Xiaoyuan Liu, Ruslan Shaydulin, and Ilya Safro.

This paper is focusing on the Max-Cut problem. The study introduces a scalable hybrid multilevel approach to solve large instances of Max-Cut using both classical solvers and the quantum approximate optimization algorithm (QAOA). The results showcase the excellent performance of both classical and hybrid quantum-classical methods.

The solver is publicly available at https://github.com/angone/MLMax-cut.

Congratulations to the lead student and first author, Anthony Angone!

"Decomposition Based Refinement for the Network Interdiction Problem" by Krish Matta, Xiaoyuan Liu, and Ilya Safro.

This paper introduces a novel algorithm for the shortest path network interdiction problem. By decomposing the problem into sub-problems, the method efficiently utilizes Ising Processing Units alongside classical solvers. The results demonstrate comparable quality to existing exact solvers and also highlight a significant reduction in computational time for large-scale instances.

The source code and experimental results can be accessed at https://github.com/krishxmatta/network-interdiction.

A special shout-out to the lead student and first author, Krish Matta. A significant portion of his paper was accomplished during his high school years! Once again, congratulations to Anthony and Krish for their outstanding work and contributions to the field. We look forward to presenting these papers at HPEC 2023!

Monday, July 10, 2023

Learning To Optimize Quantum Neural Network Without Gradients

Our paper is accepted in IEEE Quantum Computing and Engineering (QCE) 2023. The preprint is available at https://arxiv.org/abs/2304.07442

Congratulations to PhD student Ankit Kulshrestha and two former students from our lab, now quantum computing research scientists at Fujitsu Research, Xiaoyuan Liu and Hyato Ushijima-Mwesigwa!

Quantum Machine Learning is an emerging sub-field in machine learning where one of the goals is to perform pattern recognition tasks by encoding data into quantum states. This extension from classical to quantum domain has been made possible due to the development of hybrid quantum-classical algorithms that allow a parameterized quantum circuit to be optimized using gradient based algorithms that run on a classical computer. The similarities in training of these hybrid algorithms and classical neural networks has further led to the development of Quantum Neural Networks (QNNs). However, in the current training regime for QNNs, the gradients w.r.t objective function have to be computed on the quantum device. This computation is highly non-scalable and is affected by hardware and sampling noise present in the current generation of quantum hardware. In this paper, we propose a training algorithm that does not rely on gradient information. Specifically, we introduce a novel meta-optimization algorithm that trains a meta-optimizer network to output parameters for the quantum circuit such that the objective function is minimized. We empirically and theoretically show that we achieve a better quality minima in fewer circuit evaluations than existing gradient based algorithms on different datasets.

Thursday, June 1, 2023

Accepted paper in Nature Physics Reviews

Our paper "Quantum Computing for Finance" is accepted in Nature Physics Reviews.

https://www.nature.com/articles/s42254-023-00603-1

The paper and its long version available in arXiv https://arxiv.org/pdf/2201.02773.pdf discuss not only the quantum algorithms in one of the most fascinating application areas but also the realistic obstacles the quantum computing faces to achieve any practical advantage over classical computing.

Monday, May 1, 2023

Graduate student achievement

Congratulations to our PhD student Ankit Kulshrestha for receiving Rama and Shashi Marda fellowship for graduate studies! Ankit's research is in the areas of quantum and classical machine learning.

Saturday, April 1, 2023

Biomedical hypothesis generation and literature-based discovery for landscape design

It is incredibly exciting to see our work transcend the bounds of its intended domain and find novel applications in unexpected areas. We initially developed AGATHA, a biomedical hypothesis generation system, to facilitate knowledge discovery within biomedical sciences. Yet, to our surprise and excitement, AGATHA's algorithm found an unanticipated yet transformative use in landscape design. AGATHA's unique ability to identify intricate conceptual relationships was exploited to understand the nexus between emerging infectious diseases (EIDs) and deforestation, enabling landscape planners to tread innovative research pathways.

Our paper is available at arXiv https://arxiv.org/abs/2306.02588

David Marasco, Ilya Tyagin, Justin Sybrandt, James H. Spencer, Ilya Safro "Literature-based Discovery for Landscape Planning"

Sunday, January 15, 2023

Generating and optimizing water networks

Our paper is accepted in the Journal of Pipeline Systems Engineering!

Ahmad Momeni, Varsha Chauhan, Abdulrahman Bin Mahmoud, Kalyan Piratla, Ilya Safro "Generation of Synthetic Water Distribution Data Using a Multi-Scale Generator-Optimizer", Journal of Pipeline Systems Engineering and Practice, vol. 14(1), https://ascelibrary.org/doi/full/10.1061/JPSEA2.PSENG-1358, 2023

Rare or limited access to real-world data has widely been a stumbling block for the development and employment of design optimization and simulation models in water distribution systems (WDS). Primary reasons for such accessibility issues could include data unavailability and high security protocols. Synthetic data can play a major role as a reliable alternative to mimic and replicate real-world WDS for modeling purposes. This study puts forth a comprehensive approach to generate synthetic WDS infrastructural data by (1) employing graph-theory concepts to generate multitudinous WDS skeleton layouts through retaining the critical topological features of a given real WDS; and (2) assigning component sizes and operational features such as nodal demands, pump curves, pipe sizes, and tank elevations to the generated WDS skeleton layouts through a multiobjective genetic algorithm (GA)–based design optimization scheme. Thousands of such generated-optimized networks are statistically analyzed in terms of the fundamental WDS characteristics both collectively and granularly. An outstanding novelty in this study includes an automatedly integrated algorithmic function that attempts to (1) simultaneously optimize the generated network in a biobjective scheme, (2) rectify pipe intersections that violate pipeline embedding standards, and (3) correct the unusual triangular loops in the generator by honoring the conventional square-shaped loop connectivity in a WDS. The proposed modeling approach was demonstrated in this study using the popular Anytown water distribution benchmark system. Generation and optimization of such representative synthetic networks pave the way for extensive access to representative case-study models for academic and industrial purposes while the security of the real-world infrastructure data is not compromised.

Friday, September 30, 2022

Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation

We are excited to announce that our paper received the best student paper award at #IEEE High Performance Extreme Computing #HPEC2022! Leading student is Cameron Ibrahim. Congratulations to all!

Cameron Ibrahim, Danylo Lykov, Zichang He, Yuri Alexeev, Ilya Safro "Constructing Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation", 2022, preprint at https://arxiv.org/abs/2209.02895

One of the key problems in tensor network based quantum circuit simulation is the construction of a contraction tree which minimizes the cost of the simulation, where the cost can be expressed in the number of operations as a proxy for the simulation running time. This same problem arises in a variety of application areas, such as combinatorial scientific computing, marginalization in probabilistic graphical models, and solving constraint satisfaction problems. In this paper, we reduce the computationally hard portion of this problem to one of graph linear ordering, and demonstrate how existing approaches in this area can be utilized to achieve results up to several orders of magnitude better than existing state of the art methods for the same running time. To do so, we introduce a novel polynomial time algorithm for constructing an optimal contraction tree from a given order. Furthermore, we introduce a fast and high quality linear ordering solver, and demonstrate its applicability as a heuristic for providing orderings for contraction trees. Finally, we compare our solver with competing methods for constructing contraction trees in quantum circuit simulation on a collection of randomly generated Quantum Approximate Optimization Algorithm Max Cut circuits and show that our method achieves superior results on a majority of tested quantum circuits.

Thursday, July 14, 2022

Recently accepted papers in optimization and quantum and quantum inspired computing.

Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro "Partitioning Dense Graphs with Hardware Accelerators", International Conference on Computational Science (ICCS), 2022, preprint at https://arxiv.org/pdf/2202.09420.pdf
Ankit Kulshrestha, Ilya Safro "BEINIT: Avoiding Barren Plateaus in Variational Quantum Algorithms", IEEE Quantum Computing and Engineering (QCE), 2022, preprint at https://arxiv.org/abs/2204.13751
Xiaoyuan Liu, Ruslan Shaydulin, Ilya Safro "Quantum Approximate Optimization Algorithm with Sparsified Phase Operator", IEEE Quantum Computing and Engineering (QCE), 2022, preprint at https://arxiv.org/abs/2205.00118
Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Avradip Mandal, Sarvagya Upadhyay, Ilya Safro, Arnab Roy "Leveraging Special-Purpose Hardware for Local Search Heuristics", Computational Optimization and Applications, 2022, preprint at https://arxiv.org/pdf/1911.09810.pdf

Wednesday, June 1, 2022

Quantum Computing for Finance

Quantum computers are expected to surpass the computational capabilities of classical computers and have transformative impact on numerous industry sectors, particularly finance. Finance is estimated to be one of the first industry sectors to benefit from quantum computing, not only in the medium and long terms, but even in the short term (however, this is subject to overcoming several engineering and algorithmic obstacles).

Our team (Argonne National Lab - JPMorgan Chase - Menten AI - University of Chicago - University of Delaware) presents a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on stochastic modeling, optimization, and machine learning, describing how these solutions, adapted to work on a quantum computer, can potentially help to solve financial problems, such as derivative pricing, risk modeling, portfolio optimization, natural language processing, and fraud detection, more efficiently and accurately. We also discuss the feasibility of these algorithms on nearterm quantum computers with various hardware implementations and demonstrate how they relate to a wide range of use cases in finance as well as discuss various obstacles to achieve these goals. We hope this article will not only serve as a reference for academic researchers and industry practitioners but also inspire new ideas for future research.

Herman, Dylan, Cody Googin, Xiaoyuan Liu, Alexey Galda, Ilya Safro, Yue Sun, Marco Pistoia, and Yuri Alexeev. "A survey of quantum computing for finance." arXiv preprint https://arxiv.org/abs/2201.02773 (2022).

Monday, May 23, 2022

Graduate Student Achievement Award

Congratulations to my former PhD student Xiaoyuan Joey Liu for winning Frank A. Pehrson Graduate Student Achievement Award! Her thesis is titled "Combinatorial Optimization Algorithms on Quantum and Quantum Inspired Devices". Xiaoyuan is now a research scientist at Fujitsu working in the area of quantum computing. This award is given to a computer science graduate student in recognition of outstanding performance, and potential for future significant contribution to the field of computer science.

Wednesday, December 29, 2021

Accelerating COVID-19 scientific discovery with hypothesis generation system

In 2020, the White House released the, “Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset,” wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections.

Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman, Ilya Safro "Accelerating COVID-19 research with graph mining and transformer-based learning", Innovative Applications of Artificial Intelligence (AAAI/IAAI), preprint at https://www.biorxiv.org/content/10.1101/2021.02.11.430789v1, 2022

We present an automated general purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on graph-mining and the transformer model. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains (in some domains up to 0.97% ROC AUC) in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing the domain expert curated study, we show that the systems are able to discover on-going research findings such as the relationship between COVID-19 and oxytocin hormone.

Transferring QAOA parameters from small instances to large

The Quantum approximate optimization algorithm (QAOA) is one of the most promising candidates for achieving quantum advantage through quantum-enhanced combinatorial optimization. However, finding optimal parameters to run QAOA is a hard problem that is usually tackled by variational loop, i.e., the circuit is executed with the current parameters on a quantum device, and then the parameters are updated on a classical device. Such variational loops are extremely time consuming. In our recent paper we find the QAOA parameters for small optimization instances and transfer them to the larger ones accelerating the entire process.

Alexey Galda, Xiaoyuan Liu, Danylo Lykov, Yuri Alexeev, Ilya Safro "Transferability of optimal QAOA parameters between random graphs", IEEE International Conference on Quantum Computing and Engineering (QCE), preprint at https://arxiv.org/pdf/2106.07531.pdf, 2021

In a typical QAOA setup, a set of quantum circuit parameters is optimized to prepare a quantum state used to find the optimal solution of a combinatorial optimization problem. Several empirical observations about optimal parameter concentration effects for special QAOA MaxCut problem instances have been made in recent literature, however, a rigorous study of the subject is still lacking. We show that convergence of the optimal QAOA parameters around specific values and, consequently, successful transferability of parameters between different QAOA instances can be explained and predicted based on the local properties of the graphs, specifically the types of subgraphs (lightcones) from which the graphs are composed. We apply this approach to random regular and general random graphs. For example, we demonstrate how optimized parameters calculated for a 6-node random graph can be successfully used without modification as nearly optimal parameters for a 64-node random graph, with less than 1% reduction in approximation ratio as a result. This work presents a pathway to identifying classes of combinatorial optimization instances for which such variational quantum algorithms as QAOA can be substantially accelerated.

Friday, December 10, 2021

Congratulations to Dr. Joey!

Congratulations to Dr. Xiaoyuan Joey Liu for successfully defending her Ph.D. thesis "Combinatorial Optimization Algorithms on Quantum and Quantum-inspired Devices"! Joey will join quantum computing research group at Fujitsu Research USA.

Wednesday, September 1, 2021

NIH funding for literature based discovery

NIH awarded $2.11M grant for University of South Carolina (PI Shtutman) - University of Delaware (PI Safro) collaborative project "Knowledge discovery and machine learning to elucidate the mechanisms of HIV activity and interaction with substance use disorder." This work will leverage our hypothesis generation model AGATHA that is based on information extracted from full MEDLINE.

https://github.com/IlyaTyagin/AGATHA-C-GP

Here is its recent customization for COVID-19 in which Medline is fused with CORD-19, the dataset of all COVID-19 related papers.

https://arxiv.org/abs/2102.07631

Monday, August 30, 2021

Tutorial on the quantum approximate optimization algorithm, its applications and simulation

We recorded this tutorial for IEEE International Conference on Quantum Computing and Engineering (QCE) 2020. This tutorial consists of four parts:

QAOA theory and quantum computing basics
Hands on example of QAOA and MaxCut
Introduction to problem decomposition and solving large-scale problems with QAOA
Tensor networks and simulation of QAOA with classical computers

Thursday, February 25, 2021

Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

Combinatorial scientific computing in action! Our paper on accelerating 3D discrete fracture network flow simulations by multilevel graph partitioning is accepted in Mathematical Geosciences!

Hayato Ushijima-Mwesigwa, Jeffrey D. Hyman, Aric Hagberg, Ilya Safro, Satish Karra, Carl W. Gable, Gowri Srinivasan "Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations", accepted in Mathematical Geosciences, preprint at https://arxiv.org/abs/1902.08029, 2020

We present a topology-based method for mesh-partitioning in three-dimensional discrete fracture network (DFN) simulations that takes advantage of the intrinsic multi-level nature of a DFN. DFN models are used to simulate flow and transport through low-permeability fracture media in the subsurface by explicitly representing fractures as discrete entities. The governing equations for flow and transport are numerically integrated on computational meshes generated on the interconnected fracture networks. Modern high-fidelity DFN simulations require high-performance computing on multiple processors where performance and scalability depends partially on obtaining a high-quality partition of the mesh to balance work work-loads and minimize communication across all processors.

The discrete structure of a DFN naturally lends itself to various graph representations, which can be thought of as coarse-scale representations of the computational mesh. Using this concept, we develop a variant of the multilevel graph partitioning algorithm to partition the mesh of a DFN. We compare the performance of this DFN-based mesh-partitioning with standard multi-level graph partitioning using graphbased metrics (cut, imbalance, partitioning time), computational-based metrics (FLOPS, iterations, solver time), and total run time. The DFN-based partition and the mesh-based partition are comparable in terms of the graph-based metrics, but the time required to obtain the partition is several orders of magnitude faster using the DFN-based partition. The computation-based metrics show comparable performance between both methods so, in combination, the DFN-based partition is several orders of magnitude faster than the mesh-based partition.

ELRUNA: Network Alignment Algorithm

Our network alignment algorithm ELRUNA is accepted in ACM Journal of Experimental Algorithmics. Turns out that three relatively simple node similarity rules can successfully compete with several state of the art algorithms and improve both the running time and alignment quality. You can get it at https://github.com/BridgelessAlexQiu/ELRUNA

Zirou Qiu, Ruslan Shaydulin, Xiaoyuan Liu, Yuri Alexeev, Christopher S. Henry, Ilya Safro "ELRUNA: Elimination Rule-based Network Alignment", accepted in ACM Journal of Experimental Algorithmics, preprint at https://arxiv.org/abs/1911.05486, 2020

Networks model a variety of complex phenomena across different domains. In many applications, one of the most essential tasks is to align two or more networks to infer the similarities between cross-network vertices and discover potential node-level correspondence. In this paper, we propose ELRUNA (Elimination rule-based network alignment), a novel network alignment algorithm that relies exclusively on the underlying graph structure. Under the guidance of the elimination rules that we defined, ELRUNA computes the similarity between a pair of cross-network vertices iteratively by accumulating the similarities between their selected neighbors. The resulting cross-network similarity matrix is then used to infer a permutation matrix that encodes the final alignment of cross-network vertices. In addition to the novel alignment algorithm, we also improve the performance of local search, a commonly used post-processing step for solving the network alignment problem, by introducing a novel selection method RAWSEM (Random walk based selection method) based on the propagation of the levels of mismatching (defined in the paper) of vertices across the networks. The key idea is to pass on the initial levels of mismatching of vertices throughout the entire network in a random-walk fashion. Through extensive numerical experiments on real networks, we demonstrate that ELRUNA significantly outperforms the state-of-the-art alignment methods in terms of alignment accuracy under lower or comparable running time. Moreover, ELRUNA is robust to network perturbations such that it can maintain a close to optimal objective value under a high level of noise added to the original networks. Finally, the proposed RAWSEM can further improve the alignment quality with a less number of iterations compared with the naive local search method.

Thursday, February 11, 2021

Can we outperform Quantum Approximate Optimization Algorithm?

Check our new paper:

Xiaoyuan Liu, Anthony Angone, Ruslan Shaydulin, Ilya Safro, Yuri Alexeev, Lukasz Cincio "Layer VQE: A Variational Approach for Combinatorial Optimization on Noisy Quantum Computers", preprint at https://arxiv.org/abs/2102.05566, 2021

We propose a hybrid quantum-classical algorithm, Layer Variational Quantum Eigensolver (L-VQE), inspired by the Variational Quantum Eigensolver (VQE). L-VQE is a heuristic approach to solve combinatorial optimization problems on near term intermediate-scale quantum devices. We demonstrate the potential of the proposed approach by applying it to the problem of community detection, a famous problem in network science. Our large-scale numerical simulation study shows that L-VQE has the potential to outperform Quantum Approximate Optimization Algorithm (QAOA), and is more robust to sampling noise as compared with standard VQE approaches.

Literature-based knowledge discovery to accelerate COVID-19 research

Our new paper on customization of AGATHA knowledge discovery model for COVID-19 is out!

Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman, Ilya Safro

"Accelerating COVID-19 research with graph mining and transformer-based learning", 2021

https://www.biorxiv.org/content/10.1101/2021.02.11.430789v1

In 2020, the White House released the, "Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset," wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. We present an automated general purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on graph-mining and the transformer model. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains (in some domains up to 0.97% ROC AUC) in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing the domain expert curated study, we show that the systems are able to discover on-going research findings such as the relationship between COVID-19 and oxytocin hormone.

Wednesday, December 23, 2020

Reboot the Computing-Research Publication Systems

Many good points by Moshe Vardi on current CS research publication system in the Communications of ACM

Reboot the Computing-Research Publication Systems: The virtualization of conferences due to COVID-19 has sharpened my conviction that the computing-research publication system is badly broken and in need of a serious reboot.

Saturday, December 12, 2020

Do we always need big data in text mining?

Do we always need Big Data text mining? Can we filter it? Check our new paper "Accelerating Text Mining Using Domain-Specific Stop Word Lists" accepted at IWBDR https://arxiv.org/pdf/2012.02294.pdf

Text preprocessing is an essential step in text mining. Removing words that can negatively impact the quality of prediction algorithms or are not informative enough is a crucial storage-saving technique in text indexing and results in improved computational efficiency. Typically, a generic stop word list is applied to a dataset regardless of the domain. However, many common words are different from one domain to another but have no significance within a particular domain. Eliminating domain-specific common words in a corpus reduces the dimensionality of the feature space, and improves the performance of text mining tasks. In this paper, we present a novel mathematical approach for the automatic extraction of domain-specific words called the hyperplane-based approach. This new approach depends on the notion of low dimensional representation of the word in vector space and its distance from hyperplane. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. We compare the hyperplane-based approach with other feature selection methods, namely \c{hi}2 and mutual information. An experimental study is performed on three different datasets and five classification algorithms, and measure the dimensionality reduction and the increase in the classification performance. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information. The computational time to identify the domain-specific words is significantly lower than mutual information.

Thursday, December 10, 2020

Quantum Approximate Optimization Algorithm and Symmetry

Our new paper QAOA symmetry and its predictability using classical symmetry is out! We analyze the connections between the symmetries of the objective function and the symmetries of QAOA dynamics, and applications to performance prediction, simulation and more https://scirate.com/arxiv/2012.04713

Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, Ilya Safro "Classical symmetries and QAOA", 2020

We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques towards predicting QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.

Friday, November 6, 2020

On improving nonlinear SVM scalability using multilevel frameworks

Our paper "AML-SVM: Adaptive Multilevel Learning with Support Vector Machines" is accepted at 2020 IEEE International Conference on Big Data. https://arxiv.org/abs/2011.02592

The support vector machines (SVM) is one of the most widely used and practical optimization based classification models in machine learning because of its interpretability and flexibility to produce high quality results. However, the big data imposes a certain difficulty to the most sophisticated but relatively slow versions of SVM, namely, the nonlinear SVM. The complexity of nonlinear SVM solvers and the number of elements in the kernel matrix quadratically increases with the number of samples in training data. Therefore, both runtime and memory requirements are negatively affected. Moreover, the parameter fitting has extra kernel parameters to tune, which exacerbate the runtime even further. This paper proposes an adaptive multilevel learning framework for the nonlinear SVM, which addresses these challenges, improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance. The integration of parameter fitting in the hierarchical learning framework and adaptive process to stop unnecessary computation significantly reduce the running time while increase the overall performance. The experimental results demonstrate reduced variance on prediction over validation and test data across levels in the hierarchy, and significant speedup compared to state-of-the-art nonlinear SVM libraries without a decrease in the classification quality.

Saturday, September 19, 2020

How to use NISQ quantum devices to solve large-scale problems?

Our paper on multilevel hybrid quantum-classical optimization "Multilevel Combinatorial Ooptimization Across Quantum Architectures" is accepted in ACM Transactions on Quantum Computing! Big kudos to two lead former students Hayato and Ruslan! https://arxiv.org/abs/1910.09985

Emerging quantum processors provide an opportunity to explore new approaches for solving traditional problems in the post Moore's law supercomputing era. However, the limited number of qubits makes it infeasible to tackle massive real-world datasets directly in the near future, leading to new challenges in utilizing these quantum processors for practical purposes. Hybrid quantum-classical algorithms that leverage both quantum and classical types of devices are considered as one of the main strategies to apply quantum computing to large-scale problems. In this paper, we advocate the use of multilevel frameworks for combinatorial optimization as a promising general paradigm for designing hybrid quantum-classical algorithms. In order to demonstrate this approach, we apply this method to two well-known combinatorial optimization problems, namely, the Graph Partitioning Problem, and the Community Detection Problem. We develop hybrid multilevel solvers with quantum local search on D-Wave's quantum annealer and IBM's gate-model based quantum processor. We carry out experiments on graphs that are orders of magnitudes larger than the current quantum hardware size, and we observe results comparable to state-of-the-art solvers in terms of quality of the solution.

Friday, August 21, 2020

New NSF grant to develop a simulator for quantum computing

NSF awarded grant to develop large-scale QAOA simulator! This is a collaborative project with the Yuri Alexeev@University of Chicago.

Generating biomedical scientific hypotheses with AGATHA

Accepted paper in the 29TH ACM International Conference on Information and Knowledge Management (CIKM)

Sybrandt, Tyagin, Shtutman, Safro "AGATHA: Automatic Graph-mining and Transformer based Hypothesis Generation Approach", preprint at http://arxiv.org/pdf/2002.05635.pdf

Medical research is risky and expensive. Drug discovery requires researchers to efficiently winnow thousands of potential targets to a small candidate set. However, scientists spend significant time and money long before seeing the intermediate results that ultimately determine this smaller set. Hypothesis generation systems address this challenge by mining the wealth of publicly available scientific information to predict plausible research directions. We present AGATHA, a deep-learning hypothesis generation system that learns a data-driven ranking criteria to recommend new biomedical connections. We massively validate our system with a temporal holdout wherein we predict connections first introduced after 2015 using data published beforehand. We additionally explore biomedical sub-domains, and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types. Furthermore, we perform an ablation study to examine the aspects of our semantic network that most contribute to recommendation quality. Overall, AGATHA achieves best-in-class recommendation quality when compared to other hypothesis generation systems built to predict across all available biomedical literature. Reproducibility: All code, experimental data, and pre-trained models are available online: sybrandt.com/2020/agatha.

Sunday, July 12, 2020

Two papers on graph representation learning

Accepted papers at Workshop on Mining and Learning with Graphs co-located with ACM KDD 2020!

1) Ding, Zhang, Sybrandt, Safro "Unsupervised Hierarchical Graph Representation Learning by Mutual Information Maximization", 2020

Graph representation learning based on graph neural networks (GNNs) can greatly improve the performance of downstream tasks, such as node and graph classification. However, the general GNN models do not aggregate node information in a hierarchical manner, and can miss key higher-order structural features of many graphs. The hierarchical aggregation also enables the graph representations to be explainable. In addition, supervised graph representation learning requires labeled data, which is expensive and error-prone. To address these issues, we present an unsupervised graph representation learning method, Unsupervised Hierarchical Graph Representation (UHGR), which can generate hierarchical representations of graphs. Our method focuses on maximizing mutual information between "local" and high-level "global" representations, which enables us to learn the node embeddings and graph embeddings without any labeled data. To demonstrate the effectiveness of the proposed method, we perform the node and graph classification using the learned node and graph embeddings. The results show that the proposed method achieves comparable results to state-of-the-art supervised methods on several benchmarks. In addition, our visualization of hierarchical representations indicates that our method can capture meaningful and interpretable clusters.

2) Sybrandt, Safro "FOBE and HOBE: First- and High-Order Bipartite Embeddings", 2020

Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods utilized in these applications would be better served with specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between node neighborhoods. When sampling higher-order relationships, we reinforce similarities through algebraic distance on graphs. We also introduce ensemble embeddings to combine both into a "best of both worlds" embedding. The proposed methods are evaluated on link prediction and recommendation tasks and compared with other state-of-the-art embeddings. While being all highly beneficial in applications, we demonstrate that none of the considered embeddings is clearly superior (in contrast to what is claimed in many papers), and discuss the trade offs present among them.

Wednesday, June 10, 2020

PhD thesis defense

Congratulations to Dr. Ruslan Shaydulin for successfully defending his Ph.D. thesis "Quantum and Classical Multilevel Algorithms for (Hyper)Graphs"! Ruslan will join Argonne National Lab with MGM fellowship in August. This is the second defense in our lab this week!

Monday, June 8, 2020

PhD thesis defense

Congratulations to Dr. Ehsan Sadrfaridpour for successfully defending his Ph.D. thesis "Fast Machine Learning Algorithms for Massive Datasets with Applications in Biomedical Domain"! Ehsan will join Lowe's data science team this summer.

Friday, May 15, 2020

Our work in HPCwire

News coverage in HPCwire about our collaboration with Argonne on hybrid quantum-classical computing
Argonne Receives Two Awards from DARPA for Quantum Information Science

Tuesday, April 21, 2020

NSF grant to tackle COVID-19

Our team received NSF grant to tackle COVID-19 using our AI hypothesis generation system AGATHA
Clemson news coverage: Artificial intelligence could aid in fight against COVID-19

Ilya Safro of @socclemson said that his team will soon roll out a new artificial intelligence system aimed at helping researchers explore the scientific literature as they strive for new discoveries to combat the novel coronavirus.https://t.co/TwQyZqEUra pic.twitter.com/fh3K8iAoWO
— Clemson Engineering, Computing & Applied Sciences (@ClemsonCECAS) April 21, 2020

Tuesday, April 14, 2020

Board game night

Board game night from quarantine! pic.twitter.com/6env2hICaa
— Justin Sybrandt, Ph.D. (@Justin_Sybrandt) March 28, 2020

Monday, April 13, 2020

Combing the best of quantum and classical computing

News coverage from Clemson: Combing the best of quantum and classical computing

Friday, April 3, 2020

From SoC@Clemson

Congrats to Dr. Ilya Safro on receiving funding from @DARPA to work on hybrid quantum-classical algorithms for combinatorial optimization problems, a 4 year collaborative research with @argonne. He is looking for Ph.D. students to join the Algorithms & Computational Science lab. pic.twitter.com/YfIYVc7HzO
— School of Computing (@socclemson) April 3, 2020

Thursday, March 26, 2020

PhD thesis defense

Congratulations to Dr. Justin Sybrandt for successfully defending his Ph.D. thesis "Exploiting Latent Features of Text and Graphs"! Justin will join Google Brain this summer.

Wednesday, February 19, 2020

New papers on biomedical NLP and hypothesis generation

New papers on biomedical NLP+hypothesis generation.

Sybrandt, Safro "CBAG: Conditional Biomedical Abstract Generation", http://arxiv.org/pdf/2002.05637.pdf,

Sybrandt, Tyagin, Shtutman, Safro "AGATHA: Automatic Graph-mining and Transformer based Hypothesis Generation" http://arxiv.org/pdf/2002.05635.pdf

Friday, January 17, 2020

Congratulations to PhD student Ruslan Shaydulin

Congratulations to PhD student Ruslan Shaydulin, advised by Dr. Ilya Safro, for receiving 2020 Maria Goeppert Mayer fellowship for postdoctoral studies at Argonne National Lab. Ruslan will start in Fall 2020. He was selected among many candidates from top schools and different disciplines.

Saturday, October 19, 2019

How to find the best location for wireless charging lanes

Our paper is accepted in Journal of Industrial Management and Optimization

Ushijima-Mwesigwa, Khan, Chowdhury, Safro "Optimal Installation for Electric Vehicle Wireless Charging Lanes", 2019

The emergence of electric vehicle wireless charging technology, where a whole lane can be turned into a charging infrastructure, leads to new challenges in the design and analysis of road networks. From a network perspective, a major challenge is determining the most important nodes with respect to the placement of the wireless charging lanes. In other words, given a limited budget, cities could face the decision problem of where to place these wireless charging lanes. With a heavy price tag, a placement without a careful study can lead to inefficient use of limited resources. In this work, the placement of wireless charging lanes is modeled as an integer programming problem. The basic formulation is used as a building block for different realistic scenarios. We carry out experiments using real geospatial data and compare our results to different network-based heuristics.

Reproducibility: all datasets, algorithm implementations and mathematical programming formulation presented in this work are available at https://github.com/hmwesigwa/smartcities.git

Monday, October 14, 2019

Congratulations to PhD student Justin Sybrandt for being selected in top 12 among more than 3000 summer interns based on his achievements. Over the summer, Justin was an intern at Facebook working on Instagram.

Wednesday, September 25, 2019

Multistart Methods for Quantum Approximate Optimization

Best student paper award at IEEE HPEC2019 goes to our lab! Congratulations to the leading student Ruslan Shaydulin!

Ruslan Shaydulin, Ilya Safro, Jeffrey Larson "Multistart Methods for Quantum Approximate Optimization", accepted at IEEE High Performance Extreme Computing Conference (HPEC) 2019, preprint at https://arxiv.org/abs/1905.08768

Hybrid quantum-classical algorithms such as the quantum approximate optimization algorithm (QAOA) are considered one of the most promising approaches for leveraging near-term quantum computers for practical applications. Such algorithms are often implemented in a variational form, combining classical optimization methods with a quantum machine to find parameters to maximize performance. The quality of the QAOA solution depends heavily on quality of the parameters produced by the classical optimizer. Moreover, multiple local optima in the space of parameters make it harder for the classical optimizer. In this paper we study the use of a multistart optimization approach within a QAOA framework to improve the performance of quantum machines on important graph clustering problems. We also demonstrate that reusing the optimal parameters from similar problems can improve the performance of classical optimization methods, expanding on similar results for MAXCUT.

Thursday, June 20, 2019

The best of both worlds: how to solve real problems on modern quantum computers

Our work on hybrid quantum-classical algorithms is featured in news

The best of both worlds: how to solve real problems on modern quantum computers

Wednesday, June 19, 2019

Hybrid quantum-classical algorithms

Our paper on hybrid quantum-classical algorithms is featured in IEEE Computer, the June's issue on quantum realism.

Ruslan Shaydulin, Hayato Ushijima-Mwesigwa, Christian F.A. Negre, Ilya Safro, Susan M. Mniszewski, Yuri Alexeev "Hybrid Approach for Solving Optimization Problems on Small Quantum Computers", IEEE Computer, vol. 52(6), pp. 18-26, 2019

Solving larger-sized problems is an important area of research in quantum computing. Designing hybrid quantum-classical algorithms is a promising approach to solving this. We discuss decomposition-based hybrid approaches for solving optimization problems and demonstrate them for applications related to community detection.

Thursday, May 23, 2019

Solving network community detection problem on quantum computers

Accepted paper in Advanced Quantum Technology journal

Shaydulin, Ushijima-Mwesigwa, Safro, Mniszewski, Alexeev "Network Community Detection On Small Quantum Computers", 2019, preprint at https://arxiv.org/abs/1810.12484

In recent years, a number of quantum computing devices with small numbers of qubits have become available. A hybrid quantum local search (QLS) approach that combines a classical machine and a small quantum device to solve problems of practical size is presented. The proposed approach is applied to the network community detection problem. QLS is hardware‐agnostic and easily extendable to new quantum computing devices as they become available. It is demonstrated to solve the 2‐community detection problem on graphs of sizes of up to 410 vertices using the 16‐qubit IBM quantum computer and D‐Wave 2000Q, and compare their performance with the optimal solutions. The results herein demonstrate that QLS performs similarly in terms of quality of the solution and the number of iterations to convergence on both types of quantum computers and it is capable of achieving results comparable to state‐of‐the‐art solvers in terms of quality of the solution including reaching the optimal solutions.

Monday, May 13, 2019

PhD thesis proposal defense

Congratulations to Justin Sybrandt for successfully defending his PhD thesis proposal!

Monday, May 6, 2019

Synthetic planar graph generation

Accepted paper in Applied Network Science journal

Chauhan, Gutfraind, Safro "Multiscale planar graph generation", 2019, preprint at https://arxiv.org/abs/1802.09617

The study of network representations of physical, biological, and social phenomena can help us better understand their structure and functional dynamics as well as formulate predictive models of these phenomena. However, due to the scarcity of real-world network data owing to factors such as cost and effort required in collection of network data and the sensitivity of this data towards theft and misuse, engineers and researchers often rely on synthetic data for simulations, hypothesis testing, decision making, and algorithm engineering. An important characteristic of infrastructure networks such as roads, water distribution and other utility systems is that they can be (almost fully) embedded in a plane, therefore to simulate these system we need realistic networks which are also planar. While the currently-available synthetic network generators can model networks that exhibit realism, they do not guarantee or achieve planarity. In this paper we present a flexible algorithm that can synthesize realistic networks that are planar. The method follows a multi-scale randomized editing approach generating a hierarchy of coarsened networks of a given planar graph and introducing edits at various levels in the hierarchy. The method preserves the structural properties with minimal bias including the planarity of the network, while introducing realistic variability at multiple scales.

Reproducibility: All datasets and algorithm implementation presented in this work are available at https://bit.ly/2CjOUAS

Monday, April 29, 2019

Designing scalable nonlinear support vector machines

Accepted paper in Machine Learning journal

Sadrfaridpour, Razzaghi, Safro, "Engineering fast multilevel support vector machines", 2019, preprint at https://arxiv.org/abs/1707.07657

The computational complexity of solving nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data represents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality to linear kernels but introduce extra kernel and model parameters which requires computationally expensive fitting. This increases the quality but also reduces the performance dramatically. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc which allows an easy integration with scientific computing tasks. The experimental results demonstrate significant speed up compared to the state-of-the-art nonlinear SVM libraries.

Reproducibility: our source code, documentation and parameters are available at https://github.com/esadr/mlsvm

Saturday, March 30, 2019

Finding influential nodes in networks with consumable resources

Accepted paper in Network Science journal

Ushijima-Mwesigwa, Khan, Chowdhury, Safro, "Centralities for Networks with Consumable Resources", 2019, preprint at https://arxiv.org/abs/1903.00642

Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic, and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures lies an assumption describing the manner in which traffic (of information, social actors, particles, etc.) flows through the network. For example, some measures only count shortest paths while others consider random walks. This paper considers a spreading process in which a resource necessary for transit is partially consumed along the way while being refilled at special nodes on the network. Examples include fuel consumption of vehicles together with refueling stations, information loss during dissemination with error-correcting nodes, and consumption of ammunition of military troops while moving. We propose generalizations of the well-known measures of betweenness, random-walk betweenness, and Katz centralities to take such a spreading process with consumable resources into account. In order to validate the results, experiments on real-world networks are carried out by developing simulations based on well-known models such as Susceptible-Infected-Recovered and congestion with respect to particle hopping from vehicular flow theory. The simulation-based models are shown to be highly correlated with the proposed centrality measures.

Friday, March 29, 2019

Future of medicine - man or machine

Some thoughts about literature based discovery and our automated biomedical hypothesis generation tool MOLIERE for Clemson World Magazine

Saturday, February 9, 2019

Does it help to charge the electric cars at intersections?

Accepted paper in Computer-Aided Civil and Infrastructure Engineering

Khan, Khan Chowdhury, Safro, Ushijima-Mwesigwa "Wireless Charging Utility Maximization and Intersection Control Delay Minimization Framework for Electric Vehicles"

This study presents the Wireless Charging Utility Maximization (WCUM) framework, which aims to maximize the utility of Wireless Charging Units (WCUs) for electric vehicle (EV) charging through the optimal WCU deployment at signalized intersections. Furthermore, the framework aims to minimize the control delay at all signalized intersections of the network. The framework consists of a two‐step optimization formulation, a dynamic traffic assignment model to calculate the user equilibrium, a traffic microsimulator to formulate the objective functions, and a global Mixed Integer Non‐Linear Programming (MINLP) optimization solver. An optimization problem is formulated for each intersection, and another for the entire network. The performance of the WCUM framework is tested using the Sioux Falls network. We perform a comparative study of 12 global MINLP solvers with a case study. Based on solution quality and computation time, we choose the Couenne solver for this framework.

https://onlinelibrary.wiley.com/doi/abs/10.1111/mice.12439

Sunday, February 3, 2019

Predicting bariatric surgery outcomes

Accepted paper in Annals of Operations Research

Razzaghi, Safro, Ewing, Sadrfaridpour, Scott "Predictive models for bariatric surgery risks with imbalanced medical datasets"

Bariatric surgery (BAR) has become a popular treatment for type 2 diabetes mellitus which is among the most critical obesity-related comorbidities. Patients who have bariatric surgery, are exposed to complications after surgery. Furthermore, the mid- to long-term complications after bariatric surgery can be deadly and increase the complexity of managing safety of these operations and healthcare costs. Current studies on BAR complications have mainly used risk scoring for identifying patients who are more likely to have complications after surgery. Though, these studies do not take into consideration the imbalanced nature of the data where the size of the class of interest (patients who have complications after surgery) is relatively small. We propose the use of imbalanced classification techniques to tackle the imbalanced bariatric surgery data: synthetic minority oversampling technique (SMOTE), random undersampling, and ensemble learning classification methods including Random Forest, Bagging, and AdaBoost. Moreover, we improve classification performance through using Chi-squared, Information Gain, and Correlation-based feature selection techniques. We study the Premier Healthcare Database with focus on the most-frequent complications including Diabetes, Angina, Heart Failure, and Stroke. Our results show that the ensemble learning-based classification techniques using any feature selection method mentioned above are the best approach for handling the imbalanced nature of the bariatric surgical outcome data. In our evaluation, we find a slight preference toward using SMOTE method compared to the random undersampling method. These results demonstrate the potential of machine-learning tools as clinical decision support in identifying risks/outcomes associated with bariatric surgery and their effectiveness in reducing the surgery complications as well as improving patient care.

Monday, January 28, 2019

AI@Clemson

It was great discussing AI and text mining at Clemson University research symposium on AI.

Saturday, January 12, 2019

Welcome new students!

Two new students, Zirou Qiu (MSc) and Korey Palmer (senior undergrad) are joining our research group.

Friday, January 11, 2019

Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning

Accepted paper at SIAM Multiscale Modeling and Simulations

Ruslan Shaydulin, Jie Chen, Ilya Safro "Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning", 2019, preprint at arXiv:1710.06552

Multilevel partitioning methods that are inspired by principles of multiscaling are the most powerful practical hypergraph partitioning solvers. Hypergraph partitioning has many applications in disciplines ranging from scientific computing to data science. In this paper we introduce the concept of algebraic distance on hypergraphs and demonstrate its use as an algorithmic component in the coarsening stage of multilevel hypergraph partitioning solvers. The algebraic distance is a vertex distance measure that extends hyperedge weights for capturing the local connectivity of vertices which is critical for hypergraph coarsening schemes. The practical effectiveness of the proposed measure and corresponding coarsening scheme is demonstrated through extensive computational experiments on a diverse set of problems. Finally, we propose a benchmark of hypergraph partitioning problems to compare the quality of other solvers.

Monday, November 26, 2018

Thesis defense

Congratulations to Varsha Chauhan for successfully defending her MSc thesis on planar graph generation!

Travel awards

Congratulations to Justin Sybrandt and Ruslan Shaydulin for receiving travel awards to present their papers at IEEE Big Data 2018 and APS 2018!

Thesis defense

Congratulations to Dr. Hayato Ushijima-Mwesigwa for successfully defending his Ph.D. thesis "Models for Networks with Consumable Resources"!

Community detection on NISQ devices

Accepted paper at at 3rd International Workshop on Post Moore's Era 2018

Supercomputing (PMES 2018)

Ruslan Shaydulin, Haayto Ushijima-Mwesigwa, Ilya Safro, Susan Mniszewski, Yuri Alexeev "Community Detection Across Emerging Quantum Architectures", preprint at arXiv:1810.07765, 2018

Sunday, November 25, 2018

Can we predict crimes in Chicago?

Our paper is accepted at IEEE Big Data 2018

Saroj K. Dash, I. Safro, Ravisutha S. Srinivasamurthy "Spatio-temporal prediction of crimes using network analytic approach", preprint at arXiv:1808.06241, 2018

It is quite evident that majority of the population lives in urban area today than in any time of the human history. This trend seems to increase in coming years. Studies say that nearly 80.7% of total population in USA stays in urban area. By 2030 nearly 60% of the population in the world will live in or move to cities. With the increase in urban population, it is important to keep an eye on criminal activities. By doing so, governments can enforce intelligent policing systems and hence many government agencies and local authorities have made the crime data publicly available. In this paper, we analyze Chicago city crime data fused with other social information sources using network analytic techniques to predict criminal activity for the next year. We observe that as we add more layers of data which represent different aspects of the society, the quality of prediction is improved. Our prediction models not just predict total number of crimes for the whole Chicago city, rather they predict number of crimes for all types of crimes and for different regions in City of Chicago.

Saturday, November 24, 2018

Two papers accepted at IEEE Big Data 2018

Sybrandt, Carrabba, Herzog, Safro "Are Abstracts Enough for Hypothesis Generation?", arXiv:1804.05942

Sybrandt, Shtutman, Safro "Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking", arXiv:1802.03793