Showing posts with label papers. Show all posts
Showing posts with label papers. Show all posts

Saturday, September 21, 2024

Best Paper Award in Quantum Algorithms and IEEE Quantum Week

Our team had an amazing time at IEEE Quantum Computing and Engineering 2024 (aka IEEE Quantum Week) in Montreal, where we were honored to receive the Best Paper in Quantum Applications Award for the paper "MLQAOA: Graph Learning Accelerated Hybrid Quantum-Classical Multilevel QAOA"! The conference was packed with excellent talks, workshops, tutorials and panels. 

It was also wonderful to catch up with our lab’s alumni, the Algorithms and Computational Science Lab legends Ankit Kulshrestha Joey Xiaoyuan Liu Ruslan Shaydulin, and see their success in the field. Graduate students Bao Bach and Cameron Ibrahim gave excellent talks on their papers. Big thanks to everyone who made this event such a success. This year there were over 1,550 participants from universities, national labs, and industry. Looking forward to even better event next year in Albuquerque, NM! 

#IEEEQuantum #QCE24 #QuantumComputing #QuantumWeek #IEEE #QuantumWeek2024





Tuesday, March 12, 2024

Quantum computing for finance: Our work is in UDaily.

 Great recent article in UDaily about our joint project with Argonne National Lab, JPMorgan Chase, Fujitsu and Menten AI. We discuss the obstacles and opportunities of quantum computing in finance.

Click here for the full article

Thursday, January 4, 2024

Benchmarking Biomedical Literature-based Discovery and Hypothesis Generation Systems

Update: The paper is accepted for publication in BMC Bioinformatics.

We introduce a benchmarking framework Dyport for evaluating biomedical hypothesis generation (HG) and literature based discovery (LBD) systems. The evaluation of HG and LBD is still one of the major problems of these systems, especially when it comes to fully automated large-scale general purpose systems. For these, a massive assessment (that is normal in the machine learning and general AI domains) is often infeasible. One traditional evaluation approach is to make a system “rediscover” some of the landmark findings. However, this approach does not scale. Another traditional approach is to automatically discover some information in biomedical texts, train the system on historical data and test it on that "future" information. While this approach does scale well, the reliability and biomedical importance of the extracted test set are far from being illuminating. 


Utilizing curated datasets, Dyport tests HG/LBD systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypothesis accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. 


Dyport is available at https://github.com/IlyaTyagin/Dyport

Paper: https://arxiv.org/pdf/2312.03303.pdf

Friday, September 30, 2022

Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation

We are excited to announce that our paper received the best student paper award at #IEEE High Performance Extreme Computing #HPEC2022! Leading student is Cameron Ibrahim. Congratulations to all!

Cameron Ibrahim, Danylo Lykov, Zichang He, Yuri Alexeev, Ilya Safro "Constructing Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation", 2022, preprint at https://arxiv.org/abs/2209.02895

One of the key problems in tensor network based quantum circuit simulation is the construction of a contraction tree which minimizes the cost of the simulation, where the cost can be expressed in the number of operations as a proxy for the simulation running time. This same problem arises in a variety of application areas, such as combinatorial scientific computing, marginalization in probabilistic graphical models, and solving constraint satisfaction problems. In this paper, we reduce the computationally hard portion of this problem to one of graph linear ordering, and demonstrate how existing approaches in this area can be utilized to achieve results up to several orders of magnitude better than existing state of the art methods for the same running time. To do so, we introduce a novel polynomial time algorithm for constructing an optimal contraction tree from a given order. Furthermore, we introduce a fast and high quality linear ordering solver, and demonstrate its applicability as a heuristic for providing orderings for contraction trees. Finally, we compare our solver with competing methods for constructing contraction trees in quantum circuit simulation on a collection of randomly generated Quantum Approximate Optimization Algorithm Max Cut circuits and show that our method achieves superior results on a majority of tested quantum circuits.





Thursday, July 14, 2022

Recently accepted papers in optimization and quantum and quantum inspired computing.

  • Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro "Partitioning Dense Graphs with Hardware Accelerators", International Conference on Computational Science (ICCS), 2022, preprint at https://arxiv.org/pdf/2202.09420.pdf
  • Ankit Kulshrestha, Ilya Safro "BEINIT: Avoiding Barren Plateaus in Variational Quantum Algorithms", IEEE Quantum Computing and Engineering (QCE), 2022, preprint at https://arxiv.org/abs/2204.13751
  • Xiaoyuan Liu, Ruslan Shaydulin, Ilya Safro "Quantum Approximate Optimization Algorithm with Sparsified Phase Operator", IEEE Quantum Computing and Engineering (QCE), 2022, preprint at https://arxiv.org/abs/2205.00118
  • Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Avradip Mandal, Sarvagya Upadhyay, Ilya Safro, Arnab Roy "Leveraging Special-Purpose Hardware for Local Search Heuristics", Computational Optimization and Applications, 2022, preprint at https://arxiv.org/pdf/1911.09810.pdf


Wednesday, December 29, 2021

Accelerating COVID-19 scientific discovery with hypothesis generation system

In 2020, the White House released the, “Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset,” wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. 

Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman, Ilya Safro "Accelerating COVID-19 research with graph mining and transformer-based learning", Innovative Applications of Artificial Intelligence (AAAI/IAAI), preprint at https://www.biorxiv.org/content/10.1101/2021.02.11.430789v1, 2022

We present an automated general purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on graph-mining and the transformer model. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains (in some domains up to 0.97% ROC AUC) in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing the domain expert curated study, we show that the systems are able to discover on-going research findings such as the relationship between COVID-19 and oxytocin hormone.


Transferring QAOA parameters from small instances to large

The Quantum approximate optimization algorithm (QAOA) is one of the most promising candidates for  achieving quantum advantage through quantum-enhanced combinatorial optimization. However, finding optimal parameters to run QAOA is a hard problem that is usually tackled by variational loop, i.e., the circuit is executed with the current parameters on a quantum device, and then the parameters are updated on a classical device. Such variational loops are extremely time consuming. In our recent paper we find the QAOA parameters for small optimization instances and transfer them to the larger ones accelerating the entire process.

Alexey Galda, Xiaoyuan Liu, Danylo Lykov, Yuri Alexeev, Ilya Safro "Transferability of optimal QAOA parameters between random graphs", IEEE International Conference on Quantum Computing and Engineering (QCE), preprint at https://arxiv.org/pdf/2106.07531.pdf, 2021



In a typical  QAOA setup, a set of quantum circuit parameters is optimized to prepare a quantum state used to find the optimal solution of a combinatorial optimization problem. Several empirical observations about optimal parameter concentration effects for special QAOA MaxCut problem instances have been made in recent literature, however, a rigorous study of the subject is still lacking. We show that convergence of the optimal QAOA parameters around specific values and, consequently, successful transferability of parameters between different QAOA instances can be explained and predicted based on the local properties of the graphs, specifically the types of subgraphs (lightcones) from which the graphs are composed. We apply this approach to random regular and general random graphs. For example, we demonstrate how optimized parameters calculated for a 6-node random graph can be successfully used without modification as nearly optimal parameters for a 64-node random graph, with less than 1% reduction in approximation ratio as a result. This work presents a pathway to identifying classes of combinatorial optimization instances for which such variational quantum algorithms as QAOA can be substantially accelerated.

Thursday, February 25, 2021

Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

Combinatorial scientific computing in action! Our paper on accelerating 3D discrete fracture network flow simulations by multilevel graph partitioning is accepted in Mathematical Geosciences!



Hayato Ushijima-Mwesigwa, Jeffrey D. Hyman, Aric Hagberg, Ilya Safro, Satish Karra, Carl W. Gable, Gowri Srinivasan "Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations", accepted in Mathematical Geosciences, preprint at https://arxiv.org/abs/1902.08029, 2020

We present a topology-based method for mesh-partitioning in three-dimensional discrete fracture network (DFN) simulations that takes advantage of the intrinsic multi-level nature of a DFN. DFN models are used to simulate flow and transport through low-permeability fracture media in the subsurface by explicitly representing fractures as discrete entities. The governing equations for flow and transport are numerically integrated on computational meshes generated on the interconnected fracture networks. Modern high-fidelity DFN simulations require high-performance computing on multiple processors where performance and scalability depends partially on obtaining a high-quality partition of the mesh to balance work work-loads and minimize communication across all processors.

The discrete structure of a DFN naturally lends itself to various graph representations, which can be thought of as coarse-scale representations of the computational mesh. Using this concept, we develop a variant of the multilevel graph partitioning algorithm to partition the mesh of a DFN. We compare the performance of this DFN-based mesh-partitioning with standard multi-level graph partitioning using graphbased metrics (cut, imbalance, partitioning time), computational-based metrics (FLOPS, iterations, solver time), and total run time. The DFN-based partition and the mesh-based partition are comparable in terms of the graph-based metrics, but the time required to obtain the partition is several orders of magnitude faster using the DFN-based partition. The computation-based metrics show comparable performance between both methods so, in combination, the DFN-based partition is several orders of magnitude faster than the mesh-based partition.

ELRUNA: Network Alignment Algorithm

Our network alignment algorithm ELRUNA is accepted in ACM Journal of Experimental Algorithmics. Turns out that three relatively simple node similarity rules can successfully compete with several state of the art algorithms and improve both the running time and alignment quality. You can get it at https://github.com/BridgelessAlexQiu/ELRUNA



Zirou Qiu, Ruslan Shaydulin, Xiaoyuan Liu, Yuri Alexeev, Christopher S. Henry, Ilya Safro "ELRUNA: Elimination Rule-based Network Alignment", accepted in ACM Journal of Experimental Algorithmics, preprint at https://arxiv.org/abs/1911.05486, 2020

Networks model a variety of complex phenomena across different domains. In many applications, one of the most essential tasks is to align two or more networks to infer the similarities between cross-network vertices and discover potential node-level correspondence. In this paper, we propose ELRUNA (Elimination rule-based network alignment), a novel network alignment algorithm that relies exclusively on the underlying graph structure. Under the guidance of the elimination rules that we defined, ELRUNA computes the similarity between a pair of cross-network vertices iteratively by accumulating the similarities between their selected neighbors. The resulting cross-network similarity matrix is then used to infer a permutation matrix that encodes the final alignment of cross-network vertices. In addition to the novel alignment algorithm, we also improve the performance of local search, a commonly used post-processing step for solving the network alignment problem, by introducing a novel selection method RAWSEM (Random walk based selection method) based on the propagation of the levels of mismatching (defined in the paper) of vertices across the networks. The key idea is to pass on the initial levels of mismatching of vertices throughout the entire network in a random-walk fashion. Through extensive numerical experiments on real networks, we demonstrate that ELRUNA significantly outperforms the state-of-the-art alignment methods in terms of alignment accuracy under lower or comparable running time. Moreover, ELRUNA is robust to network perturbations such that it can maintain a close to optimal objective value under a high level of noise added to the original networks. Finally, the proposed RAWSEM can further improve the alignment quality with a less number of iterations compared with the naive local search method.

Thursday, February 11, 2021

Can we outperform Quantum Approximate Optimization Algorithm?

Check our new paper:

Xiaoyuan Liu, Anthony Angone, Ruslan Shaydulin, Ilya Safro, Yuri Alexeev, Lukasz Cincio "Layer VQE: A Variational Approach for Combinatorial Optimization on Noisy Quantum Computers", preprint at https://arxiv.org/abs/2102.05566, 2021

We propose a hybrid quantum-classical algorithm, Layer Variational Quantum Eigensolver (L-VQE), inspired by the Variational Quantum Eigensolver (VQE). L-VQE is a heuristic approach to solve combinatorial optimization problems on near term intermediate-scale quantum devices. We demonstrate the potential of the proposed approach by applying it to the problem of community detection, a famous problem in network science. Our large-scale numerical simulation study shows that L-VQE has the potential to outperform Quantum Approximate Optimization Algorithm (QAOA), and is more robust to sampling noise as compared with standard VQE approaches.


Literature-based knowledge discovery to accelerate COVID-19 research

Our new paper on customization of AGATHA knowledge discovery model for COVID-19 is out!

Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman,  Ilya Safro
"Accelerating COVID-19 research with graph mining and transformer-based learning", 2021

https://www.biorxiv.org/content/10.1101/2021.02.11.430789v1

In 2020, the White House released the, "Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset," wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. We present an automated general purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on graph-mining and the transformer model. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains (in some domains up to 0.97% ROC AUC) in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing the domain expert curated study, we show that the systems are able to discover on-going research findings such as the relationship between COVID-19 and oxytocin hormone.

Saturday, December 12, 2020

Do we always need big data in text mining?

Do we always need Big Data text mining? Can we filter it? Check our new paper "Accelerating Text Mining Using Domain-Specific Stop Word Lists" accepted at IWBDR https://arxiv.org/pdf/2012.02294.pdf

Text preprocessing is an essential step in text mining. Removing words that can negatively impact the quality of prediction algorithms or are not informative enough is a crucial storage-saving technique in text indexing and results in improved computational efficiency. Typically, a generic stop word list is applied to a dataset regardless of the domain. However, many common words are different from one domain to another but have no significance within a particular domain. Eliminating domain-specific common words in a corpus reduces the dimensionality of the feature space, and improves the performance of text mining tasks. In this paper, we present a novel mathematical approach for the automatic extraction of domain-specific words called the hyperplane-based approach. This new approach depends on the notion of low dimensional representation of the word in vector space and its distance from hyperplane. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. We compare the hyperplane-based approach with other feature selection methods, namely \c{hi}2 and mutual information. An experimental study is performed on three different datasets and five classification algorithms, and measure the dimensionality reduction and the increase in the classification performance. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information. The computational time to identify the domain-specific words is significantly lower than mutual information.

Thursday, December 10, 2020

Quantum Approximate Optimization Algorithm and Symmetry

 Our new paper QAOA symmetry and its predictability using classical symmetry is out! We analyze the connections between the symmetries of the objective function and the symmetries of QAOA dynamics, and applications to performance prediction, simulation and more https://scirate.com/arxiv/2012.04713

Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, Ilya Safro "Classical symmetries and QAOA", 2020

We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques towards predicting QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.

Friday, August 21, 2020

Generating biomedical scientific hypotheses with AGATHA

 Accepted paper in the 29TH ACM International Conference on Information and Knowledge Management (CIKM)

Sybrandt, Tyagin, Shtutman, Safro "AGATHA: Automatic Graph-mining and Transformer based Hypothesis Generation Approach", preprint at http://arxiv.org/pdf/2002.05635.pdf

Medical research is risky and expensive. Drug discovery requires researchers to efficiently winnow thousands of potential targets to a small candidate set. However, scientists spend significant time and money long before seeing the intermediate results that ultimately determine this smaller set. Hypothesis generation systems address this challenge by mining the wealth of publicly available scientific information to predict plausible research directions. We present AGATHA, a deep-learning hypothesis generation system that learns a data-driven ranking criteria to recommend new biomedical connections. We massively validate our system with a temporal holdout wherein we predict connections first introduced after 2015 using data published beforehand. We additionally explore biomedical sub-domains, and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types. Furthermore, we perform an ablation study to examine the aspects of our semantic network that most contribute to recommendation quality. Overall, AGATHA achieves best-in-class recommendation quality when compared to other hypothesis generation systems built to predict across all available biomedical literature. Reproducibility: All code, experimental data, and pre-trained models are available online: sybrandt.com/2020/agatha.

Sunday, July 12, 2020

Two papers on graph representation learning

 Accepted papers at Workshop on Mining and Learning with Graphs co-located with ACM KDD 2020! 

1) Ding, Zhang, Sybrandt, Safro "Unsupervised Hierarchical Graph Representation Learning by Mutual Information Maximization", 2020

Graph representation learning based on graph neural networks (GNNs) can greatly improve the performance of downstream tasks, such as node and graph classification. However, the general GNN models do not aggregate node information in a hierarchical manner, and can miss key higher-order structural features of many graphs. The hierarchical aggregation also enables the graph representations to be explainable. In addition, supervised graph representation learning requires labeled data, which is expensive and error-prone. To address these issues, we present an unsupervised graph representation learning method, Unsupervised Hierarchical Graph Representation (UHGR), which can generate hierarchical representations of graphs. Our method focuses on maximizing mutual information between "local" and high-level "global" representations, which enables us to learn the node embeddings and graph embeddings without any labeled data. To demonstrate the effectiveness of the proposed method, we perform the node and graph classification using the learned node and graph embeddings. The results show that the proposed method achieves comparable results to state-of-the-art supervised methods on several benchmarks. In addition, our visualization of hierarchical representations indicates that our method can capture meaningful and interpretable clusters.

2) Sybrandt, Safro "FOBE and HOBE: First- and High-Order Bipartite Embeddings", 2020

Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods utilized in these applications would be better served with specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between node neighborhoods. When sampling higher-order relationships, we reinforce similarities through algebraic distance on graphs. We also introduce ensemble embeddings to combine both into a "best of both worlds" embedding. The proposed methods are evaluated on link prediction and recommendation tasks and compared with other state-of-the-art embeddings. While being all highly beneficial in applications, we demonstrate that none of the considered embeddings is clearly superior (in contrast to what is claimed in many papers), and discuss the trade offs present among them. 

Saturday, October 19, 2019

How to find the best location for wireless charging lanes

Our paper is accepted in Journal of Industrial Management and Optimization 

Ushijima-Mwesigwa, Khan, Chowdhury, Safro "Optimal Installation for Electric Vehicle Wireless Charging Lanes", 2019

The emergence of electric vehicle wireless charging technology, where a whole lane can be turned into a charging infrastructure, leads to new challenges in the design and analysis of road networks. From a network perspective, a major challenge is determining the most important nodes with respect to the placement of the wireless charging lanes. In other words, given a limited budget, cities could face the decision problem of where to place these wireless charging lanes. With a heavy price tag, a placement without a careful study can lead to inefficient use of limited resources. In this work, the placement of wireless charging lanes is modeled as an integer programming problem. The basic formulation is used as a building block for different realistic scenarios. We carry out experiments using real geospatial data and compare our results to different network-based heuristics.

Reproducibility: all datasets, algorithm implementations and mathematical programming formulation presented in this work are available at https://github.com/hmwesigwa/smartcities.git

Wednesday, September 25, 2019

Multistart Methods for Quantum Approximate Optimization

Best student paper award at IEEE HPEC2019 goes to our lab! Congratulations to the leading student Ruslan Shaydulin!

Ruslan Shaydulin, Ilya Safro, Jeffrey Larson "Multistart Methods for Quantum Approximate Optimization", accepted at IEEE High Performance Extreme Computing Conference (HPEC) 2019, preprint at https://arxiv.org/abs/1905.08768


Hybrid quantum-classical algorithms such as the quantum approximate optimization algorithm (QAOA) are considered one of the most promising approaches for leveraging near-term quantum computers for practical applications. Such algorithms are often implemented in a variational form, combining classical optimization methods with a quantum machine to find parameters to maximize performance. The quality of the QAOA solution depends heavily on quality of the parameters produced by the classical optimizer. Moreover, multiple local optima in the space of parameters make it harder for the classical optimizer. In this paper we study the use of a multistart optimization approach within a QAOA framework to improve the performance of quantum machines on important graph clustering problems. We also demonstrate that reusing the optimal parameters from similar problems can improve the performance of classical optimization methods, expanding on similar results for MAXCUT.

Monday, May 6, 2019

Synthetic planar graph generation

Accepted paper in Applied Network Science journal

Chauhan, Gutfraind, Safro "Multiscale planar graph generation", 2019, preprint at  https://arxiv.org/abs/1802.09617

The study of network representations of physical, biological, and social phenomena can help us better understand their structure and functional dynamics as well as formulate predictive models of these phenomena. However, due to the scarcity of real-world network data owing to factors such as cost and effort required in collection of network data and the sensitivity of this data towards theft and misuse, engineers and researchers often rely on synthetic data for simulations, hypothesis testing, decision making, and algorithm engineering. An important characteristic of infrastructure networks such as roads, water distribution and other utility systems is that they can be (almost fully) embedded in a plane, therefore to simulate these system we need realistic networks which are also planar. While the currently-available synthetic network generators can model networks that exhibit realism, they do not guarantee or achieve planarity. In this paper we present a flexible algorithm that can synthesize realistic networks that are planar. The method follows a multi-scale randomized editing approach generating a hierarchy of coarsened networks of a given planar graph and introducing edits at various levels in the hierarchy. The method preserves the structural properties with minimal bias including the planarity of the network, while introducing realistic variability at multiple scales.

Reproducibility: All datasets and algorithm implementation presented in this work are available at https://bit.ly/2CjOUAS

Monday, April 29, 2019

Designing scalable nonlinear support vector machines

Accepted paper in Machine Learning journal 

Sadrfaridpour, Razzaghi, Safro, "Engineering fast multilevel support vector machines", 2019, preprint at https://arxiv.org/abs/1707.07657

The computational complexity of solving nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data represents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality to linear kernels but introduce extra kernel and model parameters which requires computationally expensive fitting. This increases the quality but also reduces the performance dramatically. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc which allows an easy integration with scientific computing tasks. The experimental results demonstrate significant speed up compared to the state-of-the-art nonlinear SVM libraries. 

Reproducibility: our source code, documentation and parameters are available at https://github.com/esadr/mlsvm

Saturday, March 30, 2019

Finding influential nodes in networks with consumable resources

Accepted paper in Network Science journal 

Ushijima-Mwesigwa, Khan, Chowdhury, Safro, "Centralities for Networks with Consumable Resources", 2019, preprint at https://arxiv.org/abs/1903.00642

Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic, and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures lies an assumption describing the manner in which traffic (of information, social actors, particles, etc.) flows through the network. For example, some measures only count shortest paths while others consider random walks. This paper considers a spreading process in which a resource necessary for transit is partially consumed along the way while being refilled at special nodes on the network. Examples include fuel consumption of vehicles together with refueling stations, information loss during dissemination with error-correcting nodes, and consumption of ammunition of military troops while moving. We propose generalizations of the well-known measures of betweenness, random-walk betweenness, and Katz centralities to take such a spreading process with consumable resources into account. In order to validate the results, experiments on real-world networks are carried out by developing simulations based on well-known models such as Susceptible-Infected-Recovered and congestion with respect to particle hopping from vehicular flow theory. The simulation-based models are shown to be highly correlated with the proposed centrality measures.

What we do/Team/In news

Quantum Computing     Quantum computers are expected to accelerate scientific discovery spanning many different areas such as medicine, AI, ...