Wednesday, December 23, 2020

Reboot the Computing-Research Publication Systems

Moshe Vardi makes many good points about the current CS research publication system in Communications of the ACM:

Reboot the Computing-Research Publication Systems: The virtualization of conferences due to COVID-19 has sharpened my conviction that the computing-research publication system is badly broken and in need of a serious reboot.

Saturday, December 12, 2020

Do we always need big data in text mining?

Do we always need Big Data in text mining? Can we filter it? Check out our new paper, "Accelerating Text Mining Using Domain-Specific Stop Word Lists", accepted at IWBDR: https://arxiv.org/pdf/2012.02294.pdf

Text preprocessing is an essential step in text mining. Removing words that can negatively impact the quality of prediction algorithms or are not informative enough is a crucial storage-saving technique in text indexing and results in improved computational efficiency. Typically, a generic stop word list is applied to a dataset regardless of the domain. However, many common words are different from one domain to another but have no significance within a particular domain. Eliminating domain-specific common words in a corpus reduces the dimensionality of the feature space and improves the performance of text mining tasks. In this paper, we present a novel mathematical approach for the automatic extraction of domain-specific words called the hyperplane-based approach. This new approach depends on the notion of a low-dimensional representation of the word in vector space and its distance from the hyperplane. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. We compare the hyperplane-based approach with other feature selection methods, namely χ² and mutual information. We perform an experimental study on three different datasets and five classification algorithms, measuring the dimensionality reduction and the increase in classification performance. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information. The computational time to identify the domain-specific words is significantly lower than that of mutual information.
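To make the baseline concrete, here is a minimal sketch of mutual information between a term's presence and the class label, one of the feature selection methods the abstract compares against. It is not the paper's hyperplane-based approach; the toy corpus, the labels, and the function name `mutual_information` are all assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the class label.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(docs)
    # Joint counts of (term present?, class label) over all documents.
    joint = Counter((term in doc.split(), label) for doc, label in zip(docs, labels))
    present = Counter()
    cls = Counter()
    for (is_present, label), count in joint.items():
        present[is_present] += count
        cls[label] += count
    mi = 0.0
    for (is_present, label), count in joint.items():
        p_joint = count / n
        p_indep = (present[is_present] / n) * (cls[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# Toy medical corpus (made up): "patient" occurs in every document.
docs = [
    "patient heart attack",
    "patient heart surgery",
    "patient knee surgery",
    "patient knee fracture",
]
labels = ["cardio", "cardio", "ortho", "ortho"]

vocabulary = sorted({word for doc in docs for word in doc.split()})
scores = {word: mutual_information(docs, labels, word) for word in vocabulary}

# Terms that carry no information about the class are stop-word candidates.
stop_candidates = [word for word in vocabulary if scores[word] == 0.0]
```

In this toy corpus, "patient" appears in every document and so carries no class information, which is the kind of domain-specific common word a generic stop word list would miss.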

Thursday, December 10, 2020

Quantum Approximate Optimization Algorithm and Symmetry

Our new paper on QAOA symmetry and its predictability using classical symmetry is out! We analyze the connections between the symmetries of the objective function and the symmetries of QAOA dynamics, with applications to performance prediction, simulation, and more: https://scirate.com/arxiv/2012.04713

Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, Ilya Safro "Classical symmetries and QAOA", 2020

We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques towards predicting QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.
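The claim about invariant measurement probabilities can be checked numerically on a tiny instance. The sketch below is a brute-force statevector simulation of QAOA for MaxCut on a 3-vertex path graph, written in plain Python. It is an illustration under stated assumptions, not the paper's formalism: the graph, the parameter values, and the function names `maxcut_value` and `qaoa_probabilities` are all hypothetical.

```python
import cmath
import math

def maxcut_value(x, edges):
    """Number of edges cut by the bitstring x (bit q of x is vertex q's side)."""
    return sum(1 for u, v in edges if ((x >> u) & 1) != ((x >> v) & 1))

def qaoa_probabilities(n, edges, gammas, betas):
    """Measurement probabilities after QAOA with the standard transverse-field mixer."""
    dim = 1 << n
    # Start in the uniform superposition |+>^n.
    state = [complex(1 / math.sqrt(dim))] * dim
    for gamma, beta in zip(gammas, betas):
        # Phase separator: diagonal phase exp(-i * gamma * C(x)).
        state = [a * cmath.exp(-1j * gamma * maxcut_value(x, edges))
                 for x, a in enumerate(state)]
        # Mixer: exp(-i * beta * X) = cos(beta) I - i sin(beta) X on each qubit.
        c, s = math.cos(beta), -1j * math.sin(beta)
        for q in range(n):
            state = [c * state[x] + s * state[x ^ (1 << q)] for x in range(dim)]
    return [abs(a) ** 2 for a in state]

# Path graph 0 - 1 - 2 with two QAOA layers; parameter values are arbitrary.
n = 3
edges = [(0, 1), (1, 2)]
probs = qaoa_probabilities(n, edges, gammas=[0.4, 0.7], betas=[0.3, 0.9])

all_ones = (1 << n) - 1
# Symmetry 1: flipping every bit leaves the MaxCut objective unchanged.
flip_invariant = all(math.isclose(probs[x], probs[x ^ all_ones], abs_tol=1e-12)
                     for x in range(1 << n))
# Symmetry 2: swapping vertices 0 and 2 is an automorphism of the path graph,
# so bitstrings 001 and 100 should be equally likely.
swap_invariant = math.isclose(probs[0b001], probs[0b100], abs_tol=1e-12)
```

Changing the parameter values or adding layers should leave both checks true, since the two symmetries hold for the objective itself rather than for a particular parameter choice.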
