F-S-Cube: A sampling based method for top‐k frequent subgraph mining

Abstract

Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, and thus they scale poorly when the graphs in the input database are large. In this work, we propose FS^3, a sampling‐based method. It mines a small collection of subgraphs that are most frequent in a probabilistic sense. FS^3 performs Markov chain Monte Carlo (MCMC) sampling over the space of fixed‐size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS^3 is equipped with an innovative queue manager, which stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top‐k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS^3 is efficient and obtains subgraphs that are the most frequent among the subgraphs of a given size.

Publication
Statistical Analysis and Data Mining
Date