snap_cluster_lib (overlapping community detection)

Classes for performing overlapping community detection using BigCLAM.

class snap_cluster_lib.BigClamArticleCluster(data, coms=40)

Bases: snap_cluster_lib.BigClamLineCluster

__init__(data, coms=40)

Modified BigClamLineCluster which is used to cluster articles together.

Parameters:
  • data (list) – A list where each item is a single article
  • coms (int) – The number of communities to be found using overlapping community detection

print_community()

Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamChainCluster(data)

Bases: snap_cluster_lib.BigClamLineCluster

__init__(data)

load(data)

Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors. This is so that we can find the top 50 words for each cluster.

Parameters: data (list) – A list of clusters, where each item is a cluster
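
The log-entropy transformation mentioned above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names `log_entropy_vectors` and `top_words` are hypothetical, and it assumes each cluster has already been tokenised into a list of words. It uses the common log-entropy definition (local weight log(1 + tf), global weight 1 + sum(p * log p) / log N).

```python
import math
from collections import Counter


def log_entropy_vectors(docs):
    """Return one {term: weight} dict per tokenised document (hypothetical helper)."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Global frequency of each term across the whole corpus.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    # Global weight: 1 + sum_j p_ij * log(p_ij) / log(N), with p_ij = tf_ij / gf_i.
    global_weight = {}
    for term, gf in global_freq.items():
        total = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / gf
                total += p * math.log(p)
        norm = total / math.log(n_docs) if n_docs > 1 else 0.0
        global_weight[term] = 1.0 + norm

    # Local weight: log(1 + tf_ij), scaled by the term's global weight.
    return [
        {t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()}
        for c in counts
    ]


def top_words(vector, k=50):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms spread evenly over every cluster receive a weight near zero, so the top words of a cluster tend to be the ones that distinguish it from the others.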

print_cluster(cluster_id_list)

Prints the clusters in the given cluster list and performs coherence calculation.

Parameters: cluster_id_list (list) – A list of cluster IDs that make up a coherent chain.

print_community()

Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamCluster

Bases: object

__init__()

Initialise the BigClamCluster object. Creates objects for feature extraction and the bipartite graph.

create_graph(cutoff=0.1)

Creates the bipartite graph necessary for overlapping community detection. It assumes that the necessary edges between the words and the clusters have already been generated.
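
As a rough illustration of what a word-to-cluster bipartite graph looks like, the sketch below builds two adjacency maps from pre-generated edges. The source does not say what `cutoff` filters; this sketch assumes it is a minimum edge weight. The function name `build_bipartite_graph` and the `(cluster_id, word, weight)` edge format are hypothetical and are not taken from the library.

```python
from collections import defaultdict


def build_bipartite_graph(edges, cutoff=0.1):
    """Build adjacency maps from (cluster_id, word, weight) triples.

    Assumption: edges whose weight is below `cutoff` are dropped.
    """
    cluster_to_words = defaultdict(set)
    word_to_clusters = defaultdict(set)
    for cluster_id, word, weight in edges:
        if weight < cutoff:
            continue  # too weak to keep under the assumed cutoff rule
        cluster_to_words[cluster_id].add(word)
        word_to_clusters[word].add(cluster_id)
    return dict(cluster_to_words), dict(word_to_clusters)
```

A word linked to several clusters is what lets community detection place those clusters in the same, possibly overlapping, community.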

find_community(num_threads=4, min_com=10, max_com=100, div_com=5, step_alpha=0.3, step_beta=0.3, opt_com=-1, threshold=0.001)

Finds communities using BigCLAM. Each of these communities will represent a chain of articles. The communities are stored in self.EstCmtyVV.

Parameters:
  • num_threads – number of threads (parallel)
  • min_com – minimum number of communities
  • max_com – maximum number of communities
  • div_com – how many trials for number of communities
  • step_alpha – alpha for backtracking line search
  • step_beta – beta for backtracking line search
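  • opt_com – number of communities to detect (the default of -1 presumably means the number is chosen automatically from the min_com to max_com range)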
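
To make the parameters above more concrete, here is a minimal sketch of two ideas they refer to. It is not the library's code. `candidate_community_counts` assumes that the `div_com` trial values are spaced geometrically starting at `min_com`, which is only one plausible scheme. `backtracking_step` is a textbook backtracking line search written for minimisation, with `alpha` as the sufficient-decrease factor and `beta` as the step-shrink factor. Both function names are hypothetical.

```python
import math


def candidate_community_counts(min_com=10, max_com=100, div_com=5):
    """Return div_com candidate counts, geometrically spaced from min_com (assumed scheme)."""
    if div_com < 1:
        return []
    ratio = math.exp(math.log(max_com / min_com) / div_com)
    counts = [min_com]
    while len(counts) < div_com:
        nxt = int(counts[-1] * ratio)
        if nxt == counts[-1]:
            nxt += 1  # guarantee progress when the ratio is close to 1
        counts.append(nxt)
    return counts


def backtracking_step(f, x, grad, direction, alpha=0.3, beta=0.3, max_iter=50):
    """Return a step size t satisfying the sufficient-decrease condition.

    Starting from t = 1, t is multiplied by beta until
    f(x + t * direction) <= f(x) + alpha * t * <grad, direction>.
    Returns 0.0 if no acceptable step is found within max_iter shrinks.
    """
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad, direction))
    t = 1.0
    for _ in range(max_iter):
        candidate = [xi + t * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + alpha * t * slope:
            return t
        t *= beta
    return 0.0
```

A smaller `beta` shrinks the step more aggressively, while a larger `alpha` demands a bigger improvement before a step is accepted.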

class snap_cluster_lib.BigClamLineCluster(data)

Bases: snap_cluster_lib.BigClamCluster

__init__(data)

create_edges()

Builds the edges of the bipartite graph using the top 50 tf-idf words from each cluster. Once the edges are built, the graph can be created.
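
The edge-building step can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: each cluster is assumed to be a list of tokens, the tf-idf variant (relative term frequency times log(N / df)) is one common choice, and the name `tfidf_edges` is hypothetical.

```python
import math
from collections import Counter


def tfidf_edges(clusters, top_k=50):
    """Return (cluster_id, word, weight) triples for each cluster's top_k tf-idf words."""
    n = len(clusters)
    counts = [Counter(tokens) for tokens in clusters]

    # Document frequency: in how many clusters each word appears.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    edges = []
    for cluster_id, c in enumerate(counts):
        total = sum(c.values())
        scores = {
            word: (tf / total) * math.log(n / doc_freq[word])
            for word, tf in c.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        edges.extend((cluster_id, word, score) for word, score in best)
    return edges
```

Each triple links one cluster node to one word node, so the resulting list is exactly the kind of input a bipartite graph builder needs.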

load(data)

Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors. This is so that we can find the top 50 words for each cluster.

Parameters: data (list) – A list of clusters, where each item is a cluster

print_community()

Prints all the communities produced by overlapping community detection.