snap_cluster_lib (overlapping community detection)

Classes for performing overlapping community detection using bigclam.

class snap_cluster_lib.BigClamArticleCluster(data, coms=40)

Bases: snap_cluster_lib.BigClamLineCluster

__init__(data, coms=40)

A modified BigClamLineCluster that is used to cluster articles together.

Parameters:
    data (list) – A list where each item is a single article
    coms (int) – The number of communities to be found using overlapping community detection

print_community()

Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamChainCluster(data)

Bases: snap_cluster_lib.BigClamLineCluster

__init__(data)

load(data)

Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors. This is so that we can find the top 50 words for each cluster.

Parameters:	data (list) – A list of clusters, where each item is a cluster

print_cluster(cluster_id_list)

Prints the clusters in the given cluster list and performs coherence calculation.

Parameters:	cluster_id_list (list) – A list of cluster IDs that make up a coherent chain.

print_community()

Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamCluster

Bases: object

__init__()

Initialise the BigClamCluster object. Creates objects for feature extraction and the bipartite graph.

create_graph(cutoff=0.1)

Creates the bipartite graph necessary for overlapping community detection. It assumes that the necessary edges between the words and the clusters have already been generated.
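The role of cutoff is not described above. As a minimal sketch, assuming it is a minimum edge weight below which a word-cluster edge is dropped, the graph construction could look roughly like the following. The function build_bipartite_graph, the tuple-based node names, and the sample weights are hypothetical and are not part of snap_cluster_lib.

```python
from collections import defaultdict


def build_bipartite_graph(weighted_edges, cutoff=0.1):
    """Build an undirected bipartite adjacency map from (cluster_id, word, weight) edges.

    ASSUMPTION: edges whose weight is below `cutoff` are discarded. The real
    create_graph() may interpret `cutoff` differently.
    """
    graph = defaultdict(set)
    for cluster_id, word, weight in weighted_edges:
        if weight < cutoff:
            continue  # weak association: leave the word and cluster unconnected
        cluster_node = ("cluster", cluster_id)
        word_node = ("word", word)
        # Store the edge in both directions so the graph is undirected.
        graph[cluster_node].add(word_node)
        graph[word_node].add(cluster_node)
    return dict(graph)


# Hypothetical edges, as if produced by an earlier edge-building step.
weighted_edges = [
    (0, "vote", 0.42),
    (0, "poll", 0.05),
    (1, "vote", 0.31),
    (1, "court", 0.27),
]
graph = build_bipartite_graph(weighted_edges)
```

Words shared by several clusters (here "vote") are what allow community detection to link those clusters into one community.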

find_community(num_threads=4, min_com=10, max_com=100, div_com=5, step_alpha=0.3, step_beta=0.3, opt_com=-1, threshold=0.001)

Finds communities using bigclam. Each of these communities will represent a chain of articles. The communities are stored in self.EstCmtyVV.

Parameters:
    num_threads – number of threads (parallel)
    min_com – minimum number of communities
    max_com – maximum number of communities
    div_com – how many trials for number of communities
    step_alpha – alpha for backtracking line search
    step_beta – beta for backtracking line search
    opt_com – number of communities to detect
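To make the roles of step_alpha and step_beta more concrete, here is a generic sketch of a backtracking line search for maximising an objective. It is not the code that find_community runs. The function backtracking_step and the toy objective are illustrative assumptions, with alpha as the sufficient-increase constant and beta as the factor by which the step shrinks.

```python
def backtracking_step(f, x, grad, alpha=0.3, beta=0.3, max_iter=50):
    """Return a step size t along `grad` that sufficiently increases f.

    Generic textbook backtracking (Armijo) rule for maximisation; the
    function name and signature are hypothetical.
    """
    f0 = f(x)
    # Directional derivative along the gradient direction: grad . grad
    slope = sum(g * g for g in grad)
    t = 1.0
    for _ in range(max_iter):
        candidate = [xi + t * gi for xi, gi in zip(x, grad)]
        # Accept the step once the gain is at least alpha * t * slope.
        if f(candidate) >= f0 + alpha * t * slope:
            return t
        t *= beta  # otherwise shrink the step and try again
    return 0.0  # no acceptable step found within max_iter shrinkages


# Toy objective with its maximum at x = 1 (hypothetical example).
def objective(x):
    return -(x[0] - 1.0) ** 2


step = backtracking_step(objective, x=[0.0], grad=[2.0])
```

In this scheme a smaller beta shrinks the step more aggressively, while a larger alpha demands a bigger improvement before a step is accepted.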

class snap_cluster_lib.BigClamLineCluster(data)

Bases: snap_cluster_lib.BigClamCluster

__init__(data)

create_edges()

Builds the edges of the bipartite graph using the top 50 tf-idf words from each cluster. Once the edges are built, the graph can be created.
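The idea of linking each cluster to its highest-scoring words can be sketched as follows. This is an illustration only: top_k_tfidf_edges, the particular tf-idf variant, and the tie-breaking rule are assumptions, and the library's own weighting may differ.

```python
import math
from collections import Counter


def top_k_tfidf_edges(clusters, k=50):
    """Return (cluster_id, word, score) edges for the k best tf-idf words per cluster.

    `clusters` is a list of token lists; each cluster is treated as one document.
    Hypothetical helper, not part of snap_cluster_lib.
    """
    n = len(clusters)
    # Document frequency: the number of clusters in which each word occurs.
    df = Counter(word for tokens in clusters for word in set(tokens))
    edges = []
    for cluster_id, tokens in enumerate(clusters):
        if not tokens:
            continue  # an empty cluster contributes no edges
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        top = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
        edges.extend((cluster_id, word, score) for word, score in top)
    return edges


# Hypothetical tokenised clusters.
clusters = [
    ["election", "vote", "poll", "vote"],
    ["vote", "court", "ruling"],
    ["court", "appeal", "ruling", "judge"],
]
edges = top_k_tfidf_edges(clusters, k=2)
```

Each resulting tuple corresponds to one word-cluster edge of the bipartite graph.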

load(data)

Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors. This is so that we can find the top 50 words for each cluster.

Parameters:	data (list) – A list of clusters, where each item is a cluster
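For readers unfamiliar with log-entropy weighting, the sketch below applies the standard formulation (local weight log(1 + tf), global weight 1 plus the normalised entropy term) to tokenised clusters and then picks the top words. The helpers log_entropy_vectors and top_words are hypothetical names, and the library's own feature extraction may be implemented differently.

```python
import math
from collections import Counter


def log_entropy_vectors(docs):
    """Return one {word: weight} dict per document using log-entropy weighting."""
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")
    counts = [Counter(tokens) for tokens in docs]
    # Global frequency: total occurrences of each word across all documents.
    global_freq = Counter()
    for doc_counts in counts:
        global_freq.update(doc_counts)
    # Entropy term: sum over documents of p * log(p), with p = tf / global_freq.
    entropy = {word: 0.0 for word in global_freq}
    for doc_counts in counts:
        for word, tf in doc_counts.items():
            p = tf / global_freq[word]
            entropy[word] += p * math.log(p)
    global_weight = {w: 1.0 + e / math.log(n_docs) for w, e in entropy.items()}
    return [
        {w: math.log(1 + tf) * global_weight[w] for w, tf in doc_counts.items()}
        for doc_counts in counts
    ]


def top_words(vector, k=50):
    """Return the k highest-weighted words of one vector (ties broken alphabetically)."""
    return [w for w, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


# Hypothetical tokenised clusters.
docs = [["a", "b"], ["a", "c"]]
vectors = log_entropy_vectors(docs)
best = top_words(vectors[0], k=1)
```

A word spread evenly over every document (here "a") receives a global weight of zero, so it never ranks among a cluster's top words.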

print_community()

Prints all the communities produced by overlapping community detection.