snap_cluster_lib (overlapping community detection)
Classes for performing overlapping community detection using BigCLAM.
-
class snap_cluster_lib.BigClamArticleCluster(data, coms=40)
Bases: snap_cluster_lib.BigClamLineCluster
-
__init__(data, coms=40)
A modified BigClamLineCluster that clusters articles together.
Parameters:
- data (list) – A list where each item is a single article
- coms (int) – The number of communities to be found using overlapping community detection
-
print_community()
Prints all the communities produced by overlapping community detection.
-
-
class snap_cluster_lib.BigClamChainCluster(data)
Bases: snap_cluster_lib.BigClamLineCluster
-
__init__(data)
-
load(data)
Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors, so that the top 50 words for each cluster can be found.
Parameters:
- data (list) – A list of clusters, where each item is a cluster
-
print_cluster(cluster_id_list)
Prints the clusters in the given cluster list and performs the coherence calculation.
Parameters:
- cluster_id_list (list) – A list of cluster IDs that make up a coherent chain.
-
print_community()
Prints all the communities produced by overlapping community detection.
-
-
class snap_cluster_lib.BigClamCluster
Bases: object
-
__init__()
Initialise the BigClamCluster object. Creates objects for feature extraction and the bipartite graph.
-
create_graph(cutoff=0.1)
Creates the bipartite graph necessary for overlapping community detection. It assumes that the necessary edges between the words and the clusters have already been generated.
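The bipartite structure described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `build_bipartite_graph`, the `(cluster_id, word, weight)` edge format, and the reading of `cutoff` as a minimum edge weight are all assumptions made for illustration, since the documentation does not state what `cutoff` controls.

```python
from collections import defaultdict


def build_bipartite_graph(edges, cutoff=0.1):
    """Build an undirected word/cluster bipartite graph as an adjacency dict.

    edges: iterable of (cluster_id, word, weight) tuples (hypothetical format).
    cutoff: ASSUMED here to be a minimum edge weight; lighter edges are dropped.
    """
    graph = defaultdict(set)
    for cluster_id, word, weight in edges:
        if weight < cutoff:
            continue  # assumption: edges below the cutoff are ignored
        # Tag each node with its side so a word and a cluster never collide.
        cluster_node = ("cluster", cluster_id)
        word_node = ("word", word)
        graph[cluster_node].add(word_node)
        graph[word_node].add(cluster_node)
    return dict(graph)


example_edges = [(0, "election", 0.5), (1, "election", 0.2), (1, "weather", 0.05)]
example_graph = build_bipartite_graph(example_edges, cutoff=0.1)
```

In this sketch every edge joins one cluster node to one word node, so two clusters are only ever connected through the words they share.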
-
find_community(num_threads=4, min_com=10, max_com=100, div_com=5, step_alpha=0.3, step_beta=0.3, opt_com=-1, threshold=0.001)
Finds communities using BigCLAM. Each community represents a chain of articles. The communities are stored in self.EstCmtyVV.
Parameters:
- num_threads – number of threads to run in parallel
- min_com – minimum number of communities
- max_com – maximum number of communities
- div_com – number of trials for the number of communities
- step_alpha – alpha for the backtracking line search
- step_beta – beta for the backtracking line search
- opt_com – number of communities to detect
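To show what the two line-search parameters typically control, here is a generic backtracking line search written as a minimisation step. It is only a sketch under stated assumptions: `backtracking_step` is a hypothetical helper, mapping `step_alpha` to the sufficient-decrease constant and `step_beta` to the shrink factor follows the usual textbook convention and is not confirmed by this documentation, and BigCLAM itself maximises a likelihood rather than minimising a function.

```python
def backtracking_step(f, grad, x, alpha=0.3, beta=0.3, max_iter=50):
    """Return a step size t along -grad(x) that satisfies the Armijo condition.

    alpha: sufficient-decrease constant (assumed role of step_alpha).
    beta:  factor by which the step shrinks on each failure (assumed role of step_beta).
    """
    g = grad(x)
    fx = f(x)
    grad_norm_sq = sum(gi * gi for gi in g)
    t = 1.0
    for _ in range(max_iter):
        candidate = [xi - t * gi for xi, gi in zip(x, g)]
        # Accept the step once it decreases f by at least alpha * t * ||grad||^2.
        if f(candidate) <= fx - alpha * t * grad_norm_sq:
            return t
        t *= beta  # otherwise shrink the step and try again
    return t


def quadratic(x):
    return sum(xi * xi for xi in x)


def quadratic_grad(x):
    return [2.0 * xi for xi in x]


start = [1.0, 1.0]
step = backtracking_step(quadratic, quadratic_grad, start)
moved = [xi - step * gi for xi, gi in zip(start, quadratic_grad(start))]
```

A smaller beta shrinks the step faster after a rejected trial, while a larger alpha demands a bigger improvement before a step is accepted.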
-
-
class snap_cluster_lib.BigClamLineCluster(data)
Bases: snap_cluster_lib.BigClamCluster
-
__init__(data)
-
create_edges()
Builds the edges of the bipartite graph using the top 50 tf-idf words from each cluster. Once the edges are built, the graph can be created.
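The idea of keeping only the highest-scoring words per cluster can be sketched as follows. This is an illustration, not the library's code: `top_tfidf_edges` is a hypothetical name, each cluster is assumed to be a flat list of tokens, and the plain `tf * log(N / df)` weighting is one common tf-idf variant that may differ from the one used internally.

```python
import math
from collections import Counter


def top_tfidf_edges(clusters, top_k=50):
    """Return (cluster_index, word, score) edges for each cluster's top_k words.

    clusters: list of token lists, one per cluster (assumed input format).
    """
    n_clusters = len(clusters)
    counts = [Counter(tokens) for tokens in clusters]
    # Document frequency: in how many clusters does each word appear?
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())
    edges = []
    for index, count in enumerate(counts):
        scores = {
            word: tf * math.log(n_clusters / doc_freq[word])
            for word, tf in count.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        edges.extend((index, word, score) for word, score in ranked[:top_k])
    return edges


sample_clusters = [["vote", "vote", "news"], ["news", "rain"]]
sample_edges = top_tfidf_edges(sample_clusters, top_k=1)
```

Words that occur in every cluster score zero under this weighting, so they are the first to fall outside the top-k cut.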
-
load(data)
Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors, so that the top 50 words for each cluster can be found.
Parameters:
- data (list) – A list of clusters, where each item is a cluster
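For readers unfamiliar with log-entropy weighting, the transformation can be sketched in plain Python. The helper names `log_entropy_vectors` and `top_words` are hypothetical, each cluster is assumed to be a list of tokens, and the formula is the standard log-entropy definition (local weight `log(1 + tf)` times a global entropy weight), which may not match the library's feature extraction exactly.

```python
import math
from collections import Counter


def log_entropy_vectors(docs):
    """Return one {term: weight} dict per document using log-entropy weighting."""
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")
    counts = [Counter(doc) for doc in docs]
    global_freq = Counter()
    for count in counts:
        global_freq.update(count)
    # Accumulate sum_j p_ij * log(p_ij), where p_ij = tf_ij / global frequency.
    entropy = {term: 0.0 for term in global_freq}
    for count in counts:
        for term, tf in count.items():
            p = tf / global_freq[term]
            entropy[term] += p * math.log(p)
    global_weight = {
        term: 1.0 + value / math.log(n_docs) for term, value in entropy.items()
    }
    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in count.items()}
        for count in counts
    ]


def top_words(vector, k=50):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


vectors = log_entropy_vectors([["cat", "cat", "dog"], ["dog", "fish"]])
```

A term spread evenly across all documents receives a global weight of zero, while a term concentrated in one document keeps its full local weight.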
-
print_community()
Prints all the communities produced by overlapping community detection.
-