snap_cluster_lib (overlapping community detection)
Classes to perform overlapping community detection using BigCLAM.

class snap_cluster_lib.BigClamArticleCluster(data, coms=40)
Bases: snap_cluster_lib.BigClamLineCluster

__init__(data, coms=40)
A modified BigClamLineCluster used to cluster articles together.
Parameters:
- data (list) – A list in which each item is a single article.
- coms (int) – The number of communities to find with overlapping community detection.

print_community()
Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamChainCluster(data)
Bases: snap_cluster_lib.BigClamLineCluster

__init__(data)

load(data)
Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors, so that the top 50 words for each cluster can be found.
Parameters: data (list) – A list of clusters, where each item is a cluster.
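The load step relies on log-entropy weighting. The module's own feature-extraction code is not shown here. What follows is a minimal pure-Python sketch of one common log-entropy formulation. It multiplies a local weight log(1 + tf) by a global weight 1 + Σ_j p_ij log p_ij / log n. Here p_ij is the share of word i's total count that falls in document j, and n is the number of documents. The helper names `log_entropy` and `top_words` are hypothetical, and the module's real implementation may use a slightly different normalisation.

```python
import math
from collections import Counter


def log_entropy(docs):
    """Return one {word: weight} dict per tokenised document (hypothetical helper).

    Local weight:  log(1 + tf_ij)
    Global weight: 1 + sum_j(p_ij * log p_ij) / log(n_docs), with p_ij = tf_ij / gf_i
    """
    counts = [Counter(doc) for doc in docs]
    n_docs = len(counts)
    # gf_i: total count of each word over the whole corpus.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)
    global_weight = {}
    for word, gf in global_freq.items():
        if n_docs < 2:
            global_weight[word] = 1.0  # entropy is undefined for a single document
            continue
        entropy = sum((c[word] / gf) * math.log(c[word] / gf) for c in counts if c[word])
        # Evenly spread words get ~0; words concentrated in one document get 1.
        global_weight[word] = 1.0 + entropy / math.log(n_docs)
    return [
        {word: math.log(1 + tf) * global_weight[word] for word, tf in c.items()}
        for c in counts
    ]


def top_words(vector, n=50):
    """Return the n highest-weighted words of one vector (ties broken alphabetically)."""
    return [w for w, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
```

A word that appears equally often in every document gets a global weight of 0, so it ranks last. `top_words(vector, 50)` then yields the 50 words the description refers to.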

print_cluster(cluster_id_list)
Prints the clusters in the given list and computes their coherence.
Parameters: cluster_id_list (list) – A list of clusters that make up a coherent chain.
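The documentation does not say how coherence is measured. As an illustration only, this sketch assumes a weakest-link definition. Under it, a chain is as coherent as its least similar pair of consecutive clusters, with similarity taken as the cosine between sparse {word: weight} vectors. `chain_coherence` and `cosine` are hypothetical names, and `cluster_vectors` is an assumed mapping from cluster id to vector. print_cluster may compute something different.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def chain_coherence(cluster_vectors, cluster_id_list):
    """Hypothetical weakest-link coherence: lowest similarity between consecutive clusters."""
    chain = [cluster_vectors[cid] for cid in cluster_id_list]
    if len(chain) < 2:
        return 1.0  # assumption: a single cluster is trivially coherent
    return min(cosine(a, b) for a, b in zip(chain, chain[1:]))
```

Using the minimum rather than the mean means one abrupt jump between two clusters is enough to mark the whole chain as incoherent.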

print_community()
Prints all the communities produced by overlapping community detection.

class snap_cluster_lib.BigClamCluster
Bases: object

__init__()
Initialise the BigClamCluster object. Creates objects for feature extraction and the bipartite graph.

create_graph(cutoff=0.1)
Creates the bipartite graph needed for overlapping community detection. It assumes that the required edges between the words and the clusters have already been generated.
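The role of cutoff is not documented. This is a minimal sketch that assumes create_graph drops every cluster-word edge whose weight is below cutoff. It then assigns integer node ids, since SNAP graphs index nodes by integer. `build_bipartite_graph` and the (cluster_id, word, weight) edge format are hypothetical, and a plain adjacency dictionary stands in for the SNAP graph object.

```python
from collections import defaultdict


def build_bipartite_graph(edges, cutoff=0.1):
    """Build an undirected cluster-word graph from (cluster_id, word, weight) edges.

    Assumption: edges lighter than `cutoff` are discarded.
    Returns (node_id, adjacency): node_id maps ("cluster", id) / ("word", w) to an int.
    """
    node_id = {}
    adjacency = defaultdict(set)

    def get_id(key):
        # Hand out consecutive integer ids in order of first appearance.
        if key not in node_id:
            node_id[key] = len(node_id)
        return node_id[key]

    for cluster_id, word, weight in edges:
        if weight < cutoff:
            continue
        u = get_id(("cluster", cluster_id))
        v = get_id(("word", word))
        # Store both directions because the graph is undirected.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return node_id, dict(adjacency)
```

Keeping cluster and word keys in separate namespaces prevents a cluster id from colliding with a word that happens to look like a number.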

find_community(num_threads=4, min_com=10, max_com=100, div_com=5, step_alpha=0.3, step_beta=0.3, opt_com=-1, threshold=0.001)
Finds communities using BigCLAM. Each community represents a chain of articles. The communities are stored in self.EstCmtyVV.
Parameters:
- num_threads – Number of threads to run in parallel.
- min_com – Minimum number of communities to try.
- max_com – Maximum number of communities to try.
- div_com – Number of trials for choosing the number of communities.
- step_alpha – Alpha for the backtracking line search.
- step_beta – Beta for the backtracking line search.
- opt_com – Number of communities to detect (-1 detects the number automatically).
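BigCLAM fits a non-negative affiliation strength F[u, c] for every node u and community c. It models the probability of an edge between u and v as 1 - exp(-F_u · F_v) and maximises the resulting log-likelihood. The sketch below is a small dense NumPy version that uses projected gradient ascent with a backtracking line search. Its alpha and beta play the role of step_alpha and step_beta.

This is not SNAP's implementation. It omits the multithreading and the automatic choice of the number of communities that num_threads, min_com, max_com and div_com control. It also takes k directly, like a fixed opt_com, and ignores threshold. All function names are hypothetical. The membership rule F[u, c] >= sqrt(-log(1 - eps)), with eps the graph's edge density, is one common choice and is an assumption here.

```python
import numpy as np


def bigclam_loglik(F, A):
    """Sum over edges of log(1 - exp(-F_u.F_v)) minus sum over non-edges of F_u.F_v."""
    n = A.shape[0]
    S = F @ F.T
    off_diag = ~np.eye(n, dtype=bool)
    edges = (A > 0) & off_diag
    non_edges = (A == 0) & off_diag
    # log(1 - exp(-s)) via expm1 for accuracy; s is floored to avoid log(0).
    edge_term = np.log(-np.expm1(-np.maximum(S[edges], 1e-10))).sum()
    # Every undirected pair appears twice in the symmetric matrix, hence the 0.5.
    return 0.5 * (edge_term - S[non_edges].sum())


def bigclam_gradient(F, A):
    """Gradient of bigclam_loglik with respect to every row F_u."""
    n = A.shape[0]
    S = np.maximum(F @ F.T, 1e-10)
    off_diag = ~np.eye(n, dtype=bool)
    edges = (A > 0) & off_diag
    non_edges = ((A == 0) & off_diag).astype(float)
    # Neighbours pull F_u towards F_v (weight e^-s / (1 - e^-s)); non-neighbours push away.
    M = np.where(edges, np.exp(-S) / -np.expm1(-S), 0.0) - non_edges
    return M @ F


def bigclam_fit(A, k, iters=200, alpha=0.3, beta=0.3, seed=0):
    """Projected gradient ascent on F >= 0 with a backtracking (Armijo) line search."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.1, 1.0, size=(A.shape[0], k))
    ll = bigclam_loglik(F, A)
    for _ in range(iters):
        G = bigclam_gradient(F, A)
        step = 1.0
        while step > 1e-8:
            F_new = np.maximum(F + step * G, 0.0)  # project back onto F >= 0
            ll_new = bigclam_loglik(F_new, A)
            # Accept the step only if the likelihood rises enough.
            if ll_new >= ll + alpha * np.sum(G * (F_new - F)):
                break
            step *= beta  # shrink the step and try again
        else:
            break  # no acceptable step: treat as converged
        F, ll = F_new, ll_new
    return F


def bigclam_communities(F, A):
    """Put node u in community c when F[u, c] reaches the background threshold."""
    n = A.shape[0]
    n_edges = np.count_nonzero(np.triu(A, 1))
    eps = min(2.0 * n_edges / (n * (n - 1)), 1.0 - 1e-12)  # edge density
    delta = np.sqrt(-np.log(1.0 - eps))
    return [np.flatnonzero(F[:, c] >= delta).tolist() for c in range(F.shape[1])]
```

A step is accepted only when the log-likelihood increases sufficiently, so the fitted F never scores worse than its random starting point. The dense n × n matrices keep the code short but limit it to small graphs.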

class snap_cluster_lib.BigClamLineCluster(data)
Bases: snap_cluster_lib.BigClamCluster

__init__(data)

create_edges()
Builds the edges of the bipartite graph using the top 50 tf-idf words from each cluster. Once the edges are built, the graph can be created.
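This is a minimal sketch of how such edges could be built. It assumes each cluster is one flat list of tokens, scored with plain tf-idf: term frequency within the cluster times log(number of clusters / number of clusters containing the word). The function name `top_tfidf_edges` and the (cluster_id, word, weight) output format are hypothetical. The module may tokenise and weight words differently.

```python
import math
from collections import Counter


def top_tfidf_edges(clusters, n=50):
    """Return (cluster_id, word, weight) edges linking each cluster to its top-n tf-idf words."""
    counts = [Counter(tokens) for tokens in clusters]
    n_docs = len(counts)
    # Document frequency: in how many clusters each word occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    edges = []
    for cluster_id, c in enumerate(counts):
        total = sum(c.values())
        scores = {w: (tf / total) * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        # Keep positive scores only; highest first, ties broken alphabetically.
        ranked = sorted(
            ((w, s) for w, s in scores.items() if s > 0),
            key=lambda kv: (-kv[1], kv[0]),
        )[:n]
        edges.extend((cluster_id, w, s) for w, s in ranked)
    return edges
```

Words that occur in every cluster score 0 and are skipped, since they cannot tell clusters apart.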

load(data)
Loads the data into the object for later use. It performs feature extraction to build the dictionary and transforms the corpus into log-entropy vectors, so that the top 50 words for each cluster can be found.
Parameters: data (list) – A list of clusters, where each item is a cluster.

print_community()
Prints all the communities produced by overlapping community detection.
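print_community has no documented output format. This minimal sketch assumes each community (as stored in self.EstCmtyVV) can be read as a list of integer node ids. It also assumes a hypothetical `id_to_node` mapping that records whether each id is a ("cluster", id) or a ("word", word) node of the bipartite graph. The sketch splits every community into its member clusters and the words linking them, and returns the text so that it can be printed.

```python
def format_communities(communities, id_to_node):
    """Render each community as its member clusters followed by its member words."""
    lines = []
    for index, members in enumerate(communities):
        clusters = sorted(id_to_node[m][1] for m in members if id_to_node[m][0] == "cluster")
        words = sorted(id_to_node[m][1] for m in members if id_to_node[m][0] == "word")
        lines.append(f"Community {index}: clusters={clusters} words={words}")
    return "\n".join(lines)
```

Because the communities overlap, the same cluster can appear under several communities, as cluster 1 would if it belonged to two chains.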