Learn about communities and closeness centrality in social network analysis with Python and NetworkX
In Part 2, we expanded our understanding of social network analysis by charting the relationship between the Smashing Pumpkins and the Zwan band members. Then, we examined metrics such as degree centrality and betweenness centrality to examine the relationships between different band members. Also, we discussed how domain knowledge helps inform our understanding of the results.
In Part 3, we’ll cover the basics of proximity centrality and how it is calculated. Then, we’ll demonstrate how to calculate closeness centrality with NetworkX, using Billy Corgan’s network as an example.
before you start…
- Do you have basic knowledge of it? Python, if not, start here,
- are you familiar with Basic Concepts in Social Network Analysis, like nodes and edges? if not, start here,
- Are you comfortable with this? degree centrality And center centrality, if not, start here,
proximity centrality
proximity centrality is a measure in social network analysis that determines how close a node is to all other nodes in the context of the network. shortest path distance,
Proximity centrality focuses on the efficiency of information or resource flow within a network. The idea is that nodes with high closeness centrality are able to reach other nodes more quickly and efficiently, because they have a shorter average distance to the rest of the network.
The closeness centrality of a node is calculated as the inverse of the sum of the shortest path distances. (SPD) from that node to all other nodes in the network.
Closeness Centrality = 1 / (Sum of SPD from node to all other nodes)
Higher values indicate greater centrality and efficiency in information flow within the network.
Calculating adjacency centrality
Let’s break it down using a simple network with eight nodes.
- Calculate the shortest path distance (SPD) from node A to all other nodes. For our example, we’ll use a simple example of distances. In practice, this would be done with shortest path algorithms such as Breadth-first search or Dijkstra’s algorithm.
2. Calculate the sum of the shortest path distances from node A to all other nodes.
3. Apply the proximity centrality formula.
closeness and community
We can think of communities as groups of nodes that are more strongly connected within themselves than connections to nodes outside the group. Communities capture the idea of cohesive subgroups or modules within a network, where nodes in the same community have strong ties to each other. Communities are characterized by the presence of dense intra-community connections and relatively sparse intra-community connections.
When we consider the members of the bands Smashing Pumpkins and Zwaan, it is easy to imagine how the bands are connected to each other by their shared members. It displays both the intra-group connectivity between the members of each band and the intra-group connectivity between the two bands.
Whereas closeness centrality measures individual node importance and information flow efficiency, communities capture cohesive subgroups with dense connections. Together, they contribute to understanding the dynamics of information flow and the organization of networks.
Let us discuss some of the ways in which we can use proximity centrality and community to explain network dynamics.
- proximity centrality within communities
Nodes belonging to the same community often have high closeness centrality values within the community. This indicates that nodes within a community are closely connected and can reach each other quickly in terms of shortest path distance. High closeness centrality within communities reflects efficient information flow and communication within subgroups.
2. Connecting communities with proximity centrality
Nodes that connect different communities or act as bridges between communities may have greater closeness centrality than nodes in separate communities. These nodes play a vital role in connecting different communities, facilitating communication and information flow between them.
3. Community-Level Analysis Using Proximity Centrality
Proximity centrality can also be used at the community level to analyze the importance of communities within a network. By aggregating the proximity centrality values of nodes within a community, one can assess the overall efficiency of information flow within the community. Communities with higher average closeness centrality can be considered more central and influential in terms of their ability to access and disseminate information within the network.
Proximity centrality measures individual node importance and information flow efficiency, whereas communities capture cohesive subgroups with dense connections. Together, they contribute to understanding the dynamics of information flow and the organization of networks.
When considering Billy Corgan’s sphere of influence, proximity centrality can provide insight into how the members of Smashing Pumpkins and Zwaan directly and indirectly influence other musicians in Billy Corgan’s network. We can use the concept of community to describe each band, but we can also use it to describe the set of both bands. In fact, the community of 1990s alternative rock musicians is huge, and more communities will emerge as we add more bands to the network.
- As we did in Part 2, we are going to create a function that will generate all combinations of band members for each band.
2. Next, we define each band, and apply the function to generate a list of tuples. Then, we combine the lists and use a list comprehension to remove any doubles.
3. Now we can plot the graph.
It should look something like this:
4. Finally, let’s calculate the closeness centrality and analyze the values.
The output should look something like this:
So what can we say about values?
- Billy Corgan and Jimmy Chamberlin have the highest proximity centrality of 1.00, indicating that they are the most central members in terms of quickly reaching other members.
- James Iha, Katie Cole, Darcy Ritzky, Melissa Auf der Maur, Ginger Poole, Mike Byrne and Nicole Fiorentino have the same closeness centrality value of 0.785714. This suggests that these members are closely related and can reach each other quickly.
- Paz Lenchantin, David Pajo and Matt Sweeney have a closeness centrality value of slightly less than 0.611111. This indicates that they may be less central in terms of access to other members than the previous group, but they are still relatively well connected within the network.
Since we are still working with relatively simple networks, these results reveal nothing other than what we learned when we calculated degree centrality and betweenness centrality for Billy Corgan’s network. In Part 4, we will increase the complexity by adding more bands and musicians to the network. As a bonus, we’ll introduce some advanced techniques in Matplotlib to make your NetworkX graphs even more visually appealing!
if you want Fully Annotated Python Tutorialvisit my github,
👩🏻💻Christine Egan | Medium | Github | LinkedIn











