Identifying small worlds networks

Github repository: link

The present repository showcases the implementation of an experimental examination of the Small-World Networks (SWN) theory. The examination is based on real-world datasets, and the outcomes are presented through a Jupyter Notebook. As there is no corresponding written paper, the notebook serves as the sole source of information regarding the study. It encompasses all relevant theoretical background, details of the experimental design, presentation of results, and derivation of conclusions.

Time spent one the project since the beginning of 2023 (add ~20 hours for a real estimate)

Documentation

Notebook structure

The core component of the project is a Jupyter Notebook named main.ipynb. The notebook provides ample information to comprehend the study and its results.

The notebook starts with a theoretical background of the study, delving into the theory of random networks. The Erdős-Rényi and Watts-Strogatz models receive particular attention, as their properties form the basis of the study.
Next, the experimental setup is presented. There is a detailed description of the datasets, along with the methodology adopted for data extraction, transformation, and loading. In this section, the graphs are constructed from the raw data obtained from the datasets.
The properties of the constructed graphs are then calculated, including the average degree, average clustering coefficient, average shortest path length, and average betweenness centrality.
The results of the study are then presented, followed by a comprehensive analysis and discussion of the outcomes.
Finally, we attempt to understand whether the networks are small-world or not by employing different approaches and evaluating the consistency of the results.

Algorithms and functions implemented

In order to maintain a clean and organized notebook, I have integrated all algorithms and functions into the utils.py file. This module serves as a centralized repository of functions and is imported into the notebook for use in the study.

A thorough technical examination of the project requires a close examination of this file, as it holds nearly all the code utilized. Each function is accompanied by detailed documentation to ensure clear understanding and ease of use.

External scripts

The computation of the omega coefficient, a crucial measure of the small-worldness of a network, is one of the most vital functions in this project. However, its application proves to be extremely slow, rendering its usage in the notebook inadvisable. To address this issue, I have developed a separate script, named omega_sampled_server.py. The script is intended for execution on a server machine, capable of handling the extensive processing demands for extended periods of time, to compute the omega coefficient for a specified graph.

To run the script, execute the following command:

./omega_sampled_server.py graph --k --niter --nrand

Where:

graph is the name of the graph
k Percentage of nodes to be remove
niter Number of rewiring operations per edge
nrand Number of random graphs to be generated

For further details run ./omega_sampled_server.py --help

Parallel version

However, the computation of the omega coefficient was still very slow. To speed up the process, I developed a parallel version of the script above, called omega_parallel_server.py. Its development required substantial effort as the parallelization of a function utilizing a random number generator is a complex task.

To run the script, execute the following command:

./omega_sampled_parallel.py graph --k --niter --nrand --n_processes --seed

Where:

graph is the name of the graph
k Percentage of nodes to be remove
niter Number of rewiring operations per edge
nrand Number of random graphs to be generated
n_processes is an argument that specifies the number of processes to be used
seed is the seed for the random number generator

For further details run ./omega_sampled_parallel.py --help

Requirements

The project has been developed in Python 3.10.9, to install the required libraries run the following command:

pip install -r requirements.txt

NOTE: I have no access to Windows or MacOS machines, so I cannot guarantee whether the project will function optimally on these platforms. However, I have made a concerted effort to ensure broad compatibility by implementing the project in a manner that should allow for its seamless operation on any platform. All the functions have been test on a Arch Linux machine, with an AMD Ryzen 5 2600 processor and 16GB of RAM.

Experimental

There is a directory named experimental in the repo that contains a number of scripts written in C++ with the purpose of speeding up the computation of the omega coefficient. The scripts are not used in the project, but they are included for completeness. This scripts are not documented, not tested, and not guaranteed to work. I suggest to ignore them unless you are a C++ enthusiast that has time to waste.

References

Here I report a list of the references utilized for the implementation of the project. This information is also available in the notebook.

In no particular order

[1] On the evolution of random graphs, P. Erdős, A. Rényi, Publ. Math. Inst. Hungar. Acad. Sci., 5, 17-61 (1960).

[2] Complex Networks: Structure, Robustness, and Function, R. Cohen, S. Havlin, D. ben-Avraham, H. E. Stanley, Cambridge University Press, 2009.

[3] Collective dynamics of ‘small-world’ networks, D. J. Watts and S. H. Strogatz, Nature, 393, 440-442, 1998.

[4] On random graphs I, P. Erdős and A. Rényi, Publ. Math. Inst. Hungar. Acad. Sci., 5, 290-297, 1960.

[5] Generalizations of the clustering coefficient to weighted complex networks, M. E. J. Newman, Physical Review E, 74, 036104, 2006.

[6] The ubiquity of small-world networks. Telesford QK, Joyce KE, Hayasaka S, Burdette JH, Laurienti PJ. Brain Connect. 2011;1(5):367-75

[8] Humphries and Gurney (2008). “Network ‘Small-World-Ness’: A Quantitative Method for Determining Canonical Network Equivalence”. PLoS One. 3 (4)

[9] The brainstem reticular formation is a small-world, not scale-free, network M. D. Humphries, K. Gurney and T. J. Prescott, Proc. Roy. Soc. B 2006 273, 503-511,

[10] Sporns, Olaf, and Jonathan D. Zwi. “The small world of the cerebral cortex.” Neuroinformatics 2.2 (2004): 145-162.

[11] Maslov, Sergei, and Kim Sneppen. “Specificity and stability in topology of protein networks.” Science 296.5569 (2002): 910-913.

[13] B. Bollob ́as, Random Graphs, 1985. London: Academic Press

[14] R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin, Resilience of the Internet to random breakdown, Physical Review Letters 85 (2000), 4626–4628

[15] Dingqi Yang, Bingqing Qu, Jie Yang, Philippe Cudre-Mauroux, Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach, In Proc. of The Web Conference (WWW’19). May. 2019, San Francisco, USA.

[16] Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology, 25(2):163-177, 2001._

[17] Error and attack tolerance of complex networks, R. Albert, Nature volume 406, pages378–382 (2000)