Classic

These functions are classic graph theoretic metrics.

`ccc(matrix)`

Computes the classical clustering coefficient of a directed graph

Parameters:

Name	Type	Description	Default
`matrix`	`numpy array`	the adjaceny matrix of an unweighted graph	required

Returns:

Type	Description
`Series`	The index is the node index in `matrix` and the value is the clustering coefficient of the node.

References

The formula is taken from the following paper.

[1] G. Fagiolo, "Clustering in complex directed networks", 2006; DOI: 10.1103/PhysRevE.76.026107.

[2] Conceição, Pedro, et al. "An application of neighbourhoods in digraphs to the classification of binary dynamics.", 2022 DOI: 10.1162/netn_a_00228.

Source code in src/connalysis/network/classic.py

def ccc(matrix):
    """Computes the classical clustering coefficient of a directed graph

    Parameters
    ----------
    matrix : numpy array
        the adjaceny matrix of an unweighted graph

    Returns
    -------
    Series
        The index is the node index in ``matrix`` and the value is the clustering coefficient of the node. 

    References
    ----------
    The formula is taken from the following paper.

    [1] G. Fagiolo, "Clustering in complex directed networks", 2006;
            [DOI: 10.1103/PhysRevE.76.026107]().

    [2] Conceição, Pedro, et al. "An application of neighbourhoods in digraphs to the classification of binary dynamics.", 2022
            [DOI: 10.1162/netn_a_00228](). 

    """
    # We only analyze the udnerlying connectivity not the weights
    matrix=matrix.astype(bool).astype(int)
    # Numerator 
    if sp.issparse(matrix):
        dense_matrix = matrix + matrix.transpose()
        numerator = 0.5*np.diag((dense_matrix ** 3).toarray())
    else:
        numerator = 0.5*np.diag(np.linalg.matrix_power(matrix + np.transpose(matrix), 3))
    # Denominator
    deg = node_degree(matrix) 
    denominator = (deg*(deg-1)-2*rc_submatrix(matrix).toarray().sum(axis=0))
    return numerator/denominator

`closeness_connected_components(adj, directed=False, return_sum=True)`

Compute the closeness of each connected component of more than 1 vertex

Parameters:

Name	Type	Description	Default
`adj`	`array_like`	Adjacency matrix of the graph	required
`directed`	`bool`	If `True`, will be computed using strongly connected components and directed closeness.	`False`
`return_sum`	`bool`	If `True`, only one list will be returned, by summing over all the connected components.	`True`

Returns:

Type	Description
`array_like`	A single array( if `return_sum=True`) or a list of arrays of shape `n`, containting closeness of vertices in that component, or 0 if the vertex is not in the component. Closeness cannot be zero otherwise.

Source code in src/connalysis/network/classic.py

def closeness_connected_components(adj, directed=False, return_sum=True):
    """Compute the closeness of each connected component of more than 1 vertex

    Parameters
    ----------
    adj : array_like
        Adjacency matrix of the graph
    directed : bool
        If `True`, will be computed using strongly connected components and directed closeness.
    return_sum : bool
        If `True`, only one list will be returned, by summing over all the connected components.


    Returns
    -------
    array_like
        A single array( if `return_sum=True`) or a list of arrays of shape `n`, containting closeness of vertices in that component, or 0 if the vertex is not in the component. Closeness cannot be zero otherwise.

    """
    from sknetwork.ranking import Closeness
    from scipy.sparse.csgraph import connected_components

    matrix = sp.csr_matrix(adj)
    if directed:
        n_comp, comp = connected_components(matrix, directed=True, connection="strong")
    else:
        n_comp, comp = connected_components(matrix, directed=False)
        matrix = matrix + matrix.T  # we need to make the matrix symmetric

    closeness = Closeness()  # if matrix is not symmetric automatically use directed
    n = matrix.shape[0]
    all_c = []
    for i in range(n_comp):
        c = np.zeros(n)
        idx = np.where(comp == i)[0]
        sub_mat = matrix[np.ix_(idx, idx)]
        if sub_mat.getnnz() > 0:
            c[idx] = closeness.fit_predict(sub_mat)
            all_c.append(c)
    if return_sum:
        all_c = np.array(all_c)
        return np.sum(all_c, axis=0)
    else:
        return all_c

`connected_components(adj, directed=True, connection='weak', return_labels=False)`

Returns a list of the size of the connected components of the graph

Parameters:

Name	Type	Description	Default
`adj`	`array_like or sparse matrix`	Adjacency matrix of the graph	required
`directed`	`bool`	If `True`, will be compute connected components of the directed graph	`True`
`connection`	`str {'weak', 'strong'}`	If `weak`, it will compute the connected components of the underlying undirected graph. If `strong`, it will compute strongly connected components of the directed graph.	`'weak'`
`return_labels`	`bool`	If `True`, will return the labels of the connected components	`False`

Returns:

Type	Description
`array_like`	A list of the size of the connected components of the graph. If return_labels == True, it also returns the list of labels of the connected components.

Source code in src/connalysis/network/classic.py

def connected_components(adj,directed=True, connection='weak', return_labels=False):
    """Returns a list of the size of the connected components of the graph

    Parameters
    ----------
    adj : array_like or sparse matrix
        Adjacency matrix of the graph
    directed : bool
        If `True`, will be compute connected components of the directed graph
    connection : str {'weak', 'strong'}
        If `weak`, it will compute the connected components of the underlying undirected graph. 
        If `strong`, it will compute strongly connected components of the directed graph.
    return_labels : bool
        If `True`, will return the labels of the connected components

    Returns
    -------
    array_like
        A list of the size of the connected components of the graph. 
        If return_labels == True, it also returns the list of labels of the connected components.
    """

    matrix = sp.csr_matrix(adj)

    comps=sp.csgraph.connected_components(matrix , directed=directed, 
                                          connection=connection, return_labels=True)
    comps_size=np.unique(comps[1], return_counts=True)[1]
    if return_labels:
        return comps_size, comps[1]
    else:
        return comps_size

`connection_probability_within(m, v, cols=['x', 'y'], max_dist=100, type='directed', skip_symmetry_check=False)`

Returns the average density of submatrices of nodes within distance max_dist of each node in m.

Parameters:

Name	Type	Description	Default
`m`	`array or sparse matrix`	Adjacency matrix of the graph	required
`v`	`DataFrame or tuple`	DataFrame with the coordinates of the nodes if m is square and has the same pre and post nodes. If a tuple is passed, it should contain two DataFrames, one for the source nodes and one for the target nodes.	required
`cols`	`list`	Columns of the DataFrame containing the coordinates of the nodes.	`['x', 'y']`
`max_dist`	`float`	Maximum distance between nodes to be considered connected.	`100`
`type`	`str {'directed', 'undirected', 'reciprocal'}`	The type of the graph considered for the computation. If 'directed', the density as a directed graph is computed. If 'undirected', the density of the underlying undirected graph is computed. Only possible if the matrix is square. If 'reciprocal', the density of the underlying reciprocal graph is computed. Only possible if the matrix is square.	`'directed'`
`skip_symmetry_check`	`bool`	If `True`, it will skip the check for symmetry of the matrix.	`False`

Returns:

Type	Description
`float`	The average density of the submatrices.

Source code in src/connalysis/network/classic.py

def connection_probability_within(m, v, cols=["x", "y"], max_dist=100, type='directed', skip_symmetry_check= False):
    """Returns the average density of submatrices of nodes within distance `max_dist` of each node in `m`.

        Parameters
        ----------
        m : array or sparse matrix
            Adjacency matrix of the graph
        v : DataFrame or tuple
            DataFrame with the coordinates of the nodes if m is square and has the same pre and post nodes. 
            If a tuple is passed, it should contain two DataFrames, one for the source nodes and one for the target nodes.
        cols : list
            Columns of the DataFrame containing the coordinates of the nodes.
        max_dist : float
            Maximum distance between nodes to be considered connected.        
        type : str {'directed', 'undirected', 'reciprocal'}
            The type of the graph considered for the computation.
            If 'directed', the density as a directed graph is computed.
            If 'undirected', the density of the underlying undirected graph is computed. Only possible if the matrix is square.
            If 'reciprocal', the density of the underlying reciprocal graph is computed. Only possible if the matrix is square.
        skip_symmetry_check : bool
            If `True`, it will skip the check for symmetry of the matrix.

        Returns
        -------
        float
            The average density of the submatrices.
    """
    # Get coordinates of nodes and their nearest neighbors
    pairs_mat=get_pairs_within(m, v, cols=cols, max_dist=max_dist)
    if pairs_mat is np.nan:
        return np.nan
    else:
        if type=='reciprocal':
            assert m.shape[0]==m.shape[1], "The matrix must be square to compute the reciprocal connectivity"
            m=rc_submatrix(m).tocsc()
        elif type=='undirected':
            assert m.shape[0]==m.shape[1], "The matrix must be square to compute the undirected connectivity"
            if not skip_symmetry_check:
                if (m!=m.T).nnz >0:
                    print("The graph is directed. Taking the underlying undirected graph")
                    m=underlying_undirected_matrix(m)
            m=sp.csc_matrix(m)
        elif type=='directed':
            m=sp.csc_matrix(m)
        else:
            print(type)
            raise ValueError("Type must be 'directed', 'undirected' or 'reciprocal'")
        return m[pairs_mat].astype(bool).mean() 

`core_number(adj)`

Returns the core number for each node.

Parameters:

Name	Type	Description	Default
`adj`	`array_like or sparse matrix`	Adjacency matrix of the graph	required
`directed`	`bool`	If `True`, will be compute connected components of the directed graph	required
`connection`	`str {'weak', 'strong'}`	If `weak`, it will compute the connected components of the underlying undirected graph. If `strong`, it will compute strongly connected components of the directed graph.	required
`return_labels`	`bool`	If `True`, will return the labels of the connected components	required

Returns:

Type	Description
`dict`	A dictionary with keys the indices of the nodes of adj and values their corresponding core number.

Notes

The k-core of a graph is the maximal subgraph that contains nodes of degree k or more in the induced subgraph. The core number of a node is the largest value k of a k-core containing that node. For directed graphs the total node degree is use, i.e., the sum of in-degree + out-degree.

Source code in src/connalysis/network/classic.py

def core_number(adj):
    """Returns the core number for each node.

        Parameters
        ----------
        adj : array_like or sparse matrix
            Adjacency matrix of the graph
        directed : bool
            If `True`, will be compute connected components of the directed graph
        connection : str {'weak', 'strong'}
            If `weak`, it will compute the connected components of the underlying undirected graph. 
            If `strong`, it will compute strongly connected components of the directed graph.
        return_labels : bool
            If `True`, will return the labels of the connected components

        Returns
        -------
        dict
            A dictionary with keys the indices of the nodes of adj and values their corresponding core number.

        Notes
        -----
        The k-core of a graph is the maximal subgraph that contains nodes of degree k or more in the induced subgraph.
        The core number of a node is the largest value k of a k-core containing that node.
        For directed graphs the total node degree is use, i.e., the sum of in-degree + out-degree.
    """
    # TODO: Implement directed (k,l) core and k-core of underlying undirected graph (very similar to this)
    # TODO: Filtered simplex counts with different weights on vertices (coreness, intersection)
    #  or on edges (strength of connection).    
    adj=sp.csr_matrix(adj)
    G = nx.from_scipy_sparse_array(adj)
    # Very inefficient (returns a dictionary!). TODO: Look for different implementation
    return nx.algorithms.core.core_number(G)

`density(adj, type='directed', skip_symmetry_check=False)`

Returns the density of a matrix.

Parameters:

Name	Type	Description	Default
`adj`	`array_like or sparse matrix`	Adjacency matrix of the graph	required
`type`	`str {'directed', 'undirected', 'reciprocal'}`	The type of the graph considered for the computation. If 'directed', the density as a directed graph is computed. If 'undirected', the density of the underlying undirected graph is computed. If 'reciprocal', the density of the underlying reciprocal graph is computed.	`'directed'`
`skip_symmetry_check`	`bool`	If `True`, it will skip the check for symmetry of the matrix.	`False`

Returns:

Type	Description
`float`	The density of the graph.

Source code in src/connalysis/network/classic.py

def density(adj, type="directed", skip_symmetry_check=False):
    """Returns the density of a matrix.

        Parameters
        ----------
        adj : array_like or sparse matrix
            Adjacency matrix of the graph
        type : str {'directed', 'undirected', 'reciprocal'}
            The type of the graph considered for the computation.
            If 'directed', the density as a directed graph is computed.
            If 'undirected', the density of the underlying undirected graph is computed.
            If 'reciprocal', the density of the underlying reciprocal graph is computed.
        skip_symmetry_check : bool
            If `True`, it will skip the check for symmetry of the matrix.

        Returns
        -------
        float
            The density of the graph.
    """
    assert adj.shape[0] == adj.shape[1], "The matrix must be square!"
    n=adj.shape[0]
    adj=sp.csr_matrix(adj).astype('bool')
    if type=="undirected":
        if not skip_symmetry_check:
            if (adj!=adj.T).nnz >0:
                print("The graph is directed. Taking the underlying undirected graph")
                adj=underlying_undirected_matrix(adj)
    if type=="reciprocal":
        adj=rc_submatrix(adj)
    return adj.sum() /  (n*(n-1))

`efficient_rich_club_curve(M, direction='TOTAL', pre_calculated_filtration=None, sparse_bin_set=False)`

Fast computation of the rich-club curve of the matrix M with respect to a degree filtration and possibly any filtration chosen by the user.

Parameters:

Name	Type	Description	Default
`M`	`array_like or sparse matrix`	Adjacency matrix of the graph.	required
`direction`	`str {'OUT', 'IN', 'TOTAL'}`	'OUT' : Compute the rich-club curve for the out-degree. 'IN' : Compute the rich-club curve for the in-degree. 'TOTAL' or None : Compute the rich-club curve for the total degree i.e., in-degree + out-degree.	`'TOTAL'`
`pre_calculated_filtration`	`pandas series`	To provide user defined filtration values other than degree. The index are the nodes of M and values the filtration values.	`None`
`sparse_bin_set`	`bool`	If False, all integer values between 0 and the maximum degree/filtration value will be used to generate the bins. If True, unique values of the degrees/filtration will be used. This is useful when the degrees/filtration values are sparse over the whole range.	`False`

Returns:

Type	Description
`Pandas series`	With index the center of the binned degrees/filtration and values the rich club coefficient at that degree.

Notes

The rich-club coefficient is a measure of the tendency of high-degree nodes (nodes with a high filtration value) to form tightly interconnected communities.

Source code in src/connalysis/network/classic.py

def efficient_rich_club_curve(M, direction="TOTAL", pre_calculated_filtration=None, sparse_bin_set=False):
    """Fast computation of the rich-club curve of the matrix M with respect to a degree filtration and possibly any filtration chosen by the user.

    Parameters
    ----------
    M : array_like or sparse matrix
        Adjacency matrix of the graph.
    direction : str {'OUT', 'IN', 'TOTAL'}
        'OUT' : Compute the rich-club curve for the out-degree.

        'IN' : Compute the rich-club curve for the in-degree.

        'TOTAL' or None : Compute the rich-club curve for the total degree i.e., in-degree + out-degree.
    pre_calculated_filtration : pandas series 
        To provide user defined filtration values other than degree.  
        The index are the nodes of M and values the filtration values.
    sparse_bin_set : bool
        If False, all integer values between 0 and the maximum degree/filtration value will be used to generate the bins. 
        If True, unique values of the degrees/filtration will be used. 
        This is useful when the degrees/filtration values are sparse over the whole range.
    Returns
    -------
    Pandas series
        With index the center of the binned degrees/filtration and values the rich club coefficient at that degree.
    Notes
    -----
    The rich-club coefficient is a measure of the tendency of high-degree nodes (nodes with a high filtration value) to form tightly interconnected communities.
    """
    #TODO: Maybe expand the notes explaining this concept.
    if pre_calculated_filtration is not None:
        deg = pre_calculated_filtration
    else:
        if direction=="TOTAL": direction = None 
        deg=node_degree(M,direction=direction)
    M = M.tocoo()
    shape = M.shape
    M = pd.DataFrame.from_dict({"row": M.row, "col": M.col})



    if sparse_bin_set == False:
        degree_bins = np.arange(deg.max() + 2)
    elif sparse_bin_set == True:
        degree_bins = np.unique(np.append(deg, [0, deg.max() + 1]))
    degree_bins_rv = degree_bins[-2::-1]
    nrn_degree_distribution = np.histogram(deg.values, bins=degree_bins)[0]
    nrn_cum_degrees = np.cumsum(nrn_degree_distribution[-1::-1])
    nrn_cum_pairs = nrn_cum_degrees * (nrn_cum_degrees - 1)

    deg_arr = np.zeros(shape[0], dtype=int)
    deg_arr[deg.index.values] = deg.values

    deg = None

    con_degree = np.minimum(deg_arr[M["row"].values], deg_arr[M["col"].values])
    M = None
    con_degree = np.histogram(con_degree, bins=degree_bins)[0]

    cum_degrees = np.cumsum(con_degree[-1::-1])

    return pd.Series(cum_degrees / nrn_cum_pairs, degree_bins_rv)[::-1]

`get_pairs_within(m, v, cols=['x', 'y'], max_dist=100)`

Returns a matrix of the paris of nodes within distance max_dist of each other.

Parameters:

Name	Type	Description	Default
`m`	`array or sparse matrix`	Adjacency matrix of the graph	required
`v`	`DataFrame or tuple`	DataFrame with the coordinates of the nodes if m is square and has the same source and target nodes. If a tuple is passed, it should contain two DataFrames, one for the source nodes and one for the target nodes.	required
`cols`	`list`	Columns of the DataFrame containing the coordinates of the nodes.	`['x', 'y']`
`max_dist`	`float`	Maximum distance between nodes to be considered connected.	`100`

Returns:

Type	Description
`sparse matrix`	Boolean matrix with 1 indicating the pairs of nodes within `max_dist` of each other, excluding the diagonal for square matrices.

Source code in src/connalysis/network/classic.py

def get_pairs_within(m, v, cols=["x", "y"], max_dist=100):
    """Returns a matrix of the paris of nodes within distance `max_dist` of each other.

        Parameters
        ----------
        m : array or sparse matrix
            Adjacency matrix of the graph
        v : DataFrame or tuple
            DataFrame with the coordinates of the nodes if m is square and has the same source and target nodes. 
            If a tuple is passed, it should contain two DataFrames, one for the source nodes and one for the target nodes.        
        cols : list
            Columns of the DataFrame containing the coordinates of the nodes.
        max_dist : float
            Maximum distance between nodes to be considered connected.        

        Returns
        -------
        sparse matrix
            Boolean matrix with 1 indicating the pairs of nodes within `max_dist` of each other, 
            excluding the diagonal for square matrices.
    """
    if isinstance(v, tuple):
        vpre, vpost = v
    else:
        vpre = v; vpost = v
    tree = KDTree(vpre[cols])
    pairs = tree.query_ball_point(vpost[cols], max_dist)
    indptr = np.cumsum([0] + [len(_x) for _x in pairs])
    pairs_mat = sp.csc_matrix((np.ones(indptr[-1], dtype=bool),
                                   np.hstack(pairs), indptr),
                                   shape=m.shape)
    if not isinstance(v,tuple):
        pairs_mat.setdiag(0) # Don't consider edges from i to i when pre and post subpopulations are the same
    if indptr[-1] == 0:
        return np.nan
    return pairs_mat.astype(bool)

`largest_strongly_connected_component(adjacency_matrix)`

Computes the largest strongly connected component of the graph with adjacency matrix adjacency_matrix, and returns the adjacency matrix of said component

Parameters:

Name	Type	Description	Default
`adjacency_matrix`	`numpy array`	the adjaceny matrix of the DiGraph as a numpy array	required

Returns:

Type	Description
`numpy array`	The adjacency matrix of the largest strongly connected component

Source code in src/connalysis/network/classic.py

def largest_strongly_connected_component(adjacency_matrix):
    """Computes the largest strongly connected component of the graph with adjacency matrix adjacency_matrix,
        and returns the adjacency matrix of said component

    Parameters
    ----------
    adjacency_matrix : numpy array
        the adjaceny matrix of the DiGraph as a numpy array


    Returns
    -------
    numpy array
        The adjacency matrix of the largest strongly connected component

    """
    if sp.issparse(adjacency_matrix):
        adjacency_matrix = adjacency_matrix.toarray()
    current_tribe_nx = np_to_nx(adjacency_matrix)
    largest_comp = max(nx.strongly_connected_components(current_tribe_nx), key=len)
    current_tribe_strong_nx = current_tribe_nx.subgraph(largest_comp)
    current_tribe_strong = nx_to_np(current_tribe_strong_nx)
    return current_tribe_strong

`np_to_nx(adjacency_matrix)`

Converts numpy array of an adjacency matrix to a networkx digraph

Parameters:

Name	Type	Description	Default
`adjacency_matrix`	`numpy array`	the adjaceny matrix of the DiGraph as a numpy array	required

Returns:

Type	Description
`networkx DiGraph`	a directed graph

Source code in src/connalysis/network/classic.py

def np_to_nx(adjacency_matrix):
    """Converts numpy array of an adjacency matrix to a networkx digraph

    Parameters
    ----------
    adjacency_matrix : numpy array
        the adjaceny matrix of the DiGraph as a numpy array


    Returns
    -------
    networkx DiGraph
            a directed graph

    """
    return nx.from_numpy_array(adjacency_matrix,create_using=nx.DiGraph)

`nx_to_np(directed_graph)`

Converts networkx digraph to numpy array of the adjacency matrix

Parameters:

Name	Type	Description	Default
`directed_graph`	`networkx DiGraph`	a directed graph	required

Returns:

Type	Description
`numpy array`	the adjaceny matrix of the DiGraph as a numpy array

Source code in src/connalysis/network/classic.py

def nx_to_np(directed_graph):
    """Converts networkx digraph to numpy array of the adjacency matrix 

    Parameters
    ----------
    directed_graph : networkx DiGraph
        a directed graph

    Returns
    -------
    numpy array
        the adjaceny matrix of the DiGraph as a numpy array
    """
    return nx.to_numpy_array(directed_graph,dtype=int)

`rich_club_curve(m, direction='TOTAL')`

Compute the rich-club curve of the matrix m.

Parameters:

Name	Type	Description	Default
`m`	`array_like or sparse matrix`	Adjacency matrix of the graph.	required
`direction`	`str {'OUT', 'IN', 'TOTAL'}`	'OUT' : Compute the rich-club curve for the out-degree. 'IN' : Compute the rich-club curve for the in-degree. 'TOTAL' : Compute the rich-club curve for the total degree i.e., in-degree + out-degree.	`'TOTAL'`

Returns:

Type	Description
`Pandas series`	With index the center of the binned degrees and values the rich club coefficient at that degree.

Notes

The rich-club coefficient is a measure of the tendency of high-degree nodes to form tightly interconnected communities.

Source code in src/connalysis/network/classic.py

def rich_club_curve(m, direction='TOTAL'):
    """Compute the rich-club curve of the matrix m.

    Parameters
    ----------
    m : array_like or sparse matrix
        Adjacency matrix of the graph.
    direction : str {'OUT', 'IN', 'TOTAL'}
        'OUT' : Compute the rich-club curve for the out-degree.

        'IN' : Compute the rich-club curve for the in-degree.

        'TOTAL' : Compute the rich-club curve for the total degree i.e., in-degree + out-degree.

    Returns
    -------
    Pandas series
        With index the center of the binned degrees and values the rich club coefficient at that degree.
    Notes
    -----
    The rich-club coefficient is a measure of the tendency of high-degree nodes to form tightly interconnected communities.
    """
    m = sp.csc_matrix(m)
    if direction == 'TOTAL': direction = None 
    degrees = node_degree(m, direction=direction)
    if m.dtype == bool:
        udegrees = np.arange(1, degrees.max() + 1)
        ret_x = udegrees
    else:
        ret_x, udegrees, degrees = _bin_degrees(degrees)

    edge_counter = lambda i: (degrees >= i).sum() * ((degrees >= i).sum() - 1)  # number of pot. edges
    mat_counter = lambda i: m[np.ix_(degrees >= i, degrees >= i)].sum()  # number of actual edges
    ret = (np.array([mat_counter(i) for i in udegrees]).astype(float)
           / np.array([edge_counter(i) for i in udegrees]))
    return pd.Series(ret, index=pd.Index(ret_x, name="degree"))