Utility Functions

ddot.utils.NdexGraph_to_nx(G)

Converts an NdexGraph object into a NetworkX graph.

Parameters: G (ndex.networkn.NdexGraph) – NdexGraph to convert
Returns: The converted NetworkX graph
Return type: networkx.classes.DiGraph

ddot.utils.bubble_layout_nx(G, xmin=-750, xmax=750, ymin=-750, ymax=750, verbose=False)

Bubble-tree Layout using the Tulip library.

The input graph must be a tree. The layout is scaled to fit exactly within the bounding box defined by xmin, xmax, ymin, and ymax.

Grivet, S., Auber, D., Domenger, J. P., & Melancon, G. (2006). Bubble tree drawing algorithm. In Computer Vision and Graphics (pp. 633-641). Springer Netherlands.

Parameters:
  • G (networkx.Graph) – Tree
  • xmin (float, optional) – Minimum x-coordinate of the bounding box
  • xmax (float, optional) – Maximum x-coordinate of the bounding box
  • ymin (float, optional) – Minimum y-coordinate of the bounding box
  • ymax (float, optional) – Maximum y-coordinate of the bounding box
Returns:

Dictionary mapping nodes to 2D coordinates. pos[node_name] -> (x,y)

Return type:

dict

ddot.utils.color_gradient(ratio, min_col='#FFFFFF', max_col='#D65F5F', output_hex=True)

Calculate a proportional mix between two colors.
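As an illustration of what a "proportional mix" can mean, the sketch below linearly interpolates each RGB channel between min_col and max_col. It is a hypothetical re-implementation (color_gradient_sketch is not part of ddot), and it assumes that ratio=0 yields min_col, ratio=1 yields max_col, and that out-of-range ratios are clamped.

```python
def color_gradient_sketch(ratio, min_col="#FFFFFF", max_col="#D65F5F", output_hex=True):
    """Hypothetical sketch: linear mix of two hex colors (not ddot's code)."""

    def hex_to_rgb(col):
        col = col.lstrip("#")
        return tuple(int(col[i:i + 2], 16) for i in (0, 2, 4))

    ratio = min(max(ratio, 0.0), 1.0)  # assumption: clamp ratio to [0, 1]
    lo, hi = hex_to_rgb(min_col), hex_to_rgb(max_col)
    # Interpolate each RGB channel, then round to an integer 0-255 value
    rgb = tuple(round(a + ratio * (b - a)) for a, b in zip(lo, hi))
    if output_hex:
        return "#{:02X}{:02X}{:02X}".format(*rgb)
    return rgb


print(color_gradient_sketch(0.25))                    # #F5D7D7
print(color_gradient_sketch(0.25, output_hex=False))  # (245, 215, 215)
```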

ddot.utils.create_edgeMatrix(X, X_cols, X_rows, verbose=True, G=None, ndex2=True)

Converts a NumPy array into an NdexGraph with a special CX aspect called “edge_matrix”. The array is serialized using base64 encoding.

Parameters:
  • X (np.ndarray) – Array to serialize
  • X_cols (list) – Column names
  • X_rows (list) – Row names
Returns:

NdexGraph whose “edge_matrix” aspect stores the serialized array

Return type:

ndex.networkn.NdexGraph
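The base64 step can be illustrated with the standard library alone. The sketch below shows the general idea only and is an assumption, not ddot's actual byte layout or CX aspect structure: the hypothetical helpers encode_matrix and decode_matrix pack a matrix row by row as little-endian float64 values, base64-encode the bytes, and reverse the process (the direction that load_edgeMatrix covers).

```python
import base64
import struct


def encode_matrix(rows):
    """Hypothetical: pack a 2D list of floats (row-major, float64) as base64 text."""
    n_rows, n_cols = len(rows), len(rows[0])
    flat = [float(v) for row in rows for v in row]
    raw = struct.pack("<%dd" % len(flat), *flat)
    return base64.b64encode(raw).decode("ascii"), (n_rows, n_cols)


def decode_matrix(text, shape):
    """Hypothetical inverse of encode_matrix: base64 text -> 2D list of floats."""
    n_rows, n_cols = shape
    flat = struct.unpack("<%dd" % (n_rows * n_cols), base64.b64decode(text))
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


X = [[1.0, 0.5], [0.5, 1.0]]
X_cols = X_rows = ["geneA", "geneB"]  # row/column names travel alongside the data
encoded, shape = encode_matrix(X)
restored = decode_matrix(encoded, shape)
print(restored == X)  # True
```

Keeping the shape (and the row and column names) next to the encoded string is what makes the flat byte sequence recoverable as a matrix.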

ddot.utils.expand_seed(seed, sim, sim_names, agg='mean', min_sim=-inf, filter_perc=None, seed_perc=None, agg_perc=0.5, expand_size=None, include_seed=True, figure=False, verbose=False)

Identify genes that are most similar to a seed set of genes.

A gene is included in the expanded set only if it meets all of the specified criteria. If include_seed is True, seed genes are included regardless of the criteria, although the number of genes returned is still limited by expand_size. One way to obtain n novel genes is therefore to set expand_size = n + |seed| and include_seed = True, and then to remove the seed genes from expand.

Parameters:
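  • seed (list) – Seed genes
  • sim (np.ndarray) – Gene-by-gene similarity array
  • sim_names (list of str) – Names of genes as they appear in the rows and columns of <sim>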
  • agg (str or function) – Aggregation method. Possible values are mean, min, max, perc.
  • min_sim (float) – Minimum similarity to the seed set.
  • filter_perc (float) – Filter based on a percentile of similarities between all genes and the seed set.
  • seed_perc (float) – Filter based on a percentile of similarities between seed set to itself.
  • agg_perc (float) – The <agg_perc> percentile of similarities to the seed set. For example, if a gene has similarities of (0, 0.2, 0.4, 0.6, 0.8) to five seed genes, then its 25th-percentile similarity is 0.2.
  • expand_size (int) – Maximum limit on the number of returned genes.
  • include_seed (bool) – Include the seed genes even if they didn’t meet the criteria.
  • figure (bool) – Generate a figure showing the average distances within the seed set and the average distances between the seed set and the background.
Returns:

  • expand – The list of expanded genes passing all filters.
  • expand_idx – Indices of the ranking, i.e. expand_idx[0] is the index of the top gene, so the name of the top gene is sim_names[expand_idx[0]], where sim_names is the input parameter.
  • sim_2_seed – Calculated similarities of the genes to the seed set. So sim_2_seed[0] is the similarity of the gene …
  • fig – The generated figure. It can be saved with plt.savefig('foo.pdf').
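To make the filtering and ranking concrete, here is a simplified, hypothetical sketch (expand_seed_sketch is not ddot's implementation). It supports only agg='mean', min_sim, expand_size, and include_seed, omits the percentile filters and the figure, and uses a small made-up similarity matrix to walk through the expand_size = n + |seed| recipe.

```python
def expand_seed_sketch(seed, sim, sim_names, min_sim=float("-inf"),
                       expand_size=None, include_seed=True):
    """Hypothetical, simplified seed expansion with agg='mean' (not ddot's code).

    sim is a square list of lists whose rows and columns follow sim_names.
    """
    index = {name: i for i, name in enumerate(sim_names)}
    seed_idx = [index[g] for g in seed if g in index]
    seed_set = set(seed_idx)

    # Mean similarity of every gene to the seed set
    sim_2_seed = [sum(sim[i][j] for j in seed_idx) / len(seed_idx)
                  for i in range(len(sim_names))]

    # Rank all genes from most to least similar to the seed set
    expand_idx = sorted(range(len(sim_names)),
                        key=lambda i: sim_2_seed[i], reverse=True)

    # Keep genes passing min_sim; seeds pass anyway when include_seed is True
    passing = [i for i in expand_idx
               if sim_2_seed[i] >= min_sim or (include_seed and i in seed_set)]
    if expand_size is not None:
        passing = passing[:expand_size]  # cap the number of returned genes
    expand = [sim_names[i] for i in passing]
    return expand, expand_idx, sim_2_seed


names = ["A", "B", "C", "D"]
sim = [[1.0, 0.8, 0.1, 0.6],
       [0.8, 1.0, 0.2, 0.4],
       [0.1, 0.2, 1.0, 0.0],
       [0.6, 0.4, 0.0, 1.0]]
seed = ["A", "B"]

# Recipe for n = 1 novel gene: expand_size = n + |seed|, include_seed=True
expand, expand_idx, sim_2_seed = expand_seed_sketch(
    seed, sim, names, min_sim=0.3, expand_size=1 + len(seed))
novel = [g for g in expand if g not in seed]
print(expand, novel)  # ['A', 'B', 'D'] ['D']
```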

ddot.utils.gridify(parents, pos, G)

Relayout leaf nodes into a grid.

Nodes must be connected and already laid out in “star”-like topologies. In each “star”, a set of child nodes is positioned on a circle and connected to a common parent node at the circle’s center.

This function repositions the nodes in each star into a square grid inscribed in the circle.

Parameters:
  • parents (list) – For each parent, its children will be arranged in a grid.
  • pos (dict) – Dictionary that maps names of nodes to their (x,y) coordinates
  • G (nx.Graph) – Network
Returns:

Modifies <pos> in place

Return type:

None
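The geometry can be sketched as follows. grid_in_circle is a hypothetical helper, not ddot's code: it assumes the circle's radius is the parent-to-child distance and fills a k-by-k grid (k = ceil(sqrt(n)) for n children), row by row, inside the square inscribed in that circle, whose side is radius * sqrt(2). Updating pos with its result mimics the in-place modification described above.

```python
import math


def grid_in_circle(center, radius, children):
    """Hypothetical helper: lay out children on a square grid inscribed in a circle."""
    cx, cy = center
    k = math.ceil(math.sqrt(len(children)))  # the grid is k columns wide
    if k <= 1:
        # Zero or one child: keep it at the center (a hypothetical choice)
        return {child: (cx, cy) for child in children}
    side = radius * math.sqrt(2)  # side of the square inscribed in the circle
    step = side / (k - 1)
    x0, y0 = cx - side / 2, cy + side / 2  # top-left corner of the square
    new_pos = {}
    for i, child in enumerate(children):
        row, col = divmod(i, k)
        new_pos[child] = (x0 + col * step, y0 - row * step)
    return new_pos


# A parent "P" at the origin with four children on a circle of radius 2
pos = {"P": (0.0, 0.0), "a": (2.0, 0.0), "b": (0.0, 2.0),
       "c": (-2.0, 0.0), "d": (0.0, -2.0)}
children = ["a", "b", "c", "d"]
radius = max(math.dist(pos["P"], pos[c]) for c in children)
pos.update(grid_in_circle(pos["P"], radius, children))  # modifies pos in place
print(pos)
```

With four children the grid is 2-by-2, so the children land on the corners of the inscribed square, which touch the original circle.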

ddot.utils.ig_edges_to_pandas(G, attr_list=None)

Create pandas.DataFrame of edge attributes of an igraph.Graph object.

Parameters:
  • G (igraph.Graph) –
  • attr_list (list, optional) – Names of edge attributes. Default: all edge attributes
Returns:

DataFrame where index is a MultiIndex with two levels (u,v) referring to edges and the columns refer to edge attributes.

Return type:

pandas.DataFrame

ddot.utils.ig_nodes_to_pandas(G, attr_list=None)

Create pandas.DataFrame of node attributes of an igraph.Graph object.

Parameters:
  • G (igraph.Graph) –
  • attr_list (list, optional) – Names of node attributes. Default: all node attributes
Returns:

DataFrame where index is the names of nodes and the columns are node attributes.

Return type:

pandas.DataFrame

ddot.utils.ig_unfold_tree_with_attr(g, sources, mode)

Call igraph.Graph.unfold_tree while preserving vertex and edge attributes.

ddot.utils.invert_dict(dic, sort=True, keymap={}, valmap={})

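Inverts a dictionary of the form

    key1 : [val1, val2]
    key2 : [val1]

to a dictionary of the form

    val1 : [key1, key2]
    val2 : [key1]

Parameters: dic (dict) – Dictionary mapping each key to a list of values
Returns: Dictionary mapping each value to the list of keys that contain it
Return type: dict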
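A minimal sketch of this inversion, assuming that sort=True means each list of keys is sorted. invert_dict_sketch is hypothetical and ignores the keymap and valmap parameters.

```python
def invert_dict_sketch(dic, sort=True):
    """Hypothetical sketch: {key: [values]} -> {value: [keys]} (not ddot's code)."""
    inverted = {}
    for key, values in dic.items():
        for val in values:
            # Append this key to the list for every value it maps to
            inverted.setdefault(val, []).append(key)
    if sort:  # assumption: sort=True sorts each list of keys
        inverted = {val: sorted(keys) for val, keys in inverted.items()}
    return inverted


print(invert_dict_sketch({"key1": ["val1", "val2"], "key2": ["val1"]}))
# {'val1': ['key1', 'key2'], 'val2': ['key1']}
```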

ddot.utils.load_edgeMatrix(ndex_uuid, ndex_server, ndex_user, ndex_pass, ndex=None, json=None, verbose=True)

Loads a NumPy array from an NdexGraph with a special CX aspect called “edge_matrix”.

Parameters:
  • ndex_uuid (str) – NDEx UUID of the network
  • ndex_server (str) – URL of NDEx server
  • ndex_user (str) – NDEx username
  • ndex_pass (str) – NDEx password
  • json (module) – JSON module with “loads” function. Default: the simplejson package (must be installed)
Returns:

  • X (np.ndarray)
  • X_cols (list) – Column names
  • X_rows (list) – Row names

ddot.utils.make_index(it)

Create a dictionary mapping elements of an iterable to the index position of that element.
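In plain Python this mapping is a dictionary comprehension over enumerate. The sketch below is an assumption about the behavior rather than ddot's code (make_index_sketch is hypothetical); when an element repeats, it keeps the last position. A typical use is looking up a gene's row in a similarity array from its name.

```python
def make_index_sketch(it):
    """Hypothetical sketch: map each element of an iterable to its position."""
    return {elem: i for i, elem in enumerate(it)}


sim_names = ["TP53", "BRCA1", "EGFR"]
idx = make_index_sketch(sim_names)
row = idx["EGFR"]  # row/column of EGFR in an array ordered like sim_names
print(idx, row)  # {'TP53': 0, 'BRCA1': 1, 'EGFR': 2} 2
```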

ddot.utils.make_seed_ontology(sim, sim_names, expand_kwargs={}, build_kwargs={}, align_kwargs={}, ndex_kwargs={}, node_attr=None, verbose=False, ndex=True)

Assembles and analyzes a data-driven ontology to study a process or disease.

Parameters:
  • sim (np.ndarray) – gene-by-gene similarity array
  • sim_names (array-like) – Names of genes as they appear in the rows and columns of <sim>
  • expand_kwargs (dict) – Parameters for ddot.expand_seed() to identify an expanded set of genes
  • build_kwargs (dict) – Parameters for Ontology.build_from_network(…) to build a data-driven ontology.
  • align_kwargs (dict) – Parameters for Ontology.align() to align against a reference ontology.
  • ndex_kwargs (dict) – Parameters for Ontology.to_ndex() to upload ontology to NDEx.
  • node_attr (pd.DataFrame) – A DataFrame of node attributes to assign to the ontology.
  • ndex (bool) – If True, then upload ontology to NDEx using parameters <ndex_kwargs>

ddot.utils.melt_square(df, columns=['Gene1', 'Gene2'], similarity='similarity', empty_value=0, upper_triangle=True)

Melts a square DataFrame into a sparse representation.

Parameters:
  • df (pandas.DataFrame) – Square-shaped dataframe where df[i,j] is the value of edge (i,j)
  • columns (iterable) – Column names for nodes in the output dataframe
  • similarity (string) – Column for edge value in the output dataframe
  • empty_value – Not yet supported
  • upper_triangle (bool) – Only use the values in the upper-right triangle (including the diagonal) of the input square dataframe
Returns:

3-column DataFrame that provides a sparse representation of the edges. Two of the columns indicate the node names, and the third column indicates the edge value.

Return type:

pandas.DataFrame
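The melting step can be sketched without pandas. In this hypothetical version (melt_square_sketch is not ddot's code), the square matrix is a list of lists ordered like names, the result is a list of row dictionaries rather than a DataFrame, and every visited cell is kept, including zeros, since the description does not say whether empty values are dropped.

```python
def melt_square_sketch(matrix, names, columns=("Gene1", "Gene2"),
                       similarity="similarity", upper_triangle=True):
    """Hypothetical sketch: square matrix -> list of (node, node, value) rows."""
    rows = []
    n = len(names)
    for i in range(n):
        # With upper_triangle, visit only j >= i (the diagonal is included)
        for j in range(i if upper_triangle else 0, n):
            rows.append({columns[0]: names[i], columns[1]: names[j],
                         similarity: matrix[i][j]})
    return rows


names = ["A", "B"]
matrix = [[1.0, 0.3],
          [0.3, 1.0]]
for row in melt_square_sketch(matrix, names):
    print(row)
# {'Gene1': 'A', 'Gene2': 'A', 'similarity': 1.0}
# {'Gene1': 'A', 'Gene2': 'B', 'similarity': 0.3}
# {'Gene1': 'B', 'Gene2': 'B', 'similarity': 1.0}
```

For a symmetric similarity matrix, restricting to the upper triangle avoids listing each gene pair twice.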

ddot.utils.ndex_to_sim_matrix(ndex_url, ndex_server=None, ndex_user=None, ndex_pass=None, similarity=None, input_fmt='cx_matrix', output_fmt='matrix', subset=None, verbose=True)

Read a similarity network from NDEx and return it as either a square np.ndarray (compact representation) or a pandas.DataFrame of the non-zero similarity values (sparse representation).

Parameters:
  • ndex_url (str) – NDEx URL (or UUID) of the similarity network
  • ndex_server (str) – URL of NDEx server
  • ndex_user (str) – NDEx username
  • ndex_pass (str) – NDEx password
  • similarity (str) – Name of the edge attribute that represents the similarity/weight between two nodes. If None, then the name of the edge attribute in the output is named ‘similarity’ and all edges are assumed to have a similarity value of 1.
  • input_fmt (str) –
  • output_fmt (str) – If ‘matrix’, return a NumPy array. If ‘sparse’, return a pandas.DataFrame
  • subset (optional) –
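Returns:

Similarity values, as a square array if output_fmt is ‘matrix’ or as a sparse DataFrame if output_fmt is ‘sparse’

Return type: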

np.ndarray or pandas.DataFrame

ddot.utils.nx_edges_to_pandas(G, attr_list=None)

Create pandas.DataFrame of edge attributes of a NetworkX graph.

Parameters:
  • G (networkx.Graph) –
  • attr_list (list, optional) – Names of edge attributes. Default: all edge attributes
Returns:

DataFrame where index is a MultiIndex with two levels (u,v) referring to edges and the columns refer to edge attributes. For multi(di)graphs, the MultiIndex has three levels of the form (u, v, key).

Return type:

pandas.DataFrame

ddot.utils.nx_nodes_to_pandas(G, attr_list=None)

Create pandas.DataFrame of node attributes of a NetworkX graph.

Parameters:
  • G (networkx.Graph) –
  • attr_list (list, optional) – Names of node attributes. Default: all node attributes
Returns:

DataFrame where index is the names of nodes and the columns are node attributes.

Return type:

pandas.DataFrame

ddot.utils.nx_to_NdexGraph(G_nx, discard_null=True)

Converts a NetworkX graph into an NdexGraph object.

Parameters: G_nx (networkx.Graph) – NetworkX graph to convert
Returns: The converted NdexGraph
Return type: ndex.networkn.NdexGraph

ddot.utils.parse_ndex_uuid(ndex_url)

Extracts the NDEx UUID from a URL.

Parameters: ndex_url (str) – URL for a network stored on NDEx
Returns: UUID of the network
Return type: str
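One way to extract the UUID is a regular expression for the canonical 8-4-4-4-12 hexadecimal form. This is a hypothetical sketch, not ddot's implementation: parse_ndex_uuid_sketch returns the last UUID in the string, so a bare UUID also works, and the example URL and UUID below are made up for illustration.

```python
import re

# Canonical UUID: 8-4-4-4-12 hexadecimal digits
UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def parse_ndex_uuid_sketch(ndex_url):
    """Hypothetical sketch: return the last UUID that appears in ndex_url."""
    matches = UUID_RE.findall(ndex_url)
    if not matches:
        raise ValueError("no UUID found in %r" % ndex_url)
    return matches[-1]


# Made-up URL and UUID, for illustration only
url = "http://www.ndexbio.org/#/network/12345678-90ab-cdef-1234-567890abcdef"
print(parse_ndex_uuid_sketch(url))  # 12345678-90ab-cdef-1234-567890abcdef
```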

ddot.utils.pivot_square(df, index, columns, values, fill_value=0)

Convert a dataframe into a square compact representation.

Parameters: df (pandas.DataFrame) – DataFrame in long format where every row represents one gene pair
Returns: df – DataFrame with gene-by-gene dimensions
Return type: pandas.DataFrame
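Converting long-format pairs back to a square layout can be sketched in plain Python. In this hypothetical pivot_square_sketch, records are dictionaries, the row and column labels are the sorted union of names from both pair columns so that the result is square, missing pairs receive fill_value, and no symmetrization is done; ddot's behavior on these points is not documented here.

```python
def pivot_square_sketch(records, index, columns, values, fill_value=0):
    """Hypothetical sketch: long-format records -> (names, square matrix)."""
    # Use names from both pair columns so the output is square
    names = sorted({r[index] for r in records} | {r[columns] for r in records})
    position = {name: i for i, name in enumerate(names)}
    matrix = [[fill_value] * len(names) for _ in names]
    for r in records:
        matrix[position[r[index]]][position[r[columns]]] = r[values]
    return names, matrix


records = [{"Gene1": "A", "Gene2": "B", "similarity": 0.3},
           {"Gene1": "B", "Gene2": "C", "similarity": 0.7}]
names, matrix = pivot_square_sketch(records, "Gene1", "Gene2", "similarity")
print(names)   # ['A', 'B', 'C']
print(matrix)  # [[0, 0.3, 0], [0, 0, 0.7], [0, 0, 0]]
```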

ddot.utils.set_edge_attributes_from_pandas(G, edge_attr)

Modify edge attributes according to a pandas.DataFrame.

Parameters:
  • G (networkx.Graph) –
  • edge_attr (pandas.DataFrame) –

ddot.utils.set_node_attributes_from_pandas(G, node_attr)

Modify node attributes according to a pandas.DataFrame.

Parameters:
  • G (networkx.Graph) –
  • node_attr (pandas.DataFrame) –

ddot.utils.sim_matrix_to_NdexGraph(sim, names, similarity, output_fmt, node_attr=None)

Convert a similarity matrix into an NdexGraph object.

Parameters:
  • sim (np.ndarray) – Square-shaped NumPy array representing similarities
  • names (list) – Gene names, in the same order as the rows and columns of sim
  • similarity (str) – Edge attribute name for similarities in the resulting NdexGraph object
  • output_fmt (str) – Either ‘cx’ (standard CX format) or ‘cx_matrix’ (custom “edge_matrix” aspect)
  • node_attr (pandas.DataFrame, optional) – Node attributes, as a pandas.DataFrame, to be set in NdexGraph object
Returns:

NdexGraph representing the similarity matrix

Return type:

ndex.networkn.NdexGraph

ddot.utils.transform_pos(pos, xmin=-250, xmax=250, ymin=-250, ymax=250)

Transforms coordinates to fit a bounding box.

Parameters:
  • pos (dict) – Dictionary mapping node names to (x,y) coordinates
  • xmin (float, optional) – Minimum x-coordinate of the bounding box
  • xmax (float, optional) – Maximum x-coordinate of the bounding box
  • ymin (float, optional) – Minimum y-coordinate of the bounding box
  • ymax (float, optional) – Maximum y-coordinate of the bounding box
Returns:

New dictionary with transformed coordinates

Return type:

dict
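A linear rescaling that maps the extremes of the layout onto the edges of the box is sketched below. transform_pos_sketch is hypothetical: it scales x and y independently (so the aspect ratio is not preserved), returns a new dictionary as described above, and centers any axis on which all coordinates are equal.

```python
def transform_pos_sketch(pos, xmin=-250, xmax=250, ymin=-250, ymax=250):
    """Hypothetical sketch: rescale (x, y) coordinates into a bounding box."""
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def rescale(v, lo, hi, new_lo, new_hi):
        if hi == lo:  # all coordinates equal on this axis: use the box midpoint
            return (new_lo + new_hi) / 2
        return new_lo + (v - lo) * (new_hi - new_lo) / (hi - lo)

    # Build a new dictionary; the input pos is left unchanged
    return {node: (rescale(x, x_lo, x_hi, xmin, xmax),
                   rescale(y, y_lo, y_hi, ymin, ymax))
            for node, (x, y) in pos.items()}


pos = {"a": (0, 0), "b": (10, 5), "c": (5, 2.5)}
print(transform_pos_sketch(pos))
# {'a': (-250.0, -250.0), 'b': (250.0, 250.0), 'c': (0.0, 0.0)}
```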

ddot.utils.update_nx_with_alignment(G, alignment, term_descriptions=None, use_node_name=True)

Add node attributes from an ontology alignment to a NetworkX graph.

Parameters:
  • G – NetworkX object
  • alignment – pandas.DataFrame where the index is the name of terms, and where there are 3 columns: ‘Term’, ‘Similarity’, ‘FDR’
  • use_node_name (bool) –
  • term_descriptions (dict) –
Returns:

Nothing; node attributes are added to G in place

Return type:

None