{"title": "Supervised Bipartite Graph Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 1841, "page_last": 1848, "abstract": "We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data.", "full_text": "Supervised Bipartite Graph Inference\n\nYoshihiro Yamanishi\nMines ParisTech CBIO\n\nInstitut Curie, INSERM U900,\n\n35 rue Saint-Honore, Fontainebleau, F-77300 France\n\nyoshihiro.yamanishi@ensmp.fr\n\nAbstract\n\nWe formulate the problem of bipartite graph inference as a supervised learning\nproblem, and propose a new method to solve it from the viewpoint of distance\nmetric learning. The method involves the learning of two mappings of the hetero-\ngeneous objects to a uni\ufb01ed Euclidean space representing the network topology of\nthe bipartite graph, where the graph is easy to infer. The algorithm can be formu-\nlated as an optimization problem in a reproducing kernel Hilbert space. We report\nencouraging results on the problem of compound-protein interaction network re-\nconstruction from chemical structure data and genomic sequence data.\n\n1 Introduction\n\nThe problem of bipartite graph inference is to predict the presence or absence of edges between\nheterogeneous objects known to form the vertices of the bipartite graph, based on the observation\nabout the heterogeneous objects. 
This problem is becoming a challenging issue in bioinformatics and computational biology, because there are many biological networks which are represented by a bipartite graph structure with vertices being heterogeneous molecules and edges being interactions between them. Examples include the compound-protein interaction network, consisting of interactions between ligand compounds and target proteins; the metabolic network, consisting of interactions between substrates and enzymes; and the host-pathogen protein-protein network, consisting of interactions between host proteins and pathogen proteins.

In particular, the prediction of compound-protein interaction networks is a key issue toward genomic drug discovery, because drug development depends heavily on the detection of interactions between ligand compounds and target proteins. The human genome sequencing project has made available the sequences of a large number of human proteins, while the high-throughput screening of large-scale chemical compound libraries is enabling us to explore the chemical space of possible compounds [1]. However, our knowledge about such compound-protein interactions is very limited. It is therefore important to detect unknown compound-protein interactions in order to identify potentially useful compounds, such as imaging probes and drug leads, from the huge amount of chemical and genomic data.

A major traditional method for predicting compound-protein interactions is docking simulation [2]. However, docking simulation requires 3D structure information for the target proteins. Most pharmaceutically useful target proteins are membrane proteins such as ion channels and G protein-coupled receptors (GPCRs). It is still extremely difficult and expensive to determine the 3D structures of membrane proteins, which limits the use of docking.
There is therefore a strong incentive to develop new useful prediction methods that use protein sequences, chemical compound structures, and the available known compound-protein interaction information simultaneously.

Recently, several supervised methods for inferring a simple graph structure (e.g., protein network, enzyme network) have been developed in the framework of kernel methods [3, 4, 5]. The corresponding algorithms of the previous methods are based on kernel canonical correlation analysis [3], distance metric learning [4], and the EM algorithm [5], respectively. However, the previous methods can only predict edges between homogeneous objects such as protein-protein interactions and enzyme-enzyme relations; they cannot predict edges between heterogeneous objects such as compound-protein interactions and substrate-enzyme interactions, because their frameworks are based only on a simple graph structure with homogeneous vertices. In contrast, in this paper we address the problem of supervised learning of the bipartite graph rather than the simple graph.

In this contribution, we develop a new supervised method for inferring the bipartite graph, borrowing the idea of distance metric learning used in the framework for inferring the simple graph [4]. The proposed method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. To our knowledge, there are no statistical methods to predict bipartite graphs from observed data in a supervised context.
In the results, we show the usefulness of the proposed method on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data.

Figure 1: An illustration of the problem of the supervised bipartite graph inference. (Legend: vertices with attribute 1 or attribute 2 in the known graph; additional vertices with attribute 1 or attribute 2; known edges; predicted edges.)

2 Formalism of the supervised bipartite graph inference problem

Let us formally define the supervised bipartite graph inference problem. Suppose that we are given an undirected bipartite graph G = (U + V, E), where U = (u_1, ..., u_{n1}) and V = (v_1, ..., v_{n2}) are sets of heterogeneous vertices and E ⊂ (U × V) ∪ (V × U) is a set of edges. Note that the attribute of U is completely different from that of V. The problem is, given additional sets of vertices U′ = (u′_1, ..., u′_{m1}) and V′ = (v′_1, ..., v′_{m2}), to infer a set of new edges E′ ⊂ (U′ × (V + V′)) ∪ (V′ × (U + U′)) ∪ ((U + U′) × V′) ∪ ((V + V′) × U′) involving the additional vertices in U′ and V′. Figure 1 shows an illustration of this problem.

The prediction of compound-protein interaction networks is a typical problem which fits this framework from a practical viewpoint. In this case, U corresponds to a set of compounds (known ligands), V corresponds to a set of proteins (known targets), and E corresponds to a set of known compound-protein interactions (known ligand-target interactions).
U′ corresponds to a set of additional compounds (new ligand candidates), V′ corresponds to a set of additional proteins (new target candidates), and E′ corresponds to a set of unknown compound-protein interactions (potential ligand-target interactions).

The prediction is performed based on available observations about the vertices. The sets of vertices U = (u_1, ..., u_{n1}), V = (v_1, ..., v_{n2}), U′ = (u′_1, ..., u′_{m1}) and V′ = (v′_1, ..., v′_{m2}) are represented by sets of observed data X = (x_1, ..., x_{n1}), Y = (y_1, ..., y_{n2}), X′ = (x′_1, ..., x′_{m1}) and Y′ = (y′_1, ..., y′_{m2}), respectively. For example, compounds are represented by molecular structures and proteins are represented by amino acid sequences. The question is how to predict unknown compound-protein interactions from compound structures and protein sequences using prior knowledge about known compound-protein interactions. The sets U and V (X and Y) are referred to as training sets, and heterogeneous objects are represented by u and v in the sense of vertices on the bipartite graph, or by x and y in the sense of objects in the observed data below.

In order to deal with the data heterogeneity and take advantage of recent works on kernel similarity functions on general data structures [6], we will assume that X is a set endowed with a positive definite kernel k_u, that is, a symmetric function k_u : X² → R satisfying $\sum_{i,j=1}^{n_1} a_i a_j k_u(x_i, x_j) \ge 0$ for any n_1 ∈ N, (a_1, a_2, ..., a_{n1}) ∈ R^{n1} and (x_1, x_2, ..., x_{n1}) ∈ X.
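To make the positive definiteness condition concrete, here is a small numerical sketch (toy vectors and a Gaussian RBF kernel, not part of the paper): the condition is equivalent to the Gram matrix having no negative eigenvalues.

```python
import numpy as np

# Illustration only: the positive definiteness condition says that every Gram
# matrix K with K[i, j] = k(x_i, x_j) satisfies a^T K a >= 0 for all vectors a.
# Equivalently, the symmetric matrix K has no negative eigenvalues.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))  # 5 toy objects described by 3 numeric features

# A Gaussian RBF kernel is a standard example of a positive definite kernel.
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dists)             # Gram matrix of k(x, x') = exp(-||x - x'||^2)

min_eig = np.linalg.eigvalsh(K).min()
is_positive_definite = min_eig >= -1e-10  # allow for numerical round-off
```

The analogous check applies to the kernel k_v on Y defined next.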
Similarly, we will assume that Y is a set endowed with a positive definite kernel k_v, that is, a symmetric function k_v : Y² → R satisfying $\sum_{i,j=1}^{n_2} a_i a_j k_v(y_i, y_j) \ge 0$ for any n_2 ∈ N, (a_1, a_2, ..., a_{n2}) ∈ R^{n2} and (y_1, y_2, ..., y_{n2}) ∈ Y.

3 Distance metric learning (DML) for the bipartite graph inference

3.1 Euclidean embedding and distance metric learning (DML)

Suppose that a bipartite graph must be reconstructed from the similarity information about n_1 objects (x_1, ..., x_{n1}) in X (observed data for U) and n_2 objects (y_1, ..., y_{n2}) in Y (observed data for V). One difficulty is that the attributes of the observed data differ between X and Y in nature, so it is not possible to evaluate the link between (x_1, ..., x_{n1}) and (y_1, ..., y_{n2}) from the observed data directly. For example, in the case of compounds and proteins, each x has a chemical graph structure and each y has a sequence structure, so the data structures completely differ between x and y. Therefore, we make the assumption that the n_1 objects (x_1, ..., x_{n1}) and the n_2 objects (y_1, ..., y_{n2}) are implicitly embedded in a unified Euclidean space R^d, and a graph is inferred on those heterogeneous points by the nearest neighbor approach, i.e., putting an edge between heterogeneous points that are close to each other.

We propose the following two-step procedure for the supervised bipartite graph inference:

1. embed the heterogeneous objects into a unified Euclidean space representing the network topology of the bipartite graph, where connected heterogeneous vertices are close to each other, through mappings f : X → R^d and g : Y → R^d

2.
apply the mappings f and g to X′ and Y′ respectively, and predict new edges between the heterogeneous objects if the distance between the points {f(x), x ∈ X ∪ X′} and {g(y), y ∈ Y ∪ Y′} is smaller than a fixed threshold δ.

While the second step in this procedure is fixed, the first step can be optimized by supervised learning of f and g using the known bipartite graph. To do so, we require the mappings f and g to map adjacent heterogeneous vertices in the known bipartite graph onto nearby positions in a unified Euclidean space R^d, in order to ensure that the known bipartite graph can be recovered to some extent by the nearest neighbor approach.

Given functions f : X → R and g : Y → R, a possible criterion to assess whether connected (resp. disconnected) heterogeneous vertices are mapped onto similar (resp. dissimilar) points in R is the following:

$$R(f,g) = \frac{\sum_{(u_i,v_j) \in E} (f(x_i) - g(y_j))^2 - \sum_{(u_i,v_j) \notin E} (f(x_i) - g(y_j))^2}{\sum_{(u_i,v_j) \in U \times V} (f(x_i) - g(y_j))^2}. \qquad (1)$$

A small value of R(f,g) ensures that connected heterogeneous vertices tend to be closer than disconnected heterogeneous vertices in the sense of quadratic error.

To represent the connectivity between heterogeneous vertices on the bipartite graph G = (U + V, E), we define a kind of adjacency matrix A_uv, where the element (A_uv)_{ij} is equal to 1 (resp. 0) if vertices u_i and v_j are connected (resp. disconnected). Note that the size of the matrix A_uv is n_1 × n_2. We also define degree matrices of the heterogeneous vertices, D_u and D_v, whose diagonal elements (D_u)_{ii} and (D_v)_{jj} are the degrees of vertices u_i and v_j (the numbers of edges involving vertices u_i and v_j), respectively.
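The matrices A_uv, D_u, D_v and the criterion (1) can be sketched on a toy bipartite graph as follows (the 1-D embeddings f_U and g_V below are hypothetical hand-picked values; in the method they are learned):

```python
import numpy as np

# Toy bipartite graph: n1 = 3 "compound" vertices, n2 = 2 "protein" vertices.
# Edges as (i, j) pairs meaning u_i -- v_j.
n1, n2 = 3, 2
edges = [(0, 0), (1, 0), (1, 1), (2, 1)]

# Adjacency matrix A_uv (n1 x n2) and diagonal degree matrices D_u, D_v.
A = np.zeros((n1, n2))
for i, j in edges:
    A[i, j] = 1.0
Du = np.diag(A.sum(axis=1))  # degrees of the u vertices
Dv = np.diag(A.sum(axis=0))  # degrees of the v vertices

# Hypothetical 1-D embeddings f_U and g_V.
fU = np.array([-1.0, 0.0, 1.0])
gV = np.array([-0.5, 0.5])

# Criterion (1): (sum over edges - sum over non-edges) / (sum over all pairs)
# of the squared gaps (f(x_i) - g(y_j))^2.
gaps2 = (fU[:, None] - gV[None, :]) ** 2
num = (gaps2 * A).sum() - (gaps2 * (1 - A)).sum()
R = num / gaps2.sum()
```

Here R is negative, i.e., on this toy graph the connected pairs are closer, on average, than the disconnected ones.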
Note that all non-diagonal elements in D_u and D_v are zero, and the sizes of the matrices are n_1 × n_1 and n_2 × n_2, respectively.

Let us denote by f_U = (f(x_1), ..., f(x_{n1}))^T ∈ R^{n1} and g_V = (g(y_1), ..., g(y_{n2}))^T ∈ R^{n2} the values taken by f and g on the training set. If we restrict f_U and g_V to have zero means, as $\sum_{i=1}^{n_1} f(x_i) = 0$ and $\sum_{i=1}^{n_2} g(y_i) = 0$, then the criterion (1) can be rewritten as follows:

$$R(f,g) = 4\,\frac{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} D_u & -A_{uv} \\ -A_{uv}^T & D_v \end{pmatrix} \begin{pmatrix} f_U \\ g_V \end{pmatrix}}{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} f_U \\ g_V \end{pmatrix}} - 2 \qquad (2)$$

To avoid the over-fitting problem and obtain meaningful solutions, we propose to regularize the criterion (1) by a smoothness functional on f and g, based on a classical approach in statistical learning [7, 8]. We assume that f and g belong to the reproducing kernel Hilbert spaces (r.k.h.s.) H_U and H_V defined by the kernels k_u on X and k_v on Y, and we use the norms of f and g as regularization operators. Let us denote by ||f|| and ||g|| the norms of f and g in H_U and H_V. Then, the regularized criterion to be minimized becomes:

$$R(f,g) = \frac{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} D_u & -A_{uv} \\ -A_{uv}^T & D_v \end{pmatrix} \begin{pmatrix} f_U \\ g_V \end{pmatrix} + \lambda_1 \|f\|^2 + \lambda_2 \|g\|^2}{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} f_U \\ g_V \end{pmatrix}}, \qquad (3)$$

where λ_1 and λ_2 are regularization parameters which control the trade-off between minimizing the original criterion (1) and ensuring that the solution has a small norm in the r.k.h.s.

The criterion is defined up to a scaling of the functions, and the solution is therefore a direction in the r.k.h.s. Here we set additional constraints. In this case we impose the norms ||f|| = ||g|| = 1, which corresponds to an orthogonal projection onto the direction selected in the r.k.h.s.
Note that the criterion can be used for extracting a one-dimensional feature of the objects. In order to obtain a d-dimensional feature representation of the objects, we propose to iterate the minimization of the regularized criterion (3) under orthogonality constraints in the r.k.h.s., that is, we recursively define the p-th features f_p and g_p for p = 1, ..., d as follows:

$$(f_p, g_p) = \arg\min \frac{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} D_u & -A_{uv} \\ -A_{uv}^T & D_v \end{pmatrix} \begin{pmatrix} f_U \\ g_V \end{pmatrix} + \lambda_1 \|f\|^2 + \lambda_2 \|g\|^2}{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} f_U \\ g_V \end{pmatrix}} \qquad (4)$$

under the orthogonality constraints: f ⊥ f_1, ..., f_{p-1}, and g ⊥ g_1, ..., g_{p-1}.

In the prediction process, we map any new objects x′ ∈ X′ and y′ ∈ Y′ by the mappings f and g respectively, and predict new edges between the heterogeneous objects if the distance between the points {f(x), x ∈ X ∪ X′} and {g(y), y ∈ Y ∪ Y′} is smaller than a fixed threshold δ.

3.2 Algorithm

Let k_u and k_v be the kernels on the sets X and Y, where the kernels are both centered in H_U and H_V. According to the representer theorem [9] in the r.k.h.s., for any p = 1, ..., d, the solution to equation (4) has the following expansions:

$$f_p(x) = \sum_{j=1}^{n_1} \alpha_{p,j} k_u(x_j, x), \qquad g_p(y) = \sum_{j=1}^{n_2} \beta_{p,j} k_v(y_j, y), \qquad (5)$$

for some vectors α_p = (α_{p,1}, ..., α_{p,n1})^T ∈ R^{n1} and β_p = (β_{p,1}, ..., β_{p,n2})^T ∈ R^{n2}.

Let K_u and K_v be the Gram matrices of the kernels k_u and k_v, such that (K_u)_{ij} = k_u(x_i, x_j), i, j = 1, ..., n_1 and (K_v)_{ij} = k_v(y_i, y_j), i, j = 1, ..., n_2. The corresponding feature vectors f_{p,U} and g_{p,V} can be written as f_{p,U} = K_u α_p and g_{p,V} = K_v β_p, respectively.
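As a small sketch of this identity (toy Gram matrix and a hypothetical coefficient vector α_p, chosen by hand here rather than obtained from the optimization), evaluating the expansion (5) on all training points is exactly the matrix-vector product K_u α_p:

```python
import numpy as np

# Toy Gram matrix K_u on n1 = 3 training objects, and a hypothetical
# coefficient vector alpha_p.
Ku = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
alpha_p = np.array([0.3, -0.1, 0.2])

# f_p(x_i) = sum_j alpha_{p,j} k_u(x_j, x_i): evaluating the expansion (5)
# on each training point, entry by entry.
f_pU_expansion = np.array([sum(alpha_p[j] * Ku[j, i] for j in range(3))
                           for i in range(3)])

# The same values in one step: f_{p,U} = K_u alpha_p (K_u is symmetric).
f_pU_matrix = Ku @ alpha_p
```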
The squared norms of the features f and g in H_U and H_V are equal to ||f||² = α^T K_u α and ||g||² = β^T K_v β, so the normalization constraints for f and g can be written as α^T K_u α = β^T K_v β = 1. The orthogonality constraints f_p ⊥ f_q and g_p ⊥ g_q (p ≠ q) can be written as α_p^T K_u α_q = 0 and β_p^T K_v β_q = 0.

Using the above representations, the minimization problem of R(f,g) is equivalent to finding α and β which minimize

$$R(f,g) = \frac{\begin{pmatrix} \alpha \\ \beta \end{pmatrix}^T \begin{pmatrix} K_u D_u K_u & -K_u A_{uv} K_v \\ -K_v A_{uv}^T K_u & K_v D_v K_v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \lambda_1 \alpha^T K_u \alpha + \lambda_2 \beta^T K_v \beta}{\begin{pmatrix} \alpha \\ \beta \end{pmatrix}^T \begin{pmatrix} K_u K_u & 0 \\ 0 & K_v K_v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}}, \qquad (6)$$

under the following orthogonality constraints:

$$\alpha^T K_u \alpha_1 = \cdots = \alpha^T K_u \alpha_{p-1} = 0, \qquad \beta^T K_v \beta_1 = \cdots = \beta^T K_v \beta_{p-1} = 0.$$

Taking the differential of equation (6) with respect to α and β and setting it to zero, the solution for the first vectors α_1 and β_1 can be obtained as the eigenvectors associated with the smallest (non-negative) eigenvalue in the following generalized eigenvalue problem:

$$\begin{pmatrix} K_u D_u K_u + \lambda_1 K_u & -K_u A_{uv} K_v \\ -K_v A_{uv}^T K_u & K_v D_v K_v + \lambda_2 K_v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \rho \begin{pmatrix} K_u K_u & 0 \\ 0 & K_v K_v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \qquad (7)$$

Sequentially, the solutions α_1, ..., α_d and β_1, ..., β_d can be obtained as the eigenvectors associated with the d smallest (non-negative) eigenvalues in the above generalized eigenvalue problem.

4 Relationship with other methods

The process of embedding heterogeneous objects into the same space is similar to correspondence analysis (CA) [10] and Co-Occurrence Data Embedding (CODE) [11], which are unsupervised
methods to embed the rows and columns of a contingency table (the adjacency matrix A_uv in this study) into a low-dimensional Euclidean space. However, the critical differences with our proposed method are as follows: i) the above methods cannot use observed data (X and Y in this study) about the heterogeneous nodes for prediction, because the algorithms are based only on co-occurrence information (A_uv in this study), and ii) we need to define a new representation of not only the objects in the training set but also additional objects outside of the training set. Therefore, it is not possible to directly apply the above methods to the bipartite graph inference problem.

Recall that the goal of ordinary CA is to find embedding functions φ : U → R and ψ : V → R which maximize the following correlation coefficient:

$$\mathrm{corr}(\varphi, \psi) = \frac{\sum_{i,j} I\{(u_i, v_j) \in E\}\, \varphi(u_i)\, \psi(v_j)}{\sqrt{\sum_i d_{u_i} \varphi(u_i)^2}\, \sqrt{\sum_j d_{v_j} \psi(v_j)^2}}, \qquad (8)$$

where I{·} is an indicator function which returns 1 if the argument is true and 0 otherwise, d_{u_i} (resp. d_{v_j}) is the degree of node u_i (resp. v_j), and $\sum_i \varphi(u_i) = 0$ (resp. $\sum_j \psi(v_j) = 0$) is assumed [10].

Here we consider an extension of CA using the idea of kernel methods, so that it can work in the context of the bipartite graph inference problem. The method is referred to as kernel correspondence analysis (KCA) below.

To formulate KCA, we propose to replace the embedding functions φ : U → R and ψ : V → R by functions f : X → R and g : Y → R, where f and g belong to the r.k.h.s. H_U and H_V defined by the kernels k_u on X and k_v on Y.
Then, we consider maximizing the following regularized correlation coefficient:

$$\mathrm{corr}(f,g) = \frac{\sum_{i,j} I\{(u_i, v_j) \in E\}\, f(x_i)\, g(y_j)}{\sqrt{\sum_i d_{u_i} f(x_i)^2 + \lambda_1 \|f\|^2}\, \sqrt{\sum_j d_{v_j} g(y_j)^2 + \lambda_2 \|g\|^2}}, \qquad (9)$$

where λ_1 and λ_2 are regularization parameters which control the trade-off between maximizing the original correlation coefficient between the two features and ensuring that the solution has a small norm in the r.k.h.s. In order to obtain a d-dimensional feature representation and deal with the scale issue, we propose to iterate the maximization of the regularized correlation coefficient (9) under orthogonality constraints in the r.k.h.s., that is, we recursively define the p-th features f_p and g_p for p = 1, ..., d as (f_p, g_p) = arg max corr(f,g) under the orthogonality constraints: f ⊥ f_1, ..., f_{p-1} and g ⊥ g_1, ..., g_{p-1}, and the normalization constraints: ||f|| = ||g|| = 1.

Using the function expansions in equation (5) and the related matrix representations defined in the previous section, the maximization problem of the regularized correlation coefficient in equation (9) is equivalent to finding α and β which maximize

$$\mathrm{corr}(f,g) = \frac{\alpha^T K_u A_{uv} K_v \beta}{\sqrt{\alpha^T K_u D_u K_u \alpha + \lambda_1 \alpha^T K_u \alpha}\, \sqrt{\beta^T K_v D_v K_v \beta + \lambda_2 \beta^T K_v \beta}}. \qquad (10)$$

Taking the differential of equation (10) with respect to α and β and setting it to zero, the solution for the first vectors α_1 and β_1 can be obtained as the eigenvectors associated with the largest eigenvalue in the following generalized eigenvalue problem:

$$\begin{pmatrix} 0 & K_u A_{uv} K_v \\ K_v A_{uv}^T K_u & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \rho \begin{pmatrix} K_u D_u K_u + \lambda_1 K_u & 0 \\ 0 & K_v D_v K_v + \lambda_2 K_v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}. \qquad (11)$$

Sequentially, the solutions α_1, ..., α_d and β_1, ..., β_d can be obtained as the eigenvectors associated with the d largest eigenvalues in the above generalized eigenvalue problem.

The final form of KCA is similar to that of kernel canonical correlation analysis (KCCA) [12, 13], so KCA can be regarded as a variant of KCCA. However, the critical differences between KCA and KCCA are as follows: i) the objects are the same across the two data sets in KCCA, while the objects differ across the two data sets in KCA, and ii) KCCA cannot deal with co-occurrence information about the objects. In the experiment below, we are interested in the performance comparison between the distance learning in DML and the correlation maximization in KCA. A similar extension might be possible for CODE as well, but it is beyond the scope of this paper.

5 Experiment

5.1 Data

In this study we focus on compound-protein interaction networks formed by four pharmaceutically useful protein classes: enzymes, ion channels, G protein-coupled receptors (GPCRs), and nuclear receptors. The information about compound-protein interactions was obtained from the KEGG BRITE [14], SuperTarget [15] and DrugBank [16] databases. The numbers of known interactions involving enzymes, ion channels, GPCRs, and nuclear receptors are 5449, 3020, 1124, and 199, respectively. The numbers of proteins involved in these interactions are 1062, 242, 134, and 38, respectively, and the numbers of compounds involved are 1123, 475, 417, and 115, respectively. The compound set includes not only drugs but also experimentally confirmed ligand compounds. These data are regarded as gold standard sets to evaluate the prediction performance below.

Chemical structures of the compounds and amino acid sequences of the human proteins were obtained from the KEGG database [14].
We computed the kernel similarity values of the chemical structures between compounds using the SIMCOMP algorithm [17], where the kernel similarity value between two compounds is computed by the Tanimoto coefficient, defined as the ratio of common substructures between two compounds based on a graph alignment algorithm. We computed the sequence similarities between the proteins using Smith-Waterman scores based on the local alignment between two amino acid sequences [18]. In this study we used the above similarity measures as kernel functions, but the Smith-Waterman scores are not always positive definite, so we added an appropriately scaled identity matrix such that the corresponding kernel Gram matrix is positive definite, which is related to [19]. All the kernel matrices are normalized such that all diagonals are ones.

5.2 Performance evaluation

As a baseline method, we used the nearest neighbor (NN) method, because this idea has been used in traditional molecular screening in many public databases. Given a new ligand candidate compound, we find the known ligand compound (in the training set) sharing the highest structure similarity with the new compound, and predict the new compound to interact with the proteins known to interact with that nearest ligand compound. Likewise, given a new target candidate protein, we find the known target protein (in the training set) sharing the highest sequence similarity with the new protein, and predict the new protein to interact with the ligand compounds known to interact with that nearest target protein. Newly predicted compound-protein interaction pairs are assigned as prediction scores the highest structure or sequence similarity values involving the new compounds or new proteins, in order to draw the ROC curve below.

We tested the three different methods, NN, KCA, and DML, on their abilities to reconstruct the four compound-protein interaction networks.
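The two kernel preprocessing steps described above (shifting by a scaled identity matrix until the Gram matrix is positive definite, then normalizing so that all diagonals are ones) can be sketched as follows. This is a toy matrix and one simple choice of shift; the paper does not specify the exact shift rule it used.

```python
import numpy as np

# Toy symmetric similarity matrix that is NOT positive definite
# (a stand-in for a raw Smith-Waterman score matrix).
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])

# Step 1: add a scaled identity matrix so the smallest eigenvalue becomes
# positive (one simple shift rule, assumed here for illustration).
min_eig = np.linalg.eigvalsh(S).min()
K = S + (abs(min_eig) + 0.1) * np.eye(3) if min_eig <= 0 else S.copy()

# Step 2: normalize so that all diagonal entries equal one:
# K[i, j] / sqrt(K[i, i] * K[j, j]).
d = np.sqrt(np.diag(K))
K_norm = K / np.outer(d, d)
```

The diagonal normalization is a congruence transform, so it preserves positive definiteness.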
We performed the following 5-fold cross-validation procedure: the gold standard set was split into 5 subsets of roughly equal size by compounds and proteins, each subset was then taken in turn as a test set, and the training was performed on the remaining 4 sets. We draw a receiver operating characteristic (ROC) curve, the plot of true positives as a function of false positives based on various thresholds δ, where true positives are correctly predicted interactions and false positives are predicted interactions that are not present in the gold standard interactions. The performance was evaluated by the AUC (area under the ROC curve) score. The regularization parameter λ and the number of features d were optimized by internal cross-validation within the training set, with the AUC score as the target criterion, in the case of KCA and DML. To obtain robust results, we repeated the above cross-validation experiment five times, and computed the average and standard deviation of the resulting AUC scores.

Table 1: AUC (ROC scores) for each interaction class, where "train c.", "train p.", "test c.", and "test p." indicate training compounds, training proteins, test compounds and test proteins, respectively.

Data              Prediction class          Nearest neighbor  Kernel correspondence  Distance metric
                                            (NN)              analysis (KCA)         learning (DML)
Enzyme            i) test c. vs train p.    0.655 ± 0.011     0.741 ± 0.011          0.843 ± 0.006
                  ii) train c. vs test p.   0.758 ± 0.008     0.839 ± 0.009          0.878 ± 0.003
                  iii) test c. vs test p.   0.500 ± 0.000     0.692 ± 0.008          0.782 ± 0.013
                  iv) all c. vs all p.      0.684 ± 0.006     0.778 ± 0.008          0.852 ± 0.020
Ion channel       i) test c. vs train p.    0.712 ± 0.004     0.768 ± 0.008          0.800 ± 0.004
                  ii) train c. vs test p.   0.896 ± 0.008     0.927 ± 0.004          0.945 ± 0.002
                  iii) test c. vs test p.   0.500 ± 0.000     0.748 ± 0.004          0.771 ± 0.008
                  iv) all c. vs all p.      0.770 ± 0.004     0.838 ± 0.005          0.864 ± 0.002
GPCR              i) test c. vs train p.    0.714 ± 0.005     0.848 ± 0.002          0.882 ± 0.005
                  ii) train c. vs test p.   0.781 ± 0.026     0.895 ± 0.025          0.936 ± 0.004
                  iii) test c. vs test p.   0.500 ± 0.000     0.823 ± 0.038          0.864 ± 0.013
                  iv) all c. vs all p.      0.720 ± 0.013     0.866 ± 0.015          0.904 ± 0.003
Nuclear receptor  i) test c. vs train p.    0.715 ± 0.009     0.808 ± 0.018          0.832 ± 0.013
                  ii) train c. vs test p.   0.683 ± 0.010     0.784 ± 0.012          0.812 ± 0.036
                  iii) test c. vs test p.   0.500 ± 0.000     0.670 ± 0.053          0.747 ± 0.049
                  iv) all c. vs all p.      0.675 ± 0.004     0.784 ± 0.011          0.815 ± 0.024

Table 1 shows the resulting AUC scores for different sets of predictions, depending on whether the compound and/or the protein were in the initial training set or not. Compounds and proteins in the training set are called training compounds and proteins, whereas those not in the training set are called test compounds and proteins. Four different classes are then possible: i) test compounds vs training proteins, ii) training compounds vs test proteins, iii) test compounds vs test proteins, and iv) all possible predictions (the average of the above three parts). Comparing the three different methods, DML has the best performance for all four types of compound-protein interaction networks, and outperforms the other methods KCA and NN at a significant level.
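The evaluation above sweeps the threshold δ over ranked prediction scores; a minimal AUC computation on toy scores and labels (hypothetical numbers, not the paper's data) can be sketched as:

```python
# Minimal AUC (area under the ROC curve) on toy prediction scores.
# AUC equals the probability that a randomly chosen true interaction
# receives a higher score than a randomly chosen non-interaction
# (ties counted as one half).
def auc_score(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical gold-standard labels and prediction scores for 6 pairs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_score(labels, scores)  # 8 of 9 positive/negative pairs are ranked correctly
```

This rank-based form is equivalent to integrating the ROC curve over all thresholds.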
The worst performance of NN implies that raw compound structure or protein sequence similarities do not always reflect the tendency of interaction partners in true compound-protein interaction networks. Among the four prediction classes, predictions where neither the protein nor the compound is in the training set (iii) are the weakest, but even then reliable predictions are possible with DML. Note that the NN method cannot predict iii) test vs test interactions, because it depends on template information about known ligand compounds and known target proteins. These results suggest that the feature space learned by DML successfully represents the network topology of the bipartite graph structure of compound-protein networks, and that the correlation maximization learning used in KCA is not enough to reflect the network topology of the bipartite graph.

6 Concluding remarks

In this paper, we developed a new supervised method to infer the bipartite graph from the viewpoint of distance metric learning (DML). The originality of the proposed method lies in the embedding of the heterogeneous objects forming the vertices of the bipartite graph into a unified Euclidean space, and in the learning of the distance between heterogeneous objects with different data structures in the unified feature space. We also discussed the relationship with correspondence analysis (CA) and kernel canonical correlation analysis (KCCA). In the experiment, it was shown that the proposed method DML outperforms the other methods on the problem of compound-protein interaction network reconstruction from chemical structure and genomic sequence data. From a practical viewpoint, the proposed method is useful for virtual screening of the huge numbers of ligand candidate compounds being generated with various biological assays, and of target candidate proteins, toward genomic drug discovery.
It should also be pointed out that the proposed method can be applied to other network prediction problems, such as metabolic network reconstruction, host-pathogen protein-protein interaction prediction, and customer-product recommendation systems, as long as they can be represented by bipartite graphs.

References

[1] C.M. Dobson. Chemical space and biology. Nature, 432:824-828, 2004.
[2] M. Rarey, B. Kramer, T. Lengauer, and G. Klebe. A fast flexible docking method using an incremental construction algorithm. J Mol Biol, 261:470-489, 1996.
[3] Y. Yamanishi, J.-P. Vert, and M. Kanehisa. Protein network inference from multiple genomic data: a supervised approach. Bioinformatics, 20 Suppl 1:i363-370, 2004.
[4] J.-P. Vert and Y. Yamanishi. Supervised graph inference. Advances in Neural Information Processing Systems, pages 1433-1440, 2005.
[5] T. Kato, K. Tsuda, and K. Asai. Selective integration of multiple biological data for supervised network inference. Bioinformatics, 21:2488-2495, 2005.
[6] B. Schölkopf, K. Tsuda, and J.-P. Vert. Kernel Methods in Computational Biology. MIT Press, 2004.
[7] G. Wahba. Spline Models for Observational Data: Series in Applied Mathematics. SIAM, Philadelphia, 1990.
[8] F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7:219-269, 1995.
[9] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[10] M.J. Greenacre. Theory and Applications of Correspondence Analysis. Academic Press, 1984.
[11] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean embedding of co-occurrence data. Advances in Neural Information Processing Systems, pages 497-504, 2005.
[12] S. Akaho. A kernel method for canonical correlation analysis. International Meeting of the Psychometric Society (IMPS2001), 2001.
[13] F.R. Bach and M.I. Jordan.
Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2002.
[14] M. Kanehisa, S. Goto, M. Hattori, K.F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34(Database issue):D354-357, Jan 2006.
[15] S. Günther, M. Kuhn, M. Dunkel, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res, 2007.
[16] D.S. Wishart, C. Knox, A.C. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam, and M. Hassanali. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res, 2007.
[17] M. Hattori, Y. Okuno, S. Goto, and M. Kanehisa. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc., 125:11853-11865, 2003.
[18] T.F. Smith and M.S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147:195-197, 1981.
[19] H. Saigo, J.-P. Vert, N. Ueda, and T. Akutsu. Protein homology detection using string alignment kernels. Bioinformatics, 20:1682-1689, 2004.", "award": [], "sourceid": 285, "authors": [{"given_name": "Yoshihiro", "family_name": "Yamanishi", "institution": null}]}