{"title": "Integrating Locally Learned Causal Structures with Overlapping Variables", "book": "Advances in Neural Information Processing Systems", "page_first": 1665, "page_last": 1672, "abstract": "In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. There are several asymptotically correct, informative algorithms that search for causal information given a single dataset, even with missing values and hidden variables. There are, however, no such reliable procedures for distributed data with overlapping variables, and only a single heuristic procedure (Structural EM). This paper describes an asymptotically correct procedure, ION, that provides all the information about structure obtainable from the marginal independence relations. Using simulated and real data, the accuracy of ION is compared with that of Structural EM, and with inference on complete, unified data.", "full_text": "Integrating locally learned causal structures\n\nwith overlapping variables\n\nRobert E. Tillman\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nrtillman@andrew.cmu.edu\n\nDavid Danks, Clark Glymour\nCarnegie Mellon University &\n\nInstitute for Human & Machine Cognition\n\nPittsburgh, PA 15213\n\n{ddanks,cg09}@andrew.cmu.edu\n\nAbstract\n\nIn many domains, data are distributed among datasets that share only some vari-\nables; other recorded variables may occur in only one dataset. While there are\nasymptotically correct, informative algorithms for discovering causal relation-\nships from a single dataset, even with missing values and hidden variables, there\nhave been no such reliable procedures for distributed data with overlapping vari-\nables. 
We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form. We evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets, and against Structural EM, a heuristic DAG structure learning procedure for data with missing values, applied to the concatenated data.

1 Introduction

In many domains, researchers are interested in predicting the effects of interventions, or manipulating variables, on other observed variables. Such predictions require knowledge of causal relationships between observed variables. There are existing asymptotically correct algorithms for learning such relationships from data, possibly with missing values and hidden variables [1][2][3], but these algorithms all assume that every variable is measured in a single study. Datasets for such studies are not always readily available, often due to privacy, ethical, financial, and practical concerns. However, given the increasing availability of large amounts of data, it is often possible to obtain several similar studies that individually measure subsets of the variables a researcher is interested in and together include all such variables. For instance, models of the United States and United Kingdom economies share some but not all variables, due to different financial recording conventions; fMRI studies with similar stimuli may record different variables, since the images vary according to magnet strength, data reduction procedures, etc.; and U.S. states report some of the same educational testing variables, but also report state-specific variables. In these cases, if each dataset has overlapping variable(s) with at least one other dataset, e.g. 
if two datasets D1 and D2, which measure variables V1 and V2, respectively, have at least one variable in common (V1 ∩ V2 ≠ ∅), then we should be able to learn many of the causal relationships between the observed variables using this set of datasets. The existing algorithms, however, cannot in general be directly applied to such cases, since they may require joint observations for variables that are not all measured in a single dataset.

While this problem has been discussed in [4] and [5], there are no general, useful algorithms for learning causal relationships from data of this form. A typical response is to concatenate the datasets to form a single common dataset with missing values for the variables that are not measured in each of the original datasets. Statistical matching [6] or multiple imputation [7] procedures may then be used to fill in the missing values by assuming an underlying model (or small class of models), estimating model parameters using the available data, and then using this model to interpolate the missing values. While the assumption of some underlying model may be unproblematic in many standard prediction scenarios, e.g. classification, it is unreliable for causal inference; the causal relationships learned using the interpolated dataset that are between variables which are never jointly measured in a single dataset will only be correct if the corresponding relationships between variables in the assumed model happen to be causal relationships in the correct model. The Structural EM algorithm [8] avoids this problem by iteratively updating the assumed model using the current interpolated dataset and then reestimating values for the missing data to form a new interpolated dataset until the model converges. The Structural EM algorithm is only justified, however, when missing data are missing at random (or indicator variables can be used to make them so) [8]. 
The pattern of missing values in the concatenated datasets described above is highly structured. Furthermore, Structural EM is a heuristic procedure and may converge to local maxima. While this may not be problematic in practice when doing prediction, it is problematic when learning causal relationships. Our experiments in section 4 show that Structural EM performs poorly in this scenario.

We present a novel, asymptotically correct algorithm—the Integration of Overlapping Networks (ION) algorithm—for learning causal relationships (or more properly, the complete set of possible causal DAG structures) from data of this form. Section 2 provides the relevant background and terminology. Section 3 discusses the algorithm. Section 4 presents experimental evaluations of the algorithm using synthetic and real-world data. Finally, section 5 provides conclusions.

2 Formal preliminaries

We now introduce some terminology. A directed graph G = ⟨V, E⟩ is a set of nodes V, which represent variables, and a set of directed edges E connecting distinct nodes. If two nodes are connected by an edge then the nodes are adjacent. For pairs of nodes {X, Y} ⊆ V, X is a parent (child) of Y if there is a directed edge from X to Y (Y to X) in E. A trail in G is a sequence of nodes such that each consecutive pair of nodes in the sequence is adjacent in G and no node appears more than once in the sequence. A trail is a directed path if every edge between consecutive pairs of nodes points in the same direction. X is an ancestor (descendant) of Y if there is a directed path from X to Y (Y to X). G is a directed acyclic graph (DAG) if for every pair {X, Y} ⊆ V, X is not both an ancestor and a descendant of Y (no directed cycles). A collider (v-structure) is a triple of nodes ⟨X, Y, Z⟩ such that X and Z are parents of Y. 
A trail is active given C ⊆ V if (i) for every collider ⟨X, Y, Z⟩ in the trail either Y ∈ C or some descendant of Y is in C and (ii) no other node in the trail is in C. For disjoint sets of nodes X, Y, and Z, X is d-separated (d-connected) from Y given Z if and only if there are no (at least one) active trails between any X ∈ X and any Y ∈ Y given Z.

A Bayesian network B is a pair ⟨G, P⟩, where G = ⟨V, E⟩ is a DAG and P is a joint probability distribution over the variables represented by the nodes in V such that P can be decomposed as follows:

P(V) = ∏_{V ∈ V} P(V | Parents(V))

For B = ⟨G, P⟩, if X is d-separated from Y given Z in G, then X is conditionally independent of Y given Z in P [9]. For disjoint sets of nodes X, Y, and Z in V, P is faithful to G if X is d-separated from Y given Z in G whenever X is conditionally independent of Y given Z in P [1]. B is a causal Bayesian network if an edge from X to Y indicates that X is a direct cause of Y relative to V. Most algorithms for causal discovery, or learning causal relationships from nonexperimental data, assume that the distribution over the observed variables P is decomposable according to a DAG G and that P is faithful to G. The goal is to learn G using the data from P. Most causal discovery algorithms return a set of possible DAGs which entail the same d-separations and d-connections, i.e. the Markov equivalence class, rather than a single DAG. The DAGs in this set have the same adjacencies but only some of the same directed edges. The directed edges common to each DAG represent causal relationships that are learned from the data. If we admit the possibility that there may be unobserved (latent) common causes between observed variables, then this set of possible DAGs is usually larger.

A partial ancestral graph (PAG) represents the set of DAGs in a particular Markov equivalence class when latent common causes may be present. 
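The d-separation criterion defined above can be checked directly by enumerating trails and testing conditions (i) and (ii). The following is a minimal sketch (ours, not the authors' code; all function names and the adjacency-dict representation are our own), intended only for small graphs since it enumerates all trails:

```python
def descendants(dag, x):
    """All nodes reachable from x by a directed path (dag: node -> set of children)."""
    seen, stack = set(), [x]
    while stack:
        for child in dag[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_active(dag, trail, cond):
    """A trail is active given cond iff every collider on it has itself or a
    descendant in cond, and no non-collider on it is in cond."""
    for i in range(1, len(trail) - 1):
        a, b, c = trail[i - 1], trail[i], trail[i + 1]
        collider = b in dag[a] and b in dag[c]  # a -> b <- c
        if collider:
            if not ({b} | descendants(dag, b)) & cond:
                return False
        elif b in cond:
            return False
    return True

def trails(dag, x, y):
    """All repetition-free undirected paths between x and y."""
    adj = {v: set(dag[v]) for v in dag}
    for v in dag:
        for child in dag[v]:
            adj[child].add(v)
    def extend(path):
        if path[-1] == y:
            yield path
            return
        for nxt in adj[path[-1]] - set(path):
            yield from extend(path + [nxt])
    yield from extend([x])

def d_separated(dag, x, y, cond):
    """x and y are d-separated given cond iff no trail between them is active."""
    return not any(is_active(dag, t, cond) for t in trails(dag, x, y))
```

For the chain X → Z → Y, conditioning on Z blocks the only trail; for the collider X → Z ← Y, conditioning on Z opens it.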
Nodes in a PAG correspond to observed variables. Edges are of four types: →, ◦→, ◦−◦, and ↔, where a ◦ endpoint indicates either an arrowhead or a tail (−) orientation, bidirected edges (↔) indicate the presence of a latent common cause, and fully directed edges (→) indicate that the directed edge is present in every DAG, i.e. a causal relationship. For {X, Y} ⊆ V, a possibly active trail between X and Y given Z ⊆ V \ {X, Y} is a trail in a PAG between X and Y such that some orientation of the ◦'s on edges between consecutive nodes in the trail, to either a tail or an arrowhead, makes the trail active given Z.

3 Integration of Overlapping Networks (ION) algorithm

The ION algorithm uses conditional independence information to discover the complete set of PAGs over a set of variables V that are consistent with a set of datasets over subsets of V which have overlapping variables. ION accepts as input a set of PAGs which correspond to each of these datasets. A standard causal discovery algorithm that checks for latent common causes, such as FCI [1] or GES [3] with latent variable postprocessing steps¹, must first be applied to each of the original datasets to learn the PAGs that will be input to ION. Expert domain knowledge can also be encoded in the input PAGs, if available. The ION algorithm is shown as algorithm 1 and described below.

Input: PAGs Gi ∈ G with nodes Vi ⊆ V for i = 1, . . . , k
Output: PAGs Hi ∈ H with nodes Vi = V for i = 1, . . .
, m

1   K ← the complete graph over V with ◦'s at every endpoint
2   A ← ∅
3   Transfer nonadjacencies and endpoint orientations from each Gi ∈ G to K and propagate the changes in K using the rules described in [10]
4   PAT({X, Y}, Z) ← all possibly active trails between X and Y given Z, for all {X, Y} ⊆ V and Z ⊆ V \ {X, Y} such that X and Y are d-separated given Z in some Gi ∈ G
5   PC ← all minimal hitting sets of changes to K such that all PATi ∈ PAT are not active
6   for PCi ∈ PC do
7       Ai ← K after making and propagating the changes PCi
8       if Ai is consistent with every Gi ∈ G then add Ai to A
9   end
10  for Ai ∈ A do
11      Remove Ai from A
12      Mark all edges in Ai as '?'
13      For each {X, Y} ⊆ V such that X and Y are adjacent in Ai, if X and Y are d-connected given ∅ in some Gi ∈ G, then remove '?' from the edge between X and Y in Ai
14      PR ← every combination of removing or not removing '?' marked edges from Ai
15      for PRi ∈ PR do
16          Hi ← Ai after making and propagating the changes PRi
17          if Hi is consistent with every Gi ∈ G then add Hi to H
18      end
19  end

Algorithm 1: The Integration of Overlapping Networks (ION) algorithm

The algorithm begins with the complete graph over V with all ◦ endpoints and transfers nonadjacencies and endpoint orientations from each Gi ∈ G at line 3, e.g. if X and Y are not adjacent in Gi then remove the edge between X and Y; if X is directed into Y in Gi then set the endpoint at Y on the edge between X and Y to an arrowhead. Once these orientations and edge removals are made, the changes to the complete graph are propagated using the rules in [10], which provably make every change that is entailed by the current changes made to the graph. 
Lines 4-9 find every possibly active trail for every {X, Y} ⊆ V given Z ⊆ V \ {X, Y} such that X and Y are d-separated given Z in some Gi ∈ G. The constructed set PC includes all minimal hitting sets of graphical changes, i.e. unique sets of minimal changes that are not subsets of other sets of changes, which make these trails no longer active. For each minimal hitting set, a new graph is constructed by making the changes in the set and propagating these changes. If the graph is consistent with each Gi ∈ G, i.e. the graph does not imply a d-separation for some {X, Y} ⊆ V given Z ⊆ V \ {X, Y} such that X and Y are d-connected in some Gi ∈ G, then this graph is added to the current set of possible graphs. Lines 10-19 attempt to discover any additional PAGs that may be consistent with each Gi ∈ G after deleting edges from PAGs in the current set and propagating the changes. If some pair of nodes {X, Y} ⊆ V that are adjacent in a current PAG are d-connected given ∅ in some Gi ∈ G, then we do not consider sets of edge removals which remove this edge.

¹We use the standard GES algorithm to learn a DAG structure from the data and then use the FCI rules to check for possible latent common causes.

The ION algorithm is provably sound in the sense that the output PAGs are consistent with every Gi ∈ G, i.e. no Hi ∈ H entails a d-separation or d-connection that contradicts a d-separation or d-connection entailed by some Gi ∈ G. This property follows from the fact that d-separation and d-connection are mutually exclusive, exhaustive relations.

Theorem 3.1 (soundness). If X and Y are d-separated (d-connected) given Z in some Gi ∈ G, then X and Y are d-separated (d-connected) given Z in every Hi ∈ H.

Proof Sketch. Every structure Ai constructed at line 7 provably entails every d-separation entailed by some Gi ∈ G. 
Such structures are only added to A if they do not entail a d-separation corresponding to a d-connection in some Gi ∈ G. The only changes made (other than changes resulting from propagating other changes, which are provably correct by [10]) in lines 10-19 are edge removals, which can only create new d-separations. If a new d-separation is created which corresponds to a d-connection in some Gi ∈ G, then the PAG entailing this new d-separation is not added to H.

The ION algorithm is provably complete in the sense that if there is some structure Hi over the variables V that is consistent with every Gi ∈ G, then Hi ∈ H.

Theorem 3.2 (completeness). Let Hi be a PAG over the variables V such that for every pair {X, Y} ⊆ V, if X and Y are d-separated (d-connected) given Z ⊆ V \ {X, Y} in some Gi ∈ G, then X and Y are d-separated (d-connected) given Z in Hi. Then, Hi ∈ H.

Proof Sketch. Every change made at line 3 is provably necessary to ensure soundness. At least one graph added to A at line 8 provably has every adjacency in Hi (and possibly more) and no non-◦ endpoint orientation that is not also present in Hi. Some sequence of edge removals will provably produce Hi at line 16, and it will be added to the output set since it is consistent with every Gi ∈ G.

Thus, by theorems 3.1 and 3.2, ION is an asymptotically correct algorithm for learning the complete set of PAGs over V that are consistent with a set of datasets over subsets of V with overlapping variables, if the input PAGs are discovered using an asymptotically correct algorithm that detects the presence of latent common causes, e.g. FCI, with each of these datasets.

Finding all minimal hitting sets is an NP-complete problem [11]. 
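The minimal hitting sets required at line 5 of Algorithm 1 can be illustrated with a brute-force sketch (our illustration, not the authors' implementation; the function name is our own, and this enumeration is exponential, which is why the paper resorts to branch and bound):

```python
from itertools import combinations

def minimal_hitting_sets(collection):
    """All minimal hitting sets of a collection of sets: subsets H of the
    universe that intersect every set in the collection, such that no proper
    subset of H does.  Brute force, for small inputs only."""
    sets = [frozenset(s) for s in collection]
    universe = sorted(set().union(*sets)) if sets else []
    hitting = []
    # Enumerate candidates smallest-first, so any candidate containing an
    # already-found hitting set is non-minimal and can be skipped.
    for r in range(len(universe) + 1):
        for cand in combinations(universe, r):
            c = set(cand)
            if all(c & s for s in sets) and not any(h <= c for h in hitting):
                hitting.append(frozenset(c))
    return hitting
```

For example, the collection {{1, 2}, {2, 3}} has exactly the minimal hitting sets {2} and {1, 3}: {2} hits both sets, and {1, 3} hits both while no singleton subset of it does.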
Since learning a DAG structure from data is also an NP-complete problem [12], the ION algorithm, as given above, requires a superexponential (in |V|) number of operations and is often computationally intractable even for small |V|. In practice, however, we can break the minimal hitting set problem into a sequence of smaller subproblems and use a branch and bound approach that is tractable in many cases and still results in an asymptotically correct algorithm. We tested several such strategies. The method which most effectively balanced time and space complexity tradeoffs was to first find all minimal hitting sets which make all possibly active trails of length 2 that correspond to d-separations in some Gi ∈ G not active, then find the structures resulting from making and propagating these changes that are consistent with every Gi ∈ G, and iteratively do the same for each of these structures, increasing the length of possibly active trails considered until trails of all sizes are considered.

4 Experimental results

We first used synthetic data to evaluate the performance of ION with known ground truth. In the first experiment, we generated 100 random 4-node DAGs using the MCMC algorithm described in [13] with random discrete parameters (conditional probability tables for the factors in the decomposition shown in section 2). For each DAG, we then randomly chose two subsets of size 2 or 3 of the nodes in the DAG such that the union of the subsets included all 4 nodes and at least one overlapping variable between the two subsets was present. We used forward sampling to generate two i.i.d. samples of sizes N = 50, N = 100, N = 500, N = 1000, and N = 2500 from the DAGs for only the variables in each subset. 
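Forward sampling from a discrete Bayesian network, as used to generate these datasets, visits nodes in topological order and draws each node's value from its conditional probability table given the already-sampled parent values. A minimal sketch (ours, not the authors' code; the CPT representation and all names are assumptions):

```python
import random

def topological_order(parents):
    """Order nodes so each node appears after all of its parents
    (parents: node -> tuple of parent nodes; assumes a DAG)."""
    order, placed = [], set()
    while len(order) < len(parents):
        for v in parents:
            if v not in placed and all(p in placed for p in parents[v]):
                order.append(v)
                placed.add(v)
    return order

def forward_sample(parents, cpt, rng=random):
    """Draw one joint sample.  cpt[v][parent_values] maps each value of v
    to its probability given that tuple of sampled parent values."""
    sample = {}
    for v in topological_order(parents):
        dist = cpt[v][tuple(sample[p] for p in parents[v])]
        values = list(dist)
        sample[v] = rng.choices(values, weights=[dist[x] for x in values])[0]
    return sample
```

For instance, for the two-node network X → Y with binary variables, `parents = {'X': (), 'Y': ('X',)}` and a CPT giving P(Y | X); repeated calls yield the i.i.d. samples described above.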
We used both FCI and GES with latent variable postprocessing to generate PAGs for each of these samples, which were input to ION.

Figure 1: (a) edge omissions, (b) edge commissions, (c) orientation errors, and (d) runtimes

To evaluate the accuracy of ION, we counted the number of edge omission, edge commission, and orientation errors (an arrowhead instead of a tail) for each PAG in the ION output set and averaged the results. These results were then averaged across all of the 100 4-node structures. Figure 1 shows the averaged results for these methods along with 3 other methods we included for comparison. ION-FCI and ION-GES refer to the performance of ION when the input PAGs are obtained using the FCI algorithm and the GES algorithm with latent variable postprocessing, respectively. For Structural EM, we took each of the datasets over subsets of the nodes in each DAG and formed a concatenated dataset, as described in section 1, which was input to the Structural EM algorithm.² For FCI-baseline and GES-baseline, we used forward sampling to generate another i.i.d. 
sample of sizes N = 50, N = 100, N = 500, N = 1000, and N = 2500 for all of the variables in each DAG and used these datasets as input for the FCI and GES with latent variable postprocessing algorithms, respectively, to obtain a measure of how well these algorithms perform when no data are missing. The average runtimes for each method are also reported in figure 1. Error bars show 95% confidence intervals. We first note the performance of Structural EM. Almost no edge omission errors are made, but more edge commission errors are made than by any of the other methods, and the edge commission errors do not decrease as the sample size increases. When we looked at the results, we found that Structural EM always returned either the complete graph or a graph that was almost complete, indicating that Structural EM is not a reliable method for causal discovery in this scenario, where there is a highly structured pattern to the missing data. Furthermore, the runtime for Structural EM was considerably higher than that of any of the other methods. For the larger sample sizes (where more missing values need to be estimated at each iteration), a single run required several hours in some instances. Due to its significant computation time, we

²We ran Structural EM with 5 random restarts and chose the model with the highest BDeu score to avoid converging to local maxima. Random "chains" of nodes were used as the initial models. 
Structural EM was never stopped before convergence.

Figure 2: (a) edge omissions, (b) edge commissions, and (c) orientation errors

Figure 3: (a) edge omissions, (b) edge commissions, and (c) orientation errors

were unable to use Structural EM with larger DAG structures, so it is excluded in the experiments below. The FCI-baseline and GES-baseline methods performed similarly to previous simulations of them. The ION-FCI and ION-GES methods performed similarly to the FCI-baseline and GES-baseline methods but made slightly more errors and showed slower convergence (due to the missing data). Very few edge commission errors were made. Slightly more edge omission errors were made, but these errors decrease as the sample size increases. Some edge orientation errors were made even for the larger sample sizes. This is due to the fact that each of the algorithms returns an equivalence class of DAGs rather than a single DAG. 
Even if the correct equivalence class is discovered, errors\nresult after comparing the ground truth DAG to every DAG in the equivalence class and averaging.\nWe also note that there are fewer orientation errors for the GES-baseline and ION-GES methods on\nthe two smallest sample sizes than all of the other sample sizes. While this may seem surprising, it\nis simply a result of the fact that more edge omission errors are made for these cases.\n\nWe repeated the above experiment for 3 similar cases where we used 6-node DAG structures rather\nthan 4-node DAG structures: (i) two i.i.d. samples were generated for random subsets of sizes 2-5\nwith only 1 variable that is not overlapping between the two subsets; (ii) two i.i.d. samples were\ngenerated for random subsets of sizes 2-5 with only 2 variables that are not overlapping between\nthe two subsets; (iii) three i.i.d. samples were generated for random subsets of sizes 2-5 with only\n1 variable that is not overlapping between any pair of subsets. Figures 2, 3, and 4 show edge\nomission, edge commission, and orientation errors for each of these cases, respectively. In general,\nthe performance in each case is similar to the performance for the 4-node case.\n\nWe also tested the performance of ION-FCI using a real world dataset measuring IQ and various\nneuroanatomical and other traits [14]. We divided the variables into two subsets with overlapping\nvariables based on domain grounds: (a) variables that might be included in a study on the relationship\nbetween neuroanatomical traits and IQ; and (b) variables for a study on the relationship between IQ,\nsex, and genotype, with brain volume and head circumference included as possible confounders.\nFigures 5a and 5b show the FCI output PAGs when only the data for each of these subsets of the\nvariables is provided as input, respectively. Figure 5c shows the output PAG of ION-FCI when these\ntwo resulting PAGs are used as input. 
We also ran FCI on the complete dataset to have a comparison. Figure 5d shows this PAG.

Figure 4: (a) edge omissions, (b) edge commissions, and (c) orientation errors

Figure 5: (a) FCI output PAG for variables in subset a, (b) FCI output PAG for variables in subset b, (c) ION output PAG when using the FCI output PAGs for variables in subset a and variables in subset b as input, and (d) FCI output PAG for all variables (variables: Corpus Callosum Surface Area, Brain Surface Area, Genotype, IQ, Body Weight, Head Circumference, Brain Volume, Sex)

In this particular case, the output of ION-FCI consists of only a single PAG, which is identical to the result when FCI is given the complete dataset as input. This case shows that in some instances, ION-FCI can recover as much information about the true DAG structure as FCI, even when less information can be extracted from the ION-FCI input. We note that the graphical structure of the complete PAG (figures 5c and 5d) is the union of the structures shown in figures 5a and 5b. 
While visually this may appear to be a trivial example for ION, where all of the relevant information can be extracted in the first steps, there is in fact much processing required in later stages of the algorithm to determine the structure around the nonoverlapping variables.

5 Conclusions

In practice, researchers are often unable to find or construct a single, complete dataset containing every variable they may be interested in (or doing so is very costly). We thus need some way of integrating information about causal relationships that can be discovered from a collection of datasets with related variables [5]. Standard causal discovery algorithms cannot be used, since they take only a single dataset as input. To address this open problem, we proposed the ION algorithm, an asymptotically correct algorithm for discovering the complete set of causal DAG structures that are consistent with such data.

While the results presented in section 4 indicate that ION is useful in smaller domains when the branch and bound approach described in section 3 is used, a number of issues must be addressed before ION or a similar algorithm is useful for higher dimensional datasets. Probably the most significant problem is resolving contradictory information among overlapping variables in different input PAGs, e.g. X is a parent of Y in one PAG and a child of Y in another PAG, resulting from statistical errors or from input samples that are not identically distributed. ION currently ignores such information rather than attempting to resolve it. This increases uncertainty and thus the size of the resulting output set of PAGs. Furthermore, simply ignoring such information does not always avoid conflicts. In some such cases, ION will not discover any PAGs which entail the correct d-separations and d-connections. Thus, no output PAGs are returned. 
When performing conditional independence tests or evaluating score functions, statistical errors occur more frequently as the dimensionality of a dataset increases, unless the sample size also increases at an exponential rate (the so-called curse of dimensionality). Thus, until reliable methods for resolving conflicting information from input PAGs are developed, ION and similar algorithms will not in general be useful for higher dimensional datasets. Furthermore, while the branch and bound approach described in section 3 is a significant improvement over the other methods we tested for computing minimal hitting sets, its memory requirements are still considerable in some instances. Other algorithmic strategies should be explored in future research.

Acknowledgements

We thank Joseph Ramsey, Peter Spirtes, and Jiji Zhang for helpful discussions and pointers. We thank Frank Wimberley for implementing the version of Structural EM we used. R.E.T. was supported by the James S. McDonnell Foundation Causal Learning Collaborative Initiative. C.G. was supported by a grant from the James S. McDonnell Foundation.

References

[1] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.

[2] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

[3] D. M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554, 2002.

[4] D. Danks. Learning the causal structure of overlapping variable sets. In Discovery Science: Proceedings of the 5th International Conference, 2002.

[5] D. Danks. Scientific coherence and the fusion of experimental results. The British Journal for the Philosophy of Science, 56:791–807, 2005.

[6] S. Rässler. Statistical Matching. Springer, 2002.

[7] D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. 
Wiley & Sons, 1987.

[8] N. Friedman. The Bayesian structural EM algorithm. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.

[9] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, 1988.

[10] J. Zhang. A characterization of Markov equivalence classes for causal models with latent variables. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007.

[11] R. Greiner, B. A. Smith, and R. W. Wilkerson. A correction to the algorithm in Reiter's theory of diagnosis. Artificial Intelligence, 41:79–88, 1989.

[12] D. M. Chickering. Learning Bayesian networks is NP-complete. In Proceedings of the 5th International Workshop on Artificial Intelligence and Statistics, 1995.

[13] G. Melançon, I. Dutour, and M. Bousquet-Mélou. Random generation of dags for graph drawing. Technical Report INS-R0005, Centre for Mathematics and Computer Sciences, Amsterdam, 2000.

[14] M. J. Tramo, W. C. Loftus, R. L. Green, T. A. Stukel, J. B. Weaver, and M. S. Gazzaniga. Brain size, head size, and IQ in monozygotic twins. Neurology, 50:1246–1252, 1998.", "award": [], "sourceid": 304, "authors": [{"given_name": "David", "family_name": "Danks", "institution": null}, {"given_name": "Clark", "family_name": "Glymour", "institution": null}, {"given_name": "Robert", "family_name": "Tillman", "institution": null}]}