WBAR WBAR - 5 months ago 21
SQL Question

MySQL: How to find leaves in specific node

I know this kind questions has been posted here many times, for exmaple: Java way

I have huge amount of data (150k+) in standard tree pattern (

id
,
parent_id
,
some_data
)

Question: How to get leaves for given node_id?

Table structure:

CREATE TABLE `DATA_TREE` (
`ID` int(11) NOT NULL,
`PARENT_ID` int(11) NOT NULL,
`DATA` varchar(45) DEFAULT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `fk_DATA_TREE_1_idx` (`PARENT_ID`),
CONSTRAINT `fk_DATA_TREE_1` FOREIGN KEY (`PARENT_ID`) REFERENCES `DATA_TREE` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf


Database:
MySQL 5.1.61

Answer

There is no way to do this in a single query. Even if there was it would probably be very inefficient.

We can do it with a stored procedure and a loop. With the indexes you added it should be pretty quick too. This uses two tables selecting the nodes from the input table (A) and inserting the node and their children into (B). It then swaps B for A, and repeats until no more non-leaf nodes exist in A. The nice thing is loop iterations would only be as many as their are levels between the input node and the last leaf node, which under most cases is probably not that deep. This stored procedure would be faster than doing it externally in code.

FYI I had difficulty with my installation handling temporary tables, if you get a 'error 2' then remove the temporary keyword.

delimiter $$
drop procedure if exists GetLeafNodes $$
create procedure GetLeafNodes(nodeid int)
begin
declare N int default 1;

-- create two working sets of IDs, we'll go back and forth between these two sets
drop temporary table if exists A;
drop temporary table if exists B;
create temporary table A(node int, child int);
create temporary table B(node int, child int);

-- insert our single input node into the working set
insert into A values (null, nodeid);

while (N>0) do
  -- keep selecting child nodes for each node we are now tracking
  -- leaf nodes will end up with the child set to null
  insert into B
  select ifnull(A.child,A.node), tree.ID
    from A
    left outer join DATA_TREE as tree on A.child=tree.parent_id;

  -- now swap A and B
  rename table A to temp, B to A, temp to B;

  -- remove non-leaf nodes from table B
  delete from B;

  -- exit when there are no longer any non-leaf nodes in A
  set N=(select count(*) from A where child is not null);
end while;

-- now output our list of leaf nodes
select node from A;

drop temporary table A;
drop temporary table B;
end $$
DELIMITER ;
call GetLeafNodes(4);

I used the following sample set for testing:

CREATE TABLE `DATA_TREE` (
  `ID` int(11) NOT NULL,
  `PARENT_ID` int(11) NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `ID_UNIQUE` (`ID`),
  KEY `fk_DATA_TREE_1_idx` (`PARENT_ID`)
) ENGINE=InnoDB
;

insert into DATA_TREE values
(1,0),(2,1),(3,1),(4,1),(5,3),(6,3),(7,4),(8,4),(9,4),(10,6),(11,6),(12,7),(13,9),(14,9),(15,12),(16,12),(17,12),(18,14);