Question

[LeetCode 127] Given two words (beginWord and endWord), and a dictionary’s word list, find the length of shortest transformation sequence from beginWord to endWord, such that:

Only one letter can be changed at a time
Each intermediate word must exist in the word list

For example,

Given:
      beginWord = "hit"
      endWord = "cog"
      wordList = ["hot","dot","dog","lot","log"]
      As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
      return its length 5.

The journey to Accept

I spent lots of time on this question, as I hit Time Limit Exceeded in the first few trials and it confused me a lot. Thus I would like to take this opportunity to summarized what I’ve learnt.

Correct BFS Solution, but it’s bad!

First of all, this question is trival for most of us, it’s just a basic graph traversal problem, we consider the word lists a graph, whereas each word in the word list is a node, and we add an edges to connect both nodes if both nodes are only different in one character.

The shortest transformation from beginWord to endWord are actually the shorest path from beginWord to endWord, and we know that BFS is the best fit for this kind of problem.

Thus, connected(String s1, String s2) was implemented to determine whether two nodes(words) are actually connected.

public boolean connected(String s1, String s2) {
    int diff = 0;
    for (int i = 0; i < s1.length(); i++) {
        if (s1.charAt(i) != s2.charAt(i)) {
            diff++;
            if (diff > 1) {
                return false;
            }
        }
    }
    return (diff == 1);
}

And we iterate wordList just as the normal BFS does. Add a queue, blar blar blar…

public int ladderLength(String beginWord, String endWord, Set<String> wordList) {
    TreeSet<String> visited = new TreeSet<String>();
    
    int depth = 1;
    Queue<String> bfsQueue = new LinkedList<String>();
    bfsQueue.add(beginWord);
    TreeSet<String> nextLayer = new TreeSet<String>();
    
    while (!bfsQueue.isEmpty()) {
        String currWord = bfsQueue.poll();
        System.out.println(currWord);
        if (connected(currWord, endWord)) {
            return depth+1;
        }
        for (String word : wordList) {
            if (!visited.contains(word) && connected(word, currWord)) {
                nextLayer.add(word);
                visited.add(currWord);
            }
        }
        if (bfsQueue.isEmpty()) {
            bfsQueue.addAll(nextLayer);
            nextLayer.clear();
            depth++;
        }
    }
    
    return 0;
}

We use a queue bfsQueue to store the word to be traversed. Since we need to find the shorstest path from source to destination, we need to measure the depth of the traversal. Thus, we insert an empty String each time to separate the nodes in each level

if (bfsQueue.isEmpty()) {
    bfsQueue.addAll(nextLayer);
    nextLayer.clear();
    depth++;
}

And up to this stage, I finally get a correct implementation but a bad one. Soon I’ve found that this submission so dump that it can’t even beat the test case with around 600 words. Therefore, impovement has to be make for better timing.

Improve the solution with 2-way BFS

After a little bit of google-ing, I’ve found a way to speed up BFS would be using 2-way BFS, which we not only traverse the graph from source to destination, but also from destination to source. The main idea is that we reduece the number of nodes to be traversed, as less nodes is expended during BFS traversal.

2-way BFS

public int ladderLength(String beginWord, String endWord, Set<String> wordList) {
    int listSize = wordList.size();
    int[] visited = new int[listSize+2];
    String[] wordListArr = new String[listSize+2];
    wordListArr = wordList.toArray(wordListArr);
    
    wordListArr[listSize] = beginWord;
    visited[listSize] = 1;	// forward
    wordListArr[listSize+1] = endWord;
    visited[listSize+1] = -1;	// backward
    
    int forwardDepth = 1;
    Queue<Integer> forwardQueue = new LinkedList<Integer>();
    forwardQueue.add(listSize);
    forwardQueue.add(-1);
    
    int backwardDepth = 1;
    Queue<Integer> backwardQueue = new LinkedList<Integer>();
    backwardQueue.add(listSize+1);
    backwardQueue.add(-1);
    
    int currentSerchingDir = 1;
    while (!forwardQueue.isEmpty() && !backwardQueue.isEmpty()) {
        Integer currWord = currentSerchingDir == 1 ? forwardQueue.poll() : backwardQueue.poll();
        if (currWord >= 0) {
            for (int i = 0; i < wordListArr.length; i++) {
                if (visited[i] != currentSerchingDir) {
                    if (connected(wordListArr[i], wordListArr[currWord])) {
                        if (visited[i] == 0) {
                            if (currentSerchingDir == 1) {
                                forwardQueue.add(i);
                            } else {
                                backwardQueue.add(i);
                            }
                            visit++;
                            visited[i] = currentSerchingDir;
                        } else {
                            return forwardDepth + backwardDepth;
                        }
                    }
                }
            }
        } else if (!forwardQueue.isEmpty() || !forwardQueue.isEmpty()) {
            System.out.println(currentSerchingDir);
            if (currentSerchingDir == 1) {
                forwardDepth++;
                forwardQueue.add(-1);
            } else {
                backwardDepth++;
                backwardQueue.add(-1);
            }
            currentSerchingDir *= -1;
        }
    }
    
    return 0;
}

2 queues are used this time, one for the forward traversial and one for backward traversial, and swap the treavre direction after each iteration. Besides, I’ve also used a boolean array visited to indicate whether a particular nodes have been traversed, etc.

The results are positive, it did improved the run time, and the 600-word test cases are beated, but this time it fail the 4000-word test cases. More have to be done to get to our goal.

Not to loop through the whole word list

One of the major problem in the previous submissions are it need to check too many nodes in each layers

for (int i = 0; i < wordListArr.length; i++) {
    if (visited[i] != currentSerchingDir) {
        if (connected(wordListArr[i], wordListArr[currWord])) {
            if (visited[i] == 0) {
                if (currentSerchingDir == 1) {
                    forwardQueue.add(i);
                } else {
                    backwardQueue.add(i);
                }
                visit++;
                visited[i] = currentSerchingDir;
            } else {
                return forwardDepth + backwardDepth;
            }
        }
    }
}

In each iteration, It will check the words in the wordlist to see whether two nodes should be connected, it take $O(kn)$ time to findout the where $k$ is the number of nodes traversed, which in worse case we have $O(n^2)$ running time (i.e. each node are only travered once), which is clearly not acceptable.

But, how we can avoid these stupid traveral? One key observation will help. The maximun number of outgoing edges that a node can have will be $len(s)*25$, as mentioned in the quesiton.

Only one letter can be changed at a time
Each intermediate word must exist in the word list

As illustrated below, the word abc can be transformed to bbc, cbc, etc. The total number of outgoing edges are bounded. Thus, we can loop through all the possiblilty during each iteration, the time complexity reduced to $O(k* len(s) *25) => O(k* len(s))$.

abc +- (b)bc
    +- (c)bc
    +- (d)bc
     ...
    +- (z)bc
    +- a(a)c
    +- a(c)c
    +- a(d)c
     ...
    +- a(z)c
    +- ab(a)
    +- ab(b)
    +- ab(d)
     ...
    +- ab(z)

This is a very important observation and it leads to a huge improvement, and I finally get a green light from LeetCode after that.

Queue<String> currentQueue = forwardQueue;
Queue<String> targetQueue = backwardQueue;
while(currentQueue.size() != 0) {
    String currentWord = currentQueue.poll();
    if (currentWord.length() > 0) {
        char[] currWordCharArr = currentWord.toCharArray();
        for (int i = 0; i < currentWord.length(); i++) {
            char currOriginChar = currWordCharArr[i]; 
            for (char c = 'a'; c <= 'z'; c++) {
                if (c != currOriginChar) {
                    currWordCharArr[i] = c;
                    String s = new String(currWordCharArr);
                    if (targetQueue.contains(s)) {
                        return depth + 1;
                    }
                    if (wordList.contains(s)) {
                        wordList.remove(s);
                        currentQueue.add(s);
                    }
                }
            }
            currWordCharArr[i] = currOriginChar; 
        }
    } else if (forwardQueue.size() != 0) {
        depth++;
        currentQueue.add("");
        Queue<String> tmpQueue = currentQueue;
        currentQueue = targetQueue;
        targetQueue = tmpQueue;
    }
}

2-way BFS are used and with limited number of nodes to be traversed, we derived the possible-next-level nodes and check if it is a valid node instead of finding the next-level nodes from the list of nodes.

The above code are accepted by LeetCode and it run with 113 ms. It beats only 47.97% of the Java submission. And of course, we know we can do better!

Further optimization

We final improvement to push the running time down to 32 ms which beats 93.11% Java submission by the following improvements:

Use HashMap instead of Queue for storing the next level nodes.
Do not switch side after each iteration, instead, we choose the side with minimum number of node, so that we reduced the number of nodes traversed as much as possible

And here come my final Submission…

Final Submission

public class Solution {
    
    public int ladderLength(String beginWord, String endWord, Set<String> wordList) {
        
        int depth = 1;
        
        Set<String> wordHashList = new HashSet<String>();
        wordHashList.addAll(wordList);
        
        Set<String> forward = new HashSet<String>();
        forward.add(beginWord);
        
        Set<String> backward = new HashSet<String>();
        backward.add(endWord);
        
        Set<String> currentSet = forward;
        Set<String> targetSet = backward;
        while(currentSet.size() != 0) {
            Set<String> nextLevelSet = new HashSet<String>();
            for (String currentWord : currentSet) {
                char[] currWordCharArr = currentWord.toCharArray();
                for (int i = 0; i < currentWord.length(); i++) {
                    char currOriginChar = currWordCharArr[i]; 
                    for (char c = 'a'; c <= 'z'; c++) {
                        if (c != currOriginChar) {
                            currWordCharArr[i] = c;
                            String s = new String(currWordCharArr);
                            if (targetSet.contains(s)) {
                                return depth + 1;
                            }
                            if (wordHashList.contains(s)) {
                                nextLevelSet.add(s);
                                wordHashList.remove(s);
                            }
                        }
                    }
                    currWordCharArr[i] = currOriginChar; 
                }
            }
            depth++;
            if (nextLevelSet.size() > targetSet.size()) {
                currentSet = targetSet;
                targetSet = nextLevelSet;
            } else {
                currentSet = nextLevelSet;
            }
        }
        
        return 0;
        
    }
}

Thanks for reading, please let me know if I can further improve the running time in the comment section! I am always looking for improvement.