Intro to Graphs with C++

Crack FAANG
Jan 28, 2021 · 30 min read

There are two popular approaches to graph traversal:
* Depth-first search (DFS)
* Breadth-first search (BFS)

Example Implementations of BFS and DFS

DFS Algorithmic Steps

Step 1: Push the root node onto the stack.
Step 2: Loop until the stack is empty.
Step 3: Peek at the node on top of the stack.
Step 4: If the node has unvisited child nodes, get one unvisited child node, mark it as visited, and push it onto the stack.
Step 5: If the node does not have any unvisited child nodes, pop the node off the stack.

Based upon the above steps, the following Java code shows the implementation of the DFS algorithm:

public void dfs() {
    // DFS uses a Stack data structure
    Stack<Node> s = new Stack<>();
    s.push(this.rootNode);
    rootNode.visited = true;
    printNode(rootNode);
    while (!s.isEmpty()) {
        Node n = s.peek();
        // getUnvisitedChildNode() scans the neighbors of n and returns one whose visited flag is false (or null if none)
        Node child = getUnvisitedChildNode(n);
        if (child != null) {
            child.visited = true;
            printNode(child); // print (or otherwise process) the node
            s.push(child);
        } else {
            s.pop();
        }
    }
    // clear the visited flags so the graph can be traversed again
    clearNodes();
}

BFS Algorithmic Steps

Step 1: Push the root node into the queue.
Step 2: Loop until the queue is empty.
Step 3: Remove the node at the front of the queue.
Step 4: If the removed node has unvisited child nodes, mark them as visited and insert them into the queue.

Based upon the above steps, the following Java code shows the implementation of the BFS algorithm:

public void bfs() {
    // BFS uses a Queue data structure
    Queue<Node> q = new LinkedList<>();
    q.add(this.rootNode);
    printNode(this.rootNode);
    rootNode.visited = true;
    while (!q.isEmpty()) {
        Node n = q.remove();
        Node child;
        // enqueue every unvisited neighbor of n
        while ((child = getUnvisitedChildNode(n)) != null) {
            child.visited = true;
            printNode(child);
            q.add(child);
        }
    }
    // clear the visited flags so the graph can be traversed again
    clearNodes();
}

Breadth First Search

Breadth First Search (BFS) is an algorithm for traversing or searching layerwise in tree or graph data structures. BFS was first invented in 1945 by Konrad Zuse, though his work was not published until 1972. It was reinvented in 1959 by Edward F. Moore for finding the shortest path out of a maze, and further developed by C. Y. Lee into a wire routing algorithm (published in 1961). The strategy used here is the opposite of depth first search (DFS), which explores nodes as far as possible (depth-wise) before being forced to backtrack and explore other nodes. The algorithm starts at the tree root (or any arbitrary node of a graph, called the ‘source node’) and investigates all of the neighboring nodes (directly connected to the source node) at the present level before moving on to the nodes at the next level. The process is repeated until the desired result is obtained.

Application

Most of the concepts in computer science and the real world can be visualized and represented in terms of the graph data structure. BFS is one useful algorithm for solving such problems easily. The architecture of BFS is simple, accurate, and robust, and as long as visited nodes are marked, the algorithm is guaranteed not to get caught in an infinite loop.

  • Shortest Path: In an unweighted graph, the shortest path is the path with the fewest edges. BFS always reaches a node from the given source along such a shortest path; Dijkstra’s algorithm generalizes this idea to weighted graphs. A sketch of BFS-based shortest distances follows this list.
  • GPS Navigation Systems: BFS is used to find the neighboring locations from a given source location.
  • Finding Path: We can use BFS to find whether a path exists between two nodes.
  • Finding nodes within a connected component: BFS can be used to find all nodes reachable from a given node.
  • Social Networking Websites: We can find the number of people within a given distance ‘k’ from a person using BFS.
  • P2P Networks: In P2P (peer-to-peer) networks like BitTorrent, BFS is used to find all neighbor nodes of a given node.
  • Search Engine Crawlers: The main idea behind crawlers is to start from a source page, follow all links from that source to other pages, and keep repeating the same. DFS can also be used here, but breadth-first traversal has the advantage of limiting the depth or levels traversed.
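
To make the unweighted shortest-path idea above concrete, here is a minimal sketch of a BFS that records the distance (in edges) from a source vertex to every reachable vertex. The function name bfsDistances and the small sample graph are illustrative assumptions, not code taken from the implementations later in this article.

#include <bits/stdc++.h>
using namespace std;

// Illustrative sketch: graph as an adjacency list, vertices numbered from 0.
// Returns the minimum number of edges from src to every vertex (-1 if unreachable).
vector<int> bfsDistances(const vector<vector<int>>& adj, int src) {
    vector<int> dist(adj.size(), -1);
    queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        for (int u : adj[v]) {
            if (dist[u] == -1) {        // not visited yet
                dist[u] = dist[v] + 1;  // one more edge than its parent
                q.push(u);
            }
        }
    }
    return dist;
}

int main() {
    // Small undirected example: edges 0-1, 0-2, 1-3
    vector<vector<int>> adj = {{1, 2}, {0, 3}, {0}, {1}};
    for (int d : bfsDistances(adj, 0)) cout << d << " "; // prints 0 1 1 2
}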

Breadth First Search Algorithm

  • BFS is a traversal algorithm: we start from a selected source node and traverse the graph layerwise by exploring the neighboring nodes.
  • The data structure used in BFS is a queue. The algorithm makes sure that every node is visited at most once.
  • BFS follows these 4 steps:
  1. Decide on the key (element) to be searched. The search begins at the root (source) node.
  2. Visit an adjacent unvisited vertex. Mark it as visited. Display it (if needed). If this is the required key, stop. Otherwise, add it to the queue.
  3. If no adjacent unvisited vertex is found, remove the first vertex from the queue.
  4. Repeat steps 2 and 3 until the queue is empty.
  • The above algorithm is a search algorithm that identifies whether a node exists in the graph. We can convert it into a traversal algorithm that finds all the nodes reachable from a given node.

Note:

  • If the nodes are not marked as visited, then we might visit the same node more than once and we will possibly end up in an infinite loop.

BFS Algorithm Example

Consider the following graph structure where S is the Source node to begin BFS with:

The goal here is to find whether the node E is present in the graph. Just by looking at the graph, we can say that node E is not present. Let’s see how BFS works to identify this.

Step 1: We enqueue S to the QUEUE.

Step 2: Mark S as Visited.

Step 3: Now, call the BFS function with S in the queue.

  • Dequeue S and compare the dequeued node with the key E. It doesn’t match, so proceed by looking at the unexplored neighbors of S. There are three: A, B, and C.
  • We start with A: mark it as visited and enqueue it.
  • After this, two neighboring nodes remain, B and C. We next visit B: insert it into the queue and mark it as visited.
  • The same procedure is followed for node C, and we insert it into the queue.

Step 4: Dequeue A and check whether A matches the key. It doesn’t match, so proceed by enqueueing all unvisited neighbors of A (here, D is the only unvisited neighbor of A).

Step 5: Dequeue B and check whether B matches the key E. It doesn’t match. So, proceed by enqueueing all unvisited neighbors of B. Here all neighboring nodes of B have already been marked visited, so no nodes are enqueued.

Step 6: Dequeue C and check whether C matches the key E. It doesn’t match. Enqueue all unvisited neighbors of C. Here again all neighboring nodes of C have already been marked visited, so no nodes are enqueued.

Step 7: Dequeue D and check whether D matches the key E. It doesn’t match. So, enqueue all unvisited neighbors of D. Again all neighboring nodes of D have already been marked visited, so no nodes are enqueued.

Step 8: The queue is now empty and there are no unvisited nodes left, so we can safely say that the search key is not present in the graph. Hence we return false or “Not Found” accordingly.

This is how a breadth-first search works, by traversing the nodes levelwise. We stop BFS and return “Found” when we find the required node (key). We return “Not Found” when we have not found the key despite exploring all the nodes.

Pseudocode of Breadth First Search

Recursive BFS

/**
 * Pseudocode for recursive BFS
 * @Parameters: Graph graph represented as an adjacency list,
 *              Queue q, boolean[] visited, key
 * Initially q holds only the source node s.
 */
recursiveBFS(Graph graph, Queue q, boolean[] visited, int key){
    if (q.isEmpty())
        return "Not Found";
    // remove the front node from the queue
    int v = q.poll();
    if (v == key) return "Found";
    // do this for every neighbor of node v
    for (Node u in graph.get(v))
    {
        if (!visited[u])
        {
            // mark it visited and push it into the queue
            visited[u] = true;
            q.add(u);
        }
    }
    // recurse on the remaining nodes in the queue and propagate the result
    return recursiveBFS(graph, q, visited, key);
}

// Driver: seed the queue with the source node s before the first call
Queue q = new Queue();
visited[s] = true;
q.add(s);
recursiveBFS(graph, q, visited, key);

Iterative BFS

/**
 * Pseudocode for iterative BFS
 * @Parameters: Graph graph, source node s, boolean[] visited, key
 */
iterativeBFS(Graph graph, int s, boolean[] visited, int key){
    // create the queue needed for BFS
    Queue q = new Queue();
    // mark the source node as discovered
    visited[s] = true;
    // push the source node into the queue
    q.add(s);
    // loop while the queue isn't empty
    while (!q.isEmpty())
    {
        // remove the front node from the queue
        v = q.poll();
        if (v == key) return "Found";
        // for every neighboring node of v
        for (int u : graph.get(v)) {
            if (!visited[u]) {
                // mark it visited and enqueue it
                visited[u] = true;
                q.add(u);
            }
        }
    }
    // the key hasn't been found
    return "Not Found";
}

Implementation:

BFS Implementation in C++
#include <bits/stdc++.h>
using namespace std;

#define MAX 1005

vector<int> adj[MAX]; // adjacency list: adj[i] holds every vertex that shares an edge with vertex i
bool visited[MAX];    // visited[i] is true once vertex i has been taken out of the queue and processed

void init() { // initialization function
    for (int i = 0; i < MAX; i++) {
        visited[i] = false; // mark every vertex as unvisited
        adj[i].clear();     // clear all edges
    }
}

void BFS(int start) {
    queue<int> q;
    q.push(start);

    int current_node, next_node;
    cout << "BFS Traversal is:\n";
    while (!q.empty()) {
        current_node = q.front();
        q.pop();

        if (visited[current_node]) {
            continue; // a node can be enqueued more than once; process it only the first time
        }

        cout << current_node << " ";
        visited[current_node] = true;

        for (size_t i = 0; i < adj[current_node].size(); i++) {
            next_node = adj[current_node][i];
            if (!visited[next_node]) {
                q.push(next_node);
            }
        }
    }
}

int main() {
    int vertices, edges;
    cout << "Enter Number of Vertices:\n";
    cin >> vertices;
    cout << "Enter number of edges:\n";
    cin >> edges;

    init(); // reset the graph before reading the edges (calling it inside BFS would wipe the adjacency lists)

    int source, destination;
    cout << "Enter Edges as (source) <space> (destination):\n";
    for (int i = 0; i < edges; i++) {
        cin >> source >> destination;
        if (source > vertices || destination > vertices) {
            cout << "Invalid Edge\n";
            i--;
            continue;
        }
        adj[source].push_back(destination);
        adj[destination].push_back(source);
    }

    int start;
    cout << "Enter Starting Vertex:";
    cin >> start;

    BFS(start);
}

Complexity Analysis of Breadth First Search

Time Complexity

  • If the entire graph is a tree (so E = V − 1), the time complexity of BFS is O(V), where V is the number of nodes.

If the graph is represented as adjacency list:

  • Here, each node maintains a list of all its adjacent edges. Let’s assume the graph has V nodes and E edges.
  • For each node, we discover all its neighbors by traversing its adjacency list just once, in linear time.
  • For a directed graph, the sum of the sizes of the adjacency lists of all the nodes is E. So, the time complexity in this case is O(V) + O(E) = O(V + E).
  • For an undirected graph, each edge appears twice: once in the adjacency list of each of its endpoints. The time complexity for this case will be O(V) + O(2E) ≈ O(V + E).

If the graph is represented as an adjacency matrix (a V x V array):

  • For each node, we will have to traverse an entire row of length V in the matrix to discover all its outgoing edges.
  • Note that each row in an adjacency matrix corresponds to a node in the graph, and that row stores information about edges emerging from the node. Hence, the time complexity of BFS in this case is O(V * V) = O(V²).
  • The time complexity of BFS actually depends on the data structure being used to represent the graph.

Space Complexity

  • Since we are maintaining a queue (FIFO) to keep track of the nodes to be visited, in the worst case the queue could hold up to all of the nodes (vertices) in the graph. Hence, the space complexity is O(V).

FAQs

Why do we prefer queues instead of other data structures while implementing BFS?

  • BFS searches for nodes levelwise, i.e., it searches the nodes with respect to their distance from the root (or source).
  • From the above example, we can see that BFS requires us to visit the child nodes in the order in which their parents were discovered.
  • Whenever we visit a node, we insert all of its neighboring nodes into our data structure. If a queue is used, it guarantees that we get the nodes in the order their parents were discovered, because a queue follows the FIFO (first in, first out) flow.

Why is time complexity more in the case of graph being represented as Adjacency Matrix?

  • Every time we want to find which edges are adjacent to a given node ‘U’, we have to traverse the whole row AdjacencyMatrix[U], which has length |V|.
  • During BFS, you take a starting node S, which is at level 0. All its adjacent nodes are at level 1. Then, we mark all the adjacent nodes of the vertices at level 1 that don’t yet have a level as level 2. So, every vertex belongs to exactly one level, and when an element is in a level, we check its adjacent nodes once, which takes O(V) time. Since the levels cover all V elements over the course of BFS, the total time is O(V * V), which is O(V²).
  • In short, in the case of an adjacency matrix, finding which nodes are adjacent to a given vertex takes O(V) time, irrespective of the number of edges.
  • Whereas, when an adjacency list is used, that information is immediately available, and it takes time proportional only to the number of adjacent nodes, which summed over all V nodes is E. So, BFS using an adjacency list gives O(V + E).

0–1 BFS

  • This type of BFS is used to find the shortest distance or path from a source node to a destination node in a graph whose edge weights are 0 or 1.
  • When edge weights are 0 or 1, the normal BFS technique gives erroneous results, because it assumes that all edges have equal weight.
  • In this technique, we check for the optimal-distance condition instead of using a boolean array to mark visited nodes.
  • We use a double-ended queue to store the nodes. While performing BFS, if we encounter an edge of weight 0, the node is pushed to the front of the double-ended queue; if the edge has weight 1, the node is pushed to the back of the double-ended queue.
  • This approach is similar to Dijkstra’s algorithm: a node is pushed into the queue only if its shortest distance has been improved (relaxed) via the previous node.

Pseudo Code:
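
The idea can be sketched in C++ as follows. This is a minimal illustration assuming the graph is given as an adjacency list of (neighbor, weight) pairs with weights 0 or 1; the function name zeroOneBFS and the INF sentinel are illustrative choices rather than code from the original post.

#include <bits/stdc++.h>
using namespace std;

// Illustrative sketch: shortest distances from src when every edge weight is 0 or 1.
// adj[v] is a list of (neighbor, weight) pairs.
vector<int> zeroOneBFS(const vector<vector<pair<int,int>>>& adj, int src) {
    const int INF = INT_MAX;
    vector<int> dist(adj.size(), INF);
    deque<int> dq;
    dist[src] = 0;
    dq.push_front(src);
    while (!dq.empty()) {
        int v = dq.front();
        dq.pop_front();
        for (auto [u, w] : adj[v]) {
            // relax the edge: only push when the distance improves
            if (dist[v] + w < dist[u]) {
                dist[u] = dist[v] + w;
                if (w == 0) dq.push_front(u); // weight-0 edge: same level, push to the front
                else        dq.push_back(u);  // weight-1 edge: next level, push to the back
            }
        }
    }
    return dist;
}

Pushing weight-0 neighbors to the front and weight-1 neighbors to the back keeps the deque ordered by distance, which is what gives the linear bound discussed in the analysis below.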

Analysis:

  • The approach is quite similar to BFS and Dijkstra’s algorithm combined.
  • But the time complexity of this code is O(E + V), which is linear and more efficient than Dijkstra’s algorithm.
  • The analysis and proof of correctness for this algorithm are the same as those of normal BFS.

Why can’t we use normal queue in 0–1 BFS technique?

A normal queue lacks the operations needed to perform 0–1 BFS:

  • Removing the front element (to get the next vertex for BFS)
  • Inserting at the front of the queue
  • Inserting at the end of the queue

All of the above operations are supported by the double-ended queue (deque) data structure, which is why we use it.

What are the classifications of edges in a BFS graph?

  • For the given graph below, the general types of edges are as follows:

Tree Edge: An edge that is present in the tree obtained after applying DFS to the graph. All the green edges are tree edges.

Forward Edge: An edge (u, v) such that v is a descendant of u, but the edge is not part of the DFS tree. The edge from node 1 to node 6 is a forward edge.

Back edge:

  • An edge (u, v) such that v is an ancestor of u, but the edge is not part of the DFS or BFS tree. The edge from node 4 to node 1 is a back edge.
  • The presence of a back edge indicates a cycle in the directed graph.

Cross Edge: An edge that connects two nodes which have neither an ancestor nor a descendant relationship between them. The edge from node 3 to node 2 is a cross edge.

What are the types of edges present in BFS of a directed graph?

  • A BFS of a directed graph has only tree edges, cross edges, and back edges.
  • A BFS has NO forward edges.
  • For an edge A->B to be a forward edge, node B would have to be visited before the edge A-B is discovered, and this can happen only when B is visited via some other node using more than one edge.
  • Since BFS finds the shortest path from the source (i.e., uses the optimal number of edges), by the time node A is processed, the edge A-B has already been discovered and marked as a tree or cross edge. Hence, a forward edge is never possible in BFS.

Can BFS be used for finding shortest possible path?

  • Yes. BFS is commonly used to find the shortest possible path in an unweighted graph (the path with the fewest edges).

What is the difference between DFS and BFS? When is DFS and BFS used?

  • BFS is less space efficient than DFS: BFS maintains a queue that can hold an entire level of the graph, while DFS just maintains a few pointers at each level by using a simple stack.
  • If it is known in advance that an answer will likely be found deep in the tree, DFS is a better option than BFS.
  • BFS is useful when the depth of the tree can vary or when a single answer is needed, for instance the shortest path in a maze. BFS performs better here because DFS is likely to start out on a wrong path, exploring a large portion of the maze before reaching the goal.
  • If it is known that the solution is not far from the root of the tree, a breadth first search (BFS) might be better.
  • If the tree is very deep and solutions are rare, depth first search (DFS) might take an extremely long time, but BFS could be faster.
  • If the tree is very wide, a BFS might need too much memory, so it might be completely impractical. We go for DFS in such cases.
  • If solutions are frequent but located deep in the tree we opt for DFS.

Is BFS a complete algorithm?

  • A search algorithm is said to be complete if, whenever at least one solution exists, the algorithm is guaranteed to find a solution in a finite amount of time.
  • BFS is complete.

Is BFS an optimal algorithm?

  • A search algorithm is optimal if, when it finds a solution, it finds the best possible one.
  • BFS is optimal (for unweighted graphs it finds the answer reachable with the fewest edges), which is why it is used when a single answer is needed in an optimal manner.

Depth First Search

Depth First Search (commonly called DFS) was first studied in the 19th century by the French mathematician Charles Pierre Trémaux as a strategy for solving mazes. It is one of the most commonly used algorithms for traversing or searching tree or graph data structures, and it is based on the idea of backtracking. DFS is also known as Depth First Traversal when the algorithm is used on tree data structures (note: a tree is a special kind of graph with no cycles). The algorithm starts at a node of the graph and goes as far as possible along a given path, then backtracks until it finds an unexplored path, and then explores the rest of it. The process is repeated until all the nodes in the graph are explored.

Application

Most of the concepts in computer science can be visualized and represented in terms of graph data structure. DFS is one such useful algorithm for analysing these problems easily.

  • Solving maze-like puzzles with only one solution: DFS can be used to find all solutions to a maze problem by only considering nodes on the current path in the visited set.
  • Topological Sorting: This is mainly used for scheduling jobs based on the given dependencies among them. DFS is the preferred approach for solving the following types of problems using topological sort (see the sketch after this list):
  • instruction/job scheduling
  • ordering of formula cell evaluation when recomputing formula values in spreadsheets
  • determining the order of compilation tasks to perform in makefiles
  • data serialization
  • resolving symbol dependencies in linkers
  • Mapping Routes and Network Analysis
  • Path Finding: DFS is used for finding path between two given nodes — source and destination — in a graph.
  • Cycle detection in graphs
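
To make the topological-sorting application concrete, here is a minimal DFS-based sketch. It assumes the input is a DAG given as an adjacency list; the function names topoSort and dfsTopo and the sample job graph are illustrative choices, not part of the implementations shown later in this article.

#include <bits/stdc++.h>
using namespace std;

// Post-order DFS: a node is appended only after every node it points to is finished.
void dfsTopo(int v, const vector<vector<int>>& adj, vector<bool>& visited, vector<int>& order) {
    visited[v] = true;
    for (int u : adj[v])
        if (!visited[u])
            dfsTopo(u, adj, visited, order);
    order.push_back(v);
}

// Returns one valid topological order of a DAG (the reverse of the DFS finish order).
vector<int> topoSort(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<bool> visited(n, false);
    vector<int> order;
    for (int v = 0; v < n; v++)
        if (!visited[v])
            dfsTopo(v, adj, visited, order);
    reverse(order.begin(), order.end());
    return order;
}

int main() {
    // Edges: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 (job 0 before 1 and 2, both before 3)
    vector<vector<int>> adj = {{1, 2}, {3}, {3}, {}};
    for (int v : topoSort(adj)) cout << v << " "; // e.g. prints 0 2 1 3
}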

Depth First Search Algorithm

  • The DFS algorithm is a search algorithm that begins at the root node and goes down one branch at a time, all the way to a leaf, looking for a particular key. If the key is not found, the search retraces its steps back to the point where another branch was left unexplored, and the same procedure is repeated for that branch.
  • DFS follows the following 3 steps:
  1. Visit a node “S”.
  2. Mark “S” as visited.
  3. Recursively visit every unvisited node attached to “S”.
  • Since DFS is of recursive nature, this can be implemented using stacks. (Refer the FAQ section to understand why we use stacks)
  • Implementation:
  1. Push the source node to the stack.
  2. Maintain a data structure to mark if a node is visited or not, say,

boolean[] visited = new boolean[total_nodes_in_graph]

//Default value will be false for all the elements

  3. Mark the source node S as visited: visited[S] = True.
  4. While the stack is not empty, repeat steps 5–8 below.
  5. Pop a node, say A, from the stack.
  6. If A has already been visited and processed, go back to step 5; else go to step 7.
  7. Mark visited[A] = True.
  8. Check whether A is the node we need. If yes, stop the DFS. Else, push all the adjacent nodes of A which are not visited onto the stack.
  • In this way, we traverse a path until it is exhausted, meaning either there are no adjacent nodes or all the adjacent nodes have already been visited.

Note:

  • The “adjacent nodes” mean the “neighbors” of the selected node.
  • If the nodes are not marked as visited, then we might visit the same node more than once and we will possibly end up in an infinite loop.

DFS Algorithm Example

Consider the following graph structure where S is the Source node to begin DFS with:

The goal here is to find whether the node E is present in the graph. Just by looking at the graph, we can say that node E is not present. Let’s see how DFS works to identify this.

Step 1: We push S to the STACK

Step 2: Mark S as Visited.

Step 3: While the stack is not empty, repeat the below steps:

  • Pop the top element i.e., node S out of STACK ==> STACK = [ ]
  • Is the popped element equal to the node element we were looking for? If yes, return true. Here S != E. Hence, continue with the below steps.
  • Push all the adjacent nodes of S which are not visited yet into STACK.
  • We push nodes C, B, and A into STACK. STACK = [C, B, A]
  • Pop the top element i.e., node A out of STACK.
  • Is the popped element equal to the node element we were looking for? If yes, return true. Here A != E. Hence, we proceed with the below steps.
  • Mark A as visited. Stack now becomes STACK = [C, B]
  • Push all the adjacent nodes of A which are not visited yet into STACK.
  • We push node D into STACK, and the stack is now STACK = [C, B, D]
  • Pop the top element, i.e., node D, out of STACK and mark D as visited. After popping D, the stack is now: STACK = [C, B]
  • Is the popped element equal to the node element we were looking for? If yes, return true. Here D != E. Hence, we proceed with the below steps.
  • Push all the adjacent nodes of D which are not visited yet into STACK.
  • We push nodes C and B into STACK. STACK = [C, B, C, B]
  • Pop the top element i.e., node B out of STACK.
  • Is the popped element equal to the node element we were looking for? If yes, return true. Here B != E. Hence proceed with the below steps.
  • Mark B as visited. STACK = [C, B, C]
  • Push all the adjacent nodes of B which are not visited yet into STACK.
  • There are no adjacent nodes of B which are not visited. So stack will still be STACK = [C, B, C]
  • Pop the top element i.e., node C out of STACK.
  • Is the popped element equal to the node element we were looking for? If yes, return true. Here C != E. Hence proceed with the below steps.

  • Mark C as visited. Stack becomes STACK = [C, B]

  • Push all the adjacent nodes of C which are not visited yet into STACK.
  • There are no adjacent nodes of C which are not visited. Hence stack will remain: STACK = [C, B]
  • Now, we pop B from STACK and see that it was visited earlier. After popping B, stack is STACK = [C]
  • We continue with the next iteration and pop C from STACK and see that it was visited earlier. The stack now becomes empty: STACK = [ ]
  • STACK is now empty, so we stop DFS: all the nodes have been visited and we haven’t found any node matching E. Hence we return false.

This is how a depth-first search works, by traversing the nodes depth-wise. We stop DFS and return true when we find the required node (key). We return false when we have not found the key despite exploring all the nodes.

Pseudocode of Depth First Search

Pseudocode of recursive DFS

/**
 * Pseudocode for recursive DFS
 * @Parameters: adjacency list adjacent, source node,
 *              visited array, key (node to be searched)
 */
DFS(adjacent[][], source, visited[], key) {
    if (source == key) return true // we found the key
    visited[source] = True

    FOR node in adjacent[source]:
        IF visited[node] == False:
            IF DFS(adjacent, node, visited, key) == true:
                return true // the key was found deeper in this branch
            END IF
        END IF
    END FOR
    return false // if we reach here, all nodes reachable from source have been
                 // explored and we still haven't found the key
}

Pseudocode of iterative DFS

/**
 * Pseudocode for iterative DFS
 * @Parameters: Graph G, source node s, key
 */
DFS(G, s, key):
    stack = new Stack()
    stack.push(s)        // push s onto the stack
    mark s as visited
    while (stack is not empty):
        // pop a node from the stack and visit its children
        v = stack.pop()

        if (v == key) return true // we found the key

        // push all the unvisited neighbours of v onto the stack
        for all neighbours w of v in Graph G:
            if w is not visited:
                stack.push(w)
                mark w as visited
    return false // all nodes have been explored and we still haven't found the key
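
For reference, the iterative pseudocode above can be translated into C++ roughly as follows. This is a minimal sketch assuming the graph is stored as an adjacency list with vertices numbered from 0; the function name dfsFind and the sample graph are illustrative choices.

#include <bits/stdc++.h>
using namespace std;

// Illustrative sketch: iterative DFS that returns true if `key` is reachable from `s`.
bool dfsFind(const vector<vector<int>>& adj, int s, int key) {
    vector<bool> visited(adj.size(), false);
    stack<int> st;
    st.push(s);
    visited[s] = true;
    while (!st.empty()) {
        int v = st.top();
        st.pop();
        if (v == key) return true;   // found the key
        for (int w : adj[v]) {
            if (!visited[w]) {       // push unvisited neighbours
                visited[w] = true;
                st.push(w);
            }
        }
    }
    return false; // every reachable node was explored without finding the key
}

int main() {
    // Undirected example: edges 0-1, 0-2, 1-3; vertex 4 is isolated
    vector<vector<int>> adj = {{1, 2}, {0, 3}, {0}, {1}, {}};
    cout << boolalpha << dfsFind(adj, 0, 3) << "\n"; // true: 3 is reachable from 0
    cout << boolalpha << dfsFind(adj, 0, 4) << "\n"; // false: 4 is not reachable from 0
}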

Implementation Of Depth-first Search

Implementation of Depth-first Search in C++:
#include <iostream>
#include <vector>
using namespace std;

void dfs(int u, vector<vector<int>> &adj, vector<bool> &visited) {
    visited[u] = true;
    for (int v : adj[u])
        if (!visited[v])
            dfs(v, adj, visited);
    cout << "Done visiting node: " << u + 1 << endl;
}

int main() {
    int nodes, edges, u, v;
    cin >> nodes >> edges;
    vector<vector<int>> adj(nodes);
    vector<bool> visited(nodes);
    for (int i = 1; i <= edges; ++i) {
        cin >> u >> v;
        --u, --v; // there is an undirected edge between u and v (0-based)
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs(0, adj, visited); // assume the root is always node 0
    return 0;
}

Complexity Analysis of Depth First Search

Time Complexity

If the entire graph is a tree (so E = V − 1), the time complexity of DFS is O(V), where V is the number of nodes.

If the graph is represented as adjacency list:

  • Here, each node maintains a list of all its adjacent edges. Let’s assume the graph has V nodes and E edges.
  • For each node, we discover all its neighbors by traversing its adjacency list just once, in linear time.
  • For a directed graph, the sum of the sizes of the adjacency lists of all the nodes is E. So, the time complexity in this case is O(V) + O(E) = O(V + E).
  • For an undirected graph, each edge appears twice: once in the adjacency list of each of its endpoints. The time complexity for this case will be O(V) + O(2E) ≈ O(V + E).

If the graph is represented as an adjacency matrix (a V x V array):

  • For each node, we will have to traverse an entire row of length V in the matrix to discover all its outgoing edges.
  • Note that each row in an adjacency matrix corresponds to a node in the graph, and that row stores information about edges emerging from the node. Hence, the time complexity of DFS in this case is O(V * V) = O(V²).

The time complexity of DFS actually depends on the data structure being used to represent the graph.

Space Complexity

  • Since we are maintaining a stack to keep track of the last visited nodes, in the worst case the stack could hold up to all of the nodes (vertices) in the graph. Hence, the space complexity is O(V).

FAQs

Why can we not implement DFS using Queues? Why do we prefer stacks instead of other data structures?

In DFS, we always want to find the last node we visited to go in the other unvisited branches of that node in the most efficient way possible.

If we are using queue (First in First out) data structure:

  • To retrieve the last node visited, first, we will have to take all the elements out of the queue.
  • We will also need to store these elements except the last one so that we can push them back in the queue.
  • The last element taken out of the queue will be the last visited node.
  • We now have to push all the elements taken out back to queue.

The way a queue works, getting the last visited node takes O(n) operations, since it is the last element inserted into the queue.

Stacks, on the other hand, work in “last in, first out” fashion. A stack gives the last visited node in O(1) operations, since that node is the top element and we only have to pop it. Hence, we go for stacks.

What are the classifications of edges in DFS of a graph?

  • We have 4 kinds of edges in DFS of a graph.
  • Consider the directed graph shown in the diagram below; a DFS of this graph is 1 2 4 3 5 6. Applying DFS to the graph yields a tree, represented by the green edges. The types of edges are as follows:
  • Tree Edge: An edge that is present in the tree obtained after applying DFS to the graph. All the green edges are tree edges.
  • Forward Edge: An edge (u, v) such that v is a descendant of u, but the edge is not part of the DFS tree. The edge from node 1 to node 6 is a forward edge.
  • Back edge:
  • An edge (u, v) such that v is an ancestor of u, but the edge is not part of the DFS tree. The edge from node 4 to node 1 is a back edge.
  • The presence of a back edge indicates a cycle in the directed graph.
  • Cross Edge: An edge that connects two nodes which have neither an ancestor nor a descendant relationship between them. The edge from node 3 to node 2 is a cross edge.

Can DFS be used for finding shortest possible path?

No.

Why can we not use DFS for finding shortest possible path?

  • In the implementation of DFS, there is no deterministic rule about the order in which the next child/neighbor is explored.
  • Starting from a source vertex, DFS gives no guarantee about the order in which nodes are visited relative to their distance from the source; it cannot identify which node is closer to the source node.
  • DFS just visits the ‘deeper’ nodes in any order; they can even be farther from the source. In the worst case, it might go as far as possible from the source node and then return to unvisited adjacent nodes of visited nodes.
  • Due to this, DFS is not a reliable choice for finding the shortest path between nodes.

Is DFS a complete algorithm?

  • A search algorithm is said to be complete if, whenever at least one solution exists, the algorithm is guaranteed to find a solution in a finite amount of time.
  • DFS is complete if the search tree is finite, meaning that for a given finite search tree, DFS will come up with a solution if one exists.

Is DFS an optimal algorithm?

  • A search algorithm is optimal if, when it finds a solution, it finds the best possible one.
  • DFS is not optimal, meaning the number of steps taken to reach the solution, or the cost spent in reaching it, may be high.

When is it best to use DFS?

The usage of DFS heavily depends on the structure of the search tree/graph and the number and location of solutions needed.

  • If it is known that the solution is not far from the root of the tree, a breadth first search (BFS) might be better.
  • If the tree is very deep and solutions are rare, depth first search (DFS) might take an extremely long time, but BFS could be faster.
  • If the tree is very wide, a BFS might need too much memory, so it might be completely impractical. We go for DFS in such cases.
  • If solutions are frequent but located deep in the tree we opt for DFS.

Dijkstra Algorithm

The algorithm was developed by the Dutch computer scientist Edsger W. Dijkstra in 1956. It is used to find the shortest path from a node/vertex (the source node) to any (or every) other node/vertex (the destination nodes) in a graph. A graph is basically an interconnection of nodes connected by edges. This algorithm is sometimes referred to as the Single Source Shortest Path algorithm due to the nature of its implementation.

Working of Dijkstra Algorithm

Before diving into any algorithm, it is very useful to understand its real-world applications. Many of the problems we encounter in real-life scenarios deal with finding solutions to shortest-path problems. To explain in simple words: you want to travel from city A to city B, and both cities are connected by multiple routes. Which route do we generally prefer? There is no doubt that we would opt for the route that gets us to our destination with the minimum possible cost and time. Isn’t this relatable?
Following are the main applications of Dijkstra’s Algorithm:

  1. It is most widely used for finding the shortest possible distance and showing directions between two geographical locations, as in Google Maps.
  2. It is also widely used for routing data in networking and telecommunication domains to minimize transmission delay.
  3. Wherever you encounter the need for shortest-path solutions, be it in robotics, transportation, embedded systems, or fault detection in factories and production plants, this algorithm is used.

Working of Dijkstra’s Shortest Path First algorithm

In order to find the shortest path, Dijkstra’s algorithm assigns to each vertex a “cost” value representing what it takes to reach that vertex from the source vertex. The “cost” can be mapped to the distance, money, or time taken to reach the destination from the source.

Below are the steps to be followed for solving using Dijkstra’s algorithm:

  1. Convert the problem to its graph-equivalent representation.
  2. Maintain a list of unvisited vertices. Assign a vertex as the “source” and allocate a maximum possible cost (infinity) to every other vertex. The cost of the source to itself will be zero, as it takes nothing to go to itself.
  3. In every step, the algorithm tries to minimize the cost of each vertex.
  4. For every unvisited neighbor (V2, V3) of the current vertex (V1), calculate the new cost from V1.
  5. The new cost of V2 is calculated as:
     Minimum(existing cost of V2, cost of V1 + cost of the edge from V1 to V2)
  6. When all the neighbors of the current node have been considered and their costs calculated, mark the current node V1 as visited and remove it from the unvisited list.
  7. Select the next vertex with the smallest cost from the unvisited list and repeat from step 4.
  8. The algorithm finally ends when there are no unvisited nodes left.

Problem statement in shortest path

Consider the map below. The cities have been labelled with the letters A to F, and every edge has a cost associated with it.
We need to travel from Bengaluru to all the other places, and we have to identify the shortest, minimal-cost paths from Bengaluru to the other destinations.

Solution for shortest path algorithm

1. Convert problem to its graph equivalent.

Upon conversion, we get the representation below. Note that the graph is weighted and undirected. All the cities have been replaced by their corresponding letters, and each edge displays the cost of going from one node to the other.

2. Assign cost to vertices.

Assign a cost of 0 to the source vertex and ∞ (infinity) to all other vertices, as shown in the image below.
Maintain a list of unvisited vertices. Add all the vertices to the unvisited list.

3. Calculate minimum cost for neighbors of selected source.

For each neighbor A, C, and D of the selected source vertex (B), calculate the cost to reach it from B using the formula. Once this is done, mark the source vertex as visited (the vertex has been changed to blue to indicate this).

Minimum(current cost of neighbor vertex, cost(B) + edge_value(neighbor, B))
  1. For neighbor A: cost = Minimum(∞, 0+3) = 3
  2. For neighbor C: cost = Minimum(∞, 0+1) = 1
  3. For neighbor D: cost = Minimum(∞, 0+6) = 6

4. Select next vertex with smallest cost from the unvisited list.

Choose the unvisited vertex with minimum cost (here, it would be C), consider all of its unvisited neighbors (A, E, and D), and calculate the minimum cost for each of them.

Once this is done, mark C as visited.

Minimum(current cost of neighbor vertex, cost(C)+edge_value(neighbor,C))
  1. For neighbor A: cost = Minimum(3 , 1+2) = 3
  2. For neighbor E: cost = Minimum(∞, 1+4) = 5
  3. For neighbor D: cost = Minimum(6 , 1+4) = 5

Observe that the cost value of node D is updated by the new minimum cost calculated.

5. Repeat step 4 for all the remaining unvisited nodes.

Repeat step 4 until there are no unvisited nodes left. At the end of the execution, we will know the shortest paths from the source vertex B to all the other vertices. The final state of the graph would be like below.

Pseudo Code of Dijkstra algorithm

Dijkstra_Algorithm(source, G):
    """
    parameters: source node --> source, graph --> G
    return: list of costs from source to all other nodes --> cost
    """
    unvisited_list = []   // list of unvisited vertices
    cost = []
    cost[source] = 0      // distance (cost) from source to itself is 0
    for each vertex v in G:              // assign cost INFINITY to every other vertex
        if v ≠ source:
            cost[v] = INFINITY
        add v to unvisited_list          // all nodes are pushed to unvisited_list initially
    while unvisited_list is not empty:   // main loop
        v = vertex in unvisited_list with min cost[v]   // v is the source node in the first iteration
        remove v from unvisited_list                    // mark the node as visited
        for each neighbor u of v:                       // assign a shorter path cost to neighbor u
            cost_value = Min(cost[u], cost[v] + edge_cost(v, u))
            cost[u] = cost_value                        // update the cost of vertex u
    return cost

Complexity Analysis of Dijkstra algorithm

Consider a graph with |V| vertices. Each vertex can then have at most |V| - 1 edges.

  1. The main outer loop runs |V| times.
  2. The inner loop, where the actual cost calculation happens, runs |V| - 1 times per iteration for a complete graph, since each vertex has |V| - 1 edges.
  3. Also, in each iteration we perform an extractMin operation (to pick the vertex with minimum cost) and decreaseKey operations (to update the costs of its neighbors).

Hence the total running time has an upper bound of O(|V| * (|V| - 1)), which is equivalent to O(|V|²).

Note

  1. We can further reduce the time complexity of this algorithm by using a binary heap as the data structure for the priority queue implementation instead of a list.
  2. The priority queue implementation lets us efficiently find the node with minimum cost and then update the cost value associated with that node.
  3. With this, the time complexity becomes O((E + V) log V), which is O(E log V) for a connected graph, where E is the number of edges and V is the number of vertices. A sketch of this heap-based variant is shown below.
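
As a concrete illustration of the heap-based variant, here is a minimal C++ sketch that uses std::priority_queue as a min-heap. The adjacency-list layout of (neighbor, weight) pairs, the function name dijkstra, and the sample graph are illustrative assumptions rather than code from the original article.

#include <bits/stdc++.h>
using namespace std;

// Illustrative sketch: shortest distances from src in a graph with non-negative edge weights.
// adj[v] is a list of (neighbor, weight) pairs.
vector<long long> dijkstra(const vector<vector<pair<int,int>>>& adj, int src) {
    const long long INF = numeric_limits<long long>::max();
    vector<long long> cost(adj.size(), INF);
    // min-heap of (cost so far, vertex)
    priority_queue<pair<long long,int>, vector<pair<long long,int>>, greater<>> pq;
    cost[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, v] = pq.top();
        pq.pop();
        if (d > cost[v]) continue; // stale heap entry: a shorter path to v was already found
        for (auto [u, w] : adj[v]) {
            if (cost[v] + w < cost[u]) {   // relax the edge (v, u)
                cost[u] = cost[v] + w;
                pq.push({cost[u], u});
            }
        }
    }
    return cost;
}

int main() {
    // Small undirected example: edge (0,1) cost 3, (0,2) cost 1, (2,3) cost 4
    vector<vector<pair<int,int>>> adj(4);
    auto addEdge = [&](int a, int b, int w) {
        adj[a].push_back({b, w});
        adj[b].push_back({a, w});
    };
    addEdge(0, 1, 3); addEdge(0, 2, 1); addEdge(2, 3, 4);
    for (long long c : dijkstra(adj, 0)) cout << c << " "; // prints 0 3 1 5
}

Each vertex is pushed whenever its cost improves, and every push or pop costs O(log V), which is what gives the O((E + V) log V) bound mentioned in the note above.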

Proof of Correctness

How can we be sure that Dijkstra’s algorithm provides us the shortest possible path between two nodes? Let’s go through the following explanation to understand whether this algorithm always gives us the shortest possible path.
Consider the following notations:

D(s,u) = the minimum distance as calculated by Dijkstra's algorithm between nodes s and u
d(s,u) = the actual minimum distance between nodes s and u

According to Dijkstra’s Algorithm, D(s,u) = d(s,u)

Proof:

Let us start by assuming that Dijkstra’s Algorithm is incorrect.

  • This means there is at least one vertex u for which, at the moment u is included in the Visited List, D(s,u) > d(s,u).

Let x be the first of these vertices that was pushed into the Visited List. Then,

  • When node x is included in the Visited List: D(s,x) > d(s,x)
  • This implies that for every vertex z included in the Visited List before x, D(s,z) = d(s,z)
  • Let’s look at the moment when this node x is included into the Visited List.
  • Let P be the (real) shortest path from s to x
  • Let z be the first node that is not in the “Visited List” and is along the shortest path.
  • Let y be the predecessor node of z on the shortest path P
  • Have a look at the diagram below for better understanding:
  • We can conclude the following:
  • D(s,y) = d(s,y) … (1) (This means the minimum distance s to y computed by the algorithm = actual min. distance s to y as y is included before x)
  • D(s,z) = D(s,y) + edge_cost(y,z) = d(s,y) + edge_cost(y,z) … (2)
  • D(s,x) ≤ D(s,z) …(3) (Because the next vertex included by the algorithm is x)

We can now finally deduce that:

  • D(s,x) ≤ D(s,z) … from (3)
         = d(s,y) + edge_cost(y,z) … from (2)
         ≤ d(s,y) + edge_cost(y,z) + d(z,x)
         = d(s,x)

The above result contradicts our initial assumption that Dijkstra’s algorithm is incorrect.

Hence, by proof by contradiction, we can say that Dijkstra’s algorithm always gives us the shortest possible path between two nodes, that is: D(s,x) = d(s,x).

Additional Information

  1. The example demo was done for an undirected graph; however, Dijkstra’s Algorithm can be used for directed graphs as well.
  2. The pseudo code finds the shortest path from the source to all other nodes in the graph. If we want the path from the source to a specific destination only, we can break out of the loop once the target has been visited and its minimum cost finalized.
  3. The shortest path might not pass through all the vertices. Also, there can be more than one shortest path between two nodes.
  4. Dijkstra’s Algorithm doesn’t work for graphs with negative edge weights. Algorithms like the Bellman-Ford algorithm are used for such cases.
