What THEY Do Not Teach YOU In Data Science! - Data Structures Problem #3 By Anish Arya (Machine Learning and Reinforcement Learning Consultant)

Hello Aspiring Real Data and Decision Scientists,

In case you missed, the introduction to the series of problems, please go through the following discussion:
.
.

What THEY Do Not Teach YOU In Data Science! - By Anish Arya (Machine Learning and Reinforcement Learning Consultant)

.
.
PROBLEM #3 STATEMENT: Problem Level: Hard

The following is a Graph problem which will test the knowledge of Hash tables and node traversing techniques.

The CEO of a leading airline company wants to know if his flights are servicing all major cities /airports such that if a passenger wants to travel from a given starting city, they can reach any of the other cities using either a direct flight (0 stops) or an indirect flight (multiple stops at intermediate cities).

If there are any cities that are not reachable, the CEO wants to know the minimum number of flights (oneway flights) that need to be added so that the passenger can reach any destination city from the starting city using either direct or indirect flights.

You are given a list of city airports (three-letter codes like “JFK”), a list of routes (one-way flights from one airport to another like [“JFK” , “SFO”] ), and a starting city / airport.

Note that routes only allow you to fly in one direction; for instance, the route [“JFK”, “SFO”] only allows you to fly from “JFK” to “SFO” and not from SFO to JFK

Also note that the connections don’t have to be direct (take off at the starting airport and land in the destination airport).

It is okay if the destination airport can be reached from the starting airport by stopping at other intermediary airports in between.

Sample Input:

airports = [ “BGI”, “CDG”, “DEL”, “DOH”, “DSM”, “EWR”, “EYW”, “HND”, “ICN”, “JFK”, “LGA”, “LHR”, “ORD”, “SAN”, “SFO”, “SIN”, “TLV”, “BUD” ]

routes = [ [“DSM”, “ORD”], [“ORD”, “BGI”], [“BGI”, “LGA”], [“SIN”, “CDG”], [“CDG”, “SIN”], [“CDG”, “BUD”], [“DEL”, “DOH”], [“DEL”, “CDG”], [“TLV”, “DEL”], [“EWR”, “HND”], [“HND”, “ICN”], [“HND”, “JFK”], [“ICN”, “JFK”], [“JFK”, “LGA”], [“EYW”, “LHR”], [“LHR”, “SFO”], [“SFO”, “SAN”], [“SFO”, “DSM”], [“SAN”, “EYW”] ]

Starting Airport = LGA

Sample Output:

The minimum flights that need to be added = 3

The flights that need to be added are: [LGA, TLV][LGA, SFO][LGA, EWR]

Will post the answer by eod.

Programming Language Used: Python

Reference Code Notations/Short Forms Used In Program:

v = Number of Vertices or Nodes or Airports

e = Number of Edges or Routes

apt = Airport

apts = List of Airports

rt = Route

rts = List of Routes

connects = Connections or Routes

connect = Connection or Route

COMPLEXITIES:

Overall Time Complexity:

O (v * (v + e) + v + e + (v * log(v)))

Overall Space Complexity:

O (v + e)

Method 1:

``````#airportConnections.... Primary method

def aptConnects(apts, rts, startingApt):

""" This method will return the minimum number of flights (one-way flights) that need to be added so that the passenger can reach any destination city from the

starting city using either direct or indirect flights."""

# If airports list is empty, then return "No Input Airport List"

if len(apts) == 0:

return "No Input Airport List"

# If starting airport is not provided, then return "No Starting Airport Provided."

if startingApt is None:

return "No Starting Airport Provided"

# Firstly, we gonna create a graph from the airports and routes list. And for this, we gonna use a helper method

aptGraph = createAptGraph(apts, rts)

# Secondly, we'll gather all of the unreachable airports, then assign the score value(= the number of unreachable airports that we can reach from this unreachable airport)

# In this helper method, we gonna pass our airport graph to fetch the unreachable nodes, the list of airports, and the starting airport

unreachableAptNodes = fetchUnreachableAptNodes(aptGraph, apts, startingApt) # Fetching all of the unreachable airports

scoreUnreachableNodesFromUnreachableNodes(aptGraph, unreachableAptNodes)

#Finally, return the minimum number of flights/connections  that need to be added

return fetchMinNumberOfNewConnects(aptGraph, unreachableAptNodes, startingApt)
``````

Method 2:

``````# Time Complexity = O(vlog(v) + (v + e)) ||| Space Complexity = O(1) --> Constant Space

# Where v is the number of vertices/nodes/airports and e is the number of edges/routes

def fetchMinNumberOfNewConnects(aptGraph, unreachableAptNodes, startingApt):

# Sorting (in decreasing order) the unreachable nodes by their score and node/airport name (# of unreachable connections that they have)

unreachableAptNodes.sort(key = lambda apt: (len(apt.unreachableConnects), apt.apt), reverse = True)

numOfNewConnections = 0

reachNewApts = []

for aptNode in unreachableAptNodes:

if aptNode.isNodeReachable:

# this is because sometimes, we're gonna be getting unreachable connections from a given unreachable airport and mark them as handled/unlocked

continue

numOfNewConnections += 1

reachNewApts.append([startingApt, aptNode.apt])

for connection in aptNode.unreachableConnects:

aptGraph[connection].isNodeReachable = True

return (numOfNewConnections, reachNewApts)
``````

Method 3

``````# Time Complexity = O(v + e) ||| Space Complexity = O(v + e)

#Where v is the number of vertices/nodes/airport and e is the number of edges/routes

def createAptGraph(apts, rts):

"""We gonna create a graph from the airport(s) and route(s) list."""

aptGraph = {} # We gonna use it as a hash table, wherin every key would be an actual airport name

for apt in apts:

# We probably wanna know, if an airport is reachable, the unreachable connections that an apt have, and the actual connects that apt has

# For this, am gonna create a new data structure called AirportNode

aptGraph[apt] = AptNode(apt)

# Fill up the airport connections based on routes

for rt in rts:

# ["DSM", "ORD"]

apt, connection = rt

aptGraph[apt].connections.append(connection) # We gonna add the connection in the parent

return aptGraph
``````

Method 4:

``````# Time Complexity = O(v + e) ||| Space Complexity = O(v)

#Where v is the number of vertices/nodes/airports and e is the numberof edges/routes

def fetchUnreachableAptNodes(aptGraph, apts, startingApt):

"""We gonna keep a track of visited nodes using Depth First Search"""

visitedApts = {} # It's just gonna be a hash table E.g.; {"BGI": True, '...': True}. In this every value would be true because we are keeping the track of visited nodes only

depthFirstTraversalOfApts(aptGraph, startingApt, visitedApts)

unreachableAptNodes = []

for apt in apts:

if apt in visitedApts:

continue # so, if an apt is an already visited one, then skip

else:

aptNode = aptGraph[apt]

aptNode.isNodeReachable = False

unreachableAptNodes.append(aptNode)

return unreachableAptNodes
``````

Method 5

``````def depthFirstTraversalOfApts(aptGraph, apt, visitedApts):

"""DFS and mark existing nodes as 'visited'."""

if apt in visitedApts:

# if an airport/node is already visited then skip it

return

visitedApts[apt] = True # mark the apt as visited by assigning a bool True value

# For every connection, in whatever airport we are visiting, visit that connection and then visit that connection's connections, and ultimately traverse every node/airport under the passed (root) airport

connections = aptGraph[apt].connections

for connection in connections: # For every connection, perform DFS

depthFirstTraversalOfApts(aptGraph, connection, visitedApts)
``````

Method 6:

``````# Time Complexity = O(v * (v + e)) ||| Space Complexity = O(v)

#Where v is the number of vertices/nodes/airports and e is the numberof edges/routes

def scoreUnreachableNodesFromUnreachableNodes(aptGraph, unreachableAptNodes):

# For every unreachable node that we have, figure out how many unreachable nodes that we gonna have

for aptNode in unreachableAptNodes:

apt = aptNode.apt # fetching the name of the airport which was initialised while creatind the node

unreachableConnects = []

# Now, we again gonna perform DFS technique to add all the unreachable connections and score them

depthFirstAppendUnreachableConnctions(aptGraph, apt, unreachableConnects, {})

aptNode.unreachableConnects = unreachableConnects
``````

Method 7:

``````def depthFirstAppendUnreachableConnctions(aptGraph, apt, unreachableConnects, visitedSetOfApts):

if aptGraph[apt].isNodeReachable:

return # if the node is reachable then skip it, because here we are only dealing with unreachable nodes

if apt in visitedSetOfApts:

return # if the node is already visited then skip it

visitedSetOfApts[apt] = True # mark the apt as visited by assigning a bool True value

unreachableConnects.append(apt)

connections = aptGraph[apt].connections

for connection in connections:

depthFirstAppendUnreachableConnctions(aptGraph, connection, unreachableConnects, visitedSetOfApts)
``````

Custom Data Structure Used For A Node/Airport:

``````class AptNode:

def __init__(self,apt):

"""This is a custom data structure of a node/airport. We'll be storing the aiport/node name just in case if we need to access the name of the airport."""

self.apt = apt

self.connections = []

self.isNodeReachable = True

self.unreachableConnects = []

# *******************************************
``````

Test Case 1

``````import AirportConnectionsAssignment

import unittest

class TestPrgm(unittest.TestCase):

def test_case_1(self):

AIRPORTS = [ "BGI", "CDG", "DEL", "DOH", "DSM", "EWR", "EYW", "HND", "ICN", "JFK", "LGA", "LHR", "ORD", "SAN", "SFO", "SIN", "TLV", "BUD" ]

STARTING_AIRPORT = "LGA"

routes = [ ["DSM", "ORD"], ["ORD", "BGI"], ["BGI", "LGA"], ["SIN", "CDG"], ["CDG", "SIN"], ["CDG", "BUD"], ["DEL", "DOH"], ["DEL", "CDG"], ["TLV", "DEL"], ["EWR", "HND"], ["HND", "ICN"], ["HND", "JFK"], ["ICN", "JFK"], ["JFK", "LGA"], ["EYW", "LHR"], ["LHR", "SFO"], ["SFO", "SAN"], ["SFO", "DSM"], ["SAN", "EYW"] ]

aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

self.assertTrue(aptConnects == (3, [['LGA', 'SFO'], ['LGA', 'TLV'], ['LGA', 'EWR']]))
if __name__ == '__main__':
unittest.main()

``````

Test Case 2:

``````import AirportConnectionsAssignment

import unittest

class TestPrgm(unittest.TestCase):

def test_case_2(self):

AIRPORTS = ["1", "2", "3"]

STARTING_AIRPORT = "2"

routes = [ ["1", "2"], ["2", "3"]]

aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

self.assertTrue(aptConnects == (1, [['2', '1']]))

if __name__ == '__main__':
unittest.main()

``````

Test Case 3:

``````import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

def test_case_3(self):

AIRPORTS = ["1", "2", "3"]

STARTING_AIRPORT = "3"

routes = [ ["1", "2"], ["2", "3"]]

aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

self.assertFalse(aptConnects == (1, [['2', '1']]))

if __name__ == '__main__':
unittest.main()

``````

Test Case 4:

``````import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

def test_case_4(self):

AIRPORTS = ["1", "2", "3",  "4"]

STARTING_AIRPORT = "3"

routes = [ ["1", "2"], ["2", "3"]]

aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

self.assertTrue(aptConnects == (2, [['3', '1'], ['3', '4']]))

if __name__ == '__main__':
unittest.main()

``````

Test Case 5:

``````import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

def test_case_5(self):
AIRPORTS = []
STARTING_AIRPORT = "3"
routes = []
aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)
self.assertTrue(aptConnects == "No Input Airport List")

if __name__ == '__main__':
unittest.main()

``````

Test Case 6:

``````import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

def test_case_6(self):
AIRPORTS = ["1", "2", "ABC"]
STARTING_AIRPORT = None
routes = []
aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)
self.assertTrue(aptConnects == "No Starting Airport Provided")
if __name__ == '__main__':
unittest.main()

``````