What THEY Do Not Teach YOU In Data Science! - Data Structures Problem #3 By Anish Arya (Machine Learning and Reinforcement Learning Consultant)

Hello Aspiring Real Data and Decision Scientists,

In case you missed, the introduction to the series of problems, please go through the following discussion:
.
.

What THEY Do Not Teach YOU In Data Science! - By Anish Arya (Machine Learning and Reinforcement Learning Consultant)

.
.
PROBLEM #3 STATEMENT: Problem Level: Hard

The following is a Graph problem which will test the knowledge of Hash tables and node traversing techniques.

The CEO of a leading airline company wants to know if his flights are servicing all major cities /airports such that if a passenger wants to travel from a given starting city, they can reach any of the other cities using either a direct flight (0 stops) or an indirect flight (multiple stops at intermediate cities).

If there are any cities that are not reachable, the CEO wants to know the minimum number of flights (oneway flights) that need to be added so that the passenger can reach any destination city from the starting city using either direct or indirect flights.

You are given a list of city airports (three-letter codes like “JFK”), a list of routes (one-way flights from one airport to another like [“JFK” , “SFO”] ), and a starting city / airport.

Note that routes only allow you to fly in one direction; for instance, the route [“JFK”, “SFO”] only allows you to fly from “JFK” to “SFO” and not from SFO to JFK

Also note that the connections don’t have to be direct (take off at the starting airport and land in the destination airport).

It is okay if the destination airport can be reached from the starting airport by stopping at other intermediary airports in between.

Sample Input:

airports = [ “BGI”, “CDG”, “DEL”, “DOH”, “DSM”, “EWR”, “EYW”, “HND”, “ICN”, “JFK”, “LGA”, “LHR”, “ORD”, “SAN”, “SFO”, “SIN”, “TLV”, “BUD” ]

routes = [ [“DSM”, “ORD”], [“ORD”, “BGI”], [“BGI”, “LGA”], [“SIN”, “CDG”], [“CDG”, “SIN”], [“CDG”, “BUD”], [“DEL”, “DOH”], [“DEL”, “CDG”], [“TLV”, “DEL”], [“EWR”, “HND”], [“HND”, “ICN”], [“HND”, “JFK”], [“ICN”, “JFK”], [“JFK”, “LGA”], [“EYW”, “LHR”], [“LHR”, “SFO”], [“SFO”, “SAN”], [“SFO”, “DSM”], [“SAN”, “EYW”] ]

Starting Airport = LGA

Sample Output:

The minimum flights that need to be added = 3

The flights that need to be added are: [LGA, TLV][LGA, SFO][LGA, EWR]

Will post the answer by eod. :slight_smile:

Programming Language Used: Python

Reference Code Notations/Short Forms Used In Program:

v = Number of Vertices or Nodes or Airports

e = Number of Edges or Routes

apt = Airport

apts = List of Airports

rt = Route

rts = List of Routes

connects = Connections or Routes

connect = Connection or Route

COMPLEXITIES:

Overall Time Complexity:

O (v * (v + e) + v + e + (v * log(v)))

Overall Space Complexity:

O (v + e)

Method 1:

#airportConnections.... Primary method

def aptConnects(apts, rts, startingApt):

    """ This method will return the minimum number of flights (one-way flights) that need to be added so that the passenger can reach any destination city from the 

starting city using either direct or indirect flights."""

    

    # If airports list is empty, then return "No Input Airport List"

    if len(apts) == 0:

        return "No Input Airport List"

    # If starting airport is not provided, then return "No Starting Airport Provided."

    if startingApt is None:

        return "No Starting Airport Provided"

    # Firstly, we gonna create a graph from the airports and routes list. And for this, we gonna use a helper method

    aptGraph = createAptGraph(apts, rts)

    # Secondly, we'll gather all of the unreachable airports, then assign the score value(= the number of unreachable airports that we can reach from this unreachable airport)

    # In this helper method, we gonna pass our airport graph to fetch the unreachable nodes, the list of airports, and the starting airport

    unreachableAptNodes = fetchUnreachableAptNodes(aptGraph, apts, startingApt) # Fetching all of the unreachable airports

    scoreUnreachableNodesFromUnreachableNodes(aptGraph, unreachableAptNodes)

    #Finally, return the minimum number of flights/connections  that need to be added

    return fetchMinNumberOfNewConnects(aptGraph, unreachableAptNodes, startingApt)

Method 2:

# Time Complexity = O(vlog(v) + (v + e)) ||| Space Complexity = O(1) --> Constant Space

# Where v is the number of vertices/nodes/airports and e is the number of edges/routes

def fetchMinNumberOfNewConnects(aptGraph, unreachableAptNodes, startingApt):

    # Sorting (in decreasing order) the unreachable nodes by their score and node/airport name (# of unreachable connections that they have)

    unreachableAptNodes.sort(key = lambda apt: (len(apt.unreachableConnects), apt.apt), reverse = True)

    numOfNewConnections = 0

    reachNewApts = []

    for aptNode in unreachableAptNodes:

        if aptNode.isNodeReachable:

            # this is because sometimes, we're gonna be getting unreachable connections from a given unreachable airport and mark them as handled/unlocked

            continue

        numOfNewConnections += 1

        reachNewApts.append([startingApt, aptNode.apt])

        for connection in aptNode.unreachableConnects:

            aptGraph[connection].isNodeReachable = True

    return (numOfNewConnections, reachNewApts)

Method 3

# Time Complexity = O(v + e) ||| Space Complexity = O(v + e)

#Where v is the number of vertices/nodes/airport and e is the number of edges/routes

def createAptGraph(apts, rts):

    """We gonna create a graph from the airport(s) and route(s) list."""

    aptGraph = {} # We gonna use it as a hash table, wherin every key would be an actual airport name

    for apt in apts:

        # We probably wanna know, if an airport is reachable, the unreachable connections that an apt have, and the actual connects that apt has

        # For this, am gonna create a new data structure called AirportNode

        aptGraph[apt] = AptNode(apt)

    # Fill up the airport connections based on routes

    for rt in rts:

        # ["DSM", "ORD"]

        apt, connection = rt

        aptGraph[apt].connections.append(connection) # We gonna add the connection in the parent

    return aptGraph

Method 4:

# Time Complexity = O(v + e) ||| Space Complexity = O(v)

#Where v is the number of vertices/nodes/airports and e is the numberof edges/routes

def fetchUnreachableAptNodes(aptGraph, apts, startingApt):

    """We gonna keep a track of visited nodes using Depth First Search"""

    visitedApts = {} # It's just gonna be a hash table E.g.; {"BGI": True, '...': True}. In this every value would be true because we are keeping the track of visited nodes only

    depthFirstTraversalOfApts(aptGraph, startingApt, visitedApts)

    unreachableAptNodes = []

    for apt in apts:

        if apt in visitedApts:

            continue # so, if an apt is an already visited one, then skip

        else:

            aptNode = aptGraph[apt]

            aptNode.isNodeReachable = False

            unreachableAptNodes.append(aptNode)

    return unreachableAptNodes

Method 5

def depthFirstTraversalOfApts(aptGraph, apt, visitedApts):

    """DFS and mark existing nodes as 'visited'."""

    if apt in visitedApts:

        # if an airport/node is already visited then skip it

        return

    visitedApts[apt] = True # mark the apt as visited by assigning a bool True value

    # For every connection, in whatever airport we are visiting, visit that connection and then visit that connection's connections, and ultimately traverse every node/airport under the passed (root) airport

    connections = aptGraph[apt].connections

    for connection in connections: # For every connection, perform DFS

        depthFirstTraversalOfApts(aptGraph, connection, visitedApts)

Method 6:

# Time Complexity = O(v * (v + e)) ||| Space Complexity = O(v)

#Where v is the number of vertices/nodes/airports and e is the numberof edges/routes

def scoreUnreachableNodesFromUnreachableNodes(aptGraph, unreachableAptNodes):

    # For every unreachable node that we have, figure out how many unreachable nodes that we gonna have

    for aptNode in unreachableAptNodes:

        apt = aptNode.apt # fetching the name of the airport which was initialised while creatind the node

        unreachableConnects = []

        # Now, we again gonna perform DFS technique to add all the unreachable connections and score them

        depthFirstAppendUnreachableConnctions(aptGraph, apt, unreachableConnects, {})

        aptNode.unreachableConnects = unreachableConnects

Method 7:

def depthFirstAppendUnreachableConnctions(aptGraph, apt, unreachableConnects, visitedSetOfApts):

    if aptGraph[apt].isNodeReachable:

        return # if the node is reachable then skip it, because here we are only dealing with unreachable nodes

    if apt in visitedSetOfApts:

        return # if the node is already visited then skip it

    visitedSetOfApts[apt] = True # mark the apt as visited by assigning a bool True value

    unreachableConnects.append(apt) 

    connections = aptGraph[apt].connections

    for connection in connections:

        depthFirstAppendUnreachableConnctions(aptGraph, connection, unreachableConnects, visitedSetOfApts)

Custom Data Structure Used For A Node/Airport:

class AptNode:

    def __init__(self,apt):

        """This is a custom data structure of a node/airport. We'll be storing the aiport/node name just in case if we need to access the name of the airport."""

        self.apt = apt

        self.connections = []

        self.isNodeReachable = True

        self.unreachableConnects = []

# *******************************************

Test Case 1

import AirportConnectionsAssignment

import unittest

class TestPrgm(unittest.TestCase):

    def test_case_1(self):

        AIRPORTS = [ "BGI", "CDG", "DEL", "DOH", "DSM", "EWR", "EYW", "HND", "ICN", "JFK", "LGA", "LHR", "ORD", "SAN", "SFO", "SIN", "TLV", "BUD" ]

        STARTING_AIRPORT = "LGA"

        routes = [ ["DSM", "ORD"], ["ORD", "BGI"], ["BGI", "LGA"], ["SIN", "CDG"], ["CDG", "SIN"], ["CDG", "BUD"], ["DEL", "DOH"], ["DEL", "CDG"], ["TLV", "DEL"], ["EWR", "HND"], ["HND", "ICN"], ["HND", "JFK"], ["ICN", "JFK"], ["JFK", "LGA"], ["EYW", "LHR"], ["LHR", "SFO"], ["SFO", "SAN"], ["SFO", "DSM"], ["SAN", "EYW"] ]

        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

        self.assertTrue(aptConnects == (3, [['LGA', 'SFO'], ['LGA', 'TLV'], ['LGA', 'EWR']]))
if __name__ == '__main__':
	unittest.main()

Test Case 2:

import AirportConnectionsAssignment

import unittest

class TestPrgm(unittest.TestCase):

    def test_case_2(self):

        AIRPORTS = ["1", "2", "3"]

        STARTING_AIRPORT = "2"

        routes = [ ["1", "2"], ["2", "3"]]

        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

        self.assertTrue(aptConnects == (1, [['2', '1']]))

if __name__ == '__main__':
	unittest.main()

Test Case 3:

import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

    def test_case_3(self):

        AIRPORTS = ["1", "2", "3"]

        STARTING_AIRPORT = "3"

        routes = [ ["1", "2"], ["2", "3"]]

        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

        self.assertFalse(aptConnects == (1, [['2', '1']]))

if __name__ == '__main__':
	unittest.main()

Test Case 4:

import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

    def test_case_4(self):

        AIRPORTS = ["1", "2", "3",  "4"]

        STARTING_AIRPORT = "3"

        routes = [ ["1", "2"], ["2", "3"]]

        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)

        self.assertTrue(aptConnects == (2, [['3', '1'], ['3', '4']]))

if __name__ == '__main__':
	unittest.main()

Test Case 5:

import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

    def test_case_5(self):
        AIRPORTS = []
        STARTING_AIRPORT = "3"
        routes = []
        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)
        self.assertTrue(aptConnects == "No Input Airport List")

if __name__ == '__main__':
	unittest.main()

Test Case 6:

import AirportConnectionsAssignment
import unittest

class TestPrgm(unittest.TestCase):

    def test_case_6(self):
        AIRPORTS = ["1", "2", "ABC"]
        STARTING_AIRPORT = None
        routes = []
        aptConnects = AirportConnectionsAssignment.aptConnects(AIRPORTS, routes, STARTING_AIRPORT)
        self.assertTrue(aptConnects == "No Starting Airport Provided")
if __name__ == '__main__':
	unittest.main()