Graphs
The term “graph” has a very specific meaning in Computer Science. It refers specifically to a network of connections, like this:
The numbered circles are called nodes and the lines between them are called edges. Examples of graphs include:
Facebook Friends — Each node is a person and each edge is a friendship. That means person
1
is friends with2
and4
. We may want to ask, who is the person with the most friends? Who are my friends-of-friends?Train Maps — Each node is a town and each edge is a train track. So if I live in town
6
I can take a train directly to town7
with no stops. We may want to ask, what is the shortest path from town1
to town6
?
In both of these examples, the edges are two-way. If there are train tracks between towns 1
and 2
, you can go across them in either direction. These two-way edges are called undirected edges, and a graph made up of undirected edges is called an undirected graph.
In practice, it is far more common to use one-way edges, or directed edges. A directed graph looks like this:
Notice that the edges switched from lines to arrows. Examples of this kind of graph include:
Twitter Follows — Each node is a person and each edge is a follow. So person
1
follows2
and4
, but they do not follow back. Person2
and3
follow each other. We may want to ask if a tweet from4
will ever be seen by6
through retweets? Can a tweet from4
ever be seen by7
through retweets?The Internet — Each node is a web page and each edge is a link from page-to-page. Page
4
only has one outgoing link, pointing to page6
. If there are tons of link to particular pages, maybe that is a proxy for how important that page is.Elm Modules — Each node a
module
and each edge represents animport
. It looks like modules2
and3
import each other, so we have a cyclic dependency. To compile2
we need3
to be compiled, which needs2
to be compiled, which needs3
to be compiled, etc. No good! So we would definitely want to know if this graph has any cycles.
When writing a compiler, you have to deal with directed graphs all the time. It is kind of crazy. Anyway, now that we know the two basic types of graphs, how do we represent them in Elm?
Graphs in Elm
Graphs are easy to represent in Elm thanks to dictionaries! So imagine we wanted to represent this tiny graph of Twitter users:
We can represent it in Elm like this:
import Dict exposing (Dict)
graph : Dict String (List String)
graph =
Dict.fromList
[ ( "Bob", ["Sue", "Tom"] ) -- Bob follows Sue and Tom.
, ( "Sue", ["Tom"] ) -- Sue only follows Tom.
, ( "Tom", [] ) -- Tom does not follow anyone!
]
So if we wantned to know how many followers someone has, we could say:
countFollowers : String -> Dict String (List String) -> Int
countFollowers name graph =
let
addFollow _ follows count =
if List.member name follows then
count + 1
else
count
in
Dict.foldl addFollow 0 graph
This function visits each person in the Twitter graph, and then ask if this person follows a partucilar individual. It would be really expensive to crawl all twitter users any time you wanted a follower count though! Fortunately, we can just add some additional information to our graph representation.
import Dict exposing (Dict)
type alias TwitterGraph =
Dict String User
type alias User =
{ displayName : String
, following : List String
, followers : Int
, biography : String
, picture : String
}
Now the data for a person includes followers
like before, but also a bunch of other useful information. If we want to know someone’s follower count, we can just look it up directly now. This same technique can be applied to web pages, Elm modules, train maps, or whatever else you can think of!
Note 1: An undirected graph can be represented with a directed graph. Just make sure that whenever two nodes are connected, you create a directed edge in each direction!
Note 2: The graph representation we covered here is called an adjacency list, but there are many other ways to represent graphs. For example, you could also use
Array (List Int)
where the array index is the unique identifier for each node, but this works really poorly if your only nodes are1
and40231
. How do we fill in all the other array indexes? On the web, our identifiers are often user names or numbers that are non-consecutive, so dictionaries are a great default representation in Elm.