1.1 What is Data Structure?
A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways. In computer programming, a data structure may be selected or designed to store data for the purpose of working on it with various algorithms.
1.2 Why Study Data Structures?
To manage the complexity of problems and the problem-solving process, computer scientists use abstractions to allow them to focus on the “big picture” without getting lost in the details. By creating models of the problem domain, we are able to utilize a better and more efficient problem-solving process. These models allow us to describe the data that our algorithms will manipulate in a much more consistent way with respect to the problem itself.
1.3 What Are Linear Structures?
We will consider three simple but very powerful concepts. Stacks, queues, and lists are examples of data collections whose items are ordered depending on how they are added or removed. Once an item is added, it stays in that position relative to the other elements that came before and came after it. Collections such as these are often referred to as linear data structures.
Linear structures can be thought of as having two ends. Sometimes these ends are referred to as the “left” and the “right” or in some case the “front” and the “rear.” You could also call them the “top” and the “bottom.” The names given to the ends are not significant. What distinguishes one linear structure from another is the way in which items are added and removed, in particular the location where these additions and removals occur. For example, a structure might allow new items to be added at only one end. Some structures might allow items to be removed from either end.
These variations give rise to some of the most useful data structures in computer science. They appear in many algorithms and can be used to solve a variety of important problems.
2. Arrays Data Structure
2.1.1 What is an array?
An array data structure or simply an array is a data structure consisting of a collection of elements, each identified by at least one array index or key. An array is stored so that the position of each element can be computed from its index topple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional array.
2.1.2 Array element identifier
When data objects are stored in an array, individual objects are selected by an index that is usually a non-negative scalar integer. Indices are also called subscripts. An index maps the array value to a stored object. There are three ways in which the elements of an array can be indexed:
+ 0 (zero-based indexing): The first element of the array is indexed by subscript of 0.
+ 1 (one-based indexing): The first element of the array is indexed by subscript of 1.
+ n (n-based indexing): The base index of an array can be freely chosen.
2.1.3 Applications of Array
Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.
Arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings, and VLists. One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation.
Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to (otherwise repetitive) multiple IF statements. They are known in this context as control tables and are used in conjunction with a purpose built interpreter whose control flow is altered according to values contained in the array.
2.2.1 What is Bitmap?
A bitmap is a mapping from some domain (for example, a range of integers) to bits, that is, values which are zero or one. It is also called a bit array orbitmap index.
2.2.2 What is Bitmap Image?
When the domain is a rectangle (indexed by two coordinates) a bitmap gives a way to store a binary image, that is, an image in which each pixel is either black or white (or any two colors).
The more general term pixmap refers to a map of pixels, where each one may store more than two colors, thus using more than one bit per pixel. Often bitmap is used for this as well. In some contexts, the term bitmap implies one bit per pixel, whilepixmap is used for images with multiple bits per pixel.
2.2.3 Bitmap pixel storage
In typical uncompressed bitmaps, image pixels are generally stored with a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel (for transparency) may be stored in a separate bitmap, where it is similar to a grayscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel.
2.2.4 Bitmaps file format
Device-independent bitmap (DIB) is a format used to define device-independent bitmaps in various color resolutions. The main purpose of DIBs is to allow bitmaps to be moved from one device to another (hence, the device-independent part of the name).
2.3.1 What is matrix?
A matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. The individual items in a matrix are called its elements or entries.
2.3.2 Basic matrix operations
There are a number of basic operations that can be applied to modify matrices, called matrix addition, scalar multiplication, transposition, matrix multiplication, row operations, and submatrix.
2.3.3 Matrix applications
There are numerous applications of matrices, both in mathematics and other sciences. Some of them merely take advantage of the compact representation of a set of numbers in a matrix. For example, in game theory and economics, the payoff matrix encodes the payoff for two players, depending on which out of a given (finite) set of alternatives the players choose. Text mining and automated thesaurus compilation makes use of document-term matrices such as tf-idf to track frequencies of certain words in several documents.
2.4 Parallel Array
2.4.1 What is parallel array?
A parallel array is a data structure for representing arrays of records. It keeps a separate, homogeneous array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.
2.4.2 Pros and cons of parallel array
Parallel arrays have a number of practical advantages over the normal approach:
+They can be used in languages which support only arrays of primitive types and not of records (or perhaps don’t support records at all).
+Parallel arrays are simple to understand and use, and are often used where declaring a record is more trouble than it’s worth.
+They can save a substantial amount of space in some cases by avoiding alignment issues.
+If the number of items is small, array indices can occupy significantly less space than full pointers, particularly on architectures with large words.
However, parallel arrays also have several strong disadvantages, which serve to explain why they are not generally preferred:
+They have significantly worse locality of reference when visiting the records non-sequentially and examining multiple fields of each record.
+They obscure the relationship between fields of a single record.
+They have little direct language support (the language and its syntax typically express no relationship between the arrays in the parallel array).
+They are expensive to grow or shrink, since each of several arrays must be reallocated.
3. Lists Data Structure
3.1 Array List
3.1.1 What is array list?
A dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages.
A dynamic array is not the same thing as a dynamically allocated array, which is a fixed-size array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end.
3.1.2 Performance of array list
The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements. Iterating over the elements in order (linear time, good cache performance). Inserting or deleting an element in the middle of the array (linear time). Inserting or deleting an element at the end of the array (constant amortized time)
3.1.3 Language supported by array list
C++’s std::vector is an implementation of dynamic arrays, as are the ArrayList classes supplied with the Java API and the .NET Framework. The genericList<> class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays.
Python’s list datatype implementation is a dynamic array. Delphi and D implement dynamic arrays at the language’s core. Ada’s Ada.Containers.Vectors generic package provides dynamic array implementation for a given subtype.
Many scripting languages such as Perl and Ruby offer dynamic arrays as a built-in primitive data type. Several cross-platform frameworks provide dynamic array implementations for C: CFArray and CFMutableArray in Core Foundation; GArray and GPtrArray in GLib.
3.2 Skip List
3.2.1 What is skip list?
A skip list is a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, each skipping over fewer elements. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than the element searched for.
3.2.2 Inserting elements to skip list
The elements used for a skip list can contain more than one pointer since they can participate in more than one list. Insertions and deletions are implemented much like the corresponding linked-list operations, except that “tall” elements must be inserted into or deleted from more than one linked list.
3.2.3 Usages of skip list
Cyrus IMAP server offers a “skiplist” backend DB implementation
Lucene uses skip lists to search delta-encoded posting lists in logarithmic time
QMap template class of Qt that provides a dictionary.
Redis, an ANSI-C open-source persistent key/value store for Posix systems, uses skip lists in its implementation of ordered sets.
ConcurrentSkipListSet and ConcurrentSkipListMap in the Java 1.6 API.
Skip lists are used for efficient statistical computations of running medians.
3.3.1 What is Zipper?
A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages.
The zipper includes and generalizes the gap buffer technique sometimes used with arrays. The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures. Such modified data structures are usually referred to as “a tree with zipper” or “a list with zipper” to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation.
3.3.2 Usage of Zipper
The zipper is often used where there is some concept of ‘focus’ or of moving around in some set of data, since its semantics reflects that of moving around but in a functional non-destructive manner.
The zipper has been used in
+Xmonad, to manage focus and placement of windows
+Huet’s papers cover a structural editor based on zippers and a theorem proverb
+A filesystem (ZipperFS) written in Haskell offering the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility
+Clojure has extensive support for zippers.
3.3.3 Zipper contexts and differentiation
It has been shown that the type of the items in the context list produced by the zipper transformation is the “derivative” of the original type in a sense that is related to differentiation in calculus by decategorification. Most datatypes are constructed from products and sums of datatypes; any given datatype looks like a polynomial or aTaylor series, and the representation of the type of context items looks like the derivative of that polynomial or series.In a recursive datatype like a list or a tree, the derivative is taken with respect to the recursion variable.
4. Binary Ttrees
4.1 Binary search tree
4.1.1 What is binary search tree?
A binary search tree (BST), sometimes also called an ordered or sorted binary tree, is anode-based binary tree data structure where each node has a comparable key (and an associated value) and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node’s left sub-tree and smaller than the keys in all nodes in that node’s right sub-tree. Each node has no more than two child nodes. Each child must either be a leaf node or the root of another binary search tree. The left sub-tree contains only nodes with keys less than the parent node; the right sub-tree contains only nodes with keys greater than the parent node. BSTs are also dynamic data structures, and the size of a BST is only limited by the amount of free memory in the operating system.
4.1.2 Advantage and Disadvantage of BST
The main advantage of binary search trees is that it remains ordered, which provides quicker search times than many other data structures. Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.
Some of their disadvantages are as follows:
+The shape of the binary search tree totally depends on the order of insertions, and it can be degenerated.
+When inserting or searching for an element in binary search tree, the key of each visited node has to be compared with the key of the element to be inserted or found, i.e., it takes a long time to search an element in a binary search tree.
+The keys in the binary search tree may be long and the run time may increase. After a long intermixed sequence of random insertion and deletion, the expected height of the tree approaches square root of the number of keys which grows much faster than .
4.1.3 Determining a tree is a BST
Sometimes we already have a binary tree, and we need to determine whether it is a BST. This is an interesting problem which has a simple recursive solution. The BST property,
+every node on the right subtree has to be larger than the current node and every node on the left subtree has to be smaller than (or equal to – should not be the case as only unique values should be in the tree – this also poses the question as to if such nodes should be left or right of this parent)
+the current node is the key to figuring out whether a tree is a BST or not. On first thought it might look like we can simply traverse the tree, at every node check whether the node contains a value larger than the value at the left child and smaller than the value on the right child, and if this condition holds for all the nodes in the tree then we have a BST.
4.1.4 Difference between binary tree and binary search tree
Binary tree: In short, a binary tree is a tree where each node has up to two leaves. In a binary tree, a left child node and a right child node contain values which can be either greater, less, or equal to parent node.
Binary Search Tree: A Binary Search Tree imposes the condition that, for all nodes, the left children are less than or equal to the current node, which is less than all the right nodes.
Operations, such as find, on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two keys. This comparator can be explicitly or implicitly defined, depending on the language in which the binary search tree was implemented. A common comparator is the less-than function.
4.2 Cartesian tree
4.2.1 What is cartesian tree?
A Cartesian tree is a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. In the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence.
4.2.2 Application in sorting
Pairs of consecutive sequence values (shown as the thick red edges) that bracket a sequence value x (the darker blue point)…
Levcopoulos & Petersson (1989) describe a sorting algorithm based on Cartesian trees. They describe the algorithm as based on a tree with the maximum at the root, but it may be modified straightforwardly to support a Cartesian tree with the convention that the minimum value is at the root.
4.3.1 What is T-Tree?
AT-tree is a type of binary tree data structure that is used by main-memory databases, such asDatablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.
A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on blocks oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them.
4.3.2 T-Tree Performance
Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware:
5. Heaps Data Structure
5.1.1 What is heap?
A heap is a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key of node A is ordered with respect to the key of node B with the same ordering applying across the heap. Either the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called max heap) or the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node (min heap).
In a heap the highest (or lowest) priority element is always stored at the root, hence the name heap. A heap is not a sorted structure and can be regarded as partially ordered.
5.1.2 Heap commonly performed operations
+create-heap: create an empty heap
+heapify: create a heap out of given array of elements
+delete-max or delete-min: removing the root node of a max- or min-heap, respectively
+increase-key or decrease-key: updating a key within a max- or min-heap, respectively
+insert: adding a new key to the heap
+merge: joining two heaps to form a valid new heap containing all the elements of both.
+size: return the number of items in the heap.
+isEmpty(): returns true if the heap is empty, false otherwise.
+Union(): Creates a new heap by joining two heaps given as input.
+Shift-down: Move a node down in the tree, similar to Shift-up
5.1.3 Heap Usage
The heap data structure has many applications.
+Heapsort: One of the best sorting methods being in-place and with no quadratic worst-case scenarios.
+Selection algorithms: A heap allows access to the min or max element in constant time, and other selections (such as median or kth-element) can be done in sub-linear time on data that is in a heap.
+Graph algorithms: By using heaps as internal traversal data structures, run time will be reduced by polynomial order.
+Order statistics: The Heap data structure can be used to efficiently find the kth smallest (or largest) element in an array.
5.1.4 Heap Implementations
+The C++ Standard Template Library provides the make_heap, push_heap and pop_heap algorithms for heaps (usually implemented as binary heaps), which operate on arbitrary random access iterators. It treats the iterators as a reference to an array, and uses the array-to-heap conversion.
+The Java 2 platform (since version 1.5) provides the binary heap implementation with class java.util.PriorityQueue in Java Collections Framework.
+Python has a heapq module that implements a priority queue using a binary heap.
+PHP has both max-heap (SplMaxHeap) and min-heap (SplMinHeap) as of version 5.3 in the Standard PHP Library.
+Perl has implementations of binary, binomial, and Fibonacci heaps in the
5.2 Binary Heap
5.2.1 What is binary heap ?
A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:
+Shape property: a binary heap is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
+Heap property: All nodes are either greater than or equal to or less than or equal to each of its children, according to a comparison predicate defined for the heap.
5.2.2 Binary heap operations]
Both the insert and remove operations modify the heap to conform to the shape property first, by adding or removing from the end of the heap. Then the heap property is restored by traversing up or down the heap. Both operations take O(log n) time.
5.2.3 Building a binary heap
A binary heap could be built by successive insertions. This approach requires time because each insertion takes time and there are elements. However this is not the optimal method. The optimal method starts by arbitrarily putting the elements on a binary tree, respecting the shape property. Then starting from the lowest level and moving upwards, shift the root of each subtree downward as in the deletion algorithm until the heap property is restored.
5.2.4 Binary heap usage
Despite its limitations and its unpredictable nature, binary heaps are useful in the design of deterministic algorithms. They were used to achieve the best complexity to date for finding a minimum spanning tree. They can also be used to easily build an optimal selection algorithm, as well as near-sorting algorithms, which are algorithms that place every element near its final position, a situation in which insertion sort is fast.
6. Tries Data Structure
6.1.1 What is tire ?
A trie, also called digital tree and sometimes radix tree or prefix tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.
6.1.2 Tries Advantage and Disadvantage
A trie has a number of advantages over binary search trees. A trie can also be used to replace a hash table, over which it has the following advantages:
+Looking up data in a trie is faster in the worst case, O(m) time, compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table.
+Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
+There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
+A trie can provide an alphabetical ordering of the entries by key.
Tries do have some drawbacks as well:
+Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.
+Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful.
+Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables.
7. Multiway Trees
7.1 Spaghetti Stack
7.1.1 What is Spaghetti Stack ?
A spaghetti stack in computer science is an N-ary tree data structure in which child nodes have pointers to the parent nodes (but not vice-versa).When a list of nodes is traversed from a leaf node to the root node by chasing these parent pointers, the structure looks like a linked list stack. It can be compared to a linked list having one and only one parent pointer called “next” or “link”, and ignoring that each parent may have other children (which are not accessible anyway since there are no downward pointers).
7.1.2 Spaghetti stack usage
Spaghetti stack structures arise in situations when records are dynamically pushed and popped onto a stack as execution progresses, but references to the popped records remain in use.
7.2 Disjoint-set Data Structure
7.2.1 What is disjoint-set data structure ?
A disjoint-set data structure, also called a union–find data structure or merge find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint subsets. It supports two useful operations:
+Find: Determine which subset a particular element is in.
+Union: Join two subsets into a single subset.
7.2.2 Disjoint-set Type
A simple approach to creating a disjoint-set data structure is to create a linked list for each set. The element at the head of each list is chosen as its representative.
MakeSet creates a list of one element. Union appends the two lists, a constant-time operation. The drawback of this implementation is that Find requires O(n) or linear time to traverse the list backwards from a given element to the head of the list.
Disjoint-set forests are data structures where each set is represented by a tree data structure, in which each node holds a reference to its parent node. In a disjoint-set forest, the representative of each set is the root of that set’s tree.
8.1 Distributed Hash Table
8.1.1 What is distributed hash table
A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table; (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
8.1.2 Distributed hash table usage
DHT research was originally motivated, in part, by peer-to-peer systems such as Freenet, gnutella, BitTorrent and Napster, which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file-sharing service.
These systems differed in how they found the data their peers contained:
+Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the querier to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits.
+Gnutella and similar networks moved to a flooding query model – in essence, each search would result in a message being broadcast to every other machine in the network. While avoiding a single point of failure, this method was significantly less efficient than Napster. Later versions of Gnutella clients moved to a dynamic querying model which vastly improved efficiency.
8.1.3 Distributed hash table structure
The structure of a DHT can be decomposed into several main components. The foundation is an abstract keyspace, such as the set of 160-bit strings. A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace.
Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To store a file with given and in the DHT, the SHA-1 hash of is generated, producing a 160-bit key, and a message is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing to produce and asking any DHT node to find the data associated with a message . The message will again be routed through the overlay to the node responsible for , which will reply with the stored.
8.1.4 Distributed hash table implementations
Most notable differences encountered in practical instances of DHT implementations include at least the following:
+The address space is a parameter of DHT. Several real world DHTs use 128-bit or 160-bit key space
+Some real-world DHTs use hash functions other than SHA-1.
In the real world the key could be a hash of a file’s content rather than a hash of a file’s name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.
8.1.5 Distributed hash table advantage
Some advanced DHTs like Kademlia perform iterative lookups through the DHT first in order to select a set of suitable nodes and send messages only to those nodes, thus drastically reducing useless traffic, since published messages are only sent to nodes that seem suitable for storing the key ; and iterative lookups cover just a small set of nodes rather than the entire DHT, reducing useless forwarding.
8.2 Double Hashing
8.2.1 What is double hashing
Double hashing is a computer programming technique used in hash tables to resolve hash collisions, cases when two different values to be searched for produce the same hash key. It is a popular collision-resolution technique in open-addressed hash tables. Double hashing is implemented in many popular libraries.
8.2.2 Double hashing usage
Double hashing with open addressing is a classical data structure on a table . Let be the number of elements stored in, then’s load factor is.
Double hashing approximates uniform open address hashing. That is, start by randomly, uniformly and independently selecting two universal hash functions and to build a double hashing table.
8.3 Hash Tree
8.3 What is hash tree ?
Hash tree (persistent data structure), a tree used to map hash values to keys. A space-efficient implementation of a sparse tree, in which the descendants of each node may be interleaved in memory.
A data structure which “combines features of hash tables and LC-tries in order to perform efficient lookups and updates”
9. Graphs Data Structure
9.1.1 What is graph ?
a graph is an abstract data type that is meant to implement the graph and directed graph concepts from mathematics.
A graph data structure consists of a finite set of nodes or vertices, together with a set of ordered pairs of these nodes. These pairs are known as edges or arcs. As in mathematics, an edge (x,y) is said to point or go from x to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or references.
A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
9.1.2 Graph algorithms
Graph algorithms are a significant field of interest within computer science. Typical higher-level operations associated with graphs are: finding a path between two nodes, like depth-first search and breadth-first search and finding the shortest path from one node to another, like Dijkstra’s algorithm. A solution to finding the shortest path from each node to every other node also exists in the form of the Floyd–Warshall algorithm.
9.1.3 Graph operations
The basic operations provided by a graph data structure G usually include:
+adjacent(G, x, y): tests whether there is an edge from node x to node y.
+neighbors(G, x): lists all nodes y such that there is an edge from x to y.
+add(G, x, y): adds to G the edge from x to y, if it is not there.
+delete(G, x, y): removes the edge from x to y, if it is there.
+get_node_value(G, x): returns the value associated with the node x.
+set_node_value(G, x, a): sets the value associated with the node x to a.
Structures that associate values to the edges usually also provide:
+get_edge_value(G, x, y): returns the value associated to the edge (x,y).
+set_edge_value(G, x, y, v): sets the value associated to the edge (x,y) to v.
9.2.1 What is multigraph ?
A multigraph is a graph which is permitted to have multiple edges(also called parallel edges), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge.
There are two distinct notions of multiple edges:
+Edges without own identity: The identity of an edge is defined solely by the two nodes it connects. In this case, the term “multiple edges” means that the same edge can occur several times between these two nodes.
+Edges with own identity: Edges are primitive entities just like nodes. When multiple edges connect two nodes, these are different edges.
A multigraph is different from a hypergraph, which is a graph in which an edge can connect any number of nodes, not just two.
9.2.2 Multigraph Type
220.127.116.11 Undirected multigraph (edges without own identity)
Formally, a multigraph G is an ordered pair G:=(V, E) with V a set of vertices or nodes, E a multiset of unordered pairs of vertices, called edges or lines. Some authors allow multigraphs to have loops, that is, an edge that connects a vertex to itself, while others call these pseudographs, reserving the term multigraph for the case with no loops.
18.104.22.168 Directed multigraph (edges without own identity)
A multidigraph is a directed graph which is permitted to have multiple arcs, i.e., arcs with the same source and target nodes. A multidigraph G is an ordered pairG:=(V,A) with V a set of vertices or nodes, A multiset of ordered pairs of vertices called directed edges, arcs or arrows. A mixed multigraph G:=(V,E, A) may be defined in the same way as a mixed graph.
22.214.171.124 Directed multigraph (edges with own identity)
A multidigraph or quiver G is an ordered 4-tuple G:=(V, A, s, t) with V a set of vertices or nodes, A set of edges or lines, assigning to each edge its source node, assigning to each edge its target node.
9.3.1 What is hypergraph ?
A hypergraph is a generalization of a graph in which an edge can connect any number of vertices. Formally, a hypergraph is a pair where a set of elements is called nodes or vertices, and is a set of non-empty subsets of called hyperedges or edges. Therefore, is a subset of, where is the power set of.
While graph edges are pairs of nodes, hyperedges are arbitrary sets of nodes, and can therefore contain an arbitrary number of nodes. However, it is often desirable to study hypergraphs where all hyperedges have the same cardinality. So a 2-uniform hypergraph is a graph, a 3-uniform hypergraph is a collection of unordered triples, and so on.
A hypergraph is also called a set system or a family of sets drawn from the universal set X. The difference between a set system and a hypergraph is in the questions being asked. Hypergraph theory tends to concern questions similar to those of graph theory, such as connectivity and colorability, while the theory of set systems tends to ask non-graph-theoretical questions, such as those of Sperner theory.
There are variant definitions; sometimes edges must not be empty, and sometimes multiple edges, with the same set of nodes, are allowed.
9.3.2 Hypergraph Type
126.96.36.199 Bipartite graph model
A hypergraph H may be represented by a bipartite graph BG as follows: the sets X and E are the partitions of BG, and (x1, e1) are connected with an edge if and only if vertex x1 is contained in edge e1 in H. Conversely, any bipartite graph with fixed parts and no unconnected nodes in the second part represent some hypergraph in the manner described above. This bipartite graph is also called incidence graph.
Many theorems and concepts involving graphs also hold for hypergraphs. Ramsey’s theorem and Line graph of a hypergraph are typical examples. Some methods for studying symmetries of graphs extend to hypergraphs.
Two prominent theorems are the Erdős Ko Rado theorem and the Kruskal Katona theorem on uniform hypergraphs.
One possible generalization of a hypergraph is to allow edges to point at other edges. There are two variations of this generalization. In one, the edges consist not only of a set of vertices, but may also contain subsets of vertices, ad infinitum. In essence, every edge is just an internal node of a tree or directed acyclic graph, and vertexes are the leaf nodes. A hypergraph is then just a collection of trees with common, shared nodes. Conversely, every collection of trees can be understood as this generalized hypergraph. Since trees are widely used throughout computer science and many other branches of mathematics, one could say that hypergraphs appear naturally as well.