Introduction to Data Structures

·

6 min read

Data Structures

You will hear about them no matter in what stage of SE career you are at. Understanding them, how they work, is essential if you want to become a good programmer.

When you get a good grasp of them, it will help you with either your career or coding interviews. If you want to become well-versed, successful, you need to understand data structures.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships. — Linus Torvalds

What are they? Which ones exist? Pros and cons of each one?
If this is looking hmm to you, well all of us were in that phase. But don’t worry! I will try to explain all of those questions in this introductory blog post. You got it!

What are Data Structures?

In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification.

So essentially, the data structure is a way to efficiently manipulate and organize data.

We are using data structures every day! For example:

  • In Microsoft Word, undo and redo operations? — Stacks!
  • Social networks feeds? — Linked list or Hash table!
  • Storing an image? — Arrays!
  • Go back functionality in browsers? — Stacks!
  • Friend list, connections, who is a friend with who? — Graph!
  • etc.

Data structures are everywhere!

They are the backbone of every good and quality software and software engineer!

Which Data Structures exist?

I will cover the following Data Structures.

Array

In computer science, an array data structure*, or simply an **array**, is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key.*

Linear array or one-dimensional array is the simplest type of data structure. Arrays are present everywhere! Basically, it is just a continuous stream of cells.

Array pros:

  • They are quick! If you know the exact number, that data is retrieved almost instantly.

Array cons:

  • Arrays have to be continuous. Before, it was quite common to run into memory problems as they grow in size. Memory management is nowadays a lot more automated/secure (Python, JavaScript), but you still have to keep in mind this, in order to either write or troubleshoot code well (C, C++).

We can find the Array data structure here:

  • Sorted leaderboards
  • 2D arrays, matrices, are used in image processing and manipulation

Linked List

A linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next.

A linked list is not continuous, or at least it doesn't have to be.

It consists of a collection of nodes. Basically, each node contains data and a pointer to the next node in the sequence. The last node contains a pointer to nothing (it has a value of Null/None).

The first and last nodes are known as HEAD and TAIL ones.

Linked List pros:

  • You can easily add or delete a node. All you need is to change the pointer, as each node is aware only of the node next to it (or node before it, if it’s double-linked)

Linked List cons:

  • They can be slow because, in order to find a node, you have to (in the worst case, TAIL has the data you are looking for) traverse the whole Linked List.

We can find the Linked List data structure here:

  • Previous web pages in browsers
  • Switching music in music players
  • Turns in mobile games

Hash Table

Hash Table is a data structure which stores data in an associative manner. In a hash table*, data is stored in an **array format, where each data value has its own unique index value. Access of data becomes very fast if we know the index** of the desired data.*

So essentially, it uses arrays as an underlying data structure. Hash Table functions in a key-value manner. It has a function, which computes an index (hash node). That index is used for data (value) to be stored.

Hash Table pros:

  • As per the nature of Hash Table functioning, they are fast when it comes to the addition or retrieval of data

Hash Table cons:

  • The hash function can cause a collision! There are workarounds, but you need to keep in mind that two different inputs can cause the same output, which could make a problem in memory and potential data loss.

off-topic: Recently, there was a collision problem in Apple’s security system. Read more about it here.

We can find the Hash Table data structure here:

  • Databases, passwords
  • Searching in web-browsers
  • Files on devices, filename, and file path

Stack and Queue

Stack is a container of objects that are inserted and removed according to the last-in first-out (LIFO) principle.

You can think of a stack as a pile of plates. You clean them, and the last one you place on top, you will use it first. So, last-in -> first-out.

Queue is a container of objects (a linear collection) that are inserted and removed according to the first-in first-out (FIFO) principle.

You can abstract queue as people waiting to buy something in store. The first one who came will be the first one who will get to buy and go. So, first-in -> first-out.

Stack and Queue pros:

  • They are efficient in addition or removal of data

Stack and Queue cons:

  • They don't have many use cases. That doesn't mean you shouldn't learn them!

We can find the Stack data structure here:

  • Undo operation
  • Syntaxes in programming languages
  • Virtual Machines
  • Browser history
  • Call logs, e-mails, gallery, notifications

We can find the Queue data structure here:

  • Operating system, job scheduling
  • Sending e-mail, it’s queued and sent in blocks
  • Server responding to request
  • Uploading and downloading files

Graphs and Trees

Tree and graph come under the category of non-linear data structures where tree offers a very useful way of representing a relationship between the nodes in a hierarchical structure and graph follows a network model.

How to differentiate them?

Tree data structure has to be connected, and can not have loops.

Graphs data structures don't have such restrictions.

They are both hierarchical data structures. You can abstract that as a business, with CEO being on top with his associates beneath, then the people below him, who have their own associates, and so on.

Each node is aware of the next node (kind of like a linked list). It contains data and the pointer to the next one.

Graphs and Trees pros:

  • They provide a quick addition of data, as well as searching e.g. binary search.

Graphs and Trees cons:

  • Modifying data can be tricky sometimes, as you don't have all node indexes at once and you maybe have to traverse them.

We can find the Graph data structure here:

  • Facebook Graph API
  • Dijkstra algorithm which finds the smallest path between two nodes, used in maps for example
  • Facebook, Instagram, social media networking

We can find the Tree data structure here:

  • XML parser
  • Domain Name Server — DNS
  • File explorer of any device
  • Posting questions on Quora, Reddit, etc.

Conclusion

These data structures will give you a good foundation for algorithms.

But don’t forget: You will become an expert at them only if you start applying them!

It can be frustrating from time to time, but solving those problems will give you an experience, something you can not learn.

I wish you the best of luck, you got it!

_ktb