Open-Source: Week 3

Eduardo Ahumada
7 min readJun 29, 2021

--

Hello everyone! In today's post, I will be sharing with you some of the things that I learned during my 3rd week working on OS projects. I will be talking about the lessons that I have learned, and the areas of opportunity that I discovered to improve.

Photo by James Harrison on Unsplash

Things that I have learned from OS

The importance of testing

Every project in OS will have a way of testing their code, in the same way when you collaborate you will have to test the code you changed or create with the existing tests or you might also have to create those tests.

Both of these situations happened to me during my collaborations, I had to make changes and then execute the tests to make sure everything was working as expected. In another contribution, I had to create new tests.

Here happened 2 different things, for example, in one issue from an OS project I worked with refactoring some lines of code. Before implementing the refactor everything was running ok, after the refactor there appeared some failures, so in the end I had to do the refactoring and then some problem solving by understanding the failures and then going to the code to find what things can be changed so everything works correctly. Fortunately, I manage to fix the errors and merge my changes.

In another project, I implemented an enhancement of code where I had to create validations for the parameters of a class. Since I created more code, I had to create tests for the newly implemented methods and validations. But this new code needed to pass the already existing tests and the newly created tests.

In a nutshell, knowing how tests work, how to implement them, and interpret them is an ability necessary to have when contributing to OS. But don’t worry, if you don’t know this beforehand you will learn these things during your contributions, nevertheless, I would suggest you check this so you won’t struggle during your contributions, at least with this.

Extra things

One piece of advice that could have saved me a lot of time is: use the correct tools to search through the projects where you can collaborate. I wasted a lot of my time trying to find projects in tools that didn’t help me, but that might have helped others. The best tool to find projects to collaborate for me was this, but find something you feel comfortable searching and the most important thing, be patient but be smart. Choose a project before you think you are ready to work on it, because you might never think you are ready, but trust me, you are.

The only way of discovering in which projects you can collaborate is to is by start trying to work in something and learning from that experience.

Last but not least, make sure you know git and Github, knowing them will also save you a lot of time.

Photo by Oskar Yildiz on Unsplash

My learning of CS concepts

While I’m working on my OS projects, I have also learned about different CS concepts. Here I will make a quick summary of some of the concepts that I studied.

The Set

Sets are unordered groups of unique items.

  • Used when the order of values to be stored is meaningless.
  • Also used when you need to ensure that no items in the group exist more than once.

Common operations:

  • add(e): To add an item in the set or produce an error if it already exists.
  • list(): To create a list from a set.
  • delete(e): To remove the specified e element.

The Tree

  • The Tree employs non-contiguous memory cells, like linked lists, so you can have your data distributed in memory.
  • Cells have pointers to other cells also like linked lists.
  • Cells and pointers are not arranged in a linear manner like in linked lists, they are arranged as a tree, with, for example, a cell pointing to more than one cell.
  • Trees are especially useful for hierarchical data, like file directories.
  • In trees, cells are called nodes.
  • The pointers from one cell to another are called edges.
  • The only node that doesn’t have a parent is called Root Node.
  • Nodes in trees must have exactly one parent.
  • Two nodes that have the same parent are siblings.
  • Node’s parent, grandparent, great-grandparent, and so on… Constitute the node’s ancestors.
  • Node’s children, grandchildren, great-grandchildren, and so on… Are the node’s descendants.
  • Nodes that don’t have any children are leaf nodes.

We can give other characteristics to trees like:

  • Path: The path between two nodes is a set of nodes and edges that lead from one node to other.
  • Level: The size of its path to the Root Node.
  • Height: Is the level of the deepest node in the tree.
  • Forest: A set of trees.

Binary Search Tree

This Is a special type of tree that can be efficiently searched, I’m going to talk about what efficiently here means later.

  • A node in this type of tree can have at most 2 children.
  • Nodes are positioned according to their value/key.

Some rules are:

  • Children nodes to the left of the parent must be smaller than the parent.
  • Children nodes to the rightmost are greater.

If the tree follows this property, then it would be easy to search certain nodes with a given key/value within the three.

The algorithm to search in a tree can be:

  1. Save the root node from the tree.
  2. While there is a node do:
  3. If the node value is equal to the value you are searching for, return it.
  4. If the value you are searching for is greater than the value of the node, go to a node on the right side, so you can search for greater values.
  5. Otherwise, If the value you are searching for is smaller than the value of the node, go to a node on the left.
  6. If none of those happen, we return for example, not found.

To insert a value in the tree we can do:

  1. Save the root node.

While there is a node do:

  1. Save the last node, which the first time is the root node.
  2. if the new node value is greater than the current node, go to a node in the right.
  3. Otherwise, go to a node on the left.

Then:

  1. Out of the while, if the new node value is greater than the last node value, assign the value of the new node to the right of the last node.
  2. Otherwise, assign the value of the new node to the left of the last node.

Other important terms:

  • Tree balancing: A perfectly balanced tree has the minimum possible height. Tree balancing is an expensive operation, as it requires sorting all nodes.
  • Self-balancing binary trees: To efficiently handle binary trees that change a lot.
  • Some types of self-balancing trees are Rd-Black trees and AVL trees.
  • In database systems, B-Trees are used.

The Binary Heap

This is a special type of Binary Search Tree where we can find the smallest or highest item instantly.

  • Especially useful for implementing priority Queues.
  • It costs 0(1) to get the minimum or maximum item.
  • Searching and inserting still cost 0(log n).
  • Rules for binary heap are the same as the binary search tree except for 1: A parent node must be greater or smaller than both of its child nodes.

Use the Binary Heap when you must work with the maximum or minimum item of a set.

The Graph

Graphs are similar to Trees. Nevertheless:

  • There are no children or parent nodes, so there is no root node.
  • Data is freely organized as nodes and edges, any node can have multiple incoming and outgoing edges.
  • Graphs can represent almost any kind of data.

The Hash Table

A data structure that allows us to find elements in 0(1) time. This means that searching for an item takes a constant time.

  • Requires preallocating a big chunk of sequential memory to store data like arrays.
  • Nevertheless, items aren’t stored in an ordered sequence.
  • The position that an element occupies is given by a hash function. This function takes data you want to store as input and outputs a random-looking number. That number is interpreted to be the memory position where the item will be store at.

Disadvantages:

  • Hash collision: The hash function might return the same memory address for 2 different input values.

Other facts:

  • The larger the range of values the hash function can output, the more data positions are available, and the less probable a collision can happen.
  • Often used to implement maps and sets.
  • Allow faster insertions and deletions than tree-based data structures.
  • They require a very large chunk of sequential memory to work properly.

This has been everything for this blog, I can only say that I’m learning more than ever, and besides, the best part is that I’m actually enjoying a lot to work on OS. I hope you learned something, see you soon, thank you!

Photo by Kelly Sikkema on Unsplash

--

--

Eduardo Ahumada

Engineer looking to help and contribute. Learning about Software development and Computer Science.