When I first discovered the world of computer programming, my search history was filled with questions like “How many lines of code did it take to build Google?” and “How many lines of code did it take to build Facebook?” Sound familiar?
Well, a lot of beginners are curious about this, thinking that programming is about the number of lines you write. It reflects one of the first assumptions beginners tend to make: the more lines of code one writes, the better.
def my_sum(vals):
    result = 0
    for element in vals:
        result += element
    return result
Does the code above look familiar? We will come back to it later.
Soon after I wrote my first hello world, I prided myself on writing “hardcore” code, which was my way of saying I solved coding challenges from first principles. I cared little about things like execution speed.
I enjoyed my first HackerRank questions, as everyone does. Sometimes, after solving one, I would look at the “better” answers provided and think, “Mine worked, so it is no big deal.”
I gradually learned that this was a bad habit and that one actually needs to write better code. After that, some things changed a bit.
My point is that one should care about writing efficient code from the very start of learning. It is not about the number of lines of code you write, but about what the code does and how efficiently it does it.
Let me share a recent story. I started my first internship as a Data Science Intern early this year. My first task was to clean a dataset (sigh). I am not going to lie: I am one of those people who enjoy the pain that comes with cleaning data. After that, the next task was to perform Exploratory Data Analysis (EDA) on it. Then came the day I had to submit my results and go through a code review. A Data Scientist on my team looked at the code and said, “Ahh, so you love writing lines of code, right?”
It occurred to me immediately what the issue was: one of my old habits again, implementing functions from scratch. He replaced several lines of my code with about three lines that used pandas functions. No, I am not going to shame myself by showing that particular code! After the replacement, the code ran faster on the dataset and was cleaner and more readable. This made me revisit something I had forgotten a while back: if you need to perform a certain task, check the documentation to see whether there is already a function for it. In most cases, there is. Do not go around reimplementing readily available functions from scratch in a very inefficient manner.
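To make the idea concrete, here is a small sketch of the same habit and its fix. It is not the code from my review (that one stays hidden); the data and variable names are made up, and it uses Python's standard library rather than pandas so it runs anywhere. The first half cleans and summarizes a list by hand, and the second half leans on functions that already exist.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: an "age" column with some missing entries
ages = [23, 31, None, 27, 31, None, 45]

# From scratch: drop missing values, then average and count by hand
cleaned = []
for age in ages:
    if age is not None:
        cleaned.append(age)

total = 0
for age in cleaned:
    total += age
manual_mean = total / len(cleaned)

manual_counts = {}
for age in cleaned:
    manual_counts[age] = manual_counts.get(age, 0) + 1

# With existing tools: one line each for filtering, averaging, and counting
present = [age for age in ages if age is not None]
builtin_mean = mean(present)
builtin_counts = Counter(present)

print(manual_mean, builtin_mean)
print(manual_counts, dict(builtin_counts))
```

Both halves produce the same numbers, but the second is shorter to read and relies on code that has already been tested by many people. In pandas the same spirit applies: look for an existing method before writing the loop yourself.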
Now, back to the code above. The function does the same thing as the built-in sum function in Python. At times it may look as if there is no difference in execution time. That said, let us open a notebook and inspect the difference.
If we time our hardcore handwritten function in a new cell (a here is the list of numbers being summed), we have
%timeit my_sum(a)
# gives the result below
# 346 ns ± 9.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sum(a)
# gives the result
# 141 ns ± 2.55 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Can you see the difference in execution time? On this input, the built-in sum is roughly 2.5 times faster than our handwritten version.
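If you are not in a notebook, the same comparison can be sketched with the standard timeit module. Note that the list a below is my assumption (a small list of ten integers), and the number of repetitions is arbitrary, so your timings will differ from the ones shown above.

```python
import timeit


def my_sum(vals):
    """Handwritten sum, same as the function at the top of the article."""
    result = 0
    for element in vals:
        result += element
    return result


# Assumption: a small list of integers stands in for the list being timed
a = list(range(10))

# Number of calls to time for each function (arbitrary choice)
n = 100_000

# timeit.timeit returns the total seconds taken by n calls
t_manual = timeit.timeit(lambda: my_sum(a), number=n)
t_builtin = timeit.timeit(lambda: sum(a), number=n)

# Convert total seconds to nanoseconds per call
print(f"my_sum: {t_manual / n * 1e9:.0f} ns per call")
print(f"sum:    {t_builtin / n * 1e9:.0f} ns per call")
```

The lambda wrapper adds a little overhead to both measurements, so treat the numbers as a rough comparison rather than an exact benchmark.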
The above is just a basic example; there are many others one could highlight. This is why people study algorithms and keep looking for ways to make them more efficient in every field, including machine learning.
Fast runtimes are why we can comfortably sit and query Google while it efficiently filters millions or billions of results to find what we asked for. It is also why over a million people can query our GPT friend and get results back quickly. Efficiency is at the heart of programming. Next time, ask how efficient the code is rather than how many lines of code a particular developer has written.
Always remember: simple is better than complex!
If you like this article, follow me for more!
If you notice a mistake, have suggestions to improve the article, or want to reach out, feel free to message me on LinkedIn.