
Languages Don't Matter in the Beginning
java python
In my second year of university I was studying computer science, but I began a teaching program surrounded mostly by software engineers. Some universities don't bother making the distinction, but at mine the important difference was languages. Computer scientists started by learning Python, and software engineers learned Java (Processing, which at the time ran on Java). We originally taught Java because of this, and when the engineering program later switched to Python we updated our course as well. One objection that came up a lot during this transition was that "Python is too slow, we're setting people up for failure".
It's no secret that Python is a slow language, but by this point I had been working with it a lot and had seen some of my own Python projects run significantly faster than my peers' Java code. When I brought this up, the response I got was something to the effect of "I don't believe you", so naturally I challenged the person to a drag race.
Summing
Write a program that generates 100,000 random numbers (between 0 and 10,000), sums them, and prints the result
That was the challenge, and I won. To make the same point again here, I re-ran the tests using approaches similar to the ones we each took. I used hyperfine to benchmark:
hyperfine --runs 10 "python test.py"
Benchmark 1: python test.py
  Time (mean ± σ):      55.6 ms ± 2.2 ms    [User: 6.2 ms, System: 0.0 ms]
  Range (min … max):    53.7 ms … 60.0 ms    10 runs
hyperfine --runs 10 "java Test"
Benchmark 1: java Test
  Time (mean ± σ):      77.7 ms ± 7.0 ms    [User: 0.0 ms, System: 6.2 ms]
  Range (min … max):    72.0 ms … 87.7 ms    10 runs
Here's the code for each:
test.py
import random
values = (random.randint(0, 10_000) for _ in range(100_000))
print(sum(values))
Test.java
import java.util.Random;
public class Test {
    public static void main(String[] args) {
        int[] numbers = new int[100_000];
        Random random = new Random();
        for (int i = 0; i < 100_000; i++) {
            numbers[i] = random.nextInt(10_001); // 0 to 10,000 inclusive
        }
        int sum = 0;
        for (int num : numbers) {
            sum += num;
        }
        System.out.println(sum);
    }
}
Two key differences gave us these results. The Java version first creates an array, fills it with all the numbers, and then sums them. The Python version instead yields the values one at a time from a generator. If you don't know what those words mean, that's great, let's get deeper into it.
The Java version, if we break it apart, is:
- Create an object (array) of 100_000 items
- For i in range 100_000
  - Generate a random number
  - Store the number in the array
- Create a sum int starting at 0
- Iterate through every entry of the array
  - Add the current value to the result
- Display the result
The Python version instead:
- Set up a generator to be used later
- Consume the generator, allocating a single integer at a time
- Display the result
So compared to the Java version, the Python version only ever has 2 integers in memory: the one for the sum, and the one for the current value. In Java we instead have at least 100_001 integers: 1 for the sum, and the 100_000 random numbers.
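As a rough sketch of the difference described above, here are both approaches written in Python. The names `sum_eager` and `sum_lazy` and the fixed seed are made up for illustration. Note that `sys.getsizeof` only measures the container itself, not the integers it refers to, so treat the printed sizes as an indication rather than a full memory profile.

```python
import random
import sys

N = 100_000


def sum_eager(n, seed=0):
    """Mirror the Java approach: store every number first, then sum."""
    rng = random.Random(seed)
    numbers = [rng.randint(0, 10_000) for _ in range(n)]  # n values held at once
    return sum(numbers)


def sum_lazy(n, seed=0):
    """Mirror the generator approach: hand values to sum() one at a time."""
    rng = random.Random(seed)
    return sum(rng.randint(0, 10_000) for _ in range(n))


# Compare the two containers themselves (hypothetical variable names).
as_list = [random.randint(0, 10_000) for _ in range(N)]
as_generator = (random.randint(0, 10_000) for _ in range(N))

print("list container bytes:", sys.getsizeof(as_list))
print("generator bytes:", sys.getsizeof(as_generator))
print("same total with same seed:", sum_eager(N) == sum_lazy(N))
```

Both functions produce the same total for the same seed; the only thing that changes is how many values exist at the same time.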
Caching
A similar example came up while talking to someone about caches. The challenge was:
Make a function that returns the nth number of the Fibonacci sequence
Here were the results:
hyperfine --runs 10 "python round_2.py"
Benchmark 1: python round_2.py
  Time (mean ± σ):      28.8 ms ± 1.2 ms    [User: 6.1 ms, System: 4.7 ms]
  Range (min … max):    27.2 ms … 30.4 ms    10 runs
hyperfine --runs 10 "java round2"
Benchmark 1: java round2
  Time (mean ± σ):      29.823 s ± 0.599 s    [User: 29.151 s, System: 0.014 s]
  Range (min … max):    28.127 s … 30.111 s    10 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
It's easy to miss, but the Python result is in milliseconds while the Java result is in seconds, meaning Java is ~1000x slower here.
The Java code was:
round2.java
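public class round2 {
    // Returns long because fib(50) does not fit in a 32-bit int
    static long fib(int n) {
        if (n < 2) {
            return 1;
        }
        // No cache: the same sub-results are recomputed over and over
        return fib(n - 2) + fib(n - 1);
    }
    public static void main(String[] args) {
        System.out.println(fib(50));
    }
}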
and in python:
round_2.py
from functools import lru_cache
@lru_cache
def fib(n):
    if n < 2:
        return 1
    return fib(n-2) + fib(n-1)
print(fib(50))
So why is Python faster? Because it computes each result once. Java has to repeat the same steps over and over again, where Python can re-use old values whenever the input is the same. This is called memoization, and it's a common way to speed up repeated work.
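To make the re-use visible, here is a minimal sketch that counts how many times the function body runs with and without a cache. The hand-written dictionary cache is a simplified stand-in for what `lru_cache` does, not how `functools` implements it, and the names `fib_plain`, `fib_memo`, and the counter lists are made up for this example. It uses n = 20 so the uncached version finishes quickly.

```python
def fib_plain(n, counter):
    """Uncached version: recomputes the same sub-results repeatedly."""
    counter[0] += 1  # count every time the body runs
    if n < 2:
        return 1
    return fib_plain(n - 2, counter) + fib_plain(n - 1, counter)


def fib_memo(n, counter, cache=None):
    """Cached version: each distinct n is computed only once."""
    if cache is None:
        cache = {}
    if n in cache:
        return cache[n]  # re-use an old value, no recomputation
    counter[0] += 1  # count only real computations
    if n < 2:
        result = 1
    else:
        result = fib_memo(n - 2, counter, cache) + fib_memo(n - 1, counter, cache)
    cache[n] = result
    return result


plain_calls = [0]
memo_calls = [0]
plain_result = fib_plain(20, plain_calls)
memo_result = fib_memo(20, memo_calls)

print("same answer:", plain_result == memo_result)
print("computations without cache:", plain_calls[0])
print("computations with cache:", memo_calls[0])
```

The gap between the two counts grows exponentially with n, which is why the uncached Java version takes seconds at n = 50 while the cached Python version barely does any work.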
Get to the point
The point of these anecdotes is that Python is still a slow language, but that often doesn't matter; understanding theory does. If you know how to cache, how to avoid creating new objects, or why memory allocations matter, you will basically always write faster code than people who don't, in whatever language they use. One of my courses in university involved taking C++ code that on average ran 5-10x slower than the provided Python code and speeding it up until it was on par with the Python. If we compare a language like Java against Python, you're looking at a performance increase of 4-10x, maybe more or less in some cases. That's great, but if you know how to cache recursive lookups, you're looking at 1000x speedups. In a real-world library I worked on the other day, an algorithm change sped up the code ~670x. In other words, knowing some basic theory can make all the difference. So, if you're just starting out, pick whatever language feels best, and just start learning.
When Languages Matter
So never use Java then? No, well maybe, that's up to you. The point of this post is that people focus so much on choosing a language that they often miss the forest for the trees. If you don't know how to program, a language won't save you. Learning to program takes time and effort, and which language you choose for learning the essentials is irrelevant. There are a few topics in computer science that, once you're aware of them, can completely change the code you write. The approach will almost always make the biggest difference.
That being said, when you're dealing with very specific problems, particularly when you're running out of optimizations you know, it's worth looking into other languages. For example, it is possible to write games in Python, but for large 3D AAA games you need a language with better performance characteristics. This is because a game has a fixed time budget per frame (about 16 ms at 60 FPS), so individual operations need sub-millisecond latency to maintain framerates. You can also only use those languages properly if you understand the theory necessary to write them.
Additionally, if you do end up with a successful service or piece of software, there is always the option to upgrade incrementally, or to rewrite once you understand the domain better. Often a handful of operations take up most of your runtime when you first start out, and optimizing those small areas makes a world of difference. You can also often find more exotic ways to improve performance. Most machine learning systems, for example, use NumPy, a Python library whose core is written in C. Approaches like this exist in tons of languages (V8 with JS, FFI, etc.).
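One way to find that handful of expensive operations in Python is the standard library's `cProfile` module. The sketch below is hypothetical: `slow_lookup`, `fast_lookup`, and `main` are stand-ins for real application code, chosen so that one function clearly dominates the runtime.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    # Scans the whole list for every target: work grows with len(items) * len(targets)
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    # Same answer, but set membership checks take roughly constant time
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def main():
    items = list(range(2_000))
    targets = list(range(0, 4_000, 2))
    return slow_lookup(items, targets), fast_lookup(items, targets)


profiler = cProfile.Profile()
result = profiler.runcall(main)  # run main() under the profiler

# Collect the report in a string, sorted by cumulative time per function
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(8)
report = stream.getvalue()
print(report)
```

In the printed report, the function with the largest cumulative time below `main` is the one worth optimizing first; here that should be `slow_lookup`, and swapping the list for a set is the kind of small, targeted change the paragraph above describes.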
Conclusions
So, stop procrastinating by reading arguments online about which language to start learning, and just start learning one. If the simplicity of Python feels good, do that; if you like the idea of using C, great; if you like Java, go for it. If you end up habitually running into issues, identify the problems in your approach and improve. When you think you have a good reason to, start adding other languages to your toolbelt. I've started doing this recently and it's great, but starting with Python was also probably the best choice I made.