Intro
High school CTF team View Source and I participated in AmateursCTF 2023, placing 2nd both overall and in the student division. Although there were over 64 challenges to tackle throughout the four-day submission period, I personally focused on the OSINT and algorithm categories. Within these categories lay an interesting challenge: the gcd-query series, which I solved with an implementation of a very special algorithm. This was my process (paired with a lengthy analogy)!
gcd-query-v1
I wonder if this program leaks enough information for you to get the flag with
less than 2048 queries… It probably does. I’m sure you can figure out how.
nc amt.rs 31692
We’re initially provided with an attachment main.py and a remote server amt.rs:31692. The server component contains the following:
Let’s go over step-by-step what this server is up to:
- For ten iterations, a long x is created by pycrypto’s getRandomInteger(n), which returns a random integer up to n bits in length. Here n = 2048; this is an absolutely mindbendingly large number, up to 617 digits long! You absolutely do not want to see what 617 digits looks like in decimal.
- For each iteration of x, the user gets prompted to enter two integers n and m. Once the assertion that m > 0 is passed, n and m are passed into a function gcd(x + n, m), which returns the greatest common divisor of x + n and m. This occurs for 1412 iterations.
- After the iterations have completed, the user is then prompted to guess the value of x. If the guess is correct, the next iteration of x begins. This process is repeated nine more times until the flag is printed.
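To experiment with this protocol offline, here is a minimal local stand-in for one round of the server as described in the list above. It is a sketch, not the challenge's main.py: the class name GcdOracle and its methods are hypothetical, and Python's secrets.randbits stands in for pycrypto's getRandomInteger.

```python
import math
import secrets


class GcdOracle:
    """Hypothetical local stand-in for one round of the described server."""

    def __init__(self, bits=2048, max_queries=1412):
        # secrets.randbits is an assumption standing in for getRandomInteger(bits).
        self._x = secrets.randbits(bits)
        self.queries_left = max_queries

    def query(self, n, m):
        """Return gcd(x + n, m), enforcing m > 0 and the query budget."""
        assert m > 0
        if self.queries_left <= 0:
            raise RuntimeError("out of queries")
        self.queries_left -= 1
        return math.gcd(self._x + n, m)

    def guess(self, candidate):
        """Check a final guess for x."""
        return candidate == self._x


oracle = GcdOracle(bits=64, max_queries=3)
print(oracle.query(0, 2 * 3 * 5 * 7))  # some divisor of 210
```

The only things a player can vary are the arguments to query and the final guess, which is exactly the attack surface discussed next.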
Here is a quick visual depicting what’s going on:
Paying close attention to the right side of this graphic, we can see that there are only a couple of specific points at which we can interact with the server: when we pick the n and m to send, and when we guess the value of x. The question now is: what values should we be picking for n and m to reveal the most information about x, and how do we use this information to obtain its actual value?
The Chinese Remainder Theorem
Recall: The modulo operation returns the remainder of Euclidean division (division with remainder) of one number by another. For example, $7 \bmod 3 = 1$.
However, this is different from the congruence modulo relation, represented by the congruence symbol $\equiv$ and often expressed as $a \equiv b \pmod{n}$. When two numbers $a$ and $b$ are congruent modulo $n$, it means that:
- $a$ and $b$ have the same remainder when divided by $n$
- $a - b$ is divisible by $n$ (i.e. $n \mid (a - b)$)
- There is an integer $k$ such that $a = b + kn$
As such, $7 \equiv 1 \pmod{3}$ is true, but $7 = 1$ is obviously false.
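The three equivalent readings of congruence can be checked directly in Python. This is a small illustrative sketch; the helper name is_congruent is made up for this example.

```python
def is_congruent(a, b, n):
    """True when a is congruent to b modulo n, checked three equivalent ways."""
    same_remainder = a % n == b % n          # same remainder when divided by n
    divisible = (a - b) % n == 0             # n divides a - b
    k, r = divmod(a - b, n)
    has_k = r == 0 and a == b + k * n        # a = b + k*n for some integer k
    # All three formulations must agree with each other.
    assert same_remainder == divisible == has_k
    return same_remainder


print(is_congruent(17, 5, 12))  # True
print(is_congruent(17, 5, 7))   # False
```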
We start with a concept called a “system of congruences.” A system of congruences is a set of equations of the form $x \equiv a_i \pmod{n_i}$, where $x$, $a_i$, and $n_i$ are integers. The values $n_i$ are called the moduli of the system. Here’s a quick example of this:
$$\begin{aligned} x &\equiv a_1 \pmod{n_1} \\ x &\equiv a_2 \pmod{n_2} \\ x &\equiv a_3 \pmod{n_3} \end{aligned}$$
In this system, we have three congruences with moduli $n_1$, $n_2$, and $n_3$. The goal is to find a value for $x$ that satisfies all three congruences simultaneously.
Thus, we can apply the Chinese Remainder Theorem:
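Chinese Remainder Theorem: Given pairwise coprime integers $n_1, \dots, n_k$ and arbitrary integers $a_1, \dots, a_k$, the system of simultaneous congruences
$$x \equiv a_1 \pmod{n_1}, \quad \dots, \quad x \equiv a_k \pmod{n_k}$$
has a solution, and the solution is unique modulo $N = n_1 n_2 \cdots n_k$.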
Note: Although the Chinese Remainder Theorem is often stated with pairwise coprime moduli (meaning that for a set of moduli $n_1, \dots, n_k$, $\gcd(n_i, n_j) = 1$ for all $i \neq j$), it can be extended to non-coprime moduli. However, doing so does not guarantee a solution; this will become increasingly relevant as we get towards our implementation process.
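To make the theorem concrete, here is a minimal sketch of the textbook construction for pairwise coprime moduli. The function name crt is just a label for this example, and the numbers are the classic 3, 5, 7 system rather than anything from the challenge. It assumes Python 3.8+ for math.prod and for pow with a negative exponent (modular inverse).

```python
from itertools import combinations
from math import gcd, prod


def crt(remainders, moduli):
    """Solve x = r_i (mod n_i) for pairwise coprime n_i; returns (x, N)."""
    for a, b in combinations(moduli, 2):
        if gcd(a, b) != 1:
            raise ValueError("moduli must be pairwise coprime")
    N = prod(moduli)
    x = 0
    for r, n in zip(remainders, moduli):
        Ni = N // n
        # pow(Ni, -1, n) is the inverse of Ni modulo n, which exists
        # because Ni and n are coprime.
        x += r * Ni * pow(Ni, -1, n)
    return x % N, N


print(crt([2, 3, 2], [3, 5, 7]))  # (23, 105)
```

The result reads as "x is 23, and every other solution differs from it by a multiple of 105".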
You may be asking: what the hell does this have to do with guessing the giant integer that we’ve been given? Well, I’ve concocted a little example here to demonstrate how we can use this theorem to our advantage.
The Modular Arithmetic Nerd’s Favorite Carnival Game
Let’s say little Bob over at the bottom right goes to a carnival game booth and is asked to guess a number on a ball behind the operator. Obviously, since we’re omnipotent observers in this fantastical 2D universe of cute little cartoon circle people, we already know the number, which we’ll call $x$. However, Bob doesn’t know shit. He’s really good at modular arithmetic though, so he’ll have a lot of fun with this one.
Bob’s told that he can give the operator a piece of paper with two integers of his choice: n and m. As long as m is above 0, the operator will always give him back a piece of paper with the result of gcd(x + n, m). However, the operator’s shift is about to end, and he estimates that he’ll accept only about three pieces of paper from Bob before he closes shop.
Bob goes back to his table. He’s flabbergasted. How in the world is he going to guess that number with only three pieces of information?
He rummages around his little noggin and recollects himself. Let’s see what he’s thinking:
Uh… thanks, I guess? Well, he has a good point, but since I guarantee that nobody read it (because it’s too long for the average CTF player’s attention span) I’ll give a brief TL;DR here.
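Bob’s saying that per the definition of a “greatest common divisor,” in the scenario $d = \gcd(a, b)$, both $d \mid a$ and $d \mid b$ are true. Since we’re given the function d = gcd(x + n, m), we can therefore say that $d \mid (x + n)$ and $d \mid m$.
We can introduce an integer $k$ into the mix and rewrite $d \mid (x + n)$ as $x + n = kd$. Let’s algebraify this up to get it into the state that we want:
$$x + n = kd \implies x = kd - n \implies x \equiv -n \pmod{d}$$
Replacing $d$ with the gcd() function:
$$x \equiv -n \pmod{\gcd(x + n, m)}$$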
Doesn’t that look very, very similar to the system of congruences that we were talking about earlier? Now, all we need to do is decide what values of $n$ and $m$ to pick.
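The key relation here is that every answer d = gcd(x + n, m) pins down x modulo d, with remainder -n. As a quick sanity check, the sketch below verifies this on random inputs; the helper residue_from_query and the small modulus m are made up for illustration.

```python
import math
import random


def residue_from_query(n, d):
    """Turn one answer d = gcd(x + n, m) into the congruence x = -n (mod d)."""
    return (-n) % d, d


m = 2 * 3 * 5 * 7 * 11 * 13  # small illustrative modulus
for _ in range(1000):
    x = random.getrandbits(64)
    n = random.randint(-1000, 1000)
    d = math.gcd(x + n, m)
    remainder, modulus = residue_from_query(n, d)
    # d divides x + n, so x leaves remainder -n modulo d.
    assert x % modulus == remainder

print(residue_from_query(-2, 154))  # (2, 154)
```

Note that the remainder comes out positive when n is negative, which is convenient when feeding it to a CRT solver.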
Bob’s decided that three attempts are nowhere near enough to do anything reasonable with a fixed offset $n$. He’s discovered something a bit more clever: what if you changed the value of $n$ every time? Doing so provides information about the offset of $x$ from 0 modulo that GCD. He’s selected the following values for $n$:
Note: Bob’s chosen negative values for $n$ because of the earlier relation established, $x \equiv -n \pmod{d}$ with $d = \gcd(x + n, m)$. Making $n$ negative creates positive remainders.
For $m$, Bob chooses a very large primorial:
For the $n$th prime number $p_n$, the primorial $p_n\#$ is defined as the product of the first $n$ primes:
$$p_n\# = \prod_{k=1}^{n} p_k$$
where $p_k$ is the $k$th prime number.
Primorials have the special property that, since they’re the product of the first $n$ primes, they’re guaranteed to have a lot of prime factors. When thrown into the gcd() function, this will give us tons of information about the prime factors of $x + n$ since we’re a lot more likely to get a hit (a miss would be if $\gcd(x + n, m) = 1$).
Bob’s ended up deciding on his primorial. He pulls out his laptop and calculates it with Python:
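One way such a primorial might be computed is with a simple sieve. This is a sketch under assumptions: the helper names primes_below and primorial_below are hypothetical, and the primorial is bounded by prime value here rather than by prime count.

```python
from math import isqrt, prod


def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly less than limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def primorial_below(limit):
    """Product of all primes strictly less than limit."""
    return prod(primes_below(limit))


print(primorial_below(14))  # 2 * 3 * 5 * 7 * 11 * 13 = 30030
```

Because the result is squarefree, gcd(x + n, m) reports each prime factor of x + n at most once, no matter how often it divides x + n.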
Bob’s now ready to go! He walks up to the operator and hands him his pieces of paper. The operator hastily hands him back three pieces of paper with the resulting GCDs:
Now he knows that:
and he can apply the Chinese Remainder Theorem to solve for $x$. Bob opens back up his laptop and runs the following code:
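As a hedged end-to-end sketch of Bob's three-paper strategy, the code below uses made-up values: a secret of 1234, offsets 0, -1 and -2, and the product of all primes below 1000 as m. None of these are the numbers from the story. Because the three GCDs can share factors, it uses a pairwise merge (the hypothetical helper crt_merge) that tolerates non-coprime moduli instead of the coprime-only textbook formula.

```python
from math import gcd


def crt_merge(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2); moduli may share factors."""
    g = gcd(m1, m2)
    if (r2 - r1) % g:
        raise ValueError("inconsistent congruences")
    lcm = m1 // g * m2
    step = m2 // g
    # Solve r1 + m1*t = r2 (mod m2) for t, after dividing through by g.
    t = 0 if step == 1 else (r2 - r1) // g * pow(m1 // g, -1, step) % step
    return (r1 + m1 * t) % lcm, lcm


secret_x = 1234  # hypothetical number on the ball
small_primes = [p for p in range(2, 1000)
                if all(p % q for q in range(2, int(p ** 0.5) + 1))]
m = 1
for p in small_primes:
    m *= p


def operator(n, m):
    """The carnival operator: answers gcd(x + n, m) for m > 0."""
    assert m > 0
    return gcd(secret_x + n, m)


residue, modulus = 0, 1
for n in (0, -1, -2):
    d = operator(n, m)
    # d divides x + n, so x = -n (mod d).
    residue, modulus = crt_merge(residue, modulus, (-n) % d, d)

print(residue)  # 1234
```

The answer is only determined modulo the combined modulus, so this works because the secret happens to be smaller than it; with a larger secret, Bob would need more papers or a luckier m.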
Bob’s got the number! Congratulations, Bob!
Implementation
Hopefully through this example, you’ve gained a bit of intuition on where CRT is derived from, why we chose those particular values, and why it works. Now, let’s apply this to the actual challenge.
Here is the script that I used to solve this challenge. It’s very straightforward and readable in comparison to other scripts I’ve seen, so I felt it was redundant to go through the step-by-step process. I’ve added comments to explain what’s going on.
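For readers who want to try the idea without a network connection, here is a rough offline sketch of the same approach at the challenge's scale (2048 bits, 1412 queries). It is not the original solve script: the server is replaced by a local lambda, random.Random stands in for getRandomInteger, and the helper names and the prime bound of 20000 are assumptions chosen for illustration.

```python
import random
from math import gcd, isqrt, prod


def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly less than limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def crt_merge(r1, m1, r2, m2):
    """Combine two congruences whose moduli may share factors."""
    g = gcd(m1, m2)
    if (r2 - r1) % g:
        raise ValueError("inconsistent congruences")
    step = m2 // g
    t = 0 if step == 1 else (r2 - r1) // g * pow(m1 // g, -1, step) % step
    lcm = m1 * step
    return (r1 + m1 * t) % lcm, lcm


def recover(query, num_queries, m):
    """Send n = 0, -1, -2, ... and fold every answer into one congruence."""
    residue, modulus = 0, 1
    for i in range(num_queries):
        d = query(-i, m)  # d divides x - i, so x = i (mod d)
        if d > 1:
            residue, modulus = crt_merge(residue, modulus, i % d, d)
    return residue, modulus


rng = random.Random(2023)      # seeded only to make the demo repeatable
x = rng.getrandbits(2048)      # local stand-in for the server's secret
m = prod(primes_below(20000))  # assumed primorial size
residue, modulus = recover(lambda n, m: gcd(x + n, m), 1412, m)
print(modulus.bit_length(), residue == x)
```

With 1412 consecutive offsets, every prime up to 1412 is guaranteed to divide one of the shifted values, and many larger primes hit by chance, which is what pushes the combined modulus past 2048 bits.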
Let’s run the script on the remote server:
We’ve solved gcd-query-v1!
gcd-query-v1: amateursCTF{probabilistic_binary_search_ftw}
gcd-query-v2
I thought that skittles1412’s querying system wasn’t optimized enough, so I
created my own. My system is so much more optimized than his! nc amt.rs 31693
Of course there’s a continuation. Let’s see what attachment we’re given now:
It seems that they haven’t changed much. The only things that are different are:
- getRandomInteger()’s n value has been reduced from 2048 to 128 bits (~39 digits)
- We no longer need to complete ten iterations of different random integers; now it’s only one iteration of a single random integer
- We only get 16 iterations of gcd() instead of 1412
Well, first step is to try and rerun the same script that we used for “gcd-query-v1” with some minor edits:
Well, that didn’t work. We’re correctly parsing input and a number is being generated, but for some reason the server is telling us to “get better lol”.
I added some print statements to see what we were getting in our moduli and remainder arrays:
Wow, check out that moduli array… that’s not even nearly enough prime factors to accurately apply CRT. Let’s increase the primorial then for increased chances:
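To see why a bigger primorial helps, the sketch below measures how many bits of x the combined modulus covers after 16 queries for a few primorial sizes. Everything here is illustrative: the helper coverage_bits, the seeded stand-in secret, and the bounds are assumptions, and math.lcm needs Python 3.9+. Recovery is only unambiguous once the coverage exceeds the 128 bits of x.

```python
import random
from math import gcd, isqrt, lcm, prod


def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly less than limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def coverage_bits(x, num_queries, bound):
    """Bit length of the combined modulus learned from num_queries answers."""
    m = prod(primes_below(bound))
    known = 1
    for i in range(num_queries):
        # Each answer gcd(x - i, m) fixes x modulo that value.
        known = lcm(known, gcd(x - i, m))
    return known.bit_length()


rng = random.Random(0)   # seeded stand-in for the server's 128-bit secret
x = rng.getrandbits(128)
for bound in (100, 10_000, 100_000):
    print(bound, coverage_bits(x, 16, bound))
```

Growing the bound can never reduce the coverage, since a larger primorial is a multiple of the smaller one, but whether 16 queries reach 128 bits still depends on luck with the larger primes.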
Let’s try running the script again:
We’ve managed to solve the entire challenge with only 16 queries!
gcd-query-v2: amateursCTF{crt_really_is_too_op...wtf??!??!?!?must_be_cheating!!...i_shouldn't've_removed_query_number_cap.}
Afterword
Thanks to everyone from les amateurs for hosting this CTF! I had a lot of fun solving these challenges and I hope to see more from you guys in the future. I’d also like to credit Quasar, SuperBeetleGamer, and flocto for helping me wrap my head around CRT in general throughout the process of writing this (because I almost always learn along the way). I hope you learned something like I did!