Something Kinda Cool
Ever since I got a job in the tech industry I wanted to become independent… surely a universal sentiment. I find it very achievable - I just put up a service which people will use and I ask for payment in return. It’s not as hard as freelancing, no need for an accountant this time. I know about the cloud, so high availability is sorted out. I know about programming, so I can turn an idea into software. I am motivated.
But what idea? According to my bachelor’s education I’m supposed to do a SWOT analysis and all sorts of business procedures and then I’d have something to work on, but I’ll do something different. I’ve always admired black hat hackers - people who break stuff for profit - and ripping off the government is easier.
My country’s civil registry has a form that lets people query their ID card status, as my country issues IDs. The identifier assigned to you at birth plus the number of your ID - that is, the physical card - are used for authentication on some platforms, like banks. It’s kinda like an SSN: it should be kept secret. This publicly available form has a CAPTCHA and is no good for programmatic use, unless you find a way to break that CAPTCHA. Well, for my master’s I had to do a meta-study on breaking CAPTCHAs using AI. Guess who’s gonna give it a shot?
Actually, I already implemented it. But it’s kinda primitive and it has about a 50% probability of getting the CAPTCHA right. My segmentation is bad and I should be ashamed, because the CAPTCHA software they used is open source and I managed to completely reverse the distortion.
Traditional CAPTCHA solving - that is, without the use of neural networks - consists of three stages:
- Preprocessing
- Segmentation
- Recognition
During preprocessing you try to make your image as clean as possible, free from noise and stuff like that. You also binarise it - turn it black and white and nothing else. Segmentation, the hardest stage, is where you cut out each character for later feeding into the recognition engine. Recognition is easy: most of the authors I read implemented a simple KNN model for it.
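To make the three stages concrete, here is a toy sketch in plain Python. It is not my actual solver: the tiny hand-made image, the threshold of 128, the empty-column splitting and the row-count features are all assumptions picked to keep the example short, and real CAPTCHAs with touching or distorted characters are exactly where this naive segmentation falls apart.

```python
from collections import Counter

def binarise(gray, threshold=128):
    """Preprocessing: map 0-255 grayscale pixels to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Segmentation: split on columns with no ink.

    Returns (start, end) column spans, end exclusive. This only works
    when characters do not touch, which is an assumption of this sketch.
    """
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

def features(binary, span):
    """Hypothetical feature vector: ink count per row, then the glyph width."""
    glyph = [row[span[0]:span[1]] for row in binary]
    return [sum(row) for row in glyph] + [span[1] - span[0]]

def knn_predict(train, vec, k=1):
    """Recognition: majority vote among the k nearest labelled vectors."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = sorted(train, key=lambda item: dist(item[0], vec))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def solve(gray, train):
    """Run all three stages and return the recognised string."""
    binary = binarise(gray)
    return "".join(
        knn_predict(train, features(binary, span))
        for span in segment_columns(binary)
    )

# Made-up 5x7 image: a vertical bar, a gap, then an "L" shape (0 = ink).
W, B = 255, 0
GRAY = [
    [W, B, W, B, W, W, W],
    [W, B, W, B, W, W, W],
    [W, B, W, B, W, W, W],
    [W, B, W, B, W, W, W],
    [W, B, W, B, B, B, W],
]

# Made-up training set: one labelled feature vector per character.
TRAIN = [
    ([1, 1, 1, 1, 1, 1], "I"),
    ([1, 1, 1, 1, 3, 3], "L"),
]

print(solve(GRAY, TRAIN))
```

Splitting on empty columns is the simplest segmentation there is, which is why it is in the sketch. Once the distortion makes characters overlap, there are no empty columns left to split on.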
I basically cheated and still failed. Oh well. It takes from 10 to 60 seconds to successfully go through the form ONCE because of the bad segmentation. That is unacceptable; I was aiming for about 5 seconds maximum. The library they used is really shitty though. It doesn’t exclude the upper case “i” and lower case “L” from its character set, so even if you have a perfect recognition score there’s no way to tell those two apart.