Cracking CAPTCHA

Tormod · October 31, 2006

When people sign up at Hypography (and other forums and sites) they have to fill in a form with letters and numbers from a graphic. This graphic is often obscured to a point where it is hard to see what it actually says.

I learned from Wikipedia that CAPTCHA is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart".

Here is an example:

What I'm wondering about is how do you set up to crack a thing like this? Obviously it has been done, because we see bots sign up at Hypography, and I'm sure we're not the only target.

Any ideas?

C1ay · October 31, 2006

What I'm wondering about is how do you set up to crack a thing like this? Obviously it has been done, because we see bots sign up at Hypography, and I'm sure we're not the only target.

Any ideas?

Methinks bots like Googlebot and Slurp possibly phone home when they encounter such requests, and a human at the other end helps them out. They only need human help for the initial registration; after that, all they need is their username and password for future visits.

Just thinking out loud,

TheFaithfulStone · October 31, 2006

different kinds of OCR software choke on different kinds of things.

the ocr on my super-high end hp scanner will just spit out garbage if the letter forms aren't super clean, but it can read distorted ones.

the cheapo stuff on my epson at home does pretty good on dirty letters (like a fax or something) but the baseline had best be within a degree or two of level, or no dice.

check out "whatthefont" for an example of how an automated program can id fonts and such.

TFS

Buffy · October 31, 2006

Call center software could handle this perfectly along the lines of what C1ay described, but would not require any knowledge of any particular language.

Great for outsourcing!

My name is Jane, how service may I you,

Buffy

InfiniteNow · October 31, 2006

My name is Jane, how service may I you,

Yes, you're right... I am a MCP.

FYI - Craig's post does not appear in this thread. :)

Jay-qu · October 31, 2006

I have seen software that creates blogs on Blogger and posts articles and links. It is fully automated after setup, except for this step, so the human just gets shown a picture on the screen and enters what it says. That's it.

I don't think this means that it can't be cracked, but it is probably just easier not to bother. With one that is fairly simple like the above (black and white, no orientation changes), I think it could be cracked with some funky software.

CraigD · November 4, 2006

I don’t think that code to crack CAPTCHA, BAFFLETEXT, and similar anti-spam tools is widely implemented, if at all; as previous posters have noted, it’s likely cheaper to employ a human to do the task. I read in journalist Leo Bruno’s 11/2003 SciAm article Innovations: Baffling the Bots that some academics have worked on such schemes as “a kind of mind sport”, but I suspect that such work hasn’t found its way from the academic to the commercial world.

:doh: Hypothetically speaking as a greedy hacker, if I intended to write such a program, I’d not approach it as the high-minded academic exercise in AI these academics have, but as a reverse engineering project. A CAPTCHA generator takes a simple random text parameter and some random numeric parameters, and generates a graphic from this data. By knowing the range of possible parameters, and using a “fit”-measuring algorithm, I suspect one could write a program to efficiently find the parameters for a particular CAPTCHA graphic, including the text. It likely wouldn’t be necessary to truly reverse engineer the CAPTCHA generator, only to have your own copy with which to generate graphics to compare to the target graphic.
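A toy illustration of the "generate and compare" idea described above, not a working attack on any real CAPTCHA. Everything here is an assumption made for the sketch: the three 3x3 glyphs, the `render` function standing in for "your own copy" of the generator, a single horizontal-shift parameter, and a pixel-mismatch count as the fit measure.

```python
from itertools import product

# Hypothetical 3x3 glyphs; a real generator would use fonts and distortions.
GLYPHS = {
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
}

def render(text, shift, width=12):
    """Stand-in for a local copy of the generator: draw text with a horizontal shift."""
    rows = ["", "", ""]
    for ch in text:
        for r in range(3):
            rows[r] += GLYPHS[ch][r] + "."
    return [("." * shift + row).ljust(width, ".")[:width] for row in rows]

def mismatch(a, b):
    """Fit measure: number of pixels that differ between two renderings."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def best_fit(target, length=2, max_shift=3):
    """Try every (text, shift) pair and keep the one whose rendering fits best."""
    candidates = (
        ("".join(chars), shift)
        for chars in product(GLYPHS, repeat=length)
        for shift in range(max_shift + 1)
    )
    return min(candidates, key=lambda c: mismatch(render(*c), target))

# Target: a rendering with unknown parameters plus one pixel of noise.
target = render("TO", 2)
target[0] = target[0][:11] + "#"
print(best_fit(target))
```

The exhaustive search grows exponentially with text length and parameter count, so anything beyond a toy would need a smarter search; the sketch only shows why a known generator plus a fit measure is enough in principle.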

Given how much easier it is to use humans, and the possibility of legal action, I doubt that anyone will try this soon, though it never pays to underestimate human industry and ingenuity when it comes to making a $buck$.

Drip Curl Magic · November 4, 2006

I've been noticing an increase in advertisement bot sign-ups.

I've been wanting to suggest a CAPTCHA, but I had no idea what the name of it was.

I was planning on finding out.... but I guess T is already on it. Bravo.

Boerseun · November 5, 2006

It seems that humans have more of a problem deciphering individual CAPTCHA letters than computers do. Humans beat computers only at telling the letters apart from one another: if the letters are intertwined or tangled, the computer's busted. I personally think that the background and the letters are too far apart in the colour range, so that the letters themselves could be easily 'lifted' out of the background to be deciphered. Photoshop's "magic wand" tool is a simple example of how this is achieved programmatically.
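A minimal sketch of that "lifting" step, assuming an 8-bit grayscale image held as a list of rows and a fixed threshold of 128. The pixel values are made up for illustration, and the magic wand itself works differently (it grows a region from a clicked pixel); this is only the simplest colour-separation case.

```python
def lift_foreground(pixels, threshold=128):
    """Return a binary mask: 1 where a pixel is darker than the threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]

# Hypothetical image: dark strokes (about 20-35) on a light, slightly noisy background.
image = [
    [240, 235,  30, 228, 245],
    [231,  25, 222,  35, 238],
    [244,  28,  33,  31, 229],
    [236,  22, 241,  27, 233],
]

mask = lift_foreground(image)
for row in mask:
    print("".join("#" if bit else "." for bit in row))
```

When foreground and background occupy separate parts of the colour range, one comparison per pixel is enough, which is the weakness being pointed out.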

In my opinion, to completely baffle the computers, the letters should be filled in with a texture layer that's from a random graphic, and the background should be also filled in with a random graphic. The two should stand out pretty visibly for the human eye, but the random colours assigned to neighbouring pixels, as well as the random colours inside the text, will confuse the computer no end. There'd be no easy way to pick it up programmatically.

moo · November 5, 2006

Somewhere in the board software is a table/etc. used to check whether the user has entered the correct code for the image.

I'd guess either it's not that hard to find/crack, or else the tables have been dumped for the major brands (such as vBulletin) and shared among spammers.

My 2 cents. :shrug:

moo

CraigD · November 5, 2006

In my opinion, to completely baffle the computers, the letters should be filled in with a texture layer that's from a random graphic, and the background should be also filled in with a random graphic.

That’s a bit like how BAFFLETEXT works.

To my thinking, simple language and common knowledge puzzles, such as “Enter a word meaning the opposite of ‘good’”, offer hard-to-defeat CAPTCHAs, and are easier for the less graphically capable to implement. Unfortunately, such tests also weed out a considerable number of human beings.
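A rough sketch of how such a puzzle check might look. The question bank, the accepted answers, and the function names are all hypothetical, chosen only to illustrate the idea.

```python
import secrets

# Hypothetical question bank; the accepted answers are assumptions for illustration.
PUZZLES = [
    ("Enter a word meaning the opposite of 'good'", {"bad", "evil", "poor", "wicked"}),
    ("What colour is a clear daytime sky?", {"blue"}),
]

def pick_puzzle():
    """Choose a puzzle at random; return its index and the question text."""
    index = secrets.randbelow(len(PUZZLES))
    return index, PUZZLES[index][0]

def check_answer(index, answer):
    """Accept any listed answer, ignoring case and surrounding whitespace."""
    return answer.strip().lower() in PUZZLES[index][1]

index, question = pick_puzzle()
print(question)
```

A small fixed bank like this can be enumerated by hand once and replayed by a bot, so in practice the questions would have to be numerous or generated.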

ronthepon · November 5, 2006

How are the images correlated to the words to be typed anyway? Is it stored away in some kind of a database?

Because moo's thoughts seem much more realistic than AI cracking these twisted word groups.

C1ay · November 5, 2006

I personally think that the background and the letters are too far apart in the colour range, so that the letters themselves could be easily 'lifted' out of the background to be deciphered.

I have actually encountered such tests that resembled a test for color blindness. On one that I recall the text was made of colored orange dots on a background made of red and yellow dots. A color blind person would not have been able to make the text out.

Qfwfq · November 7, 2006

Forms of daltonism other than the red-green one are exceedingly rare, so this problem could be avoided. At worst, a phone number could be offered for the rare people unable to take the test, and poor Tormod ;) would only have to guess whether he was hearing a voice synthesizer.

To avoid what Moo says, I would expect the character sequences to be generated randomly and the images from them.

A bot might be designed to test various areas of various sizes for a few types of average in order to spot the difference between background and typeface; this would have to be confounded by having variability at all scales. The boundary would then be recognizable only as a more abrupt change, requiring a more sophisticated 'bot.

Once the face has been lifted out from the background, you want to avoid things such as separation too. The image in the first post has only two of the characters in contact, and barely so. A relatively simple topological analysis would pick them out. Have characters with separated strokes as well as ligatures between different characters.
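One simple form that topological analysis could take is connected-component labelling on the lifted mask. The sketch below assumes a binary mask and 4-connectivity; the tiny glyph shapes are invented for illustration.

```python
from collections import deque

def connected_components(mask):
    """Group 4-connected foreground pixels; return one coordinate list per blob."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                blob = []
                while queue:
                    cy, cx = queue.popleft()
                    blob.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(blob)
    return components

def to_mask(rows):
    """Convert '#'/'.' strings into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

separate = to_mask(["#.#..##", "###..#.", "#.#..##"])  # two glyphs with a gap
touching = to_mask(["#.#..##", "######.", "#.#..##"])  # same glyphs joined by a ligature
broken = to_mask(["#", ".", "#", "#"])                 # one glyph with a detached stroke

print(len(connected_components(separate)),
      len(connected_components(touching)),
      len(connected_components(broken)))
```

Separated glyphs come out as one blob each, while a ligature merges two glyphs into one blob and a detached stroke splits one glyph into two. In both of those cases the blob count no longer matches the character count, which is the point of the suggestion.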

moo · November 7, 2006

Hmmm... are the individual character images stored in a file (dll etc.) and then assembled for display? If so, anyone with a copy of vBulletin could rip 'em for image comparisons.

[EDIT] Btw, is it really a good idea to hash this stuff out in an open forum?

moo

C1ay · November 7, 2006

Hmmm... are the individual character images stored in a file (dll etc.) and then assembled for display?

No, they are generated on the fly by programs like ImageMagick and the image generated for a given text, usually created at random, varies as the image generator applies different quantities of effect....
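A sketch of the server side of that arrangement, under stated assumptions: the `ChallengeStore` class, its in-memory dictionary, and the token format are hypothetical and do not reflect how vBulletin or any other board actually does it. The random text would be handed to the image generator, and only the opaque token would go to the browser.

```python
import secrets

# Letters and digits without look-alikes such as O/0 and I/1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

class ChallengeStore:
    """Hypothetical in-memory store of pending challenges, keyed by a random token."""

    def __init__(self):
        self._pending = {}

    def issue(self, length=6):
        """Create a fresh random text; the text goes to the image renderer only."""
        token = secrets.token_urlsafe(16)
        text = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._pending[token] = text
        return token, text

    def verify(self, token, answer):
        """Check an answer once; the challenge is discarded whether or not it matches."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        return secrets.compare_digest(expected.encode(), answer.strip().upper().encode())

store = ChallengeStore()
token, text = store.issue()
print(store.verify(token, text.lower()))  # correct answer, case-insensitive
print(store.verify(token, text))          # the same token cannot be reused
```

Because each text is random and single-use, there is no fixed table of image/answer pairs to dump and share.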

moo · November 7, 2006

Ah ok. Thanks. ;)

Well, here's part of the problem...

http://sam.zoy.org/pwntcha/

moo
