Science Forums


Posted

When people sign up at Hypography (and other forums and sites), they have to type the letters and numbers shown in a graphic into a form. This graphic is often obscured to the point where it is hard to see what it actually says.

I learned from Wikipedia that CAPTCHA is an acronym for "completely automated public Turing test to tell computers and humans apart".

Here is an example:

What I'm wondering about is how you would set about cracking a thing like this. Obviously it has been done, because we see bots sign up at Hypography, and I'm sure we're not the only target.

Any ideas?


Posted
Quote: "What I'm wondering about is how do you set up to crack a thing like this? Obviously it has been done, because we see bots sign up at Hypography, and I'm sure we're not the only target. Any ideas?"

Methinks bots like Googlebot and Slurp possibly phone home when they encounter such requests, and a human at the other end helps them out. They only need human help for the initial registration; after that, all they need is their username and password for future visits.

Just thinking out loud,

Posted

Different kinds of OCR software choke on different kinds of things.

The OCR on my super-high-end HP scanner will just spit out garbage if the letter forms aren't super clean, but it can read distorted ones.

The cheapo stuff on my Epson at home does pretty well on dirty letters (like a fax or something), but the baseline had best be within a degree or two of level, or no dice.
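The baseline limitation above is why many OCR pipelines try to level the text before recognising it. One common way to estimate the tilt (swapped in here purely as an illustration; nothing in the post says what either scanner actually does) is a projection profile: rotate the ink pixels through a range of candidate angles and keep the angle whose row histogram is most sharply peaked. A minimal Python sketch, assuming the ink pixels are already available as (x, y) coordinates; the function name, the ±10° search range and the 0.5° step are hypothetical choices:

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Projection-profile skew estimate for ink pixel coordinates.

    For each candidate angle, rotate the points and bin their rotated
    y-coordinates; the angle whose bins are most concentrated (largest
    sum of squared bin counts) is returned, in degrees.
    """
    best_angle, best_score = 0.0, -1
    n = int(round(max_angle / step))
    for i in range(-n, n + 1):
        angle = i * step
        t = math.radians(angle)
        bins = Counter(round(y * math.cos(t) - x * math.sin(t)) for x, y in points)
        score = sum(count * count for count in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


if __name__ == "__main__":
    # Synthetic baseline: ink pixels along a line tilted by 5 degrees.
    slope = math.tan(math.radians(5.0))
    points = [(x, x * slope) for x in range(100)]
    print(estimate_skew(points))  # 5.0
```

Once the angle is known, the image can be rotated back by that amount so the baseline sits within the "degree or two of level" a fussy OCR engine needs.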

 

Check out "whatthefont" for an example of how an automated program can identify fonts and such.

TFS

Posted

Call center software could handle this perfectly along the lines of what C1ay described, but would not require any knowledge of any particular language.

Great for outsourcing!

My name is Jane, how service may I you,

Buffy

Posted

I have seen software that creates blogs on Blogger and posts articles and links. It is fully automated after setup, except for this step: the human is just shown a picture on the screen and enters what it says, and that's it.

I don't think this means that it can't be cracked; it is probably just easier not to bother. With a fairly simple one like the example above (black and white, no orientation changes), I think it could be cracked with some funky software.

Posted

I don’t think that code to crack CAPTCHA, BAFFLETEXT, and similar anti-spam tools is widely implemented, or implemented at all; as previous posters have noted, it’s likely cheaper to employ a human to do the task. I read in journalist Leo Bruno’s 11/2003 SciAm article Innovations: Baffling the Bots that some academics have worked on such schemes as “a kind of mind sport”, but I suspect that such work hasn’t found its way from the academic to the commercial world.

:doh: Hypothetically speaking as a greedy hacker, if I intended to write such a program, I wouldn’t approach it as the high-minded academic exercise in AI that these academics have, but as a reverse engineering project. A CAPTCHA generator takes a simple random text parameter and some random numeric parameters, and generates a graphic from this data. By knowing the range of possible parameters and using a “fit”-measuring algorithm, I suspect one could write a program to efficiently find the parameters for a particular CAPTCHA graphic, including the text. It likely wouldn’t be necessary to truly reverse engineer CAPTCHA, only to have your own copy of the generator to produce graphics to compare with the target graphic.
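The “fit” approach is essentially analysis by synthesis: render candidate images with your own copy of the generator and keep the parameters whose output differs least from the target. The Python below is a toy under loudly stated assumptions: the 3×5 glyphs, the four-character alphabet, the canvas size and the only parameters (the text plus an x/y offset) are hypothetical stand-ins for a real generator, and the fit measure is simply the number of mismatched pixels. Exhaustive search only works here because the parameter space is tiny.

```python
from itertools import product

# Hypothetical 3x5 bitmap glyphs standing in for the generator's font.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
}
WIDTH, HEIGHT = 16, 8  # assumed canvas size


def render(text, dx, dy):
    """Toy generator: draw `text` at offset (dx, dy) on a blank canvas."""
    canvas = [[0] * WIDTH for _ in range(HEIGHT)]
    for i, ch in enumerate(text):
        for r, row in enumerate(GLYPHS[ch]):
            for c, bit in enumerate(row):
                if bit == "1":
                    # Each glyph is 3 px wide, followed by a 1 px gap.
                    canvas[dy + r][dx + 4 * i + c] = 1
    return canvas


def mismatch(a, b):
    """Fit measure: number of pixels that differ (Hamming distance)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def fit(target, length=3):
    """Try every parameter combination; return (score, text, dx, dy) of the best."""
    best = None
    max_dx = WIDTH - (4 * length - 1)
    max_dy = HEIGHT - 5
    for chars in product(GLYPHS, repeat=length):
        text = "".join(chars)
        for dx in range(max_dx + 1):
            for dy in range(max_dy + 1):
                score = mismatch(render(text, dx, dy), target)
                if best is None or score < best[0]:
                    best = (score, text, dx, dy)
    return best


if __name__ == "__main__":
    target = render("1L7", 2, 1)
    target[0][0] ^= 1   # flip two pixels as stand-in "noise"
    target[7][15] ^= 1
    print(fit(target))
```

A real generator has far more parameters (fonts, warps, noise), so a practical version would need a smarter search than brute force; the sketch only shows the shape of the idea.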

 

Given how much easier it is to use humans, and the possibility of legal action, I doubt that anyone will try this soon, though it never pays to underestimate human industry and ingenuity when it comes to making a $buck$.

Posted

It seems that humans have more of a problem deciphering individual CAPTCHA letters than computers do. Where humans beat computers is in telling the individual letters apart: if the letters are intertwined or tangled, the computer's busted. I personally think that the background and the letters are too far apart in the colour range, so the letters themselves could easily be 'lifted' out of the background to be deciphered. Photoshop's "magic wand" tool is a simple example of how this is done programmatically.
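The “magic wand” idea amounts to a flood fill (region growing): start from a seed pixel and collect every connected pixel whose colour is within some tolerance of it. A rough Python imitation, not Photoshop's actual algorithm; the tolerance is assumed here to be the largest per-channel difference from the seed colour, and an image is a list of rows of (R, G, B) tuples:

```python
from collections import deque


def magic_wand(image, seed, tolerance):
    """Return the set of (row, col) pixels 4-connected to `seed` whose colour
    differs from the seed colour by at most `tolerance` in every channel."""
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in selected:
                if max(abs(a - b) for a, b in zip(image[nr][nc], ref)) <= tolerance:
                    selected.add((nr, nc))
                    queue.append((nr, nc))
    return selected


if __name__ == "__main__":
    BG1, BG2, INK = (240, 240, 240), (230, 235, 228), (20, 20, 60)
    img = [[BG1, BG2, BG1, BG1],
           [BG2, INK, INK, BG1],
           [BG1, BG1, BG2, BG2]]
    background = magic_wand(img, (0, 0), tolerance=20)
    everything = {(r, c) for r in range(len(img)) for c in range(len(img[0]))}
    print(sorted(everything - background))  # [(1, 1), (1, 2)]
```

Seeding in a background corner and taking everything not selected as "letter" works when the colours are far apart, as the post suggests; enclosed background (the inside of an "O") is not reachable from the corner and would need a separate fill.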

 

In my opinion, to completely baffle the computers, the letters should be filled with a texture layer taken from a random graphic, and the background should also be filled with a random graphic. The two should stand out clearly to the human eye, but the random colours of neighbouring pixels, in the background and inside the text alike, will confuse the computer no end; there'd be no easy way to pick the text out programmatically.
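One way to read this proposal: take a mask of where the letters go, then fill letter pixels from one texture and background pixels from another. A minimal Python sketch; the random-noise textures are hypothetical stand-ins for the “random graphic”, and choosing textures that still leave the text readable to people is the hard part, which this does not address:

```python
import random


def noise_texture(height, width, rng, lo, hi):
    """Random RGB noise with every channel in [lo, hi]; stands in for a photo."""
    return [[tuple(rng.randint(lo, hi) for _ in range(3)) for _ in range(width)]
            for _ in range(height)]


def composite(mask, fg_texture, bg_texture):
    """Take letter pixels (mask value 1) from fg_texture, all others from bg_texture."""
    return [
        [fg_px if m else bg_px for m, fg_px, bg_px in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, fg_texture, bg_texture)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    mask = [[0, 1, 1, 0],
            [0, 1, 0, 0],
            [0, 1, 1, 0]]
    # Overlapping channel ranges: a single global colour threshold is not
    # guaranteed to separate letter pixels from background pixels.
    letters = noise_texture(3, 4, rng, 0, 160)
    background = noise_texture(3, 4, rng, 90, 255)
    for row in composite(mask, letters, background):
        print(row)
```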

Posted

Somewhere in the board software is a table (or similar) used to check whether the user has entered the correct code for the image.

I'd guess either it's not that hard to find and crack, or else the tables have been dumped for the major brands (such as vBulletin) and shared among spammers.

My 2 cents. :shrug:

moo

Posted
Quote: "In my opinion, to completely baffle the computers, the letters should be filled in with a texture layer that's from a random graphic, and the background should be also filled in with a random graphic."

That’s a bit like how BAFFLETEXT works.

To my thinking, simple language and common-knowledge puzzles, such as “Enter a word meaning the opposite of ‘good’”, make hard-to-defeat CAPTCHAs and are easier for the less graphically capable to implement. Unfortunately, such tests also weed out a considerable number of human beings.
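A question-and-answer check like this needs little more than a bank of prompts with their accepted answers, plus some normalisation of what the user types. A minimal Python sketch; the extra questions and the accepted-answer sets are hypothetical, and a real bank would have to be far larger, since a bot author could simply collect the answers to a small fixed set:

```python
import random

# Hypothetical question bank: prompt -> set of accepted (lower-case) answers.
QUESTIONS = {
    "Enter a word meaning the opposite of 'good'": {"bad", "evil", "poor"},
    "What colour is a clear daytime sky?": {"blue"},
    "How many legs does a cat have?": {"4", "four"},
}


def pick_question(rng=random):
    """Choose a prompt to show on the sign-up form."""
    return rng.choice(sorted(QUESTIONS))


def check_answer(question, answer):
    """Accept the answer if, ignoring case and surrounding spaces, it is listed."""
    return answer.strip().lower() in QUESTIONS[question]


if __name__ == "__main__":
    q = "Enter a word meaning the opposite of 'good'"
    print(check_answer(q, "  Bad "))  # True
    print(check_answer(q, "great"))   # False
```

The accepted-answer sets are where the "weeds out humans" problem shows up: any reasonable answer missing from the set (say, "wicked") fails a genuine person.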

Posted
Quote: "I personally think that the background and the letters are too far apart in the colour range, so that the letters themselves could be easily 'lifted' out of the background to be deciphered."

I have actually encountered such tests that resembled a test for color blindness. On one that I recall, the text was made of orange dots on a background of red and yellow dots. A color-blind person would not have been able to make the text out.

Posted

Forms of daltonism other than the red-green one are exceedingly rare, so this problem could be avoided by not relying on red-green contrast. At worst, a phone number could be offered for the rare people unable to take the test, and poor Tormod ;) would only have to guess whether he was hearing a voice synthesizer.

To avoid what Moo describes, I would expect the character sequences to be generated randomly, and the images to be generated from them.
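Generating the code at random for each visitor means there is no fixed table of image/answer pairs to dump: the server only has to remember the expected code for that session and forget it after one attempt. A minimal Python sketch using the standard `secrets` and `hmac` modules; the in-memory dictionary, the function names and the alphabet (which drops look-alike characters such as 0/O and 1/I/L) are illustrative assumptions, and drawing the image from the code is left out:

```python
import hmac
import secrets
import string

# Assumed alphabet: upper-case letters and digits minus look-alikes (0/O, 1/I/L).
ALPHABET = "".join(ch for ch in string.ascii_uppercase + string.digits
                   if ch not in "0O1IL")

# Hypothetical server-side store: session id -> the code that session must type.
_pending = {}


def new_challenge(session_id, length=6):
    """Create a fresh random code for this session and remember it.

    The caller renders the returned code as an image; the plain text is never
    sent to the browser.
    """
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[session_id] = code
    return code


def verify(session_id, answer):
    """Check an answer exactly once: the code is discarded either way."""
    expected = _pending.pop(session_id, None)
    if expected is None:
        return False
    # Constant-time comparison of bytes; case and surrounding spaces are ignored.
    return hmac.compare_digest(expected.encode(), answer.strip().upper().encode())


if __name__ == "__main__":
    code = new_challenge("session-123")
    print(verify("session-123", code))  # True
    print(verify("session-123", code))  # False: each code works only once
```

Discarding the code after one attempt stops a bot from guessing repeatedly against the same image; a real board would keep the store in its session or database layer rather than a module-level dictionary.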

 

A bot might be designed to sample areas of various sizes and compute a few types of average in order to spot the difference between background and typeface; to confound this, the image would need variability at all scales. The boundary would then be recognizable only as a more abrupt change, requiring a more sophisticated 'bot.
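The multi-scale averaging described here can be computed cheaply with a summed-area table (integral image), a standard technique swapped in for illustration: once the table is built, the mean of any rectangular window costs four lookups. A minimal Python sketch for a greyscale image given as a list of rows; the window sizes are arbitrary choices:

```python
def integral_image(img):
    """Summed-area table with a zero border: S[r][c] is the sum of img over
    rows 0..r-1 and columns 0..c-1."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + row_sum
    return S


def box_mean(S, r, c, size):
    """Mean of the size x size window whose top-left corner is (r, c)."""
    total = S[r + size][c + size] - S[r][c + size] - S[r + size][c] + S[r][c]
    return total / (size * size)


def multiscale_means(img, sizes=(1, 2, 4)):
    """For each window size, the mean brightness of every full window."""
    S = integral_image(img)
    h, w = len(img), len(img[0])
    return {
        size: [[box_mean(S, r, c, size) for c in range(w - size + 1)]
               for r in range(h - size + 1)]
        for size in sizes
    }


if __name__ == "__main__":
    # A dark stroke (0) filling the left half of a light background (100).
    img = [[0, 0, 100, 100] for _ in range(4)]
    means = multiscale_means(img)
    print(means[2][0])  # [0.0, 50.0, 100.0]
```

Windows lying entirely inside a stroke or the background give flat means at every size, while windows straddling an edge give intermediate values; background texture that varies at every scale, as suggested in the post, muddies exactly that distinction.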

 

Once the typeface has been lifted out of the background, you also want to avoid things such as clean separation between characters. The image in the first post has only two of the characters in contact, and barely so; a relatively simple topological analysis would pick the characters out. Better to have characters made of separated strokes, as well as ligatures between different characters.
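The "relatively simple topological analysis" could be connected-component labelling: group ink pixels that touch, and each group becomes a candidate character. A minimal Python sketch using breadth-first search on a binary mask (1 = ink, assumed already lifted out of the background); whether corner-to-corner contact counts as touching is a parameter:

```python
from collections import deque


def components(mask, diagonal=True):
    """Label connected ink pixels in a binary mask (1 = ink).

    Returns a list of pixel sets ordered left to right by leftmost column.
    With diagonal=True, pixels touching only at a corner are connected.
    """
    h, w = len(mask), len(mask[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = set()
    found = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and (r, c) not in seen:
                seen.add((r, c))
                comp, queue = set(), deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in steps:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                found.append(comp)
    return sorted(found, key=lambda comp: min(x for _, x in comp))


if __name__ == "__main__":
    mask = [[1, 1, 0, 0, 1, 0, 1],
            [1, 0, 0, 1, 1, 0, 1],
            [1, 1, 0, 0, 1, 0, 1]]
    print(len(components(mask)))  # 3
    touching = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(len(components(touching)), len(components(touching, diagonal=False)))  # 1 3
```

Characters that touch merge into one component and a character drawn with separated strokes splits into several, which is exactly what the suggested ligatures and broken strokes exploit.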

Posted

Hmmm... are the individual character images stored in a file (a DLL, etc.) and then assembled for display? If so, anyone with a copy of vBulletin could rip 'em for image comparisons.
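If glyph images could be ripped like that, "image comparison" could be as simple as template matching: compare a segmented character against each stored glyph and take the closest. A minimal Python sketch with hypothetical 3×3 templates, assuming the character has already been cut out, binarised and scaled to the template size (distorted text would also need alignment and tolerance for warping):

```python
# Hypothetical glyph templates, e.g. as if ripped from a copy of the board software.
TEMPLATES = {
    "T": ("111", "010", "010"),
    "L": ("100", "100", "111"),
    "U": ("101", "101", "111"),
}


def hamming(a, b):
    """Count differing pixels between two equal-sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, templates=TEMPLATES):
    """Return (label, distance) for the template closest to `glyph`."""
    return min(((label, hamming(glyph, t)) for label, t in templates.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # A slightly damaged "U": one pixel in the bottom row is missing.
    print(classify(("101", "101", "110")))  # ('U', 1)
```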

 

[EDIT] Btw, is it really a good idea to hash this stuff out in an open forum?

moo

Posted
Quote: "Hmmm... are the individual character images stored in a file (dll etc.) and then assembled for display?"

No, they are generated on the fly by programs like ImageMagick. The image generated for a given text (itself usually created at random) varies, because the image generator applies different amounts of each effect....
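Generating on the fly with random amounts of each effect might look something like the following minimal Python sketch, which only builds an ImageMagick command line for a random string with randomly chosen swirl, wave and implode amounts. It assumes the ImageMagick 6 style `convert` command (ImageMagick 7 calls it `magick`); the canvas size, font size and parameter ranges are arbitrary, and nothing here claims to reflect what any forum package actually does:

```python
import random
import shlex
import string


def captcha_command(rng, out="captcha.png", length=5):
    """Return (text, argv) for an ImageMagick command drawing a random string.

    Every call picks fresh distortion amounts, so the same text would come out
    looking different each time. The command is built here, not executed.
    """
    text = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    argv = [
        "convert", "-size", "200x70", "xc:white",
        "-fill", "black", "-pointsize", str(rng.randint(30, 40)),
        "-annotate", f"+{rng.randint(5, 25)}+{rng.randint(40, 55)}", text,
        "-swirl", str(rng.randint(-40, 40)),                      # degrees
        "-wave", f"{rng.randint(2, 6)}x{rng.randint(60, 120)}",  # amplitude x wavelength
        "-implode", f"{rng.uniform(0.0, 0.3):.2f}",              # implode factor
        out,
    ]
    return text, argv


if __name__ == "__main__":
    text, argv = captcha_command(random.Random())
    print(text)
    # With ImageMagick installed, subprocess.run(argv, check=True) would draw it.
    print(shlex.join(argv))
```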
