Pyrotex Posted December 13, 2006

KickAssClown asked the following question in a PM and I thought it might be interesting to many Hypo members: "From what I hear, NASA has a Zero failure policy for their software. That is, NASA's software is supposed to be bug free, and held to some of the most rigorous QA standards in the industry. Is this true?"

Well, yes and no. :(

NASA understands, as all top computer scientists do, that there can never be "bug-free" code except for relatively trivial applications (like small accounting systems, for example). They put all their fanciest QA effort only where the cost/benefit ratio justifies it. NASA defines several categories of software, with "Life Critical" at the top and COTS (Commercial Off The Shelf) near the bottom. The more "critical" and "expensive to use" an application is, the more intensive the Quality Assurance (QA) effort that goes into it. "Critical" refers to the severity of the consequences should an application fail. One-of-a-kind custom systems are permitted to be buggier than systems that have to be duplicated and used by a lot of people. Real-time systems used for training astronauts must be cleaner than other applications, because any time wasted is hideously expensive (moderately high consequences).

At the top, "Life Critical" (LC) status is given to applications that, should they fail, would kill people faster than anyone could intervene. For these (rare) programs, bug-free status is achieved by several brute-force strategies (in addition to elegant planning and design).

First, only a very few languages are permitted to be used--they are more sophisticated than assembly language, but less sophisticated than, say, FORTRAN. ;) They have been stripped of many control structures that programmers take for granted in other languages like C or Java. These languages are so dirt-simple, it is hard to make a mistake. :evil:

Second, these LC applications are NOT stored on drives and read into RAM by an operating system. The application is pre-loaded into RAM, along with all necessary data--and it stays there for the entire mission. No databases! Any data that is generated by the LC application and must be stored is placed in pre-determined blocks of RAM. Everything is pre-determined.

Third, all LC code development is a TEAM project. Programmers who value their "creativity", their "privacy", their "art" are removed from LC projects and put to work on less critical code. You cannot work on LC unless you are a smooth-running cog in the machinery. (Writing LC code is not fun.)

Fourth, redundancy. You put the same code in multiple processors and have them run simultaneously, so that if one processor fails the others keep going. You have an entirely different body of people independently write the same application from the same design and requirements, and verify it with the same tests. If the first LC version fails for some unthinkable reason, the second may succeed. (There's a sketch of the voting idea at the end of this post.)

As you can see, this produces very expensive code. Not even NASA goes to these extreme measures unless they can justify it. The level of QA used is tailored to the cost/benefit analysis results for each specific application.
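To make the redundancy point concrete, here is a minimal sketch in C of the kind of 2-out-of-3 majority voter that sits downstream of triply redundant channels. This is purely my illustration under assumed interfaces, NOT actual flight code, but it is written in the style LC standards enforce: fixed sizes, no heap, no recursion.

/* Hypothetical 2-out-of-3 voter for a triply redundant sensor reading.
   Illustrative only -- not actual NASA flight code. */
#include <stdint.h>

#define CHANNELS 3

typedef struct {
    int32_t value;   /* output chosen by the voter         */
    int     agreed;  /* 1 if at least two channels matched */
} vote_result_t;

/* Two readings "match" if they differ by no more than a fixed
   tolerance; real systems vote on a band, not exact equality. */
static int match(int32_t a, int32_t b, int32_t tol)
{
    int32_t d = (a > b) ? (a - b) : (b - a);
    return d <= tol;
}

vote_result_t vote3(const int32_t r[CHANNELS], int32_t tol)
{
    vote_result_t out = { 0, 0 };

    if (match(r[0], r[1], tol))      { out.value = r[0]; out.agreed = 1; }
    else if (match(r[0], r[2], tol)) { out.value = r[0]; out.agreed = 1; }
    else if (match(r[1], r[2], tol)) { out.value = r[1]; out.agreed = 1; }
    /* No two channels agree: agreed stays 0 and the caller
       drops to a safe mode. */
    return out;
}

A caller checks agreed on every cycle; a single failed processor or sensor gets out-voted by the other two, which is the whole point of flying the same code on multiple boxes.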
Buffy Posted December 13, 2006

The term "bug-free" usually will send any real programmer into convulsions of laughter. I have friends who work on the periphery of NASA, and I've run into some of these practices: as Pyro says, it's not fun. In fact it's notable that it's really built for programmers who are *real engineers* who care more about getting it right than making it "elegant" (a term that I really hate: I've passed on hiring programmers who use that term in interviews....). A question for you though, Pyro: you said "pre-loaded into RAM"... seeing how RAM is by definition volatile, what happens if you need to do a cold reboot in space? Does it need a live network connection to Houston to get the code? My impression was that this stuff was normally burned into ROM or EEPROM or something that stays there even if the lights go out....

There is always one more bug, :phones: Buffy
Pyrotex Posted December 13, 2006

The term "bug-free" usually will send any real programmer into convulsions of laughter. ...A question for you though, Pyro: you said "pre-loaded into RAM"... seeing how RAM is by definition volatile, what happens if you need to do a cold reboot in space? Does it need a live network connection to Houston to get the code?...

Hi, Bufferino!! :) Yes, I have often had those convulsions of laughter. After I hit my 40th, they began to hurt my ribs and give me heartburn.

Excellent question. You won't like the answer. Until very recently, NASA used custom-designed GPCs (General Purpose Computers) with separate and redundant power supplies for RAM. The GPCs originally designed for Space Station by IBM cost $1 Billion to design and around $80 Million each to build. What happens if a cold reboot is necessary and the RAM is dead? A hard-wired radio communication box gets a complete upload of the software from the ground and inserts it back into RAM. Ta-da! :phones:

Then Space Station Freedom went down the toilet, IBM was kicked out of the JSC community, several other big players disappeared, got bought up, merged, split, or changed their names, and the design of the ISS was begun on the ashes of SSF. For a GPC, NASA now uses a "modem" originally designed for SSF. Yeah, a "modem" that just happens to have its own processor and up to several Megs of RAM. Also designed by IBM. :eek_big:

Computers in space are now separated into PCs and "avionics". Any software that needs a database or a GUI or networking, a hard drive, whatever, is put in a PC. And the PC is taken to orbit. Flight-critical software is still impressed in RAM (or ROM, or a combination) before launch and remains immutable: these are the modem "avionics". They now permit taking spare avionics software up on memory sticks, but that is considered only a backup to having the SW beamed directly up to RAM from the ground.
Boerseun Posted December 14, 2006

So what's with the fear of hard drives in space? Won't they be easier and cheaper? Besides, as backup devices for avionics coding, for instance, hard drives are quite a lot more trustworthy than memory sticks! Imagine the shuttle coming down, and in the hurry to reload the software from his memory stick, the pilot accidentally loads and launches his personal MP3 collection! Kaboom with a beat!
Qfwfq Posted December 14, 2006

...seeing how RAM is by definition volatile

<nitpicking icon=':hihi:'>Only by tradition, not definition, it's just a habit to use it to distinguish from ROM and the various types of UROM</nitpicking>

Randomly... ;) Q
Buffy Posted December 14, 2006

<nitpicking icon=':hihi:'>Only by tradition, not definition, it's just a habit to use it to distinguish from ROM and the various types of UROM</nitpicking>

Quite true. RWM just doesn't trip off the tongue, of course! I vaguely remember someone trying to use VRAM quite a while back too. Oh well, the term has stuck even if it does not make literal sense. It brings up the issue that the "tradition" that the fastest memory will always require power to maintain its data could be a way to *introduce* intractable bugs: imagine if you could no longer rely on a hard reboot to fix Windows BSOD crashes <sound of Alexander guffawing in the background/>... nope, even shutting it off will not clear the crud out of RAM. Will this be an improvement? The Economist would say that it is, because it will just force the bad programmers out of the business even if it causes some "upheaval in the markets" in the short term...

Recursing without a terminating condition, Buffy
Buffy Posted December 14, 2006

So what's with the fear of harddrives in space?

Pyro will have to confirm, but I'm sure that this was a rule defined back in the days (before most of you were born) when the slightest bump would cause a disk crash. "Obviously" it would never be technologically possible to build disk drives that could withstand the jarring of shuttle launches or joggers or bike messengers or any of that sort of abuse that a typical iPod or laptop withstands millions of times per day... It would probably take 7 committees 10 years in Houston to change the rule, so maybe they'll have the very last hard drives in existence built into the hardware on the first flight to Mars, five years after the rest of us have all moved on to by-then-reliable memory sticks....

Wait, we need to analyze that, Buffy
Qfwfq Posted December 14, 2006

Ooooh yes, static RAM and refresh cycles and oooh yes, ;) tell Uncle Bill to quit making Windoze so dependent on reboot. But, perhaps NASA also considers it best to have systems that can reliably put up with the event of depressurization. Certainly the makers of hard drives ain't been making them sealed or even conceiving them for it. Just gotta keep that head in flight, redesigning underway...
Buffy Posted December 14, 2006

Certainly the makers of hard drives ain't been making them sealed or even conceiving them for it.

And here so many of us thought that they *were* sealed! No wonder all my disk crashes have happened after going on an airplane: that 50% drop in air pressure is enough to suck in all sorts of dust and hairballs to grind away on the disk surface! ;) :hyper:

Learning something every picosecond, Buffy
Pyrotex Posted December 14, 2006

Correct me if I'm wrong, but don't hard drives DEPEND on atmospheric pressure to control the distance between head and spinning disk? I know that this was the secret of the Bernoulli Drive (based obviously on the Bernoulli Effect), and my understanding is that this technology spread and is very common now.
Buffy Posted December 15, 2006

Correct me if I'm wrong, but don't hard drives DEPEND on atmospheric pressure to control the distance between head and spinning disk?

Yes, that's what Q's saying. I completely overestimated how well sealed they are, and I suppose under explosive decompression, the gaps would just blow out in a vacuum. At the very least, Maxtor or whoever would probably have to build gold-plated ones at a pretty penny to provide "sealed even when exposed to 14psi->0psi-in-3-seconds-or-less" conditions. I can't imagine that it would be that hard of an engineering effort though... But the question for you, Pyro, is: is this why they don't use hard disks, or is there some even sillier reason?

Turn, turn, turn, :rant: Buffy
IDMclean Posted December 15, 2006

Also, what about them cosmic rays? I mean, astronauts get cancer from it, don't they? Has anyone here read a FAQ that said something to the effect of "so a cosmic ray flipped a bit" or similar? Down here on Earth, shielding from the majority of electromagnetic radiation is easy; go into space, where the atmosphere is thin :weather_storm: and I don't think it's quite the same.
Qfwfq Posted December 15, 2006

The effect of cosmic rays in bit flipping is mainly a matter of size, I would imagine.

I suppose under explosive decompression, the gaps would just blow out in a vacuum.

I don't think the real risk would be the disk casing bursting. The point would be to make sure that the gas inside has constant enough properties for the expected mission duration. I would tend to use an inert gas and design it so the power supply also keeps a constant temperature. Sealing would have to be good or excellent according to whether or not it's in a crew cabin, where pressure would only drop in abnormal cases. As they are for use on Earth, from what I remember from the early '80s at least, the motor that rotates the disk also forces air in through a filter. This means there wouldn't be much of a problem if the thing is running while the plane is in descent. If it isn't running, the likelihood of damage depends on whether more air gets in through the filter or other apertures.
Pyrotex Posted December 15, 2006

Also, what about them cosmic rays? ...

Good question! [ding! ding! ding! we have a winnah!]

Cosmic rays CAN flip bits in RAM. The very biggest cosmic rays carry energy in excess of 0.1 erg!!!!! No amount of shielding is gonna stop those. That is why general-purpose computers (the system-critical ones) are custom designed. The ones on the ISS have 10 bits per byte. 8 bits are the byte itself and the other two are for error detection/correction. They are set so that the entire 10-bit "byte" has a value that is restricted--very much like a redundancy check code in network packets.

If ANY ONE bit in a 10-bit "byte" is flipped by radiation, the computer's hardware can detect and correct the error.

If ANY TWO bits in a 10-bit "byte" are flipped by radiation, the computer's hardware can detect the error and flag a rad-fault in the corresponding application or data.

If any data is written to a "byte" and the 2 error bits cannot be set so the total value is "properly restricted", then the memory block (I think 256 "bytes") containing that radiation-damaged "byte" is decommissioned and not used. The system just writes around it.

This is why the space station GPC as designed by IBM was so freakin' expensive. Even the down-scaled "modems" now used as avionics computers on ISS are still damn expensive, over $4 Million each I think.
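The ISS hardware details are IBM's custom design, but the principle behind this kind of detect-and-correct memory is textbook Hamming coding, and it's small enough to show. Here is a sketch in C of the classic Hamming(7,4) code -- 4 data bits plus 3 check bits, arranged so the "syndrome" computed on read is exactly the position of any single flipped bit. This illustrates the principle only; the real 10-bit scheme differs, and a full SEC-DED byte needs more check bits than this toy.

/* Hamming(7,4) single-error-correcting code -- the principle behind
   ECC memory, not the actual ISS 10-bit scheme. Bit i of the codeword
   is "position" i+1 in classic Hamming numbering. */
#include <stdint.h>
#include <stdio.h>

static int bit(uint8_t w, int pos)            /* read position 1..7  */
{
    return (w >> (pos - 1)) & 1;
}

uint8_t hamming74_encode(uint8_t nibble)      /* 4 data bits in      */
{
    int d1 = (nibble >> 0) & 1, d2 = (nibble >> 1) & 1;
    int d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
    int p1 = d1 ^ d2 ^ d4;                    /* covers 1,3,5,7      */
    int p2 = d1 ^ d3 ^ d4;                    /* covers 2,3,6,7      */
    int p3 = d2 ^ d3 ^ d4;                    /* covers 4,5,6,7      */
    return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p3 << 3 |
                     d2 << 4 | d3 << 5 | d4 << 6);
}

uint8_t hamming74_decode(uint8_t w)           /* returns the nibble  */
{
    int s1 = bit(w,1) ^ bit(w,3) ^ bit(w,5) ^ bit(w,7);
    int s2 = bit(w,2) ^ bit(w,3) ^ bit(w,6) ^ bit(w,7);
    int s4 = bit(w,4) ^ bit(w,5) ^ bit(w,6) ^ bit(w,7);
    int syndrome = s1 | s2 << 1 | s4 << 2;    /* 0 = clean, else the */
    if (syndrome)                             /* bad bit's position  */
        w ^= (uint8_t)(1 << (syndrome - 1));  /* flip it back        */
    return (uint8_t)(bit(w,3) | bit(w,5) << 1 |
                     bit(w,6) << 2 | bit(w,7) << 3);
}

int main(void)
{
    uint8_t code = hamming74_encode(0xB);     /* store 1011          */
    code ^= 1 << 4;                           /* cosmic ray hits d2  */
    printf("recovered: 0x%X\n", hamming74_decode(code)); /* 0xB      */
    return 0;
}

Flip any one bit of the codeword and the decoder pinpoints and repairs it; flip two and the syndrome points at the wrong bit, which is why real designs add extra bits so a double flip is at least detected and flagged, just as described above.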
Buffy Posted December 15, 2006

Side note: one of the best descriptions of how this error correction works is in Richard Feynman's Lectures on Computation http://www.amazon.com/Feynman-Lectures-Computation-Richard-Phillips/dp/0738202967. And you thought he was just a physicist....

Detecting correction, Buffy
Buffy Posted December 15, 2006

The ones on the ISS have 10 bits per byte. 8 bits are the byte itself and the other two are for error detection/correction.

ECC RAM has 9 bits per byte, which only gets you error detection, and obviously they want correction, which requires that tenth bit. Question is, does IBM actually bother to manufacture 10-bit RAM? For correction, you actually want to have some programmatic control over the correction process in some cases, so the cheap way to do this is simply alter your compiler so that it just (wastefully) uses 16 bits for each byte: having to double your RAM requirements would be a small fraction of the cost in off-the-shelf hardware compared to custom-built-stuff-that-nobody-but-NASA-wants hardware. Do they really insist on 10-bit hardware???
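If you go the software route, the bluntest correcting scheme doesn't even need ECC math: keep three copies of every word and take a bitwise majority on each read, which out-votes a flip in any single copy at triple the memory cost. A toy sketch in C (my own illustration, not anybody's flight code):

/* Software triple-modular redundancy: keep three copies of each word
   and take a bitwise majority vote on every read. Corrects a bit
   flipped in any single copy. Toy illustration, not flight code. */
#include <stdint.h>

typedef struct { uint32_t copy[3]; } tmr_word_t;

void tmr_write(tmr_word_t *w, uint32_t v)
{
    w->copy[0] = w->copy[1] = w->copy[2] = v;
}

uint32_t tmr_read(tmr_word_t *w)
{
    uint32_t a = w->copy[0], b = w->copy[1], c = w->copy[2];
    uint32_t v = (a & b) | (a & c) | (b & c);   /* bitwise majority   */
    tmr_write(w, v);    /* "scrub": rewrite so damage can't pile up   */
    return v;
}

Two plain copies can only tell you that something flipped; with 16 bits you'd really want to stuff Hamming check bits into the spare half, or go to three copies like the sketch above, to know which way to fix it.

Your tax dollars at work, Buffy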
IDMclean Posted December 15, 2006

Whereas RAM is based on physical lattice structures (silicon transistors, solid-state hardware), the hard drives we use are based on magnetic polarity. I remember using floppy disks. Couldn't take them with me without a properly shielded box, otherwise the data on them would become compromised by who-knows-what. :shrug: I imagine that for extra-atmospheric journeys a magnetic-polarity HDD would be about as good as an FDD down here on Earth. I don't know about you, but I didn't like the failure rate of FDDs. Thank the maker for solid-state flash drives.