Jump to content
Science Forums

Recommended Posts

Posted

Can anyone tell me why this website does not have an autosave feature like gmail does?

 

i just lost 2 pages worth of post because someone clicked the back button.... not really motivated to so that again, it was an excellent post, :) furious!

 

Ok for the second time for me, the discussion starts with datastructures...

 

There are many datastructures in the world of programming, some built into the languages like int , char and bool, others are user defined, some made for speed of operation and virtually unlimited amount of possible data storage, with a sacrifice of memory, like vectors, some for best use of memory but pain of operation, like linked lists and especially queues. Yet there are some datastructures that are optimised for speed of reference and search, those are the ones that we'll be talking about today...

 

 

Ever since the first file systems have come out, man, or programmers have been striding to find a perfect structure for speed and efficiency of operation and search. Most of us have heard of GoogleFS with their amazing hashing and searching algorithms with trees. Well, there is a team of people who have developed a file system that when comes out (stable) will be a super crazy fast, far advanced ahead of its time, free and open-source file system that will, I hope dominate the world. Referring to Reiser here.

 

Reiser is based on a simple concept of a tree data structure. Basically you have a node, the tree, that points to one to ususally 3 nodes or leaves, thus making a tree shape which earned it (the structure the name tree, here's the visual:

yeah kinda like that.

 

Anyhow, tha main advantage of this is that it is easier to traverse the tree to find something in it, disadvantage is that they are a pain to traverse, and manage as a programmer.

 

So what is so special about the Reiser that i call revolutionary? well its not just simple trees that i am referring to, the guy created a datastructure that he called a dancing tree. Its crasy even in concept, its a tree that ballances itself to optimize the search though itself... Yeah if you have no idea what i just said, your jaw should have dropped to the floor... Its well known that trees usually dont need to be perfectly ballanced to be easily searchable, well this guy made sure that the trees arent infact ballanced, but infat are built in such a way that will make then easier to iterate through.

 

The FS also uses something like hashing, where it generates keys for the data and then instead of trying to find some sort of data, the search concentrates on keys, which makes it faster.

 

So why do i think that Reiser 4 will be a revolutionary acheivement and will be far more advanced then anything we are yet to see? well, to raise the craziness factor to 1 superultragooglogazillion, as if their datastructure wasnt crazy enough, they have created a modulated file system. (your jaw should be a few floors down now and hitting the basement floor) What do i mean by modulated? just that, you will be able, well already are but not too stably, to write modules for your filesystem. Here's an example, say you have a load of backup data stored on tapes and you want to easily reference the files without having to search for them, well, a module that will create a file and point to an exact tape, for its location would allow youto do that, so when you access the file, the FS will ask you to put in certain tape that contains that backup file to retreat it. "Heh" you say, you can make a lib to do that, and there were some programs that let you do that in past. Well, thats how i thought of this example, you see, imagine the speed differece in a library operation vs a module for a file system? yeah i thought a pretty major difference too... also you will be able to do filesystem upgrades with no repartitioning or remaking the fs with modules... (the jaw should start to be digging into the ground)

 

As if that wasnt ehnough, there are new precotions that this FS will make super nifty and safe. The have developed an algorithm that will either allow you to make a full transaction or not make one at all, without having to copy the data twice! (your jaw should be digging into the ground a few feet under your basement right now)

 

Anyhow, be weary of using Reiser 4 for now, its really unstable, and still has many bugs, reiser 3 is ok for the moment, but when all the debugging is done, think of the possibilities

 

anyways, if you are interested, here is th efull description page from the makers of reiser:

 

http://www.namesys.com/v4/v4.html#trees_nodes_items

 

and reiser 4 benchmarks:

 

http://www.namesys.com/benchmarks.html

Posted

Well, we've been using this in the database world for quite a while (before you were born Alex!). Yah, there's lots of interesting stuff you can do with dynamic management of trees, and its nice to see it migrating to file systems now. Note that in the olden days, you wouldn't do this stuff because of the intense overhead (both processor and I/O) that was involved unless there was a big payoff (as there is with databases) although you'd usually do it over the weekend as a batch job.

 

With all this extra horsepower and disk space, heck, who cares! :)

 

Great note Alex!

 

Cheers,

Buffy

Posted

Oh absolutely, Buffy, i know that dynamic tree management has been around, but did you actually read through the notes on that guy's dancing trees, they dont just ballance themselves, i can write a tree to do that infact i need to do it with threaded trees for next thursday, it is crazy in the way it acts to optimize itself for search, they are not ballanced and there is a long explanation why they are faster then ballanced trees, I know the concept is old, but what isnt in SC world, but what this guy is doing is absolutely mind boggling....

 

If you read throught this:

http://www.namesys.com/v4/v4.html#tree_design

you'll know what i mean when you finish :)

Posted
So what is so special about the Reiser that i call revolutionary? well its not just simple trees that i am referring to, the guy created a datastructure that he called a dancing tree. Its crasy even in concept, its a tree that ballances itself to optimize the search though itself...
Implementations of balanced b-trees were considered pretty revolutionary back around 1973, when the MUMPS programming language and various close cousins were first being ironed out.

 

I encountered them in 1985, by which time there were only a few significant variations on the 1977 ANSI language standard, and was duly awed and impressed. Rather than my usual cultish rant about its virtues, here’s a link to a previous ranting post.

 

In the days were a GB of storage was consider high-end, most were impressed by MUMPS’s access speed. These days, most are impressed by its data scalability. A typical database has 10-20 logical nodes per physical block, so a typical 6-level, 50 GB database scales to 20 TB with an increase in cached or uncached retrieve speed of only 33%. As you might expect from a 30-year-old technology, nearly all implementation of M are very solid.

 

Balanced b-trees do indeed implement awesome database. They’re not, however, very new. Though, I must admit “dancing trees” is a way catchier name than [Maxi-]M[uMPS][-tech], MIIS, Cache, or any of the names for this old technology, this after a big, international contest to come up with a catchier name for the 1995 language standard. The winner: “M”.

Posted

Oh yeah, I read it (you can't keep a computer scientist away from this stuff). Non-balanced trees have been long known to be just fine if not better, and for those of us who marketed those db technologies, only doing regular batches was sold as an *advantage*.

 

I don't want to belittle his stuff though, because he's breaking a lot of new ground that's gonna take the sticks in the mud along screaming and kicking....

 

Cheers,

Buffy

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...