Science Forums


Posted

Uncle Al has a calculation whose run time grows as (radius)^2 of a crystal lattice. At 107 trillion atoms contained, it now requires 1.7 hours per point on an AMD Athlon FX-55:

 

http://www.mazepath.com/uncleal/bzhdense.png

Needs work

http://www.mazepath.com/uncleal/qzdense.png

Typical finished graph

 

At that rate it will not add another 10,000 points. Who has a bored cluster that needs thousands of CPU-hours of love? You may compile the serial or (preferred) parallel C++ source code yourself (no surprises). It runs 100% in CPU and RAM, with maybe 1 MB of total output. No Microsoft compiler or OS is satisfactory.

 

Post here or e-mail organiker'coupling symbol'lycos'the usual'com

Posted
Have you thought about using distributed computing for your needs? You could also start your collection of PlayStation 3s.

 

It is a single remaining problem into which we would rather not put more development. The code is debugged, optimized, compiled, and works to spec. It would do well running in slack time on capable hardware. An AMD FX-55 calculates 2.1 million decimal places of pi in 5.95 seconds, versus hours per point in this problem. A Pentium gives 60% of the Athlon's throughput, and WinXP gives 60% of Linux's (Knoppix). Segmenting the calculation as a Wintel screen saver is an unpleasant thought. "8^>)

Posted
I have a friend with a Beowulf cluster in his basement... just post the MPI code and I will send it his way.

 

Kindly ask him first. I suspect he'd like to see the C++ code and perhaps compile it himself. The serial version is a 560K Debian Linux static executable (~66K of program plus math libraries). The C++ source with documentation is 15K. We'd start with one timing run each on his slowest and fastest CPUs; the total run time to any radius is then known within 5%. No hard-drive space is needed: it runs wholly in CPU and RAM, from a thumb drive if you like.

 

The parallel version's C++ source is 35K, uses MPI, and is best compiled locally for the cluster. A 100% CPU-bound application benefits from on-site cache optimization for maximum execution speed:

 

Valgrind

 

The problem can run as a stack of serial processes, with a different radius interval on each CPU, or as a parallel process using all CPUs at once for each radius interval. A power failure during serial execution is a disaster. A crash during parallel execution loses only the few dozen points in process and held in RAM.

 

Load it then ignore it for a month or two as it runs. Does unused hardware get lonely?

Posted

So did you write it using the MPI libraries and threading? If so, as I said, I have a friend (whom I will get to post his cluster specs) who would likely be willing to run the code for you (he can use it as a benchmark for his cluster).

 

But it's a Beowulf cluster running *nix, so the best combination would be C++, threading, and MPI...

Posted

The C++ source for the parallel calculation is compiled (long_double_precision, NO Microsoft compilers!) with MPI_ROUTINES defined, which includes the MPI functionality: a main server CPU (a small fraction of the run time) and client CPUs (computing continuously, 99.9% of the time). With 1023 CPUs or fewer it should be a happy camper. The command line for each radius interval includes the MPI controls and the number of processors used.

 

Hard questions can be routed to the UK programmer who parallelized the serial code. His notes go like this: To compile CHIpir.CPP, use LAM/MPI. They have comprehensive installation instructions.

 

LAM/MPI Parallel Computing

 

Compile:

 

mpiCC -DMPI_ROUTINES -O3 -o chipir chipir.cpp

 

The -DMPI_ROUTINES flag (as the name suggests) compiles in the MPI routines; without it you get the serial, single-CPU version. To run it, use a series of shell-script command lines like

 

time mpirun -np 3 ./chipir 100 10 500 1 2 >> output.txt

 

-np 3 = how many CPUs to use. Use one more than the number of CPUs, since one task is the "master", which is very lightweight and uses very little CPU. For a four-processor box, first try -np 5.

 

./chipir = executable process

 

100 = starting radius

10 = radius increment

500 = ending radius

1 = flag. Set to "1" for output to file.

2 = work units per CPU. Values from 1 to 3 show a speedup; there is no advantage above 5. Multiple scripts may be loaded to dynamically fill each work-unit queue as processes complete. At the end, the process will not terminate until all CPUs finish, so the last command line should load "1."

 

Uncle Al will write the command lines based on the timing run and the real-time cluster access available. If your sysop is clever with clusters, I'm all ears for better strategies. The data look like this (radius in angstroms; atoms; CHI):

 

42340.800 29620928954114 0.999999997537607374

42352.000 29644441206354 0.999999998910011431

42363.200 29667965859458 0.999999997922274660

42374.400 29691502978924 0.999999997623357328

Posted

Well since Alexander was kind enough to mention my cluster, here are some simple specs on it as it is now...

 

Titan: Master Node [wolf00]

Homebrew Box

Soyo Dragon Board

AMD x64 Core Duo 6000+

3 GB RAM

750 GB HDD

CrossFire ATI cards



Trident1: Slave Node 1 [wolf01]

Precision 530 1.7 GHz Xeon

1.5 GB RAM



Trident2: Slave Node 2 [wolf02]

Precision 530 1.7 GHz Xeon

1.5 GB RAM

 

Startup......

ssi:boot:base:linear:booting n0-n2[wolf00-wolf02]

 

I just got another P4 HL PC, so I'll be adding that into the mix as well...

Posted

Ride 'em cowboy! If it is mostly not doing anything for a month or three... let's do a serial timing run on each CPU (about an hour each, less for the AMD). The Linux static executable BigCHIBz is 561K as is and 265K zipped. Send me a private message with contact data and I'll e-mail it to you with instructions, or give you a URL for download, your choice.

 

The benzil molecule is a sock: it has no handedness. In the crystal, the flat molecules twist slightly and stack into helices, all left-handed or all right-handed in a given crystal, like opposite shoes. General Relativity postulates that the vacuum has no handedness. Teleparallel gravitation wholly includes GR as a restricted case and further allows the vacuum to be a left foot. One of them is detectably wrong, as measured by the energy with which the opposite-handed crystals, versus their identical melts, fit into the vacuum:

 

http://www.mazepath.com/uncleal/benzil.png

Stereogram of structure

Calorimetric Equivalence Principle Test

One way to do the experiment

http://www.mazepath.com/uncleal/qz4.pdf

Technical readout (not for the faint of heart)

 

We need to calculate the handedness of the benzil crystal mass distribution (atom positions) to academic standards. Ivory Towers are reluctant to accept outside wash (Not Invented Here). One begins by getting their attention. Donkeys and 2x4s upside the head are a natural pairing. Professors and calculated numbers achieve the same final state.

Posted

Oh hey, I don't know if there is a distro out there, but I could probably let you use my SPARC blade in your Beowulf; you will also need that P4. I also have a PPC machine sitting around doing nothing, so if you can find live distros for these platforms, you will have an 800 MHz PPC machine added as a node, and a 440 MHz SPARC box (don't let the speed fool you, it's 64-bit SPARC... comparable to roughly a 1.2 GHz Intel... though that's like comparing a watermelon to a capacitor).

Posted
don't know if there is a distro out there

As needed, change your BIOS boot priority to boot from the DVD or CD drive first, then boot Knoppix LIVE 5.3.1 DVD (awesome) or 5.1.1 CD (entirely adequate):

 

KNOPPIX 5.3 - Live Linux Filesystem On DVD

Long download; burn your own DVD

Welcome to Linux Cd.org - Knoppix

Buy the DVD (US$5.95 plus shipping), then burn more.

 

You need a Knoppix disk if you run Windows. When the OS crashes, your hard-drive contents are held hostage by Redmond. Boot Knoppix, mount the drive, and take off what you need.

 

We've compiled and run it as 64-bit (AMD, of course); it runs slightly faster as 32-bit. Long_double_precision is tough on pathways. The FX-55 is a very respectable single core, though obsolete. If you are feeling virtuous, you could run the program through Valgrind

 

Valgrind

 

and see whether my code-poet volunteers missed anything clever. Whatever is convenient and interesting is OK with me. When it stops being fun, quit: no harm, no foul.

Posted

Well, if I switched the distros over to BCCD, their 2.2.2 BETA supports PPCs, so I could definitely add in the PPC and just boot all machines LIVE with BCCD. It already has all the MPI tools and whatnot built into the distro, so running and compiling the code wouldn't be an issue...
