TeleMad Posted April 8, 2005

TeleMad: Looking at the loops, which constitute most of the instructions executed, and taking the worst possible case (the usual method for comparing algorithms), and ...

Qfwfq: I don't want to say your algorithm is less efficient than the simpler one ...

I hope not, because you'd be wrong if you did.

Qfwfq: ... but worst case isn't always the best performance comparison between algorithms ...

And I didn't say otherwise. But let me support what I did say.

"We might then attempt to measure T ['execution time'] in the worst case or in the best case, or we might attempt to compute the average value of T over all possible cases. The best-case performance of an algorithm is usually not very informative and the average performance is often difficult to determine. Consequently, T(n) is frequently taken as a measure of the algorithm's performance in the worst case." (emphasis added, C++: An Introduction to Data Structures, Larry Nyhoff, Pearson Education/Prentice Hall, 1999, p352)

and ...

"We have seen that the shape of the graph obtained by comparing the time required for an algorithm to perform its task to the size of the input data reflects the efficiency characteristics of the algorithm. Thus it is common to classify algorithms according to the shapes of their worst-case analyses." (emphasis added, Computer Science: An Overview, 7th Edition, J. Glenn Brookshear, Addison Wesley/Pearson Education, 2003, p192)
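To make the best/worst/average distinction concrete, here is a small illustrative C++ sketch, not taken from either poster's code, that simply counts the comparisons a linear search makes. On a 1000-element vector the best case is 1 comparison, the worst case is 1000, and the average over uniformly random targets is about 500, which is why the worst-case curve is the one that usually gets reported.

#include <cstddef>
#include <iostream>
#include <vector>

// Counts how many comparisons a linear search performs before finding 'key'
// (worst case: it scans the whole vector).
std::size_t linearSearchComparisons(const std::vector<int>& v, int key)
{
    std::size_t comparisons = 0;
    for (int x : v)
    {
        ++comparisons;
        if (x == key) break;
    }
    return comparisons;
}

int main()
{
    std::vector<int> v;
    for (int i = 1; i <= 1000; ++i) v.push_back(i);

    std::cout << "best case:  " << linearSearchComparisons(v, 1)    << '\n';   // 1
    std::cout << "worst case: " << linearSearchComparisons(v, 1000) << '\n';   // 1000
}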
TeleMad Posted April 8, 2005

Qfwfq: ... performance depends on many things. Apart from complex issues like cache misses and page faults or whatnot ...

And I guess one could say that a turtle can crawl faster than the speed of sound ... just put him in an SST. The standard implication, not just in comparing algorithms but also in comparing the speed of animals and all sorts of other types of comparisons, is "all else being equal ...". Of course, if all else is not equal, one can come to all kinds of different, and largely irrelevant, conclusions.

Qfwfq: ... even just considering number of steps, I think Quicksort would lose even against the slow Bubblesort, in a worst case based estimate, but an estimate considering statistics gives it its name.

No, in a worst-case vs. worst-case comparison they'd tie in their general "big O"/"big theta" measures ...

"It follows that the worst case computing time for bubble sort is O(n^2). ... The worst case for quicksort occurs when the list is already ordered or the elements are in reverse order. The worst-case computing time is O(n^2) ..." (C++: An Introduction to Data Structures, Larry Nyhoff, Pearson Education/Prentice Hall, 1999, pp. 578, 606)

Here, much analysis has determined the average efficiency of these two sorting algorithms (which, if it can be done, is probably the better comparison to perform), and it is here that quicksort obliterates bubble sort. Filling out the above quote some more ...

"It follows that the worst case computing time for bubble sort is O(n^2). The average computing time is also O(n^2), but this is considerably more difficult to show. ... The worst case for quicksort occurs when the list is already ordered or the elements are in reverse order. The worst-case computing time is O(n^2) ... thus giving O(n log2 n) as the computing time in the average case." (C++: An Introduction to Data Structures, Larry Nyhoff, Pearson Education/Prentice Hall, 1999, pp. 578, 606)

And by the way, I guarantee that if you calculate the average-case efficiency of my algorithm vs. alexander's you'll still find mine to be much more efficient. In fact, I've already shown that it actually is more efficient for small n, which is about the only hope his algorithm would have for beating mine ... and it doesn't. The larger the value of n, the better mine performs relative to his. So if mine wins for small n, and clobbers his for large n, and logic tells us mine would win for intermediate n, then how could mine not beat his for the average case?
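Anyone who wants to check the worst-case claim above can count comparisons directly. The following is a hypothetical sketch, not code from either poster; it assumes the naive first-element-pivot quicksort, which is exactly the variant whose worst case is an already-sorted list, as the Nyhoff quote describes.

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

static long long comparisons = 0;

// Bubble sort: always about n*(n-1)/2 comparisons in this simple form.
void bubbleSort(std::vector<int>& a)
{
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = 0; j + 1 < a.size() - i; ++j)
        {
            ++comparisons;
            if (a[j] > a[j + 1]) std::swap(a[j], a[j + 1]);
        }
}

// Quicksort with the first element as pivot: on already-sorted input every
// partition is maximally unbalanced, so comparisons also grow as n^2.
void quickSort(std::vector<int>& a, int lo, int hi)
{
    if (lo >= hi) return;
    int pivot = a[lo];
    int i = lo;
    for (int j = lo + 1; j <= hi; ++j)
    {
        ++comparisons;
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    }
    std::swap(a[lo], a[i]);
    quickSort(a, lo, i - 1);
    quickSort(a, i + 1, hi);
}

int main()
{
    const int n = 2000;
    std::vector<int> sorted;
    for (int i = 0; i < n; ++i) sorted.push_back(i);   // already-sorted input

    std::vector<int> a = sorted;
    comparisons = 0;
    bubbleSort(a);
    std::cout << "bubble sort, sorted input: " << comparisons << " comparisons\n";

    a = sorted;
    comparisons = 0;
    quickSort(a, 0, n - 1);
    std::cout << "quicksort,  sorted input: " << comparisons << " comparisons\n";
    // Both counts come out on the order of n^2/2, i.e. the same O(n^2) worst case.
}

On random input the quicksort count drops to roughly n log2 n while bubble sort stays quadratic, which is the average-case gap the second Nyhoff quote refers to.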
TeleMad Posted April 8, 2005

I repeat that having a list of primes enables a much quicker method and, if you need to do it for various numbers, the list of primes needn't be re-computed for each number.

Using my language of choice, here's my quickly thrown together algorithm for creating a sieve of Eratosthenes. I'll limit it to just numbers less than 500,000 (going up to ten million or a billion seems like overkill for something quickly thrown together as a simple demonstration). It's divided into two parts. First, there's a one-time setup that populates a table with a boolean Prime value for all natural numbers between 1 and 500,000. Again, that's done only once and then forgotten about: it's a one-time time penalty. After that, whenever one wants to find the next prime in code, s/he just calls the NextPrime() function. For example, issuing the command ? NextPrime(44) would print 47.

1) ONE-TIME SETUP

lnMaxNumber = 500000
CREATE TABLE sieve (prime L(1))
FOR lnLcv = 1 TO lnMaxNumber
    INSERT INTO sieve (prime) VALUES (.T.)
ENDFOR
UPDATE sieve SET prime = .F. WHERE RECNO() = 1
UPDATE sieve SET prime = .F. WHERE RECNO() % 2 = 0 AND RECNO() != 2
FOR lnLcv = 3 TO CEILING(SQRT(lnMaxNumber)) STEP 2
    UPDATE sieve SET prime = .F. WHERE RECNO() % lnLcv = 0 AND RECNO() != lnLcv
ENDFOR
INDEX ON prime TAG prime
INDEX ON RECNO() TAG recordno CANDIDATE
USE

2) NORMAL USAGE (assumes no issue with changing table work areas)

FUNCTION NextPrime(lnNumber)
    SELECT RECNO() AS number FROM sieve ;
        WHERE RECNO() >= lnNumber AND prime INTO CURSOR csrPrime
    IF (_TALLY = 0)
        lnNextPrime = 0
    ELSE
        lnNextPrime = csrPrime.number
    ENDIF
    RETURN lnNextPrime
ENDFUNC

PS: A potential alternate NextPrime(). All coding is being done without running it (mental programming), and I don't know which of the two methods would perform faster.

FUNCTION NextPrime(lnNumber)
    GO (lnNumber)
    IF (!prime)
        LOCATE REST FOR prime
    ENDIF
    lnNextPrime = IIF(EOF(), 0, RECNO())
    RETURN lnNextPrime
ENDFUNC

PPS: Okay, so I actually set this up and ran it: couldn't resist. It took only a minute or so for the one-time setup. Then, both functions returned results instantly for the 15 different test values I submitted to each one.
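For readers who don't use VFP, the same idea reads roughly like this in C++. This is only an illustrative sketch of the one-time-sieve-plus-lookup approach, not a translation of TeleMad's table-based code; the names buildSieve and nextPrime are invented here.

#include <iostream>
#include <vector>

// One-time setup: isPrime[i] tells whether i is prime, for 0 <= i <= maxNumber.
std::vector<bool> buildSieve(std::size_t maxNumber)
{
    std::vector<bool> isPrime(maxNumber + 1, true);
    isPrime[0] = false;
    if (maxNumber >= 1) isPrime[1] = false;
    for (std::size_t p = 2; p * p <= maxNumber; ++p)
        if (isPrime[p])
            for (std::size_t m = p * p; m <= maxNumber; m += p)
                isPrime[m] = false;                    // strike the multiples of p
    return isPrime;
}

// Returns the smallest prime >= n, or 0 if there is none within the sieve.
std::size_t nextPrime(const std::vector<bool>& isPrime, std::size_t n)
{
    for (std::size_t i = n; i < isPrime.size(); ++i)
        if (isPrime[i]) return i;
    return 0;
}

int main()
{
    std::vector<bool> isPrime = buildSieve(500000);    // the one-time penalty
    std::cout << nextPrime(isPrime, 44) << '\n';       // prints 47
}

The one-time cost is the sieve construction; every lookup afterwards is just a forward scan from n, which is why the subsequent calls feel instant.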
Qfwfq Posted April 8, 2005

Similar to PL/SQL, but I had never seen it - what's it called?

I didn't mean to kick up such a fuss, TM. I hadn't said worst case isn't often used, only that it isn't the ultimate word, and it was years ago I read Niklaus Wirth's "Algorithms + Data Structures = Programs" and I seemed to remember Quicksort would actually lose at first sight.

What do you mean by an SST? Are you sure it would make sense to say the turtle was actually crawling that fast?

As for "all else being equal ..." one might consider "all else" being the platform, as far as page faults and cache misses go. Look up the literature about performance tuning; there are tricky issues, including locality of reference, which can make code better or worse, even on all or most machines of a type, but I only meant that as an extra consideration.

My C++ program runs fast, and here is an app that uses basically the same algorithm I posted. Typing the same value 500000 into the Max dialog, execution is instant, even clicking "Vai e Scrivi" ("go and write"), which in one single click tells it to do the whole job: allocate & init, execute, and then prepare to display. If you have the CRT and MFC .dlls, try it! Max can be up to 4G-1, but this allocates a quarter of a gigabyte in RAM and the algorithm isn't very swapping-friendly. I ran it on the machine I'm working on, which is nothing wonderful; it has about enough RAM for the job, but I always have plenty of windows open while I'm working, though I didn't have them doing much during the runs, so swapping wasn't too bad. Even when Eratotene.exe is idle but with that much memory allocated, the whole machine becomes sluggish. These were the CPU times from Task Manager:

Max = 999999999: 0' 01 to allocate and init + 1' 04 to execute + 0' 10 to display.
Max = 4294967295: 0' 04 to allocate and init + 4' 51 to execute + 0' 45 to display.

A few improvements could be made to it yet.

BTW, what was said in the past about customers changing specs is true - most of my work is bloody sw maintenance - but I disagreed with the conclusions drawn. The dumbest way isn't always the best at all. Good analysis can also have the aim of project manageability. Great names like Knuth have always recommended doing the trickiest optimization on each single release but, from some overall structural points of view, I hold that performance and manageability can go hand in hand.
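Qfwfq's actual source isn't reproduced at this point in the thread (a fragment appears in a later post), but the memory figures he quotes - a quarter of a gigabyte for Max = 4,294,967,295, and the 3,125,000 bytes for 50 million mentioned later - are what a bit-packed, odds-only sieve works out to. The sketch below is hypothetical: the class and member names are invented here and only illustrate that layout.

#include <cstdint>
#include <iostream>
#include <vector>

// One bit per odd number: bit i stands for 2*i + 1, so a range of Max candidates
// needs Max/2 bits (~256 MB for Max = 4,294,967,295; 50,000,000/2/8 = 3,125,000 bytes).
class OddSieve
{
public:
    explicit OddSieve(std::uint64_t max)
        : m_max(max), m_flags(max / 2 / 32 + 1, 0xFFFFFFFFu)   // start as "assume prime"
    {
        clearBit(0);                                           // 1 is not prime
        for (std::uint64_t p = 3; p * p <= max; p += 2)
            if (testBit(p >> 1))                               // p still marked prime
                for (std::uint64_t m = p * p; m <= max; m += 2 * p)
                    clearBit(m >> 1);                          // strike odd multiples of p
    }

    bool isPrime(std::uint64_t n) const
    {
        if (n > m_max || n < 2) return false;                  // outside the sieve
        if (n == 2)             return true;
        if (n % 2 == 0)         return false;
        return testBit(n >> 1);
    }

private:
    bool testBit(std::uint64_t i) const { return (m_flags[i >> 5] >> (i & 31)) & 1u; }
    void clearBit(std::uint64_t i)      { m_flags[i >> 5] &= ~(1u << (i & 31)); }

    std::uint64_t              m_max;
    std::vector<std::uint32_t> m_flags;
};

int main()
{
    OddSieve sieve(500000);                                    // same Max as the test above
    std::cout << sieve.isPrime(47) << sieve.isPrime(91) << '\n';   // "10": 47 is prime, 91 = 7*13 is not
}

At the full 4G-1 range the flag vector alone is roughly 256 MB, which is why the whole machine feels sluggish once it is allocated.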
TeleMad Posted April 9, 2005

Qfwfq: Similar to PL/SQL, but I had never seen it - what's it called?

It's MS Visual FoxPro (VFP).

Qfwfq: What do you mean by an SST?

The SST: supersonic transport ... the 'Z-shaped-nose' airliner that was recently taken out of commercial flights.

Qfwfq: As for "all else being equal ..." one might consider "all else" being the platform, as far as page faults and cache misses go.

Page faults are a function of the OS and the hardware, not of the algorithms under discussion; and cache misses are a function of the OS, the cache design, and the caching controller: again, not of the algorithms under discussion. The closest the OS/hardware issues you mention come into consideration is when comparing algorithms' memory efficiency. But there are at least two problems here too. First, memory efficiency is not what I've been discussing, nor is it what is normally meant these days when discussing an algorithm's efficiency (an algorithm's efficiency is usually based on 'time': that is, how many statements an algorithm must execute in order to complete). Second, those concepts still aren't involved in calculations of algorithms' memory efficiencies. If one algorithm is a huge memory hog while the other has a small memory footprint, then the second is more efficient in terms of memory (which says nothing about their relative time efficiencies). But that's as far as that comparison goes: comparing how much memory each algorithm needs to complete. Even this comparison doesn't take cache misses and page faults into account.

Qfwfq: Even when Eratotene.exe is idle but with that much memory allocated, the whole machine becomes sluggish. These were the CPU times from Task Manager: Max = 999999999: 0' 01 to allocate and init + 1' 04 to execute + 0' 10 to display. Max = 4294967295: 0' 04 to allocate and init + 4' 51 to execute + 0' 45 to display. A few improvements could be made to it yet.

My non-one-time-setup execution times should be sub-1-second, I think. Sometime, maybe next week, I'll change mine to use a max of 750 million, give it a few spins, and post the results.

Qfwfq: BTW, what was said in the past about customers changing specs is true - most of my work is bloody sw maintenance - but I disagreed with the conclusions drawn. The [most straightforward] way isn't always the best at all.

Yeah, I know, and I didn't say otherwise. I didn't simply say that the straightforward way was preferable simply because it was more straightforward - although that in itself is a benefit - but instead gave reasons (increased flexibility, increased maintainability, and reduction in design time) why in this particular case the most straightforward way was preferable to the mathematical way alexander used.
Qfwfq Posted April 11, 2005 Report Posted April 11, 2005 ;) B) B) ;) :rant: :rant: :rant: :rant: :rant: !!! SST: supersonic transport ... the 'Z-shaped-nose' airliner that was recently taken out of commercial flights.Yeah, jait, put a turtle on a Concorde and, while at cruising spead, it is travelling faster than sound over the Atlantic but it isn't crawling that fast. It's travelling even faster around the sun, especially at midnight and going eastbound, or even if it's on dirt ground, but do we say that it's crawling that fast? Goodness, we should all be Olympic gold medals! Page faults are a function of the OS and the hardware, not of the algorithms under discussion; and cache misses are a function of the OS, the cache design, and the caching controller: again, not of the algorithms under discussion. The closest the OS/hardware issues you mention come into consideration is when comparing algorithms' memory efficience. But there's at least two problems here too. First, memory efficiency is not what I've been discussing nor is it what is normally meant these days when discussion an algorithm's efficiency (an algorithm's efficiency is usually based on 'time': that is, how many statements an algorithm must execute in order to complete). Second, those concepts still aren't involved in calculations of algoritms' memory efficiencies. If one algorithm is a huge memory hog while the other has a small memory footprint then the second is more efficient in terms of memory (which says nothing as far as their relative time efficiencies). But that's as far as that comparison goes: comparing how much memory each algorithm needs to complete. Even this comparison doesn't take cache misses and page faults into account.First:It was you that brought maintainability and customer whims into the agument with Alexander, why should you object to someone else mentioning other matters that aren't pure "big O/big theta" measures???? ;) Second:Yes code can be more or less prone to cache misses and page faults, according to how it's designed. Did you fail to notice I mentioned locality of reference? One doubt about my code is on caching, I don't know details of how it is managed, whether or not iterating with a decreasing pointer is worse than with an increasing one, I don't think so but I'm not sure. It wouldn't make a difference to page faults but my code could certainly be improved there, for Win32 API, by using a file intead and mapping three memory views on it. My non-one-time-setup execution times should be sub 1 second, I think. Sometime, maybe next week, I'll change mine to use a max of 750 million and give it a few spins and post the results.Perhaps, colleague, you didn't notice what I said about the same max value, 500000 that you tested and about which you you said: PPS: Okay, so I actually set this up and ran it: couldn't resist. It took only a minute or so for the one-time setup. Then, both functions returned results instantly for the 15 different test values I submitted to each one.I said that my time, for the same max, was just about instant. Try it. 
I didn't simly say that the straightforward was preferable simply because it was more straightforward - althought that in itself is a benefit - but instead gave reasons (increased flexibility, increased maintainability, and reduction in design time) why in this particular case the most straightfoward way was preferable to the mathematical way alexander used.Whether "dumbest" or "straightforward" I still say that these aren't the criteria for maintainability. Clarity is to the point, something I don't bother with when I do my own stuff and not for customers. Bottlenecks are another thing, and some C++ features are for just this purpouse, app stuctures also lean toward this. Anything that must be coherent throught the app or even whole groups of groups of management apps, fix it one place and use it where needed, and you'll be thankful every time it has to be re-touched. Quote
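As an aside on the locality-of-reference point being argued here: neither of the two divisor-counting snippets in this thread is really affected by it, but the standard textbook illustration of how access order alone changes cache behaviour looks like the following. This is a hypothetical example, not code from either poster; both loops touch exactly the same 64 MB and do the same additions, yet the column-major version typically runs several times slower because of cache misses.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    const std::size_t n = 4096;                        // 4096 x 4096 ints ~ 64 MB
    std::vector<int> a(n * n, 1);

    auto t0 = std::chrono::steady_clock::now();
    long long rowSum = 0;
    for (std::size_t i = 0; i < n; ++i)                // row-major order:
        for (std::size_t j = 0; j < n; ++j)            // consecutive addresses,
            rowSum += a[i * n + j];                    // cache-friendly
    auto t1 = std::chrono::steady_clock::now();

    long long colSum = 0;
    for (std::size_t j = 0; j < n; ++j)                // column-major order:
        for (std::size_t i = 0; i < n; ++i)            // strides of n ints,
            colSum += a[i * n + j];                    // far more cache misses
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::cout << "row-major:    " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
              << "column-major: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n"
              << "sums equal: "   << (rowSum == colSum) << '\n';
}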
TeleMad Posted April 16, 2005

Qfwfq: It was you who brought maintainability and customer whims into the argument with Alexander, so why should you object to someone else mentioning other matters that aren't pure "big O"/"big theta" measures?

Uhm, because maintainability should always be taken into consideration when writing code of any kind, but your hardware-specific issues - cache misses and page faults - are hardly ever taken into consideration by application developers and are irrelevant for the small C++ algorithms being compared. Apples and oranges.

Qfwfq: Second: Yes, code can be more or less prone to cache misses and page faults, according to how it's designed.

So tell me which one of these is going to have more cache misses and page faults.

// alexander's
int numOfDivisors = 0;
for (int divisor = 1; divisor <= number; divisor++)
{
    if (number % divisor == 0) numOfDivisors++;
}

// TeleMad's
int numOfDivisors = 2;
if (number == 1)
    numOfDivisors = 1;
else
{
    for (int divisor = 2; (divisor * divisor) <= number; ++divisor)
    {
        if (divisor * divisor == number)
            numOfDivisors++;
        else if (number % divisor == 0)
            numOfDivisors += 2;
    }
}

Neither. Their memory usage is the same: they both use only two algorithm-defined numeric variables. Are you saying that the cache won't have room to store two lousy numbers for one of the algorithms, but it will for the other? Please enlighten us as to why. And what is it anyway, a 2-byte cache? Are you saying that when running one of these two algorithms - but not the other - the measly two variables will not be able to be stored in memory and will generate page faults? Please enlighten us as to why that would be the case for one of the algorithms and not the other. And what is it, a computer with 2 bytes of RAM?

Your hardware-related topics are irrelevant to these two algorithms. And your hardware-related topics are NOT taken into account when comparing the efficiency of general algorithms.

Qfwfq: Did you fail to notice I mentioned locality of reference?

No. Did you fail to notice that that is irrelevant to the above two algorithms?

Qfwfq: Whether "dumbest" or "straightforward", I still say that these aren't the criteria for maintainability.

Code being straightforward is by itself a bonus in regards to maintainability. The less straightforward code is, the harder it is to understand, and therefore the harder it is to maintain. But there's more to coding than JUST maintainability. Comparing the two C++ algorithms, we can see that alexander's is a bit more straightforward, but mine is far more efficient ... for ALL n. Here the tradeoff of slightly less straightforwardness for greatly improved algorithm efficiency is warranted.

Qfwfq: Bottlenecks are another thing ...

Another thing in regards to what? Maintainability? If that's what you meant, then I'd have to disagree. Bottlenecks are a performance issue, not a code maintenance issue. (If you have to modify your code to handle a bottleneck, then the bottleneck is a cause of code being written; but it is not a cause of how maintainable that new code is: maintainability is a function of how that code is written.)
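The efficiency claim about the two snippets above is easy to check numerically. Here is a small hypothetical harness (not part of the original posts) that wraps each snippet in a function and counts its loop iterations; the function names and the test values are invented for illustration.

#include <iostream>

// alexander's approach: try every divisor from 1 to number.
int divisorsLinear(int number, long long& iterations)
{
    int numOfDivisors = 0;
    for (int divisor = 1; divisor <= number; divisor++)
    {
        ++iterations;
        if (number % divisor == 0) numOfDivisors++;
    }
    return numOfDivisors;
}

// TeleMad's approach: only try divisors up to sqrt(number), counting divisor pairs.
int divisorsSqrt(int number, long long& iterations)
{
    if (number == 1) return 1;
    int numOfDivisors = 2;
    for (int divisor = 2; (divisor * divisor) <= number; ++divisor)
    {
        ++iterations;
        if (divisor * divisor == number)   numOfDivisors++;
        else if (number % divisor == 0)    numOfDivisors += 2;
    }
    return numOfDivisors;
}

int main()
{
    const int tests[] = {12, 1000, 1000000};
    for (int n : tests)
    {
        long long itLinear = 0, itSqrt = 0;
        int d1 = divisorsLinear(n, itLinear);
        int d2 = divisorsSqrt(n, itSqrt);
        std::cout << "n = " << n << ": " << d1 << "/" << d2
                  << " divisors found; iterations " << itLinear << " vs " << itSqrt << '\n';
    }
}

Both functions return the same divisor counts (6, 16 and 49 for these inputs), but the iteration counts are n versus roughly sqrt(n), which is the gap being argued over.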
TeleMad Posted April 16, 2005

Speaking of maintainability, clarity, and straightforwardness ...

1) Overall, your C++ code is very cryptic. I know many C++ programmers get a kick out of writing complex-looking, cryptic code: the more "looks like Greek to me" they make it, the better they like it. It makes their code look like some kind of advanced calculus ... makes the programmer want to show it to others to make themselves feel smart. But that is poor programming practice, as it makes the code harder to maintain - especially by a different programmer.

2) Many of your variable names, such as ii, II, k, n, etc., don't follow the general guideline that variable names should be self-describing.

"A variable's name should make clear to anyone reading the listing the variable's purpose and how it is used. Thus boilerTemplate is better than something cryptic like bT or t." (Robert Lafore, Object-Oriented Programming in C++: Fourth Edition, Sams, 2002, p40)

Of course, other texts on C++, and also those on programming in other languages, state this too.

3) Many C++ sources recommend reserving all-cap identifiers for constants. But you used the identifiers J, Q, I, and II for non-constant variables.

Here are some of your variables that violate one or more of the above recommendations/guidelines for identifier names: i, I, ii, II; j, J; q, Q.

One more thing, about optimization. If you will note, I used preincrement for my loop-control variable whenever either pre or post would do (though I did slip and use postincrement also), whereas most people, including you and alexander, use postincrement. I do this for performance reasons.

"Performance Tip 2.5: Preincrement and predecrement operate slightly faster than postincrement and postdecrement." (C++ How to Program: Fourth Edition, H. M. Deitel & P. J. Deitel, Prentice Hall/Pearson Education, 2003, p101)

I've seen other sources that state this performance difference does not apply to primitive data types, but it's been too long ago for me to remember the exact details. Anyway, even if post vs. pre doesn't make a difference for primitive data types, it makes sense to be consistent and just use the one that will be faster in some cases (instead of using postincrement when it doesn't make a difference and remembering to switch to preincrement when it does).
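The half-remembered detail is the usual one: for built-in types a decent compiler generates identical code for i++ and ++i when the resulting value isn't used, but for class types the postfix form must construct and return a copy of the old value. A minimal illustrative sketch (hypothetical, not from any of the books quoted) of the two overloads that iterators and similar classes define:

#include <iostream>

// A toy counter with both increment forms, mirroring how iterators define them.
class Counter
{
public:
    explicit Counter(int v = 0) : m_value(v) {}

    Counter& operator++()              // prefix: just bump and return *this
    {
        ++m_value;
        return *this;
    }

    Counter operator++(int)            // postfix: must copy the old value first
    {
        Counter old = *this;           // this extra copy is why postfix can be slower
        ++m_value;
        return old;
    }

    int value() const { return m_value; }

private:
    int m_value;
};

int main()
{
    Counter c(41);
    ++c;                               // no temporary involved
    Counter before = c++;              // temporary copy of the old value
    std::cout << before.value() << ' ' << c.value() << '\n';   // prints "42 43"
}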
TeleMad Posted April 16, 2005

Last night at 2:00 am I figured I'd set up something to address the third subtopic: prime numbers. So I set to expanding the range and was going to increase it ten-fold, from 500 thousand to 5 million, but said what the heck, I'll go for a 100-fold increase and allow for numbers up to 50 million. After making a couple of optimizations to the one-time setup algorithm, I kicked it off and went to sleep. It finished sometime during the night, so this morning when I woke up I added a timing function that uses pseudo-random numbers to see how long my NextPrime function takes to execute. Using 100 pseudo-random numbers between 1 and 50,000,000 returned an average of only 112 milliseconds (and my computer isn't the fastest thing around by any means).

FUNCTION CreateSieve(lnMaxNumber)
    * One-time setup
    CLOSE ALL
    SET SAFETY OFF
    CREATE TABLE sieve (prime L(1))
    FOR lnLcv = 1 TO lnMaxNumber
        INSERT INTO sieve (prime) VALUES (.T.)
    ENDFOR
    UPDATE sieve SET prime = .F. WHERE RECNO() = 1
    UPDATE sieve SET prime = .F. WHERE RECNO() % 2 = 0 AND RECNO() > 2
    lnHighestFactor = CEILING(SQRT(lnMaxNumber))
    FOR lnLcv = 3 TO lnHighestFactor STEP 2
        GO (lnLcv)
        IF (prime)
            UPDATE sieve SET prime = .F. WHERE RECNO() % lnLcv = 0 AND RECNO() > lnLcv
        ENDIF
    ENDFOR
    INDEX ON prime TAG prime
    CLOSE ALL
ENDFUNC

FUNCTION NextPrime(lnNumber, lcSieveAlias)
    LOCAL lnNextPrime, lnSelect
    lnSelect = SELECT()
    SELECT (lcSieveAlias)
    GO (lnNumber)
    IF (!prime)
        LOCATE REST FOR prime
    ENDIF
    lnNextPrime = IIF(EOF(), 0, RECNO())
    SELECT (lnSelect)
    RETURN lnNextPrime
ENDFUNC

FUNCTION TimeSieve(lnNumberOfTrials, lcSieveTable)
    * Utility function that time-tests the NextPrime function
    * using pseudo-random numbers.
    LOCAL lnLcv, lnStarted, lnEnded, lnSeconds, lnNextPrime
    LOCAL lnMinNumber, lnMaxNumber, lnNumber
    CLOSE ALL
    CREATE TABLE timesieve ( ;
        number N(10), ;
        nextprime N(10), ;
        seconds N(10, 3))
    USE (lcSieveTable) IN 0 ALIAS sieve
    lnMinNumber = 1
    lnMaxNumber = RECCOUNT("sieve")
    =RAND(-1)
    FOR lnLcv = 1 TO lnNumberOfTrials
        lnNumber = INT((lnMaxNumber - lnMinNumber + 1) * RAND() + lnMinNumber)
        lnStarted = SECONDS()
        lnNextPrime = NextPrime(lnNumber, "sieve")
        lnEnded = SECONDS()
        lnSeconds = lnEnded - lnStarted
        INSERT INTO timesieve (number, nextprime, seconds) ;
            VALUES (lnNumber, lnNextPrime, lnSeconds)
    ENDFOR
    CLOSE ALL
ENDFUNC

And here are the results of the TimeSieve test (sorry, the Excel columns line up appropriately in what I am posting, but once the page is submitted to the server, the columns no longer retain their proper formatting).
number         nextprime      seconds
30,600,100     30,600,107     0.198
13,024,643     13,024,673     0.110
4,513,818      4,513,829      0.111
30,706,104     30,706,129     0.122
43,147,802     43,147,813     0.110
7,510,472      7,510,483      0.116
3,362,310      3,362,327      0.149
5,520,153      5,520,157      0.115
30,175,122     30,175,127     0.115
4,760,182      4,760,183      0.108
31,217,662     31,217,663     0.118
37,350,926     37,350,931     0.113
45,683,454     45,683,461     0.113
41,174,483     41,174,503     0.112
13,927,370     13,927,373     0.132
45,145,847     45,145,861     0.114
21,305,755     21,305,761     0.131
6,041,241      6,041,267      0.119
48,294,776     48,294,787     0.109
7,900,442      7,900,457      0.113
49,012,244     49,012,253     0.112
37,412,549     37,412,569     0.109
17,621,674     17,621,687     0.115
15,540,297     15,540,307     0.120
27,341,474     27,341,497     0.119
25,061,885     25,061,891     0.110
5,541,930      5,541,931      0.106
28,836,223     28,836,229     0.117
31,313,055     31,313,071     0.131
2,552,587      2,552,587      0.024
24,552,359     24,552,373     0.111
11,633,865     11,633,879     0.110
28,678,263     28,678,271     0.115
22,948,918     22,948,921     0.115
38,147,132     38,147,141     0.116
11,179,024     11,179,033     0.116
27,664,794     27,664,801     0.108
25,200,953     25,200,979     0.112
17,961,936     17,961,949     0.120
24,819,925     24,819,929     0.114
3,871,899      3,871,909      0.121
10,734,541     10,734,547     0.114
39,834,156     39,834,161     0.111
3,867,693      3,867,707      0.110
16,472,173     16,472,173     0.006
12,277,603     12,277,619     0.121
35,185,324     35,185,327     0.118
7,896,274      7,896,277      0.109
36,376,225     36,376,259     0.111
49,454,004     49,454,047     0.107
26,632,074     26,632,087     0.123
35,263,072     35,263,079     0.125
21,516,492     21,516,527     0.120
28,176,250     28,176,257     0.103
39,231,880     39,231,883     0.109
18,131,529     18,131,551     0.119
36,906,442     36,906,449     0.115
3,368,268      3,368,269      0.114
41,269,787     41,269,817     0.110
32,776,910     32,776,943     0.119
29,631,419     29,631,439     0.112
37,049,514     37,049,521     0.118
30,992,431     30,992,441     0.115
38,456,542     38,456,563     0.110
31,427,381     31,427,381     0.003
46,311,656     46,311,659     0.111
48,195,556     48,195,559     0.112
20,625,762     20,625,769     0.113
24,003,771     24,003,773     0.116
48,219,087     48,219,121     0.114
2,071,307      2,071,319      0.140
26,780,802     26,780,821     0.137
14,766,789     14,766,797     0.142
21,164,513     21,164,567     0.111
19,221,534     19,221,539     0.114
45,709,900     45,709,919     0.110
33,099,296     33,099,301     0.113
14,710,880     14,710,901     0.124
28,243,366     28,243,417     0.110
43,433,842     43,433,843     0.109
19,853,611     19,853,611     0.004
2,450,860      2,450,863      0.115
9,793,296      9,793,307      0.112
31,617,220     31,617,239     0.115
45,213,453     45,213,457     0.109
34,915,187     34,915,187     0.010
3,536,034      3,536,041      0.111
6,731,643      6,731,651      0.109
2,581,267      2,581,273      0.117
12,061,051     12,061,061     0.116
25,884,373     25,884,377     0.126
10,752,144     10,752,149     0.118
38,647,196     38,647,201     0.121
7,371,838      7,371,839      0.118
14,936,328     14,936,329     0.115
26,668,046     26,668,051     0.121
42,295,938     42,295,951     0.122
38,099,430     38,099,437     0.119
31,515,779     31,515,821     0.125
34,589,530     34,589,531     0.115
Tormod Posted April 16, 2005

Tele, maybe you can post this within CODE tags? That might retain the formatting.
Qfwfq Posted April 18, 2005

Good example of apples and oranges! You say:

"Uhm, because maintainability should always be taken into consideration"
"but your hardware-specific issues - cache misses and page faults - are hardly ever taken into consideration"

The difference boils down to that at the most: 'should' vs. 'hardly ever'. In any case, I wasn't claiming page faults and cache misses to be relevant for the algorithms that you and Alexander were discussing; I made an example in general, further to your point about maintainability, which also hardly applied to the same two algorithms. Anyway, locality of reference isn't just a matter of how many bytes of variables you declare. As the term suggests, it depends on where in virtual memory they will be. If you're familiar with typical implementations of compiled languages, work it out.

You may have noticed I mentioned locality of reference, but you did not notice that I said: "Clarity is to the point, something I don't bother with when I do my own stuff and it's not for customers". Please read before criticizing.

The recommendation about all-caps names is a convention and is meant for long and non-local names anyway, not so much for local indices like the ones of mine that you quote; I chose these according to criteria that I decided on and find logical, although not conventional. Anyway, those that enjoy writing in Greek are usually called 'hackers' (not in the sense of 'crackers'!) and they usually understand coding practices better than anybody. Some people enjoy such boring things as crossword puzzles.

You completely misunderstood what I said about bottlenecks; there are different kinds of them, you know. I won't repeat what I said in my previous post in this thread, it's still available. I wasn't talking about avoiding them for performance reasons; actually, the ones I meant can work somewhat against better performance but can save a lot of future headaches.

I am aware that preincrement and predecrement may operate slightly faster than postincrement and postdecrement. The reason is quite obvious: post{in/de}crement may imply both r-values being used. A good optimizing compiler will distinguish and will only reserve the bytes for both values when they are both needed, i.e. when it can't be avoided anyway. How often do you take a look at your IDE's disassembly window? I have often done it when testing release versions and I've seen the compiler do some amazing things, let alone such trivial things as giving no semantics to for(i=0; i<999999999; i++); when the final value of i isn't used, and otherwise making it equivalent to i=1000000000;

Although I agree that it would be a good habit, I never worry about writing ++i when it makes no difference other than (if at all) performance. If, on the other hand, I give a C++ class these operators, they become a call to my member functions and I don't expect the compiler to choose which to call; there's no rule that obliges me to give them the usual semantics, so it would be wrong for the compiler to change from one to the other. In this case I do give it a thought. Is this, perhaps, the exact detail that you have a hard time remembering?

I'd say that's enough nitpicking for the time being, isn't it?

For a Max of 50 million, my app works them all out in hardly more than two seconds. I don't need to let it run while I go to sleep! If you call my nextPrime function for 50 million first (modified to allocate only 3125000 bytes) and then call it for lower values, I'm sure you'll get good performance results.

When I posted it I didn't bother modifying the allocation scheme or anything. Sorry. It's the while block containing the fprintf and scanf calls.
TeleMad Posted April 19, 2005

Qfwfq: Anyway, locality of reference isn't just a matter of how many bytes of variables you declare.

And I never said or suggested otherwise. I just pointed out that it - like a lot of other things you bring up - is irrelevant. For example, it is irrelevant to: (1) the specific case of alexander's vs. my C++ algorithm, and (2) the general method of comparing two algorithms' efficiencies.

Qfwfq: As the term suggests, it depends on where in virtual memory they will be. If you're familiar with typical implementations of compiled languages, work it out.

I don't need to; I already understand what you are saying. It's you who doesn't understand what I am saying (hint: it has to do with irrelevance).

Qfwfq: "Clarity is to the point, something I don't bother with when I do my own stuff and it's not for customers". Please read before criticizing.

I did read it before responding. My question now is, so what? What I wrote is perfectly in line with what you said the first time I read it, and now that you've emphasized it again. So you don't follow many recommended practices when you write code for yourself. Good for you for admitting it. And I maintain that we all should follow them, because of the specific examples I gave that support that point. But can you show me where I said you were also sloppy and cryptic as hell when you wrote for customers? No, you can't, because I didn't say that. So your emphasizing here a distinction between your writing code for yourself and your writing code for customers is, like most everything else you bring up, irrelevant.

Qfwfq: The recommendation about all-caps names is a convention and is meant for long and non-local names anyway ...

No, the convention says that all-caps identifiers are for constants ... end of convention. There's nothing about local vs. non-local, or long names vs. short names. When an experienced C++ programmer is reading someone else's code - as I and others were doing here while reading the code you chose to post - and sees an identifier written in all caps, he/she understands it to be a constant, because of the conventions good programmers follow when writing code. These C++ programmers can figure out that an all-caps identifier isn't a constant, but that requires them to do some amount of work they shouldn't have to do.

You chose to show your code to other C++ programmers here. I pointed out that you didn't follow several conventions. When you show something to others and it doesn't follow recommended standards, then someone is likely to point that out ... one should expect that.
TeleMad Posted April 19, 2005

Qfwfq: For a Max of 50 million, my app works them all out in hardly more than two seconds.

Ambiguous. Are you saying: 1) that your app works each next prime out in just over 2 seconds, or 2) that your app, using the same 100 pseudo-random numbers I used, took just over 2 seconds to do all 100?

Either way, your execution time during runtime is worse than mine. Yours took either 2 times as long, or ~18 times as long, as mine did. (What's worse is that you are using C++ code while I am using VFP code. General C++ code is considered to execute much faster than VFP code. So I am "handicapped" by using VFP, yet my execution time is still only about 1/2 or 1/18th of yours.)

Qfwfq: I don't need to let it run while I go to sleep!

But yours is slower! And let me point out the obvious fact that you are trying to exaggerate. My one-time setup was performed (1) only once, and (2) at night, when the computer would have otherwise been completely idle. My one-time investment reaped huge dividends and it didn't cost me even 1 millisecond of needed CPU time to accomplish. I prepared, just once, ahead of time, DURING OTHERWISE UNPRODUCTIVE TIME, so that I could be MORE PRODUCTIVE during runtime. Bosses love that kind of thing: you should give it a try!

Qfwfq: If you call my nextPrime function for 50 million first (modified to allocate only 3125000 bytes) and then call it for lower values, I'm sure you'll get good performance results.

I don't need to call my function in any particular sequence in order to get good results! I can call mine in any order and still get faster runtime results than you. Also, if people compare your C++ implementation to my VFP one, they'll see that not only is my VFP function much shorter than yours, it is also much easier to understand.

Finally, I was able to get my first VFP version operational in only about 5 minutes, and it took me only another 1 or 2 minutes to make the optimizations. The final product took me less than 10 minutes to throw together. How long did it take you to get to your final C++ version? Much longer.
Qfwfq Posted April 19, 2005

TeleMad, I have no intention of continuing a nitpicking game based on distortion of what I say, and on you denying that you meant this or that, when Susan said she doesn't like vodka. I have better things to do in life.

You did not say how long your algorithm took to "set up", so I can't make the comparison. Instead of complaining about your handicap, wouldn't it make more sense to translate your algorithm into C and compare the two fairly? You should, before claiming yours is faster, instead of unjustly tweaking my words. Aside from my code being compiled with a lousy compiler, I'd like to see even an implementation in assembler run much faster on the same machine!

TeleMad: I don't need to call my function in any particular sequence in order to get good results!

Neither do I; what you quoted was only for the purpose of comparison with your specific test, the one for which you posted tables of times, excepting the time of the initial setup that you mentioned.

I still disagree with many of your remarks and I disagree with your accusation of my code being sloppy. I disagree with confusing conventions with good/bad coding practices. These are not the same thing; conventions are and have been more arbitrary, and identifier naming conventions are less necessary when the name is declared and used in a short section. As for choosing to show my code to other C++ programmers here, I simply offered it to be tried for speed or even used for factoring; I supplied it "as is" and said so. I am not concerned with anybody being able to understand it - if anybody wants to, it wasn't included in the price. When nobody is paying you to write code, you are free to do so with whatever conventions you please, and the same goes for me.

I also supplied a little app that can store results to a file on disk. The nextPrime function, as is, was how I adapted the algorithm to uses for factoring, where I wanted to obviate the necessity of using a file of a quarter of a gigabyte. A trivial adaptation could do so. When you think I wasn't aware of such possibilities you are being even more offensive, and also when you point out the advantage that you did it on otherwise idle time, as if I've never seen night-time batches in my job. BTW, how much RAM, or disk storage, does your scheme use for a given size of sieve? Not that companies worry about that kind of thing nowadays; I've seen customers declare field upon field as VARCHAR(8) or longer to store integer values in ASCII format.

TeleMad: It's you who doesn't understand what I am saying (hint: it has to do with irrelevance).

Yet another groundless accusation. Anyway, your point about maintainability was no more relevant to "(1) the specific case of alexander's vs. my C++ algorithm, and (2) the general method of comparing two algorithms' efficiencies" than my example was. No further discussion on what types of schemes actually are used for maintainability and the like.

If you wish to discuss anything else, at least avoid being offensive, bitteschön.
TeleMad Posted April 19, 2005

Qfwfq: TeleMad, I have no intention of continuing a nitpicking game based on distortion of what I say, and on you denying that you meant this or that, when Susan said she doesn't like vodka. I have better things to do in life.

Possible translation ... Qfwfq: I know I am out of my league here and think I've embarrassed myself long enough.

Qfwfq: You did not say how long your algorithm took to "set up", so I can't make the comparison.

Once again you've made an ambiguous statement. Please learn to communicate effectively. 1) I said it took me less than 10 minutes overall to 'set up' my VFP algorithm (setup here as in 'complete the coding of'). 2) I said it took my one-time 'setup' algorithm 0 milliseconds of needed CPU time to complete (setup here as in how I've been using the word most frequently).

Now, I know it took you a lot longer - a LOT longer - than a mere 5-10 minutes to get your cryptic-as-hell and non-conforming-to-conventions C++ code to its final state starting from scratch.

Qfwfq: Instead of complaining about your handicap, wouldn't it make more sense to translate your algorithm into C and compare the two fairly?

Nope. You used your language of preference and I chose my language of preference. And the outcome? My program executes faster than yours. I know you don't like that, but hey, sorry. Now, it would be wrong for me to compare the algorithmic efficiency of our two NextPrime algorithms using "big O"/"big theta" measures, because that would be comparing your C++ apple to my VFP orange. And note that I haven't done any such thing. My comparison between your C++ and my VFP algorithms was based on things such as speed of execution (mine wins), coding time (mine wins), number of lines of code needed (mine wins), and maintainability (mine wins).

Qfwfq: Neither do I; what you quoted was only for the purpose of comparison with your specific test, the one for which you posted tables of times, excepting the time of the initial setup that you mentioned.

Initial setup time was 0 milliseconds of needed CPU time. My NextPrime function's execution was ~18x faster than yours.
Qfwfq Posted April 20, 2005

I am not going to continue the nitpicking game, because there have been misunderstandings and that isn't my fault. I am not out of my league here and I haven't embarrassed myself. Your "translations" are totally arbitrary. I don't expect you to admit defeat any more than you can expect me to do so by invoking me in these ways that aren't any line of argument.

I never understood what you meant about letting your code run while you took a nap, and now I don't even need to know: I see that your algorithm has no setup at all. My meaning of "set up" wasn't that of developing the source code, and your mention of letting it run while you were sleeping, exploiting idle time for a once-for-all task, misled me. Perhaps I'm not the only one who should learn to communicate effectively. In my previous post I requested you to kindly avoid the more offensive types of nitpicking when it isn't justified by my own shortcomings; if you disagree, take the bother of reading the following carefully.

I'm less busy today and I looked through your table; your execution times seem to remain similar no matter how many times you query for primes not greater than ones already found. This is not so in my case, hence the need for fair comparison. I haven't measured the times for my function when it only needs to look up, without further execution of the sieve, but I'm sure the times are tiny unless the necessary page has been swapped out of RAM.

Fair comparison, in the range of under 50 million: if you only need to find 10 or 20 primes in this range, your times add up to not more than the couple of seconds in which my function is able to work out the entire sieve. If one needs to repeatedly iterate through increasing primes in this range - to find the prime factorization of number after number, as Alexander apparently was interested in doing and as I myself have often done - my function saves execution when it has already been called on numbers greater than the current argument. In this case it only performs an almost trivial lookup on the flags that are already correct. See the if(n < maxPrm) block. Failing this, it continues executing the sieve, but only as necessary, so if you call it scores of times on numbers less than, say, 1000, it doesn't waste time on the flags for greater numbers. If you need to factor several non-consecutive numbers in a certain range, you don't know how large the greatest prime factor will be until you have found all the prime factors. No mathematician could give you a quicker algorithm than that, if you don't use an already prepared table of primes.

If one wants the table to survive from one process to the next, just save it to disk - not a difficult thing for a good C hacker to do. In the new process, this may save some time, especially for the full range up to 4,294,967,295 and if the amount of RAM is a problem, but not such a huge time compared with a lot of factoring to be done. If this is preferred, the necessary file can be saved from my little app and then have the first 8 bytes shaved off in a text editor; these bytes aren't part of the actual sieve but the value of Max and the number of primes found.

TeleMad: Now, I know it took you a lot longer - a LOT longer - than a mere 5-10 minutes to get your cryptic-as-hell and non-conforming-to-conventions C++ code to its final state starting from scratch.

Of course it takes longer to develop something sophisticated. Most of my company's customers will always say something should have been done in less time, no matter how fast you did it. When I'm hacking in my spare time and in no hurry, I do what I'll be most pleased with, not what takes the least time to write and debug.

No naming convention is authoritative unless it's requested by whoever pays you. One guy I worked for was ridiculously picky about stylistic details, and I'm not the only one to have said so about him. He once made a girl invert the order of the two operands of an OR operator in VB6, which doesn't have short-circuit evaluation or anything; he just said it was less clear the other way around! Imagine, just like the difference between:

If bFluffy Or bFrilly Then ...

and:

If bFrilly Or bFluffy Then ...

How's that for conforming-to-conventions, clarity and maintainability?

When I first developed the app I have supplied here, just for practicing MFC a bit, Hungarian notation was very much used, a fashion that has somewhat faded. If you look at past editions of Bjarne Stroustrup's book you will see that he regularly used variable and type names all in lower case, but nowadays it is typical to use capitalization to mark words in long names, with an initial cap for class names and not for class members or other variables. These conventions have their advantages, including the faded Hungarian notation; I have often used it and some customers require it strongly. For some variables, including many iteration indices, the name is important mainly to the compiler and only to distinguish it from other names. People hardly bother with names in such cases, and a comment like j = 32J + j%32 should be good enough for the expert C programmers you mentioned.

TeleMad: My NextPrime function's execution was ~18x faster than yours.

I wouldn't be so sure. In a fair comparison, you would have to compare the typical execution time of:

if(n < maxPrm){
    uLong ii = (n >> 1), dwii, maskii = 1 << (ii%32), II = ii >> 5, max = (maxPrm >> 1);
    for(; ii < max; II++, maskii = 1){
        dwii = m_Flgs[II];
        for(; maskii && ii <= lhFF; ii++, maskii <<= 1){   // ii = 32II + ii%32
            if(dwii & maskii){
                n = 1 + (ii << 1);
                return true;
            }
        }
    }
}

to the values that you report, which are mostly over 100 ms. How long does it take to read a few unsigned longs from virtual memory and find a bit that's set? Not long, I expect, especially in the case of increasing primes while factoring.

I'm undaunted. At home I have an old Tecra, only 133 MHz and 80 megs of RAM and an ailing disk drive, yet it can handle the algorithm fine enough for primes up to a billion or so, and I have done plenty of factoring on it. Of course, I don't do the heaviest factoring while using the machine for other unrelated tasks. Turtle was very grateful some weeks ago for some insight I gave him using prime numbers, gleaned partly with the help of trial runs that used this function, as well as a pair of C++ classes for which I had adapted it from the basic algorithm as I had written it in past years, just for the hacking kick of it.

While I'm still bothering with this silly debate, I'll ask you how your algorithm performs for primes up to a billion, or up to 4,294,967,295.
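Persisting the finished flag array to disk, as suggested above, really is only a few lines of C-style I/O. The following is a hypothetical sketch: the function names and the raw-dword file layout are invented for illustration and are not taken from Qfwfq's app.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Dump a sieve's flag dwords to disk so a later process can reload the table
// instead of re-running the sieve from scratch.
bool saveFlags(const char* path, const std::vector<std::uint32_t>& flags)
{
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written = std::fwrite(flags.data(), sizeof(std::uint32_t), flags.size(), f);
    std::fclose(f);
    return written == flags.size();
}

bool loadFlags(const char* path, std::vector<std::uint32_t>& flags)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    long bytes = std::ftell(f);                       // file size in bytes
    std::fseek(f, 0, SEEK_SET);
    flags.resize(static_cast<std::size_t>(bytes) / sizeof(std::uint32_t));
    std::size_t read = std::fread(flags.data(), sizeof(std::uint32_t), flags.size(), f);
    std::fclose(f);
    return read == flags.size();
}

Memory-mapping such a file (CreateFileMapping/MapViewOfFile on Win32, as hinted at earlier in the thread) would avoid reading the whole quarter-gigabyte up front.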
TeleMad Posted April 21, 2005

Posted yesterday at 5:15 pm in a Chemistry thread. Moved now that I can get back into the Physics/Mathematics section.

It dawned on me that I could leverage my 50-million-row lookup table of prime/non-prime numbers. That is, using my already-precalculated 50 million known prime/non-prime values, I can rather quickly calculate whether or not any number up to a theoretical 50 million squared - that's up to 2,500,000,000,000,000, which is 2.5 quadrillion - is prime. All it required was a simple change to an existing function I had. Here's the updated algorithm.

FUNCTION IsPrime(lnNumber, lcSieveAlias)
    LOCAL llIsPrime, lnSelect, lnLcv, lnMax
    lnSelect = SELECT()
    SELECT (lcSieveAlias)
    IF (lnNumber <= RECCOUNT())
        GO (lnNumber)
        llIsPrime = prime
    ELSE
        llIsPrime = .T.
        lnMax = CEILING(SQRT(lnNumber))
        SCAN FOR prime
            IF (RECNO() > lnMax)
                EXIT
            ENDIF
            IF (lnNumber % RECNO() = 0)
                llIsPrime = .F.
                EXIT
            ENDIF
        ENDSCAN
    ENDIF
    SELECT (lnSelect)
    RETURN llIsPrime
ENDFUNC

The only cleaning up that would need to be done is to make sure that if someone specifies a number that is too large - a number greater than 2.5 quadrillion - the function doesn't throw an exception. A simple fix, but one I will ignore for now.

I'll time-test this function tonight using 10,000 pseudo-randomly generated numbers between 1 and a quadrillion, posting the MIN, MAX, and AVG time intervals when it completes.
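The same square-root trick reads naturally in C++ as well. Here is a small self-contained sketch (hypothetical, not a translation of TeleMad's VFP): sieve up to a limit once, then test any number up to the square of that limit by trial division with the sieved primes not exceeding its square root.

#include <cstdint>
#include <iostream>
#include <vector>

// Sieve primality up to 'limit' once, then answer isPrime(n) for any n <= limit*limit.
class PrimeTester
{
public:
    explicit PrimeTester(std::uint64_t limit) : m_isPrime(limit + 1, true)
    {
        m_isPrime[0] = false;
        m_isPrime[1] = false;
        for (std::uint64_t p = 2; p * p <= limit; ++p)
            if (m_isPrime[p])
                for (std::uint64_t m = p * p; m <= limit; m += p)
                    m_isPrime[m] = false;
        for (std::uint64_t p = 2; p <= limit; ++p)
            if (m_isPrime[p]) m_primes.push_back(p);
    }

    bool isPrime(std::uint64_t n) const
    {
        std::uint64_t limit = m_isPrime.size() - 1;
        if (n <= limit) return m_isPrime[n];          // direct lookup inside the sieve
        // n > limit: trial-divide by sieved primes up to sqrt(n); valid for n <= limit*limit.
        for (std::uint64_t p : m_primes)
        {
            if (p * p > n) break;
            if (n % p == 0) return false;
        }
        return true;
    }

private:
    std::vector<bool>          m_isPrime;
    std::vector<std::uint64_t> m_primes;
};

int main()
{
    PrimeTester t(50000);                             // small limit for the example
    std::cout << t.isPrime(49999) << ' '              // 49999 is prime -> 1
              << t.isPrime(2147483647ULL) << '\n';    // 2^31 - 1 is prime -> 1
}

Like the VFP version, its only blind spot is numbers beyond limit squared, where trial division by the sieved primes is no longer sufficient.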