What with the Mandelbrot Engine prototype sitting on the corner of my desk, I guess I should describe the innards so you guys know what you're getting into...

First, the cold water spray: this thing is designed to do Mandelbrot calculations. It's not a general-purpose array processor, and it can't be contorted into being one. There are three main issues: inter-processor communications, dynamic range of the numeric operations, and data I/O.

The Mandelbrot calculations don't need any communication between the processors, so that's what we've got. A "real" array processor should have some communication, but there's no good way to pull it off. Homework: sketch an inter-processor communications method that will work for any number of processors between 1 and 256, satisfy the needs of algorithms you don't know about yet, and can be implemented on a single-chip micro without affecting the performance of the main routine that doesn't need it...

The internal math is fixed point, with 60 fractional bits. That gives a dynamic range of -8.0 to +7.999, which is ideally suited to Mandelbrot calculations but not much else. The routines support addition, complementing, multiplication, and squaring, but not division. The multiply routine is a 300-line macro that expands into about 3K of code -- it's written without run-time loops! There are some interesting side effects, because some of the routines check for overflow and clamp the numbers to 3.9999... With 60 fractional bits you can zoom in pretty much to your heart's content... we lose precision at about 1.0E-12, so you'll lose interest first.

The iteration counts can range up to 64K-1, which ought to be enough for any picture you can get in your sights. With 8-byte reals there isn't enough RAM to hold all the values for Julia calculations... which is why we need an 8052-style processor.

The results stream out of the array in daisy-chain order, which means the AT has to wait for the slowest processor.
The big advantage of this method is that the communications channel doesn't have to carry addressing information, so the data rate isn't a limiting factor for any practical image. We could have the AT poll the processors, but then the effective serial data rate would be under 1/3 the actual rate, which isn't a good idea either. We could have any processor that's not ready send a zero count, but then the AT has to maintain a map of which points are filled in, which are pending, and so forth... that puts the AT smack in the critical path, which was exactly the reason we wanted an array processor in the first place!

Providing a hook so you can download an 8052 program is no big to-do, so it'll be in the masked version. You'll need to add a RAM and an address latch per processor, which will double the PC board footprint and expense. Basically, you get to download code to any one processor or all of them, then specify that the code be executed instead of the standard Mandelbrot iteration -- or, on a one-off basis, replace the whole smash. Fair enough?

My feeling is that you're kidding yourself. If you don't have the full suite of cross-assemblers and simulators, you don't have a chance of getting the code running. If you've got all that, why bother hacking around with our hardware? Just get some 8031s, EPROMs, and RAMs, and roll your own... and write it up for INK, of course!

Now, the good news: this thing is a killer. Steve said that it takes about 3 minutes to compute the overall image with 64 processors. What he didn't say is that much of that time is spent drawing the dots on the screen -- the Mandelbrot Engine is waiting on the AT! The initial image is the whole Mandelbrot set, centered on (-0.405,0) with a real-axis size of 3.59 and an aspect ratio of about 1.33. The image has 19.7% black points (44154 pels) with an average of 6.6 iterations/point and computes in 2.8 minutes. If overhead in transmission and dot drawing weren't a factor, it would take 1.9 minutes.
A better measure is to run the same scene with an iteration count of 128. It takes 6.2 minutes to display 37667 Mset points (16.8%), with an average of 15.0 iterations/point. The communications and display overhead accounts for about 1.8 minutes of that time.

A LONG computation with this thing set to 1024 iterations is maybe 45 minutes... I don't have comparative times, but it greatly outruns my 8 MHz AT/10 MHz 80287 running the official IBM MSET program. Benchmarking this thing is a problem because we don't have an apples-to-apples comparison with a 286 running the same code. Sometime Real Soon Now...

Ballpark performance: each iteration takes 5 to 7 ms per processor (including ALL overhead). Divide by the number of processors to get the average time per iteration for the whole array; for 64 processors it's about 94 us, including the data transmission and dot-drawing times. Your mileage may vary, but that's a good starting point. Multiply by the number of points on the screen (224,000 for an EGA) and multiply THAT by the average number of iterations per point (which depends on the scene). That gives you microseconds; divide by 60 million to get minutes and send us a check!

Ed Nisley