
TFoC – The Fabric of Computing

Added by JohnO about 3 years ago

The weblog raises good points about how heavyweight everything can be just to reach quite small achievements. In addition to IDEs and bloatware we might also consider the desirable test equipment: oscilloscopes, logic analysers and spectrum analysers, for instance. GPIO toggling of LEDs was in vogue not too long ago.

We should also not forget the journey to get a sensor delivering a few bytes of information. Writing code in C continues to be a challenge for my cut-and-paste approach to coding. Having said that, I don't think I would want to miss that step, since it is a catalyst for discovering something new. A similar argument applies to oscilloscopes and logic and spectrum analysers, since they can open up understanding; the Micro Power Snitch series is a prime example.

It isn't all about arriving at B from A; there is the journey too! Unpacking a box from China, connecting a battery and watching the packets flow into a central node isn't what JeeLabs is about.


Replies (9)

RE: TFoC – The Fabric of Computing - Added by jcw about 3 years ago

Agree - the field is immense. I've been dabbling in retro-computing a bit recently, to get back a feel for how things were done on very limited hardware. It's been a very enlightening experience. With ultra-low-power nodes, it all sort of comes back full circle: do we want or need more computing power and memory in all those remote nodes, or have we lost the ability to solve these relatively simple problems in correspondingly simple ways?

The other point you make also interests me a lot, and it falls into a similar category: how simple can the instruments be that we need to explore and investigate these designs, and the challenges of making them work properly? An oscilloscope is immensely useful, but I have this feeling again that there must be something between a €0.10 LED and a €300+ oscilloscope: a sweet spot that is just the right window into this world.

Oh, and journeys… for me, it's *definitely* all about the journey!

RE: TFoC – The Fabric of Computing - Added by monsonite about 3 years ago

Jean Claude,

Thank you for an interesting series of posts this week on the Fabric of Computing. I found the Tiny BASIC and 6502/8080 emulation interesting, as there is a lot of interest in the emulation of old machines from the 1970s and early 80s. There is even a Z80 emulator that fits into 4K of 80x86 code, so that it can emulate the Z80 and run old Sinclair Spectrum games on a modern PC. Other Z80 emulators have been written in JavaScript:

http://bacteria.speccy.org/ie.htm

I too am a fan of minimal computing, and I have even ventured into the design of an interpreted language that can run on virtually any microcontroller and offers a range of commands aimed specifically at exercising the hardware.

LED flashing, audio tones, PWM, ADC reading, and digital input and output are just a few of the commands available directly to the user via a serial terminal. Loops, integer arithmetic, conditional branches, millisecond and microsecond delays, and easy extensibility from the keyboard are key features of this minimal user interface.

At the moment it is coded in C for portability, and it sets up a virtual machine on the target processor. I have ported it to everything from the Arduino ATmega to the STM32xxx and MSP430. The language directly interprets the serial characters input from the terminal (or a file) and executes a series of subroutines. The characters are chosen for ease of memorability, not unlike the "Order Codes" used in the 1949 EDSAC: http://www.cl.cam.ac.uk/~mr10/Edsac/edsacposter.pdf

The "language" was inspired by Ward Cunningham's Txtzyme interpreter, which I have built upon and greatly enhanced to include integer maths, defining new words, compilation, and a host of other features. The language "SIMPL" has been continually evolving since May 2013, so I refer you to my original blog post, from which you can follow the progress:

http://sustburbia.blogspot.de/2013/05/txtzyme-minimal-interpreter-and.html

SIMPL is based on a virtual stack machine, which makes it very suitable for Forth-like languages. Many of the ideas in SIMPL were borrowed from Forth, and the SIMPL vocabulary is almost a subset of Forth. I am now working towards porting SIMPL onto the J1b Forth processor, to get blistering performance from a specialised but simple open-core CPU.
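The interpreter idea described above can be sketched in a few lines of C: read characters one at a time, accumulate digits into a parameter, and dispatch each letter to a tiny subroutine. This is purely illustrative; the command letters below are hypothetical stand-ins, not the actual SIMPL vocabulary (see the blog link above for the real thing).

```c
#include <stdio.h>

/* Toy sketch of a Txtzyme/SIMPL-style character interpreter.
   The command letters are hypothetical, NOT real SIMPL commands:
   digits build up a number, 'a' adds one, 'p' prints the value. */
static long x;                          /* the numeric parameter */

long tiny_eval(const char *src) {
    x = 0;
    for (; *src; src++) {
        char c = *src;
        if (c >= '0' && c <= '9')
            x = x * 10 + (c - '0');     /* accumulate digits */
        else switch (c) {
            case 'a': x += 1; break;                /* add one     */
            case 'p': printf("%ld\n", x); break;    /* print value */
            default: break;             /* ignore unknown characters */
        }
    }
    return x;           /* expose the final parameter for testing */
}
```

On real hardware the dispatch cases would toggle pins, read the ADC, start delays and so on, which is what makes a one-page interpreter such a capable exerciser of the hardware.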

In the future, I see SIMPL as almost equivalent to a bootloader, in that it is a tiny program loaded into the target microcontroller once at the start of the debug session; from that point on it gives you access to a series of debugging resources, a user interface, and the means to write interactive interpreted code to exercise whatever code you are working on.

Any microcontroller, once programmed with the SIMPL kernel, will be able to interpret the SIMPL commands (a form of bytecode), either typed from the terminal or loaded from a file.

On a different note: the STM32F7 Discovery board could make a useful €50 oscilloscope or logic analyser, complete with a 4.3" colour capacitive touch screen.

Ken

RE: TFoC – The Fabric of Computing - Added by jcw about 3 years ago

Thanks for the interesting notes, Ken. Fun times, eh? Sooo many neat things to look into!

-jcw

RE: TFoC – The Fabric of Computing - Added by monsonite about 3 years ago

Good evening Jean Claude,

Computers come alive when you give them even humble VGA video. This is very easy to do with an FPGA or with an embedded video engine (EVE) chip from FTDI, the FT812, about 7 euros in one-off quantities from Farnell, Mouser etc.

I have just completed a new PCB "shield" to take the FT812 or FT813, plus a PS/2 mouse and keyboard.

Really enjoying your TFoC and FPGA posts.

Have a Happy Christmas

Ken

RE: TFoC – The Fabric of Computing - Added by monsonite about 3 years ago

Jean Claude,

The Arduino libraries also do not encourage minimal programming: all of the built-in functions carry a significant code-size penalty.

For example, to print “Hello World” using the Arduino Serial.begin() and Serial.println() functions uses 1796 bytes of program storage and 212 bytes (10%) of the available RAM.

Whilst this is trivial to program in just six or so lines of code, it really hammers the available resources:

void setup() {
  Serial.begin(115200);
}

void loop() {
  Serial.println("Hello World");
}

So I set about trying to reduce the “bloat” from the standard Arduino sketch.

I successfully managed to eliminate Serial.print() from a terminal application, and saved 1082 bytes just by not calling Serial.begin(115200). I found a minimal UART sketch online and coded "Hello World" as a series of inline putchars.

Just using void setup() and void loop() adds 298 bytes to a sketch, but if you use a traditional main() and a while(1) loop instead, you can reduce this overhead significantly. A word of caution though: you will lose millis(), delay() and delayMicroseconds().

After these quick hacks, the result was just 280 bytes of program storage and zero bytes of RAM to perform the same overall task: the repeated printout of "Hello World".
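The "series of inline putchars" idea can be sketched as follows. This is an illustration rather than the actual sketch from the post: the AVR register polling appears only in a comment, and the output goes into a buffer so the logic can run (and be checked) on any host.

```c
#include <string.h>

/* Printing without the Serial class: a hand-rolled putchar plus a
   string loop. On a real ATmega328, uart_putchar would instead be:
       while (!(UCSR0A & (1 << UDRE0))) ;   // wait until TX ready
       UDR0 = c;                            // write byte to UART
   Here it appends to a buffer so the code runs anywhere. */
static char out_buf[64];
static unsigned out_len;

static void uart_putchar(char c) {
    if (out_len < sizeof out_buf - 1)
        out_buf[out_len++] = c;         /* stand-in for UDR0 = c */
}

const char *print_str(const char *s) {
    while (*s)
        uart_putchar(*s++);   /* one putchar per character, no printf */
    return out_buf;
}
```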

Some of the other Arduino functions are not much better. Here are some of the main culprits:

Serial.begin(115200): 1082 bytes

Serial.print(""): 62 bytes

digitalRead(): 116 bytes

digitalWrite(): 454 bytes

analogRead(): 114 bytes

Together with the 298-byte setup()/loop() overhead, the total bloat that could be avoided is 2126 bytes.

Not only are these functions costly in resources, but some, like digitalWrite(), are immensely slow compared with what can be achieved by direct manipulation of the I/O registers.
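The speed gap exists because digitalWrite() must map a pin number to a port, check for a timer conflict and mask interrupts, whereas setting a bit in the port register directly compiles to a single instruction. A sketch of the direct approach, with PORTB replaced by a plain variable (an assumption, for host-side checking; on the ATmega328 it is the memory-mapped I/O register behind Arduino pins 8-13):

```c
#include <stdint.h>

/* Direct I/O register manipulation: Arduino pin 13 is PORTB bit 5.
   PORTB here is a stand-in variable; on the real chip each of these
   lines compiles to one sbi/cbi instruction (2 cycles), versus the
   many dozens of cycles spent inside digitalWrite(). */
static volatile uint8_t PORTB;

void led_on(void)  { PORTB |=  (uint8_t)(1u << 5); }  /* set bit 5   */
void led_off(void) { PORTB &= (uint8_t)~(1u << 5); }  /* clear bit 5 */
uint8_t portb_value(void) { return PORTB; }
```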

When it comes to sending formatted output to a terminal, sprintf() is no better: it adds a massive 1000 bytes!

That's just over 2K of code that could be eliminated by careful use of hand-crafted functions.

My UART initialisation, for example, is just 38 bytes, and by using hand-written get_char and put_char routines, a complete program, including a function that prints an integer to the terminal, can be done in 650 bytes.
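A hand-written integer-printing routine of the kind mentioned above stays small because it needs nothing beyond repeated division by 10 and the put_char routine. The sketch below is illustrative, emitting through a function pointer into a buffer so it can be checked off-target; on the real device, emit would simply be put_char.

```c
/* Print an unsigned integer using only a character-output routine:
   the hand-rolled replacement for printf/sprintf. */
void print_num(unsigned long n, void (*emit)(char)) {
    char digits[20];                 /* enough for a 64-bit value */
    int i = 0;
    do {
        digits[i++] = (char)('0' + n % 10);  /* least significant first */
        n /= 10;
    } while (n);
    while (i--)
        emit(digits[i]);             /* reverse: most significant first */
}

/* Host-side harness: collect the emitted characters in a buffer. */
static char num_buf[24];
static unsigned num_len;
static void buf_emit(char c) { num_buf[num_len++] = c; }

const char *format_num(unsigned long n) {
    num_len = 0;
    print_num(n, buf_emit);
    num_buf[num_len] = '\0';
    return num_buf;
}
```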

This effectively removes much of the bloat of the Arduino serial code.

I rewrote Ward Cunningham's Txtzyme, a tiny interpreted serial "language", and got it down from 3422 bytes to just 1332, a saving of 2090 bytes.

It just shows that if we were a little more inquisitive about the inner workings of library code, we might learn more efficient ways to accomplish a particular task.

It also illustrates that there is a lot more mileage to be had from these low-resource 8-bit microcontrollers. An ATmega328 at 20MHz chomps through instructions approximately 20 times faster than the old 4MHz Z80. Its only real limitation is that it is bitterly constrained by its 2K of internal RAM.

I guess it's a trade-off between time and convenience. I have no real desire to return to hand-assembled Z80 or AVR machine code, where the only tools to help were a pencil and paper!

An Arduino running a Tiny BASIC interpreter (written in assembler) would be no slouch compared to many of the early-1980s home micros.

Innovative products, such as James Bowman's Gameduino and Gameduino 2, which effectively offload the overheads of video and sound generation to a graphics co-processor, allow even the humble Arduino to produce a realistic facsimile of 1980s arcade games.

How do we move to more powerful processors and hardware, yet retain the simplicity and convenience of an interpreted language such as Tiny BASIC? Is there a simple yet scalable solution, a kind of lingua franca, which allows us to exploit the inner potential of any microcontroller, soft core or whatever, no matter the speed or complexity, using a simple open-source toolchain that doesn't consume hundreds of megabytes on the hard drive?

Ken

RE: TFoC – The Fabric of Computing - Added by monsonite about 3 years ago

Re: Saturday 12-12 TFoC

Jean Claude,

In my opinion, FPGAs come into their own when you want to integrate a whole bunch of hardware functions onto the one IC - effectively the System on a Chip or SoC.

Already you have shown that they can host a variety of soft-core processors, such as the Z80, 6502 and the J1 Forth CPU. The strength of the FPGA is that you can add all the other whizz-bang hardware peripherals that turn that processor into a system: colour video generation (for an LCD or VGA monitor), multi-channel sound synthesis, high-speed communications, a RAM interface, an SD card for storage, keyboard, mouse, etc. Previously, these would have been multi-chip solutions. Just lift the lid on a 48K Spectrum or an Atari 1040, for example, and see the complexity of the motherboard and the size of the power supply!

What the FPGA allows is the means to put all this complexity onto one low-cost, compact device and run it on, worst case, about 100mA at 3.3V, thus allowing the convenience of battery power and portability.

Using the \$10 FPGA approach, we could now put a custom, reconfigurable computer with all the resources of a mid-1990s (100MHz) workstation into an optical mouse enclosure, for significantly under \$20.

Using the microcontroller approach to SoC integration, as has been demonstrated so effectively by the Raspberry Pi Zero, we get even more bang for our buck: a 1GHz ARM11 core running Linux for \$5, potentially offering 10 times the performance and able to run off-the-shelf, open-source software.

However, the hardware of the Pi Zero is so tightly bound up in the Broadcom SoC that gaining any more functionality requires additional off-board hardware, and that in my opinion is where the FPGA wins: flexibility.

Perhaps the ideal solution, offering both computing power and hardware flexibility, is the combination of a microcontroller and an FPGA, either on the same board or, better still, on the same IC. That is where the Xilinx Zynq comes in, offering a powerful ARM processor and a substantial FPGA in one package.

RE: TFoC – The Fabric of Computing - Added by monsonite about 3 years ago

TFoC Saturday 12-12-15

Jean Claude,

Your observations about orders of magnitude difference in terms of computing performance are clear and stated at a timely moment.

I too have been looking at performance of different machines over the decades.

When you look at the first valve machines, specifically EDSAC of the late 1940s, about 600 instructions per second was all that could be achieved, and this was mostly down to the fact that the ALU handled data in a serial fashion: you really cannot build a parallel 35-bit ALU with just 1500 triode valves, the 1940s switch equivalent of the transistor.

Jumping forward to 1965 and the PDP-8, the first of the mass-market "minicomputers". By this time digital hardware was transistorised, using DTL (diode-transistor logic): essentially, diodes were used to create the OR function and a transistor was used for the invert, or NOT, function, thus allowing the full range of logic gate functions to be synthesised.

The first PDP-8 used about 1500 transistors (PNP germanium) and about 3000 diodes. The engineers at DEC worked hard to get the transistor count down, because back then a transistor cost about \$2 or \$3 each, though prices were falling rapidly, and Gordon Moore's famous plot illustrates this point graphically: https://www.eecs.berkeley.edu/~boser/courses/40/lectures/Course%20Notes/Moore’s%20Law.pdf

The PDP-8 used magnetic core memory, as was common at the time, and it was the memory cycle time (as usual) that had the most influence on the overall processing speed. Core memory was very labour-intensive to make, so the whole 4K-word machine sold in 1965 for \$18,000, at a time when a new convertible VW Beetle cost \$1750.

Ten years later, when the 6502 was created, the transistor price had been falling by two orders of magnitude per decade, and the whole CPU could be integrated on a single silicon die, allowing the 3510-transistor 6502 to be sold for about \$20. Smaller, integrated transistors meant faster operation, and so the 6502 could be clocked at 2MHz, allowing a million operations per second.

Another decade on, and the team at Acorn Computers was working on the first ARM processor. Here a tiny British design team took a radical approach that flew in the face of conventional CPU design wisdom, and created a processor with just 25,000 transistors.

Its contemporary, the Intel 80386, used 275,000: more than ten times the transistor count.

The ARM1 first ran in April 1985, and here, I believe, was the start of a revolution in computing devices. Intel continued to plug away at their x86 architecture, with its transistor count and power consumption rapidly spiralling skywards.

By 1995 an Intel Pentium Pro used 5,500,000 transistors on a 307 mm² die, whilst the ARM 700 still used a tenth of that number on a much smaller die area. The bigger the die area, the more likely it is to contain a defect, and this lowers the overall yield from the wafer; hence the price per die increases.

Intel's insistence on sticking to a 1976 architecture has cost them dearly, in terms of complexity, transistor count and cost. This is why ARM processors now dominate the mobile computing market, plus other low-cost consumer and automotive markets.

Intel hit a brick wall around 2000 with their power-greedy Pentium 4. I had a laptop at the time with a 3.06GHz P4, which cooked your legs when you used it on your lap. It took Intel a further eight years to manoeuvre out of the P4 roadblock and come out with their lower-power Atom devices.

There has to be a way to reduce complexity. As you stated:

_“Four decades later, on a 2015-era 4-core 2.8 GHz i7 CPU with its advanced pipelining and branch prediction, each of the cores can process billions of instructions per second – with an optimising gforth compiler for example, the “1000000000 0 do loop” takes around 2 seconds – that’s 2 nanoseconds per loop iteration“_
Well, as you know, the J1 Forth computer, implemented as an open soft core on a \$10 FPGA, can also achieve credible results: executing the same billion empty loops, "1000000000 0 DO LOOP", on an 80MHz J1 takes almost exactly 100 seconds. That is about 100 ns per loop: not bad for a device running a single core at 1/35th of the clock speed and a tiny fraction of the power.

If the J1 could run at 2.8GHz it would do the task in about 2.86 seconds, around 70% of the performance of the billion-transistor Intel. What are they doing with all those other transistors?
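The scaling arithmetic checks out: 100 seconds for a billion iterations is 100 ns per loop, i.e. 8 clock cycles at 80 MHz, and keeping those 8 cycles per iteration while raising the clock to 2.8 GHz gives roughly 2.86 seconds. In code:

```c
/* Sanity-check the J1 loop-timing figures quoted above. */
double ns_per_loop(double total_seconds, double loops) {
    return total_seconds / loops * 1e9;      /* seconds -> ns per loop */
}

double scaled_seconds(double total_seconds, double f_old, double f_new) {
    return total_seconds * f_old / f_new;    /* same cycles/loop, new clock */
}
```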

Here we see that a transistor count of a billion is not necessarily the best way to get a task done. I am looking forward to exciting times ahead…

RE: TFoC – The Fabric of Computing - Added by jcw about 3 years ago

Good points - I guess it all depends on what you’re after: minimal power? cheapest? most general? smallest?

In terms of wireless sensor nodes, at least today for the hobbyist, I think it’s hard to beat an ARM Cortex M0+ w/ RFM69, or something similar.

The Micro Power Snitch will periodically send out wireless packets, using nothing but “stolen” power via a galvanically isolated Current Transformer (I think someone was able to keep it going with 18W @ 230 VAC). And that’s not even the limit, I expect. A wireless node drawing 2 µA in sleep mode would run 10 years on a standard CR2032 coin cell with these power levels.
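The 10-year figure is easy to sanity-check. Assuming a nominal CR2032 capacity of about 220 mAh (a typical datasheet ballpark, not a number from the post), a constant 2 µA draw gives 220000 µAh / 2 µA = 110000 hours, about 12.5 years, ignoring self-discharge and the brief transmit bursts:

```c
/* Battery-life estimate: capacity in mAh, average current in uA.
   The 220 mAh CR2032 capacity used below is an assumed datasheet
   ballpark, not a measured value. */
double battery_years(double capacity_mAh, double draw_uA) {
    double hours = capacity_mAh * 1000.0 / draw_uA;   /* uAh / uA */
    return hours / (24.0 * 365.25);                   /* hours -> years */
}
```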

I’ve been doodling a bit with various retro setups lately, in addition to the FPGA on the weblog. Great fun, to see “simh” act like a PDP-8, or an emulated PDP-11 boot up in an original Unix system. I really would like to build this small PDP-8/I replica.

Also built this one. It runs BSD Unix… with just 128 KB RAM, and a µSD card as an emulated disk.

But that PiZero sure is a game changer. Arduino clones, ESPs and EMWs, and now the PiZero: all for under \$10.

What’s sorely lacking IMO, is software (and a stable foundation) to make these devices convenient. It takes only 876 bytes of flash to make a wireless node which sends out periodic blips, using an LPC8xx µC and an RFM69 module. And while that’s pretty small and oodles of fun, it’s far from convenient to develop at such a conceptually low level…

RE: TFoC – The Fabric of Computing - Added by Rolf about 3 years ago

First, thanks a lot to Jean-Claude for the very interesting and instructive articles during the last weeks and months. Here is some feedback.

I like it small and effective. My house-monitoring system is distributed: a Bifferboard Linux server (today an RPi would be the choice) with a JeeLink attached as the central node, and JeeNodes and JeeNode Micros for sensors and switches. Nothing special about this concept, I think. Concerning CPU power, flash and RAM, the ATmega328p (JeeLink) is more than I need. So why ARM? For me, low power consumption is the only interesting aspect at the moment.

Nevertheless, the journey into the ancient microcomputer world was very interesting. I started with the C64 as my first affordable private computer. I already had experience with assembler on the Z80, and was programming the 6502 in assembler too. Then, in the mid-80s, PROMAL came along. It was structured like C and looked like OCCAM, very nice. The trick was implementing a C-like language on a CPU with only a 256-byte stack. With this background it was very easy for me to switch to C, which is still my favourite language. I still have the source code of the PROMAL compiler, but never implemented it on other hardware. PROMAL on the C64 also came with a full-screen editor, and that is what I still use today as a program editor, and even as a text editor for simple tasks. I added a lot of functions over the years, even Unicode support, and since my PCs run Linux, development has come to some kind of end (before that I took it from the C64 to the IBM PC, Atari (68000), AIX, Solaris and Linux).

Looking forward to seeing how things proceed at JeeLabs; at the moment it is very, very interesting. When will the first STM32F103 board be available in the JeeLabs Shop, with ports and an RFM69CW radio?
