HY-STM32F1xxCore144 Core/Dev Board and PSRAM

Added by lmamakos about 4 years ago

After many weeks, two HY-STM32F1xx Core144 boards arrived in my mailbox! I got the fully decked-out version of the board, with the 8MB RAM and NOR flash devices installed. I loaded the latest mecrisp-forth and started playing about with the FORTH libraries and "cbo" board definitions on the Jeelabs github repo. First mission was to play with the 8MB of PSRAM.

I found that I had problems getting the PSRAM to work and pass the memory tests. If I slowed the board down using the 8MHz forth word, it appeared to "work". I noticed that the two boards that I received both behaved the same way. I also noticed that the PSRAM part populated on the board was different from the one indicated on the schematic, so possibly optimistic values that worked with another part might be out of spec for the alternative. My two boards are populated with EM7644SU16ASZP-70LF parts.

Finally, I started fiddling around with the FSMC_AddressSetupTime and FSMC_DataSetupTime initialization values to see what effect that might have. I had much better success with values of 3 for address setup and 5 for data setup.

Except, every once in a while, the psram-full test would fail. Hmmm..

So I wrote another word to wrap psram-full and attempt to run it 10000 times. I added another few tests to write and read values with less delay (not calling the RANDOM word). And then I would get occasional failures after a few dozen iterations. I bumped up the address and data setup times higher and higher, and even with both set to 30 (and also increasing BusTurnAround to 5) I would still get occasional errors.
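
To put those setup values in perspective, here is a rough sketch that converts them into nanoseconds. It assumes each unit is exactly one HCLK period and ignores any fixed extra cycles the FSMC adds per access, so treat the numbers as estimates only. The helper name `phase_ns` and the table layout are made up for illustration.

```python
def phase_ns(cycles, hclk_hz):
    """Duration of `cycles` HCLK periods in nanoseconds (simplified model)."""
    return cycles * 1e9 / hclk_hz

# Setup values mentioned in the post: (address setup, data setup)
SETTINGS = {"tuned (3, 5)": (3, 5), "largest tried (30, 30)": (30, 30)}
# The two clock speeds mentioned in the post
CLOCKS = {"72MHz": 72_000_000, "8MHz": 8_000_000}

table = {}
for clock_name, hclk in CLOCKS.items():
    for label, (addr_setup, data_setup) in SETTINGS.items():
        addr_ns = phase_ns(addr_setup, hclk)
        data_ns = phase_ns(data_setup, hclk)
        table[(clock_name, label)] = (addr_ns, data_ns)
        print(f"{clock_name:>5} {label:<24} address {addr_ns:7.1f} ns  data {data_ns:7.1f} ns")
```

Under this model, a data setup of 5 at 72MHz comes to roughly 69.4 ns. The "-70" in the part number presumably denotes a 70 ns access time, but that is a reading of the part number, not something confirmed in this thread.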

This is all very disturbing, to have unreliable memory. After the very first round of testing, I tried both boards that I received to verify that at least there wasn't some obvious assembly problem. Lately, all the testing was with one board and I will have to try to repeat with the other.

I wonder if you've tried doing any extended memory testing of the PSRAM beyond one pass? I don't know if there are signal integrity problems or what might be going on quite yet. It would be interesting to see if other shipments of this board also have the same PSRAM part, or the part as listed on the schematic. I may also try to run extended tests at 8MHz vs 72MHz to see if that makes a difference.

One random thought - when changing between 8MHz and 72MHz, the systick timer is not reprogrammed (even though the baud rate clock is adjusted). Trying to measure how much slower something runs results... in exactly the same interval as reported by 'millis'! Ha, ha..


Replies (18)

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

Hmmm... very disturbing indeed.

(yes, you need to call "1000 systick-hz" again after a speed switch, I was planning to add some more wrappers for that)
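
A toy model of why the measured interval comes out identical. It rests on three assumptions: SysTick counts core clock cycles, the reload value was originally set for 1 kHz at 72MHz, and the job costs the same number of cycles at either speed. The function names and the job size are invented for illustration, and the real reload register holds N-1, which is ignored here.

```python
def systick_reload(clock_hz, tick_hz=1000):
    """Core clock cycles per tick for the requested tick rate."""
    return clock_hz // tick_hz

def millis_reported(job_cycles, reload):
    """Ticks counted while the job runs; note the clock speed drops out."""
    return job_cycles // reload

def real_ms(job_cycles, clock_hz):
    """Wall-clock duration of the job in milliseconds."""
    return job_cycles * 1000 // clock_hz

JOB_CYCLES = 7_200_000                      # hypothetical workload
stale_reload = systick_reload(72_000_000)   # programmed once, at 72MHz

fast_reported = millis_reported(JOB_CYCLES, stale_reload)
slow_reported = millis_reported(JOB_CYCLES, stale_reload)   # 8MHz, reload not updated
slow_fixed = millis_reported(JOB_CYCLES, systick_reload(8_000_000))

print("72MHz:", real_ms(JOB_CYCLES, 72_000_000), "ms real,", fast_reported, "reported")
print(" 8MHz:", real_ms(JOB_CYCLES, 8_000_000), "ms real,", slow_reported, "reported (stale reload)")
print(" 8MHz:", real_ms(JOB_CYCLES, 8_000_000), "ms real,", slow_fixed, "reported (reload recomputed)")
```

With the stale reload, both the job and the tick stretch by the same factor, so the reported figure does not move. Recomputing the reload, which is what calling "1000 systick-hz" again is for, makes the reported value track real time.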

Will need to look into this - unreliable memory is worse than useless, IMO.

PS. Have edited your post to show the image you attached inline. And here's mine:

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

I just re-ran the psram-full test 100 times here, without any reported failures. My code is exactly as in the embello repository.
So for now my conclusion has to be: can't replicate the problems you're running into.

Another thought: check that your power supply feed is ok, switch to another one or add some capacitors (both a small ≤ 0.1 µF one for HF spikes, and a larger ≥ 100 µF one for slower fluctuations) to rule out any problems with that.

Oh and it better be a 5V feed: if you're feeding it less, the on-board regulator won't have enough margin to do its work properly.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

Thanks for the follow-up! I will try to repeat the tests with a different power scheme. At the moment, it is powered via the USB/serial adapter, which is connected to a powered hub. I can see how this is less than ideal. When I first started having the problem, I did try to power each of the boards from an external power source, via the USB connector and disconnected the power from the USB/serial adapter. That seemed to have no effect in improving the situation.

But that was before I increased the address and data setup times. I was getting consistent failures with the values in the embello repository.. Hopefully I don't have a counterfeit part on my boards..

However, these are different parts. Yours is an EM7644SU16ANP and mine is an EM7644SU16ASZP. I believe the "AN" is "2nd generation, 16 page mode / DPD", while "AS" would seem to be "2nd generation, non-page mode / Non-DPD". I seem to have that extra "Z" in my part number; hopefully it's not a counterfeit part! I've not looked at the page-mode operation of the part, and I don't know if the FSMC supports that mode of access?

Well, I will connect it to the bench power supply next and see how that works out. Given the large footprint around the PSRAM device for the alternative 1MB part, perhaps I'll try to look at the power supply on those pads with my scope and see how nasty that looks. With that smaller package inside the larger footprint, the decoupling capacitors are relatively far away from that BGA package..

It would be too boring to just have it work, I suppose :-)

Thanks for the suggestions, and for all the work that you've shared with us.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

Just an update here.. running off an external USB power bank of relatively high quality still did not improve my situation with errors whilst running the memory tests.

Hmm..

So, now I'm running some tests to see if the failures that I see are corrupted reads or corrupted writes into the 8MB PSRAM device. My approach is to generate 4MB of pseudo random data, and write it into the first 4MB of memory. Then I copy it all to the second 4MB of memory using the "move" word. My thinking here is that this will cause more closely spaced writes than with calls to random, etc. Still, not back-to-back memory cycles, but at least beating on it a bit harder. Then I do a bunch of loops comparing the first half to the second half, reading out 1 cell (4 bytes) at a time. This seemed to work after some very limited testing.
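
The shape of that test, sketched against a simulated memory. This is not the Forth code used on the board: a `bytearray` stands in for the PSRAM, the size is scaled down from 4MB halves to 4KB, and all function names are invented. A single deliberately flipped bit shows what a reported mismatch looks like.

```python
import random

CELL = 4  # bytes per cell, as in the test described above

def fill_random(mem, length, seed):
    """Write pseudo-random 32-bit cells into the first `length` bytes."""
    rng = random.Random(seed)
    for off in range(0, length, CELL):
        mem[off:off + CELL] = rng.getrandbits(32).to_bytes(CELL, "little")

def copy_half(mem, half):
    """Copy the lower half onto the upper half (the role of the "move" word)."""
    mem[half:2 * half] = mem[0:half]

def compare_halves(mem, half):
    """Compare both halves one cell at a time; return a list of mismatches."""
    errors = []
    for off in range(0, half, CELL):
        lower = int.from_bytes(mem[off:off + CELL], "little")
        upper = int.from_bytes(mem[half + off:half + off + CELL], "little")
        if lower != upper:
            errors.append((off, lower, upper))
    return errors

HALF = 4 * 1024                  # scaled down from 4MB for the simulation
mem = bytearray(2 * HALF)
fill_random(mem, HALF, seed=1)
copy_half(mem, HALF)
clean = compare_halves(mem, HALF)

mem[HALF + 0x10] ^= 0x04         # simulate one corrupted bit in the upper half
faulty = compare_halves(mem, HALF)
for off, lower, upper in faulty:
    print(f"Compare error @{off:08X} {lower:08X} {upper:08X}")
```

Repeating only `compare_halves` exercises reads, while repeating `copy_half` plus `compare_halves` exercises writes as well, which is the distinction the tests here are trying to draw.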

So then I did a related test, where I generate the same 4MB of pseudo-random memory. And then in a loop I copy it to the upper half and do a compare. Each iteration is a copy and a compare. This kicked out an error after about a dozen iterations. Hmm.. Looking at the failed compare, I see:

Compare error @00235C10 E3BAA16C 02BAA16C

.. a one byte difference, where E3 and 02 are different. Hmm.

So before jumping to conclusions, I'm going to run an extended "read-only" test, where I'll fill the PSRAM device upper and lower halves with the same data and see if the reads continue to return the identical values. If that seems to work, then perhaps it's the occasional write operations that fail? We'll see what happens next..

(And it really is quite handy to hack up quick little tests like this in forth, without resorting to the C compiler. Which, of course, is the whole point of it all..)

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by martynj about 4 years ago

@lmamakos,

Isn't the MCU epoxy package cracked? A quick check on the integrity of the package would be a squirt of freezer - the error rate will change substantially.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

No, what looks like a crack in the photo is probably residue from a cleaning solution. It's just an artifact.

Curiously, things seem to be working with the newer tests. This is puzzling. One difference between these and the original ones from jcw is that the psram-init function isn't being called each time. I wonder if that's significant or not?

The testing I've done so far suggests that the error rate correlates with the "tightness" of the data and address set-up times when running jcw's test against the random number generator. And I was previously able to observe failures with the write-intensive tests, which are not happening again. I'll have to look at how I refactored some code to see what effect that might have had.

At this point, I'm sure I've done something stupid along the way. It will be interesting to determine just what that might have been!

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by martynj about 4 years ago

The switch to the ball-grid packaging is a concern - those decoupling caps end up on very "long" traces. With a low cost board, do the ball joints get X-rayed?

If you can get to a consistent failure case that won't go away with timing tweaks, I'd suggest trying the freezer treatment on that package.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

As I understand it, the BGA package for the PSRAM is what happens when you order the "upgrade" from 1MB of SRAM which seems to occupy the larger footprint, at least based on the photos of the board on their web site.

I have two of these boards that I received. Both of them had problems passing any of the memory tests that jcw had provided, until I increased the data bus and address bus setup times in the code. Then they started behaving more or less the same. I then noted the unreliable behavior when running extended testing, but I've only been doing follow-up testing on one of the boards to avoid introducing too many variables at once.

I'm to the point now where the tests I've built seem to "work", but I'm still not sure why as compared to jcw's tests. One obvious difference is when running them in a loop, the jcw words initialize the PSRAM FSMC configuration on each pass, while mine do not. The other difference was during verification, the jcw tests were extracting (the repeatable) stream of values from the pseudo-random number generator while writing and verifying. In my tests, I filled half the memory with the random number generator, wrote that to the other half and then did one of the following:
* copied once, and then repeatedly compared the two to see if they were the same on multiple passes
* copied and compared each time for each pass. Initially, this failed on rare occasions, but that seems to have stopped subsequent to some refactoring. Need to find out what that is.
* copied and then compared both halves to the output of the random number generator

All of these seem to work. I'm now trying to find out what the effective difference is between my tests and the ones that occasionally fail: the jcw tests, and one variation of my own.

Spooky hardware stuff? Maybe I'll try to drop some decoupling caps on the unpopulated footprint, closer to the BGA to see if that makes a difference. I didn't want to go there until I've convinced myself that the FSMC programming and configuration really is correct and if I can identify some pattern. Likewise, not sure I want to try to reflow the BGA package with my hot air rework tool quite yet.

I don't have a fancy enough logic analyzer with very high-Z probes, and I figure the cheaper one I do have, without any buffered probes, would disturb things enough to not be representative of what's really going on.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

I'd also suggest contacting Haoyu - although two similarly-failing boards can probably only be attributed to a bad/wrong batch of chips.

FWIW, I could try and run one of your tougher tests over here, if you send me a self-contained source file.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

I've been using this file with the "cbo" board hardware definitions. I loaded the "h" file, and then this into RAM to test with.

My new tests are rw-tester, which seems to fail on one of my boards about every 700 or 800 iterations. The corresponding read-tester word seems to run without errors. I was getting more errors using the looping-tester word, which calls your psram-test word repeatedly.

You might start up the **rw-tester** word and let it run for 10000 iterations over many hours and see what happens. A 0.1% failure rate of tests that each access 8 million memory locations doesn't sound that bad, except for causing ghosts..

What I find curious is that when a comparison error occurs, it's not just a single bit or in the same location:
~~~

Compare error @00295358 BE296624 BE726624
Compare error @0009E99C 1EEC757F 1EEC758B
Compare error @002DA234 9602BDE1 9602BDCD
Compare error @003D345C 473A5412 473A543C
Compare error @002AAE24 C6160340 C61A0340
Compare error @0019CFA4 7A9A8B62 7AD28B62
~~~

This is from a total of 4669 iterations before I stopped it.
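
Some quick arithmetic on those figures. Two assumptions go into it: each of the six errors listed came from a different iteration, and each iteration touches about 8 million locations (the figure quoted earlier in this post). The variable names are illustrative only.

```python
errors_seen = 6                       # compare errors listed above
iterations = 4669                     # iterations run before stopping
accesses_per_iteration = 8_000_000    # assumed, from "8 million memory locations"

iterations_per_error = iterations / errors_seen
failure_rate_percent = 100 * errors_seen / iterations
per_access_error_rate = errors_seen / (iterations * accesses_per_iteration)

print(f"one error every {iterations_per_error:.0f} iterations")
print(f"{failure_rate_percent:.2f}% of iterations fail")
print(f"about {per_access_error_rate:.1e} errors per access")
```

That lands inside the "every 700 or 800 iterations" estimate, and works out to somewhere between one and two bad accesses per ten billion.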

Curious that the errored bits are in the low 8 bits of the 16 bit memory word in the PSRAM. Wonder if that is of any significance?
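
That observation can be checked mechanically by XORing each pair of values from the listing above. The sketch assumes each 32-bit cell is stored as two 16-bit bus words (bits 0-15 and bits 16-31), so the low byte of either bus word falls under the mask `0x00FF00FF`. The mask name and the helper function are illustrative.

```python
# (address, first value, second value) copied from the compare errors above
ERRORS = [
    (0x00295358, 0xBE296624, 0xBE726624),
    (0x0009E99C, 0x1EEC757F, 0x1EEC758B),
    (0x002DA234, 0x9602BDE1, 0x9602BDCD),
    (0x003D345C, 0x473A5412, 0x473A543C),
    (0x002AAE24, 0xC6160340, 0xC61A0340),
    (0x0019CFA4, 0x7A9A8B62, 0x7AD28B62),
]

LOW_BYTE_LANES = 0x00FF00FF   # low byte of each 16-bit half (assumed bus layout)

def analyse(first, second):
    """Return (xor mask, number of differing bits, True if only low bytes differ)."""
    diff = first ^ second
    only_low = (diff & ~LOW_BYTE_LANES & 0xFFFFFFFF) == 0
    return diff, bin(diff).count("1"), only_low

results = [analyse(first, second) for _, first, second in ERRORS]
for (addr, _, _), (diff, nbits, only_low) in zip(ERRORS, results):
    print(f"@{addr:08X} xor={diff:08X} bits={nbits} low-byte-only={only_low}")
```

All six differences fall entirely within a low byte, and each flips between two and five bits at once, so these are multi-bit errors confined to one byte lane and not isolated single-bit flips.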

I found a couple of 0603 0.1uF capacitors that look like I can bodge over the Vcc/Gnd pins of the 1MB SRAM footprint and will "fit". Don't know if I want to try that yet. This is all annoyingly hard to reproduce rapidly..

Attachment: psram (6.26 KB)

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

Thanks - this has been running for a few hours - past #2200 now - without errors so far:

I'll let it run to see if there's an error further down the line.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

Crazy thought: try shielding, of some sort?

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

Speaking of crazy.. a couple more 0.1uF caps. The 0603 packages are a little large, 0402 would be better, but I don't have any and I've only hand-soldered 0603 sized parts. At least I've not broken the board and have another memory test running. Curious to see if this makes any difference. Will probably take a few hours of the test running to see.

Oh, "not recommended". Those pads seem dangerously easy to lift with a soldering iron. Maybe I should have tried hot air, but then I'd likely be chasing invisible parts across the bench and I'm worried about lifting other nearby parts..

I also discovered the schematic is a little misleading. While I have little doubt the various decoupling caps are connected to the right nets, their appearance and proximity to components on the schematic don't always seem to line up with the physical proximity to components on the PCB. This was a little disconcerting.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by martynj about 4 years ago

Neat re-work. Would noise on an address line explain the data corruption pattern? You mentioned the problem seems to occur on a Write - was this confirmed by reading back a failure multiple times?

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

FWIW, the unmodified psram test code ran here without any errors:

rw-tester [1st][2nd] 0 1 2 3 [...] 9997 9998 9999  ok.

Took over 12 hours, but it looks like that means it did 8E+10 successful reads and writes.
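
The arithmetic behind that figure, as a small sketch. The per-iteration access count is taken from the "8 million memory locations" quoted earlier in the thread, and since the run took "over" 12 hours, the rate computed here is an upper bound. Variable names are illustrative.

```python
iterations = 10_000
accesses_per_iteration = 8_000_000   # assumed, from earlier in the thread
run_seconds = 12 * 3600              # lower bound on the run time

total_accesses = iterations * accesses_per_iteration
max_rate_per_second = total_accesses / run_seconds

print(f"total accesses: {total_accesses:.0e}")
print(f"at most {max_rate_per_second / 1e6:.2f} million accesses per second")
```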
I wonder if that part number difference of the PSRAM could have something to do with it.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

The additional decoupling capacitors didn't resolve the problem. I don't know if I can say that it makes it occur less frequently or not, but regardless that's not the answer.

I have another instance of this board I will be doing further testing with to see if the same thing happens. I'll see if I can contact the manufacturer and see what they have to say, at least about the different component selection on the two boards I received vs. the ones they've apparently used before.

I'm away from home this week on business travel, so not much additional debugging will happen over the next few days.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by jcw about 4 years ago

I know next to nothing about how PSRAM works, but could it be related to refresh cycles, perhaps?
Would it be an idea to cool or heat the chip and see whether the data corruption patterns change?
Then again... one would assume that any continuous RAM test would regularly refresh every cell/row/column.

RE: HY-STM32F1xxCore144 Core/Dev Board and PSRAM - Added by lmamakos about 4 years ago

It's all very curious, and I don't know if I ought to ascribe blame to a marginal hardware design or board layout, or dodgy components?

My suspicion is the PSRAM component; it clearly is different from the board that you have vs. the ones that I have received. The other telling bit is that the timing that seems to work well for you fails pretty quickly for me; it wasn't until I increased them some that I had any success.

I haven't had a chance to take this up with the vendor as I've been doing quite a bit of business travel lately. In any case, I don't really need the RAM; so I may not pursue this much more. It's all a bit of a lark in any case..

Thanks for sharing the work you've done with Mecrisp Forth; it really is an interesting platform. And the more that I dig into the Mecrisp source code, the more that I'm impressed at some of what goes on in there. I wish there was an easy way to point Google Translate at the code and have it generate translations of the comments :-)
