Source at commit 852210cdfa0b62e85df9c7aa4ca0195ff93f19d6 created 9 years 20 days ago.
By Werner Almesberger, m1rc3/norruption/: next round of tests, just resetting without power-cycling

--- Tue 2011-09-06 ------------------------------------------------------------

Running "loop": power-cycle, sleep 2 s, jtag-boot, sleep 70 s, which is
enough to boot into FN and render "The Tunnel" for a moment, then
power-cycle again (off-time is 5 s).

Note that the test loop is "open-loop" and will keep cycling past any
problems. The first time a corrupt standby (or any other issue) is
observed may therefore be well after the actual event.
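
A minimal Python sketch of the open-loop cycle described above. It assumes the power switch (labsw) and the JTAG boot are wrapped in caller-supplied functions; power_off, power_on and jtag_boot are hypothetical placeholders, since the actual loop script is not part of this log. Note that nothing inside the loop inspects the board.

```python
import time
from typing import Callable


def run_loop(
    power_off: Callable[[], None],   # hypothetical wrapper around labsw
    power_on: Callable[[], None],    # hypothetical wrapper around labsw
    jtag_boot: Callable[[], None],   # hypothetical wrapper around urjtag
    cycles: int,
    on_time: float = 70.0,    # "loop" waits 70 s; "loop2" waits 10 s
    boot_delay: float = 2.0,  # pause between power-on and JTAG boot
    off_time: float = 5.0,    # how long the board stays unpowered
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Open-loop test: power-cycle, JTAG-boot, wait, repeat.

    Nothing is checked between cycles, so a failure is only noticed
    when someone looks at the board, possibly many cycles later.
    """
    for _ in range(cycles):
        power_off()
        sleep(off_time)
        power_on()
        sleep(boot_delay)
        jtag_boot()
        sleep(on_time)
    return cycles
```

Passing a fake sleep function lets the sequence be checked without hardware; loop2 would correspond to on_time=10.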

1: started around 11:53 (M1 configuration is original, without locking)
~500: visually checked the boot process; standby was reached normally

--- Wed 2011-09-07 ------------------------------------------------------------

645: neocon stopped working (around 01:58)
666: detected the neocon failure and restarted neocon; urjtag failed
     this cycle; back to normal at 667
684: checked LEDs again (first time since ~500) and found that standby
     may be failing. Stopped the test at 685 (around 02:50) for
     investigation.

Read back the flash contents, including the standby bitstream:

  wget https://raw.github.com/milkymist/scripts/master/scripts/reflash_m1.sh
  chmod 755 reflash_m1.sh

  ./reflash_m1.sh --read-flash

Found two corruptions in the standby bitstream:

  diff -u <(hexdump -C standby.fpg) <(hexdump -C /home/root/.qi/milkymist/read-flash/2011...)

-00000080 00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |..L...L......G.C|
+00000080 00 00 4c 83 00 00 4c 87 00 00 c4 80 d8 47 cc 43 |..L...L......G.C|

-00002840 00 08 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98 |...&.........D..|
+00002840 00 00 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98 |...&.........D..|
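
Every byte that changed in these two diffs only lost bits (cc -> c4 and 85 -> 80 at 0x8a-0x8b, 08 -> 00 at 0x2841); no bit went from 0 to 1. Programming NOR flash can only clear bits, and setting them again needs a block erase, so this pattern is consistent with a stray program operation, although the log does not prove that. A small Python sketch (the helper names are hypothetical) that lists the differing bytes and checks this property:

```python
from typing import List, Tuple


def diff_bytes(expected: bytes, actual: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, expected_byte, actual_byte) for each difference.

    Only the first len(expected) bytes of `actual` are compared, so a
    dump of the whole flash can be checked against a single image.
    """
    if len(actual) < len(expected):
        raise ValueError("read-back data is shorter than the reference")
    return [
        (off, e, a)
        for off, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


def only_clears_bits(expected: int, actual: int) -> bool:
    """True if no bit is set in `actual` that was clear in `expected`,
    i.e. the change could come from programming without an erase."""
    return actual & ~expected == 0
```

With standby.fpg as `expected` and the full read-back as `actual`, the reported offsets are flash addresses, because the standby bitstream is written at flash offset 0.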

CRC-checked the partitions:

  git clone git://github.com/milkymist/milkymist
  cd milkymist/tools/
  gcc -Wall -I. -o flterm flterm.c
  wget http://milkymist.org/updates/current/for-rc3/boot.4e53273.bin
  ./flterm --port /dev/ttyUSB0 --kernel boot.4e53273.bin

  serialboot
  a

Only standby.fpg failed the CRC check.
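
The CRC check can be sketched in Python as follows. This assumes the common CRC-32 variant (the one zlib.crc32 implements); the log does not say which CRC the boot image uses or where the expected value is stored, so here it is simply passed in. The message format imitates the "Images CRC" output recorded on 2011-09-08.

```python
import zlib


def crc32_of(data: bytes) -> int:
    # Standard reflected CRC-32 (init and final XOR 0xFFFFFFFF).
    # Assumption: not confirmed to be the variant the M1 BIOS uses.
    return zlib.crc32(data) & 0xFFFFFFFF


def check_image(name: str, data: bytes, expected: int) -> str:
    """Compare an image's CRC with an expected value, log-style."""
    got = crc32_of(data)
    if got == expected:
        return f"{name} CRC passed (got {got:08x})"
    return f"{name} CRC failed (expected {expected:08x}, got {got:08x})"
```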

Reflashed the standby bitstream:

  wget http://milkymist.org/updates/2011-07-13/for-rc3/fjmem.bit
  (or http://milkymist.org/updates/fjmem.bit.bz2)
  wget http://milkymist.org/updates/current/standby.fpg

  jtag

  cable milkymist
  detect
  instruction CFG_OUT 000100 BYPASS
  instruction CFG_IN 000101 BYPASS
  pld load fjmem.bit
  initbus fjmem opcode=000010
  frequency 6000000
  detectflash 0
  endian big
  flashmem 0 standby.fpg noverify

M1 enters standby normally again.

Running "loop2": power-cycle, sleep 2 s, jtag-boot, sleep 10 s, which is
enough to begin (but not finish) booting RTEMS, then power-cycle again
(off-time is 5 s).

1: started around 05:01. Standby was observed to be okay until about
cycle 200-300 (06:00-06:30).
~730 (08:48): observed that standby no longer loaded. (Note: due to a
bug in labsw, power is not turned on in about 5-10% of the cycles, so
the real cycle count should be around 650-700.)
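
The cycle-count correction is plain arithmetic: if a fraction p of the power-on commands is lost, only N(1 - p) of N counted cycles actually powered the board. For N = 730 and p between 0.05 and 0.10 this gives about 657 to 694, matching the 650-700 estimate. As a Python sketch (the function names are hypothetical):

```python
def effective_cycles(counted: int, miss_rate: float) -> float:
    """Cycles that really powered the board when a fraction `miss_rate`
    of power-on commands is lost (the labsw bug noted above)."""
    if not 0.0 <= miss_rate < 1.0:
        raise ValueError("miss_rate must be in [0, 1)")
    return counted * (1.0 - miss_rate)


def effective_range(counted: int, miss_lo: float, miss_hi: float) -> tuple:
    """(low, high) estimate: more lost cycles give the lower bound."""
    return effective_cycles(counted, miss_hi), effective_cycles(counted, miss_lo)
```

effective_range(730, 0.05, 0.10) gives roughly (657.0, 693.5).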

Standby bitstream difference:

-00000080 00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |..L...L......G.C|
+00000080 00 00 00 00 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |......L......G.C|

Reflashed standby and locked the NOR. Testing with loop2 again.

1 (09:18): started
... continuing through the night ...

--- Thu 2011-09-08 ------------------------------------------------------------

3483 (03:18): standby is good so far
4325 (07:40): manually ended the test. Standby is still good, but starting
    with cycle 3704, booting RTEMS failed with

    I: Booting from flash...
    I: Loading 1889692 bytes from flash...
    E: CRC failed (expected aa12a56a, got 68ec25e6)

A CRC check yielded:

Images CRC:
  Checking : standby.fpg CRC passed (got c58e8905)
  Checking : soc-rescue.fpg CRC passed (got 30dcc535)
  Checking : bios-rescue.bin(CRC) CRC passed (got c78353fa)
  Checking : splash-rescue.raw CRC passed (got e8ff824f)
  Checking : flickernoise.fbi(rescue)(CRC) CRC passed (got aa12a56a)
  Checking : soc.fpg CRC passed (got 3a31e737)
  Checking : bios.bin(CRC) CRC passed (got 86e23684)
  Checking : splash.raw CRC passed (got 978f860c)
  Checking : flickernoise.fbi(CRC) CRC failed (expected aa12a56a, got 68ec25e6)

Read back the Flickernoise partition with

  readmem 0x920000 0x0400000 fn.bin

Compared it with the original:

  wget http://www.milkymist.org/updates/2011-07-13/flickernoise.fbi
  md5sum flickernoise.fbi
  5b7367e71bda306b080bde124615859b flickernoise.fbi

  diff -u <(hexdump -C flickernoise.fbi) <(hexdump -C fn.bin)

...
-0008a380 28 43 00 00 34 64 00 01 58 44 00 00 5c 60 00 1e |(C..4d..XD..\`..|
+0008a380 28 43 00 00 00 00 00 01 58 44 00 00 5c 60 00 1e |(C......XD..\`..|
...
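
readmem dumps the partition starting at flash address 0x920000, so an offset in fn.bin maps to flash address 0x920000 + offset: the changed bytes at file offset 0x8a384 sit at flash address 0x9aa384, well inside the 1889692-byte image (0x8a384 = 566148). Both this corruption (34 64 -> 00 00) and the loop2 standby one (4c 83 -> 00 00 at 0x82) cleared a whole aligned two-byte word. A Python sketch that compares dumps word by word and reports absolute flash addresses; the big-endian 16-bit word size is an assumption (chosen to match "endian big" in the urjtag session), not something the log states about the flash.

```python
import struct
from typing import List, Tuple


def word_diffs(base: int, expected: bytes, actual: bytes) -> List[Tuple[int, int, int]]:
    """Compare two dumps as big-endian 16-bit words.

    `base` is the flash address of the first byte (0x920000 for the
    FN partition). Returns (flash_address, expected_word, actual_word)
    for each differing word, over the length of the shorter input.
    """
    n = min(len(expected), len(actual)) // 2 * 2  # whole words only
    diffs = []
    for off in range(0, n, 2):
        (e,) = struct.unpack_from(">H", expected, off)
        (a,) = struct.unpack_from(">H", actual, off)
        if e != a:
            diffs.append((base + off, e, a))
    return diffs
```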

Recovered the FN partition and unlocked the NOR:

  flashmem 0x920000 flickernoise.fbi noverify
  unlockflash 0 55

New test series with script loop4. This differs from loop2 in that it
uses "pld reconfigure" to return to standby instead of power-cycling.
If we still observe corruption with this test, power-cycling cannot be
the cause, and a software problem would be to blame.

1 (09:11): started
2403 (19:07): standby still looks good
