AIRobot

AIRobot quick note



DPDK reports a bus error on arm64

Posted on 2020-08-22

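At first I assumed the bus error was caused by a memory alignment problem.

It turned out that /var/run did not have enough free space to store the hugepage information.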
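To make the root cause concrete, here is a minimal sketch that estimates how large the hugepage information file would be and compares it with the free space on the filesystem that holds /var/run. The entry size of 4144 bytes is taken from the mailing-list thread quoted below (x86_64, DPDK 16.11/17.02); it is an assumption for any other build and may well differ on arm64. The function names are hypothetical helpers, not part of DPDK.

```python
import os

# sizeof(struct hugepage_file) as reported in the quoted thread.
# Assumption: this value is specific to that DPDK build and architecture.
ENTRY_SIZE = 4144


def hugepage_info_bytes(nr_hugepages, entry_size=ENTRY_SIZE):
    """Estimated size of the hugepage info file: one entry per hugepage."""
    return nr_hugepages * entry_size


def free_bytes(path):
    """Free space, in bytes, on the filesystem that contains `path`."""
    st = os.statvfs(path)
    # f_bavail counts blocks available to unprivileged users; on a tmpfs
    # such as a typical /var/run this normally equals the total free blocks.
    return st.f_bavail * st.f_frsize


def has_room(path, nr_hugepages, entry_size=ENTRY_SIZE):
    """True if `path` can hold the estimated hugepage info file."""
    return free_bytes(path) >= hugepage_info_bytes(nr_hugepages, entry_size)
```

With the numbers from the thread, `hugepage_info_bytes(512)` gives 2121728 bytes and `hugepage_info_bytes(256)` gives 1060864 bytes, which fits the observation that 256 hugepages worked while 512 crashed on a small /var/run. Calling `has_room("/var/run", 512)` before starting the application is one way to check for this condition.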

Reference link
http://mails.dpdk.org/archives/users/2017-February/001590.html

Thank you Keith and Monroy, with your help I was able to track down the
problem. My /var/run was too small to hold the hugepage information, so when
I increased its size, it worked. Thank you so much.

On Thu, Feb 23, 2017 at 10:35 AM, Sergio Gonzalez Monroy <
sergio.gonzalez.monroy at intel.com> wrote:

> As Keith suggested, gdb is probably your best bet now.
> You could also do 'strace' to see if something shows up there.
>
> If you are running as root, the application is opening a file in /var/run
> to store some hugepage information, then it memsets to 0.
>
> What distro and kernel are you running on?
>
>
>
> On 23/02/2017 16:19, Sushil Adhikari wrote:
>
>> I didn't understand what you mean by hugepage value, if you mean number of
>> hugepages here's what it looks like
>> [~]$ grep -ri hugepages /proc/meminfo
>> AnonHugePages: 0 kB
>> HugePages_Total: 512
>> HugePages_Free: 512
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>>
>> And the linux version is 4.4.20.
>>
>> On Thu, Feb 23, 2017 at 9:17 AM, Wiles, Keith <keith.wiles at intel.com>
>> wrote:
>>
>> On Feb 22, 2017, at 7:18 PM, Sushil Adhikari <sushil446 at gmail.com>
>>>>
>>> wrote:
>>>
>>>> Thank you Keith for the response,
>>>>
>>>> Yes it should be line 1142 not 1405, I was using 16.11 and now I'm using
>>>>
>>> 17.02 and still getting the same error.
>>>
>>> Not sure what to say here, it looks like some type of system
>>> configuration
>>> issue as I do not see it on my machine.
>>>
>>> Can you tell if the hugepage has a value and is it sane? The next thing is
>>> to see where in that memory it is failing: start, end, or somewhere in the
>>> middle. Use GDB and compile the code with 'make install
>>> T=x86_64-native-linuxapp-gcc EXTRA_CFLAGS="-g -O0"', then set a breakpoint
>>> with 'b eal_memory.c:1142' and inspect the memory pointer hugepage. I do not
>>> think it is an overrun error, meaning the size for memset is different than
>>> what was allocated and it is just stepping off the end.
>>>
>>> Also you did not tell me the linux version you are using?
>>>
>>> On Wed, Feb 22, 2017 at 8:46 PM, Wiles, Keith <keith.wiles at intel.com>
>>>>
>>> wrote:
>>>
>>>> On Feb 22, 2017, at 6:43 PM, Wiles, Keith <keith.wiles at intel.com>
>>>>>
>>>> wrote:
>>>
>>>> On Feb 22, 2017, at 6:30 PM, Sushil Adhikari <sushil446 at gmail.com>
>>>>>>
>>>>> wrote:
>>>
>>>>>> I used the basic command line option "dpdkTimer -c 0xf -n 4"
>>>>>> And to update on my findings so far I have narrowed down to this
>>>>>> line(1405)
>>>>>> memset(hugepage, 0, nr_hugefiles * sizeof(struct hugepage_file));
>>>>>> of function rte_eal_hugepage_init() in file
>>>>>> dpdk\lib\librte_eal\linuxapp\eal\eal_memory.c
>>>
>>>> What version of DPDK are you using? I was looking at the file at 1405
>>>>>
>>>> and I do not see a memset() call.
>>>
>>>> I found the memset call at 1142 in my 17.05-rc0 code. Please try the
>>>>
>>> latest version and see if you get the same problem.
>>>
>>>>>> Yes I have the hugepages of size 2MB(2048) and when I calculate the
>>>>>> memory this memset function is trying to set, it comes out to
>>>>>> 512(nr_hugefiles) * 4144 ( sizeof(struct hugepage_file) ) = 2121728, which
>>>>>> is larger than 2MB, so my doubt is that the hugepages I have
>>>>>> allocated(512*2MB) are not contiguous 1GB memory and it's trying to access
>>>>>> memory that's not part of a hugepage. Is that a possibility, even though I
>>>>>> am setting up hugepages during boot time by providing it through a kernel
>>>>>> option?
>>>
>>>>
>>>>>> On Wed, Feb 22, 2017 at 8:05 PM, Wiles, Keith <keith.wiles at intel.com>
>>>>>>
>>>>> wrote:
>>>
>>>> On Feb 22, 2017, at 3:05 PM, Sushil Adhikari <sushil446 at gmail.com>
>>>>>>>
>>>>>> wrote:
>>>
>>>> Hi,
>>>>>>>
>>>>>>> I was trying to run dpdk timer app by setting 512 2MB hugepages but
>>>>>>>
>>>>>> the
>>>
>>>> application crashed with following error
>>>>>>> EAL: Detected 4 lcore(s)
>>>>>>> EAL: Probing VFIO support...
>>>>>>> Bus error (core dumped)
>>>>>>>
>>>>>>> If I reduce the number of hugepages to 256 it works fine. I
>>>>>>>
>>>>>> wondering what
>>>
>>>> could be the problem here. Here's my cpu info
>>>>>>>
>>>>>> I normally run with 2048 x 2 or 2048 per socket on my machine. What
>>>>>>
>>>>> is the command line you are using to start the application?
>>>
>>>> processor : 0
>>>>>>> vendor_id : GenuineIntel
>>>>>>> cpu family : 6
>>>>>>> model : 26
>>>>>>> model name : Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>>>>>>> stepping : 5
>>>>>>> microcode : 0x11
>>>>>>> cpu MHz : 2794.000
>>>>>>> cache size : 8192 KB
>>>>>>> physical id : 0
>>>>>>> siblings : 4
>>>>>>> core id : 0
>>>>>>> cpu cores : 4
>>>>>>> apicid : 0
>>>>>>> initial apicid : 0
>>>>>>> fpu : yes
>>>>>>> fpu_exception : yes
>>>>>>> cpuid level : 11
>>>>>>> wp : yes
>>>>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>>>>>>
>>>>>> pge mca
>>>
>>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>>>>>>>
>>>>>> syscall nx
>>>
>>>> rdtscp lm constant_tsc arch_
>>>>>>> perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni
>>>>>>>
>>>>>> dtes64
>>>
>>>> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt
>>>>>>> lahf_lm ida dtherm tpr_shadow vnm
>>>>>>> i flexpriority ept vpid
>>>>>>> bugs :
>>>>>>> bogomips : 5600.00
>>>>>>> clflush size : 64
>>>>>>> cache_alignment : 64
>>>>>>> address sizes : 36 bits physical, 48 bits virtual
>>>>>>> power management:
>>>>>>>
>>>>>>> processor : 1
>>>>>>> vendor_id : GenuineIntel
>>>>>>> cpu family : 6
>>>>>>> model : 26
>>>>>>> model name : Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>>>>>>> stepping : 5
>>>>>>> microcode : 0x11
>>>>>>> cpu MHz : 2794.000
>>>>>>> cache size : 8192 KB
>>>>>>> physical id : 0
>>>>>>> siblings : 4
>>>>>>> core id : 1
>>>>>>> cpu cores : 4
>>>>>>> apicid : 2
>>>>>>> initial apicid : 2
>>>>>>> fpu : yes
>>>>>>> fpu_exception : yes
>>>>>>> cpuid level : 11
>>>>>>> wp : yes
>>>>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>>>>>>
>>>>>> pge mca
>>>
>>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>>>>>>>
>>>>>> syscall nx
>>>
>>>> rdtscp lm constant_tsc arch_
>>>>>>> perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni
>>>>>>>
>>>>>> dtes64
>>>
>>>> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt
>>>>>>> lahf_lm ida dtherm tpr_shadow vnm
>>>>>>> i flexpriority ept vpid
>>>>>>> bugs :
>>>>>>> bogomips : 5600.00
>>>>>>> clflush size : 64
>>>>>>> cache_alignment : 64
>>>>>>> address sizes : 36 bits physical, 48 bits virtual
>>>>>>> power management:......
>>>>>>>
>>>>>>> And Here's my meminfo
>>>>>>>
>>>>>>> MemTotal: 24679608 kB
>>>>>>> MemFree: 24014156 kB
>>>>>>> MemAvailable: 23950600 kB
>>>>>>> Buffers: 3540 kB
>>>>>>> Cached: 31436 kB
>>>>>>> SwapCached: 0 kB
>>>>>>> Active: 21980 kB
>>>>>>> Inactive: 22256 kB
>>>>>>> Active(anon): 10760 kB
>>>>>>> Inactive(anon): 2940 kB
>>>>>>> Active(file): 11220 kB
>>>>>>> Inactive(file): 19316 kB
>>>>>>> Unevictable: 0 kB
>>>>>>> Mlocked: 0 kB
>>>>>>> SwapTotal: 0 kB
>>>>>>> SwapFree: 0 kB
>>>>>>> Dirty: 32 kB
>>>>>>> Writeback: 0 kB
>>>>>>> AnonPages: 9252 kB
>>>>>>> Mapped: 11912 kB
>>>>>>> Shmem: 4448 kB
>>>>>>> Slab: 27712 kB
>>>>>>> SReclaimable: 11276 kB
>>>>>>> SUnreclaim: 16436 kB
>>>>>>> KernelStack: 2672 kB
>>>>>>> PageTables: 1000 kB
>>>>>>> NFS_Unstable: 0 kB
>>>>>>> Bounce: 0 kB
>>>>>>> WritebackTmp: 0 kB
>>>>>>> CommitLimit: 12077660 kB
>>>>>>> Committed_AS: 137792 kB
>>>>>>> VmallocTotal: 34359738367 kB
>>>>>>> VmallocUsed: 0 kB
>>>>>>> VmallocChunk: 0 kB
>>>>>>> HardwareCorrupted: 0 kB
>>>>>>> AnonHugePages: 2048 kB
>>>>>>> CmaTotal: 0 kB
>>>>>>> CmaFree: 0 kB
>>>>>>> HugePages_Total: 256
>>>>>>> HugePages_Free: 0
>>>>>>> HugePages_Rsvd: 0
>>>>>>> HugePages_Surp: 0
>>>>>>> Hugepagesize: 2048 kB
>>>>>>> DirectMap4k: 22000 kB
>>>>>>> DirectMap2M: 25133056 kB
>>>>>>>
>>>>>> Regards,
>>>>>> Keith
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>> Keith
>>>>>
>>>> Regards,
>>>> Keith
>>>>
>>>>
>>>> Regards,
>>> Keith
>>>
>>>
>>>
>
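The thread leans on two pieces of arithmetic: the size of the hugepage pool (512 pages of 2048 kB) and the size of the bookkeeping array that the memset clears. As a rough sketch of the first, the snippet below parses the hugepage fields out of `/proc/meminfo`-style text and computes the pool size. The function names and the `SAMPLE` string are illustrative; the sample values are copied from the thread above.

```python
def parse_hugepage_fields(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (kB) as integers."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # The value is "512" or "2048 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


def hugepage_pool_bytes(fields):
    """Total size of the hugepage pool in bytes."""
    return fields["HugePages_Total"] * fields["Hugepagesize"] * 1024


# Values as reported in the thread (grep -ri hugepages /proc/meminfo).
SAMPLE = """AnonHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 512
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""
```

For the sample, `hugepage_pool_bytes(parse_hugepage_fields(SAMPLE))` is 1073741824 bytes (1 GiB). According to Sergio's reply, the bookkeeping array is stored in a file under /var/run rather than in the hugepages themselves, which would explain why a pool with all 512 pages free did not prevent the crash.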