assembly - NEON memcpy, memset and using .c with .s files

June 15, 2013

I am trying to get acquainted with the NEON instructions, both in assembly and through the intrinsics. I would like to use a NEON memcpy with GCC 4.8.2 (hard-float ABI), based on this:

I have also found this topic: but it is slightly different from the implementation on the official ARM page.

Unfortunately I have never used C together with .s files, so I need some help. My C file looks like this:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <math.h>
    #include <time.h>
    #include <stdint.h>
    #include <arm_neon.h>

    int main()
    {
        clock_t start, end; // timer variable
        uint32_t i, X = 100;
        size_t size = 2048 * 32 /* arbitrary */;
        size_t offset = 1;
        char *src = malloc(sizeof(char) * (size + offset));
        char *dst = malloc(sizeof(char) * (size));
        NEONCopyPLD(dst, src + offset, size);
        memcpy(dst, src + offset, size);
        return (0);
    }

and the assembly.s file is the following:
    .global NEONCopyPLD
    NEONCopyPLD:
        PLD [r1, #0xC0]
        VLDM r1, {d0-d7}
        VSTM r0, {d0-d7}
        SUBS r2, r2, #0x40
        BGE NEONCopyPLD

I use the following compilation command:
    arm-linux-gnueabihf-gcc -mthumb -march=armv7-a -mtune=cortex-a9 -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon -Ofast -fprefetch-loop-arrays assembly.s asm_pr.c -o output
 and I get the following error:  
    potentially unexpected fatal signal 11.
    CPU: 0 PID: 670 Comm: out_asm Not tainted 3.10.9-rt5+ #2
    task: bf907… ti: beef4… task.ti: beef4…
    PC is at 0x4c90
    LR is at 0x852d
    pc : [<00004c90>]    lr : [<0000852d>]    psr: 40030030
    sp : 7e…e98cb  ip : 00000107  fp : 00000000
    r10: 76f91000  r9 : 00000000  r8 : 00000000
    r7 : 00001017  r6 : 0001855  r5 : 00e75009  r4 : 00010001
    r3 : 000f4240  r2 : 00010000  r1 : 00e75009  r0 : 00e85010
    Flags: nZcv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
    Control: 10c5387d  Table: 4f7404  DAC: 00000015
    CPU: 0 PID: 670 Comm: out_asm Not tainted 3.10.9-rt5+ #2
    Backtrace:
    [<800120a4>] (dump_backtrace+0x0/0x118) from [<80012318>] (show_stack+0x20/0x24)
    [<800122f8>] (show_stack+0x0/0x24) from [<804fab0c>] (dump_stack+0x24/0x28)
    [<804faae8>] (dump_stack+0x0/0x28) from [<8000f560>] (show_regs+0x30/0x34)
    [<8000f530>] (show_regs+0x0/0x34) from [<8003349c>] (get_signal_to_deliver+0x318/0x668)
    [<80033184>] (get_signal_to_deliver+0x0/0x668) from [<80011664>] (do_signal+0x11c/0x450)
    [<80011548>] (do_signal+0x0/0x450) from [<80011b20>] (do_work_pending+0x74/0xac)
    [<80011aac>] (do_work_pending+0x0/0xac) from [<8000e500>] (work_pending+0xc/0x20)
    Segmentation fault

I have another question: can we use SIMD instructions (through intrinsics or autovectorization) to speed up the initialization of an array with 0? I have seen that the following code cannot be autovectorized:
    for (i = 0; i

whereas this code block can be autovectorized:

    for (i = 0; i < n; i++) a[i] = i;

My ultimate goal is to check whether I can write a NEON function that runs faster than memset().
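For the zero-initialization question, here is a minimal sketch of what an intrinsics version could look like. The helper name `zero_fill` is made up for illustration and is not from the post. It assumes a byte buffer, uses the `vdupq_n_u8` and `vst1q_u8` intrinsics from `arm_neon.h` when the compiler targets NEON, and falls back to a plain loop otherwise. It makes no claim about beating the C library's memset(), which is usually heavily tuned already, so it would need to be timed on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the post): set n bytes at dst to zero. */
static void zero_fill(uint8_t *dst, size_t n)
{
    size_t i = 0;

    assert(dst != NULL || n == 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One Q register holding 16 zero bytes. */
    const uint8x16_t zero = vdupq_n_u8(0);

    /* Store 16 bytes per iteration while a full block remains. */
    for (; i + 16 <= n; i += 16)
        vst1q_u8(dst + i, zero);
#endif

    /* Tail bytes, or the whole buffer when NEON is not available. */
    for (; i < n; i++)
        dst[i] = 0;
}
```

A call such as `zero_fill(buf, sizeof buf)` can then be timed against `memset(buf, 0, sizeof buf)` with the same `clock()` calls the test program already declares.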
Finally, I would like to ask about loops whose end is ambiguous. The following code cannot be autovectorized:

    while (*p != NULL) { *q++ = *p++; }

Is it possible to use intrinsics or assembly to develop a fast version of this loop? If you have done something like this, could you post it here?
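One way to look at this loop: the number of iterations is not known before the loop starts, because it depends on the data being read, and that is typically what stops the vectorizer. A hedged sketch of a workaround is to split the work into two passes, first finding the end and then doing a bulk copy with a known count, so that the library's tuned strlen() and memcpy() do the heavy lifting. The helper name `copy_until_zero` is hypothetical, and the sketch assumes `p` and `q` are `char` pointers to non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the post): copies the bytes of p up to,
 * but not including, the terminating zero, like the loop in the question.
 * Returns the position just after the last byte written. */
static char *copy_until_zero(char *q, const char *p)
{
    size_t n;

    assert(q != NULL && p != NULL);

    n = strlen(p);      /* pass 1: find where the data ends */
    memcpy(q, p, n);    /* pass 2: bulk copy with a known byte count */
    return q + n;
}
```

Whether two passes are faster than one hand-written pass depends on the string lengths and the C library, so this is a starting point for measurement rather than a guaranteed win.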
 
You never return from your assembler function, so whatever code is stored after the assembler function will be executed as well. This will crash sooner or later.

End your function with this:

    mov pc, lr

This is very likely to fix your problem. You should also check which registers (NEON and general-purpose registers) you have to preserve across function calls.
 This page is a useful resource that shows examples of how to do this:   
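Once the routine returns properly, it can help to check it against memcpy() before timing it. The following is a minimal sketch of such a check, not part of the original answer. The names `copy_fn`, `copy_blocks64` and `check_copy` are hypothetical. It assumes the assembly routine takes the destination, source and byte count as its three arguments, which the procedure call standard places in r0, r1 and r2, matching the registers the assembly uses. `copy_blocks64` is a plain C stand-in that copies 64 bytes per step (eight D registers of 8 bytes each), so the check can be tried on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed signature of the copy routine: dst in r0, src in r1, count in r2. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical C stand-in: copies n bytes in 64-byte blocks.
 * n must be a multiple of 64. */
static void copy_blocks64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t done;

    assert(n % 64 == 0);
    for (done = 0; done < n; done += 64)
        memcpy(d + done, s + done, 64);
}

/* Returns 1 if fn copied exactly n bytes correctly and left the guard
 * bytes after the destination untouched, 0 otherwise. */
static int check_copy(copy_fn fn, size_t n, size_t offset)
{
    enum { GUARD = 64 };   /* spare room to catch one block too many */
    unsigned char *src = malloc(n + offset + GUARD);
    unsigned char *dst = malloc(n + GUARD);
    unsigned char *ref = malloc(n);
    int ok = 0;
    size_t i;

    if (src != NULL && dst != NULL && ref != NULL) {
        for (i = 0; i < n + offset + GUARD; i++)
            src[i] = (unsigned char)(i * 31u + 7u);   /* test pattern */
        memset(dst, 0xEE, n + GUARD);
        memcpy(ref, src + offset, n);                 /* expected result */

        fn(dst, src + offset, n);

        ok = (memcmp(dst, ref, n) == 0);
        for (i = n; i < n + GUARD; i++)
            if (dst[i] != 0xEE)
                ok = 0;                               /* wrote past the end */
    }
    free(src);
    free(dst);
    free(ref);
    return ok;
}
```

On the target, the assembly routine could be declared in the C file as `void NEONCopyPLD(void *dst, const void *src, size_t n);` (an assumed prototype) and passed to `check_copy` in place of `copy_blocks64`, using the same size and offset as the test program.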

 


















