Unstable throughput in a test with 1 000 000 objects

Hello, I'm experimenting with GigaSpaces to find out how well it would fit my use case.
I ran a test on 3 nodes with 3 partitions and 1 backup each. The space is initialized with 1 000 000 objects. The test updates objects with random ids (implemented as a Task with SpaceRouting). The tasks are generated by a client running on another node.

I got 26 857 TPS on average, but it varied between 38 000 and 0. What puzzles me is that throughput stayed stable most of the time but occasionally dropped to around 0 for periods of 2-4 s. GC does not seem to be the cause: the longest stop-the-world pauses during Full GC took less than 0.6 s.
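To make the drops easier to line up with GC logs or network traces, the client could count completed tasks per second and report the windows where throughput collapses. The following is only a rough sketch: the class name StallDetector, the threshold value, and the sample counts are all made up for illustration and are not part of the test described above.

```java
import java.util.ArrayList;
import java.util.List;

public class StallDetector {
    /** A run of consecutive one-second buckets whose count is below the threshold. */
    public static final class Stall {
        public final int startSecond;
        public final int lengthSeconds;

        Stall(int startSecond, int lengthSeconds) {
            this.startSecond = startSecond;
            this.lengthSeconds = lengthSeconds;
        }
    }

    /**
     * Scans per-second completion counts and returns every run of seconds
     * in which fewer than 'threshold' operations completed.
     */
    public static List<Stall> findStalls(long[] perSecond, long threshold) {
        List<Stall> stalls = new ArrayList<>();
        int start = -1; // index where the current low run began, or -1 if none
        for (int i = 0; i < perSecond.length; i++) {
            boolean low = perSecond[i] < threshold;
            if (low && start < 0) {
                start = i;
            } else if (!low && start >= 0) {
                stalls.add(new Stall(start, i - start));
                start = -1;
            }
        }
        if (start >= 0) { // the series ended while still inside a low run
            stalls.add(new Stall(start, perSecond.length - start));
        }
        return stalls;
    }

    public static void main(String[] args) {
        // Hypothetical per-second counts, not measurements from the real test.
        long[] sample = {38000, 37000, 0, 0, 0, 36000, 35000, 120, 0, 38000};
        for (Stall s : findStalls(sample, 1000)) {
            System.out.println("stall at second " + s.startSecond
                    + " lasting " + s.lengthSeconds + " s");
        }
    }
}
```

Knowing the start second and length of each stall makes it possible to check whether a pause on any of the JVMs, or a burst of retransmissions on the network, falls into the same window.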

Do you have any ideas about the reasons?

I run on bare metal with Xmx=3G and default pool and connection settings on both the nodes and the client. During the tests the CPU is 70% idle; memory footprint: free 1295M, total 2379M, max 2730M.

     I suspect either GC or a network bottleneck, since you mention the CPU is 70% idle.
     Regarding GC: you mention the Full GC pauses took less than 0.6 s, but was that measured on a primary or a backup JVM? Did you check GC on all the VMs?
     Regarding the network: what type of network do you have between the machines (Gigabit or 100 Mb)? Could this be a network issue?
     Some suggestions to narrow down the actual cause:
1) Could you try using CMS (the -XX:+UseConcMarkSweepGC JVM option)?
2) Could you try using the Change API inside the Task that performs the update? More information on the Change API is here: http://wiki.gigaspaces.com/wiki/display/XAP95/ChangeAPI. The idea is that the Change API performs better and generates less garbage.
3) Can you monitor network activity somehow? You can use a tool like Wireshark: http://www.wireshark.org/
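For the question of whether GC was checked on every VM, one option besides reading GC logs is to sample the standard GarbageCollectorMXBean counters inside each JVM (primaries, backups, and the client) and compare the deltas with the times of the throughput drops. This is a minimal sketch using only java.lang.management; the class name GcSnapshot and the one-second sampling interval are assumptions for illustration. The reported time is whatever the JVM accumulates per collector, which is not necessarily pure stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcSnapshot {
    /**
     * Returns the cumulative collection time in milliseconds for each
     * collector in the current JVM. A value of -1 means the JVM does not
     * report a time for that collector.
     */
    public static Map<String, Long> gcTimeMillis() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), gc.getCollectionTime());
        }
        return result;
    }

    /** Per-collector difference between two snapshots (after minus before). */
    public static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long previous = before.getOrDefault(e.getKey(), 0L);
            diff.put(e.getKey(), e.getValue() - previous);
        }
        return diff;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = gcTimeMillis();
        Thread.sleep(1000); // assumed sampling interval
        Map<String, Long> after = gcTimeMillis();
        System.out.println("GC time spent in the last second (ms): " + delta(before, after));
    }
}
```

If one JVM shows a large delta in the same second in which the client sees throughput fall to 0, GC on that node is a likely suspect; if all deltas stay small during a stall, the network becomes the more probable cause.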


