
Gigaspaces hosts are getting lost during deploy

I have the following configuration in application.xml:

<os-admin:pu
        processing-unit="element1.jar"
        sla-location="#{systemEnvironment['TARGET_DIR']}/element1.xml">

</os-admin:pu>

<os-admin:pu
        processing-unit="element2.jar"
        sla-location="#{systemEnvironment['TARGET_DIR']}/element2.xml">
        <os-admin:depends-on name="element1" min-instances-per-partition="1"/>
</os-admin:pu>

<os-admin:pu
        processing-unit="element3.jar"
        sla-location="#{systemEnvironment['TARGET_DIR']}/element3.xml">
        <os-admin:depends-on name="element2" min-instances-per-partition="1"/>
</os-admin:pu>
 ....

And so on. So (nearly) each artifact depends on the previous one. The thing is that not all the GSC instances are consumed in every case (especially on integration test servers).

This leads to the following situation. The first screenshot is for the "All" tab, the second is for my application.

[Screenshot: "All" tab] [Screenshot: application tab]

My SLAs are similar:

<os-sla:sla cluster-schema="partitioned-sync2backup"
        number-of-instances="15" number-of-backups="1"
        max-instances-per-vm="1">
<os-sla:requirements>
    <os-sla:zone name="space" />
</os-sla:requirements>

</os-sla:sla>

There are two zones, space and webapp, to split artifacts between machines. So, the counts are:

  • element1 - (instance/backup) 1/1
  • element2 - 15/1
  • element3 - 1/1

They are doubled, so they are 2, 30 and 2 respectively.
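
The arithmetic above can be checked with a small shell sketch. The helper name `total_instances` is hypothetical and not part of the gs.sh CLI; it only multiplies the partition count by one primary plus the number of backups.

```shell
#!/bin/sh
# total_instances is a hypothetical helper, not a GigaSpaces command.
# Total instances for a partitioned SLA: partitions * (1 primary + backups).
total_instances() {
    partitions=$1
    backups=$2
    echo $(( partitions * (1 + backups) ))
}

e1=$(total_instances 1 1)    # element1: 1 partition, 1 backup
e2=$(total_instances 15 1)   # element2: 15 partitions, 1 backup
e3=$(total_instances 1 1)    # element3: 1 partition, 1 backup
echo "element1=$e1 element2=$e2 element3=$e3 total=$(( e1 + e2 + e3 ))"
```

This gives 34 instances in total, to be compared with the 38 GSCs reported in the post.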

As you can see, there are 38 GSC instances, but only 30 are available for the application. Why?

What happens during deploy? First, element1 (2 instances) is deployed, then element2 starts (30 instances). After that it stops. element3 never starts to deploy.

On my main server, where I run gs.sh deploy-application, the logs show:

 2014-10-14 10:00:30,624 GSM INFO [com.gigaspaces.grid.gsm] - Registered GSC - [GSC pid[40047] host[srv02234./10.3.18.53]], count [1]
 ....
 2014-10-14 10:00:41,733 GSM INFO [com.gigaspaces.grid.gsm] - Deploying element1
 2014-10-14 10:00:41,799 GSM INFO [com.gigaspaces.grid.gsm] - Registered GSC - [GSC pid[12243] host[srv02228./10.3.18.47]], count [36]
 2014-10-14 10:00:41,963 GSM INFO [com.gigaspaces.grid.gsm.provision] - Attempting to allocate processing unit instance [element1.1 [1]] on [GSC pid[40047] host[srv02234./10.3.18.53]]
 2014-10-14 10:00:42,089 GSM INFO [com.gigaspaces.grid.gsm] - Registered GSC - [GSC pid[12312] host[srv02228./10.3.18.47]], count [37]
 2014-10-14 10:00:42,391 GSM INFO [com.gigaspaces.grid.gsm] - Registered GSC - [GSC pid[12311] host[srv02228./10.3.18.47]], count [38]

So, the first artifact starts deploying while 3 GSCs have yet to register. Then:

 2014-10-14 10:00:43,249 GSM WARNING [com.gigaspaces.grid.gsm] - Pending allocation request for [element2.15 [2]] until an available GSC is obtained, and all dependencies are met.
 2014-10-14 10:00:44,876 GSM INFO [com.gigaspaces.grid.gsm] - Deploying element3
 2014-10-14 10:00:44,882 GSM WARNING [com.gigaspaces.grid.gsm] - Could not meet required dependencies of: [element3.1 [1]] reasons: {waitForDeploymentToComplete=false, minimumNumberOfDeployedInstances=0, name=element2, minimumNumberOfDeployedInstancesPerPartition=1}
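
To see how many GSCs registered only after the first deployment began, the GSM log can be scanned with awk. This is a minimal sketch: the function name `late_gsc_registrations` and the file name `gsm.log` are hypothetical, and the patterns assume the message wording shown in the excerpts above.

```shell
#!/bin/sh
# late_gsc_registrations is a hypothetical helper, not a GigaSpaces tool.
# It counts "Registered GSC" lines that appear after the first
# " - Deploying " line. Reads the named files, or stdin if none are given.
late_gsc_registrations() {
    awk '
        / - Deploying /               { deploying = 1; next }
        deploying && /Registered GSC/ { late++ }
        END                           { print late + 0 }
    ' "$@"
}

# Example usage (gsm.log is a placeholder path):
#   late_gsc_registrations gsm.log
```

A non-zero result means that deployment started before the whole grid had registered with the GSM.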

What is the problem? I have not found anyone else facing the same issue.

The odd part is that this exact configuration worked when each unit was deployed independently, one command at a time. Can using deploy-application cause this problem?

Previous configuration - shell script with:

$GS -user $giga_deploy_user -password $giga_deploy_pwd deploy -properties embed:// ... -sla file://$TARGET_DIR/element1.xml $TARGET_DIR/element1.jar
$GS -user $giga_deploy_user -password $giga_deploy_pwd deploy -properties ... -sla ... .../element2.jar

and so on

asked 2014-10-23 09:38:45 -0600

Vlad Slepukhin

updated 2014-10-24 11:02:09 -0600


1 Answer


A few things you should check:

  • ulimit settings on all machines. I suggest you increase it to 32000.
  • LUS and GSM heap size. I suggest you increase these to 512m.
  • Set the LOOKUPLOCATORS variable to the LUS host on all machines. You might have a wrong multicast configuration that causes problems with multicast lookup discovery on some machines.
  • Place a copy of all third-party jars you use in the \gigaspaces-xap-premium\lib\platform\ext folder and remove them from the deployed war/jar files. This will speed up the deploy time and avoid expensive file transfer to each GSC work folder.
  • If you have a reference to a remote space URL anywhere within the pu.xml or client app, make sure it uses the LUS, i.e. jini://LUS_HOST1,LUS_HOST2/*/spaceName
  • Make sure the hosts file on every machine is correctly configured. Make sure all machines can ping each other and all ports are open.
  • Make sure all machines have their NIC_ADDR variable set to the machine IP.
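
The environment-related items in the list above could look roughly like the following on each machine. This is only a sketch under assumptions: the host names `lus-host1` and `lus-host2` are placeholders, the IP is the one for srv02234 from the question's log and must be replaced per machine, `LUS_JAVA_OPTIONS` and `GSM_JAVA_OPTIONS` are assumed to be the JVM option variables read by your XAP version's scripts, and the ulimit in question is assumed to be the open-files limit.

```shell
#!/bin/sh
# Sketch of per-machine settings; all values below are placeholders.

# Unicast lookup: point every machine at the LUS hosts (hypothetical names).
export LOOKUPLOCATORS="lus-host1,lus-host2"

# Bind to this machine's own IP address (example value taken from the log).
export NIC_ADDR="10.3.18.53"

# Assumed variable names for LUS and GSM JVM options; verify for your version.
export LUS_JAVA_OPTIONS="-Xmx512m"
export GSM_JAVA_OPTIONS="-Xmx512m"

# Report the current open-files limit so it can be compared with 32000.
echo "open files limit: $(ulimit -n)"
```

These settings would need to be in effect in the shell that starts the grid components, so that the same values apply on every machine.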

If the machine running the LUS or GSM is at 100% CPU utilization during deployment, you should lower the number of GSCs running on that machine.

See more: http://docs.gigaspaces.com/sbp/moving...

Please let me know if this was helpful.

Tnx Shay

answered 2014-10-23 12:58:40 -0600

shay hassidim

updated 2014-10-27 08:23:21 -0600


Comments

Have you seen the update on my post?

Vlad Slepukhin (2014-10-27 02:05:40 -0600)

So you are not having any problems when you use the deploy command instead of deploy-application?

shay hassidim (2014-10-27 08:24:03 -0600)
