  Marathon / MARATHON-7155

Cannot run new tasks when the leading Mesos master is on machine m1 and the leading Marathon is on machine m2

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Leader Election

      Description

      Hi, I have a test environment with 3 masters (m1, m2, m3) and 2 slaves (s1, s2).

      I want to achieve HA for Mesos and Marathon with the help of ZooKeeper.

      Installation:

      OS: RHEL 7.2

      Type: VirtualBox VMs

      I installed Mesos, Marathon and ZooKeeper in offline mode, i.e.:

      Mesos: downloaded the Mesos binaries and extracted the RPM packages.

      Marathon and ZooKeeper: downloaded the tar.gz files and extracted the binaries.

      3 masters : m1, m2, m3

      2 slaves : s1, s2

      Starting ZooKeeper

      I started ZooKeeper first on the masters (m1, m2, m3). One of them was chosen as leader, e.g. m1 -> leader, m2 -> follower, m3 -> follower.

      zoo.cfg 

       

      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/opt/ncms/zkWorkDir
      clientPort=2181 
      server.1=192.168.1.36:2888:3888
      server.3=192.168.1.42:2888:3888
      server.5=192.168.1.45:2888:3888
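
The zk:// URLs used by Mesos and Marathon further down have to list the same hosts and client port as this zoo.cfg. A minimal sketch of deriving that URL from the file, so the two cannot drift apart (the helper names `zk_url` and `server_ids` are made up for illustration and are not part of ZooKeeper, Mesos or Marathon):

```python
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/ncms/zkWorkDir
clientPort=2181
server.1=192.168.1.36:2888:3888
server.3=192.168.1.42:2888:3888
server.5=192.168.1.45:2888:3888
"""


def _entries(zoo_cfg_text):
    """Yield (key, value) pairs from the key=value lines of a zoo.cfg."""
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        yield key.strip(), value.strip()


def zk_url(zoo_cfg_text, chroot):
    """Build a zk://host:port,.../chroot URL from the server.N entries."""
    client_port = "2181"  # assumed fallback if clientPort is not set
    hosts = []
    for key, value in _entries(zoo_cfg_text):
        if key == "clientPort":
            client_port = value
        elif key.startswith("server."):
            # value has the form host:peerPort:electionPort
            hosts.append(value.split(":")[0])
    return "zk://" + ",".join(f"{h}:{client_port}" for h in hosts) + chroot


def server_ids(zoo_cfg_text):
    """Return the numeric ids N of all server.N entries."""
    return [int(key.split(".", 1)[1])
            for key, _ in _entries(zoo_cfg_text)
            if key.startswith("server.")]


print(zk_url(ZOO_CFG, "/mesos"))
print(zk_url(ZOO_CFG, "/marathon"))
print(server_ids(ZOO_CFG))
```

The ids returned by `server_ids` (1, 3 and 5 here) are the values ZooKeeper expects in the `myid` file inside `dataDir` on the corresponding host.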
      

       

      Starting Mesos masters and slaves:

       I ran the mesos-master binary with the options below on the leading master (m1), and then started Mesos on the followers (m2, m3) as well.

       

      m1: mesos-master --ip=192.168.1.36 --hostname=192.168.1.36 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      m2: mesos-master --ip=192.168.1.42 --hostname=192.168.1.42 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      m3: mesos-master --ip=192.168.1.45 --hostname=192.168.1.45 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      s1:
      mesos-slave --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --ip=192.168.1.53 --containerizers=docker,mesos --hostname=192.168.1.53 --executor_registration_timeout=10mins

      s2:
      mesos-slave --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --ip=192.168.1.53 --containerizers=docker,mesos --hostname=192.168.1.53 --executor_registration_timeout=10mins
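
The `--quorum=2` value in the master commands is the strict majority of the 3 masters. A small sketch of that arithmetic, showing how many master failures such a setup can tolerate (the function names are illustrative, not Mesos APIs):

```python
def mesos_quorum(num_masters):
    """Smallest strict majority of num_masters, the value to pass as --quorum."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters):
    """Masters that can be down while a quorum is still reachable."""
    return num_masters - mesos_quorum(num_masters)


for n in (1, 3, 5):
    print(n, mesos_quorum(n), tolerated_failures(n))
```

With 3 masters the quorum is 2, so powering off a single master (m1 in this report) should still leave enough masters for a new leader to be elected.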
       
      

       

      Starting Marathon : 

      I started Marathon on the leading machine (m1) first and then on the remaining machines (m2 and m3).

       

      ./start --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --zk zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/marathon
      

       

      The cluster state is now as follows:

      m1 -> leading Mesos master, leading Marathon

      m2 -> non-leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, non-leading Marathon

      plus the slaves s1 and s2.

       

      I created some sample applications (t1, t2) via Marathon on m1, and they ran successfully.

      When I powered off the m1 VM, m2 became the leading Mesos master and m3 became the leading Marathon. The cluster state was then:

      m1 -> powered off (unavailable)

      m2 -> leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, leading Marathon

      I then tried to create and run a sample app (t3) via Marathon, but it stayed in the "Waiting" status forever. The new task could not run, while the previous tasks (t1 and t2) kept running.
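
Mesos and Marathon elect their leaders separately (under the /mesos and /marathon ZooKeeper paths in this setup), so the two leaders can end up on different machines. A rough sketch for confirming which host leads each service after a failover follows. It assumes Marathon serves `GET /v2/leader` on port 8080 and the Mesos master serves `GET /master/state` on port 5050 with a `leader` field of the form `master@host:port`; the ports and the state path are assumptions that may differ by version and configuration, and all function names are illustrative.

```python
import json
from urllib.request import urlopen


def marathon_leader(body):
    """Extract 'host:port' from a Marathon /v2/leader response body."""
    return json.loads(body)["leader"]


def mesos_leader(body):
    """Extract 'host:port' from the 'leader' field of a Mesos state body.

    Assumes the field looks like 'master@192.168.1.42:5050'.
    """
    return json.loads(body)["leader"].split("@", 1)[-1]


def same_host(addr_a, addr_b):
    """Compare the host parts of two 'host:port' strings."""
    return addr_a.rsplit(":", 1)[0] == addr_b.rsplit(":", 1)[0]


def fetch(url, timeout=5):
    """Return the body of an HTTP GET as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def report(marathon_host, mesos_host):
    """Print both leaders; needs network access to the cluster."""
    m = marathon_leader(fetch(f"http://{marathon_host}:8080/v2/leader"))
    s = mesos_leader(fetch(f"http://{mesos_host}:5050/master/state"))
    print("Marathon leader:", m)
    print("Mesos leader:   ", s)
    print("Same machine:   ", same_host(m, s))
```

Calling `report("192.168.1.45", "192.168.1.42")` from a machine that can reach m2 and m3 would print both leaders for the state described above. Comparing the output with the Marathon log of the new leader is one way to narrow down whether Marathon re-registered with the new Mesos leader.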

      Questions 

       Is it expected behaviour for Mesos to lead from one machine and Marathon from another? If yes, why can't new tasks be run from Marathon, and how can we run them?

       Can this happen, i.e. can the leading Mesos master and the leading Marathon be elected on different machines?

      Is my configuration correct?

       

       I repeated this five times. In 2 to 3 of them this happened; in the other cases the leading Mesos master and the leading Marathon were elected on the same machine.

       

       

        Attachments

        1. image-2017-03-26-15-48-54-786.png
          168 kB
          naren970
        2. marathon.log
          539 kB
          naren970
        3. marathon.log
          59 kB
          naren970
        4. marathon.log
          17 kB
          naren970
        5. mesos.log
          46 kB
          naren970
        6. mesos.log
          215 kB
          naren970
        7. mesos.log
          624 kB
          naren970
        8. mesos-master.ERROR
          4 kB
          naren970
        9. mesos-master.ERROR
          29 kB
          naren970
        10. mesos-master.INFO
          30 kB
          naren970
        11. mesos-master.INFO
          207 kB
          naren970
        12. mesos-master.INFO
          608 kB
          naren970
        13. mesos-master.WARNING
          1 kB
          naren970
        14. mesos-master.WARNING
          43 kB
          naren970
        15. mesos-master.WARNING
          319 kB
          naren970

          Activity

            People

            • Assignee:
              matthias Matthias Veit (Inactive)
              Reporter:
              naren970 naren970
              Team:
              Orchestration Team
              Watchers:
              Matthias Veit (Inactive), naren970

              Dates

              • Created:
                Updated:
                Resolved: