  1. Marathon
  2. MARATHON-7155

Unable to run new tasks when the leading Mesos master is on machine m1 and the leading Marathon is on machine m2

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Leader Election

      Description

      Hi, I have a test environment with 3 masters (m1, m2, m3) and 2 slaves (s1, s2).

      I want to achieve HA for Mesos and Marathon with the help of ZooKeeper.

      Installation:

      OS: RHEL 7.2

      Type: VirtualBox VMs

      I installed Mesos, Marathon and ZooKeeper in offline mode, i.e.:

      Mesos: downloaded the Mesos binaries and extracted the RPM packages.

      Marathon and ZooKeeper: downloaded the tar.gz files and extracted the binaries.

      3 masters : m1, m2, m3

      2 slaves : s1, s2

      Starting ZooKeeper

      Started ZooKeeper first on the masters (m1, m2, m3). One was chosen as leader, e.g. m1 -> leader,

      m2 -> follower, m3 -> follower.

      zoo.cfg 

       

      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/opt/ncms/zkWorkDir
      clientPort=2181 
      server.1=192.168.1.36:2888:3888
      server.3=192.168.1.42:2888:3888
      server.5=192.168.1.45:2888:3888
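
The `zk://` connection strings used in the Mesos and Marathon commands in this report follow from this ensemble definition. A minimal sketch, in Python, of how they can be derived from `zoo.cfg`, together with the majority size for the ensemble. This is not part of the original setup; `parse_zoo_cfg`, `zk_url` and `majority` are hypothetical helper names introduced for illustration.

```python
def parse_zoo_cfg(text):
    """Return (client_port, [server hosts ordered by server id]) from zoo.cfg text."""
    client_port = None
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "clientPort":
            client_port = int(value)
        elif key.startswith("server."):
            server_id = int(key.split(".", 1)[1])
            host = value.split(":")[0]  # drop the peer and election ports
            servers.append((server_id, host))
    return client_port, [host for _, host in sorted(servers)]


def zk_url(text, znode):
    """Build a zk://host:port,.../znode string from zoo.cfg text."""
    port, hosts = parse_zoo_cfg(text)
    return "zk://" + ",".join(f"{h}:{port}" for h in hosts) + "/" + znode.lstrip("/")


def majority(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


# The configuration exactly as posted in this report.
ZOO_CFG = """
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/ncms/zkWorkDir
clientPort=2181
server.1=192.168.1.36:2888:3888
server.3=192.168.1.42:2888:3888
server.5=192.168.1.45:2888:3888
"""

print(zk_url(ZOO_CFG, "mesos"))
print(zk_url(ZOO_CFG, "marathon"))
print(majority(len(parse_zoo_cfg(ZOO_CFG)[1])))
```

With three servers the majority is 2, which matches the `--quorum=2` value passed to `mesos-master` in this report.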
      

       

      Starting Mesos masters and slaves:

       Ran the mesos-master binary with the options below on the leading machine (m1) first, then started Mesos on the followers (m2, m3).

       

      m1: mesos-master --ip=192.168.1.36 --hostname=192.168.1.36 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      m2: mesos-master --ip=192.168.1.42 --hostname=192.168.1.42 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      m3: mesos-master --ip=192.168.1.45 --hostname=192.168.1.45 --quorum=2 --cluster=testcluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs 
      s1:
      mesos-slave --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --ip=192.168.1.53 --containerizers=docker,mesos --hostname=192.168.1.53 executor_registration_timeout=10mins

      s2:
      mesos-slave --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --ip=192.168.1.53 --containerizers=docker,mesos --hostname=192.168.1.53 executor_registration_timeout=10mins
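
Two details in the agent commands as posted are worth checking: both s1 and s2 are shown with `--ip=192.168.1.53`, and `executor_registration_timeout=10mins` appears without a leading `--`. Either may simply be a copy-paste slip in the report, and the report does not show how the agent reacted to them. A minimal Python sketch of a sanity check over such command lines follows; `parse_flags` and `duplicate_values` are hypothetical helpers, and only the `--key=value` form is recognised.

```python
import shlex
from collections import defaultdict


def parse_flags(command):
    """Split a command line into ({flag: value}, [tokens not of the form --key=value])."""
    tokens = shlex.split(command)[1:]  # drop the program name
    flags, stray = {}, []
    for tok in tokens:
        if tok.startswith("--") and "=" in tok:
            key, value = tok[2:].split("=", 1)
            flags[key] = value
        else:
            stray.append(tok)
    return flags, stray


def duplicate_values(commands, flag):
    """Map each value of `flag` that is shared by several nodes to those nodes."""
    seen = defaultdict(list)
    for node, cmd in commands.items():
        flags, _ = parse_flags(cmd)
        if flag in flags:
            seen[flags[flag]].append(node)
    return {value: nodes for value, nodes in seen.items() if len(nodes) > 1}


# Agent command as posted in this report (identical for s1 and s2).
AGENT_CMD = (
    "mesos-slave --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos "
    "--ip=192.168.1.53 --containerizers=docker,mesos --hostname=192.168.1.53 "
    "executor_registration_timeout=10mins"
)
AGENTS = {"s1": AGENT_CMD, "s2": AGENT_CMD}

print(duplicate_values(AGENTS, "ip"))   # nodes sharing the same --ip value
print(parse_flags(AGENTS["s1"])[1])     # tokens that are not --key=value flags
```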
       
      

       

      Starting Marathon:

      Started Marathon on the leading machine (m1) first, then on the remaining machines (m2 and m3).

       

      ./start --master=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --zk zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/marathon
      

       

      The cluster state is now as follows:

      m1 -> leading Mesos master, leading Marathon

      m2 -> non-leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, non-leading Marathon

      plus the slaves s1 and s2.

       

      I created some sample applications (t1, t2) via Marathon on m1 and they ran successfully.

      When I powered off the m1 VM, m2 became the leading Mesos master and m3 became the leading Marathon. The cluster state was then as follows:

      m1 -> powered off (unavailable)

      m2 -> leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, leading Marathon

      I then tried to create and run a sample app (t3) via Marathon, and its task stayed in "Waiting" status forever, i.e. it never ran, while the previous tasks (t1 and t2) kept running.
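
To confirm which host each service regards as leader after the failover, the HTTP APIs can be queried directly. The Python sketch below rests on several assumptions not stated in this report: that the Mesos master listens on its default port 5050 and Marathon on 8080, that Mesos's `/master/state` response carries a `leader` field of the form `master@host:port`, and that Marathon's `/v2/leader` returns `{"leader": "host:port"}`. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def mesos_leader_host(state_json):
    """Host part of the `leader` field, assumed to look like 'master@192.168.1.42:5050'."""
    leader = json.loads(state_json).get("leader") or ""
    return leader.split("@", 1)[-1].rsplit(":", 1)[0] or None


def marathon_leader_host(leader_json):
    """Host part of Marathon's leader response, assumed to look like {'leader': 'host:port'}."""
    leader = json.loads(leader_json).get("leader") or ""
    return leader.rsplit(":", 1)[0] or None


def fetch(url, timeout=5):
    """GET a URL and return the body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def current_leaders(mesos_host, marathon_host):
    """Ask one reachable master and one reachable Marathon who the leaders are.

    Ports 5050 and 8080 are assumed defaults, not values taken from this report.
    """
    mesos = mesos_leader_host(fetch(f"http://{mesos_host}:5050/master/state"))
    marathon = marathon_leader_host(fetch(f"http://{marathon_host}:8080/v2/leader"))
    return {"mesos": mesos, "marathon": marathon}
```

Calling, for example, `current_leaders("192.168.1.42", "192.168.1.45")` against the surviving machines would show whether the hosts reported by the APIs match the m2/m3 split described above. Mesos and Marathon are configured here with separate ZooKeeper paths (`/mesos` and `/marathon`), so the two results are looked up independently.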

      Questions 

       Is it expected behaviour for the leading Mesos master and the leading Marathon to be on different machines? If yes, why can't new tasks be run from Marathon, and how can we get them to run?

       Can this happen in general, i.e. can different machines be chosen as leader for Mesos and for Marathon?

      Is my configuration correct?

       

       I repeated this test five times. In 2 to 3 of them the leaders ended up on different machines as described above; in the other cases the same machine was chosen as leader for both Mesos and Marathon.

       

       

        Attachments

        1. mesos-master.WARNING
          319 kB
        2. mesos-master.WARNING
          43 kB
        3. mesos-master.WARNING
          1 kB
        4. mesos-master.INFO
          608 kB
        5. mesos-master.INFO
          207 kB
        6. mesos-master.INFO
          30 kB
        7. mesos-master.ERROR
          29 kB
        8. mesos-master.ERROR
          4 kB
        9. mesos.log
          624 kB
        10. mesos.log
          215 kB
        11. mesos.log
          46 kB
        12. marathon.log
          17 kB
        13. marathon.log
          59 kB
        14. marathon.log
          539 kB
        15. image-2017-03-26-15-48-54-786.png
          168 kB


            People

            • Assignee:
              matthias Matthias Veit (Inactive)
              Reporter:
              naren970 naren970
              Team:
              Orchestration Team
              Watchers:
              Matthias Veit (Inactive), naren970