Author: admin

Nutanix

Nutanix: AHV (Hypervisor) Upgrade Stuck on a Nutanix CE Cluster

Hello everybody,

In our lab CE cluster we ran into the following problem while upgrading AHV from version 2018.01.31 to 2018.05.01.

We started the upgrade through the Upgrade Wizard in Prism... so far, so good,

but even after hours the AHV upgrade was still stuck.

After several Genesis and cluster restarts, the upgrade failed with the error "Failed to revoke token from 'x.x.x.x', taken for reason…"

So we ran the AHV upgrade again through the Nutanix "Upgrade Software" wizard, but it got stuck again, this time on a different subtask.

We had to dig deeper and found out that one of our hosts was in maintenance mode.

To list all hosts, use the command "acli host.list", or first run just "acli" to enter the Acropolis CLI and then run "host.list" there. (Quick tip: the Acropolis CLI supports tab completion 😉)

The command "host.exit_maintenance_mode x.x.x.x" should take the host out of maintenance mode, but not in our case 🙁

We couldn't find a solution on the web either, and nobody seemed to have the same problem 🙁 So we decided to upgrade AHV manually with a new USB stick image. This was the only way we had to upgrade AHV without losing data.

IMPORTANT: Our AOS was already at version 2018.05.01. Only upgrade your USB image manually if your AOS is already at the same version as the new image.

Because our CVMs reside on the SSD storage, no data or configuration should be lost when we upgrade AHV manually. So I prepared three new USB sticks with the new 2018.05.01 version.

We shut down our first CVM with the command "shutdown -h now" and checked its status on the AHV host with "virsh list". After a while the CVM was stopped. (Sorry for the bad picture quality, I was on the physical console.)
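You can also script the wait instead of re-running "virsh list" by hand. Below is a minimal Python sketch for the AHV host. It assumes two things that are not from the original post, so check them against your own "virsh list" output: the CVM's libvirt domain name ends in "-CVM", and Python 3.7+ is available on the host. The function names are hypothetical. Without "--all", "virsh list" shows only active domains, so the sketch treats the CVM as stopped once it no longer appears in the list.

```python
import subprocess
import time


def active_domains(virsh_output):
    """Parse the table printed by `virsh list` into {domain name: state}."""
    domains = {}
    for line in virsh_output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator and the "Id Name State" header.
        if len(fields) < 3 or fields[0] == "Id":
            continue
        # Remaining rows are Id, Name, State (a state like "in shutdown" has a space).
        domains[fields[1]] = " ".join(fields[2:])
    return domains


def cvm_still_active(virsh_output, suffix="-CVM"):
    """True while a domain whose name ends in `suffix` is still listed."""
    # Assumption: the CVM's libvirt domain name ends in "-CVM".
    return any(name.endswith(suffix) for name in active_domains(virsh_output))


def wait_for_cvm_shutdown(timeout_s=600, poll_s=10):
    """Poll `virsh list` until the CVM is gone (True) or the timeout expires (False)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(["virsh", "list"], capture_output=True, text=True, check=True)
        if not cvm_still_active(result.stdout):
            return True
        time.sleep(poll_s)
    return False
```

On the AHV host you would call wait_for_cvm_shutdown() right after shutting down the CVM and only swap the USB stick once it returns True.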

Once the CVM was stopped, we swapped the USB stick and started the Nutanix CE install process. At the install options screen we chose "Repair Host (All data preserved)".

A few minutes later the new AHV was installed and the "old" CVM was running again. A final check showed that no host was in maintenance mode anymore and all hosts were up to date. 🙂

But wait... our stubborn task was still in the Recent Tasks list 🙁

To delete this task, follow Nutanix KB article 1217.

On a CVM, the following command lists only the information you need to delete the tasks:

progress_monitor_cli --fetchall | egrep "entity_id|entity_type|operation"

I then deleted every single task with a command like this one:

progress_monitor_cli --entity_id="6" --entity_type=node --operation=upgrade_hypervisor --delete

(Quick tip: write the operation and entity_type values in lowercase only.)
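When several tasks are stuck, each one needs its own delete command. The following Python sketch only builds the command strings and does not run anything. It lowercases entity_type and operation, as the quick tip advises. The function name build_delete_command is hypothetical. The example values are the ones from this post; take entity_id, entity_type and operation from your own --fetchall output.

```python
import shlex


def build_delete_command(entity_id, entity_type, operation):
    """Return one progress_monitor_cli delete command line as a string."""
    # Quick tip from the post: entity_type and operation must be lowercase.
    args = [
        "progress_monitor_cli",
        f"--entity_id={entity_id}",
        f"--entity_type={entity_type.lower()}",
        f"--operation={operation.lower()}",
        "--delete",
    ]
    # shlex.join quotes any argument that would need quoting in a shell.
    return shlex.join(args)


# Values from the example in this post; replace them with the ones from your
# own --fetchall output (one tuple per stuck task).
stuck_tasks = [("6", "node", "upgrade_hypervisor")]
for task in stuck_tasks:
    print(build_delete_command(*task))
    # prints: progress_monitor_cli --entity_id=6 --entity_type=node --operation=upgrade_hypervisor --delete
```

Printing the commands instead of executing them lets you review each one before you paste it into the CVM shell.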

And done! We hope this post helps some of you 😉

Greetz diekolbs

Nutanix: Task in Prism is hanging

**UPDATE: In the current AOS release, the task.update command has been removed or moved because it is risky. Send me a PM if you need more information.

Hello everybody,

Recently, while preparing a customer's Nutanix cluster, we ran into the following problem: a Prism task (in my case, a reboot of the hypervisor) had been hanging for over 48 hours.

Even after restarting the entire cluster, the task still appeared in Prism. So we started searching for a solution and found a KB article on the Nutanix portal.

We tried the commands from the KB article, but without success: the task remained in Prism.

So we tried some other commands and voilà, our problem was solved. But how? Which commands did we use?

With the first command we checked which tasks were still in the running state:

acli task.list limit=2500 | grep -i running
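The grep filter can also be reproduced in Python, for example to collect the IDs of running tasks before you abort them. This is a minimal sketch. It assumes each relevant line of the "acli task.list" output contains the task UUID. If a line also carries a parent task UUID, both are returned. The function name and the sample lines are made up for illustration and are not real acli output.

```python
import re

# A canonical 8-4-4-4-12 hex UUID, like the task ID shown in this post.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def running_task_ids(task_list_output):
    """Return the UUIDs found on lines containing 'running' (like grep -i running)."""
    ids = []
    for line in task_list_output.splitlines():
        if "running" in line.lower():
            ids.extend(UUID_RE.findall(line))
    return ids


# Made-up sample lines for illustration only; real acli output may look different.
sample = (
    "ff1fa263-ca02-455e-b8e6-101a0532dd0e  SomeTask   kRunning\n"
    "00000000-0000-0000-0000-000000000001  OtherTask  kSucceeded\n"
)
print(running_task_ids(sample))  # ['ff1fa263-ca02-455e-b8e6-101a0532dd0e']
```

After the task has been aborted, running the same function on fresh task.list output should return an empty list. This is the same check as the final grep in this post.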

Once we had identified the hanging task, we could cancel it. In our case it had the ID "ff1fa263-ca02-455e-b8e6-101a0532dd0e". We then set the status of the task to aborted with "status=kAborted":

acli task.update task_list=ff1fa263-ca02-455e-b8e6-101a0532dd0e status=kAborted

Finally, a quick check with the following command:

acli task.list limit=2500 | grep -i running

Now the task should be gone. 😉

This solved our problem, and we hope it helps some of you 😉

Greetz diekolbs
