Hi All,
We are carrying out rigorous vSAN testing and I have some questions again. Could someone please clarify them if possible?
My setup is as follows:
3 ESXi servers running 6.7.0
7 test Windows/Linux VMs are running, including VCSA 6.7.0
vSAN version is 6.7.0 in an all-flash configuration.
Disk groups: 2 x 2 TB SSDs per ESXi host. Each host has one disk group, with 1 x 2 TB SSD for the cache tier and 1 x 2 TB SSD for the capacity tier. Across the 3 ESXi servers that totals 6 TB of cache tier and 6 TB of capacity tier.
Storage policy: FTT is set to 1 with RAID 1 selected.
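For reference, here is a rough sketch of the capacity arithmetic for this layout. It assumes that only capacity-tier disks count toward datastore capacity and that RAID 1 with FTT=1 keeps two full copies of each object. Witness components, dedupe/compression savings, and slack space are ignored, and all names are illustrative only.

```python
# Rough capacity arithmetic for the cluster described above.
# Assumptions (not from any VMware tool): cache-tier disks add no
# datastore capacity, and RAID 1 stores FTT + 1 full copies of the data.

HOSTS = 3
CAPACITY_TB_PER_HOST = 2   # one 2 TB capacity SSD per disk group
FTT = 1                    # failures to tolerate in the storage policy


def raw_capacity_tb(hosts: int, capacity_tb_per_host: float) -> float:
    """Total raw capacity contributed by the capacity tier."""
    return hosts * capacity_tb_per_host


def usable_raid1_tb(raw_tb: float, ftt: int) -> float:
    """Approximate usable space when every object has ftt + 1 mirrors."""
    return raw_tb / (ftt + 1)


raw = raw_capacity_tb(HOSTS, CAPACITY_TB_PER_HOST)
usable = usable_raid1_tb(raw, FTT)
print(f"raw capacity tier: {raw} TB, approx. usable with RAID 1 / FTT={FTT}: {usable} TB")
```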
Last week's test scenario was as follows:
* Removed the 2 TB cache disk from the first ESXi server and inserted a 1 TB cache disk (a non-uniform configuration). After inserting the 1 TB SSD and reconfiguring the disk group, we saw that object resync was in progress; however, the 1 TB disk was shown as unhealthy.
We don't know what happened to that new 1 TB cache SSD. Soon after, the VCSA appliance stopped responding. When we rebooted the first ESXi host, all the running VMs were shown as invalid.
* While checking the VM status (get all VMs) from the CLI, the VMs were reported as skipped:
# vim-cmd vmsvc/getallvms
Skipping invalid VM '24'
Skipping invalid VM '25'
Skipping invalid VM '28'
We don't know what caused this. Soon after the ESXi reboot, all the VMs became invalid/inaccessible. Was this due to the non-uniform cache disk configuration or not? I'm not sure.
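To track which VM IDs are affected across repeated tests, the `Skipping invalid VM` lines from the `vim-cmd vmsvc/getallvms` output above can be collected with a small script. This is only a sketch: the sample text is copied from this post, the function name is made up, and it assumes the message format stays exactly as shown.

```python
import re

# Sample copied from the vim-cmd output in this post.
SAMPLE_OUTPUT = """\
Skipping invalid VM '24'
Skipping invalid VM '25'
Skipping invalid VM '28'
"""

# Matches lines of the form: Skipping invalid VM '<id>'
INVALID_VM_RE = re.compile(r"^Skipping invalid VM '(\d+)'", re.MULTILINE)


def invalid_vm_ids(output: str) -> list[int]:
    """Return the numeric IDs of VMs reported as invalid, in order."""
    return [int(vm_id) for vm_id in INVALID_VM_RE.findall(output)]


ids = invalid_vm_ids(SAMPLE_OUTPUT)
print(f"{len(ids)} invalid VMs: {ids}")
```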
Below is the output from the Ruby vSphere Console (RVC) on the VCSA. We could not recover any VMs.
/localhost/VC-VSAN-310/computers> vsan.check_state 0
2018-12-14 08:58:54 +0000: Step 1: Check for inaccessible vSAN objects
Detected 41 objects to be inaccessible
vsan.check_state 0 -r
2018-12-14 09:02:22 +0000: Step 1: Check for inaccessible vSAN objects
Detected 2cc80f5c-f242-c40e-fb39-d4ae52886942 to be inaccessible, refreshing state
Detected 0f9f095c-6e37-8624-d45a-d4ae52886942 to be inaccessible, refreshing state
Detected 2cc80f5c-1a5b-1228-c0ee-d4ae52886942 to be inaccessible, refreshing state
Detected 986a095c-7ce5-572b-f5d5-d4ae52886942 to be inaccessible, refreshing state
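The inaccessible object UUIDs can be pulled out of the `vsan.check_state` output above for later comparison between test runs. This is a minimal sketch that assumes the log lines keep the exact wording shown here; the sample lines are copied from this post and the function name is hypothetical.

```python
import re

# Sample lines copied from the vsan.check_state output in this post.
SAMPLE_LOG = """\
2018-12-14 09:02:22 +0000: Step 1: Check for inaccessible vSAN objects
Detected 2cc80f5c-f242-c40e-fb39-d4ae52886942 to be inaccessible, refreshing state
Detected 0f9f095c-6e37-8624-d45a-d4ae52886942 to be inaccessible, refreshing state
Detected 2cc80f5c-1a5b-1228-c0ee-d4ae52886942 to be inaccessible, refreshing state
Detected 986a095c-7ce5-572b-f5d5-d4ae52886942 to be inaccessible, refreshing state
"""

# A UUID in 8-4-4-4-12 lowercase hex form, as printed in the log lines above.
UUID_RE = re.compile(
    r"Detected ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"to be inaccessible"
)


def inaccessible_uuids(log_text: str) -> list[str]:
    """Return object UUIDs reported as inaccessible, in log order."""
    return UUID_RE.findall(log_text)


uuids = inaccessible_uuids(SAMPLE_LOG)
print(f"{len(uuids)} inaccessible objects in this excerpt")
for uuid in uuids:
    print(uuid)
```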
After this issue, we built a fresh vSAN setup and would like to test again with the same 7 test VMs, using the FTT=1, RAID 1 storage policy (3 ESXi servers only).
1) We will remove one cache SSD from the first ESXi host and observe the impact. We have enabled deduplication and compression by default. I would like to know what will happen if we remove any SSD (either cache or capacity) from the first ESXi server.
My guess is that there should not be any impact on the running virtual machines. Am I correct? If I use deduplication and compression on a 3-node all-flash cluster, will there be any impact?
I saw this community thread from Bob:
https://communities.vmware.com/thread/577526
Any feedback or suggestions?
Thanks,
Manivel RR