Troubleshooting
-
How do I resolve a "PCI out of resource" error reported by the server BIOS?
When multiple PCIe adapters (e.g. GPUs) are installed, the server host may halt during POST with one of the following errors: "PCI out of resource" or "Insufficient PCI Resources Detected".
To resolve the issue, follow the steps for your server type:
For Intel Xeon Phi Server
1. Temporarily remove the mini-SAS HD cable of Falcon 4010 (4210).
2. Update the BIOS and firmware to the latest version.
3. Disable any unused devices and Option ROMs in the BIOS.
- For onboard SATA/SAS controllers, go to Advanced > Mass Storage Controller Configuration.
- For onboard NICs, go to Advanced > NIC Configuration.
4. Go to Advanced > PCI Configuration.
- Set Maximize Memory below 4 GB to Disabled.
- Set Memory Mapped I/O above 4 GB to Enabled.
- Set Memory Mapped I/O Size to 512 G or higher.
5. Reconnect the mini-SAS HD cable of Falcon 4010 (4210) and check whether the server host boots properly.
For Supermicro Xeon Phi Server
1. Temporarily remove the mini-SAS HD cable of Falcon 4010 (4210).
2. In the BIOS, go to the Advanced menu and apply the following settings:
- Advanced > PCIe/PCI/PnP Configuration > Above 4G Decoding = Enabled
- Advanced > PCIe/PCI/PnP Configuration > MMIOH Base = 56T
- Advanced > PCIe/PCI/PnP Configuration > MMIO High Size = 512G or higher
3. Reconnect the mini-SAS HD cable of Falcon 4010 (4210) and check whether the server host boots properly.
-
What should I do if the Falcon chassis fails to connect to a 10M/100M switch?
The NIC ports on Falcon 4010 and 4210 comply with the IEEE 802.3ab (1000BASE-T) standard only, so a connection to a 10M/100M switch is expected to fail. Connect the chassis to a switch that supports 1000BASE-T instead.
-
How do I resolve poor GPU peer-to-peer performance?
1. Make sure that your GPU model supports the peer-to-peer function.
2. Disable PCI Access Control Services (ACS) on the host side (see the description below).
IO virtualization (VT-d on Intel platforms, or IOMMU on AMD platforms) can interfere with GPUDirect by redirecting all PCI point-to-point traffic to the CPU root complex, causing a significant performance reduction or even a hang. You can check whether ACS is enabled on PCI bridges by executing the following command:
$ sudo lspci -vvv | grep ACSCtl
If the output shows “SrcValid+”, ACS may be enabled. The full output of lspci shows which PCI bridges have ACS enabled.
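The check above can be scripted so that only the affected devices are listed. The following is a minimal sketch, not a vendor-supplied tool: the function name list_acs_enabled is made up for illustration, and it assumes the usual lspci -vvv layout in which each device header starts in the first column and its capability lines are indented.

```shell
# list_acs_enabled (hypothetical helper): read `lspci -vvv` text on stdin and
# print the header line of every device whose ACSCtl line reports SrcValid+.
list_acs_enabled() {
    awk '
        # Device header lines start in the first column with the bus address.
        /^[0-9a-fA-F]/ { dev = $0 }
        # Capability lines are indented; match the ACS control register line.
        /ACSCtl:/ && index($0, "SrcValid+") { print dev }
    '
}

# Typical use on the host (root is needed to read extended capabilities):
#   sudo lspci -vvv | list_acs_enabled
```

An empty result means that no ACSCtl line reported SrcValid+. Devices that do appear in the list are the ones to examine in the full lspci output.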
If any PCI switch has ACS enabled, ACS must be disabled. On some systems, this can be done in the BIOS by disabling IO virtualization (or VT-d) and ACS.
Disabling IO virtualization:
In the host BIOS, open the IO or Advanced menu.
Disable VT for Direct IO (VT-d) for Intel platforms.
Disable IOMMU for AMD platforms.
Other platforms may use a different name for the IO virtualization function. If you cannot find the function, contact your server vendor.
-
Failure to assign/remove a device
1. Make sure that the device is on the compatible list.
2. Wait for a minute, then retry assigning/removing the device.
3. Make sure that the device is in good condition.
- Check that the PCIe power cable is properly connected to the device.
- Check that the device is properly plugged into the PCIe slot.
- Clean the PCIe slot and the gold fingers of the device.
- Power-cycle the device.
4. Make sure that the host is properly linked to the Falcon chassis.
- Check that the mini-SAS HD cables are properly connected.
- Check that the HBA is properly installed.
- Reboot the host machine.
5. Retry assigning/removing the device.
6. If it still fails, try rebooting the whole system.
-
Failure to access GUI
1. Make sure that the management port is connected to your network.
2. Make sure that the client and the Falcon system are in the same domain.
- If the LCD on the chassis is functioning, check your network.
- If the LCD is not functioning, the BMC of the Falcon system may have hung; try rebooting the system.
3. If you have forgotten the IP address of the Falcon GPU system or your GUI login credentials:
- Check the LCD on the chassis for the IP address.
- If that does not help, reset the Falcon GPU system to its default settings.
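The check in step 2 can be done from the client before inspecting cables. The sketch below assumes that "same domain" means the same IPv4 subnet, which is an interpretation rather than something this guide states. The function names and the example addresses are hypothetical; take the real Falcon address from the chassis LCD and the prefix length from your own network settings.

```shell
# ip_to_int (hypothetical helper): convert a dotted-quad IPv4 address
# to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP_A IP_B PREFIX_LEN: succeed if both addresses fall in the
# same network for the given prefix length (for example, 24 for a /24).
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Example with made-up addresses (client first, Falcon second):
#   if same_subnet 192.168.1.20 192.168.1.100 24; then
#       echo "Client and Falcon are on the same subnet."
#   else
#       echo "Different subnets: check routing or move the client."
#   fi
```

If the addresses are on different subnets, the GUI may still be reachable through a router, so treat a negative result as a hint about where to look rather than as proof of a fault.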
-
Device link down issue
Check whether the device is on the compatible list for your Falcon GPU solution model.
If so, try rebooting the Falcon GPU system.
-
Information does not display properly on GUI
- Try refreshing the page.
- Update the browser to the latest version.
- If the above steps do not fix the issue, try rebooting the Falcon GPU system.
-
Host link down issue
A host link-down can be caused by an improper cable connection or an incorrect boot sequence.
Check the mini-SAS HD cable connections on both the host adapter and the Falcon chassis, and make sure that all cables are fully plugged into their connectors.
Boot sequence:
- Boot up the Falcon GPU system. When the system is ready, the LCD displays the model name and the IP address.
- Boot up the host machine(s) only after the Falcon GPU system is ready.
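Where hosts are powered on by a script, the wait in the boot sequence can be approximated with a retry loop. This is a sketch under assumptions: wait_until_ready and FALCON_IP are hypothetical names, and using ping as the readiness probe is an assumption, since a reply only shows that the management port is reachable. The LCD remains the indicator this guide describes.

```shell
# wait_until_ready MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until_ready() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$n" -lt "$tries" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Hypothetical use: FALCON_IP is the address shown on the chassis LCD.
#   FALCON_IP=192.0.2.10
#   if wait_until_ready 30 10 ping -c 1 "$FALCON_IP"; then
#       echo "Falcon management port responds; power on the host(s)."
#   else
#       echo "Falcon did not respond; check the chassis before booting hosts."
#   fi
```

Passing the probe as a command keeps the loop independent of how readiness is detected, so ping can be swapped for another check if your site has a better one.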