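Dual V100 GPU Dock

Hardware Installation

V100 GPU dock > SFF-8654 cable > PCIe adapter card > PCIe GPU dock > OCuLink > max2 laptop

Driver Installation

Official NVIDIA driver package: 582.16-data-center-tesla-desktop-win10-win11-64bit-dch-international.exe

Hardware Detection Status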

(py312torch)PS C:\windows\system32> Get-PnpDevice -PresentOnly | Where-Object { ($_.HardwareID -match "VEN_10DE" -or $_.HardwareID -match "VEN_10B5") -and $_.FriendlyName -notmatch "Virtual|Display Adapter" } | Select-Object Status, Class, FriendlyName, HardwareID | Format-Table -AutoSize

Status Class   FriendlyName                       HardwareID
------ -----   ------------                       ----------
Error          Base System Device                 {PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5&REV_CA, PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5, PCI\VEN_10B5&DEV_87D0&CC_088000, PCI\VEN_10B5&DEV_87D0&CC_0880}
Error          Base System Device                 {PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5&REV_CA, PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5, PCI\VEN_10B5&DEV_87D0&CC_088000, PCI\VEN_10B5&DEV_87D0&CC_0880}
Error          Base System Device                 {PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5&REV_CA, PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5, PCI\VEN_10B5&DEV_87D0&CC_088000, PCI\VEN_10B5&DEV_87D0&CC_0880}
Error          Base System Device                 {PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5&REV_CA, PCI\VEN_10B5&DEV_87D0&SUBSYS_87D010B5, PCI\VEN_10B5&DEV_87D0&CC_088000, PCI\VEN_10B5&DEV_87D0&CC_0880}
OK     System  PCI Express Upstream Switch Port   {PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5&REV_CA, PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5, PCI\VEN_10B5&DEV_8749&CC_060400, PCI\VEN_10B5&DEV_8749&CC_0604}
OK     System  PCI Express Downstream Switch Port {PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5&REV_CA, PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5, PCI\VEN_10B5&DEV_8749&CC_060400, PCI\VEN_10B5&DEV_8749&CC_0604}
OK     System  PCI Express Downstream Switch Port {PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5&REV_CA, PCI\VEN_10B5&DEV_8749&SUBSYS_874910B5, PCI\VEN_10B5&DEV_8749&CC_060400, PCI\VEN_10B5&DEV_8749&CC_0604}
OK     Display NVIDIA Tesla V100-SXM2-16GB        {PCI\VEN_10DE&DEV_1DB1&SUBSYS_121210DE&REV_A1, PCI\VEN_10DE&DEV_1DB1&SUBSYS_121210DE, PCI\VEN_10DE&DEV_1DB1&CC_030200, PCI\VEN_10DE&DEV_1DB1&CC_0302}
OK     Display NVIDIA Tesla V100-SXM2-16GB        {PCI\VEN_10DE&DEV_1DB1&SUBSYS_121210DE&REV_A1, PCI\VEN_10DE&DEV_1DB1&SUBSYS_121210DE, PCI\VEN_10DE&DEV_1DB1&CC_030200, PCI\VEN_10DE&DEV_1DB1&CC_0302}
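
The hardware IDs in the listing above pack a vendor ID (VEN_), a device ID (DEV_), and optionally a PCI class code (CC_) into one string. A minimal Python sketch of how they could be decoded follows. The function name `decode_hardware_id` and the two lookup tables are illustrative assumptions that cover only the values seen in this listing; they are not part of any Windows or NVIDIA tool.

```python
import re

# Illustrative lookup tables: only the values that occur in the listing above.
VENDORS = {
    "10DE": "NVIDIA",
    "10B5": "PLX Technology (now part of Broadcom)",
}
CLASS_CODES = {
    "0302": "Display controller / 3D controller",
    "0604": "Bridge / PCI-to-PCI bridge",
    "0880": "Base system peripheral / other",
}

# VEN_ and DEV_ are always present; CC_ appears only in some ID variants.
HWID_RE = re.compile(
    r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})(?:&CC_([0-9A-F]{4}))?",
    re.IGNORECASE,
)


def decode_hardware_id(hwid: str) -> dict:
    """Split a PCI hardware ID string into vendor, device and class parts."""
    match = HWID_RE.search(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    vendor_id, device_id, class_code = (
        group.upper() if group else None for group in match.groups()
    )
    return {
        "vendor_id": vendor_id,
        "vendor": VENDORS.get(vendor_id, "unknown"),
        "device_id": device_id,
        "class_code": class_code,
        # None when the ID has no CC_ field or the code is not in the table.
        "class": CLASS_CODES.get(class_code),
    }


if __name__ == "__main__":
    print(decode_hardware_id(r"PCI\VEN_10DE&DEV_1DB1&CC_0302"))
    print(decode_hardware_id(r"PCI\VEN_10B5&DEV_8749&CC_0604"))
```

Read this way, the two V100 cards report as 3D controllers (class 0302), the 8749 entries as PCI-to-PCI bridge ports of the switch, and the four entries in the Error state as generic system peripherals (class 0880) from the same switch vendor.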

Monitoring and Diagnostics

  • GPU information: nvidia-smi, or nvidia-smi -l 1 to refresh every second
  • VRAM ECC status: nvidia-smi -q -d ECC
  • GPU resource usage: nvidia-smi dmon
  • Real-time monitoring: nvitop
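
For scripted checks, the same information can be read from nvidia-smi's CSV query mode. The sketch below is one possible approach, not a fixed recipe: the helper names `parse_gpu_csv` and `query_gpus` and the choice of fields are assumptions made for illustration, while `--query-gpu` and `--format=csv,noheader,nounits` are standard nvidia-smi options.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def _to_int(value):
    """Convert a numeric field; return None for values such as '[N/A]'."""
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        index, name, used, total, util = parts
        gpus.append({
            "index": _to_int(index),
            "name": name,
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
            "utilization_pct": _to_int(util),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once; return [] if it is missing or reports an error."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` in a loop with a short sleep gives a rough equivalent of `nvidia-smi -l 1` whose values can be logged or compared against thresholds.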

Compute Applications

  • PyTorch computation
  • LM Studio running large models and serving an inference API
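
A quick way to confirm that PyTorch can actually use both cards is to run a small matrix multiplication on every visible CUDA device. This is a minimal sketch that assumes a CUDA-enabled PyTorch build is installed in the active environment; the function name `check_gpus` and the matrix size are illustrative choices.

```python
def check_gpus(size=1024):
    """Run a small matmul on every CUDA device PyTorch can see.

    Returns a list of (index, device name, result shape) tuples.
    The list is empty when PyTorch is not installed or CUDA is unavailable.
    """
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []

    results = []
    for index in range(torch.cuda.device_count()):
        device = torch.device(f"cuda:{index}")
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)
        c = a @ b
        # Kernels are launched asynchronously; wait so that errors surface here.
        torch.cuda.synchronize(device)
        results.append((index, torch.cuda.get_device_name(index), tuple(c.shape)))
    return results


if __name__ == "__main__":
    found = check_gpus()
    if not found:
        print("No usable CUDA device found.")
    for index, name, shape in found:
        print(f"cuda:{index} {name} matmul result shape {shape}")
```

With both V100 cards working, the script should print one line per card; an empty result points to a driver, device, or PyTorch build problem rather than to the code itself.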

Fixing GPUs Not Being Detected After a Windows Restart

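  • Removing the device before restarting is enough to fix this.
  • The device is not visible in Device Manager, but it still appears in System Information.
  • To automate the removal, add a .ps1 script to the shutdown scripts: gpedit.msc > Computer Configuration > Windows Settings > Scripts (Startup/Shutdown) > double-click Shutdown, then add the script.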

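    # Request administrator privileges (re-launch elevated if necessary)
    if (-not ([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)) {
        Start-Process PowerShell -ArgumentList "-NoProfile -ExecutionPolicy Bypass -File `"$PSCommandPath`"" -Verb RunAs
        exit
    }
    $device_name = 'NVIDIA Tesla V100-SXM2-16GB'

    # Collect the instance IDs of all devices with this friendly name
    $devices = Get-PnpDevice -FriendlyName $device_name -ErrorAction SilentlyContinue | Select-Object -ExpandProperty InstanceId

    if (-not $devices) {
        Write-Host "No matching device found: $device_name"
        exit
    }

    # Remove each device. $device is an instance ID string, and pnputil.exe is a
    # native executable, so failures are detected via $LASTEXITCODE, not try/catch.
    foreach ($device in $devices) {
        pnputil.exe /remove-device $device
        if ($LASTEXITCODE -eq 0) {
            Write-Host "Removed device: $device"
        } else {
            Write-Host "Failed to remove device $device (exit code $LASTEXITCODE)" -ForegroundColor Red
        }
    }

    Write-Host "Done."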