Configuring 1-to-2 split mode on a Mellanox NDR switch

Switch information

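The switch configured here is an MQM9790-NS2X_Ax. It has 32 OSFP ports, and each OSFP port carries 2 NDR ports, so there are 64 NDR ports in total at 400 Gbps each. Configuring a 1-to-2 split means splitting these NDR ports in two.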
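As a quick sanity check on these numbers, the arithmetic can be written out in a few lines of shell. The figures come straight from the paragraph above; the 128 x 200 Gbps line is hypothetical and assumes every NDR port were split, which this post does not do. The bidirectional total matches the 51.2 Tbps quoted in the mlxconfig device description further down.

```shell
# Port and bandwidth arithmetic for the MQM9790-NS2X_Ax, using the figures from the text.
osfp_cages=32       # OSFP ports on the front panel
ndr_per_osfp=2      # each OSFP port carries 2 NDR ports
ndr_gbps=400        # speed of one NDR port

ndr_ports=$(( osfp_cages * ndr_per_osfp ))   # 64 NDR ports
split_ports=$(( ndr_ports * 2 ))             # hypothetical: every NDR port split in two
split_gbps=$(( ndr_gbps / 2 ))               # 200 Gbps per split port
total_gbps=$(( ndr_ports * ndr_gbps ))       # aggregate in one direction

echo "NDR ports:   ${ndr_ports} x ${ndr_gbps} Gbps"
echo "Split ports: ${split_ports} x ${split_gbps} Gbps"
echo "Aggregate:   ${total_gbps} Gbps per direction, $(( total_gbps * 2 )) Gbps bidirectional"
```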

Configuring the split

Install the MLNX_OFED driver before you start. Download page: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
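Before going further it may be worth confirming that the tools used in this post are on PATH. This is a minimal sketch; `missing_tools` is a hypothetical helper and its default list is simply the four commands used below. On a typical MLNX_OFED install, mlxconfig and flint come from the bundled MFT package, while ibswitches and iblinkinfo come from infiniband-diags.

```shell
# missing_tools [TOOL...]: prints every command from the list that is not on PATH.
# With no arguments it checks the four tools used in this post.
missing_tools() {
  [ "$#" -gt 0 ] || set -- ibswitches mlxconfig flint iblinkinfo
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}
```

Usage: missing=$(missing_tools); [ -z "$missing" ] || echo "missing: $missing (install MLNX_OFED first)"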

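This differs from HDR switches. On an HDR switch, setting SPLIT_MODE to 1 is enough. On an NDR switch you first set SPLIT_MODE to 1 so that each OSFP port can be split in two, and then set SPLIT_2X individually on every NDR port you want split (the SPLIT_PORT[n]=1 parameter used below).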

The IB cable used here is an NDR 1-to-4 breakout cable: 800 Gbps split into four 200 Gbps legs, labeled 1-1, 1-2, 2-1 and 2-2. It lets one 800 Gbps OSFP port be used as four 200 Gbps ports.

Before SPLIT_MODE is set to 1, only leg 1-1 of the breakout cable works. After SPLIT_MODE is set to 1, legs 1-1 and 2-1 work.

After SPLIT_2X is set on the relevant ports, all four legs (1-1, 1-2, 2-1 and 2-2) work.
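The observations above can be summarised as a small lookup. This only restates what was seen with this cable on this switch, not a model of how the hardware works; `usable_legs` is a hypothetical helper, and its second argument stands for the SPLIT_PORT value of the NDR ports behind the OSFP cage the cable is plugged into.

```shell
# usable_legs SPLIT_MODE SPLIT_PORT
# Prints which legs of the 1-to-4 breakout cable were observed to link up.
usable_legs() {
  local split_mode=$1 split_port=$2
  if [ "$split_mode" != 1 ]; then
    echo "1-1"                   # SPLIT_MODE=0: only the first leg works
  elif [ "$split_port" != 1 ]; then
    echo "1-1 2-1"               # SPLIT_MODE=1 but ports not set to SPLIT_2X
  else
    echo "1-1 1-2 2-1 2-2"       # SPLIT_MODE=1 and SPLIT_PORT=1 (SPLIT_2X): all four legs
  fi
}
```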

Query the switch information

[root@admin ~]# ibswitches
Switch  : 0x0000000000000000 ports 129 "Quantum-2 Mellanox Technologies" base port 0 lid 2 lmc 0

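The important value is lid 2: this is the switch's LID (local identifier), and it is used as the device name lid-2 in the commands that follow.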
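If you want to script the later steps, the LID can be pulled out of the ibswitches output instead of being copied by hand. A minimal sketch, assuming the output format shown above where the LID is the number right after the word `lid`; `switch_lids` is a hypothetical helper.

```shell
# switch_lids: reads `ibswitches` output on stdin and prints the LID of every switch,
# i.e. the numeric field that follows the word "lid" on each line.
switch_lids() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "lid" && $(i + 1) ~ /^[0-9]+$/) { print $(i + 1); break }
  }'
}
```

For a single switch: lid=$(ibswitches | switch_lids | head -n 1), after which the device name used throughout this post is "lid-${lid}".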

Set the switch to SPLIT_MODE

Use LID 2 from above to query the switch's current SPLIT_MODE:

[root@admin ~]# mlxconfig -d lid-2 q SPLIT_MODE

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot
        SPLIT_MODE                                  NO_SPLIT_SUPPORT(0)

[root@admin ~]# flint -d lid-2 swreset
-I- Sending reset command to device lid-2 ...
-I- Reset command accepted by the device.

Change SPLIT_MODE to 1:

[root@admin ~]# mlxconfig -d lid-2 s SPLIT_MODE=1

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot       New
        SPLIT_MODE                                  NO_SPLIT_SUPPORT(0)  SPLIT_2X(1)

Reset the switch:

[root@admin ~]# flint -d lid-2 swreset
-I- Sending reset command to device lid-2 ...
-I- Reset command accepted by the device.

# After the reset, check that the mode change took effect
[root@admin ~]# mlxconfig -d lid-2 q SPLIT_MODE

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot
        SPLIT_MODE                                  SPLIT_2X(1)
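The SPLIT_MODE steps can also be run without the interactive prompt. This sketch assumes mlxconfig's -y option (answer yes to the confirmation) and defaults to a dry run that only prints the commands; `run` and `enable_split_mode` are hypothetical helpers, and DRY_RUN=0 is needed to actually touch the switch.

```shell
# Non-interactive version of the SPLIT_MODE steps.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

enable_split_mode() {
  local dev=$1                                  # e.g. lid-2, taken from ibswitches
  run mlxconfig -d "$dev" -y s SPLIT_MODE=1     # -y answers the "Apply new Configuration?" prompt
  run flint -d "$dev" swreset                   # the new value only applies after a switch reset
  # once the switch is back up, verify with: mlxconfig -d "$dev" q SPLIT_MODE
}
```

Defaulting to a dry run is deliberate: swreset reboots the switch and takes down every link on it.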

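After the reset the switch is in SPLIT_2X mode, but that is not the end of it: the 64 NDR ports still need to be configured individually. Set the split only on the ports you actually want split.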

Set NDR ports to SPLIT_2X mode

Query the port mode

Using port 19 as an example:

[root@admin ~]# mlxconfig -d lid-2 q SPLIT_PORT[19]

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot
        SPLIT_PORT[19]                         NO_SPLIT(0)
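To check a setting without reading the table by eye, the mlxconfig output can be reduced to name/value pairs. A sketch that assumes the column layout shown above, where the value is the number in parentheses in the last column; `mlx_values` is a hypothetical helper. In `s` (set) output the last column is the pending "New" value.

```shell
# mlx_values: reads mlxconfig query/set output on stdin and prints "NAME VALUE"
# for every SPLIT_* parameter, e.g. "SPLIT_2X(1)" in the last column -> 1.
mlx_values() {
  awk '$1 ~ /^SPLIT_/ && $NF ~ /[(][0-9]+[)]$/ {
    v = $NF
    sub(/^.*[(]/, "", v)    # drop everything up to the opening parenthesis
    sub(/[)]$/, "", v)      # drop the closing parenthesis
    print $1, v
  }'
}
```

On the switch shown here, mlxconfig -d lid-2 q 'SPLIT_PORT[19]' | mlx_values would print SPLIT_PORT[19] 0 before the change.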

Change the port mode

[root@admin ~]# mlxconfig -d lid-2 s SPLIT_PORT[19]=1

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot       New
        SPLIT_PORT[19]                         NO_SPLIT(0)          SPLIT_2X(1)

After setting the port mode you still need to reset the switch with flint -d lid-2 swreset.
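Putting the set and reset steps together, a single port or a contiguous range can be split in one go. This sketch reuses the SPLIT_PORT[a..b] range syntax from the batch example in this post, assumes mlxconfig's -y option, and assumes port indices run from 1 to 64 (the 64 NDR ports); `run` and `split_ports` are hypothetical helpers, and it is a dry run unless DRY_RUN=0.

```shell
# split_ports DEV FIRST [LAST]: sets SPLIT_PORT=1 on ports FIRST..LAST, then resets the switch.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

split_ports() {
  local dev=$1 first=$2 last=${3:-$2}
  case "$first:$last" in
    *[!0-9:]*|:*|*:) echo "port numbers must be integers" >&2; return 1 ;;
  esac
  if [ "$first" -lt 1 ] || [ "$last" -gt 64 ] || [ "$first" -gt "$last" ]; then
    echo "invalid port range: ${first}..${last}" >&2   # assumed valid range: 1..64
    return 1
  fi
  local param
  if [ "$first" -eq "$last" ]; then
    param="SPLIT_PORT[${first}]=1"
  else
    param="SPLIT_PORT[${first}..${last}]=1"            # range syntax, as in the batch example
  fi
  run mlxconfig -d "$dev" -y s "$param"                # quoted so the shell never glob-expands [...]
  run flint -d "$dev" swreset
}
```

The quoting matters: unquoted, SPLIT_PORT[19] is a shell glob pattern, and if a matching file name happened to exist in the current directory the shell would substitute it for the argument.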

Ports can also be set and queried in batches:

# Batch set
[root@admin ~]# mlxconfig -d lid-2 s SPLIT_PORT[20..25]=1

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot       New
        SPLIT_PORT[20]                         NO_SPLIT(0)          SPLIT_2X(1)
        SPLIT_PORT[21]                         NO_SPLIT(0)          SPLIT_2X(1)
        SPLIT_PORT[22]                         NO_SPLIT(0)          SPLIT_2X(1)
        SPLIT_PORT[23]                         NO_SPLIT(0)          SPLIT_2X(1)
        SPLIT_PORT[24]                         NO_SPLIT(0)          SPLIT_2X(1)
        SPLIT_PORT[25]                         NO_SPLIT(0)          SPLIT_2X(1)

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

# Batch query
[root@admin ~]# mlxconfig -d lid-2 q SPLIT_PORT[20..25]

Device #1:
----------

Device type:        Quantum2
Name:               MQM9790-NS2X_Ax
Description:        Mellanox Quantum 2 based NDR InfiniBand Switch; 64 NDR ports; 32 OSFP ports; non-blocking switching capacity of 51.2Tbps; 2 Power Supplies (AC); Standard depth; Unmanaged; airflow;
Device:             lid-2

Configurations:                                          Next Boot
        SPLIT_PORT[20]                         SPLIT_2X(1)
        SPLIT_PORT[21]                         SPLIT_2X(1)
        SPLIT_PORT[22]                         SPLIT_2X(1)
        SPLIT_PORT[23]                         SPLIT_2X(1)
        SPLIT_PORT[24]                         SPLIT_2X(1)
        SPLIT_PORT[25]                         SPLIT_2X(1)
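After the reset, a batch query like the one above can be checked automatically. A sketch assuming the query output format shown; `all_ports_split` is a hypothetical helper that succeeds only if it sees at least one SPLIT_PORT line and every one of them reads SPLIT_2X(1).

```shell
# all_ports_split: reads `mlxconfig ... q SPLIT_PORT[...]` output on stdin.
# Exit status 0 only if at least one SPLIT_PORT line exists and all report SPLIT_2X(1);
# ports that are not split are listed on stdout.
all_ports_split() {
  awk '$1 ~ /^SPLIT_PORT\[/ {
         n++
         if ($NF != "SPLIT_2X(1)") { bad++; print "not split: " $1 }
       }
       END { exit (n == 0 || bad > 0) }'
}
```

Usage: mlxconfig -d lid-2 q 'SPLIT_PORT[20..25]' | all_ports_split && echo "ports 20-25 are split"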

Check the link information after splitting

[root@admin ~]# iblinkinfo
CA: comput1 HCA-1:
      0x0000000000000000      5    1[  ] ==( 2X        106.25 Gbps Active/  LinkUp)==>       2   22[  ] "Quantum-2 Mellanox Technologies" (Could be 4X )
CA: Mellanox Technologies Aggregation Node:
      0x0000000000000000      3    1[  ] ==( 4X        106.25 Gbps Active/  LinkUp)==>       2   65[  ] "Quantum-2 Mellanox Technologies" ( )
Switch: 0x0000000000000000 Quantum-2 Mellanox Technologies:
           2    1[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           2    2[  ] ==(                Down/ Polling)==>             [  ] "" ( )
...output omitted...
           2   20[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           2   21[  ] ==( 2X        106.25 Gbps Active/  LinkUp)==>       1    1[  ] "admin HCA-1" (Could be 4X )
           2   22[  ] ==( 2X        106.25 Gbps Active/  LinkUp)==>       5    1[  ] "comput1 HCA-1" (Could be 4X )
           2   23[  ] ==(                Down/ Polling)==>             [  ] "" ( )
...output omitted...
           2   63[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           2   64[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           2   65[  ] ==( 4X        106.25 Gbps Active/  LinkUp)==>       3    1[  ] "Mellanox Technologies Aggregation Node" ( )
CA: admin HCA-1:
      0x0000000000000000      1    1[  ] ==( 2X        106.25 Gbps Active/  LinkUp)==>       2   21[  ] "Quantum-2 Mellanox Technologies" (Could be 4X )
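In this output, "2X 106.25 Gbps" means the link runs on two lanes at 106.25 Gb/s each (iblinkinfo reports the per-lane rate), i.e. a 200 Gb/s NDR200 link, while the aggregation node on port 65 runs at 4X. "(Could be 4X )" is iblinkinfo noting that a wider link is reported as possible, which is presumably expected on a split port. To list only the active switch ports, a short awk filter is enough; it is a sketch that assumes the layout above, and `active_switch_ports` is a hypothetical name.

```shell
# active_switch_ports: reads `iblinkinfo` output on stdin and prints
# "PORT WIDTH LANE_GBPS PEER" for every LinkUp port in the Switch section.
active_switch_ports() {
  awk '
    /^Switch:/ { sw = 1; next }
    /^CA:/     { sw = 0; next }
    sw && /LinkUp/ {
      port = $2; sub(/\[.*/, "", port)            # "21[" -> "21"
      width = speed = ""
      for (i = 1; i < NF; i++)
        if ($i == "==(") { width = $(i + 1); speed = $(i + 2); break }
      peer = ""
      if (match($0, /"[^"]*"/)) peer = substr($0, RSTART + 1, RLENGTH - 2)
      print port, width, speed, peer
    }'
}
```

Fed the listing above, it would print three lines: 21 2X 106.25 admin HCA-1, 22 2X 106.25 comput1 HCA-1, and 65 4X 106.25 Mellanox Technologies Aggregation Node.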
https://www.linuxstudynotes.com/2025/03/07/linux/mellanox-ndr-%e4%ba%a4%e6%8d%a2%e6%9c%ba%e8%ae%be%e7%bd%ae%e4%b8%80%e5%88%86%e4%ba%8c%e6%a8%a1%e5%bc%8f/