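What is a tag-matching interface?
I have heard that PSM is a library that supports tag matching.
What is a tag-matching interface, and why does tag matching matter for performance in an MPI context?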
A short introduction to MPI tag matching: https://www.hpcwire.com/2006/08/18/a_critique_of_rdma-1/, section "Matching":
MPI is a two-sided interface with a large matching space: a MPI Recv is associated with a MPI Send according to several criteria such as Sender, Tag, and Context, with the first two possibly ignored (wildcard). Matching is not necessarily in order and, worse, a MPI Send can be posted before the matching MPI Recv ... MPI requires 64 bits of matching information, and MX, Portals, and QsNet provide such a matching capability.
InfiniBand Verbs and other RDMA-based APIs do not support matching at all
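So it sounds like PSM is a way to get fast matching on InfiniBand-style network adapters (the first version does the matching in software, but part of the matching could be moved to hardware).
I could not find public PSM documentation (the user guide http://storusint.com/pdf/qlogic/InfiniPath%20User%20Guide%202_0.pdf has no details).
But the library source is available: https://github.com/01org/psm
Some details are listed in the PSM2 presentation: https://www.openfabrics.org/images/eventpresos/2016presentations/304PSM2Features.pdf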
What is PSM?
Matched Queue (MQ) component
- Semantically matched to the needs of MPI using tag matching
- Provides calls for communication progress guarantees
- MQ completion semantics (standard vs. synchronized)
PSM API
- Global tag matching API with 64-bit tags
- Scale up to 64K processes per job
- MQ APIs provide point-to-point message passing between endpoints
  - e.g. psm_mq_send, psm_mq_irecv
- No “recvfrom” functionality – needed by some applications
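So tags are 64 bits wide. Every message carries a tag, and the Matched Queue has tags too (some tag-matching implementations also have a tag mask). According to mq_req_match() in psm_mq_internal.h,
https://github.com/01org/psm/blob/67c0807c74e9d445900d5541358f0f575f22a630/psm_mq_internal.h#L381, PSM has a mask as well: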
typedef struct psm_mq_req {
    ...
    /* Tag matching vars */
    uint64_t tag;
    uint64_t tagsel; /* used for receives */
    ...
} psm_mq_req_t;
psm_mq_req_t
mq_req_match(struct mqsq *q, uint64_t tag, int remove)
{
    psm_mq_req_t *curp;
    psm_mq_req_t cur;
    for (curp = &q->first; (cur = *curp) != NULL; curp = &cur->next) {
        if (!((tag ^ cur->tag) & cur->tagsel)) { /* match! */
            if (remove) {
                if ((*curp = cur->next) == NULL) /* fix tail */
                    q->lastp = curp;
                cur->next = NULL;
            }
            return cur;
        }
    }
    return NULL; /* no match */
}
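So matching works like this: the incoming tag is XORed with the tag of a receive posted to the MQ, and the result is ANDed with that receive's tagsel. If only zero bits remain after these two operations, a match is found; otherwise the next posted receive is checked.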
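To make the XOR-then-AND rule concrete, here is a minimal sketch of the same test over a list of posted receives. The struct posted_recv and the function names are hypothetical and much simpler than the real psm_mq_req; only the expression ((stag ^ rtag) & rtagsel) == 0 is taken from the PSM source quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posted-receive record; not the real psm_mq_req layout. */
struct posted_recv {
    uint64_t tag;     /* value the receive expects */
    uint64_t tagsel;  /* 1 bits mark the tag bits that must be equal */
    struct posted_recv *next;
};

/* Nonzero when a send tag satisfies one posted receive. */
static int tag_matches(uint64_t stag, uint64_t rtag, uint64_t rtagsel)
{
    /* XOR leaves 1 bits where the tags differ; AND keeps only selected bits. */
    return ((stag ^ rtag) & rtagsel) == 0;
}

/* Return the first posted receive, in list order, that matches stag. */
static struct posted_recv *first_match(struct posted_recv *head, uint64_t stag)
{
    for (struct posted_recv *cur = head; cur != NULL; cur = cur->next) {
        if (tag_matches(stag, cur->tag, cur->tagsel))
            return cur;
    }
    return NULL; /* nothing posted matches this tag */
}
```

A tagsel of all ones demands an exact match, while a tagsel of zero accepts any tag. Because the list is walked from the head, the earliest posted matching receive wins.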
Here is the comment on the psm_mq_irecv() function in psm_mq.h, https://github.com/01org/psm/blob/4abbc60ab02c51efee91575605b3430059f71ab8/psm_mq.h#L206:
/* Post a receive to a Matched Queue with tag selection criteria
 *
 * Function to receive a non-blocking MQ message by providing a preposted
 * buffer. For every MQ message received on a particular MQ, the tag and @c
 * tagsel parameters are used against the incoming message's send tag as
 * described in tagmatch.
 *
 * [in]  mq       Matched Queue Handle
 * [in]  rtag     Receive tag
 * [in]  rtagsel  Receive tag selector
 * [in]  flags    Receive flags (None currently supported)
 * [in]  buf      Receive buffer
 * [in]  len      Receive buffer length
 * [in]  context  User context pointer, available in psm_mq_status_t
 *                upon completion
 * [out] req      PSM MQ Request handle created by the preposted receive, to
 *                be used for explicitly controlling message receive
 *                completion.
 *
 * [post] The supplied receive buffer is given to MQ to match against incoming
 *        messages unless it is cancelled via psm_mq_cancel @e before any
 *        match occurs.
 *
 * The following error code is returned. Other errors are handled by the PSM
 * error handler (psm_error_register_handler).
 *
 * [retval] PSM_OK The receive buffer has successfully been posted to the MQ.
 */
psm_error_t
psm_mq_irecv(psm_mq_t mq, uint64_t rtag, uint64_t rtagsel, uint32_t flags,
             void *buf, uint32_t len, void *context, psm_mq_req_t *req);
An example of encoding data into a tag:
uint64_t tag = ( ((context_id & 0xffff) << 48) |
                 ((my_rank & 0xffff) << 32) |
                 ((send_tag & 0xffffffff)) );
With the tagsel mask we can express "match everything", "match tags where some bytes or bits equal a given value, with anything in the rest", and "match exactly".
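The three kinds of selector can be sketched with the 16/16/32-bit layout from the encoding example. The helper make_tag and the SEL_* names are hypothetical, not PSM constants, and the casts to uint64_t are an addition here so that the shifts by 48 and 32 are done in 64-bit arithmetic.

```c
#include <stdint.h>

/* Pack context, rank and user tag using the 16/16/32-bit layout quoted above.
 * The uint64_t casts are an addition so the shifts happen in 64 bits. */
static uint64_t make_tag(uint32_t context_id, uint32_t rank, uint32_t user_tag)
{
    return ((uint64_t)(context_id & 0xffff) << 48) |
           ((uint64_t)(rank & 0xffff) << 32) |
           (uint64_t)user_tag;
}

/* Hypothetical selector names; they are not PSM constants. */
#define SEL_EXACT      UINT64_C(0xffffffffffffffff) /* compare every bit */
#define SEL_ANY_RANK   UINT64_C(0xffff0000ffffffff) /* ignore the rank field */
#define SEL_ANY_TAG    UINT64_C(0xffffffff00000000) /* ignore the user tag */
#define SEL_EVERYTHING UINT64_C(0)                  /* compare nothing */

/* Same rule as in the PSM source: differing bits only count if selected. */
static int tag_matches(uint64_t stag, uint64_t rtag, uint64_t rtagsel)
{
    return ((stag ^ rtag) & rtagsel) == 0;
}
```

For example, a receive posted with make_tag(3, 0, 42) and SEL_ANY_RANK accepts user tag 42 in context 3 from any rank, which is roughly how an MPI layer could express MPI_ANY_SOURCE under this assumed layout.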
There is a newer PSM2 API, also open source: https://github.com/01org/opa-psm2. Its programmer's guide is published at http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf.
In PSM2 the tags are longer and the matching rule is stated explicitly (stag is the "Message Send Tag", the tag value sent in the message; rtag is the tag of the receive request):
https://www.openfabrics.org/images/eventpresos/2016presentations/304PSM2Features.pdf#page=7
Tag matching improvement
- Increased tag size to 96 bits
- Fundamentally ((stag ^ rtag) & rtagsel) == 0
- Supports wildcards such as MPI_ANY_SOURCE or MPI_ANY_TAG using zero bits in rtagsel
- Allows for practically unlimited scalability
- Up to 64M processes per job
PSM2 TAG MATCHING
#define PSM_MQ_TAG_ELEMENTS 3
typedef struct psm2_mq_tag {
    union {
        uint32_t tag[PSM_MQ_TAG_ELEMENTS] __attribute__((aligned(16)));
        struct {
            uint32_t tag0;
            uint32_t tag1;
            uint32_t tag2;
        };
    };
} psm2_mq_tag_t;
- Application fills ‘tag’ array or ‘tag0/tag1/tag2’ and passes to PSM
- Both tag and tag mask use the same 96 bit tag type
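The 96-bit rule is the same test applied to each of the three 32-bit words. The sketch below uses a stand-in struct tag96 instead of the real psm2_mq_tag_t so that it compiles on its own. Which word holds the user tag, the source rank or the context is up to the application; the layout in the usage note is only an assumption.

```c
#include <stdint.h>

/* Stand-in for psm2_mq_tag_t: three 32-bit words, 96 bits in total. */
struct tag96 {
    uint32_t tag[3];
};

/* A receive matches when every word passes ((stag ^ rtag) & rtagsel) == 0. */
static int tag96_matches(const struct tag96 *stag, const struct tag96 *rtag,
                         const struct tag96 *rtagsel)
{
    for (int i = 0; i < 3; i++) {
        if ((stag->tag[i] ^ rtag->tag[i]) & rtagsel->tag[i])
            return 0; /* a selected bit differs in word i */
    }
    return 1;
}
```

If, say, word 0 holds the user tag, word 1 the source rank and word 2 the context, then a selector of {0xffffffff, 0, 0xffffffff} ignores the rank word and behaves like MPI_ANY_SOURCE, while {0, 0xffffffff, 0xffffffff} behaves like MPI_ANY_TAG.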
In fact, the psm2_mq_req structure also keeps the source peer address next to the matching variables: https://github.com/01org/opa-psm2/blob/master/psm_mq_internal.h#L180
/* Tag matching vars */
psm2_epaddr_t peer;
psm2_mq_tag_t tag;
psm2_mq_tag_t tagsel; /* used for receives */
The matching itself is a software list scan, mq_list_scan(), called from mq_req_match(), https://github.com/01org/opa-psm2/blob/85c07c656198204c4056e1984779fde98b00ba39/psm_mq_recv.c#L188:
psm2_mq_req_t
mq_list_scan(struct mqq *q, psm2_epaddr_t src, psm2_mq_tag_t *tag, int which, uint64_t *time_threshold)
{
    psm2_mq_req_t *curp, cur;
    for (curp = &q->first;
         ((cur = *curp) != NULL) && (cur->timestamp < *time_threshold);
         curp = &cur->next[which]) {
        if ((cur->peer == PSM2_MQ_ANY_ADDR || src == cur->peer) &&
            !((tag->tag[0] ^ cur->tag.tag[0]) & cur->tagsel.tag[0]) &&
            !((tag->tag[1] ^ cur->tag.tag[1]) & cur->tagsel.tag[1]) &&
            !((tag->tag[2] ^ cur->tag.tag[2]) & cur->tagsel.tag[2])) {
            *time_threshold = cur->timestamp;
            return cur;
        }
    }
    return NULL;
}
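One plausible reading of the time_threshold parameter is that it lets a caller scan several lists in turn and end up with the oldest matching receive: each scan stops at entries that are not older than the best match so far, and lowers the threshold when it finds one. The sketch below illustrates that reading only. struct req, ANY_PEER, the integer peer ids, the single tag word and the oldest_match driver are all hypothetical simplifications, and it assumes each list is ordered by timestamp.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_PEER 0  /* hypothetical wildcard, standing in for PSM2_MQ_ANY_ADDR */

struct req {
    int peer;            /* hypothetical integer peer id, or ANY_PEER */
    uint32_t tag;        /* a single tag word keeps the sketch short */
    uint32_t tagsel;
    uint64_t timestamp;  /* order in which the receive was posted */
    struct req *next;
};

/* Scan one list ordered by timestamp. Stop at entries not older than
 * *threshold; on a match, lower *threshold to the match's timestamp. */
static struct req *list_scan(struct req *head, int src, uint32_t stag,
                             uint64_t *threshold)
{
    for (struct req *cur = head;
         cur != NULL && cur->timestamp < *threshold; cur = cur->next) {
        if ((cur->peer == ANY_PEER || cur->peer == src) &&
            !((stag ^ cur->tag) & cur->tagsel)) {
            *threshold = cur->timestamp;
            return cur;
        }
    }
    return NULL;
}

/* Scan several lists in turn and keep the oldest matching receive. */
static struct req *oldest_match(struct req **lists, size_t nlists,
                                int src, uint32_t stag)
{
    uint64_t threshold = UINT64_MAX;
    struct req *best = NULL;
    for (size_t i = 0; i < nlists; i++) {
        struct req *hit = list_scan(lists[i], src, stag, &threshold);
        if (hit != NULL)
            best = hit; /* any later hit is older than the previous best */
    }
    return best;
}
```

The peer test comes first, as in mq_list_scan(): a receive posted for a specific source is skipped unless the message comes from that source, and a receive posted with the wildcard accepts any source.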