Associating a combination's indices with a value

Question

I am writing a program in which I need the combinations of distances between atoms, or between individual points in 3D space. Here is an example:

A file 'test' contains the following information:

Ti 1.0 1.0 1.0

O 0.0 2.0 0.0

O 0.0 0.0 0.0

Ti 1.0 3.0 4.0

O 2.0 5.0 0.0

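I want my code to compute all combinations of distances between the points (which I have done!), and then I need to count the number of times the distance between one atom and another is less than 2.2.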

This is confusing in words, so I will show you what I have so far.

#!/usr/bin/env python
import sys, math, scipy, itertools
import numpy as np

try:
    infile = sys.argv[1]

except:
    print "Needs file name"
    sys.exit(1)

#opening files for first part
ifile = open(infile, 'r')
coordslist = []

#Creating a file of just coordinates that can be 'mathed on'
for line in ifile:
    pair = line.split()
    atom = (pair[0]); x = float(pair[1]); y = float(pair[2]); z = float(pair[3])
    coordslist += [(x,y,z)]
ifile.close()

#Define distance
def distance(p0,p1):
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2 + (p0[2] - p1[2])**2)

#Initializing for next section
dislist = []
bondslist = []

#Compute distances between all points 1-2, 1-3, 1-4, etc.
for p0, p1 in itertools.combinations(coordslist,2):
    print p0, p1, distance(p0,p1)
    dislist += [distance(p0, p1)]
    if distance(p0,p1) < 2.2:
        bondslist += [(p0, distance(p0,p1))]
print bondslist
print dislist

I am not sure whether making these lists will help me. So far, they have not.

The output is:

(1.0, 1.0, 1.0) (0.0, 2.0, 0.0) 1.73205080757

(1.0, 1.0, 1.0) (0.0, 0.0, 0.0) 1.73205080757

(1.0, 1.0, 1.0) (1.0, 3.0, 4.0) 3.60555127546

(1.0, 1.0, 1.0) (2.0, 5.0, 0.0) 4.24264068712

(0.0, 2.0, 0.0) (0.0, 0.0, 0.0) 2.0

(0.0, 2.0, 0.0) (1.0, 3.0, 4.0) 4.24264068712

(0.0, 2.0, 0.0) (2.0, 5.0, 0.0) 3.60555127546

(0.0, 0.0, 0.0) (1.0, 3.0, 4.0) 5.09901951359

(0.0, 0.0, 0.0) (2.0, 5.0, 0.0) 5.38516480713

(1.0, 3.0, 4.0) (2.0, 5.0, 0.0) 4.58257569496

[((1.0, 1.0, 1.0), 1.7320508075688772), ((1.0, 1.0, 1.0), 1.7320508075688772), ((0.0, 2.0, 0.0), 2.0)]

[1.7320508075688772, 1.7320508075688772, 3.605551275463989, 4.242640687119285, 2.0, 4.242640687119285, 3.605551275463989, 5.0990195135927845, 5.385164807134504, 4.58257569495584]

One thing I need from this output is the number of times each atom has a distance less than 2.2, for example:

1 2 (because atom 1 has two distances less than 2.2 associated with it)

2 2

3 2 

4 0

5 0

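I also need to see which two atoms make up each distance less than 2.2. I am doing this to calculate Pauling charges: you look at an atom, determine how many bonds it has (atoms less than 2.2 angstroms away), then look at the atoms attached to that atom and see how many atoms are attached to those. It is very frustrating, but it all depends on keeping track of each individual atom, not just their combinations. An array would probably be very useful.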

I have looked here and here for help, and I think I need to combine those methods somehow. Any help is greatly appreciated!

Answer 1

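Before we start, note that if this is a crystal (and I somewhat suspect you are not dealing with a Ti2O3 molecule), you should mind periodic boundary conditions: the last two atoms, which are far from everyone else, might be closer to an atom in a neighbouring cell.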
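To illustrate the periodic-boundary caveat, here is a minimal sketch of the minimum-image convention. It assumes an orthorhombic cell whose edge lengths are given in a hypothetical `cell` array (the question does not state any cell parameters), and it is only valid for cutoffs below half the shortest cell edge. The function name `minimum_image_distances` is made up for this example.

```python
import numpy as np

def minimum_image_distances(positions, cell):
    """Pairwise distances under the minimum-image convention.

    Assumes an orthorhombic cell with edge lengths `cell` (an assumption for
    this sketch; a general triclinic cell needs fractional coordinates).
    """
    positions = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)
    # all displacement vectors, shape (m, m, 3)
    delta = positions[:, None, :] - positions[None, :, :]
    # shift each component by a whole number of cell lengths into [-L/2, L/2]
    delta -= cell * np.round(delta / cell)
    return np.sqrt((delta ** 2).sum(axis=-1))

# two atoms near opposite faces of a hypothetical 6 x 6 x 6 cell
pos = np.array([[0.5, 0.0, 0.0],
                [5.5, 0.0, 0.0]])
d = minimum_image_distances(pos, cell=[6.0, 6.0, 6.0])
print(d[0, 1])   # 1.0 through the cell boundary, not 5.0
```

The resulting matrix has the same layout as the full distance matrix used in the rest of this answer, so the thresholding steps would apply to it unchanged.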

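What you are trying to do is very simple if you know which tools to use. You are looking for a method that tells you the pairwise distances between all points in a set. The function that does this is called pdist, scipy.spatial.distance.pdist to be precise. It can compute the pairwise distances for any set of points, in any dimension, with any kind of distance. In your specific case the default Euclidean distance will do.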

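The pairwise distance matrix of a set of points (element [i,j] tells you the distance between points i and j) is symmetric by construction, with zeros on the diagonal. For this reason, usual implementations of pdist return only the off-diagonal elements on one side of the diagonal, and scipy's version is no exception. However, there is the handy scipy.spatial.distance.squareform function, which turns an array containing such a condensed version of a purely off-diagonal symmetric matrix into the full array. From there it is easy to post-process.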
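As a small illustration of the condensed layout, the sketch below pairs each entry of the pdist output with the two point indices it belongs to. It relies on the condensed vector following the same order as itertools.combinations over the row indices (0-1, 0-2, ..., 3-4); the variable names are chosen for this example only.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[1.0, 1.0, 1.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [1.0, 3.0, 4.0],
                   [2.0, 5.0, 0.0]])

condensed = pdist(points)   # length m*(m-1)/2, i.e. 10 for m = 5 points
pairs = list(combinations(range(len(points)), 2))   # (0, 1), (0, 2), ..., (3, 4)

# pair each condensed distance with the (i, j) indices it belongs to
for (i, j), d in zip(pairs, condensed):
    print(i, j, round(float(d), 4))

full = squareform(condensed)   # 5x5 symmetric matrix, zeros on the diagonal
```

Here `full[i, j]` and the condensed entry for the pair `(i, j)` hold the same value, so either form can be used to look up which atoms a given distance belongs to.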

Here is what I would do:

import numpy as np
import scipy.spatial as ssp

# atoms and positions:
# Ti 1.0 1.0 1.0
# O  0.0 2.0 0.0
# O  0.0 0.0 0.0
# Ti 1.0 3.0 4.0
# O  2.0 5.0 0.0

# define positions as m*n array, where n is the dimensionality (3)
allpos = np.array([[1.,1,1],  # 1. is lazy for dtype=float64
                   [0,2,0], 
                   [0,0,0],
                   [1,3,4],
                   [2,5,0]])

# compute pairwise distances
alldist_condensed = ssp.distance.pdist(allpos)       # vector of off-diagonal elements on one side
alldist = ssp.distance.squareform(alldist_condensed) # full symmetric distance matrix

# set diagonals to nan (or inf) to avoid tainting our output later
fancy_index = np.arange(alldist.shape[0])
alldist[fancy_index,fancy_index] = np.nan

# find index of "near" neighbours
thresh = 2.2
neighbslist = [np.where(alldist[k,:]<thresh)[0] for k in range(alldist.shape[0])]  # the k'th element is an array containing the indices which are "close" to atom number k

# find total number of "near" neighbours
nearnum = [neighbs.size for neighbs in neighbslist] # the k'th element is the number of atoms which are "close" to atom number k

So for your specific case, alldist contains the full distance matrix:

array([[        nan,  1.73205081,  1.73205081,  3.60555128,  4.24264069],
       [ 1.73205081,         nan,  2.        ,  4.24264069,  3.60555128],
       [ 1.73205081,  2.        ,         nan,  5.09901951,  5.38516481],
       [ 3.60555128,  4.24264069,  5.09901951,         nan,  4.58257569],
       [ 4.24264069,  3.60555128,  5.38516481,  4.58257569,         nan]])

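As you can see, I manually set the diagonal elements to np.nan. This is necessary because I intend to check for elements of this matrix that are smaller than thresh, and the zeros on the diagonal would surely qualify. In our case np.inf would have been an equally good choice for these elements, but what if you wanted the points that are farther apart than thresh? Clearly, for that case -np.inf or np.nan would be acceptable (so I went with the latter).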
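A toy example may make the role of the diagonal clearer. The 3x3 matrix below is invented for illustration, and np.fill_diagonal is used as an alternative to the fancy indexing above. Comparisons against nan are always False, which is why the same masked matrix works for both "nearer than" and "farther than" queries.

```python
import numpy as np

# made-up symmetric distance matrix with zeros on the diagonal
dist = np.array([[0.0, 1.5, 3.0],
                 [1.5, 0.0, 2.5],
                 [3.0, 2.5, 0.0]])
thresh = 2.2

# without masking, every atom counts itself as a near neighbour
naive_counts = (dist < thresh).sum(axis=1)
print(naive_counts)   # [2 2 1]

masked = dist.copy()
np.fill_diagonal(masked, np.nan)   # modifies `masked` in place

# older numpy versions may warn about comparing nan; silence that if it occurs
with np.errstate(invalid="ignore"):
    near = masked < thresh
    far = masked > thresh

print(near.sum(axis=1))   # [1 1 0]
print(far.sum(axis=1))    # [1 1 2]
```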

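Post-processing the near neighbours takes us out of the realm of numpy (you should stick with numpy for as long as you can; it is usually the fastest). For each atom you want a list of the atoms that are near it. This is not an object with a constant length for each atom, so you cannot store it nicely in an array. The logical conclusion is to use a list, and then you can go full python and use a list comprehension to construct it (reminder from above):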

neighbslist = [np.where(alldist[k,:]<thresh)[0] for k in range(alldist.shape[0])]  # the k'th element is an array containing the indices which are "close" to atom number k

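Here np.where finds the indices within row k where the distance is small enough, and the 1D array of indices is stored in the k-th element of the resulting list neighbslist. It is then trivial to check the length of these arrays for each atom, which gives you your "number of near neighbours" list. Note that we could have cast the output of np.where to a list inside the list comprehension to leave numpy entirely, but then we would have had to use len(neighbs) instead of neighbs.size on the next line.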

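So you have two key variables, two lists to be precise: nearnum[k] is the number of "near" neighbours of atom k (with k in range(allpos.shape[0])), and neighbslist[k] is a 1D numpy array listing the indices of the atoms near atom k, so neighbslist[k][j] (for j in range(nearnum[k])) is a number in range(allpos.shape[0]) that is not equal to k. Come to think of it, this list-of-arrays construct is probably a bit ugly, so you should probably cast this object to a proper list of lists during construction (even if this means a bit of overhead).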
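One possible way to build the list of lists directly, and to list which two atoms form each short distance, is sketched below. It is not the only option: it uses np.inf on the diagonal (fine here, since only "less than" is tested), and the names `close` and `bonds` are introduced just for this example. Indices are 0-based; the printed table adds 1 to match the numbering in the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

allpos = np.array([[1.0, 1.0, 1.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [1.0, 3.0, 4.0],
                   [2.0, 5.0, 0.0]])
thresh = 2.2

alldist = squareform(pdist(allpos))
np.fill_diagonal(alldist, np.inf)   # keep self-distances out of "< thresh"
close = alldist < thresh            # boolean matrix of near-neighbour pairs

# plain list of lists: neighbslist[k] holds the indices close to atom k
neighbslist = [np.flatnonzero(row).tolist() for row in close]
nearnum = [len(neighbs) for neighbs in neighbslist]

# each near pair exactly once (i < j), together with its distance
i_idx, j_idx = np.nonzero(np.triu(close, k=1))
bonds = [(int(i), int(j), float(alldist[i, j])) for i, j in zip(i_idx, j_idx)]

# atom number (1-based, as in the question) and its neighbour count
for k, n in enumerate(nearnum, start=1):
    print(k, n)
```

For the five atoms in the question this gives the counts 2, 2, 2, 0, 0, and `bonds` contains the pairs (0, 1), (0, 2) and (1, 2).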

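I only noticed at the end that your input data is in a file. No worries, that is also easy to read with numpy! Assuming those empty lines are not in your input file named test, you can call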

allpos = np.loadtxt('test',usecols=(1,2,3))

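to read the position matrix into your variable. The usecols option makes numpy ignore the first column of the data, which is not numeric and would cause problems. We do not really need it anyway.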
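If the element symbols are wanted after all (for example to tell Ti from O when counting bonds), one option is a second np.loadtxt call that reads only the first column as strings. This is a sketch: it reads from an in-memory copy of the file contents via io.StringIO so that it runs on its own, and `labels` is a name introduced here.

```python
import io

import numpy as np

# stand-in for the contents of the file 'test' (without blank lines)
text = """Ti 1.0 1.0 1.0
O 0.0 2.0 0.0
O 0.0 0.0 0.0
Ti 1.0 3.0 4.0
O 2.0 5.0 0.0
"""

# numeric columns only: an (m, 3) float array of positions
allpos = np.loadtxt(io.StringIO(text), usecols=(1, 2, 3))

# first column only, read as strings: one element symbol per atom
labels = np.loadtxt(io.StringIO(text), usecols=(0,), dtype=str)

print(allpos.shape)
print(labels.tolist())
```

With a real file, the io.StringIO(text) arguments would be replaced by the file name, and `labels[k]` then names the element of atom k in the same order as the rows of allpos.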

python

arrays

numpy

chemistry