从文件中解析数据
Parsing data from a file
我收到了一份包含物种目击记录数据的文件,格式如下:
"Species", "\t", "Latitude", "\t", "Longitude"
我需要定义一个函数,将文件中的数据加载到列表中,同时将列表中的每一行都分为三个部分:物种名称、纬度和经度。
这是我拥有的,但它不起作用:
def LineToList(FileName):
FileIn = open(FileName, "r")
DataList = []
for Line in FileIn:
Line = Line.rstrip()
DataList.append(Line)
EntryList = []
for Entry in Line:
Entry = Line.split("\t")
EntryList.append(Entry)
FileIn.close()
return DataList
LineToList("Mammal.txt")
print(DataList[1])
我需要将每条线上的数据分开,以便以后可以使用它来计算物种在给定位置一定距离内的位置。
示例数据:
Myotis nattereri 54.07663633 -1.006446707
Myotis nattereri 54.25637837 -1.002130504
Myotis nattereri 54.25637837 -1.002130504
我正在尝试打印一行数据集以测试它是否正确拆分,但 shell
中没有显示任何内容
更新:
这是我现在使用的代码;
def LineToList(FileName):
FileIn = open(FileName, "r")
DataList = []
for Line in FileIn:
Line = Line.rstrip()
DataList.append(Line)
EntryList = []
for Entry in Line:
Entry = Line.split("\t")
EntryList.append(Entry)
return EntryList
FileIn.close()
return DataList
def CalculateDistance(Lat1, Lon1, Lat2, Lon2):
Lat1 = float(Lat1)
Lon1 = float(Lon1)
Lat2 = float(Lat2)
Lon2 = float(Lon2)
nDLat = (Lat1 - Lat2) * 0.017453293
nDLon = (Lon1 - Lon2) * 0.017453293
Lat1 = Lat1 * 0.017453293
Lat2 = Lat2 * 0.017453293
nA = (math.sin(nDLat/2) ** 2) + math.cos(Lat1) * math.cos(Lat2) * (math.sin(nDLon/2) ** 2 )
nC = 2 * math.atan2(math.sqrt(nA),math.sqrt( 1 - nA ))
nD = 6372.797 * nC
return nD
DataList = LineToList("Mammal.txt")
for Line in DataList:
LocationCount = 0
CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444)
if CalculateDistance <= 10:
LocationCount += 1
print("Number Recordings within Location Range:", LocationCount)
当运行程序出现错误时:
CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444) NameError: name 'Entry' is not defined
您的 DataList 变量是 LineToList 函数的局部变量;您必须在文件范围内分配给另一个变量:
DataList = LineToList("Mammal.txt")
print(DataList[1])
我认为您有一个常规的制表符分隔的 CSV,csv.reader
可以轻松地为您解析。
import csv
DataList = [row for row in csv.reader(open('Mammal.txt'), dialect='excel-tab')]
for data in DataList:
print(data)
这导致
['Myotis nattereri', '54.07663633', '-1.006446707']
['Myotis nattereri', '54.25637837', '-1.002130504']
['Myotis nattereri', '54.25637837', '-1.002130504']
我在你的个人资料中看到 "Biological Sciences",正因为如此,我建议你仔细看看 Pandas 模块。
可以很简单:
import pandas as pd
df = pd.read_csv('mammal.txt', sep='\t',
names=['species','lattitude','longitude'],
header=None)
print(df)
输出:
species lattitude longitude
0 Myotis nattereri 54.076636 -1.006447
1 Myotis nattereri 54.256378 -1.002131
2 Myotis nattereri 54.256378 -1.002131
我收到了一份包含物种目击记录数据的文件,格式如下:
"Species", "\t", "Latitude", "\t", "Longitude"
我需要定义一个函数,将文件中的数据加载到列表中,同时将列表中的每一行都分为三个部分:物种名称、纬度和经度。
这是我拥有的,但它不起作用:
def LineToList(FileName):
FileIn = open(FileName, "r")
DataList = []
for Line in FileIn:
Line = Line.rstrip()
DataList.append(Line)
EntryList = []
for Entry in Line:
Entry = Line.split("\t")
EntryList.append(Entry)
FileIn.close()
return DataList
LineToList("Mammal.txt")
print(DataList[1])
我需要将每条线上的数据分开,以便以后可以使用它来计算物种在给定位置一定距离内的位置。
示例数据:
Myotis nattereri 54.07663633 -1.006446707
Myotis nattereri 54.25637837 -1.002130504
Myotis nattereri 54.25637837 -1.002130504
我正在尝试打印一行数据集以测试它是否正确拆分,但 shell
中没有显示任何内容更新:
这是我现在使用的代码;
def LineToList(FileName):
FileIn = open(FileName, "r")
DataList = []
for Line in FileIn:
Line = Line.rstrip()
DataList.append(Line)
EntryList = []
for Entry in Line:
Entry = Line.split("\t")
EntryList.append(Entry)
return EntryList
FileIn.close()
return DataList
def CalculateDistance(Lat1, Lon1, Lat2, Lon2):
Lat1 = float(Lat1)
Lon1 = float(Lon1)
Lat2 = float(Lat2)
Lon2 = float(Lon2)
nDLat = (Lat1 - Lat2) * 0.017453293
nDLon = (Lon1 - Lon2) * 0.017453293
Lat1 = Lat1 * 0.017453293
Lat2 = Lat2 * 0.017453293
nA = (math.sin(nDLat/2) ** 2) + math.cos(Lat1) * math.cos(Lat2) * (math.sin(nDLon/2) ** 2 )
nC = 2 * math.atan2(math.sqrt(nA),math.sqrt( 1 - nA ))
nD = 6372.797 * nC
return nD
DataList = LineToList("Mammal.txt")
for Line in DataList:
LocationCount = 0
CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444)
if CalculateDistance <= 10:
LocationCount += 1
print("Number Recordings within Location Range:", LocationCount)
当运行程序出现错误时:
CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444) NameError: name 'Entry' is not defined
您的 DataList 变量是 LineToList 函数的局部变量;您必须在文件范围内分配给另一个变量:
DataList = LineToList("Mammal.txt")
print(DataList[1])
我认为您有一个常规的制表符分隔的 CSV,csv.reader
可以轻松地为您解析。
import csv
DataList = [row for row in csv.reader(open('Mammal.txt'), dialect='excel-tab')]
for data in DataList:
print(data)
这导致
['Myotis nattereri', '54.07663633', '-1.006446707']
['Myotis nattereri', '54.25637837', '-1.002130504']
['Myotis nattereri', '54.25637837', '-1.002130504']
我在你的个人资料中看到 "Biological Sciences",正因为如此,我建议你仔细看看 Pandas 模块。
可以很简单:
import pandas as pd
df = pd.read_csv('mammal.txt', sep='\t',
names=['species','lattitude','longitude'],
header=None)
print(df)
输出:
species lattitude longitude
0 Myotis nattereri 54.076636 -1.006447
1 Myotis nattereri 54.256378 -1.002131
2 Myotis nattereri 54.256378 -1.002131