What should I do when <tr> has rowspan
When a row contains a cell with a rowspan attribute, how do I make the parsed row line up with the corresponding row of the table on the Wikipedia page?
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup

wiki = "http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records"
header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent 403 error on Wikipedia
req = Request(wiki, headers=header)
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')

try:
    table = soup.find_all('table')[6]
except IndexError:
    raise SystemExit('No tables found, exiting')

rows = table.find_all('tr')
if not rows:
    raise SystemExit('No table row found, exiting')
first = rows[0]
allRows = rows[1:-1]

headers = [cell.get_text() for cell in first.find_all(['th', 'td'])]
results = [[data.get_text() for data in row.find_all(['th', 'td'])] for row in allRows]

df = pd.DataFrame(data=results, columns=headers)
df
I get a table as output, but when a row contains a cell with rowspan, the resulting table comes out wrong.
The problem is caused by the following case.
HTML content:
<tr>
<td rowspan="2">2=</td>
<td>West Indies</td>
<td>4</td>
<td>Lord's</td>
<td>2009</td>
</tr>
<tr>
<td style="text-align:left;">India</td>
<td>4</td>
<td>Mumbai</td>
<td>2012</td>
</tr>
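So when a td has a rowspan attribute, treat the same td value as repeated at the same position in the following tr tags; the value of rowspan is the number of rows the cell covers, counting the row it appears in.
- Collect all such rowspan information and save it in a variable: the sequence number of the tr tag, the sequence number of the td tag within it, the value of rowspan (i.e. how many tr tags share the same td), and the text value of the td.
- Update the results of every affected tr as described above.
Note: this was only checked against the given test case. It needs to be checked with more test cases.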
Code:
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup

wiki = "http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records"
header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent 403 error on Wikipedia
req = Request(wiki, headers=header)
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')

table = soup.find_all('table')[6]
tmp = table.find_all('tr')
first = tmp[0]
allRows = tmp[1:-1]

headers = [th.get_text() for th in first.find_all('th')]
results = [[data.get_text() for data in row.find_all('td')] for row in allRows]

# Example cell: <td rowspan="2">2=</td>
# List of tuples (index of tr, index of td, rowspan count, text value),
# e.g. [(1, 0, 2, '2=')]:
# the cell sits in <tr> 1 at td position 0, covers 2 rows, and has the value "2=".
rowspan = []
for no, tr in enumerate(allRows):
    for td_no, data in enumerate(tr.find_all('td')):
        if data.has_attr("rowspan"):
            rowspan.append((no, td_no, int(data["rowspan"]), data.get_text()))

for no, td_no, count, text in rowspan:
    # The spanning row already holds the value; copy it into the rows below.
    for j in range(1, count):
        results[no + j].insert(td_no, text)

df = pd.DataFrame(data=results, columns=headers)
print(df)
Output:
Rank Opponent No. wins Most recent venue Season
0 1 South Africa 6 Lord's 1951
1 2= West Indies 4 Lord's 2009
2 2= India 4 Mumbai 2012
3 4 Australia 3 Sydney 1932
4 5 Pakistan 2 Trent Bridge 1967
5 6 Sri Lanka 1 Old Trafford 2002
It also works for table 10:
Rank Hundreds Player Matches Innings Average
0 1 25 Alastair Cook 107 191 45.61
1 2 23 Kevin Pietersen 104 181 47.28
2 3 22 Colin Cowdrey 114 188 44.07
3 3 22 Wally Hammond 85 140 58.46
4 3 22 Geoffrey Boycott 108 193 47.72
5 6 21 Andrew Strauss 100 178 40.91
6 6 21 Ian Bell 103 178 45.30
7 8= 20 Ken Barrington 82 131 58.67
8 8= 20 Graham Gooch 118 215 42.58
9 10 19 Len Hutton 79 138 56.67
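The record-then-insert steps described in this answer can also be tried without fetching the page. The sketch below is a minimal offline illustration, assuming the cells of each row have already been extracted as (text, rowspan) pairs. The function name fill_rowspans and the sample rows are made up for this example and are not part of the answer's code.

```python
def fill_rowspans(rows):
    """Repeat every cell with rowspan > 1 in the rows below it.

    `rows` is a list of rows; each row is a list of (text, rowspan) pairs
    in the order the <td> tags appear in that <tr>.
    """
    results = [[text for text, _ in row] for row in rows]
    # Step 1: record (row index, cell index, rowspan, text) for spanning cells.
    spans = [
        (r, c, span, text)
        for r, row in enumerate(rows)
        for c, (text, span) in enumerate(row)
        if span > 1
    ]
    # Step 2: insert the text at the same position in the following rows.
    for r, c, span, text in spans:
        for j in range(1, span):
            if r + j < len(results):  # ignore spans that run past the table
                results[r + j].insert(c, text)
    return results


# Hypothetical sample data modelled on the rows shown in the question.
rows = [
    [("1", 1), ("South Africa", 1), ("6", 1)],
    [("2=", 2), ("West Indies", 1), ("4", 1)],
    [("India", 1), ("4", 1)],
]
for row in fill_rowspans(rows):
    print(row)
```

Like the code in the answer, this uses a cell's position inside its own <tr> as the insert position. That position is off whenever an earlier row already spans into the row that holds the spanning cell, so nested or overlapping spans still need a different approach.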
Input:
<html>
<body>
<table width="100%" border="1">
<tr>
<td rowspan="2">one</td>
<td>two</td>
<td>three</td>
</tr>
<tr>
<td colspan="2">February</td>
</tr>
</table>
</body>
</html>
Output:
one two three
one February February
Python code:
#!/usr/bin/env python3
# coding: utf-8
from bs4 import BeautifulSoup


class Element(object):
    def __init__(self, row, col, text, rowspan=1, colspan=1):
        self.row = row
        self.col = col
        self.text = text
        self.rowspan = rowspan
        self.colspan = colspan

    def __repr__(self):
        return f'''{{"row": {self.row}, "col": {self.col}, "text": {self.text}, "rowspan": {self.rowspan}, "colspan": {self.colspan}}}'''

    def isRowspan(self):
        return self.rowspan > 1

    def isColspan(self):
        return self.colspan > 1


def parse(h) -> [[]]:
    doc = BeautifulSoup(h, 'html.parser')
    trs = doc.select('tr')
    m = []
    for row, tr in enumerate(trs):  # collect every cell with its rowspan and colspan
        it = []
        ts = tr.find_all(['th', 'td'])
        for col, tx in enumerate(ts):
            element = Element(row, col, tx.text.strip())
            if tx.has_attr('rowspan'):
                element.rowspan = int(tx['rowspan'])
            if tx.has_attr('colspan'):
                element.colspan = int(tx['colspan'])
            it.append(element)
        m.append(it)

    def solveColspan(ele):
        # Insert a copy to the right with one column fewer left to span.
        row, col, text, rowspan, colspan = ele.row, ele.col, ele.text, ele.rowspan, ele.colspan
        m[row].insert(col + 1, Element(row, col, text, rowspan, colspan - 1))
        for column in range(col + 1, len(m[row])):
            m[row][column].col += 1

    def solveRowspan(ele):
        # Insert a copy into the next row with one row fewer left to span.
        row, col, text, rowspan, colspan = ele.row, ele.col, ele.text, ele.rowspan, ele.colspan
        offset = row + 1
        if offset >= len(m):  # rowspan runs past the last row; nothing to fill
            return
        m[offset].insert(col, Element(offset, col, text, rowspan - 1, 1))
        for column in range(col + 1, len(m[offset])):
            m[offset][column].col += 1

    # The row lists grow while being iterated, so the inserted copies are visited too.
    for row in m:
        for ele in row:
            if ele.isColspan():
                solveColspan(ele)
            if ele.isRowspan():
                solveRowspan(ele)
    return m


def prettyPrint(m):
    for i in m:
        it = [f'{len(i)}']
        for index, j in enumerate(i):
            if j.text != '':
                it.append(f'{index:2} {j.text[:4]:4}')
        print(' --- '.join(it))


with open('./index.html', 'rb') as f:
    index = f.read()
    html = index.decode('utf-8')
    matrix = parse(html)
    prettyPrint(matrix)
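Another way to get the same result is to place every cell on an explicit grid and skip positions already taken by an earlier span. The following is a minimal sketch of that idea rather than a rewrite of the code above. It assumes the cells have already been extracted as (text, rowspan, colspan) tuples, and expand_spans is a hypothetical helper name.

```python
def expand_spans(rows):
    """Expand (text, rowspan, colspan) cells into a rectangular grid of text."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip columns already claimed by a span from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = len(rows)  # spans running past the last row are cut off
    n_cols = max(col for _, col in grid) + 1
    # Positions that no cell covers stay None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# The two-row input shown above, written as tuples.
table = [
    [("one", 2, 1), ("two", 1, 1), ("three", 1, 1)],
    [("February", 1, 2)],
]
for row in expand_spans(table):
    print(row)
```

Because each cell looks up its column on the grid instead of relying on its index inside the <tr>, overlapping rowspans and colspans do not shift later cells.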
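None of the parsers I found on Stack Overflow or elsewhere on the web worked for me; they all parsed my Wikipedia table incorrectly. So here you go: a parser that actually works and is simple. Cheers.
Define the parser functions: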
import numpy as np
import pandas as pd


def pre_process_table(table):
    """
    INPUT:
        1. table - a bs4 element that contains the desired table: ie <table> ... </table>
    OUTPUT:
        a tuple of:
            1. rows - a list of table rows ie: list of <tr>...</tr> elements
            2. num_rows - number of rows in the table
            3. num_cols - number of columns in the table
    """
    rows = table.find_all('tr')
    num_rows = len(rows)
    # get an initial column count. Most often, this will be accurate
    num_cols = max(len(x.find_all(['th', 'td'])) for x in rows)
    # sometimes, the tables also contain multi-colspan headers. This accounts for that:
    header_rows_set = [x.find_all(['th', 'td']) for x in rows if len(x.find_all(['th', 'td'])) > num_cols / 2]
    num_cols_set = []
    for header_rows in header_rows_set:
        num_cols = 0
        for cell in header_rows:
            row_span, col_span = get_spans(cell)
            num_cols += col_span
        num_cols_set.append(num_cols)
    num_cols = max(num_cols_set)
    return (rows, num_rows, num_cols)


def get_spans(cell):
    """
    INPUT:
        1. cell - a <td>...</td> or <th>...</th> element that contains a table cell entry
    OUTPUT:
        1. a tuple with the cell's row and col spans
    """
    if cell.has_attr('rowspan'):
        rep_row = int(cell.attrs['rowspan'])
    else:  # no rowspan attribute
        rep_row = 1
    if cell.has_attr('colspan'):
        rep_col = int(cell.attrs['colspan'])
    else:  # no colspan attribute
        rep_col = 1
    return (rep_row, rep_col)


def process_rows(rows, num_rows, num_cols):
    """
    INPUT:
        1. rows - a list of table rows ie <tr>...</tr> elements
    OUTPUT:
        1. data - a Pandas dataframe with the html data in it
    """
    # object dtype so that text can be assigned into the initially empty (NaN) grid
    data = pd.DataFrame(np.full((num_rows, num_cols), np.nan, dtype=object))
    for i, row in enumerate(rows):
        try:
            # first column of this row not already filled by a rowspan from above
            col_stat = data.iloc[i, :][data.iloc[i, :].isnull()].index[0]
        except IndexError:
            # the whole row is already filled; report it and move on
            print(i, row)
            continue
        for j, cell in enumerate(row.find_all(['td', 'th'])):
            rep_row, rep_col = get_spans(cell)
            # skip columns already filled by a span from an earlier row
            while any(data.iloc[i, col_stat:col_stat + rep_col].notnull()):
                col_stat += 1
            data.iloc[i:i + rep_row, col_stat:col_stat + rep_col] = cell.getText()
            if col_stat < data.shape[1] - 1:
                col_stat += rep_col
    return data


def main(table):
    rows, num_rows, num_cols = pre_process_table(table)
    df = process_rows(rows, num_rows, num_cols)
    return df
Here is an example of how to use the code above on this Wisconsin data. Suppose the page is already in a bs4 soup; then...
## Find tables on the page and locate the desired one:
tables = soup.findAll("table", class_='wikitable')
## I want table 3 or the one that contains years 2000-2018
table = tables[3]
## run the above functions to extract the data
rows, num_rows, num_cols = pre_process_table(table)
df = process_rows(rows, num_rows, num_cols)
My parser above accurately parses tables such as the one here, whereas all the other parsers fail to recreate the table at numerous points.
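The span lookup in get_spans can be written more compactly with Tag.get and a default value. The snippet below is a small sketch of that variant, checked against a hand-written three-row fragment. The names cell_spans and column_count and the sample HTML are illustrative and not part of the parser above.

```python
from bs4 import BeautifulSoup


def cell_spans(cell):
    """Return (rowspan, colspan) for a <td>/<th> tag, defaulting each to 1."""
    return int(cell.get("rowspan", 1)), int(cell.get("colspan", 1))


def column_count(table):
    """Width of the widest row, counting each cell colspan times."""
    return max(
        sum(cell_spans(cell)[1] for cell in tr.find_all(["th", "td"]))
        for tr in table.find_all("tr")
    )


# Hypothetical fragment: one colspan header and one rowspan cell.
html = """
<table>
  <tr><th colspan="2">Venue</th><th>Season</th></tr>
  <tr><td rowspan="2">Lord's</td><td>London</td><td>1951</td></tr>
  <tr><td>London</td><td>2009</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
spans = [cell_spans(cell) for cell in table.find_all(["th", "td"])]
print(spans[:3])
print(column_count(table))
```

The per-row sum ignores cells that span in from rows above, so the last row here counts as two columns wide; taking the maximum over all rows is what recovers the real width of three.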
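For simple cases - a simpler solution
If the table is well-formed and uses rowspan attributes, there may be a simpler solution to the problem above. Pandas has a fairly robust read_html function that can parse the provided HTML tables and seems to handle rowspan fairly well (it could not parse the Wisconsin data). fillna(method='ffill') (spelled df.ffill() in recent pandas releases) can then fill in the unpopulated rows. Note that this does not necessarily work for colspan. Also note that some cleanup will be needed afterwards.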
Consider the HTML code:
s = """<table width="100%" border="1">
<tr>
<td rowspan="1">one</td>
<td rowspan="2">two</td>
<td rowspan="3">three</td>
</tr>
<tr><td>"4"</td></tr>
<tr>
<td>"55"</td>
<td>"99"</td>
</tr>
</table>
"""
To process it into the requested output, simply do:
In [16]: df = pd.read_html(s)[0]
In [29]: df
Out[29]:
0 1 2
0 one two three
1 "4" NaN NaN
2 "55" "99" NaN
Then fill the NAs:
In [30]: df.fillna(method='ffill')
Out[30]:
0 1 2
0 one two three
1 "4" two three
2 "55" "99" three
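In recent pandas releases the same forward fill is available as DataFrame.ffill(). Here is a minimal sketch, assuming a frame shaped like the pre-fill output above; it is built by hand rather than parsed from HTML, so the result does not depend on how a given pandas version treats rowspan.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the parsed table: NaN marks cells left empty by rowspan.
df = pd.DataFrame(
    [
        ["one", "two", "three"],
        ['"4"', np.nan, np.nan],
        ['"55"', '"99"', np.nan],
    ]
)

# Forward fill copies the last seen value down each column.
filled = df.ffill()
print(filled)
```

Forward fill also overwrites cells that are genuinely empty in the source table, so it is only safe when every NaN comes from a rowspan.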
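pandas >= 0.24.0 understands the colspan and rowspan attributes, as documented in the release notes. To extract the wiki page table that was giving you problems earlier, do the following.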
import pandas as pd
# Extract all tables from the wikipage
dfs = pd.read_html("http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records")
# The table referenced above is the 7th on the wikipage
df = dfs[6]
# The last row is just the date of the last update
df = df.iloc[:-1]
Output:
Rank Victories Opposition Most recent venue Date
0 1 6 South Africa Lord's, London, England 21 June 1951
1 =2 4 India Wankhede Stadium, Mumbai, India 23 November 2012
2 =2 4 West Indies Lord's, London, England 6 May 2009
3 4 3 Australia Sydney Cricket Ground, Sydney, Australia 2 December 1932
4 5 2 Pakistan Trent Bridge, Nottingham, England 10 August 1967
5 6 1 Sri Lanka Old Trafford Cricket Ground, Manchester, England 13 June 2002
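To check the rowspan handling without hitting the network, read_html can be pointed at an in-memory fragment. This is a small sketch assuming pandas >= 0.24 with an HTML parser such as lxml installed. The fragment is a cut-down, hand-written version of the rows from the question, and wrapping it in StringIO is what newer pandas releases ask for when passing literal HTML.

```python
from io import StringIO

import pandas as pd

# Hand-written fragment: the "2=" rank cell spans two rows.
html = """
<table>
  <tr><th>Rank</th><th>Opponent</th><th>No. wins</th></tr>
  <tr><td rowspan="2">2=</td><td>West Indies</td><td>4</td></tr>
  <tr><td>India</td><td>4</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found.
df = pd.read_html(StringIO(html))[0]
print(df)
```

With rowspan support, the Rank value is repeated in the India row instead of shifting that row's cells one column to the left.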