Count the number of counties per state using Python {census}
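I am having trouble counting the number of counties using the well-known census.csv data.
Task: count the number of counties in each state.
I think the problem is in the comparison; please see below.
I tried this:
df = pd.read_csv('census.csv')
dfd = df['STNAME'].unique()  # gives the names of the states
serr = pd.Series(dfd)  # converting to a Series (from an array)
After this, I tried two approaches: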
1:
df[df['STNAME'] == serr]  # ERROR: series length must match
2:
i = 0
for name in serr:  # this generates the error 'Alabama'
    df['STNAME'] == name
    for i in serr:
        serr[i] == serr[name]
        print(serr[name].count)
        i += 1
Please guide me; I have been stuck on this for three days.
Use groupby and aggregate COUNTY using nunique:
In [1]: import pandas as pd
In [2]: df = pd.read_csv('census.csv')
In [3]: unique_counties = df.groupby('STNAME')['COUNTY'].nunique()
This results in:
In [4]: unique_counties
Out[4]:
STNAME
Alabama 68
Alaska 30
Arizona 16
Arkansas 76
California 59
Colorado 65
Connecticut 9
Delaware 4
District of Columbia 2
Florida 68
Georgia 160
Hawaii 6
Idaho 45
Illinois 103
Indiana 93
Iowa 100
Kansas 106
Kentucky 121
Louisiana 65
Maine 17
Maryland 25
Massachusetts 15
Michigan 84
Minnesota 88
Mississippi 83
Missouri 116
Montana 57
Nebraska 94
Nevada 18
New Hampshire 11
New Jersey 22
New Mexico 34
New York 63
North Carolina 101
North Dakota 54
Ohio 89
Oklahoma 78
Oregon 37
Pennsylvania 68
Rhode Island 6
South Carolina 47
South Dakota 67
Tennessee 96
Texas 255
Utah 30
Vermont 15
Virginia 134
Washington 40
West Virginia 56
Wisconsin 73
Wyoming 24
Name: COUNTY, dtype: int64
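To connect this back to the two attempts in the question, here is a minimal sketch on a made-up three-row frame (the `toy` data and the variable names are assumptions for illustration, not part of census.csv). It shows why comparing the full column against the shorter Series of unique names fails, and that a per-state comparison gives the same counts that groupby produces without a loop.

```python
import pandas as pd

# Hypothetical miniature of the census data: county rows only.
toy = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alaska"],
    "COUNTY": [1, 3, 13],
})

states = pd.Series(toy["STNAME"].unique())

# Attempt 1: a 3-row column compared element-wise with a 2-row Series.
# pandas cannot align the two, so it raises ValueError.
try:
    toy[toy["STNAME"] == states]
    comparison_failed = False
except ValueError:
    comparison_failed = True

# A loop that does work: compare the column against one name at a time
# and count the rows where the comparison is True.
loop_counts = {name: int((toy["STNAME"] == name).sum()) for name in states}

# The groupby version does the same split-and-count in one call.
group_counts = toy.groupby("STNAME")["COUNTY"].nunique().to_dict()

print(comparison_failed)
print(loop_counts)
print(group_counts)
```

The loop compares the column with a single string, which broadcasts cleanly; the failing version compares two Series of different lengths.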
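juanpa.arrivillaga has a good solution. However, the code needs a small modification.
The state-level "counties", i.e. the rows with 'SUMLEV' == 40 or 'COUNTY' == 0, should be filtered out. Otherwise, every state's county count is too large by one.
So the correct answer should be: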
unique_counties = census_df[census_df['SUMLEV'] == 50].groupby('STNAME')['COUNTY'].nunique()
The result is:
STNAME
Alabama 67
Alaska 29
Arizona 15
Arkansas 75
California 58
Colorado 64
Connecticut 8
Delaware 3
District of Columbia 1
Florida 67
Georgia 159
Hawaii 5
Idaho 44
Illinois 102
Indiana 92
Iowa 99
Kansas 105
Kentucky 120
Louisiana 64
Maine 16
Maryland 24
Massachusetts 14
Michigan 83
Minnesota 87
Mississippi 82
Missouri 115
Montana 56
Nebraska 93
Nevada 17
New Hampshire 10
New Jersey 21
New Mexico 33
New York 62
North Carolina 100
North Dakota 53
Ohio 88
Oklahoma 77
Oregon 36
Pennsylvania 67
Rhode Island 5
South Carolina 46
South Dakota 66
Tennessee 95
Texas 254
Utah 29
Vermont 14
Virginia 133
Washington 39
West Virginia 55
Wisconsin 72
Wyoming 23
Name: COUNTY, dtype: int64
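The off-by-one effect can be reproduced on a small made-up frame. This is only a sketch: the `toy` rows are invented, and it assumes, as described above, that each state has one summary row with SUMLEV 40 and COUNTY 0 in addition to its county rows with SUMLEV 50.

```python
import pandas as pd

# Hypothetical rows: one state-level summary row per state (SUMLEV 40,
# COUNTY 0) followed by that state's county rows (SUMLEV 50).
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [0, 1, 3, 0, 13],
})

# Without the filter, COUNTY == 0 is counted as one more distinct county.
unfiltered = toy.groupby("STNAME")["COUNTY"].nunique()

# Keeping only county-level rows removes the summary row first.
county_rows = toy[toy["SUMLEV"] == 50]
filtered = county_rows.groupby("STNAME")["COUNTY"].nunique()

print(unfiltered.to_dict())  # {'Alabama': 3, 'Alaska': 2}
print(filtered.to_dict())    # {'Alabama': 2, 'Alaska': 1}
```

Every state differs by exactly one between the two results, which matches the difference between the two output listings above.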
@Bakhtawar - here is a very simple way:
df.groupby(df['STNAME']).count().COUNTY
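One thing to keep in mind with this version: count() counts non-null rows per group, so any state-level summary row is included unless it is filtered out first. A small sketch on invented data (the `toy` frame is an assumption, not the real file) shows the difference, and also selects the column before counting so that only COUNTY is aggregated.

```python
import pandas as pd

# Hypothetical rows, including one state-level summary row per state.
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [0, 1, 3, 0, 13],
})

# The one-liner above: counts every non-null row, summary rows included.
row_counts = toy.groupby("STNAME").count().COUNTY

# Filter to county-level rows and pick the column before counting.
county_counts = toy[toy["SUMLEV"] == 50].groupby("STNAME")["COUNTY"].count()

print(row_counts.to_dict())     # {'Alabama': 3, 'Alaska': 2}
print(county_counts.to_dict())  # {'Alabama': 2, 'Alaska': 1}
```

After filtering, count() and nunique() agree as long as each county code appears once per state.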