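How can I organize data using Pandas?
I am new to Python. I am trying to organize a CSV file into a readable grid. When I converted my Excel file to CSV, the output came out garbled: a jumble of commas and scattered values. I tried using lists, but that still did not organize the data the way I want. I would like my code to organize the data in a Pandas grid by category (for example, Ethnic and Racial Roots).
Here is part of the file saved in CSV format (unfortunately it comes out garbled):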
Ethnic and Racial Roots Jobs Held Identity Reason for Latino Identity Latino ID With Whom Gets Together-Major Group With Whom Gets Together---Specific Group Transnational Behaviors Perceptions of Opportunity, Inequality, Discrimination
Subject Code Gen Place Age Male Country African European Indian Other Color Docs Reason Return 1st Occup 1st Oc Code 1st Wage Cur Occup Cur Oc Code Cur Wage Cur Hours/Day Father Occ Mother Occ Identity ID as Latino Ethnicity Culture Language Politics Values Emotions Everything Among Imms Mexican Cen Amer Caribbean South Amer Latinos-Gen Mex Gua Nic SS Hon CR PR DR Ecu Col Ven Bra Per Arg USYrs Contact R-Remits P-Remits Quantity Freq Sent How Sent Use 1 Use 2 US Bank OS Bank Type Com 1 Type Com 2 Presents Educ EngAbil EconOpps OthOpps Ineqaulity Discrim Context
F-001 1 1 28 1 2 0 1 1 0 1 2 3 4 serv sk park 8 7.5 serv sk park 8 14 10 99 99 1 1 1 0 0 0 0 0 0 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 2 1 1 8 1 1 1 1 2 1 1 1 0 9 13 1 1 0 1 1 3
F-002 1 2 35 1 15 1 1 1 0 3 9 6 4 sales work uns 7 7 music artist 10 7 99 9 9 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 3 2 5 3 2 1 1 1 0 1 13 2 2 0 1 9 9
F-003 1 1 30 0 10 0 1 1 0 1 2 1 1 restfood unsk 7 2.9 inspect arq skill 8 2.9 10 99 99 2 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 3 1 1 2 0 2 2 1 0 2 6 0 1 0 3 1 2
F-007 1 3 19 1 10 0 0 1 0 3 2 1 4 cleanserv unsk 7 8 restfood unsk 7 8 10 3 3 1 1 1 0 0 0 0 0 0 1 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 1 8 1 1 1 5 1 2 1 1 0 1 6 1 1 0 3 1 1
F-008 1 3 20 1 10 0 0 1 0 3 2 1 1 professional 10 8.75 restfood skill 8 8.75 10 3 3 1 1 0 0 0 0 1 0 0 1 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 4 1 1 8 1 1 1 4 5 2 1 1 0 2 11 1 1 0 1 1 8
F-010 1 2 21 0 5 0 1 1 0 1 1 5 1 serv sk cashier 8 6.75 serv skill libra 8 10 10 8 1 1 1 0 1 0 0 0 0 0 3 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 3 1 1 8 1 1 1 2 3 1 2 4 0 1 13 2 1 0 1 0 3
F-013 1 3 29 1 5 1 1 0 0 1 2 2 4 manufa unsk 4 4 manufa unsk 4 4 8 10 10 2 1 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 8 1 2 8 9 9 9 9 9 9 9 1 4 1 18 2 2 0 3 1 4
F-014 1 1 25 1 10 0 1 1 0 3 2 1 4 restfood unsk 7 3.5 restfood unsk 7 3.5 9 6 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 5 1 3 8 2 4 1 2 0 2 1 1 0 1 6 0 1 0 3 0 0
F-015 1 3 23 1 5 1 1 0 0 3 9 6 4 unknown 99 99 unknwon 99 99 99 99 99 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 9 9 9 9 9 9 9 9 9 9 99 9 9 9 9 9 9
F-016 1 3 30 0 5 1 1 1 0 2 3 3 2 clean serv unsk 7 7 clean serv unsk 7 7 10 5 1 1 1 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 4 1 1 8 2 1 1 4 2 1 2 3 0 1 9 1 1 0 1 1 3
F-017 1 3 21 0 10 0 1 1 0 3 2 1 1 domest garden 7 5 homekeeper 1 5 8 6 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 8 3 2 1 3 2 2 2 4 0 1 9 0 1 0 2 1 5
F-018 1 3 23 1 10 1 1 1 0 3 2 3 2 ambulant unsk 7 restfood unsk 7 99 9 1 1 1 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 3 2 1 2 0 2 2 3 0 1 12 2 9 9 2 1 4
F-019 1 3 34 1 4 0 1 1 0 1 1 2 4 domest garden 7 3 professional 10 3 99 10 9 1 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2 8 9 9 9 9 9 9 9 1 0 2 20 1 1 0 1 1 8
F-020 1 3 33 1 3 1 1 0 0 1 2 1 4 domestic serv 7 1.25 sales work unsk 7 1.25 12 5 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 1 1 1 4 0 1 1 1 4 1 14 1 1 0 1 1 4
F-021 1 3 33 0 5 1 0 1 1 4 3 2 2 clean serv unsk 7 9 clean serv unsk 7 9 10 3 1 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 1 3 8 3 4 1 2 3 2 2 1 1 1 14 1 1 0 2 1 3
F-022 1 3 33 1 3 1 1 1 0 1 2 2 1 sales work uns 7 99 clean serv unsk 7 99 8 99 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 1 1 1 1 5 1 1 2 3 2 12 1 1 0 1 1 8
F-024 1 3 26 1 15 1 1 1 0 3 2 2 4 restfood unsk 7 8.75 sales work unsk 7 8.75 99 5 7 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 3 1 1 2 0 2 1 1 0 2 13 1 1 0 9 0 0
F-025 1 2 31 1 6 0 1 1 0 1 3 5 2 serv rest skill 8 7.5 restfood unsk 7 7.5 12 9 1 2 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 13 0 3 8 3 4 3 2 0 2 2 1 0 2 12 2 1 0 1 1 1
F-026 1 3 31 0 6 0 1 1 0 3 3 5 4 serv hotel skill 8 8 manager proffes 10 8 8 5 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 1 3 8 3 5 5 2 0 1 2 1 0 1 13 2 1 0 3 1 3
F-027 1 3 20 1 14 0 1 1 0 1 1 3 4 adm asist NGO 10 3.75 superv rest skill\ 8 3.75 8 8 8 1 1 0 1 0 0 0 0 0 9 1 0 1 1 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 3 1 2 8 9 9 9 9 9 1 1 1 0 1 12 2 1 0 9 1 4
F-028 1 1 20 0 10 0 1 1 0 3 1 5 1 manufcloth unsk 7 2.5 adm asist NGO 10 2.5 8 7 1 2 1 1 0 0 0 0 0 0 4 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 8 3 1 1 2 0 2 2 1 0 1 12 0 1 0 3 1 4
F-032 1 3 22 1 6 0 1 1 0 1 2 2 1 restfood unsk 7 6.25 restfood unsk 7 6.25 12 9 1 2 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 2 1 1 1 2 2 2 1 0 9 9 1 1 0 1 0 0
F-033 1 1 20 1 10 0 1 1 0 1 2 3 1 restfood unsk 7 12 servworker skil; 8 12 10 6 1 2 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 2 1 2 8 1 1 1 2 0 2 2 1 0 2 12 1 1 0 1 1 3
F-034 1 3 30 0 4 1 1 1 0 1 3 2 3 manufa unsk 4 99 domestic serv 7 99 5 11 1 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2 1 2 8 9 9 9 9 9 1 2 2 0 1 16 2 1 0 2 1 4
F-035 1 3 22 1 10 0 1 1 0 1 2 5 1 cleanserv unsk 7 10 restfood unsk 7 10 10 9 9 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 1 2 7 4 0 2 2 1 0 1 7 1 1 0 1 1 6
F-036 1 3 26 0 3 0 1 1 0 2 2 1 1 salesfood unsk 7 6 domerstserv uns 7 6 99 99 99 1 1 0 0 0 0 1 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 5 1 1 8 3 1 1 1 2 1 1 9 9 9 12 1 9 9 9 9 9
F-037 1 3 25 1 10 0 0 1 0 3 2 5 1 restfood unsk 7 99 restfood unsk 7 99 4 3 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 8 2 1 2 1 2 2 2 1 0 2 7 1 1 0 1 0 0
F-038 1 1 19 0 5 1 1 1 0 5 1 5 2 salespharm uns 7 7.5 restfood unsk 7 7.5 5 6 8 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 1 1 8 3 4 1 3 2 2 2 1 0 1 13 1 1 0 3 1 8
F-039 1 3 21 1 13 0 1 1 1 3 2 5 4 manufac unskil 4 5.25 salespharm uns 7 5.25 99 9 1 1 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 4 1 3 8 3 4 7 1 0 1 2 1 0 1 12 2 1 0 2 1 3
F-040 1 3 20 0 5 1 1 0 0 4 1 5 1 manufac unskill 4 5.5 clean serv unsk 7 5.5 8 5 9 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 2 1 2 8 3 2 1 3 0 1 2 1 0 1 12 0 1 0 2 1 8
F-041 1 2 25 0 6 0 1 1 0 3 2 5 1 manufac unskill 4 3 restfood unsk 7 3 8 99 99 1 1 0 0 0 0 1 0 0 1
Here are the codes used for this data (I would like to put these into the Pandas grid as well):
Codes
Generation 1= First 2=Second
Location 1=New York 2=New Jersey 3=Pennsylvania
Age Age at Last Birthday
Gender 0=Female 1=Male
Country 1=Arg 2=Bol 3=Bra 4=Col 5=DR 6=Ecu 7= El Sal 8=Gua 9=Hon 10=Mex 11=Nic 12=Pan 13=Peru 14=PR 15=Ven
African Roots 0=No 1=Yes
European Roots 0=No 1=Yes
Indian Roots 0=No 1=Yes
Other Roots 0=No 1=Yes
Skin Color 1=Light 2=Medium Light 3=Medium 4=Medium Dark 5=Dark
Legal Status 1=Documents 2=No Documents 3=Questionable Documents 9=Missing
Reason for Migration 1=supply-side economics 2=demand-side economics 3=network links 4=violence at origin 5=family reasons 6=other
Return Plans 1=Yes 2=No 3=Don't Know 4=No Answer 9=Not Asked
Occupation 1=Unpaid 2=Student 3=Agriculture 4=Unskilled Operative 5=Skilled Operative 6=Transport Worker
7=Unskilled Services 8=Skilled Services 9=Small Business 10=Professional 11=Retired 99=Unknown
Wage Wage in U.S. Dollars; 88=Not applicable; 99=Unknown
Hours Worked Hours Worked; 88=Not Applicable; 99=Unknown
Identity 1=Latino 2=American 3=Both 9=Unknown
Latino Identity Among Immigrants 1=Yes 2=No 3=Yes-No 4=Don't Know 9=Missing
Reasons for Latino Identity 1=Yes 0=No 9=Unknown
With Whom Gets Together 1=Yes 0=No 9=Unknown
USYrs Number of Years in US; 88=Not Applicable; 99=Missing
In Contact with Home Community 1=Yes 0=No 9=Unknown
R Sends Money Home 1=Yes 2=No 3=Send Other 9=Unknown
Parent Sends Money Home (Second Generation Only) 1=Yes 2=No 8=Not Applicable 9=Unknown
Quantity Sent by Respondent or Parent 1=Half of Paycheck 2=20% of Paycheck 3=Varies Month to Month
How Money Sent 1=Moneygram 2=Paisano 3=Friend 4=Self 5=Bank 6=Moneygram and Paisano 7=Moneygram and Friend
Frequency Money Sent 1=Once a Month 2=Twice a Year 3=Once a Year 4=Once in a While 5=Holidays
How Money Used 0=No Use 1=Buy House 2=Family Expenses 3=Health 4=Education 5=Savings 6=Pay a Debt
Bank in US 1=Yes 2=No 9=Unknown
Bank Overseas 1=Yes 2=No 9=Unknown
Type of Communication 1=Land Phone 2=Cell Phone 3=Calling Card 4=Email 5=Regular Mail 6=No Communication 9=Unknown
Presents Sent 1=Yes 2=No 9=Unknown
Education In Years
EngAbil 0=None 1=Some English 2=Good English 9=Missing
EconOpps 1=More in US 2=More at Origin 3=Same at Both 9=Missing
OthOpps 0=Just Earnings 1=Personal 2=Work 3=Study 4=Political 9=Missing
Inequality 1=More at Origin 2=More in US 3=Same in Both 9=Missing
Discrim 1=Yes 0=No 9=Missing
Context 1=Work/School 2=On Street 3=Language 4=Race/Ethnicity 5=Medical 6=Violence 7=Poverty 8=Other 9=Missing
Here is my code so far:
import numpy as np
import csv
import pandas as pd
Lat_pro = open('Identity.Codes.Datafile.csv')
Lat_reader = list(pd.read_csv(Lat_pro))
print Lat_reader
Here is my output:
['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',
'Unnamed: 5', 'Ethnic and Racial Roots', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed:
9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', ' Jobs Held',
'Unnamed: 15', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', ' Identity', 'Unnamed: 24',
'Reason for Latino Identity ', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28',
'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31', 'Latino ID', 'With Whom Gets
Together-Major Group', 'Unnamed: 34', 'Unnamed: 35', 'Unnamed: 36', 'Unnamed:
37', ' With Whom Gets Together---Specific Group', 'Unnamed: 39', 'Unnamed: 40',
'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43', 'Unnamed: 44', 'Unnamed: 45',
'Unnamed: 46', 'Unnamed: 47', 'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50',
'Unnamed: 51', 'Unnamed: 52', 'Transnational Behaviors', 'Unnamed: 54',
'Unnamed: 55', 'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
'Unnamed: 60', 'Unnamed: 61', 'Unnamed: 62', 'Unnamed: 63', 'Unnamed: 64',
'Unnamed: 65', 'Unnamed: 66', 'Unnamed: 67', 'Perceptions of Opportunity,
Inequality, Discrimination', 'Unnamed: 69', 'Unnamed: 70', 'Unnamed: 71',
'Unnamed: 72']
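If the data is comma-separated, pandas.read_csv() will probably work better. You can use the delimiter (a.k.a. sep) option to specify the delimiter used in the data.
See the pandas.read_csv documentation for details.
For example: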
pandas.read_csv('file.csv', delimiter=',')
As Peter said, just make sure your data is properly delimited; you can then specify the delimiter there so that the file is read correctly.
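To illustrate why the delimiter matters, here is a minimal sketch. The semicolon-delimited sample text is made up for the illustration and is not the asker's file; it is read from memory with io.StringIO so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample standing in for a real file.
raw = "Subject;Gen;Age\nF-001;1;28\nF-002;1;35\n"

# Default delimiter (comma): each line is treated as a single field.
wrong = pd.read_csv(io.StringIO(raw))

# Matching delimiter: the three fields become three columns.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)
print(right.shape)
print(right)
```

If the first call yields a single wide column, that is usually a sign that the delimiter passed to read_csv does not match the one in the file.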
Also, the first header row will cause trouble when the data file is read. It is best to just delete it, but you can also skip it with the skiprows option.
pandas.read_csv('file.csv', delimiter=',', skiprows=1)
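The effect of the extra header row can be seen in a small sketch. The four-line sample below is a made-up miniature of the export (one row of category labels above the real column names), not the actual file, and it is read from an in-memory string rather than from disk.

```python
import io

import pandas as pd

# Made-up miniature of the export: the first row holds a category label,
# the second row holds the real column names.
raw = (
    ",,,Ethnic and Racial Roots,,\n"
    "Subject Code,Gen,Age,African,European,Indian\n"
    "F-001,1,28,0,1,1\n"
    "F-002,1,35,1,1,1\n"
)

# Without skiprows, the category row is taken as the header, so most
# columns end up with 'Unnamed: N' names.
no_skip = pd.read_csv(io.StringIO(raw))

# With skiprows=1, the second row supplies the column names.
skipped = pd.read_csv(io.StringIO(raw), skiprows=1)

print(list(no_skip.columns))
print(list(skipped.columns))
print(skipped)
```

This matches the pattern in the question's output, where only the category labels survived as column names and everything else was reported as 'Unnamed'.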
Update:
After a little cleanup of the data, it read in on the first try without using delimiter or skiprows.
Data:
Ethnic,and,Racial,Roots,Jobs,Held,Identity,Reason,for,Latino,Identity,Latino,ID,With,Whom,Gets,Together-Major,Group,With,Whom,Gets,Together---Specific,Group,Transnational,Behaviors,Perceptions,of,Opportunity,,Inequality,,Discrimination,
Subject,Code,Gen,Place,Age,Male,Country,African,European,Indian,Other,Color,Docs,Reason,Return,1st,Occup,1st,Oc,Code,1st,Wage,Cur,Occup,Cur,Oc,Code,Cur,Wage,Cur,Hours/Day,Father,Occ,Mother,Occ,Identity,ID,as,Latino,Ethnicity,Culture,Language,Politics,Values,Emotions,Everything,Among,Imms,Mexican,Cen,Amer,Caribbean,South,Amer,Latinos-Gen,Mex,Gua,Nic,SS,Hon,CR,PR,DR,Ecu,Col,Ven,Bra,Per,Arg,USYrs,Contact,R-Remits,P-Remits,Quantity,Freq,Sent,How,Sent,Use,1,Use,2,US,Bank,OS,Bank,Type,Com,1,Type,Com,2,Presents,Educ,EngAbil,EconOpps,OthOpps,Ineqaulity,Discrim,Context
F-001,1,1,28,1,2,0,1,1,0,1,2,3,4,serv,sk,park,8,7.5,serv,sk,park,8,14,10,99,99,1,1,1,0,0,0,0,0,0,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,2,1,1,8,1,1,1,1,2,1,1,1,0,9,13,1,1,0,1,1,3
F-002,1,2,35,1,15,1,1,1,0,3,9,6,4,sales,work,uns,7,7,music,artist,10,7,99,9,9,1,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,3,2,5,3,2,1,1,1,0,1,13,2,2,0,1,9,9
F-003,1,1,30,0,10,0,1,1,0,1,2,1,1,restfood,unsk,7,2.9,inspect,arq,skill,8,2.9,10,99,99,2,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,3,1,1,2,0,2,2,1,0,2,6,0,1,0,3,1,2
F-007,1,3,19,1,10,0,0,1,0,3,2,1,4,cleanserv,unsk,7,8,restfood,unsk,7,8,10,3,3,1,1,1,0,0,0,0,0,0,1,9,9,9,9,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,1,8,1,1,1,5,1,2,1,1,0,1,6,1,1,0,3,1,1
F-008,1,3,20,1,10,0,0,1,0,3,2,1,1,professional,10,8.75,restfood,skill,8,8.75,10,3,3,1,1,0,0,0,0,1,0,0,1,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,4,1,1,8,1,1,1,4,5,2,1,1,0,2,11,1,1,0,1,1,8
F-010,1,2,21,0,5,0,1,1,0,1,1,5,1,serv,sk,cashier,8,6.75,serv,skill,libra,8,10,10,8,1,1,1,0,1,0,0,0,0,0,3,0,0,1,1,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,3,1,1,8,1,1,1,2,3,1,2,4,0,1,13,2,1,0,1,0,3
F-013,1,3,29,1,5,1,1,0,0,1,2,2,4,manufa,unsk,4,4,manufa,unsk,4,4,8,10,10,2,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,8,1,2,8,9,9,9,9,9,9,9,1,4,1,18,2,2,0,3,1,4
F-014,1,1,25,1,10,0,1,1,0,3,2,1,4,restfood,unsk,7,3.5,restfood,unsk,7,3.5,9,6,1,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,3,8,2,4,1,2,0,2,1,1,0,1,6,0,1,0,3,0,0
F-015,1,3,23,1,5,1,1,0,0,3,9,6,4,unknown,99,99,unknwon,99,99,99,99,99,9,9,9,9,9,9,9,9,9,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,9,9,9,9,9,9,9,9,9,9,9,99,9,9,9,9,9,9
F-016,1,3,30,0,5,1,1,1,0,2,3,3,2,clean,serv,unsk,7,7,clean,serv,unsk,7,7,10,5,1,1,1,0,0,0,0,1,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,4,1,1,8,2,1,1,4,2,1,2,3,0,1,9,1,1,0,1,1,3
F-017,1,3,21,0,10,0,1,1,0,3,2,1,1,domest,garden,7,5,homekeeper,1,5,8,6,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2,8,3,2,1,3,2,2,2,4,0,1,9,0,1,0,2,1,5
F-018,1,3,23,1,10,1,1,1,0,3,2,3,2,ambulant,unsk,7,restfood,unsk,7,99,9,1,1,1,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,3,2,1,2,0,2,2,3,0,1,12,2,9,9,2,1,4
F-019,1,3,34,1,4,0,1,1,0,1,1,2,4,domest,garden,7,3,professional,10,3,99,10,9,1,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,1,2,8,9,9,9,9,9,9,9,1,0,2,20,1,1,0,1,1,8
F-020,1,3,33,1,3,1,1,0,0,1,2,1,4,domestic,serv,7,1.25,sales,work,unsk,7,1.25,12,5,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,1,1,1,4,0,1,1,1,4,1,14,1,1,0,1,1,4
F-021,1,3,33,0,5,1,0,1,1,4,3,2,2,clean,serv,unsk,7,9,clean,serv,unsk,7,9,10,3,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,1,3,8,3,4,1,2,3,2,2,1,1,1,14,1,1,0,2,1,3
F-022,1,3,33,1,3,1,1,1,0,1,2,2,1,sales,work,uns,7,99,clean,serv,unsk,7,99,8,99,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,1,1,1,1,5,1,1,2,3,2,12,1,1,0,1,1,8
F-024,1,3,26,1,15,1,1,1,0,3,2,2,4,restfood,unsk,7,8.75,sales,work,unsk,7,8.75,99,5,7,1,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,3,1,1,2,0,2,1,1,0,2,13,1,1,0,9,0,0
F-025,1,2,31,1,6,0,1,1,0,1,3,5,2,serv,rest,skill,8,7.5,restfood,unsk,7,7.5,12,9,1,2,1,1,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,13,0,3,8,3,4,3,2,0,2,2,1,0,2,12,2,1,0,1,1,1
F-026,1,3,31,0,6,0,1,1,0,3,3,5,4,serv,hotel,skill,8,8,manager,proffes,10,8,8,5,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,1,3,8,3,5,5,2,0,1,2,1,0,1,13,2,1,0,3,1,3
F-027,1,3,20,1,14,0,1,1,0,1,1,3,4,adm,asist,NGO,10,3.75,superv,rest,skill\,8,3.75,8,8,8,1,1,0,1,0,0,0,0,0,9,1,0,1,1,0,1,0,0,0,0,0,1,1,0,0,1,0,0,0,3,1,2,8,9,9,9,9,9,1,1,1,0,1,12,2,1,0,9,1,4
F-028,1,1,20,0,10,0,1,1,0,3,1,5,1,manufcloth,unsk,7,2.5,adm,asist,NGO,10,2.5,8,7,1,2,1,1,0,0,0,0,0,0,4,1,1,1,0,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,1,1,1,8,3,1,1,2,0,2,2,1,0,1,12,0,1,0,3,1,4
F-032,1,3,22,1,6,0,1,1,0,1,2,2,1,restfood,unsk,7,6.25,restfood,unsk,7,6.25,12,9,1,2,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,2,1,1,1,2,2,2,1,0,9,9,1,1,0,1,0,0
F-033,1,1,20,1,10,0,1,1,0,1,2,3,1,restfood,unsk,7,12,servworker,skil;,8,12,10,6,1,2,1,0,0,1,0,0,0,0,1,1,1,0,1,0,1,0,0,1,1,0,0,0,1,1,0,0,0,0,2,1,2,8,1,1,1,2,0,2,2,1,0,2,12,1,1,0,1,1,3
F-034,1,3,30,0,4,1,1,1,0,1,3,2,3,manufa,unsk,4,99,domestic,serv,7,99,5,11,1,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,1,2,8,9,9,9,9,9,1,2,2,0,1,16,2,1,0,2,1,4
F-035,1,3,22,1,10,0,1,1,0,1,2,5,1,cleanserv,unsk,7,10,restfood,unsk,7,10,10,9,9,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,1,2,7,4,0,2,2,1,0,1,7,1,1,0,1,1,6
F-036,1,3,26,0,3,0,1,1,0,2,2,1,1,salesfood,unsk,7,6,domerstserv,uns,7,6,99,99,99,1,1,0,0,0,0,1,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,5,1,1,8,3,1,1,1,2,1,1,9,9,9,12,1,9,9,9,9,9
F-037,1,3,25,1,10,0,0,1,0,3,2,5,1,restfood,unsk,7,99,restfood,unsk,7,99,4,3,1,1,1,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,8,2,1,2,1,2,2,2,1,0,2,7,1,1,0,1,0,0
F-038,1,1,19,0,5,1,1,1,0,5,1,5,2,salespharm,uns,7,7.5,restfood,unsk,7,7.5,5,6,8,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,1,1,8,3,4,1,3,2,2,2,1,0,1,13,1,1,0,3,1,8
F-039,1,3,21,1,13,0,1,1,1,3,2,5,4,manufac,unskil,4,5.25,salespharm,uns,7,5.25,99,9,1,1,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,4,1,3,8,3,4,7,1,0,1,2,1,0,1,12,2,1,0,2,1,3
F-040,1,3,20,0,5,1,1,0,0,4,1,5,1,manufac,unskill,4,5.5,clean,serv,unsk,7,5.5,8,5,9,1,1,0,0,0,0,1,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,0,0,0,2,1,2,8,3,2,1,3,0,1,2,1,0,1,12,0,1,0,2,1,8
F-041,1,2,25,0,6,0,1,1,0,3,2,5,1,manufac,unskill,4,3,restfood,unsk,7,3,8,99,99,1,1,0,0,0,0,1,0,0,1
For the codes, I would rather use a dictionary here.
For example:
codes = {'Generation': {1: 'First', 2: 'Second'},
         'Location': {1: 'New York', 2: 'New Jersey', 3: 'Pennsylvania'}
        }
You can then look up values like this:
codes['Generation'][1]  # yields 'First'
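One possible way to apply such a dictionary to a whole column is Series.map. The sketch below assumes the data columns are named Gen and Place, as in the sample header, and that they correspond to the Generation and Location codes; the small DataFrame and the column_to_code lookup are hypothetical and exist only for the illustration.

```python
import pandas as pd

# Code tables taken from the codebook in the question.
codes = {
    "Generation": {1: "First", 2: "Second"},
    "Location": {1: "New York", 2: "New Jersey", 3: "Pennsylvania"},
}

# Hypothetical frame; in practice this would come from pd.read_csv.
df = pd.DataFrame({"Gen": [1, 1, 2], "Place": [1, 3, 2]})

# Assumed link between data columns and codebook entries.
column_to_code = {"Gen": "Generation", "Place": "Location"}

# Replace each numeric code with its label, leaving the original frame intact.
labeled = df.copy()
for column, code_name in column_to_code.items():
    labeled[column] = labeled[column].map(codes[code_name])

print(labeled)
```

Note that map turns any value missing from the dictionary into NaN, so a code table should list every value that can occur in the column, including codes such as 9 or 99 for missing data.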