Turn list of company names into tickers
I have a list of company names that I want to convert to ticker symbols. Here is reproducible code that creates the list of names I have:
companynames=structure(list(V1 = structure(1:41, .Label = c("AETNA INC", "ANTHEM INC",
"APPLE INC", "ASPEN INSURANCE HOLDINGS LTD", "BARRICK GOLD CORP",
"BEST BUY CO INC", "CAREFUSION CORP", "CBS CORP-CLASS B NON VOTING",
"CIGNA CORP", "COMPUTER SCIENCES CORP", "COMPUWARE CORP", "COVENTRY HEALTH CARE INC",
"DELPHI AUTOMOTIVE PLC", "DST SYSTEMS INC", "EINSTEIN NOAH RESTAURANT GRO",
"ENSCO PLC-CL A", "EXPEDIA INC", "FIFTH STREET FINANCE CORP",
"GENERAL MOTORS CO", "GENWORTH FINANCIAL INC-CL A", "GREEN BRICK PARTNERS INC",
"HESS CORP", "HUMANA INC", "HUNTINGTON INGALLS INDUSTRIE", "LEGG MASON INC",
"MARKET VECTORS GOLD MINERS", "MARVELL TECHNOLOGY GROUP LTD",
"MICROSOFT CORP", "NCR CORPORATION", "NVR INC", "OAKTREE CAPITAL GROUP LLC",
"REPUBLIC AIRWAYS HOLDINGS IN", "SEAGATE TECHNOLOGY", "SPRINT COMMUNICATIONS INC",
"STARZ - A", "STATE BANK FINANCIAL CORP", "SYMMETRICOM INC",
"TESSERA TECHNOLOGIES INC", "UNITEDHEALTH GROUP INC", "VIRGIN MEDIA INC/OLD",
"XEROX CORP"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-41L))
This gives me something like:
head(companynames)
V1
1 AETNA INC
2 ANTHEM INC
3 APPLE INC
4 ASPEN INSURANCE HOLDINGS LTD
5 BARRICK GOLD CORP
6 BEST BUY CO INC
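I want another column that gives the ticker for each company. So for the first row I should get AET, for the second ATHN, for the third AAPL, and so on. My example is in R, but any solution in Python or R would be very helpful. I'm not sure whether a function for this already exists or, if it doesn't, what the best way to write one would be.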
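You can use @Joshua Ulrich's TTR package to get a mapping of company names to ticker symbols and perform a lookup against your companynames object. Ideally your list of names would be accurate and consistently formatted, but since it isn't, you will have to do some extra work to get some of the symbols. For example: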
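stock.symbols <- TTR::stockSymbols()
# quick adjustments: upper-case the names and drop periods and commas
stock.symbols$adj_name <- gsub("[.,]", "", toupper(stock.symbols$Name))
##
# for each input name, take the symbol (column 1) of the first row
# whose adjusted name contains it; NA when nothing matches
companynames$Symbol <- sapply(as.character(companynames[,1]), function(x) {
  stock.symbols[grep(x, stock.symbols$adj_name)[1], 1]
}, USE.NAMES = FALSE)
##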
R> na.omit(companynames)
# V1 Symbol
#1 AETNA INC AET
#2 ANTHEM INC ANTM
#3 APPLE INC AAPL
#5 BARRICK GOLD CORP ABX
#6 BEST BUY CO INC BBY
#9 CIGNA CORP CI
#10 COMPUTER SCIENCES CORP CSC
#13 DELPHI AUTOMOTIVE PLC DLPH
#14 DST SYSTEMS INC DST
#17 EXPEDIA INC EXPE
#18 FIFTH STREET FINANCE CORP FSC
#19 GENERAL MOTORS CO GM
#21 GREEN BRICK PARTNERS INC GRBK
#22 HESS CORP HES
#23 HUMANA INC HUM
#24 HUNTINGTON INGALLS INDUSTRIE HII
#25 LEGG MASON INC LM
#27 MARVELL TECHNOLOGY GROUP LTD MRVL
#28 MICROSOFT CORP MSFT
#29 NCR CORPORATION NCR
#30 NVR INC NVR
#31 OAKTREE CAPITAL GROUP LLC OAK
#32 REPUBLIC AIRWAYS HOLDINGS IN RJET
#33 SEAGATE TECHNOLOGY STX
#36 STATE BANK FINANCIAL CORP STBZ
#38 TESSERA TECHNOLOGIES INC TSRA
#39 UNITEDHEALTH GROUP INC UNH
#41 XEROX CORP XRX
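So with just a few basic transformations (upper-casing the Name column and removing periods and commas), you can match 28 of the 41 inputs. Most of the remaining unmatched cases can probably be resolved with simple substitutions in either your input names or the adj_name column of stock.symbols, e.g. CORP vs CORPORATION, and so on. As noted in the comments above, if any of your companies are not traded on the NASDAQ, AMEX, or NYSE exchanges, you will have to bring in additional external data.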
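As a rough illustration of the suffix substitutions mentioned above, here is a minimal base R sketch. The helper normalize_name and the three-row lookup table are hypothetical stand-ins (the real table would come from TTR::stockSymbols()), and the sketch uses exact matching on normalized names rather than the grep-based partial matching shown earlier.

```r
# Hypothetical helper: normalize a company name before matching.
normalize_name <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("[.,]", "", x)                                  # drop periods and commas
  x <- gsub("\\bCORPORATION\\b", "CORP", x, perl = TRUE)    # unify common suffixes
  x <- gsub("\\bINCORPORATED\\b", "INC", x, perl = TRUE)
  x <- gsub("\\bCOMPANY\\b", "CO", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)                    # collapse repeated whitespace
  trimws(x)
}

# Toy stand-in for the symbol table (assumption: illustrative rows only,
# not actual output of TTR::stockSymbols()).
lookup <- data.frame(
  Symbol = c("NCR", "MSFT", "XRX"),
  Name   = c("NCR Corporation", "Microsoft Corporation", "Xerox Corporation"),
  stringsAsFactors = FALSE
)
lookup$adj_name <- normalize_name(lookup$Name)

# Normalize both sides the same way, then look up by exact match.
inputs  <- c("NCR CORPORATION", "MICROSOFT CORP", "XEROX CORP", "STARZ - A")
matched <- lookup$Symbol[match(normalize_name(inputs), lookup$adj_name)]
result  <- data.frame(V1 = inputs, Symbol = matched, stringsAsFactors = FALSE)
print(result)
```

Applying the same normalization to both the input names and the reference names means a CORP vs CORPORATION difference no longer blocks a match. Names with no counterpart in the table, such as "STARZ - A" here, come back as NA and still need manual handling or another data source.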