How can I understand the semantic meaning of the different values?

I want to get Apple's financial data. From https://www.sec.gov/dera/data/financial-statement-and-notes-data-set.html I downloaded https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2022_01_notes.zip and extracted it into /tmp/2022_01_notes. That gives you the tables sub and num; the field definitions are in https://www.sec.gov/files/aqfsn_1.pdf.

I computed the MD5 message digest of the zip file:

md5sum  2022_01_notes.zip
b1cdf638200991e1bbe260489093bf67  2022_01_notes.zip

You can download it either from the official site or from my Dropbox:

https://www.dropbox.com/s/5ntwasipze8vr29/2022_01_notes.zip?dl=0

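Wherever you download it from, please check the md5sum value. The file the SEC uploaded may contain errors, and they may replace the zip file later.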
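If md5sum is not available, the same check can be sketched in Python with the standard hashlib module. The helper name file_md5 and the chunk size are my own choices, and the expected digest is simply the one quoted above.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # iter() with a sentinel stops once read() returns b'' at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED_MD5 = 'b1cdf638200991e1bbe260489093bf67'  # digest quoted in this post

# Usage (the path is an assumption; adjust to wherever the zip was saved):
# assert file_md5('2022_01_notes.zip') == EXPECTED_MD5
```

Reading in chunks keeps memory use flat even though the archive is large.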

import pandas as pd
# sub.tsv holds one row per submission (filing)
df_sub = pd.read_csv('/tmp/2022_01_notes/sub.tsv', sep='\t')
# keep only Apple's filings; Apple's CIK is 320193
df_sub = df_sub[df_sub['cik'] == 320193]
df_sub
                      adsh     cik       name     sic countryba stprba     cityba  ...               instance nciks aciks pubfloatusd floatdate floataxis floatmems
4329  0000320193-22-000006  320193  APPLE INC  3571.0        US     CA  CUPERTINO  ...  aapl-20220127_htm.xml     1   NaN         NaN       NaN       NaN       NaN
4731  0000320193-22-000007  320193  APPLE INC  3571.0        US     CA  CUPERTINO  ...  aapl-20211225_htm.xml     1   NaN         NaN       NaN       NaN       NaN

0000320193-22-000007 is the accession number of the filing for Apple's fiscal 2022 Q1 (the quarter ended December 25, 2021).
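For reference, an accession number can be turned into the address of the filing's index page on EDGAR. The sketch below assumes the usual EDGAR archive layout (the CIK without leading zeros, then the accession number without dashes); the function name filing_index_url is hypothetical.

```python
def filing_index_url(cik, adsh):
    """Build the EDGAR index-page URL for a filing (assumed archive layout)."""
    # the directory name is the accession number with the dashes removed
    folder = adsh.replace('-', '')
    return f'https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{adsh}-index.htm'

print(filing_index_url(320193, '0000320193-22-000007'))
```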

df_num = pd.read_csv('/tmp/2022_01_notes/num.tsv', sep='\t')
# get all of Apple's numeric facts (XBRL concepts) for this filing
df_apple = df_num[df_num['adsh'] == '0000320193-22-000007']
# extract a single concept: RevenueFromContractWithCustomerExcludingAssessedTax,
# the XBRL taxonomy concept that revenue is mapped to
df_apple_revenue = df_apple[df_apple['tag'] == 'RevenueFromContractWithCustomerExcludingAssessedTax']
# keep only the facts whose ddate is 20201231
df_apple_revenue_2021 = df_apple_revenue[df_apple_revenue['ddate'] == 20201231]
df_apple_revenue_2021

The DataFrame is too wide to display in my terminal, so I wrote it to a CSV file:

df_apple_revenue_2021.to_csv('/tmp/apple_revenue_2021.csv')    

Then I opened it in Excel and pasted the contents here.
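As an alternative to the round trip through Excel, pandas can render every column without truncation. A minimal sketch, assuming the frame is small enough to read as plain text; full_text is a hypothetical helper name.

```python
import pandas as pd

def full_text(df):
    """Render a DataFrame as text without eliding any rows or columns."""
    # to_string() ignores the terminal-width limits that the default repr applies
    return df.to_string()

# Usage: print(full_text(df_apple_revenue_2021))
```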

In the first two rows, what do 8285000000 and 15761000000 mean? Please give a reasonable description of 8285000000 and 15761000000.

0000320193-22-000007    RevenueFromContractWithCustomerExcludingAssessedTax us-gaap/2021    20201231    1   USD 0xf159835fd3644f228d15724ad9d1837c  0   8285000000      0   1       0.013698995 5   -6
0000320193-22-000007    RevenueFromContractWithCustomerExcludingAssessedTax us-gaap/2021    20201231    1   USD 0x58c22680ab8dbbfb662ff4e14055c1bd  1   15761000000     0   1       0.013698995 5   -6

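To interpret these numbers, you have to trace them back to the filing they were extracted from. In this case, the filing with accession number 0000320193-22-000007 is the Form 10-Q for the Fiscal Quarter Ended December 25, 2021. If you look at that filing, you will find seven of the value numbers from the DataFrame in the table Net sales by reportable segment, specifically in the column Three Months Ended December 26, 2020.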

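So, for example, 8285000000 refers to the Japan segment for that period, while 15761000000 refers to the Services category for the same reporting period in the table Net sales by category. That table accounts for six more of the value entries in the DataFrame.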
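The lookup in the filing can probably be cross-checked from the data set itself. The sketch below assumes the archive also contains a dim.tsv table whose dimhash column matches the dimh column of num.tsv and whose segments column names the axis and member of each fact. Check these names against the field definitions in aqfsn_1.pdf before relying on them. The function name attach_segments is hypothetical.

```python
import pandas as pd

def attach_segments(df_num_rows, df_dim):
    """Left-join numeric facts to their dimension description.

    Assumes df_num_rows has a 'dimh' column and df_dim has
    'dimhash' and 'segments' columns (assumed names, see above).
    """
    merged = df_num_rows.merge(
        df_dim[['dimhash', 'segments']],
        how='left',          # keep every fact, even if no dimension row matches
        left_on='dimh',
        right_on='dimhash',
    )
    return merged.drop(columns='dimhash')

# Usage (dim.tsv and its location are assumptions):
# df_dim = pd.read_csv('/tmp/2022_01_notes/dim.tsv', sep='\t')
# labelled = attach_segments(df_apple_revenue_2021, df_dim)
# print(labelled[['value', 'segments']].to_string())
```

If the assumption holds, the row with value 8285000000 should carry a segment description naming Japan and the row with 15761000000 one naming Services, matching the two tables in the 10-Q.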