Convert SQL commands to Python
I have the following code in SQL:
CREATE OR REPLACE TABLE
`table2` AS
SELECT nmc_lvl, count(*) AS total_nb
FROM (
SELECT DISTINCT nmc_ray as nmc_lvl, cli_id, date
FROM `table1`
)
GROUP BY nmc_lvl
I have been trying to rewrite it in Python as follows:
table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()
But I keep getting a generic error message. What am I doing wrong?
Edit: added the error message
KeyError Traceback (most recent call last)
/tmp/ipykernel_7989/1297335147.py in <module>
----> 1 table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
2 table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()
/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: ('cli_id', 'date', 'nmc_ray')
IIUC, you can try the following:
table2 = (table1.drop_duplicates(subset=['nmc_ray', 'cli_id', 'date'])[['nmc_ray','cli_id','date']]
.rename(columns={'nmc_ray':'nmc_lvl'})
.value_counts('nmc_lvl').reset_index(name='total_nb'))
SELECT DISTINCT col is equivalent to drop_duplicates(col), and SELECT col, count(*) ... GROUP BY col is equivalent to value_counts(col).
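As for the KeyError itself: table1['cli_id', 'date', 'nmc_ray'] passes a single tuple as the key, so pandas looks for one column with that tuple as its name. Selecting several columns needs a list, i.e. double brackets. A DataFrame also has no unique() method; drop_duplicates() plays that role. Below is a minimal sketch of the whole query on a small made-up table1 (the sample values are an assumption, purely for illustration), using groupby(...).size() as an alternative to value_counts:

```python
import pandas as pd

# Hypothetical sample data standing in for table1 (values are made up).
table1 = pd.DataFrame({
    "nmc_ray": ["A", "A", "A", "B", "B"],
    "cli_id": [1, 1, 2, 3, 3],
    "date": ["2021-01-01", "2021-01-01", "2021-01-02",
             "2021-01-01", "2021-01-01"],
})

# Inner query: SELECT DISTINCT nmc_ray, cli_id, date.
# A list of names (double brackets) selects several columns.
distinct_rows = table1[["nmc_ray", "cli_id", "date"]].drop_duplicates()

# Outer query: SELECT nmc_lvl, count(*) AS total_nb ... GROUP BY nmc_lvl.
table2 = (
    distinct_rows
    .rename(columns={"nmc_ray": "nmc_lvl"})
    .groupby("nmc_lvl")
    .size()                          # number of rows per group, like count(*)
    .reset_index(name="total_nb")
)
print(table2)
```

One difference to keep in mind: by default both groupby and value_counts drop rows whose key is NaN, whereas SQL's GROUP BY keeps a NULL group. If that matters for your data, groupby("nmc_lvl", dropna=False) (pandas 1.1 or later) should keep it.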