我发现自己无法解决这个问题(自动完成税务文书工作).我有两个数据框:例如,一个具有欧元/美元汇率的季度历史记录,另一个具有我自己的发票,例如:
I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices, as an example:
import pandas as pd import numpy as np usdeur = [(pd.Timestamp('20170705'),1.1329), (pd.Timestamp('20170706'),1.1385), (pd.Timestamp('20170707'),1.1412), (pd.Timestamp('20170710'),1.1387), (pd.Timestamp('20170711'),1.1405), (pd.Timestamp('20170712'),1.1449)] labels = ['Date', 'Rate'] rates = pd.DataFrame.from_records(usdeur, columns=labels) transactions = [(pd.Timestamp('20170706'), 'PayPal', 'USD', 100, 1), (pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1), (pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1), (pd.Timestamp('20170710'), 'EU', 'EUR', 100, 1), (pd.Timestamp('20170710'), 'PayPal', 'USD', 200, 1)] labels = ['Date', 'From', 'Currency', 'Amount', 'Rate'] sales =pd.DataFrame.from_records(transactions, columns=labels)导致:
我需要在sales['Rate']栏中填充来自rates['Rate']的正确汇率,即:
I would need to have the sales['Rate'] column filled with the proper exchange rates from the rates['Rate'], that is to say:
- 如果sales['Currency']是'EUR',请不要理会.
- 对于sales的每一行,在rates中找到具有匹配的'Date'的行;抓取非常rates['Rate']的值并将其放入sales['Rate']
- 奖励:如果没有匹配的'Date'(例如,在假期期间,交易所市场关闭),请检查上一行直到找到合适的值.
- if sales['Currency'] is 'EUR', leave it alone.
- for each row of sales, find the row in rates with matching 'Date'; grab that very rates['Rate'] value and put it in sales['Rate']
- bonus: if there's no matching 'Date' (e.g. during holidays, the exchange market is closed), check the previous row until a suitable value is found.
完整结果应如下所示(请注意,第2行的费率自2017年7月7日开始):
The full result should look like the following (note that row #2 has the rate from 2017-07-07):
我尝试遵循其他问题的一些建议解决方案,但没有运气. 提前非常感谢您
I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance
推荐答案我使用pd.merge_asof
sales=pd.merge_asof(sales,rates,on='Date',direction='backward',allow_exact_matches =True) sales.loc[sales.From=='EU','Rate_y']=sales.Rate_x sales Out[748]: Date From Currency Amount Rate_x Rate_y 0 2017-07-06 PayPal USD 100 1 1.1385 1 2017-07-06 Fastspring USD 200 1 1.1385 2 2017-07-09 Fastspring USD 100 1 1.1412 3 2017-07-10 EU EUR 100 1 1.0000 4 2017-07-10 PayPal USD 200 1 1.1387然后
sales.drop('Rate_x',1).rename(columns={'Rate_y':'Rate'}) Out[749]: Date From Currency Amount Rate 0 2017-07-06 PayPal USD 100 1.1385 1 2017-07-06 Fastspring USD 200 1.1385 2 2017-07-09 Fastspring USD 100 1.1412 3 2017-07-10 EU EUR 100 1.0000 4 2017-07-10 PayPal USD 200 1.1387