Panda: Working With Date In Pandas
Sumber: https://galaxydatatech.com/2017/12/23/working-date-pandas/
Pandas provides powerful tools for working with dates and time data sets. Pandas datetime functionality heavily depends on the NumPy datetime64 and timedelta64 dtypes. Working with date in pandas is very important especially in the analysis of time related data sets. Date and time often comes in different format where some format contains numbers, some contain a combination of numbers and text. Before the modeling of data the date must be in the standard format so as to give accurate predictions. In this post we will learn how to work with date in pandas. Working With Date In Pandas
Python provides flexible methods for working with dates. Below, we are going to see how to manipulate date-time data in Pandas. Creating Date Range
To create ranges of dates in Pandas we use date_range() function. The pandas.date_range(start=None, end=None, periods=None, freq=’D’, tz=None, normalize=False, name=None, closed=None, **kwargs) function returns a DateTimeIndex. Below are some of the parameters in the function.
start : This argument specifies the start datetime-like string to be passed. If not specified the default is None. end : This argument specifies the end datetime-like string to be passed. If not specified the default is None. periods : This argument specifies the number of periods to be generated. tz : This argument specifies the string time zone name for returning localized DatetimeIndex. name :This argument specifies the name of the resulting DatetimeIndex. freq : This specifies a Date Offset, the default being ‘D’ for calendar daily.
For more information about other parameters in the date_range function you can visit the Pandas date_range documentation here.
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5) print(date_rng)
Output
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'], dtype='datetime64[ns]', freq='D')
Creating a Series From date_range()
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,name='Date') s=pd.Series(date_rng) print(s)
Output
0 2017-12-01 1 2017-12-02 2 2017-12-03 3 2017-12-04 4 2017-12-05 Name: Date, dtype: datetime64[ns]
Changing The Frequency
The default frequency is ‘D’ for day of calendar. We can change it to other offsets like M for month, and Y for year. Below example changes the frequency to month.
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,name='Date',freq='M') s=pd.Series(date_rng) print(s)
Output
0 2017-12-31 1 2018-01-31 2 2018-02-28 3 2018-03-31 4 2018-04-30 Name: Date, dtype: datetime64[ns]
bdate_range()
This function returns the business date from Monday to Friday. It excludes Saturday and Sunday.
import pandas as pd date_rng=pd.bdate_range('2017-12-01', periods=7,name='Date') s=pd.Series(date_rng) print(s)
Output
0 2017-12-01 1 2017-12-04 2 2017-12-05 3 2017-12-06 4 2017-12-07 5 2017-12-08 6 2017-12-11 Name: Date, dtype: datetime64[ns]
Converting a String to a Date
import pandas as pd s=pd.Series(['2017/12/01','2017/12/02','2017/12/03','2017/12/04','2017/12/05']) print("String \n ",s) date=pd.to_datetime(s) print("\n\nDate \n ",date)
Output
String 0 2017/12/01 1 2017/12/02 2 2017/12/03 3 2017/12/04 4 2017/12/05 dtype: object Date 0 2017-12-01 1 2017-12-02 2 2017-12-03 3 2017-12-04 4 2017-12-05 dtype: datetime64[ns]
Changing The DateFormat
We can easily specify the datetime format using the format argument. Below example convert the date format from ‘Y-m-d’ to ‘Y-d-m’
import pandas as pd s=pd.Series(['2017/12/01','2017/12/02','2017/12/03','2017/12/04','2017/12/05']) print("String \n ",s) date=pd.to_datetime(s, format='%Y/%d/%m') print("\n\nDate \n ",date)
Output
String 0 2017/12/01 1 2017/12/02 2 2017/12/03 3 2017/12/04 4 2017/12/05 dtype: object Date 0 2017-01-12 1 2017-02-12 2 2017-03-12 3 2017-04-12 4 2017-05-12 dtype: datetime64[ns]
Converting Epoch Time To Timestamps
Pandas supports the conversion of epoch time to Timestamp or DatetimeIndex. Below example converts an epoch time to a DatetimeIndex.
import pandas as pd date_epoch=pd.to_datetime([1512086400000000000, 1512172800000000000, 1512259200000000000, 1512345600000000000, 1512432000000000000]) print(date_epoch)
Output
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'], dtype='datetime64[ns]', freq=None)
Converting Timestamp to Epoch Time.
We can convert time from timestamp format to epoch time using the view function.
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,name='Date') print("DatetimeIndex \n ",date_rng) epoch=date_rng.view('int64') print("\n\nEpoch time \n ",epoch)
Output
DatetimeIndex DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'], dtype='datetime64[ns]', name='Date', freq='D')
Epoch time [1512086400000000000 1512172800000000000 1512259200000000000 1512345600000000000 1512432000000000000]
Extracting Month and Year From Date
Below example extracts the month and year from date and creates new columns for them.
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,freq='M') df=pd.DataFrame(date_rng,columns=['Date']) print("Original DataFrame\n",df) #create new columns for month and year df['month']=df['Date'].dt.month df['year']=df['Date'].dt.year print("\n\nNew DataFrame\n",df)
Output
Original DataFrame Date 0 2017-12-31 1 2018-01-31 2 2018-02-28 3 2018-03-31 4 2018-04-30 New DataFrame Date month year 0 2017-12-31 12 2017 1 2018-01-31 1 2018 2 2018-02-28 2 2018 3 2018-03-31 3 2018 4 2018-04-30 4 2018
Let’s use lambda and apply function to extract year from date
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,freq='M') df=pd.DataFrame(date_rng,columns=['Date']) print("Original DataFrame\n",df) df['Date'] = df['Date'].apply(lambda x: x.strftime('%Y')) print("\nYear\n",df['Date'])
Output
Original DataFrame Date 0 2017-12-31 1 2018-01-31 2 2018-02-28 3 2018-03-31 4 2018-04-30 Year 0 2017 1 2018 2 2018 3 2018 4 2018 Name: Date, dtype: object
Now, let’s extract the month and the year using lambda and apply function
import pandas as pd date_rng=pd.date_range('2017-12-01', periods=5,freq='M') df=pd.DataFrame(date_rng,columns=['Date']) print("Original DataFrame\n",df) df['Date'] = df['Date'].apply(lambda x: x.strftime('%B-%Y'))
Output
Original DataFrame Date 0 2017-12-31 1 2018-01-31 2 2018-02-28 3 2018-03-31 4 2018-04-30 Month and Year 0 December-2017 1 January-2018 2 February-2018 3 March-2018 4 April-2018 Name: Date, dtype: object
Subtracting Dates in Pandas
In the example below we subtract the two dates and create a new column on the DataFrame. By default the difference is in days.
import pandas as pd date_1=pd.date_range('2017-12-01', periods=5,freq='M') date_2=pd.date_range('2018-03-05', periods=5,freq='M') df=pd.DataFrame([date_1,date_2],index=['A','B']).T print("Original DataFrame\n",df) #subtract A from B df['difference']=df['B']-df['A'] print("\n\nNew DataFrame\n",df)
Output
Original DataFrame A B 0 2017-12-31 2018-03-31 1 2018-01-31 2018-04-30 2 2018-02-28 2018-05-31 3 2018-03-31 2018-06-30 4 2018-04-30 2018-07-31 New DataFrame A B difference 0 2017-12-31 2018-03-31 90 days 1 2018-01-31 2018-04-30 89 days 2 2018-02-28 2018-05-31 92 days 3 2018-03-31 2018-06-30 91 days 4 2018-04-30 2018-07-31 92 days
Pandas Timedelta
Timedelta are differences in times, expressed in days, hours, minutes, or seconds. The timedelta can be either positive or negative. Creating Timedelta
Using String
We can use string to create a timedelta as shown below.
import pandas as pd time_delta=pd.Timedelta('15 days 30 minutes 45 seconds') print(time_delta)
Output
15 days 00:30:45
Using Integer
We can create a timedelta by passing an integer in the timedelta function as shown below.
import pandas as pd time_delta=pd.Timedelta(15,unit='d') print(time_delta)
Output
15 days 00:00:00
Converting Text to Timedelta
We can easily convert any valid timedelta text to a timedelta format by using to_timedelta() function. The example below demonstrates how to convert a valid timedelta text DataFrame to a timedelta column and appending it to the original DataFrame.
import pandas as pd df=pd.DataFrame(['15 days 30 minutes 45 seconds','20 days 10 minutes 55 seconds','29 days 50 minutes 03 seconds','30 days 24 minutes 07 seconds','31 days 59 minutes 59 seconds'],columns=['Timedelta Text']) print("Original DataFrame\n",df) df['Timedelta']=pd.to_timedelta(df['Timedelta Text']) print("\n\nTimedelta DataFrame\n",df)
Output
Original DataFrame Timedelta Text 0 15 days 30 minutes 45 seconds 1 20 days 10 minutes 55 seconds 2 29 days 50 minutes 03 seconds 3 30 days 24 minutes 07 seconds 4 31 days 59 minutes 59 seconds
Timedelta DataFrame Timedelta Text Timedelta 0 15 days 30 minutes 45 seconds 15 days 00:30:45 1 20 days 10 minutes 55 seconds 20 days 00:10:55 2 29 days 50 minutes 03 seconds 29 days 00:50:03 3 30 days 24 minutes 07 seconds 30 days 00:24:07 4 31 days 59 minutes 59 seconds 31 days 00:59:59
Working With Missing Date In Pandas
Sometimes we are presented with data that has missing datetime values. Most machine learning and data analysis models don’t work well with missing values. Handling missing datetime values is very important when cleaning data. Missing values in datetime data type is represented as NAT. Handling NAT is similar to handling of the NaN.
Creating a DataFrame with Missing Datetime Values.
import pandas as pd import numpy as np date_rng=pd.date_range('2017-11-01', periods=10,freq='M') df=pd.DataFrame(date_rng,columns=['Date']) df.loc[2,3,7,8]=np.nan print(df)
Output
Date 0 2017-11-30 1 2017-12-31 2 NaT 3 NaT 4 2018-03-31 5 2018-04-30 6 2018-05-31 7 NaT 8 NaT 9 2018-08-31
We can perform all the missing values operations like dropna, and fillna on NaN as well as on NAT .
Conclusion
Time is an important variable in research and statistical inference. Working with datetime data in Pandas is an important task as most of the data are analyzed in relation to time. There are two approaches of working with dates in data analysis, one of the option is to use Python datetime functionality and the other option is to use the Pandas date functionality. We have covered the basics of working with date in Pandas in this post. More advanced date functionality will be covered in other posts. What’s Next
Now that you have basics of working with date in Pandas, in the next post we will learn about the Pandas Group By functionality for splitting, applying and combining data.