How to convert from Pandas' DatetimeIndex to DataFrame in PySpark?

user1761806

I have the following code:

import pandas as pd
from pyspark.sql import functions as f

# Get the min and max dates
minDate, maxDate = df2.select(f.min("MonthlyTransactionDate"), f.max("MonthlyTransactionDate")).first()
d = pd.date_range(start=minDate, end=maxDate, freq='MS')

tmp = pd.Series(d)
df3 = spark.createDataFrame(tmp)

I have checked tmp, and it is a pandas Series of dates. I then checked df3, but it looks like it's just an empty list:

++ 
|| 
++ 
|| 
|| 
|| 
|| 
|| 
|| 
|| 
||

What's happening?

neeraj bhadani

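In your case, d is a DatetimeIndex (and tmp is a pandas Series wrapping it), not a pandas DataFrame. What you can do is create a pandas DataFrame from the DatetimeIndex and then convert that pandas DataFrame to a Spark DataFrame. Please find sample code below.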
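A quick pandas-only check (no Spark needed) makes the distinction visible. PySpark documents createDataFrame as accepting an RDD, a list or other iterable, or a pandas.DataFrame; a pandas.Series is not the pandas type it handles specially, so it presumably falls through to the generic iterable path, which would explain the empty-looking output. The dates below are the ones used in the steps that follow.

```python
import pandas as pd

# Month-start ("MS") dates between the two bounds
d = pd.date_range('2018-12-01', '2019-01-02', freq='MS')

tmp = pd.Series(d)       # what the question passes to Spark
p_df = pd.DataFrame(d)   # what this answer passes to Spark

print(type(d).__name__)     # DatetimeIndex
print(type(tmp).__name__)   # Series
print(type(p_df).__name__)  # DataFrame
print(list(p_df.columns))   # [0], i.e. a single unnamed column
```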

1. Create a DatetimeIndex.

import pandas as pd
d = pd.date_range('2018-12-01', '2019-01-02', freq='MS')

2. Create a pandas DataFrame.

p_df = pd.DataFrame(d)

3. Create a Spark DataFrame.

# spark is an existing SparkSession
spark.createDataFrame(p_df).show()
