Spark Cassandra connector with Java for read

Requirement :- I have persisted data in cassandra and on hourly basis i need to calculate some score based on updates happening to records.I see data coming correctly by using the show() method on dataset

Below code to read data :-

Dataset<DealFeedSchema> dealFeedSchemaDataset = session.read()      .format(Constants.SPARK_CASSANDRA_SOURCE_PATH)      .option(Constants.KEY_SPACE, Constants.CASSANDRA_KEY_SPACE)      .option(Constants.TABLE, Constants.CASSANDRA_DEAL_TABLE_SPACE)      .option(Constants.DATE_FORMAT, "yyyy-MM-dd HH:mm:ss")      .schema(DealFeedSchema.getDealFeedSchema())      .load()      .as(Encoders.bean(DealFeedSchema.class)); dealFeedSchemaDataset.show(); 

output of show is below:

+-------+----------+-------------+--------------------+-----------+------------+----------+------------------------+---------------+-----------+-------------------+-------------------+----------+------------+-------------------+----------------+-------------------+-------------+----------+--------------------+----------+-------------------------+---------------+----------------+---------------+--------------+--------------+-----+ |deal_id| deal_name|deal_category|           deal_tags|growth_tags|deal_tag_ids|deal_price|deal_discount_percentage|deal_group_size|deal_active|    deal_start_time|        deal_expiry|product_id|product_name|product_description|product_category|product_category_id|product_price|hero_image|      product_images| video_url|video_thumbnail_image_url|deal_like_count|deal_share_count|deal_view_count|deal_buy_count|weighted_score|boost| +-------+----------+-------------+--------------------+-----------+------------+----------+------------------------+---------------+-----------+-------------------+-------------------+----------+------------+-------------------+----------------+-------------------+-------------+----------+--------------------+----------+-------------------------+---------------+----------------+---------------+--------------+--------------+-----+ |      4|7h12349961|          mqw|[under999, under3...|         []|          []|    4969.0|                    null|       95166551|          1|2020-07-08 14:48:57|2020-07-18 14:48:57|4725457233|  kao62ggnm7|         32h64e356z|      jnnh29zr1f|               null|       6651.0|86kk7s34yr|[dSt4P79, i4WXOHb...|d6tag27924|               4j1l36lp17|           null|            null|           null|          null|          null| null|  

So here’s weired thing that happens when i use map/foreach on dealFeedSchemaDataset the data seems not correct i get the column value of deal_start_time as current system time something like below, not sure how this gets changed.

even below line gives same issue:

dealFeedSchemaDataset.select(       functions.col("deal_start_time")).as(Encoders.bean(DateTime.class)) .collectAsList().forEach(schema -> System.out.println(schema)); 
2020-07-10T20:21:47.895+05:30 

can someone help me with what am i doing wrong?

Add Comment
0 Answer(s)

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.