Input data:
Python
{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", "contact": "{\"id\": \"188325809\", \"phone\": [{\"number\": \"77730455\", \"code\": \"353\", \"altNo\": false}, {\"number\": \"77730466\", \"code\": \"353\", \"altNo\": false}], \"fax\": [{\"faxNumber\": \"77730998\", \"code\": \"353\"}, {\"faxNumber\": \"77730889\", \"code\": \"353\"}]}"}
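Because `contact` is a JSON document stored as a string inside the outer JSON object, it has to be decoded twice. A minimal plain-Python sketch of that, assuming the keys carry no leading spaces (the space-split output further down suggests they don't):

```python
import json

# One input line from the question. The "contact" field is a JSON document
# that was serialized to a string, so its inner quotes are escaped (\").
RAW = (
    r'{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", '
    r'"contact": "{\"id\": \"188325809\", \"phone\": ['
    r'{\"number\": \"77730455\", \"code\": \"353\", \"altNo\": false}, '
    r'{\"number\": \"77730466\", \"code\": \"353\", \"altNo\": false}], '
    r'\"fax\": [{\"faxNumber\": \"77730998\", \"code\": \"353\"}, '
    r'{\"faxNumber\": \"77730889\", \"code\": \"353\"}]}"}'
)

record = json.loads(RAW)                 # first pass: the outer object
print(type(record["contact"]).__name__)  # str: still just text
contact = json.loads(record["contact"])  # second pass: the nested object
print([p["number"] for p in contact["phone"]])
```

In PySpark, the second decode corresponds to `pyspark.sql.functions.from_json`, which needs an explicit schema for the `contact` column.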

I use PySpark and want to flatten it out like this:
Python
{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", "number": "77730455", "code": "353", "altno": "false", "faxNumber": "77730998", "code": "353"}
{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", "number": "77730466", "code": "353", "altno": "false", "faxNumber": "77730889", "code": "353"}
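The expected rows pair the i-th phone with the i-th fax (77730455 with 77730998, 77730466 with 77730889), so this is a positional zip rather than two independent explodes. Also, a row cannot sensibly carry two fields both named `code`; the sketch below renames them to `phone_code` and `fax_code`, which are hypothetical names chosen only for illustration. A minimal plain-Python sketch of the per-record logic, assuming each input line has already been read into a dict:

```python
import json

# Sample record shaped like the question's input; "contact" is a JSON string.
record = {
    "id": "188325809",
    "version": "HEAD",
    "created_at": "2019-08-08T19:32:04Z",
    "contact": json.dumps({
        "id": "188325809",
        "phone": [
            {"number": "77730455", "code": "353", "altNo": False},
            {"number": "77730466", "code": "353", "altNo": False},
        ],
        "fax": [
            {"faxNumber": "77730998", "code": "353"},
            {"faxNumber": "77730889", "code": "353"},
        ],
    }),
}


def flatten(rec):
    """Yield one flat dict per index i, pairing phone[i] with fax[i]."""
    contact = json.loads(rec["contact"])  # decode the nested JSON string
    base = {k: rec[k] for k in ("id", "version", "created_at")}
    # zip() stops at the shorter list; itertools.zip_longest would keep
    # unmatched entries if the two arrays can differ in length.
    for phone, fax in zip(contact.get("phone", []), contact.get("fax", [])):
        yield {
            **base,
            "number": phone["number"],
            "phone_code": phone["code"],           # hypothetical name
            "altno": str(phone["altNo"]).lower(),  # "false", as in the expected output
            "faxNumber": fax["faxNumber"],
            "fax_code": fax["code"],               # hypothetical name
        }


for row in flatten(record):
    print(row)
```

A per-record function like this could be applied to a DataFrame with `df.rdd.flatMap(lambda r: flatten(r.asDict()))`, although the built-in `from_json` plus `posexplode` route keeps the work in the JVM and avoids Python serialization overhead.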


What I have tried:

Python
df.printSchema()


Python
root
 |-- created_at: string (nullable = true)
 |-- id: string (nullable = true)
 |-- contact: string (nullable = true)
 |-- version: string (nullable = true)



Python
from pyspark.sql.functions import explode, split
df.withColumn('contact', explode(split('contact', 'number'))).show()

+--------------------+---------+--------------------+-------+
|          created_at|       id|             contact|version|
+--------------------+---------+--------------------+-------+
|2019-08-08T19:32:04Z|188325809|  {"id": "1883258...|   HEAD|
|2019-08-08T19:32:04Z|188325809|": "77730455", "i...|   HEAD|
|2019-08-08T19:32:04Z|188325809|": "77730466", "i...|   HEAD|
+--------------------+---------+--------------------+-------+
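Splitting is purely textual: it cuts the JSON text wherever `number` occurs, so the pieces are fragments rather than parsed records. A plain-Python illustration, assuming the same `contact` text with no leading spaces in the keys (Python's `str.split` matches literally, whereas Spark's `split` takes a regular expression, which makes no difference for `number`):

```python
import json

# The nested "contact" text from the question.
contact = ('{"id": "188325809", "phone": [{"number": "77730455", "code": "353", '
           '"altNo": false}, {"number": "77730466", "code": "353", "altNo": false}], '
           '"fax": [{"faxNumber": "77730998", "code": "353"}, '
           '{"faxNumber": "77730889", "code": "353"}]}')

# Same cut points as split('contact', 'number'). The match is case-sensitive,
# so "faxNumber" is not a cut point.
pieces = contact.split("number")
print(len(pieces))  # 3


def is_json(text):
    """Return True if text parses as a complete JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print([is_json(p) for p in pieces])  # [False, False, False]
```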


Python
df.withColumn('contact',explode(split('contact',' '))).show()


+--------------------+---------+--------------------+-------+
|          created_at|       id|             contact|version|
+--------------------+---------+--------------------+-------+
|2019-08-08T19:32:04Z|188325809|              {"id":|   HEAD|
|2019-08-08T19:32:04Z|188325809|        "188325809",|   HEAD|
|2019-08-08T19:32:04Z|188325809|            "phone":|   HEAD|
|2019-08-08T19:32:04Z|188325809|         [{"number":|   HEAD|
|2019-08-08T19:32:04Z|188325809|         "77730455",|   HEAD|
|2019-08-08T19:32:04Z|188325809|             "code":|   HEAD|
|2019-08-08T19:32:04Z|188325809|              "353",|   HEAD|
|2019-08-08T19:32:04Z|188325809|            "altno":|   HEAD|
|2019-08-08T19:32:04Z|188325809|             false},|   HEAD|
|2019-08-08T19:32:04Z|188325809|          {"number":|   HEAD|
|2019-08-08T19:32:04Z|188325809|         "77730466",|   HEAD|
|2019-08-08T19:32:04Z|188325809|             "code":|   HEAD|
|2019-08-08T19:32:04Z|188325809|              "353",|   HEAD|
|2019-08-08T19:32:04Z|188325809|            "altno":|   HEAD|
|2019-08-08T19:32:04Z|188325809|            false}],|   HEAD|
|2019-08-08T19:32:04Z|188325809|              "fax":|   HEAD|
|2019-08-08T19:32:04Z|188325809|      [{"faxNumber":|   HEAD|
|2019-08-08T19:32:04Z|188325809|         "77730998",|   HEAD|
|2019-08-08T19:32:04Z|188325809|             "code":|   HEAD|
|2019-08-08T19:32:04Z|188325809|             "353"},|   HEAD|
+--------------------+---------+--------------------+-------+

only showing top 20 rows
Updated 18-Mar-20 14:06pm

1 solution

Quote:
only showing top 20 rows

What is in your input that is not in the 20 rows of output?
Describe your problem in more detail.
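For this data the missing rows can be counted directly: splitting `contact` on single spaces yields 24 tokens, while `show()` prints only 20 rows by default (`show(30, truncate=False)` would print them all, untruncated). A plain-Python check, assuming the same `contact` text as in the question with no leading spaces in the keys:

```python
# The nested "contact" text from the question.
contact = ('{"id": "188325809", "phone": [{"number": "77730455", "code": "353", '
           '"altNo": false}, {"number": "77730466", "code": "353", "altNo": false}], '
           '"fax": [{"faxNumber": "77730998", "code": "353"}, '
           '{"faxNumber": "77730889", "code": "353"}]}')

tokens = contact.split(" ")  # same cut points as split('contact', ' ')
print(len(tokens))           # 24
print(tokens[20:])           # the four rows hidden by "only showing top 20 rows"
```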
 
Comments
sslearnsda 19-Mar-20 0:04am    
The input is a deeply nested JSON string. The problem is to read the string and parse it into a flattened structure. I need help parsing this string and implementing a function similar to "explode" in PySpark.
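A plain-Python analogue of `explode` may help show why exploding twice does not give the expected output: exploding `phone` and then `fax` emits one row per combination (2 × 2 = 4), whereas the expected output pairs elements by position (2 rows). In Spark, positional pairing is typically done with `posexplode` or `arrays_zip` (Spark 2.4+). The helper names `explode` and `explode_zipped` below are hypothetical, not Spark APIs:

```python
def explode(rows, key):
    """Emit one copy of each row per element of row[key], like Spark's explode.

    As with Spark's explode, rows whose array is missing, null or empty
    produce no output rows.
    """
    for row in rows:
        for item in row.get(key) or []:
            yield {**row, key: item}


def explode_zipped(rows, *keys):
    """Pair the arrays under `keys` by position.

    Unlike Spark's arrays_zip, which pads shorter arrays with nulls,
    zip() truncates to the shortest array.
    """
    for row in rows:
        for items in zip(*(row.get(k) or [] for k in keys)):
            yield {**row, **dict(zip(keys, items))}


rows = [{"id": "188325809",
         "phone": ["77730455", "77730466"],
         "fax": ["77730998", "77730889"]}]

crossed = list(explode(explode(rows, "phone"), "fax"))
print(len(crossed))  # 4: every phone combined with every fax

paired = list(explode_zipped(rows, "phone", "fax"))
print(len(paired))   # 2: phone[i] paired with fax[i]
```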
Patrice T 19-Mar-20 1:42am    
Use Improve question to update your question, so that everyone can see this information.

You need to explain what is wrong in all the data you show.
You need to show what you expect.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


