Input data:
{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", "contact": "{\"id\": \"188325809\", \"phone\": [{\" number\": \"77730455\", \" code\": \"353\", \" altNo\": false}, {\" number\": \"77730466\", \" code\": \"353\", \" altNo\": false}], \"fax\": [{\"faxNumber\": \"77730998\", \" code\": \"353\"}, {\"faxNumber\": \"77730889\", \" code\": \"353\"}]}"}
I
Use Pyspark to
Flatten it out like ,
{id: "188325809", "version": "HEAD","created_at": "2019-08-08T19:32:04Z","number": "77730455", "code": "353","altno":"false","faxNumber": "77730998","code": "353"}
{id: "188325809", "version": "HEAD","created_at": "2019-08-08T19:32:04Z","number": "77730466", "code": "353","altno":"false","faxNumber": "77730889","code": "353"}
What I have tried:
ds.printSchema()
root
|-- created_at: string (nullable = true)
|-- id: string (nullable = true)
|-- contact: string (nullable = true)
|-- version: string (nullable = true)
df.withColumn('contact',explode(split('contact','number'))).show()
+--------------------+---------+--------------------+-------+
| created_at| id| contact |version|
+--------------------+---------+--------------------+-------+
|2019-08-08T19:32:04Z|188325809|{"id": "1883258... | HEAD|
|2019-08-08T19:32:04Z|188325809|": "77730455", "i...| HEAD|
|2019-08-08T19:32:04Z|188325809|": "77730466", "i...| HEAD|
+--------------------+---------+--------------------+-------+
df.withColumn('contact',explode(split('contact',' '))).show()
+--------------------+---------+--------------------+-------+
| created_at| id | contact |version|
+--------------------+---------+--------------------+-------+
|2019-08-08T19:32:04Z|188325809| {"id": | HEAD|
|2019-08-08T19:32:04Z|188325809| "188325809",| HEAD|
|2019-08-08T19:32:04Z|188325809| "phone": | HEAD|
|2019-08-08T19:32:04Z|188325809| [{"number": | HEAD|
|2019-08-08T19:32:04Z|188325809| "77730455",| HEAD|
|2019-08-08T19:32:04Z|188325809| "code": | HEAD|
|2019-08-08T19:32:04Z|188325809| "353",| HEAD|
|2019-08-08T19:32:04Z|188325809| "altno":| HEAD|
|2019-08-08T19:32:04Z|188325809| false},| HEAD|
|2019-08-08T19:32:04Z|188325809| {"number": | HEAD|
|2019-08-08T19:32:04Z|188325809| "77730466",| HEAD|
|2019-08-08T19:32:04Z|188325809| "code": | HEAD|
|2019-08-08T19:32:04Z|188325809| "353",| HEAD|
|2019-08-08T19:32:04Z|188325809| "altno": | HEAD|
|2019-08-08T19:32:04Z|188325809| false}],| HEAD|
|2019-08-08T19:32:04Z|188325809| "fax":| HEAD|
|2019-08-08T19:32:04Z|188325809| [{"faxNumber":| HEAD|
|2019-08-08T19:32:04Z|188325809| "77730998",| HEAD|
|2019-08-08T19:32:04Z|188325809| "code": | HEAD|
|2019-08-08T19:32:04Z|188325809| "353"},| HEAD|
+--------------------+---------+--------------------+-------+
only showing top 20 rows