Hi,
I have multiple files that I need to merge into one JSON file:

Here is an example of a file which I have:
{"t": "test", "title": "test", "content": "test"}
{"t": "test2", "title": "test", "content": "test2"}


What I need is something like this:
[
{"t": "test", "title": "test", "content": "test"},
{"t": "test2", "title": "test2", "content": "test2"}
]


It will look like this in the end (note that the JSON below is equivalent to the second example above):
[{
		"t": "test",
		"title": "test",
		"content": "test"
	},
	{
		"t": "test2",
		"title": "test2",
		"content": "test2"
	}
]


What I have tried:

I have the following Python code:

import io
import os
import json

def wrap_files_in_dir(dirname):
    data = {}
    list_of_reviews = []

    for filename in os.listdir(dirname):
        file_path = os.path.join(dirname, filename)
        if os.path.isfile(file_path):
            with io.open(file_path, 'r', encoding='utf-8', errors='ignore') as rfile:
                contents = rfile.read()
                list_of_reviews.append(contents)

    with io.open('AppStoreReviews.json', 'w', encoding='utf-8' , errors='ignore') as wfile:
        data["reviews"] = list_of_reviews
        wfile.write(unicode(json.dumps(data, ensure_ascii=False)))


if __name__ == '__main__':
    wrap_files_in_dir('/Users/Jack/PycharmProjects/MyProject')

print("Your Reviews merged and converted to JSON")



I think I'm missing some code that goes into each file in my directory, or could it be something else?
Can someone help me with this?

The problem is that the code does create a JSON file, but the JSON has no array of objects and no "," between my objects.
Posted 4 days ago
Updated 2 days ago
Comments
CHill60 4 days ago
   
What is the problem?
Jack Raif 4 days ago
   
I don't know what I'm missing in my code.
Jack Raif 4 days ago
   
Let me be more specific: the problem is that the code does create a JSON file, but the JSON has no array of objects and no "," between my objects.
Patrice T 3 days ago
   
Show actual result
Richard MacCutchan 2 days ago
   
See my updated solution.

Solution 1

The following code produces (I hope) what you are looking for:
import io
import os
import json

def wrap_files_in_dir(dirname):
    list_of_reviews = []

    for filename in os.listdir(dirname):
        file_path = os.path.join(dirname, filename)
        if os.path.isfile(file_path):
            with io.open(file_path, 'r', encoding='utf-8', errors='ignore') as rfile:
                # json.load parses the whole file as a single JSON document,
                # so this expects each file to contain exactly one object
                review = json.load(rfile)
                list_of_reviews.append(review)

    # Serialize the list of parsed objects as one JSON array
    with io.open('AppStoreReviews.json', 'w', encoding='utf-8', errors='ignore') as wfile:
        json.dump(list_of_reviews, wfile, indent=2)

if __name__ == '__main__':
    wrap_files_in_dir('C:\\Users\\rjmac\\Documents\\_Python\\JFiles')

print("Your Reviews merged and converted to JSON")
Comments
Jack Raif 4 days ago
   
Great! But I'm still missing a "," between my objects.
Richard MacCutchan 4 days ago
   
OK, I give up. Where is it?
Jack Raif 4 days ago
   
I think I know what I'm missing, but I need help implementing it:

list_of_reviews = []                      # DONE
for file in directory:                    # DONE
    for line in file:                     # MISSING
        review = parse_from_json(line)    # MISSING
        list_of_reviews.append(review)    # MISSING
Richard MacCutchan 4 days ago
   
Reading a file line by line is easy; see section 7.2.1 of "7. Input and Output" in the Python 3.7.1 documentation.

The last two lines that you say are missing should be easy to add in the reading loop. Although I am not sure what you are trying to parse from JSON, it all depends on the data you are processing.
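
As a minimal sketch of the reading loop discussed here, assuming each non-empty line of a file holds one complete JSON object (as in the sample in the question). The function name read_json_lines is hypothetical and does not come from the thread:

```python
import io
import json


def read_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    reviews = []
    with io.open(path, 'r', encoding='utf-8') as rfile:
        for line in rfile:                    # iterate over the file line by line
            line = line.strip()
            if not line:                      # skip blank lines
                continue
            reviews.append(json.loads(line))  # text -> Python dict
    return reviews
```

The key step is json.loads(line): it turns the text of one line into an object before it is appended, so the list ends up holding objects rather than strings.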
Richard MacCutchan 3 days ago
   
See my updated solution.
Jack Raif 3 days ago
   
Thanks, Richard, but my files still start like this: ["{
I have granted you access so you can see the files' content before running the script:

https://drive.google.com/drive/folders/1YUnesBEI8ZzDOVHmKt6MU3wuAWfUnBVf?usp=sharing
Richard MacCutchan 3 days ago
   
Not the files that you uploaded. For example file reviews-20181023.log contains the following:
{"t": "23/Oct/2018:09:40:39 -07:00", "title": "A great resource undercut by terrible management", "content": "I found Houzz as a resource for inspiration pictures in building a new house, but I quickly discovered that it held even more value in the forums, especially those that originated in GardenWeb, which were filled with long-time members and experienced professionals who gave their advice for free. Unfortunately, Houzz\u2019s terrible management decisions have driven away many of those people. They\u2019re destroying much of what made the site and app unique and valuable to me as a homebuilder.", "title_en": "A great resource undercut by terrible management", "content_en": "I found Houzz as a resource for inspiration pictures in building a new house, but I quickly discovered that it held even more value in the forums, especially those that originated in GardenWeb, which were filled with long-time members and experienced professionals who gave their advice for free. Unfortunately, Houzz\u2019s terrible management decisions have driven away many of those people. They\u2019re destroying much of what made the site and app unique and valuable to me as a homebuilder.", "id": "3335244003", "voteSum": "0", "voteCount": "0", "rating": "2", "version": "18.10.0", "author": "KrisABS123", "country": "US", "store": "iTunes", "app_name": "houzz"}
Jack Raif 3 days ago
   
Yes, and the script still wraps the output like this: [" at the beginning and "] at the end, instead of just adding the array brackets and keeping the objects inside as objects: [{}]
Richard MacCutchan 3 days ago
   
Sorry, but I don't understand what you are saying. When I checked the inputs and what is generated in the output file it is correctly creating a single array with the other lines inside, in the form:
[{"first line ..."}, {"next line ..."}, {"and so on ..."}]

What exactly are you getting?
Jack Raif 3 days ago
   
Strange, my output looks like this:

["{"first line..."}", "{"Next line..."}", "{"and so on..."}"]
Richard MacCutchan 3 days ago
   
Well that is correct. When you use json.load to read it back it will be deserialised correctly.
Jack Raif 3 days ago
   
Do you have any solution?
Richard MacCutchan 3 days ago
   
Sorry, but I don't really understand what your problem is, or what you are trying to do. The data is correct as far as I can see.
Jack Raif 3 days ago
   
After I merge the files into one JSON file, I upload that JSON to Dropbox and then use a Google Sheet (=ImportJSON()) to view the reviews.

The issue is that the script we are talking about writes each whole object as a string inside the array:

["{"first line..."}", "{"Next line..."}", "{"and so on..."}"]

Richard MacCutchan 3 days ago
   
I think your problem is that you are reading the log files as text strings. But the content is already encoded as JSON, so you should be using json.load to read them.
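
A small sketch of the difference, assuming Python 3 (the sample line is made up for illustration). Appending the raw text of a line gives a list of strings, which the serializer quotes and escapes; parsing the line first gives a list of objects:

```python
import json

line = '{"t": "test", "title": "test"}'     # one line read from a log file

as_text = json.dumps([line])                 # list holding a str
as_object = json.dumps([json.loads(line)])   # list holding a dict

print(as_text)     # the object is wrapped in quotes, inner quotes are escaped
print(as_object)   # the object is written as a real JSON object

# Reading each result back shows the difference in type.
assert isinstance(json.loads(as_text)[0], str)
assert isinstance(json.loads(as_object)[0], dict)
```

A consumer that expects an array of objects will only see the fields in the second form.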
Jack Raif 3 days ago
   
Is my code correct?

def wrap_files_in_dir(dirname):

    list_of_reviews = []

    for filename in os.listdir(dirname):
        file_path = os.path.join(dirname, filename) ###!
        if os.path.isfile(file_path):
            with io.open(file_path, 'r', encoding='utf-8', errors='ignore') as rfile:
                for line in rfile:
                    list_of_reviews.append(line.rstrip())

    with io.open('AppStoreReviews.json', 'w', encoding='utf-8', errors='ignore') as wfile:
        wfile.write(unicode(json.dumps(list_of_reviews, ensure_ascii=False)))


if __name__ == '__main__':
    wrap_files_in_dir('/Users/Projects/PycharmProjects/json')
Jack Raif 3 days ago
   
Can it be generated like this?

[{
"t": "17/Oct/2018:05:55:58 -07:00",
"title": "Publicit\u00e9 mensong\u00e8re",
"content": "Je voulais tester l\u2019app pour sa fonction de r\u00e9alit\u00e9 augment\u00e9e permettant de visualiser les meubles dans notre int\u00e9rieur. Cette fonction est compl\u00e8tement diff\u00e9rente de la version montr\u00e9e dans la vid\u00e9o et compl\u00e8tement inutile ! On ne peut pas tourner autour des objets, les meubles sont mal d\u00e9tour\u00e9s... \ntr\u00e8s d\u00e9\u00e7ue",
"title_en": "False advertising",
"content_en": "I wanted to test the app for its augmented reality feature to visualize the furniture in our interior. This function is completely different from the version shown in the video and completely useless ! You can't turn around the objects, the furniture is poorly d\u00e9tour\u00e9s... \nvery disappointed",
"id": "3312163433",
"voteSum": "0",
"voteCount": "0",
"rating": "1",
"version": "18.10.0",
"author": "Charlotte V.",
"country": "FR",
"store": "iTunes",
"app_name": "test"
},
{
"t": "17/Oct/2018:05:55:58 -07:00",
"title": "Publicit\u00e9 mensong\u00e8re",
"content": "Je voulais tester l\u2019app pour sa fonction de r\u00e9alit\u00e9 augment\u00e9e permettant de visualiser les meubles dans notre int\u00e9rieur. Cette fonction est compl\u00e8tement diff\u00e9rente de la version montr\u00e9e dans la vid\u00e9o et compl\u00e8tement inutile ! On ne peut pas tourner autour des objets, les meubles sont mal d\u00e9tour\u00e9s... \ntr\u00e8s d\u00e9\u00e7ue",
"title_en": "False advertising",
"content_en": "I wanted to test the app for its augmented reality feature to visualize the furniture in our interior. This function is completely different from the version shown in the video and completely useless ! You can't turn around the objects, the furniture is poorly d\u00e9tour\u00e9s... \nvery disappointed",
"id": "3312163433",
"voteSum": "0",
"voteCount": "0",
"rating": "1",
"version": "18.10.0",
"author": "Charlotte V.",
"country": "FR",
"store": "iTunes",
"app_name": "test"
}
]
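
A layout close to this can probably be produced with the indent argument of json.dumps. This is a minimal sketch, assuming the reviews have already been parsed into dicts; the sample data is shortened from the example above:

```python
import json

reviews = [
    {"t": "17/Oct/2018:05:55:58 -07:00", "title": u"Publicit\u00e9 mensong\u00e8re"},
]

# indent=2 puts every key on its own line. ensure_ascii is left at its
# default (True), so non-ASCII characters are written as \uXXXX escapes,
# matching the sample above.
pretty = json.dumps(reviews, indent=2)
print(pretty)
```

The standard library writes the opening "[" on its own line rather than as "[{", but the result is the same JSON once parsed.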

Solution 2

The easiest fix for this is to prepend a comma to every line but the first.
One way to do this is to set a variable that tracks whether the current line is the first, and then implement an if...then block.

Unfortunately, I know Python just as much as I know Klingon, so you will probably need to massage the code changes I made.
# Boolean flag: True until the first file has been handled
is_first = True

for filename in os.listdir(dirname):
    file_path = os.path.join(dirname, filename)
    if os.path.isfile(file_path):
        with io.open(file_path, 'r', encoding='utf-8', errors='ignore') as rfile:
            if is_first:
                # First file: add the contents as is and clear the flag
                contents = rfile.read()
                is_first = False
            else:
                # Any later file: prepend a comma to the contents
                contents = "," + rfile.read()

            list_of_reviews.append(contents)
Comments
Richard MacCutchan 4 days ago
   
The JSON system will put the commas in the correct places.
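
A short illustration of that point, using made-up sample data: when the list holds parsed objects, json.dumps emits the brackets and the separating commas itself, so nothing has to be added by hand.

```python
import json

reviews = [{"t": "test"}, {"t": "test2"}]   # already-parsed objects

merged = json.dumps(reviews)
print(merged)   # [{"t": "test"}, {"t": "test2"}]
```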
MadMyche 4 days ago
   
Well for whatever reason, this is not happening according to the OP
Richard MacCutchan 4 days ago
   
Works fine in my test. And as so often, we do not get the full details.
Richard MacCutchan 4 days ago
   
And if you look at the results shown in the original question, all the commas are there in the correct places.
MadMyche 4 days ago
   
You mean this one?

Here is an example of a file which I have:
{"t": "test", "title": "test", "content": "test"}
{"t": "test2", "title": "test", "content": "test2"}
Richard MacCutchan 4 days ago
   
Well I can't be sure as the question is not clear. I presume that is the text from the input file(s). S/He reads each line of the files and creates a list by repeatedly calling list_of_reviews.append(contents).

Like I said, when I create this it produces the correct result. It is not clear exactly what the OP's code produces.
MadMyche 4 days ago
   
I'm not sure either, and not knowing Python does not help. Perhaps the file is being parsed as a string; maybe the OP needs a "data = json.load(file)" line.
Richard MacCutchan 4 days ago
   
No, I think the creation of the list is working fine, as I checked with my test program. And also the generated JSON came out correctly. We need some more, and more specific, details from the OP.
Jack Raif 4 days ago
   
OK, I'm sharing the directory of files that I have; maybe you can determine where the issue is?

https://drive.google.com/drive/folders/1YUnesBEI8ZzDOVHmKt6MU3wuAWfUnBVf?usp=sharing
Jack Raif 4 days ago
   
Those are the files that I want to merge into one JSON file.

Solution 3

Found the Solution:

list_of_reviews = []

for filename in os.listdir(dirname):
    # Skip hidden files (names starting with '.')
    if not filename.startswith('.'):
        file_path = os.path.join(dirname, filename)
        if os.path.isfile(file_path):
            with io.open(file_path, 'r', encoding='utf-8', errors='ignore') as rfile:
                lines = rfile.readlines()
                for line in lines:
                    line = line.rstrip()
                    # Parse each line into an object instead of keeping it as text
                    review = json.loads(line)
                    list_of_reviews.append(review)


Thanks to everyone who helped!
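
For completeness, here is a self-contained sketch of the whole merge including the write step, assuming Python 3 and one JSON object per non-empty line. The names merge_reviews and out_path are hypothetical, and the files are sorted only to make the output order predictable:

```python
import io
import json
import os


def merge_reviews(dirname, out_path):
    """Merge every one-object-per-line file in dirname into one JSON array."""
    list_of_reviews = []
    for filename in sorted(os.listdir(dirname)):
        if filename.startswith('.'):
            continue                              # skip hidden files
        file_path = os.path.join(dirname, filename)
        if not os.path.isfile(file_path):
            continue                              # skip sub-directories
        with io.open(file_path, 'r', encoding='utf-8') as rfile:
            for line in rfile:
                line = line.strip()
                if line:                          # ignore blank lines
                    list_of_reviews.append(json.loads(line))

    # Write all parsed objects as a single, indented JSON array.
    with io.open(out_path, 'w', encoding='utf-8') as wfile:
        json.dump(list_of_reviews, wfile, indent=2)
    return len(list_of_reviews)
```

It is probably safest to keep out_path outside dirname; otherwise a second run would try to read the merged file as if it were an input file.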
Comments
Richard MacCutchan 2 days ago
   
Why read the lines individually just to convert each string to JSON, when you can load the file into JSON directly? See my updated solution.
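
One possible reason for going line by line, assuming a file holds several objects, one per line, as in the sample in the question: such text is not a single JSON document, so parsing it in one call fails, while parsing each line succeeds. A file with exactly one object, like the reviews-20181023.log excerpt quoted earlier, loads fine with json.load. A small sketch with made-up data:

```python
import json

two_objects = '{"t": "test"}\n{"t": "test2"}\n'   # two objects, one per line

try:
    json.loads(two_objects)       # the whole text is not one JSON document
    whole_text_ok = True
except ValueError:                # json.JSONDecodeError is a ValueError subclass
    whole_text_ok = False

# Parsing line by line handles the same text without error.
per_line = [json.loads(line) for line in two_objects.splitlines() if line.strip()]
```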

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Web01-2016 | 2.8.181113.4 | Last Updated 11 Nov 2018
Copyright © CodeProject, 1999-2018
All Rights Reserved.