I'm trying to read the Google Ngram CSV files, but the contents I read out differ from what is described on the Google website.
The website: http://books.google.com/ngrams/datasets

The contents I read out look like this:

# 1574 1 1 1
# 1584 6 6 1
# 1614 1 1 1
# 1631 115 100 1


The description on the Google website looks like this:

circumvallate 1978 313 215 85
circumvallate 1979 183 147 77


So why do I get '#' instead of the words?

help me!!!
thanks!!

1 solution

The description is merely an example of the contents of one of the file types; it is not an absolute description of every file. The first item in each line is an ngram, which is composed of n tokens, as described in the link you posted above. A token need not be a word, so '#' can itself be a 1-gram with its own lines in the file.
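To make this concrete, here is a minimal Python sketch of reading such a file. It assumes the tab-separated layout given on the datasets page for the 20090715 files (ngram, year, match_count, page_count, volume_count). The names `NgramRecord`, `parse_line` and `read_lines` are made up for this illustration, and the sample data is just the lines quoted in the question, not a real file.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class NgramRecord(NamedTuple):
    ngram: str
    year: int
    match_count: int
    page_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    # Assumed layout: ngram TAB year TAB match_count TAB page_count TAB volume_count
    ngram, year, matches, pages, volumes = line.rstrip("\r\n").split("\t")
    return NgramRecord(ngram, int(year), int(matches), int(pages), int(volumes))


def read_lines(lines: Iterable[str], first: int, count: int) -> Iterator[NgramRecord]:
    # 'first' is a 1-based line number; islice skips earlier lines
    # without loading the whole file into memory.
    start = first - 1
    return map(parse_line, islice(lines, start, start + count))


# Stand-in for an opened CSV file, built from the lines quoted in the question.
sample = io.StringIO(
    "#\t1574\t1\t1\t1\n"
    "#\t1584\t6\t6\t1\n"
    "circumvallate\t1978\t313\t215\t85\n"
    "circumvallate\t1979\t183\t147\t77\n"
)

records = list(read_lines(sample, first=3, count=2))
for record in records:
    print(record.ngram, record.year, record.match_count)
```

With the real data you would pass the opened, unzipped CSV file instead of `sample` and ask for `first=30_000_000` to reach the lines quoted on the website. Lines near the top of the file belong to other 1-grams, such as '#'.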
 
 
Comments
alohaking 23-Apr-12 4:20am    
Well, if you read the website carefully, you can see this description:
"As an example, here are the 30,000,000th and 30,000,001st lines from file 0 of the English 1-grams (googlebooks-eng-all-1gram-20090715-0.csv.zip):
circumvallate 1978 313 215 85
circumvallate 1979 183 147 77
"
And the contents I read out are from the same file.
Richard MacCutchan 23-Apr-12 4:27am    
Which lines did you read?
alohaking 23-Apr-12 4:30am    
thank you very much:)
alohaking 23-Apr-12 4:29am    
OK, I got it working, as you said.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


