Hi guys,

I have thousands of rows with 106 columns. The first column (chromosome and location) holds a chromosome and location, and its values can be duplicated. The remaining columns are numbered 1-105, each corresponding to a sample number. If a sample has a given chromosome and location, I want to put a 1 in that cell so that at the end I can calculate the sum for each sample. The part I am having a tough time programming in Python is how to write this to a file when the same key appears more than once for different samples. How can I add the 1 to the right cell so I can get the sum later on?

Thanks a lot in advance,

The code I have so far is found below:

    import os
    from collections import defaultdict

    # dic maps "chromosome:location" to the sample numbers (1-105) that contain it;
    # it has to be filled from the input data before anything is written.
    dic = defaultdict(list)

    with open(file_out + ".txt", "w") as outpt:
        outpt.write("chrom_pos\t" + "\t".join(samp_num) + "\n")
        for k, val in dic.items():  # k is the chromosome:location; val holds its sample numbers
            present = set(int(v) for v in val)
            # One row per chrom_pos: 1 in the column of every sample that has it, 0 elsewhere.
            row = ["1" if n in present else "0" for n in range(1, len(samp_num) + 1)]
            outpt.write(k + "\t" + "\t".join(row) + "\n")

1 solution

Write each val to a new list. Before adding the next one, check whether it already exists in that list, and skip it if it does.
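One way to sketch this idea in Python is shown below. It is only an illustration under some assumptions: the input is taken to be a sequence of (chrom_pos, sample) pairs, and the names `build_presence_rows`, `column_sums`, `write_table`, `pairs` and `sample_ids` are made up for the example, as is the sample data. A set per key is used in place of the "check the list, then skip" step, since adding to a set ignores values that are already there.

```python
import csv
import io
from collections import defaultdict


def build_presence_rows(pairs, sample_ids):
    """Group (chrom_pos, sample) pairs into one 0/1 row per chrom_pos."""
    samples_by_pos = defaultdict(set)  # a set silently skips repeated samples
    for chrom_pos, sample in pairs:
        samples_by_pos[chrom_pos].add(sample)

    rows = []
    for chrom_pos in sorted(samples_by_pos):
        present = samples_by_pos[chrom_pos]
        rows.append([chrom_pos] + [1 if s in present else 0 for s in sample_ids])
    return rows


def column_sums(rows):
    """Sum the 0/1 cells of each sample column (the first cell is the key)."""
    return [sum(col) for col in zip(*(row[1:] for row in rows))]


def write_table(rows, sample_ids, handle):
    """Write a tab-separated table with a header and a final totals row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom_pos"] + list(sample_ids))
    writer.writerows(rows)
    writer.writerow(["total"] + column_sums(rows))


# Made-up example data: three samples instead of 105.
sample_ids = [1, 2, 3]
pairs = [("chr1:100", 1), ("chr1:100", 3), ("chr2:250", 2), ("chr1:100", 1)]
rows = build_presence_rows(pairs, sample_ids)

buffer = io.StringIO()
write_table(rows, sample_ids, buffer)
print(buffer.getvalue())
```

Each chrom_pos appears once, with a 1 under every sample that has it, and the last row holds the per-sample sums. To write to a real file, pass an open file handle to `write_table` instead of the in-memory buffer.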

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)