Hello everyone,

I have a Pandas data frame whose index values look like this:

ENSG000005768.17

I would like to remove the decimal point and everything after it.

What I have tried:

So far, I have tried the following:

data_frame = data_frame['Gene_Id'].index.strip('.*')

I am very new to Pandas. Any help would be greatly appreciated. Thanks.
Updated 9-Jan-21 4:23am

1 solution

Use the Python string split method:
Python
parts = data_frame['Gene_Id'].index.split('.')
data_frame = parts[0]
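
Note that a pandas Index has no `split` method of its own; string methods are reached through the `.str` accessor. Here is a minimal sketch, assuming the versioned IDs live in the frame's index. The sample frame, the `count` column, and the ID values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real IDs and columns will differ.
data_frame = pd.DataFrame(
    {"count": [10, 20]},
    index=pd.Index(["ENSG000005768.17", "ENSG000001234.5"], name="Gene_Id"),
)

# Split each index label on the first "." and keep the part before it.
data_frame.index = data_frame.index.str.split(".", n=1).str[0]

print(data_frame.index.tolist())
```

Assigning the result back to `data_frame.index`, rather than to `data_frame` itself, keeps the rest of the frame intact.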
 
 
Comments
B. Copeland 9-Jan-21 11:05am    
Thanks for your reply. I tried your method and got the following traceback:

KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Gene_Id'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
<ipython-input-12-f2c9b0ad6f0a> in <module>
----> 1 parts = data_frame['Gene_Id'].index.split('.')
2 data_frame = parts[0]
3
4
5

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:

KeyError: 'Gene_Id'

My data frame looks like this:

Capture (screenshot hosted on ImgBB)
Richard MacCutchan 9-Jan-21 11:20am    
The message is saying that "Gene_Id" is not a valid key. You need to look at the content of the data_frame variable at that point. It may be a good idea to use a different variable name for the result of the field extraction.
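
One way to carry out that check is sketched below, under the assumption that "Gene_Id" is the name of the index rather than a column. The sample frame and its `count` column are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which "Gene_Id" names the index, not a column.
data_frame = pd.DataFrame(
    {"count": [10, 20]},
    index=pd.Index(["ENSG000005768.17", "ENSG000001234.5"], name="Gene_Id"),
)

# data_frame["Gene_Id"] only works if "Gene_Id" is one of the columns.
in_columns = "Gene_Id" in data_frame.columns
# If it is instead the index name, it shows up here.
in_index = "Gene_Id" in data_frame.index.names
print(in_columns, in_index)

# reset_index() turns a named index into an ordinary column.
as_column = data_frame.reset_index()
print(as_column.columns.tolist())
```

A name that appears only in `index.names` would explain a `KeyError` from `data_frame['Gene_Id']`, since square-bracket lookup with a string searches the columns.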
B. Copeland 9-Jan-21 12:26pm    
Thanks for the help. I don't know where I've gone wrong. When I printed 'data_frame.head()' I thought that the Gene_Id entry was the index, but I guess that I am wrong. Thanks again for the help.
B. Copeland 10-Jan-21 12:13pm    
Thanks again. Based on your suggestion, I finally figured it out.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


