Hi all,

I load a pandas DataFrame that has a column containing file names with version numbers, and I need to extract the version (removing the letters).
Column:
Filename
filename vs2.0
filename vs2.1.1
filename vs2.2.1
filename vs2.3.3

Desired:
2.0
2.1.1
2.2.1
2.3.3



Thanks!

What I have tried:

I tried:
df['VS'] = df['VS'].str.extract(r'(\d+\.\d+)', expand=False)
but I get only the first two components of the version, not the third.
Updated 18-Jan-23 21:35pm

Try this:
(\d+(\.\d+)+)


If you are going to use regular expressions, you need a helper tool. Get a copy of Expresso - it's free, and it examines and generates regular expressions.
 
 
Your expression says: capture one or more digits, followed by a period, followed by one or more digits. So it will only capture 2.0, 2.1, etc.

Try this:
    (\d+(\.\d+)*)
      ^ ^ ^  ^ ^
      1 2 a  b c
1. One or more digits followed by
2. A group containing
  a. A single period (dot) followed by
  b. One or more digits
  c. Repeated any number of times
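To see the difference between the two patterns outside pandas, here is a minimal sketch using Python's built-in re module on the sample file names from the question. The variable names (names, two_part, any_part) are made up for illustration only.

```python
import re

# Sample values copied from the question.
names = ["filename vs2.0", "filename vs2.1.1", "filename vs2.2.1", "filename vs2.3.3"]

# Pattern from the question: digits, one dot, digits (exactly two components).
two_part = re.compile(r"\d+\.\d+")

# Pattern from this answer: digits, then zero or more ".digits" groups.
any_part = re.compile(r"\d+(\.\d+)*")

# group(0) is the whole match, regardless of how many groups the pattern has.
short = [two_part.search(n).group(0) for n in names]
full = [any_part.search(n).group(0) for n in names]

print(short)  # ['2.0', '2.1', '2.2', '2.3']
print(full)   # ['2.0', '2.1.1', '2.2.1', '2.3.3']
```

The first pattern stops after the second component because nothing in it allows a further dot and digits.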
 
Comments
Mihnea Nichifor 18-Jan-23 13:46pm    
It still gives the same output without the last ".digit"
Richard MacCutchan 19-Jan-23 4:27am    
It works for me in Python, and elsewhere, using the standard re module. Check the pandas documentation to see if they use a different syntax.
Richard MacCutchan 19-Jan-23 5:06am    
I just tried this in pandas and managed to get the correct result. Try the following:
df['VS'] = df['VS'].str.extract(r'(\d+(\.\d+)*)', expand=False)[0]

Without the [0] at the end it returns two columns, one per capture group, the second being the final dot and digit.
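A possible alternative to indexing with [0] is to make the inner group non-capturing with (?:...), so the pattern has only one capture group. With a single group and expand=False, str.extract returns a Series. The sketch below assumes the sample data from the question; the column name "version" is hypothetical and chosen only to keep the original column intact.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"VS": ["filename vs2.0", "filename vs2.1.1",
                          "filename vs2.2.1", "filename vs2.3.3"]})

# Two capturing groups: extract returns a DataFrame with one column per group.
both = df["VS"].str.extract(r"(\d+(\.\d+)*)", expand=False)
print(both.shape)  # (4, 2)

# Inner group made non-capturing: one capture group, so the result is a Series.
# "version" is a hypothetical column name used for this illustration.
df["version"] = df["VS"].str.extract(r"(\d+(?:\.\d+)*)", expand=False)
print(df["version"].tolist())  # ['2.0', '2.1.1', '2.2.1', '2.3.3']
```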
Try this:

df['VS'] = df['VS'].str.extract(r"([\d.]+)")
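One thing to keep in mind with this approach: square brackets define a character class, so the pattern matches any run of digits and dots in any order rather than a digit-dot-digit shape. That is fine for the sample data, but it may pick up a stray dot earlier in the string. A small sketch with the re module follows; the second input string, with a dot inside the name, is invented for illustration.

```python
import re

# Character class: any run of digits and dots, in any order.
loose = re.compile(r"[\d.]+")

# Structured pattern: digits, then zero or more ".digits" groups.
strict = re.compile(r"\d+(?:\.\d+)*")

sample = "filename vs2.1.1"        # from the question
with_dot = "file.name vs2.1.1"     # hypothetical name containing a dot

loose_sample = loose.search(sample).group(0)
loose_dot = loose.search(with_dot).group(0)
strict_dot = strict.search(with_dot).group(0)

print(loose_sample)  # 2.1.1
print(loose_dot)     # .
print(strict_dot)    # 2.1.1
```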
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


