How do I replace a string of special characters in a file using (ubuntu) bash

Question

Hi All,

I am having a lot of trouble doing something that I think should be trivial, but I cannot find the right solution.

The problem is this: I have a CSV text file (whose content I do not control) in which some lines end with "\t\t\r\n" where they should end with "\r\n". The extra '\t' characters cause problems when importing the file into MySQL.

I thought sed could take care of that, but I have not found anything that really works. Hours of Googling have not helped either.

Any help/suggestion would be very much appreciated.

What I have tried:

I have gotten far enough to find a way to declare the two strings:

mytest=$'\t\t\r\n'
mytest2=$'\r\n'

When I echo those strings to a file as follows:

echo "$mytest"+"$mytest2" > bin.txt

I get the bin.txt file and sure enough it has the expected content ("\t\t\r\n+\r\n").

What I have not managed so far is to get the sed command to use those strings to replace each occurrence of mytest with the content of mytest2 in a file.

Posted 13-Nov-20 2:29am

fd9750

Updated 13-Nov-20 3:29am

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Accepted Answer · 2020-11-13T03:03:00

Solution 1

Try something like:

cat sourcefile | sed 's/\t//g' > destfile

That should replace every occurrence of '\t' with nothing, effectively removing them.
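
Note that a global substitution like this removes every tab, including any that separate fields in a tab-delimited file. The difference from removing only the tabs that sit directly before the line ending can be sketched in Python as follows. The sample bytes are made up for illustration and are not taken from the question; the regular expression is one possible way to express "trailing tabs", not the only one.

```python
import re

# Hypothetical sample: two tab-separated records; the first ends with two stray tabs.
sample = b"a\tb\tc\t\t\r\nd\te\tf\r\n"

# Mirrors the effect of a global tab removal: separators disappear too.
all_tabs_removed = sample.replace(b"\t", b"")

# Removes only runs of tabs that sit directly before a CRLF line ending.
trailing_tabs_removed = re.sub(rb"\t+(?=\r\n)", b"", sample)

print(all_tabs_removed)       # b'abc\r\ndef\r\n'
print(trailing_tabs_removed)  # b'a\tb\tc\r\nd\te\tf\r\n'
```

If the file uses tabs only as stray padding at the end of lines, both approaches give the same result; if tabs also delimit fields, only the second one keeps the columns intact.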

Posted 13-Nov-20 3:03am

Richard MacCutchan

Comments

k5054 13-Nov-20 9:21am

Not sure why you would use cat when sed knows how to read from a file:

sed -e 's/\t//g' sourcefile > destfile

You could also do this in place:

sed -i -e 's/\t//g' sourcefile

fd9750 13-Nov-20 9:29am

Hi,
I have tried innumerable variants of that, but none of them worked.
In the meantime I have found a way to do it with a Python script.

Richard MacCutchan 13-Nov-20 9:30am

I tried it with my suggestion and it worked fine.

k5054 13-Nov-20 9:32am

Works for me too... Maybe the OP actually has a '\' followed by a 't' to replace, rather than tabs?

Richard MacCutchan 13-Nov-20 9:33am

I wondered about that too.
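
One way to settle that question is to count both candidate endings in the raw bytes. The following is a minimal sketch; the function name `count_endings` and the sample data are hypothetical and only serve to illustrate the check.

```python
def count_endings(data: bytes) -> dict:
    """Count both candidate line endings in a bytes object."""
    return {
        # Two real tab bytes (0x09) followed by CRLF.
        "real_tabs": data.count(b"\t\t\r\n"),
        # The four literal characters \ t \ t followed by CRLF.
        "literal_backslash_t": data.count(b"\\t\\t\r\n"),
    }

# Hypothetical sample data: one line of each kind.
sample = b"x\ty\t\t\r\n" + b"x\ty\\t\\t\r\n"
print(count_endings(sample))  # {'real_tabs': 1, 'literal_backslash_t': 1}
```

To run the check on an actual file, read it in binary mode and pass the contents to the function. A non-zero count for the second key would suggest that the pattern has to match backslash characters rather than tabs.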

Richard MacCutchan 13-Nov-20 9:30am

Mainly because I could not remember all the options for sed, and could not be bothered to look them up.

fd9750 · Accepted Answer · 2020-11-13T03:29:00

Solution 2

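#!/usr/bin/env python3
# replace.py
# Replace a byte string in a file (in place).

match = b'\t\t\r\n'    # two tabs followed by CRLF
replace = b'\r\n'      # plain CRLF
filename = 'MyTestFile.txt'

print("Replacing strings in", filename)

# Read in binary mode so that the \r\n line endings are not translated.
with open(filename, "rb") as f:
    data = f.read().replace(match, replace)

# Write the result back, again in binary mode.
with open(filename, "wb") as f:
    f.write(data)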

The trick is to open the file as a binary file, specify binary match and replace strings, and write the file back as a binary file. Works like a charm.
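
The same binary-mode idea can be generalized a little. The sketch below is an assumption-laden variant, not the script above: it strips any number of trailing tabs (not only exactly two) and writes through a temporary file so that a failed write does not leave the original half-written. The function name `strip_trailing_tabs` is hypothetical.

```python
import os
import re
import tempfile

def strip_trailing_tabs(path):
    """Remove any run of tabs directly before CRLF; return the number of lines changed."""
    with open(path, "rb") as f:
        data = f.read()

    # re.subn returns the new bytes and the number of substitutions made.
    cleaned, count = re.subn(rb"\t+(?=\r\n)", b"", data)

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(cleaned)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.unlink(tmp_path)
        raise
    return count
```

Calling `strip_trailing_tabs("MyTestFile.txt")` would rewrite that file and report how many line endings were fixed. One caveat of this design: the temporary file is created with restrictive permissions, so the rewritten file may not keep the permissions of the original.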