sed / awk complex line replacement

I want to replace thousands of lines like this, but I'm having a hard time making it work. I also have two variables, $time and $date, to use as a condition so that the replacement is not global:

Example: <!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

To replace: <!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NaN</v></row>

I tried with sed:

sed -i '<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>5.0000000000e+00<\/v><\/row>.*/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>NaN<\/v><\/row>/' dump_teste.xml 

sed: -e expression #1, char 1: unknown command: `<'

Also with awk:

awk '{gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1' tmp.txt
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                     ^ syntax error
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                               ^ syntax error
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                         ^ syntax error
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                      ^ syntax error
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                                ^ unterminated string
awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                                ^ syntax error

or

awk '{sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1' tmp.txt
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                    ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                              ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                        ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                     ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                               ^ unterminated string
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                               ^ syntax error
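In awk, a `/` inside a `/regex/` literal ends the regex early, and the unescaped `/` in `WEST / 1594206000` is what sets off the syntax errors in both attempts above. Escaping it is not enough on its own: in an awk regex the `+` in `e+00` is a quantifier and `.` matches any character, so the pattern would still not match the literal text. A minimal sketch that sidesteps regex entirely by matching the exact string with `index()`; the scratch file and the variable names `old` and `new` are illustrative:

```shell
# Scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>' > tmp.txt

old='<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>'
new='<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>'

# index() searches for literal text, so "/", "." and "+" need no escaping.
# (Values passed with -v have backslash escapes processed; these contain none.)
awk -v old="$old" -v new="$new" '
  {
    i = index($0, old)
    if (i > 0)  # splice the replacement in place of the matched text
      $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
    print
  }' tmp.txt > tmp.out
cat tmp.out
```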
2 Answer(s)

As per your requirement, below is a command that replaces the number with NAN on every line of the file that falls within the time range, regardless of the order in which the lines appear.

Set the date, from and till variables, then run the command below:

while IFS= read -r in; do
  out="$(echo "$in" | awk '{print $2}')" &&
  outtime="$(echo "$in" | awk '{print $3}')" &&
  sed -i "/"$out" "$outtime"/ s/<v>.*<\/v>/<v>NAN<\/v>/" dumpteste.xml
done <<< "$(sort -k3 -k4 -k5 dumpteste.xml | awk -v date="$date" -v from="$from" -v till="$till" '$2 == date && $3 >= from && $3 <= till' | tac)"

Example of the command above:

cat dumpteste.xml         #original file
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:48:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 17:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

date=2020-07-06 from=16:45:00 till=17:45:00

Output:

cat dumpteste.xml      #after change
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 16:48:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 17:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

For the date 2020-07-06 and the time range 16:45:00-17:45:00, the lines with times 16:45, 16:47, 16:48 and 17:45 are changed, while the 17:47 line is not. The line with time 16:45 but date 2020-08-06 is not changed because its date does not match.
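The selection in these commands relies on awk's default whitespace field splitting: `$1` is `<!--`, `$2` is the date and `$3` is the time, and because the times are zero-padded `HH:MM:SS` values they compare correctly as plain strings. A small self-contained demo of just the filter step, assuming three hypothetical sample lines written to a scratch directory:

```shell
# Scratch directory with three sample lines (hypothetical subset of dumpteste.xml).
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  > dumpteste.xml

date=2020-07-06 from=16:45:00 till=17:45:00

# Default splitting: $1 is "<!--", $2 the date, $3 the time.
# Non-numeric-looking values such as "16:45:00" are compared as strings.
awk -v date="$date" -v from="$from" -v till="$till" \
  '$2 == date && $3 >= from && $3 <= till { print $2, $3 }' dumpteste.xml > selected.txt
cat selected.txt
```

Only the first line is selected: the second is outside the time window and the third has a different date.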

If you also need a date range, define four variables (date, enddate, from and till) and run the command below:

date=2020-07-06 enddate=2020-08-06 from=16:45:00 till=17:45:00
while IFS= read -r in; do
  out="$(echo "$in" | awk '{print $2}')" &&
  outtime="$(echo "$in" | awk '{print $3}')" &&
  sed -i "/"$out" "$outtime"/ s/<v>.*<\/v>/<v>NAN<\/v>/" dumpteste.xml
done <<< "$(sort -k3 -k4 -k5 dumpteste.xml | awk -v date="$date" -v from="$from" -v till="$till" -v enddate="$enddate" '$2 >= date && $2 <= enddate && $3 >= from && $3 <= till' | tac)"

The command above changes the values on all lines whose date and time fall within the given ranges. Hope this helps.
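Note that the date-range condition above applies the from/till window separately on each day between date and enddate; a line at 2020-07-06 17:47:00, for example, is not changed. If one continuous window from a start date and time to an end date and time is wanted instead, one possible sketch compares the combined "date time" text, which works because zero-padded `YYYY-MM-DD HH:MM:SS` strings sort chronologically. The variable names `start` and `stop` and the sample lines are illustrative:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '<!-- 2020-07-06 16:00:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-08-06 18:00:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  > dumpteste.xml

start='2020-07-06 16:45:00' stop='2020-08-06 17:45:00'

# Build "YYYY-MM-DD HH:MM:SS" from fields $2 and $3 and compare it as a string
# against both ends of a single continuous window.
awk -v start="$start" -v stop="$stop" '
  { ts = $2 " " $3 }
  ts >= start && ts <= stop { sub(/<v>[^<]*<\/v>/, "<v>NaN</v>") }
  { print }
' dumpteste.xml > out.xml
cat out.xml
```

Here the 16:45 and 17:47 lines on 2020-07-06 are changed, while the line before the start and the line after the end keep their values.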

Shorter version:

1) With a time range

date=2020-07-06 && from=16:45:00 && till=17:45:00 && gawk -i inplace -v date="$date" -v from="$from" -v till="$till" '$2 == date && $3 >= from && $3 <= till {gsub(/<v>[^<]*/, "<v>nan")}1' dumpteste.xml

2) With both a date range and a time range

date=2020-07-06 && from=16:45:00 && till=17:45:00 && enddate=2020-08-06 && awk -v date="$date" -v from="$from" -v till="$till" -v enddate="$enddate" '$2 >= date && $2 <= enddate && $3 >= from && $3 <= till {gsub(/<v>[^<]*/, "<v>nan")}1' dumpteste.xml
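`-i inplace` in 1) is a GNU awk extension (gawk 4.1 and later), and 2) as written prints the result rather than saving it. With a POSIX awk, a minimal sketch saves the result by writing to a temporary file and moving it over the original only if awk succeeded. It uses the time-range filter from 1), hypothetical sample lines, and writes `NaN` as in the question:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  > dumpteste.xml

date=2020-07-06 from=16:45:00 till=17:45:00

# Matching lines get their <v> value replaced; every line is printed.
# The temporary file replaces the original only if awk exits successfully.
awk -v date="$date" -v from="$from" -v till="$till" '
  $2 == date && $3 >= from && $3 <= till { sub(/<v>[^<]*<\/v>/, "<v>NaN</v>") }
  { print }
' dumpteste.xml > dumpteste.xml.tmp && mv dumpteste.xml.tmp dumpteste.xml
cat dumpteste.xml
```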

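Your command is missing the s (substitute) command at the start of the sed script, which is why it fails: sed reads the first character, <, as a command name and reports "unknown command".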

sed -i 's/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>5.0000000000e+00<\/v><\/row>.*/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>NaN<\/v><\/row>/g' dumpteste.xml 

or

sed -i 's/<v>.*<\/v>/<v>NAN<\/v>/g' dumpteste.xml 
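Two possible refinements, as a sketch on hypothetical sample files. Choosing `|` as the `s` delimiter means the `/` characters in the line need no escaping (the `.` is escaped so it matches only a literal dot; `+` is already literal in the basic regular expressions sed uses by default). Using `[^<]*` instead of `.*` keeps one replacement from running across several `<v>` elements if a row ever held more than one; the sample data in the question has one per row, so `row.xml` below is hypothetical. The commands write to separate files; with GNU sed, `-i` would edit in place instead:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' > dumpteste.xml
printf '%s\n' '<row><v>1.0e+00</v><v>2.0e+00</v></row>' > row.xml   # hypothetical multi-value row

# "|" as the s delimiter: the "/" characters are ordinary text.
sed 's|<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5\.0000000000e+00</v></row>|<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NaN</v></row>|' dumpteste.xml > literal.out

# Greedy .* spans from the first <v> to the last </v>, merging both values.
sed 's/<v>.*<\/v>/<v>NaN<\/v>/g' row.xml > greedy.out
# [^<]* cannot cross into the next tag, so each <v> value is replaced separately.
sed 's/<v>[^<]*<\/v>/<v>NaN<\/v>/g' row.xml > bounded.out

cat literal.out greedy.out bounded.out
```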

You have two variables, $date and $time, and want to apply sed only to lines containing those values. Do the following:

sed "/$date $time .*<\/row>/ s/<v>.*<\/v>/<v>NAN<\/v>/g" dumpteste.xml

In the command above, if the line is

<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

and the date and time variables are date='2020-07-06' and time='16:45:00', then only the line containing that date and time will be edited by sed. Did this solve your problem?
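A self-contained run of the same idea, as a sketch: two hypothetical sample lines, the address `/ $date $time /` to select the line, and `[^<]*` so only the value inside `<v>...</v>` is replaced. The result goes to a separate file here; with GNU sed, `-i` would edit dumpteste.xml in place instead:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  '<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>' \
  > dumpteste.xml

date='2020-07-06' time='16:45:00'

# The shell expands $date and $time inside the double quotes; the address
# limits the s command to lines containing that exact date and time.
sed "/ $date $time /s/<v>[^<]*<\/v>/<v>NaN<\/v>/" dumpteste.xml > out.xml
cat out.xml
```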
