Remove unwanted text from a .txt file

Question

Justin Rosenthal 2022-1-14

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1629210-remove-unwanted-text-from-a-txt-file

trace.txt

Hello,

I have a .txt file that I want to simplify into a matrix. I only want to keep the numbers in the first column and the keywords in the last column, such as init-hold, pre-leak-test, etc. This means that every line starting with a # should be deleted. I have nearly accomplished this; however, I do not know how to remove the text that follows the comma after hold-end. I have attached the file I am working with, along with my current code. Any help would be greatly appreciated.

trace = 'trace.txt';
str = fileread(trace);
trace = regexp(str,'^(\d+)\.?\d*:[^:]+:\s+([^\n]+)','tokens','lineanchors');
trace = vertcat(trace{:});

Answer 1

Voss 2022-1-14

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1629210-remove-unwanted-text-from-a-txt-file#answer_874840

Original code with output:

trace = regexp(fileread('trace.txt'),'^(\d+)\.?\d*:[^:]+:\s+([^\n]+)','tokens','lineanchors');
trace = vertcat(trace{:})
trace = 29×2 cell array
    {'0'     }    {'init-hold←'                                                                                               }
    {'0'     }    {'pre-leak-test←'                                                                                           }
    {'10'    }    {'leak-test←'                                                                                               }
    {'3034'  }    {'evac←'                                                                                                    }
    {'3094'  }    {'pre-ramp←'                                                                                                }
    {'3154'  }    {'ramp←'                                                                                                    }
    {'3180'  }    {'hold←'                                                                                                    }
    {'28381' }    {'hold-end, theta1 = 2.89537933246 +/- 0.00364583333333 hr, theta2 = 6.98260416667 +/- 0.00364583333333 hr←'}
    {'28381' }    {'evac←'                                                                                                    }
    {'28681' }    {'pre-ramp←'                                                                                                }
    {'28741' }    {'ramp←'                                                                                                    }
    {'28764' }    {'hold←'                                                                                                    }
    {'89964' }    {'hold-end, theta1 = 2.92489959614 +/- 0.003125 hr, theta2 = 13.2959027778 +/- 0.003125 hr←'                }
    {'89964' }    {'evac←'                                                                                                    }
    {'90264' }    {'pre-ramp←'                                                                                                }
    {'90324' }    {'ramp←'                                                                                                    }
    {'90347' }    {'hold←'                                                                                                    }
    {'151548'}    {'hold-end, theta1 = 2.47468712338 +/- 0.00319444444444 hr, theta2 = 12.7754861111 +/- 0.00319444444444 hr←'}
    {'151548'}    {'evac←'                                                                                                    }
    {'151848'}    {'pre-ramp←'                                                                                                }
    {'151908'}    {'ramp←'                                                                                                    }
    {'151931'}    {'hold←'                                                                                                    }
    {'213131'}    {'hold-end, theta1 = 2.24675072007 +/- 0.00319444444444 hr, theta2 = 13.2186111111 +/- 0.00319444444444 hr←'}
    {'213131'}    {'evac←'                                                                                                    }
    {'213432'}    {'pre-ramp←'                                                                                                }
    {'213492'}    {'ramp←'                                                                                                    }
    {'213514'}    {'hold←'                                                                                                    }
    {'258874'}    {'hold-end, theta1 = 2.2511608259 +/- 0.00305555555556 hr, theta2 = 11.2265972222 +/- 0.00305555555556 hr←' }
    {'258874'}    {'done←'                                                                                                    }

To make the second token on each line stop at the first comma, add a comma to the negated character class, so that [^\n]+ becomes [^\n,]+. This removes the text that follows the comma after "hold-end":

trace = regexp(fileread('trace.txt'),'^(\d+)\.?\d*:[^:]+:\s+([^\n,]+)','tokens','lineanchors');
trace = vertcat(trace{:})
trace = 29×2 cell array
    {'0'     }    {'init-hold←'    }
    {'0'     }    {'pre-leak-test←'}
    {'10'    }    {'leak-test←'    }
    {'3034'  }    {'evac←'         }
    {'3094'  }    {'pre-ramp←'     }
    {'3154'  }    {'ramp←'         }
    {'3180'  }    {'hold←'         }
    {'28381' }    {'hold-end'      }
    {'28381' }    {'evac←'         }
    {'28681' }    {'pre-ramp←'     }
    {'28741' }    {'ramp←'         }
    {'28764' }    {'hold←'         }
    {'89964' }    {'hold-end'      }
    {'89964' }    {'evac←'         }
    {'90264' }    {'pre-ramp←'     }
    {'90324' }    {'ramp←'         }
    {'90347' }    {'hold←'         }
    {'151548'}    {'hold-end'      }
    {'151548'}    {'evac←'         }
    {'151848'}    {'pre-ramp←'     }
    {'151908'}    {'ramp←'         }
    {'151931'}    {'hold←'         }
    {'213131'}    {'hold-end'      }
    {'213131'}    {'evac←'         }
    {'213432'}    {'pre-ramp←'     }
    {'213492'}    {'ramp←'         }
    {'213514'}    {'hold←'         }
    {'258874'}    {'hold-end'      }
    {'258874'}    {'done←'         }
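
The effect of the extra comma in the character class can be checked without the attached file. The following is a minimal sketch on a made-up sample string; the sample lines, the middle field "INFO", and the variable names sample and tok are assumptions for illustration and are not taken from trace.txt:

```matlab
% Made-up sample in the "number: field: keyword" shape the pattern expects.
% The middle field "INFO" is an assumption; trace.txt may contain something else.
sample = sprintf(['# a comment line\n' ...
                  '10.5: INFO: leak-test\n' ...
                  '28381.0: INFO: hold-end, theta1 = 2.9 hr\n']);

% [^\n,]+ ends the second token at the first newline or comma, whichever comes first.
% Lines starting with # never match because the pattern requires a leading digit.
tok = regexp(sample,'^(\d+)\.?\d*:[^:]+:\s+([^\n,]+)','tokens','lineanchors');
tok = vertcat(tok{:});   % one row per matched line: {number, keyword}
```

With this sample, tok should have two rows, with 'leak-test' and 'hold-end' in the second column and nothing after the comma.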

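The ← at the end of most tokens in the output above is a carriage return (\r) that was captured along with the keyword. To exclude carriage returns from the second token as well, add \r to the character class: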

trace = regexp(fileread('trace.txt'),'^(\d+)\.?\d*:[^:]+:\s+([^\n\r,]+)','tokens','lineanchors');
trace = vertcat(trace{:})
trace = 29×2 cell array
    {'0'     }    {'init-hold'    }
    {'0'     }    {'pre-leak-test'}
    {'10'    }    {'leak-test'    }
    {'3034'  }    {'evac'         }
    {'3094'  }    {'pre-ramp'     }
    {'3154'  }    {'ramp'         }
    {'3180'  }    {'hold'         }
    {'28381' }    {'hold-end'     }
    {'28381' }    {'evac'         }
    {'28681' }    {'pre-ramp'     }
    {'28741' }    {'ramp'         }
    {'28764' }    {'hold'         }
    {'89964' }    {'hold-end'     }
    {'89964' }    {'evac'         }
    {'90264' }    {'pre-ramp'     }
    {'90324' }    {'ramp'         }
    {'90347' }    {'hold'         }
    {'151548'}    {'hold-end'     }
    {'151548'}    {'evac'         }
    {'151848'}    {'pre-ramp'     }
    {'151908'}    {'ramp'         }
    {'151931'}    {'hold'         }
    {'213131'}    {'hold-end'     }
    {'213131'}    {'evac'         }
    {'213432'}    {'pre-ramp'     }
    {'213492'}    {'ramp'         }
    {'213514'}    {'hold'         }
    {'258874'}    {'hold-end'     }
    {'258874'}    {'done'         }
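
As a quick check of where the trailing character comes from, here is a small sketch that assumes the file uses Windows-style \r\n line endings. The sample string and the names withCR, noCR, and lastChar are invented for illustration:

```matlab
% Made-up sample with \r\n line endings (an assumption about how trace.txt
% was saved; the "INFO" field is also invented).
sample = sprintf('10.5: INFO: leak-test\r\n3034.0: INFO: evac\r\n');

withCR = regexp(sample,'^(\d+)\.?\d*:[^:]+:\s+([^\n,]+)','tokens','lineanchors');
noCR   = regexp(sample,'^(\d+)\.?\d*:[^:]+:\s+([^\n\r,]+)','tokens','lineanchors');
withCR = vertcat(withCR{:});
noCR   = vertcat(noCR{:});

% The first pattern keeps the carriage return (character code 13) at the end
% of each keyword; the second pattern stops before it.
lastChar = double(withCR{1,2}(end));

% Alternative: keep the first pattern and trim whitespace afterwards.
trimmed = strtrim(withCR(:,2));
```

Both routes give the same keywords here. Excluding \r in the pattern avoids a second pass, while strtrim would also remove any trailing spaces or tabs.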

To also capture the decimal point and the digits to the right of it in the first token, move \.?\d* inside the first pair of parentheses:

trace = regexp(fileread('trace.txt'),'^(\d+\.?\d*):[^:]+:\s+([^\n\r,]+)','tokens','lineanchors');
trace = vertcat(trace{:})
trace = 29×2 cell array
    {'0.0'      }    {'init-hold'    }
    {'0.25'     }    {'pre-leak-test'}
    {'10.5'     }    {'leak-test'    }
    {'3034.0'   }    {'evac'         }
    {'3094.25'  }    {'pre-ramp'     }
    {'3154.5'   }    {'ramp'         }
    {'3180.75'  }    {'hold'         }
    {'28381.0'  }    {'hold-end'     }
    {'28381.0'  }    {'evac'         }
    {'28681.25' }    {'pre-ramp'     }
    {'28741.5'  }    {'ramp'         }
    {'28764.0'  }    {'hold'         }
    {'89964.25' }    {'hold-end'     }
    {'89964.25' }    {'evac'         }
    {'90264.5'  }    {'pre-ramp'     }
    {'90324.75' }    {'ramp'         }
    {'90347.75' }    {'hold'         }
    {'151548.0' }    {'hold-end'     }
    {'151548.0' }    {'evac'         }
    {'151848.25'}    {'pre-ramp'     }
    {'151908.5' }    {'ramp'         }
    {'151931.5' }    {'hold'         }
    {'213131.75'}    {'hold-end'     }
    {'213131.75'}    {'evac'         }
    {'213432.0' }    {'pre-ramp'     }
    {'213492.25'}    {'ramp'         }
    {'213514.25'}    {'hold'         }
    {'258874.5' }    {'hold-end'     }
    {'258874.5' }    {'done'         }
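
Since the question asks for numbers rather than text in the first column, the tokens can be converted afterwards. This is only a sketch under assumptions: the three rows below are copied from the output above as a stand-in for the full 29×2 cell array, and the names t, label, T, Value, and Keyword are hypothetical:

```matlab
% Stand-in for the full result (first three rows of the output above).
trace = {'0.0','init-hold'; '0.25','pre-leak-test'; '10.5','leak-test'};

t     = str2double(trace(:,1));   % numeric column vector
label = trace(:,2);               % keywords stay as a cell array of char vectors

% A table can hold the numeric and text columns side by side.
% The column names are placeholders, not something the original post defines.
T = table(t, label, 'VariableNames', {'Value','Keyword'});
```

A plain numeric matrix cannot hold the keywords, so a table (or keeping two separate variables) is one way to keep both columns together.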