Comparing phrases in two DataFrames and returning the matching sentences with their order and index

Two datasets, df and df1, hold text split row-wise, one token per row, with each complete sentence ended by a full stop '.'. I want to find the sentences that are present in both and get the rows of the matching sentences, keeping the index of the superset df.

I can only do this when the text is a plain string, not when it is split across rows like this. Could spaCy or another NLP library with a language model help with this?

df:

index  ID-0  ID-1  text
0      4     20    This
1      6     8     is
2      8     6     an
3      12    15    apple
4      29    9     .
5      45    5     The
6      56    8     apple
7      60    10    is
8      62    15    sweet
9      65    2     .
10     66    1     This
11     68    2     is
12     70    6     very
13     73    4     good
14     75    1     fruit
15     76    3     .
16     78    1     I
17     82    0     like
18     90    6     to
19     95    8     eat
20     99    2     apple
21     100   0     .

df1:

idx  text
1    The
2    apple
3    is
4    sweet
5    .
6    I
7    like
8    to
9    eat
10   apple
11   .

output:

index  ID-0  ID-1  text
5      45    5     The
6      56    8     apple
7      60    10    is
8      62    15    sweet
9      65    2     .
16     78    1     I
17     82    0     like
18     90    6     to
19     95    8     eat
20     99    2     apple
21     100   0     .
1 Answer(s)

Should be pretty simple:

df_new = df[df.text.isin(df1.text)] 
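One caveat: isin compares tokens one at a time, so on the sample data it can also keep rows such as "is", "apple" and "." that belong to sentences not present in df1. If whole sentences have to match, a rough sketch along the following lines may be closer to the requested output. It assumes that a row whose text is "." ends a sentence and that stray whitespace around tokens can be stripped; the helper names sentence_ids, group_sentences and rows_in_shared_sentences are made up for this example and are not part of pandas.

```python
import pandas as pd


def sentence_ids(tokens):
    """Number the sentences; a '.' token closes the sentence it belongs to."""
    is_end = tokens.eq(".")
    # Shift by one row so the '.' stays with the sentence it ends,
    # then count the sentence ends seen so far.
    return is_end.shift(fill_value=False).cumsum()


def group_sentences(tokens, ids):
    """Map each sentence number to the tuple of its tokens."""
    grouped = {}
    for token, sid in zip(tokens, ids):
        grouped.setdefault(sid, []).append(token)
    return {sid: tuple(toks) for sid, toks in grouped.items()}


def rows_in_shared_sentences(df, df1, col="text"):
    """Return the rows of df whose whole sentence also occurs in df1."""
    left = df[col].str.strip()    # assumption: tokens may carry stray spaces
    right = df1[col].str.strip()
    left_ids = sentence_ids(left)
    wanted = set(group_sentences(right, sentence_ids(right)).values())
    keep = [sid for sid, sent in group_sentences(left, left_ids).items()
            if sent in wanted]
    return df[left_ids.isin(keep)]


# Sample data from the question.
df = pd.DataFrame({
    "ID-0": [4, 6, 8, 12, 29, 45, 56, 60, 62, 65, 66,
             68, 70, 73, 75, 76, 78, 82, 90, 95, 99, 100],
    "ID-1": [20, 8, 6, 15, 9, 5, 8, 10, 15, 2, 1,
             2, 6, 4, 1, 3, 1, 0, 6, 8, 2, 0],
    "text": ("This is an apple . The apple is sweet . "
             "This is very good fruit . I like to eat apple .").split(),
})
df1 = pd.DataFrame(
    {"text": "The apple is sweet . I like to eat apple .".split()},
    index=range(1, 12),
)

result = rows_in_shared_sentences(df, df1)
print(result)
```

On the sample data this keeps the rows with index 5 to 9 and 16 to 21, and the original index and ID columns of df carry through unchanged. Sentences are compared as exact token tuples, so differences in case or tokenisation would need normalising first.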