Sitz Blogz - 1 month ago
Python Question

Split column and format the column values

I am trying to format the data in one column. I can find ways to split the column, since its values are separated by a comma (,), but I am not able to format the result as shown in the expected output.

Input

TITLE,Issn
NATURE REVIEWS MOLECULAR CELL BIOLOGY,"ISSN 14710072, 14710080"
ANNUAL REVIEW OF IMMUNOLOGY,"ISSN 07320582, 15453278"
NATURE REVIEWS GENETICS,"ISSN 14710056, 14710064"
CA - A CANCER JOURNAL FOR CLINICIANS,"ISSN 15424863, 00079235"
CELL,"ISSN 00928674, 10974172"
ANNUAL REVIEW OF ASTRONOMY AND ASTROPHYSICS,"ISSN 15454282, 00664146"
NATURE REVIEWS IMMUNOLOGY,"ISSN 14741741, 14741733"
NATURE REVIEWS CANCER,ISSN 1474175X
ANNUAL REVIEW OF BIOCHEMISTRY,"ISSN 15454509, 00664154"
REVIEWS OF MODERN PHYSICS,"ISSN 00346861, 15390756"
NATURE GENETICS,ISSN 10614036



  1. Split the Issn column into two columns at the comma (,).

  2. Delete the word ISSN from the column, leaving only the numbers.

  3. Put a hyphen (-) after the first 4 digits of each number.



Expected output is

TITLE,Issn
NATURE REVIEWS MOLECULAR CELL BIOLOGY,1471-0072, 1471-0080
ANNUAL REVIEW OF IMMUNOLOGY,0732-0582, 1545-3278
NATURE REVIEWS GENETICS,1471-0056, 1471-0064
CA - A CANCER JOURNAL FOR CLINICIANS,1542-4863, 0007-9235
CELL,0092-8674, 1097-4172
ANNUAL REVIEW OF ASTRONOMY AND ASTROPHYSICS,1545-4282, 0066-4146
NATURE REVIEWS IMMUNOLOGY,1474-1741, 1474-1733
NATURE REVIEWS CANCER, 1474-175X
ANNUAL REVIEW OF BIOCHEMISTRY,1545-4509, 0066-4154
REVIEWS OF MODERN PHYSICS,0034-6861, 1539-0756
NATURE GENETICS,1061-4036


Any suggestions using pandas are appreciated. Thanks in advance.

Answer

First, split out the numbers and add hyphens to them. Assuming df already holds the CSV as a DataFrame, the handy map method of the Issn column does both:

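# Drop the leading 'ISSN ' label and split each cell into a list of numbers.
df_split_num = df['Issn'].map(lambda x: x.split('ISSN ')[1].split(', '))
# Insert a hyphen after the first four characters of every number.
df_dash_num = df_split_num.map(lambda x: [num[:4] + '-' + num[4:] for num in x])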
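To see what those two map calls do to a single cell, here is a minimal sketch in plain Python. The helper name format_issns is made up for illustration and is not part of pandas; it simply combines the two lambdas into one function.

```python
def format_issns(cell):
    """Turn one raw Issn cell into a list of hyphenated ISSNs."""
    # Drop the leading 'ISSN ' label, then split on the ', ' separator.
    numbers = cell.split('ISSN ')[1].split(', ')
    # Insert a hyphen between the first four characters and the rest.
    return [num[:4] + '-' + num[4:] for num in numbers]


print(format_issns('ISSN 00928674, 10974172'))  # ['0092-8674', '1097-4172']
print(format_issns('ISSN 1474175X'))            # ['1474-175X']
```

A function like this could also be passed straight to map, for example df['Issn'].map(format_issns), in place of the two lambdas.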

Next, create a new data frame from the split-out ISSN numbers and place it back into the original data frame. Rows with only one ISSN get a missing value in the second column:

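# Build the new columns on df's own index so the assignment below aligns row by row.
df_split_issn = pd.DataFrame(data=list(df_dash_num), columns=['Issn1', 'Issn2'], index=df.index)
df[['Issn1', 'Issn2']] = df_split_issn
# Drop the original combined column.
del df['Issn']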
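As an alternative to map, the same result can probably be reached with the vectorized .str methods. The following is only a sketch under a few assumptions: the data is read from an in-memory string holding two rows copied from the question, and the output column names Issn1 and Issn2 follow the answer above.

```python
import io

import pandas as pd

# Two rows copied from the question's input; the second has a single ISSN.
raw = '''TITLE,Issn
CELL,"ISSN 00928674, 10974172"
NATURE REVIEWS CANCER,ISSN 1474175X
'''
df = pd.read_csv(io.StringIO(raw))

# Strip the 'ISSN ' label, then split on ', ' into separate columns.
parts = (
    df['Issn']
    .str.replace('ISSN ', '', regex=False)
    .str.split(', ', expand=True)
)
# Guarantee two columns even if no row happens to carry a second ISSN.
parts = parts.reindex(columns=[0, 1])

# Insert a hyphen after the first four characters; missing values stay missing.
for col in parts.columns:
    parts[col] = parts[col].str.replace(r'^(.{4})(.+)$', r'\1-\2', regex=True)

parts.columns = ['Issn1', 'Issn2']
df = df.drop(columns='Issn').join(parts)
print(df.to_csv(index=False))
```

The second ISSN of a single-ISSN row is left as a missing value, which to_csv writes as an empty field.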