Ebuka Ebuka - 1 month ago 10
Python Question

Fastest way to compare list items against a chunk of text or string in Python

I have a Python list:

['Yahoo Search - Yahoo Search Marketing', 'Yahoo Site Explorer - Yahoo! Search Marketing Ambassador', 'Yamaha - Yamaha DM2000', 'Yamaha Digital Consoles - Yamaha M7CL', 'Yamaha PM5D - Yammer', 'YAML - YMS', 'Yantra - Yard', 'Yard Management - Yard Signs', 'Yard Work - Yardi', 'Yardi Enterprise - Yardi Property Management', 'Yardi Property Management Software - Yardi Voyager', 'Yarn - Yaskawa', 'Year End Accounts', 'Year End Accounts - Year End Close', 'Year End Closing - Year-end', 'Year-end Close', 'Year-end Close - Year-end Closing', 'Yearbook - Yearly', 'Yeast - Yeast two-hybrid', 'Yellow Belt - Yellow Book', 'Yellow Pages - Yelp', 'Yeoman - Yiddish', 'Yield - Yield Enhancement', 'Yield Management', 'Yield Management - Yields', 'Yieldstar - Yii', 'Yin Yoga - Yodeling', 'Yoga', 'Yoga - Yoga', 'Yoga Instruction - Yoga Nidra', 'Yogurt - Yoruba', 'Young Adult - Young Adult Literature', 'Young Adult Services - Young Adults', 'Young Adults', 'Young People - Young Professionals', 'YourKit - Yourdon', 'Youth Activism - Youth Advocacy', 'Youth At Risk - Youth Culture', 'Youth Development', 'Youth Development - Youth Education', 'Youth Empowerment - Youth Engagement', 'Youth Entrepreneurship - Youth Groups', 'Youth Justice - Youth Leadership', 'Youth Leadership Training - Youth Marketing', 'Youth Media - Youth Mentoring', 'Youth Mentoring', 'Youth Ministry', 'Youth Ministry - Youth Organizations', 'Youth Outreach - Youth Participation', 'Youth Programming - Youth Programs', 'Youth Services - Youth Work', 'YouTube', 'YouTube - YouTube API', 'YSlow - YUI Library', 'YUM - Yacc', 'Z-Print', 'Z-Wave', 'Z/OS', 'Z/VM', 'Z1', 'Z1U', 'Z7', 'Z80', 'Zabbix', 'Zachman', 'Zainet', 'Zambia', 'ZBrush', 'ZEBB', 'Zebra', 'Zebrafish', 'Zedo', 'Zeiss', 'ZEKE', 'Zemax', 'Zen', 'Zen Shiatsu', 'ZenCart', 'Zend', 'Zend Certified Engineer', 'Zend Framework', 'Zend Server', 'Zend Studio', 'Zendesk', 'Zenger Miller', 'Zenoss', 'Zenworks', 'Zeolites', 'Zephyr', 'Zephyr Style Advisor', 'Zero 
Balancing', 'Zero Defects', 'Zero Waste', 'Zero-based Budgeting', 'ZeroMQ', 'Zeta Potential', 'Zetafax', 'Zeus', 'ZFS', 'Zig Ziglar', 'ZigBee', 'Zillow', 'Zilog', 'Zimbabwe', 'Zimbra', 'Zinc', 'Zines', 'Zip', 'Zip Drives', 'ZK', 'Zlib', 'ZLinux', 'Zmap', 'Zoho', 'Zombies', 'Zone Alarm', 'Zoning', 'Zoo', 'Zooarchaeology', 'Zoology', 'Zoom', 'Zoomerang', 'ZoomInfo', 'Zoomla', 'ZoomText', 'Zope', 'ZOS', 'Zotero', 'ZPL', 'Zprint', 'ZSeries', 'Zsh', 'Zuken', 'Zultys', 'Zulu', 'Zumba', 'Zumba Instruction', 'Zuora', 'ZVM', 'Zymography', 'Zynx', 'Zyxel']


I would like to match it against a piece of text, e.g.:

"CSS3 , interactive application , elegant web UI's; Exp: 4-8 years; As a web developer at TutorVista,(Pearson Group) you will be involved in building high scalable, rich interactive application and elegant web UI's with AJAX, PHP, FLEX and similar technologies.Skills 4+ years, Strong knowledge in PHP, MySQL, OOPS, MVC frameworks like CodeIgniter, Jquery, JavaScript/HTML5/CSS3 ("AJAX") experience.Significant development experience in server-side web programming such as Python, Perl, Shell scripting, Linux/Unix.Preferred Experience in the following areas User Interface Design, Responsive design, Layout and Interaction; Mobile Application Development (Android, Apple)."

What is the fastest way to do this in Python?
Note:


  1. The list can contain more than 1000 items, so I don't want to compare by iterating over the individual items.


Answer

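Use something like this:

>>> import re
>>> r = re.compile('|'.join(map(re.escape, l)), re.IGNORECASE)
>>> r.findall(s)

where l is the list of words and s is the string. re.escape makes sure that any regex metacharacters in the list items (such as '.' or '+') are matched literally, and the single compiled pattern scans the text in one pass instead of once per item.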
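
A slightly fuller sketch of the same idea, in case partial matches are a concern. The names build_matcher, skills and text are made up for illustration, the list is a shortened stand-in for the one in the question, and the longest-first ordering and word-boundary lookarounds are additions that the snippet above does not have:

```python
import re

def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any of the terms."""
    # Longest terms first, so 'Zend Framework' wins over 'Zend' in the alternation.
    ordered = sorted(set(terms), key=len, reverse=True)
    # re.escape keeps any regex metacharacters inside a term literal.
    alternation = '|'.join(re.escape(t) for t in ordered)
    # Lookarounds stop short terms such as 'Zip' from matching inside longer words.
    return re.compile(r'(?<!\w)(?:' + alternation + r')(?!\w)', re.IGNORECASE)

# Hypothetical sample data, much shorter than the real list.
skills = ['Zend', 'Zend Framework', 'Z/OS', 'Zip', 'YouTube']
text = "Worked with zend framework and z/OS; unzipped archives; posted on YouTube."

matcher = build_matcher(skills)
found = matcher.findall(text)
print(found)
```

The pattern is compiled once and can be reused for every text you need to check, which is where most of the saving over a per-item loop would come from.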

You can measure the running time with the timeit module.
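
A minimal sketch of such a timing comparison, assuming a short stand-in list and text (terms and text below are placeholders for l and s, and with_regex and with_loop are hypothetical helper names). The numbers depend heavily on the list size and the text, so treat it as a template for measuring your own data, not as a benchmark result:

```python
import re
import timeit

# Placeholder data; substitute the real list and text.
terms = ['Zend', 'Zend Framework', 'Z/OS', 'Zip', 'YouTube']
text = "Strong knowledge in PHP, Zend Framework and YouTube API. " * 50

# Compile once, outside the timed function.
pattern = re.compile('|'.join(map(re.escape, terms)), re.IGNORECASE)

def with_regex():
    # One pass over the text with the combined pattern.
    return pattern.findall(text)

def with_loop():
    # Baseline: one substring test per list item.
    lowered = text.lower()
    return [t for t in terms if t.lower() in lowered]

# timeit.timeit accepts a callable and returns the total seconds for `number` runs.
regex_seconds = timeit.timeit(with_regex, number=200)
loop_seconds = timeit.timeit(with_loop, number=200)
print(f"regex: {regex_seconds:.4f}s  loop: {loop_seconds:.4f}s")
```

Note that the two functions do not return the same thing: the regex version returns every occurrence found in the text, while the loop returns the list items that occur at least once.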
