Write a Python program to finish a big-data processing task: finding the most frequently used...

Question

Programming

Write a Python program to finish a big-data processing task: finding the most frequently used words on Wikipedia pages.

Running the program generates a list of the distinct words used in the Wikipedia pages, together with the number of occurrences of each word on those pages. The words are sorted by number of occurrences in ascending order. The following is a sample of the output generated for 4 Wikipedia pages.

126 that
128 by
133 as
149 or
160 for
164 is
189 on
191 from
345 to
375 advertising
443 a
473 and
480 in
677 of
1080 the
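
The counting and sorting step described above can be sketched roughly as follows. This is only one possible approach, not the expert's answer: the helper names `count_words` and `format_ascending`, the rule that a "word" is a run of letters, and the optional `top` cutoff are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its occurrence count."""
    # Assumption: a "word" is a run of ASCII letters, optionally with
    # internal apostrophes (e.g. "don't"). The assignment does not define this.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def format_ascending(counts, top=None):
    """Return 'count word' lines sorted by count in ascending order."""
    # Ties are broken alphabetically so the output is deterministic.
    items = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    if top is not None:
        # Keep only the `top` most frequent words (the tail of the list).
        items = items[max(len(items) - top, 0):]
    return [f"{count} {word}" for word, count in items]


sample = "the cat and the dog and the bird"
lines = format_ascending(count_words(sample))
print("\n".join(lines))
```

To combine several pages, the `Counter` objects can be added together with `+` or merged with `Counter.update` before formatting.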

Since there are a huge number of pages in Wikipedia, it is not realistic to analyze all of them in a short time on one machine. In this project, you need to analyze the pages for all Wikipedia entries named with two capital letters. For example, the Wikipedia page for the entry "AC" is https://en.wikipedia.org/wiki/AC . Use the urllib or urllib2 library to download a page.
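
As a rough sketch of this step, the two-letter entry URLs can be enumerated and fetched as shown below. It assumes Python 3, where `urllib.request` takes the place of the Python 2 `urllib2` module. The names `two_letter_urls` and `download`, the `User-Agent` string, and the 10-second timeout are illustrative choices, not part of the assignment.

```python
import itertools
import string
import urllib.request

BASE_URL = "https://en.wikipedia.org/wiki/"


def two_letter_urls():
    """Return the URLs for all 26 * 26 two-capital-letter entries (AA .. ZZ)."""
    return [
        BASE_URL + first + second
        for first, second in itertools.product(string.ascii_uppercase, repeat=2)
    ]


def download(url, timeout=10):
    """Return the page HTML as a string, or None if the request fails."""
    # Assumption: an explicit User-Agent is sent; the value is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "word-count-exercise/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        # Some two-letter entries may not exist, so failures are skipped.
        return None


urls = two_letter_urls()
print(len(urls), urls[2])
```

Calling `download(url)` for each entry in `urls` yields the raw HTML, or `None` for pages that could not be fetched.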


An HTML page contains HTML tags, which should be removed before the analysis. Use the BeautifulSoup library to convert the page from HTML format to plain text.
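
A minimal sketch of the tag-removal step, assuming the `bs4` package is installed and using Python's built-in `html.parser` backend. The helper name `html_to_text` and the decision to drop `<script>` and `<style>` contents are assumptions; the assignment only asks that the tags be removed.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Strip HTML markup and return the visible text as a single string."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style contents are not "words used on the page",
    # so these elements are removed entirely before extracting text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text fragments with spaces so words do not fuse.
    return soup.get_text(separator=" ", strip=True)


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>AC</h1><p>AC may refer to <b>alternating</b> current.</p>"
    "<script>var x = 1;</script></body></html>"
)
text = html_to_text(page)
print(text)
```

The resulting string can then be passed to whatever word-counting routine the program uses.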

Answer & Explanation
from bs4 import BeautifulSoup, Comment
import urllib3
import itertools
import requests

# To install these requirements, run: pip3 install bs4 urllib3 requests

def textToextract(passableValue):
    textDict = {"that": 0, "by": 0, "as": 0, "or": 0, "for": 0, "is": 0, "on": 0, "from": 0, "to": 0}
    # We can add as many strings as we need to search for in the above dictionary.
    textList = textDict.keys()
    # This creates a list of all the keys in textDict.
    # this for loops ...