Scrape Websites

Apr 29, 2000

Reference: https://codeburst.io/how-to-scrape-websites-using-python-for-data-science-effd20dd3648

In this post, we are going to scrape a website to gather data on the top 300 APIs of 2019 from API World. The main reasons for web scraping are that it saves time, avoids manual data gathering, and gives you all the data in a structured form.

Requirements

As I always mention, getting started with web scraping is easy, and it is divided into two simple parts:

1. Using a web scraping tool to make an HTTP request and fetch the raw HTML.

2. Parsing the scraped HTML to extract the important data and store it as JSON.

We are going to use the following Python libraries and tools:

1. BeautifulSoup is a Python library for pulling data out of HTML and XML files.

2. Requests allows you to send HTTP requests very easily.

3. Scrapingdog is a web scraping API that fetches the HTML of a target page for us.

Setup

Our setup is pretty simple: create a folder and install Beautiful Soup and requests by typing the commands below. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder with any name you like. I am using scraping.py.

First, sign up for the Scrapingdog API; it will give you 1,000 free credits. Then import Beautiful Soup and requests in your file, like this:

from bs4 import BeautifulSoup
import requests

Preparing to Scrape

Next, we need to read Scrapingdog's API documentation in order to use it. To keep things simple, we are going to use its most basic API. To explore more options, read the complete documentation; it will give you a clear idea of how this API works. Now, let's scrape API World for its top APIs.
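The basic endpoint used in this post takes two query parameters, `api_key` and `url`, as the request further down shows. As a minimal sketch that assumes only those two parameters, the helper below assembles the request URL with the standard library's `urlencode`, which percent-encodes the target URL so that characters such as `&` or `?` inside it cannot be mistaken for extra parameters of the Scrapingdog request. `build_scrape_url` is a hypothetical helper name, not part of any Scrapingdog SDK.

```python
from urllib.parse import urlencode

# Endpoint taken from the request in this post
SCRAPINGDOG_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_scrape_url(api_key, target_url):
    """Return the Scrapingdog request URL for target_url with a percent-encoded query string."""
    # urlencode escapes ':' and '/' in the target URL (and '&' or '?', if present),
    # so the target cannot be misread as extra parameters of the Scrapingdog request
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPINGDOG_ENDPOINT}?{query}"


print(build_scrape_url("YOUR_KEY", "https://apiworld.co/awards/api-300-top-industry-innovations/"))
# https://api.scrapingdog.com/scrape?api_key=YOUR_KEY&url=https%3A%2F%2Fapiworld.co%2Fawards%2Fapi-300-top-industry-innovations%2F
```

The resulting URL can be passed to `requests.get(...)` in place of a hand-written one. Passing the same dictionary to `requests.get` through its `params=` argument performs equivalent encoding.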

To see how API World structures its data, right-click on an element of interest and select Inspect. This brings up the HTML code, where we can see the element that contains each field.

Since the data is stored in a table, it is straightforward to scrape with just a few lines of code. This makes it a good place to start if you want to familiarize yourself with scraping websites, but bear in mind that it will not always be so simple!

All 300 results are contained in <tr> rows, and they are all visible on a single page. This will not always be the case: when results span many pages, you may need to either increase the number of results displayed per page or loop over all the pages to gather all the information.
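API World does not need pagination, but for sites that do, here is a minimal sketch of looping over pages. The `?page=N` URL scheme, the `fetch_rows` callback, and the rule "stop at the first empty page" are all assumptions for illustration; `scrape_all_pages` is a hypothetical name. Fetching is passed in as a function, so the loop can be tried without any network access.

```python
from typing import Callable, Dict, List


def scrape_all_pages(
    fetch_rows: Callable[[str], List[Dict[str, str]]],
    base_url: str,
    max_pages: int = 50,
) -> List[Dict[str, str]]:
    """Collect rows from page 1, 2, ... until a page returns no rows or max_pages is reached."""
    results: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        # Hypothetical URL scheme: many sites paginate with a ?page=N query parameter
        rows = fetch_rows(f"{base_url}?page={page}")
        if not rows:
            break  # an empty page is taken to mean we are past the last one
        results.extend(rows)
    return results


# Usage with a fake fetcher standing in for "request the page, parse its <tr> rows"
fake_site = {
    "https://example.com/list?page=1": [{"api": "A"}, {"api": "B"}],
    "https://example.com/list?page=2": [{"api": "C"}],
}
print(scrape_all_pages(lambda url: fake_site.get(url, []), "https://example.com/list"))
# [{'api': 'A'}, {'api': 'B'}, {'api': 'C'}]
```

The `max_pages` cap is a safety net in case a site never returns an empty page (some repeat the last page instead).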

So, for now, we will fetch the HTML through the Scrapingdog API and then use BeautifulSoup to build a JSON response containing the company name, API name, and category. Fetching the page takes a single line of code, and I will use requests to make the request.

r = requests.get('https://api.scrapingdog.com/scrape?api_key=<your-api-key>&url=https://apiworld.co/awards/api-300-top-industry-innovations/').text

This gives you the HTML of the target URL. Now, use BeautifulSoup to parse it:

soup = BeautifulSoup(r,'html.parser')

First, we collect all the “tr” elements, because they contain all the data (you can confirm this by right-clicking any API row and selecting Inspect). That can be done with the following Python code:

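allapis = soup.find_all("tr")  # every table row, including the header row
l = {}  # fields of the row currently being processed
u = list()  # one dict per API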

Then we start a loop over every row in “allapis”, one row per API. Inside each row, the “td” tags hold the text of “Company Name”, “API Name” and “Technology Category”, so at the start of the loop we store the row's “td” tags in a separate variable.

for i in range(0, len(allapis)):
    try:
        api = allapis[i].find_all("td")
    except Exception:
        api = None

You will notice that the “td” tags follow a fixed order: in every row, the first “td” tag is the “Company Name”, the second is the “API Name”, and the last one is the “Category”. We will use this logic in our code too.
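Because the order is fixed, the same positional rule can also be written as a mapping from field names to cell positions. The sketch below is an alternative to the three try/except blocks in the loop, not the author's code; `FIELDS` and `row_to_record` are hypothetical names, and the function works on plain strings so it can be tried without fetching the page.

```python
FIELDS = ("company", "api", "category")  # hypothetical field names, in td order


def row_to_record(cells):
    """Map the texts of one row's td cells onto named fields by position."""
    record = dict.fromkeys(FIELDS)  # every field starts as None
    # zip stops at the shorter sequence, so missing cells simply stay None,
    # which mirrors the except branches in the loop
    for name, text in zip(FIELDS, cells):
        record[name] = text.replace("\n", "")
    return record


print(row_to_record(["Amio\n", "Amio", "API Infrastructure"]))
# {'company': 'Amio', 'api': 'Amio', 'category': 'API Infrastructure'}
```

Inside the scraping loop it would be called as `row_to_record([td.text for td in allapis[i].find_all("td")])`.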

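for i in range(0, len(allapis)):
    try:
        api = allapis[i].find_all("td")
    except Exception:
        api = None
    # First td: company name
    try:
        l["company"] = api[0].text.replace("\n", "")
    except (IndexError, TypeError):
        # The row has too few td cells (e.g. the header row) or api is None
        l["company"] = None
    # Second td: API name
    try:
        l["api"] = api[1].text.replace("\n", "")
    except (IndexError, TypeError):
        l["api"] = None
    # Third td: technology category
    try:
        l["category"] = api[2].text.replace("\n", "")
    except (IndexError, TypeError):
        l["category"] = None

    u.append(l)
    l = {}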
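For comparison, here is a more compact version of the same extraction, written as a function so it can be tried on a small HTML string. It is a sketch rather than the author's code, and it behaves slightly differently: it uses BeautifulSoup's `get_text(strip=True)` instead of `replace("\n", "")`, and it skips rows with fewer than three `td` cells (such as the `th` header row) instead of storing `None` values. `parse_api_table` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def parse_api_table(html):
    """Return one {"company", "api", "category"} dict per data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        # get_text(strip=True) trims the whitespace and newlines around each text piece
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) < 3:
            # The header row has <th> cells, so it yields no <td> texts and is skipped
            continue
        records.append({"company": cells[0], "api": cells[1], "category": cells[2]})
    return records


sample = """
<table>
  <tr><th>Company Name</th><th>API Name</th><th>Technology Category</th></tr>
  <tr><td>
    Amio
  </td><td>Amio</td><td>API Infrastructure</td></tr>
</table>
"""
print(parse_api_table(sample))
# [{'company': 'Amio', 'api': 'Amio', 'category': 'API Infrastructure'}]
```

With the page fetched earlier, `u = parse_api_table(r)` would replace the loop, and no header entry would need deleting afterwards.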

Data Cleaning

We used the replace function to remove the newline characters that surround the text inside each cell.
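A minimal sketch of the two cleaning steps described in this section, assuming the records were built as in the loop above: `" ".join(text.split())` collapses every run of whitespace (newlines included) into a single space and trims the ends, so it also catches stray spaces that `replace("\n", "")` leaves behind, and slicing off `records[0]` drops the header row's entry. `clean_text`, `clean_records` and `to_json` are hypothetical names. Unlike the output below, this keeps the spaces inside names such as "Cisco Systems".

```python
import json


def clean_text(text):
    """Collapse every run of whitespace (newlines included) into one space and trim the ends."""
    return " ".join(text.split()) if text is not None else None


def clean_records(records):
    """Drop the header row's entry, then normalize every field of the remaining records."""
    # records[0] comes from the first <tr>, which holds <th> cells instead of <td> cells
    return [{key: clean_text(value) for key, value in rec.items()} for rec in records[1:]]


def to_json(records):
    """Serialize the cleaned records under a "Top 300" key."""
    # ensure_ascii=False keeps names such as "Česká spořitelna" readable instead of \u escapes
    return json.dumps({"Top 300": clean_records(records)}, indent=4, ensure_ascii=False)


raw = [
    {"company": None, "api": None, "category": None},  # entry from the header row
    {"company": "\n Cisco  Systems \n", "api": "Cisco DevNet", "category": "API Infrastructure"},
]
print(to_json(raw))
```

Slicing (`records[1:]`) returns a new list and leaves the original untouched; `del u[0]` would instead remove the header entry in place.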

We delete the first item from the list because the first “tr” tag contains “th” tags instead of “td” tags, so its entry holds no data we need at this point. Finally, when we print the list “u” (wrapped under a “Top 300” key), we get this:

{
    "Top 300": [
        {
            "category": "APIInfrastructure",
            "company": "Amio",
            "api": "Amio"
        },
        {
            "category": "APIInfrastructure",
            "company": "Authlete,Inc.",
            "api": "Authlete"
        },
        {
            "category": "APIInfrastructure",
            "company": "CiscoSystems",
            "api": "CiscoDevNet"
        },
        {
            "category": "APIInfrastructure",
            "company": "Fastly",
            "api": "terrium"
        },
        {
            "category": "APIInfrastructure",
            "company": "Postman",
            "api": "APIDevelopmentEnvironment"
        },
        {
            "category": "APIInfrastructure",
            "company": "TanganyGmbH",
            "api": "WalletasaService"
        },
        {
            "category": "APIManagement",
            "company": "DellBoomi",
            "api": "BoomiAPIManagement"
        },
        {
            "category": "APIManagement",
            "company": "GraviteeSource",
            "api": "Gravitee.ioAPIPlatform"
        },
        {
            "category": "APIManagement",
            "company": "IBM",
            "api": "APIConnect"
        },
        {
            "category": "APIManagement",
            "company": "KongInc.",
            "api": "Kong"
        },
        {
            "category": "APIManagement",
            "company": "LinkApi",
            "api": "APIManagementandIPaaS"
        },
        {
            "category": "APIManagement",
            "company": "MuleSoft",
            "api": "AnypointPlatform"
        },
        {
            "category": "APIManagement",
            "company": "RapidValueSolutions",
            "api": "End-to-endAPIintegrationandmanagementservices"
        },
        {
            "category": "APIManagement",
            "company": "Rebrandly",
            "api": "RebrandlyAPI[v1]"
        },
        {
            "category": "APIManagement",
            "company": "WSO2",
            "api": "WSO2APIManager"
        },
        {
            "category": "APIMiddleware",
            "company": "AloiInc",
            "api": "Aloi"
        },
        {
            "category": "APIMiddleware",
            "company": "APIGATE",
            "api": "APIGATEMint"
        },
        {
            "category": "APIMiddleware",
            "company": "BeAPI",
            "api": "APIChaining"
        },
        {
            "category": "APIMiddleware",
            "company": "Envia.com",
            "api": "EnviaShippingSolutions"
        },
        {
            "category": "APIMiddleware",
            "company": "MailTechnologies,Inc",
            "api": "DocuSendPostalAPI"
        },
        {
            "category": "APIMiddleware",
            "company": "PocketNetworkInc.",
            "api": "PocketNetwork"
        },
        {
            "category": "APIMiddleware",
            "company": "RedHatSoftware,Inc.",
            "api": "RedHatIntegration"
        },
        {
            "category": "APIMiddleware",
            "company": "ScaleDynamics",
            "api": "WarpJSserver"
        },
        {
            "category": "APIMiddleware",
            "company": "Site-Shot",
            "api": "RESTAPI"
        },
        {
            "category": "APIMiddleware",
            "company": "Teapot,LLC",
            "api": "Xilution"
        },
        {
            "category": "APIMiddleware",
            "company": "TheLinuxFoundation",
            "api": "EdgeXFoundry"
        },
        {
            "category": "APIMiddleware",
            "company": "Transposit",
            "api": "Transposit"
        },
        {
            "category": "APISecurity",
            "company": "42Crunch",
            "api": "42CrunchAPISecurityPlatform"
        },
        {
            "category": "APISecurity",
            "company": "Axiomatics",
            "api": "AxiomaticsPolicyServer"
        },
        {
            "category": "APISecurity",
            "company": "CritcalBlue",
            "api": "APPROOV"
        },
        {
            "category": "APISecurity",
            "company": "CryptoMove",
            "api": "CryptoMoveAPIs"
        },
        {
            "category": "APISecurity",
            "company": "CurityAB",
            "api": "CurityIdentityServer"
        },
        {
            "category": "APISecurity",
            "company": "ForumSystems",
            "api": "ForumSentryAPISecurityGateway"
        },
        {
            "category": "APISecurity",
            "company": "FXLabs,inc",
            "api": "APISec"
        },
        {
            "category": "APISecurity",
            "company": "IDFConnect,Inc.",
            "api": "SSO/Rest"
        },
        {
            "category": "APISecurity",
            "company": "monapi.io",
            "api": "IPAddressAnomalyAPI"
        },
        {
            "category": "APISecurity",
            "company": "OneLogin",
            "api": "OneLogin"
        },
        {
            "category": "APISecurity",
            "company": "SoftwareAG",
            "api": "Microgateway"
        },
        {
            "category": "AutomotiveAPIs",
            "company": "Allstate",
            "api": "AllstateRoadsideServicesRescueAPI"
        },
        {
            "category": "AutomotiveAPIs",
            "company": "DaimlerAG",
            "api": "Mercedes-BenzCarData"
        },
        {
            "category": "AutomotiveAPIs",
            "company": "InfiniteLoopDevelopmentLtd",
            "api": "vehicleregistrationapi.com"
        },
        {
            "category": "AutomotiveAPIs",
            "company": "SmartcarInc.",
            "api": "SmartcarAPI"
        },
        {
            "category": "AutomotiveAPIs",
            "company": "SmartMonkey.io",
            "api": "Flake"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Adzerk",
            "api": "AdzerkAdServingAPIs"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "ClickTime",
            "api": "ClickTimeRESTAPIv2"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Cloudmersive",
            "api": "CloudmersiveAPIs"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "CreditReportingServicesLLC",
            "api": "SmartAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "DataDemograph",
            "api": "DataDemograph"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "DigitalOwlLtd",
            "api": "semantictextanalysis"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Disarea,LLC",
            "api": "smartQAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "eBay",
            "api": "eBayDeveloperEcosystem"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "ETNA",
            "api": "ETNATradingAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Feedier",
            "api": "Feedier"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "FlexRule",
            "api": "FlexRuleDecisionasaService"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Guidebook,Inc.",
            "api": "GuidebookOpenAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "HelloSign,aDropboxCompany",
            "api": "HelloSignAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Homebase",
            "api": "HomebasePublicAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Intuit",
            "api": "IntuitQuickBooksplatform:APIsforaccounting,payments,andpayroll"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "MaybeCapital",
            "api": "Kruch"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Medallia",
            "api": "MedalliaExperienceCloud"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Notarize,Inc.",
            "api": "NotarizeBusinessandRealEstateAPIs"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Notificare",
            "api": "Notificare"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Paperplane",
            "api": "Paperplane"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Prisync",
            "api": "PrisyncAPIV2.0"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Proposify",
            "api": "RESTfulAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Quik!",
            "api": "Quik!FormsAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Rossums.r.o.",
            "api": "DocumentManagementAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "rspective",
            "api": "Voucherify"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Saucepos",
            "api": "ChainReactive"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Seametrixsoftware",
            "api": "SeametrixAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Sisense",
            "api": "SisenseAPIs"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "TurnTechnologies",
            "api": "BackgroundCheckAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Typeform",
            "api": "TypeformAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "WingifySoftwarePvtLtd",
            "api": "VWOAPI"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Ximilars.r.o.",
            "api": "Ximilar"
        },
        {
            "category": "BusinessSoftwareAPIs",
            "company": "Zenkit",
            "api": "ZenkitAPI"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "2600Hz",
            "api": "KAZOO"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Agora.io",
            "api": "AgoraVoice&VideoSDK"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Agora.io",
            "api": "Realtimevoice,videoandinteractivestreaming"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Amio",
            "api": "Amio"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Arvia",
            "api": "ARpoweredremotevideoassistance"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Botdelive",
            "api": "PushNotificationand2FAviaWhatsapp,MessengerandTelegram"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "ForkingSoftwareLLC",
            "api": "Mailsac"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "KarixMobilePvtLtd",
            "api": "karix.IO-UnifiedAPIforSMSandWhatsApp"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "karix.io",
            "api": "karix.io"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "KPN",
            "api": "Speechtotext"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "MatchMyThesisIVS",
            "api": "PicturaAPI"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "MicroOceanTechnologiesS/B",
            "api": "MoceanAPI"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Numspy",
            "api": "Numspy"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Nylas",
            "api": "NylasUniversalAPIs"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Ribbon",
            "api": "Kandy"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "SendBird",
            "api": "SendBird"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "sms77.io",
            "api": "SMSAPI"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "TeleSign",
            "api": "TeleSign"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Telnyx",
            "api": "RESTfulJSONAPI"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "TheThingsIndustries",
            "api": "TheThingsNetwork"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Twilio",
            "api": "Twilio"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Vonage",
            "api": "Nexmo,TheVonageAPIPlatform"
        },
        {
            "category": "CommunicationsAPIs",
            "company": "Voxbone",
            "api": "VoiceAPI,SMSAPI,ProgrammableComplianceAPI"
        },
        {
            "category": "DataAPIs",
            "company": "AbacabLtd",
            "api": "BoltronApi"
        },
        {
            "category": "DataAPIs",
            "company": "AmplifyReach",
            "api": "NaturalLanguageUnderstanding(NLU)APIs"
        },
        {
            "category": "DataAPIs",
            "company": "ATTOMDataSolutions",
            "api": "RealEstate,Neighborhood,POIAPIs"
        },
        {
            "category": "DataAPIs",
            "company": "BoggioAnalytics",
            "api": "FootballPredictionAPI"
        },
        {
            "category": "DataAPIs",
            "company": "BoulevardAI",
            "api": "BoulevardForesight"
        },
        {
            "category": "DataAPIs",
            "company": "CDCWONDER",
            "api": "WONDER"
        },
        {
            "category": "DataAPIs",
            "company": "ChompFoodsLLC",
            "api": "Chomp"
        },
        {
            "category": "DataAPIs",
            "company": "Clearout",
            "api": "RESTful"
        },
        {
            "category": "DataAPIs",
            "company": "ClimaCell",
            "api": "MicroWeatherAPI"
        },
        {
            "category": "DataAPIs",
            "company": "CodeLineOy",
            "api": "MACaddressvendorlookup"
        },
        {
            "category": "DataAPIs",
            "company": "ContentSide",
            "api": "ContentSidePlateform"
        },
        {
            "category": "DataAPIs",
            "company": "DataLantern,Inc",
            "api": "DataLantern-dataisthenewAPI"
        },
        {
            "category": "DataAPIs",
            "company": "Datopian",
            "api": "REST"
        },
        {
            "category": "DataAPIs",
            "company": "DIGIrealitys.r.o.",
            "api": "Digireality.czoffers"
        },
        {
            "category": "DataAPIs",
            "company": "Edamam",
            "api": "Foodandnutritiondataplatform"
        },
        {
            "category": "DataAPIs",
            "company": "ElevationAPI",
            "api": "ElevationAPI"
        },
        {
            "category": "DataAPIs",
            "company": "EntityDigitalSportsPvtLtd",
            "api": "Rest"
        },
        {
            "category": "DataAPIs",
            "company": "FakeJSON",
            "api": "FakeJSON"
        },
        {
            "category": "DataAPIs",
            "company": "FoxyAI",
            "api": "FoxyAIAPI"
        },
        {
            "category": "DataAPIs",
            "company": "FullContact",
            "api": "FullContactEnrichAPI"
        },
        {
            "category": "DataAPIs",
            "company": "GeolakeLLC",
            "api": "GeolakeGeocodingAPIService"
        },
        {
            "category": "DataAPIs",
            "company": "Gnews",
            "api": "UnofficialGoogleNewsAPI"
        },
        {
            "category": "DataAPIs",
            "company": "HarvardLibraryInnovationLab",
            "api": "CaselawAccessProjectAPI"
        },
        {
            "category": "DataAPIs",
            "company": "HyperTrack",
            "api": "HyperTrack"
        },
        {
            "category": "DataAPIs",
            "company": "InstituteforSocialResearchandDataInnovation,UofMinnesota",
            "api": "IPUMSAPI"
        },
        {
            "category": "DataAPIs",
            "company": "IntelligenceNode",
            "api": "Infeed"
        },
        {
            "category": "DataAPIs",
            "company": "Interzoid",
            "api": "InterzoidAPIs"
        },
        {
            "category": "DataAPIs",
            "company": "Joursouvres",
            "api": "JSON"
        },
        {
            "category": "DataAPIs",
            "company": "LeadSquaredInc",
            "api": "LeadSquaredAPI"
        },
        {
            "category": "DataAPIs",
            "company": "LoctomeSportsLiveTracking",
            "api": "LoctomeAPIelevationservice"
        },
        {
            "category": "DataAPIs",
            "company": "LoctomeSportsLiveTracking",
            "api": "LoctomeElevationService"
        },
        {
            "category": "DataAPIs",
            "company": "LOTaDATA",
            "api": "CITYDASH.ai"
        },
        {
            "category": "DataAPIs",
            "company": "MakCorps-HotelPriceComparisonAPI",
            "api": "HotelPriceComparisonAPI"
        },
        {
            "category": "DataAPIs",
            "company": "MarkLogic",
            "api": "MarkLogicDataServices"
        },
        {
            "category": "DataAPIs",
            "company": "MaxPlanckInstituteofAnimalBehavior",
            "api": "MovebankRESTAPI"
        },
        {
            "category": "DataAPIs",
            "company": "mopinion",
            "api": "MopinionFeedbackDataAPI"
        },
        {
            "category": "DataAPIs",
            "company": "MovieQuotes",
            "api": "MovieQuotesAPI"
        },
        {
            "category": "DataAPIs",
            "company": "NationalResearchCouncilofItaly-InstituteofAtmosphericPollutionresearch(CNR-IIA)",
            "api": "GEOSSPlatformAPI"
        },
        {
            "category": "DataAPIs",
            "company": "Neobi",
            "api": "NeobiOpenCannabis"
        },
        {
            "category": "DataAPIs",
            "company": "NYCMayor'sOfficeforEconomicOpportunity",
            "api": "TheNYCBenefitsScreeningAPI"
        },
        {
            "category": "DataAPIs",
            "company": "OpenUp",
            "api": "OpenGazettesSouthAfrica"
        },
        {
            "category": "DataAPIs",
            "company": "OpenUp",
            "api": "vulekamali"
        },
        {
            "category": "DataAPIs",
            "company": "Over-UnderDigitalInc.",
            "api": "FootyStatsAPI"
        },
        {
            "category": "DataAPIs",
            "company": "PBDataServicesLLC",
            "api": "UpdateYourList.comRESTAPI"
        },
        {
            "category": "DataAPIs",
            "company": "PickpointioLTD",
            "api": "GeocodingserviceAPI"
        },
        {
            "category": "DataAPIs",
            "company": "PremierLeagueLiveScoresAPI",
            "api": "PremierLeagueLiveScoresAPI"
        },
        {
            "category": "DataAPIs",
            "company": "PUBG",
            "api": "PUBGDeveloperAPI"
        },
        {
            "category": "DataAPIs",
            "company": "RealtyMole",
            "api": "RentEstimateAPI"
        },
        {
            "category": "DataAPIs",
            "company": "RedisLabs",
            "api": "RedisEnterpriseProAPIs"
        },
        {
            "category": "DataAPIs",
            "company": "RoaringAppsAB",
            "api": "REST"
        },
        {
            "category": "DataAPIs",
            "company": "ScoreBat",
            "api": "ScoreBat"
        },
        {
            "category": "DataAPIs",
            "company": "scorelab",
            "api": "APIglobalwinescore"
        },
        {
            "category": "DataAPIs",
            "company": "ScraperAPI",
            "api": "ScraperAPI"
        },
        {
            "category": "DataAPIs",
            "company": "SearoutesS.A.S",
            "api": "searoutes.com"
        },
        {
            "category": "DataAPIs",
            "company": "SEOReviewTools",
            "api": "SEOContentAnalysisAPI"
        },
        {
            "category": "DataAPIs",
            "company": "SkimTechnologies",
            "api": "SkimEngine"
        },
        {
            "category": "DataAPIs",
            "company": "SocialAnimal",
            "api": "MostSharedContent/NewsAPI,InfluencerSearchAPI,ShareCountAPI"
        },
        {
            "category": "DataAPIs",
            "company": "SunsetWx",
            "api": "Sunburst"
        },
        {
            "category": "DataAPIs",
            "company": "SzymonDukla",
            "api": "HolidayAPI"
        },
        {
            "category": "DataAPIs",
            "company": "TheSensibleCodeCompany",
            "api": "PDFtableextractionAPI"
        },
        {
            "category": "DataAPIs",
            "company": "TheDataDB",
            "api": "TheCocktailDB"
        },
        {
            "category": "DataAPIs",
            "company": "TisaneLabs",
            "api": "TisaneAPI"
        },
        {
            "category": "DataAPIs",
            "company": "Tripomatics.r.o.",
            "api": "SygicTravelAPI"
        },
        {
            "category": "DataAPIs",
            "company": "WanderingLeafStudiosLLC",
            "api": "OpenBreweryDB"
        },
        {
            "category": "DataAPIs",
            "company": "WeatherbitLLC",
            "api": "WeatherAPI"
        },
        {
            "category": "DataAPIs",
            "company": "WordnikSociety",
            "api": "theWordnikAPI"
        },
        {
            "category": "DataAPIs",
            "company": "Xooa",
            "api": "XooaAPI"
        },
        {
            "category": "DevOpsAPIs",
            "company": "Arcentry,Inc.",
            "api": "Arcentry-DiagrammingAPI"
        },
        {
            "category": "DevOpsAPIs",
            "company": "CircleCI",
            "api": "CircleCIAPI"
        },
        {
            "category": "DevOpsAPIs",
            "company": "OhDear!",
            "api": "OhDear!API"
        },
        {
            "category": "DevOpsAPIs",
            "company": "PagerDuty",
            "api": "PagerDuty"
        },
        {
            "category": "DevOpsAPIs",
            "company": "Parasoft",
            "api": "ParasoftSOAtest"
        },
        {
            "category": "DevOpsAPIs",
            "company": "StackPath",
            "api": "EdgeInfrastructureAPIs"
        },
        {
            "category": "DevOpsAPIs",
            "company": "Tier1app",
            "api": "CrashanalysisAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Activeledger",
            "api": "Activeledger/Activecore"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "AiyoLabs",
            "api": "FlockSendConnect"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "BitcoinAverage",
            "api": "BitcoinAverageEnterpriseWebsocketAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "ClustTechnologies",
            "api": "ClustAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "echoAR,Inc.",
            "api": "echoAR"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Kaleido",
            "api": "KaleidoAdministrativeAPI&KaleidoDeveloperAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Kloudless",
            "api": "KloudlessUnifiedAPIs"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "LandeCost.io",
            "api": "LandedCostCalculatorAPI/HSCodeSearchAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "MavatarTechnologiesInc.",
            "api": "mCartomnichannelmarketplaceandaffiliatesalesPaaS"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Moovit",
            "api": "MoovitTransitAPIs"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "soajs",
            "api": "soajs"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Sterling",
            "api": "SterlingBackgroundScreening&IdentityAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "VerifileLimited",
            "api": "VerifileGlobalBackgroundCheckAPI"
        },
        {
            "category": "EnterpriseAPIs",
            "company": "Voucherify-rspective",
            "api": "Voucherify"
        },
        {
            "category": "FinanceAPIs",
            "company": "BraveNewCoin",
            "api": "BNCCryptoDataAPI's"
        },
        {
            "category": "FinanceAPIs",
            "company": "Českáspořitelna",
            "api": "OpenBanking"
        },
        {
            "category": "FinanceAPIs",
            "company": "CoinrankingB.V.",
            "api": "TheCoinrankingAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "DeBetaalfabriek",
            "api": "IBAN-API"
        },
        {
            "category": "FinanceAPIs",
            "company": "DeutscheBankAG",
            "api": "DeutscheBankAPIProgram"
        },
        {
            "category": "FinanceAPIs",
            "company": "FactSet",
            "api": "FactSet:Developer"
        },
        {
            "category": "FinanceAPIs",
            "company": "FinancialModelingPrep",
            "api": "FinancialModelingPrep"
        },
        {
            "category": "FinanceAPIs",
            "company": "FinbourneTechnology",
            "api": "LUSID"
        },
        {
            "category": "FinanceAPIs",
            "company": "Finicity",
            "api": "TradestreamandUltraFICO"
        },
        {
            "category": "FinanceAPIs",
            "company": "HavenLife",
            "api": "HavenLifetermlifeinsuranceAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Hydrogen",
            "api": "HydrogenAtom"
        },
        {
            "category": "FinanceAPIs",
            "company": "Intrinio",
            "api": "IntrinioFinancialDataAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "KalendariumLLC",
            "api": "EarningsCalendar"
        },
        {
            "category": "FinanceAPIs",
            "company": "KuveytTürkParticipationBank",
            "api": "ASP.NETWebAPI2"
        },
        {
            "category": "FinanceAPIs",
            "company": "MutualFundAPI",
            "api": "MutualFundAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Nomics",
            "api": "Nomics'CryptoMarketDataAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Nordea",
            "api": "FXMarketOrderAPI,FXListedRatesAPI,bothbuiltonRestAPItechnology"
        },
        {
            "category": "FinanceAPIs",
            "company": "OCBC",
            "api": "Connect2OCBC"
        },
        {
            "category": "FinanceAPIs",
            "company": "PayJoyInc",
            "api": "LockAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Shrimpy",
            "api": "ShrimpyUniversalCryptoExchangeTradingAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "TaxJar",
            "api": "TaxJarSmartCalcsAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Totle",
            "api": "TotleAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "TradeStation",
            "api": "TradeStationWebAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "Xignite",
            "api": "MarketDataCloud"
        },
        {
            "category": "FinanceAPIs",
            "company": "Yapily",
            "api": "YapilyAPI"
        },
        {
            "category": "FinanceAPIs",
            "company": "YouNeedaBudget(YNAB)",
            "api": "TheYNABAPI"
        },
        {
            "category": "HealthAPIs",
            "company": "CanIEatItLimited",
            "api": "CanIEatIt?ProductandBarcodeAPI"
        },
        {
            "category": "HealthAPIs",
            "company": "Caremerge",
            "api": "CaremergeAPI"
        },
        {
            "category": "HealthAPIs",
            "company": "eHealthMeInc",
            "api": "eHealthMeAPI"
        },
        {
            "category": "HealthAPIs",
            "company": "PersonalRemedies",
            "api": "PersonalRemediesAPI"
        },
        {
            "category": "HealthAPIs",
            "company": "SikkaSoftwareCorp",
            "api": "SikkaONEAPI"
        },
        {
            "category": "HomeAPIs",
            "company": "Allow2",
            "api": "Allow2"
        },
        {
            "category": "HomeAPIs",
            "company": "RealtyMole",
            "api": "RealtyMolePropertyAPI"
        },
        {
            "category": "IoTAPIs",
            "company": "BSHHausgeräteGmbH",
            "api": "HomeConnect"
        },
        {
            "category": "IoTAPIs",
            "company": "SoundHoundInc.",
            "api": "Houndify"
        },
        {
            "category": "IoTAPIs",
            "company": "Temboo",
            "api": "APIToolkit&KosmosIoTSystem"
        },
        {
            "category": "MediaAPIs",
            "company": "Adobe",
            "api": "AdobeXDPlatform"
        },
        {
            "category": "MediaAPIs",
            "company": "BakuageCo.,Ltd.",
            "api": "AIMastering"
        },
        {
            "category": "MediaAPIs",
            "company": "BrighterToolsLtd",
            "api": "MediaMarkup"
        },
        {
            "category": "MediaAPIs",
            "company": "Cloudinary",
            "api": "CloudinaryMediaManagementAPI"
        },
        {
            "category": "MediaAPIs",
            "company": "Frame.io",
            "api": "Frame.ioDeveloperPlatform"
        },
        {
            "category": "MediaAPIs",
            "company": "GraphQL360",
            "api": "GraphQL360"
        },
        {
            "category": "MediaAPIs",
            "company": "InternetVideoArchive",
            "api": "Entertainment"
        },
        {
            "category": "MediaAPIs",
            "company": "LoremPicsum",
            "api": "LoremPicsum"
        },
        {
            "category": "MediaAPIs",
            "company": "MoodMe",
            "api": "FaceInsights"
        },
        {
            "category": "MediaAPIs",
            "company": "OpenShotStudios,LLC",
            "api": "OpenShotVideoEditingCloudAPI"
        },
        {
            "category": "MediaAPIs",
            "company": "PandaGeneralTrading",
            "api": "EthiopianMovieDatabases"
        },
        {
            "category": "MediaAPIs",
            "company": "Rocketium",
            "api": "RocketiumVideoAPI"
        },
        {
            "category": "MediaAPIs",
            "company": "Storyblocks",
            "api": "StoryblocksAPI"
        },
        {
            "category": "MediaAPIs",
            "company": "Svrf",
            "api": "SvrfAPI"
        },
        {
            "category": "MediaAPIs",
            "company": "Ziggeo",
            "api": "ZiggeoAPI"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "BackendBox",
            "api": "BackendBox"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "DILLILABSLLC",
            "api": "DilliEmailValidationAPI(DEVA)"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "G-SquareSolutionsPvt.Ltd.",
            "api": "bigdator/textrator"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "Marqeta",
            "api": "MarqetaDiVAAPI"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "Rasterwise,LLC.",
            "api": "GetScreenshot"
        },
        {
            "category": "MicroservicesAPIs",
            "company": "TechfabricLLC",
            "api": "MicroservicesandRESTfulAPIs"
        },
        {
            "category": "Other:AIMiddleware",
            "company": "Intento,Inc.",
            "api": "IntentoAIMiddleware"
        },
        {
            "category": "Other:BlockchainAPI",
            "company": "FactomInc",
            "api": "HarmonyConnect"
        },
        {
            "category": "Other:BlockchainAPIs",
            "company": "VizLoreLLC",
            "api": "ChainRider"
        },
        {
            "category": "Other:CAPTCHASolverAPI",
            "company": "CAPTCHAs.IO",
            "api": "CAPTCHAs.IOOCR"
        },
        {
            "category": "Other:ContentManagementAPI",
            "company": "CrafterSoftware",
            "api": "CrafterCMSGraphQLServer"
        },
        {
            "category": "Other:CyberSecurity-DataAnalysis&Analytics",
            "company": "PacketTotalLLC",
            "api": "StaticNetworkAnalysis&AnalyticsEngine"
        },
        {
            "category": "Other:DataAPIs,MediaAPIs,HealthAPIs,FinanceAPIs,EnterpriseAPIs,",
            "company": "SummarizeBot",
            "api": "SummarizeBotAPIs"
        },
        {
            "category": "Other:DLTIOTATangleAPIforPayment/IOT/Data",
            "company": "deliontechnologies",
            "api": "delion.io"
        },
        {
            "category": "Other:E-CommerceAPIs",
            "company": "VIOLET",
            "api": "VIOLETAPI"
        },
        {
            "category": "Other:eCommerceAPI",
            "company": "Nexway",
            "api": "MONETIZE&CONNECT"
        },
        {
            "category": "Other:ElectronicsignatureAPI",
            "company": "SignRequest",
            "api": "SignRequestAPI"
        },
        {
            "category": "Other:Extensionsandintegrations",
            "company": "Sketch",
            "api": "Sketch"
        },
        {
            "category": "Other:GreentechAPI",
            "company": "Cloverly",
            "api": "CloverlyAPI"
        },
        {
            "category": "Other:History",
            "company": "VedicAPIs",
            "api": "VedicAPIs"
        },
        {
            "category": "Other:HospitalityandtravelAPI",
            "company": "Zodomus",
            "api": "Zodomus"
        },
        {
            "category": "Other:IdentityandUserManagement",
            "company": "FusionAuth",
            "api": "FusionAuth"
        },
        {
            "category": "Other:IdentityVerification/ComplianceAPIs",
            "company": "Trulioo",
            "api": "GlobalGateway"
        },
        {
            "category": "Other:InsuranceAPIs",
            "company": "CoverWallet",
            "api": "CoverWalletAPI"
        },
        {
            "category": "Other:IPGeolocationandThreatDataAPI",
            "company": "Ipregistry",
            "api": "Ipregistry"
        },
        {
            "category": "Other:LocationAPIs",
            "company": "Foursquare",
            "api": "PlacesAPI"
        },
        {
            "category": "Other:LocationAPIs",
            "company": "TomTom",
            "api": "TomTomMapsAPIs"
        },
        {
            "category": "Other:MachineLearning-TextAnalyticsAPIs",
            "company": "Converseon",
            "api": "Conversus.AI"
        },
        {
            "category": "Other:MachineLearningAPIHosting",
            "company": "Algorithmia",
            "api": "Algorithmia"
        },
        {
            "category": "Other:MappingAPI",
            "company": "TargomoGmbH",
            "api": "TargomoAPI"
        },
        {
            "category": "Other:NaturalLanguageProcessing",
            "company": "CodeqLLC",
            "api": "CodeqNaturalLanguageProcessingAPI"
        },
        {
            "category": "Other:NaturalLanguageProcessing",
            "company": "Twinword,Inc.",
            "api": "TwinwordAPI"
        },
        {
            "category": "Other:NaturalLanguageProcessing/Generation/Understanding",
            "company": "UnFound.ai",
            "api": "UnFound.ai"
        },
        {
            "category": "Other:NewsAPI",
            "company": "SpaceflightNewsAPI",
            "api": "SpaceflightNewsAPI"
        },
        {
            "category": "Other:NotSpecifed",
            "company": "Catchy",
            "api": "WeareanAPIMarketingcompany"
        },
        {
            "category": "Other:NotSpecifed",
            "company": "Notificare",
            "api": "Notificare"
        },
        {
            "category": "Other:NotSpecifed",
            "company": "Socure",
            "api": "SocureID+solution"
        },
        {
            "category": "Other:OnlineMarketing(SearchEngineOptimization/SEO)",
            "company": "seobilityGmbH",
            "api": "SEOAPIs"
        },
        {
            "category": "Other:PDFDocumentToolsAPI",
            "company": "iLovePDF",
            "api": "iLovePDF™APIRest"
        },
        {
            "category": "Other:Real-TimeAPIManagement",
            "company": "PushTechnologyLtd.",
            "api": "DiffusionReal-timeAPIManagementPlatform"
        },
        {
            "category": "Other:RobotAPIs",
            "company": "MistyRobotics",
            "api": "MistyRoboticsDevelopmentPlatform"
        },
        {
            "category": "Other:RouteOptimizationAPI",
            "company": "OnTerraSystems",
            "api": "RouteSavvyRouteOptimizationAPI"
        },
        {
            "category": "Other:ScreenshotAPI",
            "company": "Netcube",
            "api": "ApiFlash"
        },
        {
            "category": "Other:SearchAPIs",
            "company": "SocialSearcher",
            "api": "SocialMediaSearch&MonitoringAPI"
        },
        {
            "category": "Other:SecureDigitalTransport:21+verticalmarkets",
            "company": "Botdoc",
            "api": "Botdoc"
        },
        {
            "category": "Other:SmartGarden,environmentmonitoringandagriculture",
            "company": "FlowerChecker",
            "api": "plantidentificationAPI"
        },
        {
            "category": "Other:SocialMedia",
            "company": "GetYourPet,LLC",
            "api": "GetYourPetAPI"
        },
        {
            "category": "Other:Socialmedia",
            "company": "ZorangInc",
            "api": "JavaAPI"
        },
        {
            "category": "Other:SportsAPI",
            "company": "Decathlon",
            "api": "SportsTrackingData"
        },
        {
            "category": "Other:SportsAPIs",
            "company": "CompughterTechnologies,LLC",
            "api": "VersusSportsSimulator"
        },
        {
            "category": "Other:SportsAPIs",
            "company": "FantasyFootballNerd",
            "api": "FantasySportsAPI"
        },
        {
            "category": "Other:TextToSpeechAPI",
            "company": "SCDEVISSOFTWARESRL",
            "api": "CloudPronouncer"
        },
        {
            "category": "Other:TravelAPI",
            "company": "TravelgateX",
            "api": "TravelgateX.Theglobalmarketplaceforthetraveltrade."
        },
        {
            "category": "Other:TravelRecommendationEngine",
            "company": "Tripian",
            "api": "TripianAPI"
        },
        {
            "category": "Other:UrbanAPI",
            "company": "BoulevardAI",
            "api": "BoulevardForesight"
        },
        {
            "category": "PaymentAPIs",
            "company": "BlockChyp",
            "api": "BlockChyp"
        },
        {
            "category": "PaymentAPIs",
            "company": "Cardknox",
            "api": "CardknoxAPI"
        },
        {
            "category": "PaymentAPIs",
            "company": "Cardknox",
            "api": "CardknoxRecurringPayments"
        },
        {
            "category": "PaymentAPIs",
            "company": "Fiserv",
            "api": "DigitalPaymentsSDK"
        },
        {
            "category": "PaymentAPIs",
            "company": "PaywithBoltLtd",
            "api": "PaywithBolt"
        },
        {
            "category": "PaymentAPIs",
            "company": "PayPal",
            "api": "DisputesAPI"
        },
        {
            "category": "PaymentAPIs",
            "company": "Payway,Inc",
            "api": "PaywayWS"
        },
        {
            "category": "PaymentAPIs",
            "company": "Stronghold",
            "api": "StrongholdPlatformAPI"
        },
        {
            "category": "PaymentAPIs",
            "company": "Uviba",
            "api": "UvibaPayments"
        }
    ]
}
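Once the data is saved in this structured form, checking it takes only a few lines. The sketch below counts entries per category. It assumes the output is a JSON object whose list value holds the `{"category", "company", "api"}` records shown above. The top-level key `"data"` in the sample is a hypothetical stand-in, because the real key name is not shown in this part of the output.

```python
import json
from collections import Counter


def count_by_category(data):
    """Count scraped entries per category.

    Assumes `data` is a dict whose list values hold records with a
    "category" field, like the JSON dump above.
    """
    counts = Counter()
    for value in data.values():
        if isinstance(value, list):
            counts.update(item["category"] for item in value)
    return counts


# Small hypothetical sample in the same shape as the scraped output.
sample = json.loads("""
{
    "data": [
        {"category": "PaymentAPIs", "company": "Cardknox", "api": "CardknoxAPI"},
        {"category": "PaymentAPIs", "company": "PayPal", "api": "DisputesAPI"},
        {"category": "HomeAPIs", "company": "Allow2", "api": "Allow2"}
    ]
}
""")

for category, n in count_by_category(sample).most_common():
    print(category, n)
```

For the real file, you would replace `sample` with something like `json.load(open("apis.json"))`, where `apis.json` is a hypothetical filename for wherever you saved the output.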

Summary

This brief tutorial on web scraping with Python has covered:

1. Connecting to a webpage.

2. Parsing HTML using BeautifulSoup.

3. Looping through the soup object to find elements.

4. Performing some simple data cleaning.
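Steps 2 to 4 can be sketched together with BeautifulSoup. This is not the exact code used against API World: the page's real markup is not reproduced here, so the sketch assumes a hypothetical table with one `<tr>` per API and three `<td>` cells (category, company, API). The cleaning step removes all whitespace, which matches the compact values in the JSON output (for example, "Brave New Coin" becomes "BraveNewCoin").

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the fetched page; adjust the tags
# to whatever the inspected page actually uses.
SAMPLE_HTML = """
<table>
  <tr><td>Finance APIs</td><td>Brave New Coin</td><td>BNC Crypto Data API's</td></tr>
  <tr><td>Health APIs</td><td>Caremerge</td><td>Caremerge API</td></tr>
</table>
"""


def clean(text):
    """Strip every whitespace character, as in the output above."""
    return "".join(text.split())


def parse_apis(html):
    """Parse the HTML and return one dict per table row."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install
    records = []
    for row in soup.find_all("tr"):  # loop through the soup object
        cells = [clean(td.get_text()) for td in row.find_all("td")]
        if len(cells) != 3:
            continue  # skip header or malformed rows
        category, company, api = cells
        records.append({"category": category, "company": company, "api": api})
    return records


if __name__ == "__main__":
    print(json.dumps(parse_apis(SAMPLE_HTML), indent=4))
```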

Using the Scrapingdog API, we were able to complete the scraping task in just five minutes of coding.
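For step 1, the request goes through Scrapingdog rather than straight to the target site. A minimal sketch with requests follows. The endpoint `https://api.scrapingdog.com/scrape` and the `api_key`/`url` parameter names are assumptions based on Scrapingdog's basic API, so check the current documentation before relying on them. `build_params` and `fetch_html` are hypothetical helper names.

```python
import requests

# Assumption: Scrapingdog's basic endpoint; verify against the current docs.
SCRAPINGDOG_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_params(api_key, target_url):
    """Query parameters for a basic scrape (assumed parameter names)."""
    return {"api_key": api_key, "url": target_url}


def fetch_html(api_key, target_url, timeout=60):
    """Fetch the target page's HTML through Scrapingdog and return it as text."""
    response = requests.get(
        SCRAPINGDOG_ENDPOINT,
        params=build_params(api_key, target_url),  # requests URL-encodes these
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx, e.g. a bad key or no credits left
    return response.text
```

A call such as `fetch_html("YOUR_API_KEY", "https://example.com")` returns the raw HTML, which can then be handed to BeautifulSoup; both arguments here are placeholders.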

Thank you for reading! If you enjoyed this article, please hit the like button, and feel free to comment or ask me anything.

You can follow me on Medium for more articles, or follow me on Twitter.
