Extracting Data with PHP Curl

April 25, 2024

Curl (the libcurl library, available in PHP through the cURL extension) is used on the server side to request URLs and to send and receive data. It has a simple API and supports many protocols, including HTTP, HTTPS, FTP and FTPS. This makes Curl a common choice for server-side HTTP requests and well suited to data extraction, especially web scraping.

Before using Curl, make sure the Curl PHP extension is installed on your server and enabled in the php.ini file (on PHP 7.2 and later, typically with the line extension=curl).
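A quick way to confirm this from PHP itself is sketched below. The helper name curl_available() is hypothetical; the check relies only on extension_loaded() and curl_version(), which also reports the libcurl version and the protocols your build supports. From the command line, php -m lists the loaded extensions as well.

```php
<?php
// Hypothetical helper: true when the cURL extension is loaded and usable
function curl_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_available()) {
    // curl_version() returns an array describing the underlying libcurl build
    $info = curl_version();
    echo "Curl is enabled (libcurl " . $info['version'] . ")\n";
    echo "Supported protocols: " . implode(', ', $info['protocols']) . "\n";
} else {
    echo "Curl is not enabled; check php.ini (for example, extension=curl)\n";
}
```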

How to Make a Curl Request

First, let's pull a simple HTML page by making a request to a website with Curl:

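<?php
   // Requesting a website with Curl
   $ch = curl_init();
   curl_setopt($ch, CURLOPT_URL, "http://www.example.com");
   // Return the response as a string instead of printing it directly
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   $output = curl_exec($ch);

   // curl_exec() returns false on failure; curl_error() explains why
   if ($output === false) {
       die("Request failed: " . curl_error($ch));
   }
   curl_close($ch);

   // Printing the captured data to the screen
   echo $output;
?>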

The code above downloads the HTML page at "http://www.example.com" and prints it to the screen. The curl_init() function creates a Curl handle, and curl_setopt() sets options on it: CURLOPT_URL sets the address to request, and CURLOPT_RETURNTRANSFER makes curl_exec() return the response body as a string instead of printing it directly. Finally, curl_exec() performs the request and returns the output, and curl_close() releases the handle.
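When you make many requests, it can help to wrap these steps in a function that also reports failures. The sketch below is one way to do it; the function name fetch_page(), its return format and the 10-second default timeout are assumptions chosen for illustration. It uses only standard Curl options plus curl_getinfo() with CURLINFO_HTTP_CODE to read the HTTP status.

```php
<?php
// Hypothetical helper: fetch a URL and report failures instead of silently returning false
function fetch_page(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    // Transport-level failure (DNS error, refused connection, timeout, ...)
    $error = $body === false ? curl_error($ch) : null;
    // HTTP status of the last response; 0 if no response was received
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Treat 4xx and 5xx responses as failures too
    if ($error === null && $status >= 400) {
        $error = "HTTP status $status";
    }

    return [
        'status' => $status,
        'body'   => $error === null ? $body : null,
        'error'  => $error,
    ];
}
```

For example, after $result = fetch_page("http://www.example.com"), you would check that $result['error'] is null before using $result['body']. Returning an array instead of calling die() lets the caller decide whether to retry, skip the page or stop.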

Web Scraping with Curl

Curl is often used for web scraping, that is, pulling a specific set of data out of a website's pages. In the example below, we fetch the main page of the Turkish-language Wikipedia and print its section headings (the <h2> elements):

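<?php
   // Capturing Wikipedia's home page with Curl
   $ch = curl_init();
   curl_setopt($ch, CURLOPT_URL, "https://tr.wikipedia.org/wiki/Ana_Sayfa");
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
   // Wikimedia asks clients to identify themselves; this User-Agent value is only an example
   curl_setopt($ch, CURLOPT_USERAGENT, "MyScraper/1.0 (contact@example.com)");
   $output = curl_exec($ch);
   if ($output === false) {
       die("Request failed: " . curl_error($ch));
   }
   curl_close($ch);

   // Finding titles and printing them to the screen.
   // <h2[^>]*> also matches headings that carry attributes; the "s" flag lets "." span newlines.
   preg_match_all('/<h2[^>]*>(.*?)<\/h2>/is', $output, $matches);
   foreach ($matches[1] as $match) {
       // Remove nested tags (such as <span>) and surrounding whitespace
       echo trim(strip_tags($match)) . "<br>";
   }
?>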

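The example above downloads the page and then uses preg_match_all() with a regular expression (regex) to find every <h2> element. $matches[1] holds the text captured by the parentheses in the pattern, and the loop prints each captured heading followed by a line break. Regular expressions are fine for quick extractions like this one, but they are fragile: a small change in the page's markup can break the pattern, so for anything more complex an HTML parser such as PHP's DOMDocument is more reliable.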
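As an alternative to the regex, the same headings can be extracted with PHP's DOM extension. This is a sketch, not a drop-in replacement: the function name extract_h2_titles() and the sample HTML are hypothetical, and in practice you would pass it the $output string returned by curl_exec(). The XPath query //h2 selects every <h2> element regardless of its attributes, and textContent includes text from nested tags.

```php
<?php
// Hypothetical helper: return the trimmed text of every <h2> element in an HTML string
function extract_h2_titles(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = [];
    foreach ((new DOMXPath($doc))->query('//h2') as $node) {
        // textContent joins the text of nested elements such as <span>
        $text = trim($node->textContent);
        if ($text !== '') {
            $titles[] = $text;
        }
    }
    return $titles;
}

// Usage with a small hand-written sample instead of a live page
$sample = '<html><body><h2 id="a" class="mp-h2">News</h2><h2><span>On this day</span></h2></body></html>';
print_r(extract_h2_titles($sample));
```

Because the parser understands the document's structure, this version keeps working when a heading gains attributes or nested elements, which would silently break a simple regex.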