Curl is a library for transferring data with URLs. PHP exposes it through its Curl extension, so server-side PHP code can request URLs and send and receive data. It has a simple API and supports many protocols (HTTP, HTTPS, FTP, FTPS, and others). In PHP it is widely used to make server-side HTTP requests, which makes it a common tool for data extraction, especially web scraping.
Before using Curl, make sure that the Curl PHP extension is installed on your server and enabled in your php.ini file.
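You can check this from PHP itself by asking whether the extension is loaded. The snippet below is a minimal sketch. curl_available() is a hypothetical helper name, and the exact php.ini line (commonly extension=curl on recent PHP versions) and install steps vary by platform. Running php -m on the command line and looking for curl in the module list is another quick check.

```php
<?php
// Minimal availability check for the Curl extension.
// curl_available() is a hypothetical helper, not a built-in PHP function.
function curl_available(): bool
{
    // extension_loaded() reports whether the "curl" extension is active in this PHP process
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_available()) {
    // curl_version() returns an array describing the libcurl build PHP is linked against
    $info = curl_version();
    echo "Curl is available (libcurl " . $info['version'] . ")\n";
} else {
    // Assumption: on many installs, enabling it means uncommenting "extension=curl"
    // in php.ini (or installing the distribution's PHP curl package) and restarting PHP.
    echo "The Curl extension is not loaded.\n";
}
```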
First, let's pull a simple HTML page by making a request to a website with Curl:
<?php
// Requesting a website with Curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
// Printing the captured data to the screen
echo $output;
?>
The code above fetches the HTML page at "http://www.example.com" and prints it to the screen. The curl_init() function creates a Curl handle, and curl_setopt() sets options on it: here the URL and CURLOPT_RETURNTRANSFER, which makes curl_exec() return the response as a string instead of printing it directly. The curl_exec() function then performs the request and returns the output, and curl_close() releases the handle.
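The same init, setopt, exec and close sequence can be wrapped in a small reusable function that also reports failures, which the first example silently ignores (curl_exec() returns false on failure, and echo then prints nothing). The sketch below is one possible shape, not a standard API. fetch_url() is a hypothetical name. curl_setopt_array() sets several options at once, and CURLINFO_RESPONSE_CODE reads the status code of the last response. The 10-second timeout and treating HTTP status 400 and above as an error are assumptions you may want to change.

```php
<?php
// Sketch of the init / setopt / exec / close flow with basic error handling.
// fetch_url() is a hypothetical helper name, not part of PHP's Curl API.
function fetch_url(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up after this many seconds
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on transfer failures (DNS, TLS, unsupported protocol, ...)
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Curl request failed: $error");
    }

    // Status code of the last response; 0 for non-HTTP protocols such as file://
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if ($status >= 400) {
        throw new RuntimeException("HTTP error $status for $url");
    }
    return $body;
}
```

It could replace the inline calls in the first example, for instance echo fetch_url("http://www.example.com");, with the caller catching RuntimeException to decide how to handle a failed request.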
Curl is often used for web scraping: you fetch a page and then extract a specific set of data from it. In the example below, let's scrape the section headings (the <h2> elements) from the home page of the Turkish Wikipedia and print them on the screen:
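<?php
// Fetching Wikipedia's home page with Curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://tr.wikipedia.org/wiki/Ana_Sayfa");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Wikimedia asks clients to send a descriptive User-Agent; requests without one may be rejected.
// The value below is a placeholder: replace it with your own tool name and contact address.
curl_setopt($ch, CURLOPT_USERAGENT, "ExampleScraper/1.0 (contact@example.com)");
$output = curl_exec($ch);
if ($output === false) {
    exit("Request failed: " . curl_error($ch));
}
curl_close($ch);
// Finding the <h2> headings: the tags carry attributes and may span lines, so match loosely (/s)
preg_match_all('/<h2[^>]*>(.*?)<\/h2>/s', $output, $matches);
foreach ($matches[1] as $match) {
    // Remove markup nested inside the heading (e.g. <span>) before printing
    echo trim(strip_tags($match)) . "<br>";
}
?>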
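The example above pulls the headings from the Turkish Wikipedia home page and prints them to the screen. The preg_match_all() function finds the <h2> elements with a regular expression (regex) and stores the text captured between the tags in $matches[1], which the loop then prints. Keep in mind that regex-based extraction is fragile: it breaks as soon as the page's markup changes.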
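Regular expressions are a brittle way to parse HTML, because attributes, nested tags, or line breaks inside a heading can all stop a pattern from matching. A more robust alternative, swapped in here in place of the regex approach, is PHP's DOM extension (DOMDocument and DOMXPath). The sketch below assumes that extension is enabled (it usually is by default). extract_h2_titles() is a hypothetical helper name, and pages that do not declare their character encoding may need extra handling before loadHTML().

```php
<?php
// Sketch: extract <h2> heading text with DOMDocument + DOMXPath instead of a regex.
// extract_h2_titles() is a hypothetical helper name used only for illustration.
function extract_h2_titles(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = [];
    foreach ((new DOMXPath($doc))->query('//h2') as $node) {
        // textContent joins the text of nested elements (e.g. <span>) and drops the tags
        $titles[] = trim($node->textContent);
    }
    return $titles;
}
```

With $output fetched as in the Wikipedia example, the printing loop could become foreach (extract_h2_titles($output) as $title) { echo htmlspecialchars($title) . "<br>"; }. Escaping with htmlspecialchars() is needed here because textContent returns decoded text rather than HTML.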