When building a web scraper with Ruby, one of the most important tools at your disposal is the HTTP client. An HTTP client is a software library or framework that lets you send HTTP requests (GET, POST, PUT, etc.) to web servers and receive their responses.
With so many HTTP clients available in the Ruby ecosystem, it can be challenging to choose the best one for your project.
In this article, we’ll take a look at some of the best Ruby HTTP clients available and compare their features, performance, and ease of use. Whether you’re building a simple web scraper or a complex RESTful API, choosing the right HTTP client can make all the difference in your project’s success. So, without further ado, let’s dive in!
Factors used to rank the Ruby HTTP clients
Here are the factors I am going to use to rank the Ruby HTTP clients from best to worst.
- Performance– The library should be fast and lightweight. It should be able to handle a large number of concurrent requests without delaying the response.
- Documentation– Clear and precise documentation is another factor for judging any library. It should be well written so that developers can get started quickly.
- Community– The community should be large enough to cater to all the problems one might face while coding.
- GitHub Stars– Finally, we will also look at the number of stars a library has on GitHub. The number gives a rough indication of its perceived quality and utility.
For testing the speed we are going to make GET and POST requests with each library and measure how long they take.
For the GET request, we are going to use https://httpbin.org/get and for the POST request, we are going to use https://httpbin.org/post.
You have to create a dedicated folder in which we will keep our ruby file. I am naming the file as check.rb. You can pick any name you like. To run the file you just have to open the folder in your terminal and type ruby check.rb and then hit enter.
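A single timing over the network can be noisy, so it may help to repeat each measurement a few times. Here is a minimal sketch using only Ruby's standard Benchmark module. The helper name time_runs and the sleep workload are illustrative assumptions, not part of the original test.

```ruby
require 'benchmark'

# Hypothetical helper (not part of the original test): runs the block
# `runs` times and summarises the wall-clock timings in seconds.
def time_runs(runs = 5)
  timings = Array.new(runs) { Benchmark.realtime { yield } }
  sorted = timings.sort
  {
    runs: runs,
    min: sorted.first,
    median: sorted[runs / 2], # upper middle value when `runs` is even
    mean: timings.sum / runs
  }
end

# Stand-in workload; replace the sleep with a real HTTP call.
stats = time_runs(5) { sleep 0.01 }
puts format('min %.4fs, median %.4fs, mean %.4fs over %d runs',
            stats[:min], stats[:median], stats[:mean], stats[:runs])
```

To try it with one of the libraries below, swap the sleep for the request itself, for example the body of the Benchmark.realtime block in each snippet.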
Our setup is complete. Let’s start testing the libraries.
HTTParty
It is a Ruby gem built on top of the Net::HTTP library. It is super simple to use and comes with features like query parameters, request headers, and basic authentication. Let’s see how we can make a GET and a POST request with httparty and measure the time taken by the library to complete the task.
For measuring the time taken we will use the Benchmark library.
GET Request
require 'httparty'
require 'benchmark'

time = Benchmark.realtime do
  response = HTTParty.get('https://httpbin.org/get')
  puts response.body
end

puts "Request took #{time} seconds"
For this example, we have used the realtime() method provided by the Benchmark library to measure the time taken by the request. It will return the number of seconds it took to complete the request.
Once I run this code I get Request took 0.398039 seconds on the terminal. That means this library took 0.398 seconds to complete the task. Let’s make a POST request now.
POST request
require 'httparty'
require 'benchmark'

time = Benchmark.realtime do
  response = HTTParty.post('https://httpbin.org/post', body: { foo: 'bar' })
  puts response.body
end

puts "Request took #{time} seconds"
Once I run this code I get Request took 0.435745 seconds on the terminal. So, this means that the library took around 0.436 seconds to complete the request.
The documentation of the library is very well written and it explains each step with an example. Other than that, you can find great tutorials on httparty on other websites. This indicates the library has great community support.
HTTParty can automatically parse response bodies in common formats, including JSON and XML, and return them as Ruby objects or hashes. It also exposes the status code and message of a failed request on the response object, so you can handle errors in your own code.
Overall, any developer can comfortably get started with this gem.
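To get a feel for what automatic parsing saves you, here is a small sketch of the manual equivalent using Ruby's standard JSON library. The sample body is an assumption shaped like an httpbin reply, and parse_json_body is a hypothetical helper name. With HTTParty the parsed result is available on the response as parsed_response.

```ruby
require 'json'

# Illustrative response body (an assumption, shaped like an httpbin reply).
SAMPLE_BODY = '{"args": {}, "headers": {"Host": "httpbin.org"}, ' \
              '"url": "https://httpbin.org/get"}'

# Hypothetical helper: turn a JSON string into a Ruby hash so that
# fields can be read by key.
def parse_json_body(body)
  JSON.parse(body)
end

data = parse_json_body(SAMPLE_BODY)
puts data['url']
puts data['headers']['Host']
```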
Faraday
This is another HTTP client that provides a simple API for making HTTP connections with any web server. It can handle connection timeouts and errors, and it can even retry the request for you if the first attempt does not go through. The retry function is very helpful when it comes to web scraping. You can keep trying until the response status is 200.
It also provides adapters for Typhoeus, Excon, and Net::HTTP, which lets developers choose an adapter according to their own requirements.
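The retry idea can be sketched independently of any client. The snippet below is not Faraday's own retry middleware. It is a plain Ruby illustration in which with_retries is a hypothetical helper and the list of fake status codes stands in for real HTTP calls.

```ruby
# Hypothetical helper: calls the block until it returns status 200 or the
# attempts run out, sleeping a little longer after each failure.
def with_retries(max_attempts: 3, base_delay: 0.1)
  attempt = 0
  loop do
    attempt += 1
    status = yield(attempt)
    done = status == 200 || attempt >= max_attempts
    return { status: status, attempts: attempt } if done

    sleep(base_delay * attempt)
  end
end

# Simulated request: fails twice with 503, then succeeds.
fake_statuses = [503, 503, 200]
result = with_retries(max_attempts: 5, base_delay: 0.01) do |attempt|
  fake_statuses[attempt - 1]
end
puts "Got #{result[:status]} after #{result[:attempts]} attempt(s)"
```

In a real scraper it is usually wise to cap the number of attempts, as done here, so that a permanently failing URL does not loop forever.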
Now let’s benchmark this library by making GET and POST requests.
require 'faraday'
require 'benchmark'

time = Benchmark.realtime do
  connection = Faraday.new('https://httpbin.org')
  response = connection.get('/get')
  puts response.body
end

puts "Request took #{time} seconds"
Once I run this code I get Request took 0.054039 seconds on the terminal. That means this library took 0.054 seconds to complete the task. Let’s make a POST request now.
require 'faraday'
require 'benchmark'

time = Benchmark.realtime do
  connection = Faraday.new('https://httpbin.org')
  response = connection.post('/post', { foo: 'bar' })
  puts response.body
end

puts "Request took #{time} seconds"
POST request with faraday took around 0.081 seconds. Well, the speed is just fantastic!
Apart from the speed the documentation of faraday is very well written. It explains every method it has to offer with an example. Faraday also uses a middleware architecture that allows you to modify requests and responses in a flexible and composable way. You can add or remove middleware to customize the behavior of your requests.
While scraping any website at scale you have to modify the headers on every new request. For that, Faraday provides a simple way to set custom headers and options for your requests, such as authentication credentials, timeouts, and SSL settings.
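As a rough illustration of per-request headers and explicit timeouts, the sketch below uses Net::HTTP from the standard library so that it runs without extra gems. It is not Faraday's API. The user agent strings, timeout values, and the helper names build_request and build_client are all assumptions made for the example.

```ruby
require 'net/http'
require 'uri'

# Illustrative values, not taken from the article.
USER_AGENTS = [
  'Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/1.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleScraper/1.0'
].freeze

# Build a GET request whose User-Agent is chosen by request index,
# so consecutive requests rotate through the list.
def build_request(url, index)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = USER_AGENTS[index % USER_AGENTS.size]
  request['Accept'] = 'application/json'
  request
end

# Configure a client with explicit timeouts. Nothing is sent until
# `request` is called on the returned object.
def build_client(url)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = 5  # seconds to wait for the connection
  http.read_timeout = 10 # seconds to wait for each read
  http
end

request = build_request('https://httpbin.org/get', 0)
client = build_client('https://httpbin.org/get')
puts request['User-Agent']
puts client.read_timeout
# response = client.request(request) # uncomment to perform the call
```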
When you search for faraday on Google, you will find many tutorials. This means that community support is also great for this library.
Overall, Faraday is a powerful and flexible library that can simplify the process of making HTTP requests and handling responses in your Ruby applications.
RestClient
It is another popular HTTP client library. With this library too you can make GET, POST, DELETE, and other requests to any HTTP or HTTPS API endpoint.
RestClient also allows you to set a timeout for your requests, ensuring that your application doesn’t hang or become unresponsive if a request takes too long to complete.
Let’s see how this library performs with GET and POST requests.
require 'rest-client'
require 'benchmark'

time = Benchmark.realtime do
  response = RestClient.get 'https://httpbin.org/get'
  puts "Response code: #{response.code}"
end

puts "Request took #{time} seconds"
After running this code I am getting 0.173 seconds. Now, let’s see how this library performs with a POST request.
require 'rest-client'
require 'benchmark'

time = Benchmark.realtime do
  response = RestClient.post 'https://httpbin.org/post', { :param1 => 'value1', :param2 => 'value2' }
  puts "Response code: #{response.code}"
end

puts "Request took #{time} seconds"
It took around 0.1898 seconds to make the POST request.
Just like Faraday, RestClient also allows developers to set custom headers and parameters for HTTP requests, which makes it flexible and customizable for different use cases.
I did not find any major tutorials on RestClient and the documentation is not so well written.
Typhoeus
Typhoeus is a Ruby gem that can make parallel HTTP requests with ease. Since it is built on top of the libcurl library, you can make asynchronous calls. That means you can make multiple API calls and then handle the responses as they arrive.
Let’s check its performance with a GET request.
require 'typhoeus'
require 'benchmark'

time = Benchmark.realtime do
  response = Typhoeus.get('https://httpbin.org/get')
  puts "Response code: #{response.code}"
  puts "Response body: #{response.body}"
end

puts "Request took #{time} seconds"
So, it took around 0.1282 seconds to complete the request. Let’s check how it performs with a POST request.
require 'typhoeus'
require 'benchmark'

response_time = Benchmark.realtime do
  response = Typhoeus.post('https://httpbin.org/post', body: { foo: 'bar' })
  puts "Response code: #{response.code}"
  puts "Response body: #{response.body}"
end

puts "Response time: #{(response_time * 1000).round(2)} ms"
The POST request took around 0.1153 seconds.
You will find the documentation of this library quite helpful. It explains everything right from installation to advanced methods with an example. You can even set the maximum concurrency of the request with it. By the way, the built-in limit of concurrency is 200.
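The idea of capping concurrency while handling results as they arrive can be sketched in plain Ruby. Typhoeus does this through libcurl rather than Ruby threads, so the snippet below only illustrates the concept. The helper run_with_limit and the simulated jobs are assumptions made for the example, and error handling is left out for brevity.

```ruby
# Hypothetical helper: runs `jobs` (an array of callables) on at most
# `max_concurrency` threads and yields each result as soon as it is ready.
# Returns the results in the same order as the jobs.
def run_with_limit(jobs, max_concurrency: 2)
  pending = Queue.new
  jobs.each_with_index { |job, index| pending << [index, job] }
  pending.close # workers see nil from pop once the queue is drained

  finished = Queue.new
  workers = Array.new([max_concurrency, jobs.size].min) do
    Thread.new do
      while (item = pending.pop)
        index, job = item
        finished << [index, job.call]
      end
    end
  end

  results = Array.new(jobs.size)
  jobs.size.times do
    index, value = finished.pop
    yield(index, value) if block_given?
    results[index] = value
  end
  workers.each(&:join)
  results
end

# Simulated requests: each job sleeps briefly and returns a fake status.
jobs = [0.03, 0.01, 0.02].map { |delay| -> { sleep(delay); 200 } }
statuses = run_with_limit(jobs, max_concurrency: 2) do |index, status|
  puts "job #{index} finished with #{status}"
end
puts statuses.inspect
```

The order of the "finished" lines depends on which job completes first, while the returned array always follows the order of the input jobs.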
If you are looking for a high-performance HTTP client then Typhoeus could be one of the choices. Overall it’s a great library.
Excon
It is a pure Ruby HTTP client library that talks to servers through its own socket layer rather than through the standard library’s Net::HTTP. It supports SSL/TLS encryption, streaming responses, persistent connections, and pipelining several requests over one connection. Many well-known Ruby projects like Fog and Chef also use this library.
Let’s check the performance of this library with a simple GET request.
require 'excon'
require 'benchmark'

url = 'https://httpbin.org/get'

time = Benchmark.realtime do
  Excon.get(url)
end

puts "Time taken: #{time.round(2)} seconds"
So, it took around 0.23 seconds to make the GET request. Let’s perform a test with a POST request.
require 'excon'
require 'json' # needed for payload.to_json
require 'benchmark'

url = 'https://httpbin.org/post'
payload = { key1: 'value1', key2: 'value2' }

time = Benchmark.realtime do
  Excon.post(url, body: payload.to_json, headers: { 'Content-Type' => 'application/json' })
end

puts "Time taken: #{time.round(2)} seconds"
POST request took around 0.28 seconds.
The documentation is quite detailed, which is great news for beginners. Excon is backed by a large community that keeps the library updated, and new releases come out regularly to fix bugs.
On the other hand, Excon does not come with built-in middleware for common tasks such as JSON parsing or logging. While this allows for greater flexibility, it may require more setup time. Excon has some advanced features that make the learning curve a bit steeper.
Results!!
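Let’s compare all the stats and see who is the clear winner. The timings measured above (GET / POST, in seconds) were:
- HTTParty: 0.398 / 0.436
- Faraday: 0.054 / 0.081
- RestClient: 0.173 / 0.190
- Typhoeus: 0.128 / 0.115
- Excon: 0.23 / 0.28
As you can see, Faraday is a clear winner in terms of speed. It is in close competition with HTTParty in terms of stars on their GitHub repositories, but overall Faraday is the winner due to its speed and great community support.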
In terms of speed, HTTParty was the slowest of the libraries tested. But since it has great community support you can consider this library for smaller projects. You will find great tutorials on this library on the internet.
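The comparison can also be worked through in a few lines of Ruby. The timings below are copied from the runs reported in this article. The helper names rank_by and percent_less_time are made up for the sketch.

```ruby
# GET and POST timings in seconds, copied from the runs reported above.
TIMINGS = {
  'HTTParty' => { get: 0.398039, post: 0.435745 },
  'Faraday' => { get: 0.054039, post: 0.081 },
  'RestClient' => { get: 0.173, post: 0.1898 },
  'Typhoeus' => { get: 0.1282, post: 0.1153 },
  'Excon' => { get: 0.23, post: 0.28 }
}.freeze

# Hypothetical helper: names ordered from fastest to slowest
# for the given request type (:get or :post).
def rank_by(kind)
  TIMINGS.sort_by { |_name, times| times[kind] }.map(&:first)
end

# Hypothetical helper: percentage of time saved by `name`
# relative to `baseline`, rounded to one decimal place.
def percent_less_time(name, baseline, kind)
  saved = TIMINGS[baseline][kind] - TIMINGS[name][kind]
  (saved / TIMINGS[baseline][kind] * 100).round(1)
end

puts rank_by(:get).join(' < ')
puts "GET: Faraday took #{percent_less_time('Faraday', 'HTTParty', :get)}% less time than HTTParty"
puts "POST: Faraday took #{percent_less_time('Faraday', 'HTTParty', :post)}% less time than HTTParty"
```

Keep in mind that each figure comes from a single request, so the percentages describe these particular runs rather than a general property of the libraries.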
Conclusion
In this article, we examined five popular libraries in terms of their speed of execution and community support, and Faraday came out as the clear winner. That does not mean the other libraries are not capable of building apps and scrapers, but Faraday is the advisable choice when building a web scraper because it can speed up scraping. In this test, Faraday’s GET request took about 86% less time than HTTParty’s (0.054 versus 0.398 seconds), which is just tremendous. The library also receives regular updates that make it even more powerful.
You can of course test them all at your end with the code snippets shared above. Absolute speed will depend on your network, but in my runs Faraday came out as the clear winner.
I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on your social media.