READING FROM DATABASES WITH PYTHON
12 Mar 2020

Background
This post will talk about several packages for working with databases using Python. We’ll start by covering pyodbc, one of the more standard packages for database work, but we’ll also cover a very useful module called turbodbc, which lets you run SQL queries from Python efficiently, and generally faster than pyodbc.
pyodbc
pyodbc can be installed using pip:
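For example:

```shell
pip install pyodbc
```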
Let’s start by writing a simple SQL query using pyodbc. To do that, we first need to connect to a specific database. In the examples laid out here, we will be using a SQLite database on my machine. However, you can do this with many other database systems as well, such as SQL Server, MySQL, Oracle, etc. In the connection string, we specify the database driver (e.g. one of the database systems mentioned), the server, the database name, and potentially a username and password. If you have a trusted connection set up, you can specify that instead of a username and password.
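As a sketch (the driver, server, and database names below are placeholders to adapt to your machine), a pyodbc connection looks like the commented calls here. Because pyodbc and the standard library’s sqlite3 module both implement Python’s DB-API 2.0 interface, the runnable line uses sqlite3 so the examples in this post work even without an ODBC driver installed:

```python
import sqlite3

# With pyodbc, the call looks like one of these (all names are placeholders):
#   import pyodbc
#   conn = pyodbc.connect("DRIVER={SQLite3 ODBC Driver};DATABASE=my_database.db")
#   conn = pyodbc.connect(
#       "DRIVER={SQL Server};SERVER=my_server;DATABASE=my_db;Trusted_Connection=yes"
#   )

# Stand-in using the stdlib sqlite3 driver (same DB-API 2.0 interface);
# ":memory:" creates a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
```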
Reading SQL query with pandas
After we’ve made the connection, we can write a SQL query to retrieve data from this database. One way of doing that is using the pandas package. Below, we wrap the SQL code inside quotes as the first parameter of pd.read_sql. The second parameter contains our connection object.
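A minimal sketch of the pattern; the sales table and its columns are invented for illustration, and the sqlite3 connection stands in for whatever connection object you created (with pyodbc, you would pass your pyodbc connection instead):

```python
import sqlite3
import pandas as pd

# Build a tiny throwaway table so the query has something to read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0)])

# First parameter: the SQL query in quotes; second parameter: the connection.
df = pd.read_sql("SELECT * FROM sales", conn)
```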
Reading SQL query with pyodbc
Besides using pandas, we can execute a SQL query with pyodbc alone. In this case, we need to create a cursor object. A cursor is an object used to process the results of a SQL query.
Next, we can extract the results of the query using the fetchall method. fetchall returns the rows of the query result as a list rather than a data frame; each element of the list corresponds to one row.
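For example (again a self-contained sketch with a made-up sales table; the cursor, execute, and fetchall calls work identically on a pyodbc connection):

```python
import sqlite3

# Throwaway table so the query returns something.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0)])

# Create a cursor, run the query, and pull every row back as a list.
cursor = conn.cursor()
cursor.execute("SELECT * FROM sales")
rows = cursor.fetchall()  # one list element per row
```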
Alternatively, we can pull one row at a time using the fetchone method, like below. Each time we call this method, Python returns the next row in the query result. This can be handy when you’re dealing with a large dataset that you don’t want to bring entirely into memory at once.
For example, here we can print 10 rows, one by one, of the result dataset.
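A sketch of that loop, using an invented numbers table; fetchone advances through the result set one row per call, so only one row is in memory at a time:

```python
import sqlite3

# Throwaway table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)",
                 [(i,) for i in range(100)])

cursor = conn.cursor()
cursor.execute("SELECT n FROM numbers")

# Print the first 10 rows, one at a time, without fetching the full result.
for _ in range(10):
    row = cursor.fetchone()
    print(row)
```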
When we’re done with our database connection, it’s a good idea to close it, which we can do like this:
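For example (the close call is the same on a pyodbc connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ... run your queries ...
conn.close()
```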
pypyodbc
pypyodbc is very similar to pyodbc, except that it is written in pure Python under the hood. It can also be installed via pip.
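For example:

```shell
pip install pypyodbc
```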
Our code above can be run exactly the same way, except we replace pyodbc with pypyodbc.
pypyodbc has a handful of methods that do not currently exist in pyodbc, such as the ability to more easily create Access database files.
turbodbc
Now that we’ve reviewed pyodbc, let’s talk about turbodbc. On the surface, the two packages have similar syntax. However, a primary advantage of turbodbc is that it usually extracts data faster than pyodbc. For example, turbodbc uses buffers to speed up retrieving multiple rows; buffering, combined with NumPy on the back end, makes the data type conversions between the database and Python faster.
It’s recommended to install turbodbc using conda install, like below, which will pull in the necessary dependencies across platforms. If you don’t have conda, you’ll need to install it first (for example, via Anaconda or Miniconda).
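For example (turbodbc is distributed on the conda-forge channel, so specifying it is the safest bet):

```shell
conda install -c conda-forge turbodbc
```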
After that is done, the next step is to import the turbodbc package.
Similar to the above examples, we can connect to our database with a one-line command. We can use either a connection string, as before, or directly specify a DSN (“data source name”).
After the connection is created, we can define a cursor object, like what we did earlier in this post.
Now, using the same syntax as with pyodbc, we can execute our same SQL query.
Lastly, let’s fetch all of the rows using the familiar fetchall method.
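Putting those turbodbc steps together in one sketch: the DSN name "My DSN" and the table test_table are placeholders, and running the turbodbc branch requires turbodbc plus a configured ODBC data source. When turbodbc isn’t available, the sketch falls back to the stdlib sqlite3 module, which shares the same DB-API 2.0 cursor interface:

```python
try:
    # The real thing: connect by DSN ("My DSN" is a placeholder name
    # configured in your ODBC manager).
    from turbodbc import connect
    conn = connect(dsn="My DSN")
except Exception:
    # Fallback so the sketch runs anywhere: same connect / cursor /
    # execute / fetchall pattern via sqlite3.
    import sqlite3
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (id INTEGER)")
    conn.executemany("INSERT INTO test_table VALUES (?)", [(1,), (2,)])

cursor = conn.cursor()
cursor.execute("SELECT * FROM test_table")
rows = cursor.fetchall()
conn.close()
```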
Conclusion
That’s it for this post! There’s much more to turbodbc and pyodbc. In a future post, we’ll explore their additional functionality.
Reference: http://theautomatic.net/2020/03/12/reading-from-databases-with-python/