Solved a Major Engineering Issue This Week Related to How to Store Images in a Database and How To Display User Images Back To The User.
This week I solved an Engineering issue related to storing images in the database that not too many front end or back end website developers have been able to solve.
This topic of storing Images in the Database came up in the context of Advertising so I thought it was time to finally solve this issue.
There are 2 ways to store images in a Database, see METHOD 1 and METHOD 2 below and how to display those images back to the user client.
METHOD 1: Store your images into a common system folder.
METHOD 2: Store your images as actual user records in the database.
METHOD 1
I don’t recommend this method and many people on the internet will tell you this is the only way to go. Wrong.They just don’t know how to actually solve the issue.
The first and most common method is to store all images and files into a common folder system outside of the database. Why have a database if you store half of the user data outside of the database? That makes no sense at all yet half the internet seems to be implemented this way.
So called Internet Expert Developers will tell you to go with this method and they will argue vigorously for this method. Their main argument is “oh the overhead will be horrendous on the system”.ย
Sorry, the overhead turns out to be minimal. And if the system experiences slowness most likely the system is simply undersized: not enough memory usually or not enough something allocated to the system. The system log files will tell you what you need more of.
Method 2 as it turns out, requires mastery or understanding ofย a highly complex engineering level solution to get it to work.ย The actual solution as it turns out, is rather simple to implement in the end and is certainly easier to implement than method 1 with all it’s required work around fixes and requires almost no overhead at all.
The method 2 solution was beyond my learning curve a year ago so I went with Method 1 as far as storing images and user files into a common folder system in the database.
And all was fine until problems eventually started presenting themselves though system usage.
Method 1 works but it has many problems.
- First it’s not a very Scale-able option.
- Second, what’s the point of having a database if half the data in the database is stored outside of the database into a common system file system. That’s just a broken bad design.
- Users can’t maintain their own files. A system admin has to get involved with each user.
One reason is, they probably have not figured out how to actually store a record in the database and using a common file systemย is their work around.
When you store a file into the database, it is stored as a binary blob file the infrastructure all works great for getting the data from the user interface over and into the database.
Works great for storing the file but the real issue is reading that binary blob file and getting the original file type back out of the system.ย Of course you do store all the information about that file along with the blob. You at least know that was a doc file or a pdf file or a image file of some type. The only problem is, getting the original file back to the user. The infrastructure does not work so well in that direction, lots of issues.
So the work around for that is just store the original files into a common folder in the system file system someplace.
You will see this implemented when Year and Monthly numbered folders start appearing in the file system such as folders numbered like 2020 -> 01, 02, …, 08 for example.
To display those images back to the user, you have to record the system path where the browser can find that image or file. This path is stored in the user database as a record in the database.
It’s just crazy to store a path to the file you want to serve up to the user as a user record in the database but then store the file the user wants outside of the user database into the system file system. Why not just store the user file inside the database and let the database do the work databases were designed to do,ย manage user files and data.
It works but it has problems. For example, 2 users submit the file: resume.doc and both files go into the same common folder on the system. Uh-oh. Problems Will Rogers. Problems.
Two files with the same name will result in 1 file being overwritten and the resume of the first user will be lost to the second user.
So now you need an engineering work around for this new problem due to a common file system folder solution.
So to prevent file overwrites you need to give each file name a time stamp or hash of some kind of your own design so that the 2 files do not over write each other.
Well that worked for a while. But as soon as we started getting minimal user usage,we started getting copies of the same user file showing up in the system common folder directories with different time stamps and hash codes. For example, a user uploads an image for their profile. Great but then every time they press save, it creates a new copy of the same image into the file system with a new date time stamp and a new hash code added to the file name.
So the impact to system Memory from duplicate files can and would have eventually been an issue and a maintenance nightmare for me to maintain.
Sure PHP on the server side can reach into the file system and maybe do some file compare operations to see if the new user file just now being saved is identical in content to any of the files already in the file system, but with million of users already in the system, this system search operation every time someone presses the save button is another major impact on the system performance. All due basically to the poor design of storing user data outside of the user database.
So I let it run for a year like this. Method 1 ultimately proved problematic and I had a new need. I need to store clean image files into the database and I wanted the database do what it was designed to do, manage user data, manage the data cleanly and without requiring a bunch of bug fix patches and workarounds.
So it was time to revisit METHOD 2.
METHOD 2
METHOD 2: Store your images as actual user records in the database.
The second method involves storing images and files as individual user records inside the database itself.
With this method you no longer need to store a path to the file system where the user’s browser can find the image. You just send the user the image directly from the database.
I tried method 2 a year ago but as it turns out, METHOD 2 is a highly complex engineering problem with many piece parts and a year ago, I did not fully understand all the piece parts so after several failed prototype implementations I relented and went with the far simpler but more inferior and what proved eventually to be problematic METHOD 1. I was under a time crunch and Method 1 simply bought me some time while I figured out the real solution Method 2.
There are a lot of benefits with the Method 2 option that are not open to us with Method 1. The user themselves can now save, delete and update user files as needed and we can let the database do all that work for us.
And the user can now be in charge of doing any saving, deleting or updating of their files in the database without requiring a system admin to get involved.
Save, delete and update is not open to us with Method 1 and requires system admin intervention so Method 1 is a more costly solution.
To implement method 2, I only needed 2 days to solve all the issues and to implement this nice clean solution on Mapertunity.
With method 2, the user selects a file on the web page they are interacting with. That web page then sends the file from their PC over to the server. The server then reads that file from a temp directory into a PHP variable. So far Method 1 and Method 2 steps are the exact same.
Method 1 would now move that file to the common folder system and give it a time stamp and hash code and we would store the path to this file as a record in the user database. And that file is then mixed into a common folder with a million other user files all named “resume_timestamp_hash.doc. And only the system admin has access to the system folder system to be able to maintain these millions of resume files.ย
Talk about a needle in the haystack.
Hey admins, you don’t want that maintenance nightmare on your plate every day.ย Use Method 2 and let the database do all that work for you.
As part of Method 1 we can also capture all the information we know about that file and store it as a record in the user database along with the file itself. We might need that later on once we figure out method 2. But for Method 1, storing the file as a database record is not necessary. I just did it for my own personal “we might need that later on” just in case convenience.
Here is where the fork between Method 1 and Method 2 occur.
In Method 2 we now convert that file which is in binary form, to a text only format. So all the non-printable characters have been recoded as text.
We don’t have to convert the binary file to text encoding at this point, the database blob data type will store a binary file as a blob just fine but it helps out when serving up this image back to the user. So a bit of optimization going it at this step.
Also it depends on your datamodel in the the database.
For a single stand along image table, it’s ok to store it as binary and do the binary to text encoding just before serving the image up to the user.
My datamodel don’t allow me do to that just yet, I might change my datamodel down the road, but for now I do the encoding up front.
We simply store this text file in the database.
So what is the issue with Method 2. The root issue is, the server side PHP sends JSON encoded data back to the user.
JSON does not like binary data. JSON likes text data.
Files and Images are binary data. So the JSON transfer from the server will fail. Method 1 is the simple work around for this issue. Method 2 solves the binary issue with JSON.
JSON likes text data and the binary data type will cause JSON to throw errors. So the data transfers fail every time.
Convert the binary data to text data and the JSON encoding and server to client data transfer will then work fine.
So I used a step-by-stepย approach to solve the issues with Method 2.
Send a user file to the server. The webpage form input type file is optimized to communicate binary data to the server no problem. The problems occur in the opposite direction from the server to the client.
Capture all the file information on the server side.
Convert the binary file data to a text encoded file.
Store the file name, type and data as a user record in the data base not the file system. You no longer need to store a file system path as nothing is stored on the file system.
When the user request that file:
Read the user record.
Send the text file data to the user client.
Display the user image and specify what kind of binary-to-text encoding was used. The Browser does the rest of the work for me.
And you are good to go.
Doug Ayers
Co-Founder
Lead Software Engineer