Importing CSV files into a MySQL table

Splitting a CSV file into multiple MySQL tables

In a previous article, we explored how to import a CSV file into a table in MySQL. This is a continuation of that article, where we will explore how to split CSV files that contain data which is supposed to go into multiple tables.

Take the following table for example:

UsernameEmailContactLanguage Name
johndoejohn@example.com91234567English
janedoejane@website.com81234567Chinese
marysmith mary@smith.com 97654321 English
bobsmith bob@smith.com 87654321 Chinese
The Language Name column is redundant because it repeats across multiple columns.

If we are importing the data into a database, we may want to split the data across 2 tables to prevent data redundancy. The resulting table will look like this:

User IDUsernameEmailContactLang ID
1johndoejohn@example.com912345671
2janedoejane@website.com812345672
3marysmith mary@smith.com 97654321 1
4bobsmith bob@smith.com 87654321 2
A User ID column is introduced to act as the surrogate primary key.
Lang IDLanguage Name
1English
2Chinese
Lang ID is introduced here to act as the surrogate primary key.

The tables can be represented with the following schemas:

Users
PKUser_ID



FK
Username
User_Email
User_Phone
Lang_ID
Languages
PKLang_ID
Lang_Name

Article continues after the advertisement:


Importing the data into 2 tables

Before we begin, we will need to determine the order in which we will create the tables:

  1. Languages: This has to be created first, as the Users table has a foreign key that points to Lang Code.
  2. Users: When importing the users table, we will need to assign a Lang ID to it that represents the Language Name in the same column.

Importing distinct values into a table

For the Languages table, we only need to import each Language Name once (for a total of 2 rows). However, because there are 4 rows in our data, importing the data directly will cause 4 rows to be imported into our Languages table.

To ensure that the Language Name column does not duplicate itself, we can make the Lang_Name column for our table UNIQUE.

-- If the table has not yet been created.
CREATE TABLE `Languages` (
	`Lang_ID` int AUTO_INCREMENT,
	`Lang_Name` varchar(16) NOT NULL UNIQUE,
	PRIMARY KEY(`lang_id`)
)
-- If the table has already been created, we can alter it.
ALTER TABLE `Languages`
MODIFY COLUMN `Lang_Name` varchar(16) NOT NULL UNIQUE

Hence, when we import the file (using the SQL below), duplicate columns will be discarded.

LOAD DATA INFILE '/path/to/your.csv'
IGNORE INTO TABLE `languages`
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
( @a, @b, @c, `Lang_Name`) -- @a, @b and @c are just throwaway variables.

Note that you will have to add an IGNORE (highlighted above) at the beginning of your INTO TABLE clause, to tell SQL to ignore duplicate entries.

If you find duplicate entries when you run your code, or if the entries do not get inserted into the database, try changing the LINES TERMINATED BY clause to use \r\n instead.

Looking up the foreign key of another table

For the Users table, we will need SQL to look up the Lang_ID field of the Languages table when importing the rows. This can be accomplished by using a subquery in our LOAD DATA INFILE command:

LOAD DATA INFILE '/path/to/your.csv'
INTO TABLE `users`
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
( `Username`, `User_Email`, `User_Phone`, @Lang_Name)
SET `Lang_ID` = (SELECT `Lang_ID` FROM `Languages` WHERE `Lang_Name` = @Lang_Name);

Conclusion

Did the code above work for you? If they don’t, let us know in the comments below, so we can add them as solutions to this guide and help future readers.


Article continues after the advertisement:


There are 2 comments:

Leave a Reply

Your email address will not be published. Required fields are marked *

Note: You can use Markdown to format your comments.

This site uses Akismet to reduce spam. Learn how your comment data is processed.