Splitting a CSV file into multiple MySQL tables

In a previous article, we explored how to import a CSV file into a table in MySQL. This is a continuation of that article, where we will explore how to split CSV files that contain data which is supposed to go into multiple tables.

Take the following table for example:

Username	Email	Contact	Language Name
johndoe	john@example.com	91234567	English
janedoe	jane@website.com	81234567	Chinese
marysmith	mary@smith.com	97654321	English
bobsmith	bob@smith.com	87654321	Chinese

The Language Name column is redundant because it repeats across multiple columns.

If we are importing the data into a database, we may want to split the data across 2 tables to prevent data redundancy. The resulting table will look like this:

User ID	Username	Email	Contact	Lang ID
1	johndoe	john@example.com	91234567	1
2	janedoe	jane@website.com	81234567	2
3	marysmith	mary@smith.com	97654321	1
4	bobsmith	bob@smith.com	87654321	2

A User ID column is introduced to act as the surrogate p r i m a r y key.

Lang ID	Language Name
1	English
2	Chinese

Lang ID is introduced here to act as the surrogate primary key.

The tables can be represented with the following schemas:

Users
PK	User_ID
FK	Username User_Email User_Phone Lang_ID

Languages
PK	Lang_ID
	Lang_Name

Article continues after the advertisement:

Importing the data into 2 tables

Before we begin, we will need to determine the order in which we will create the tables:

Languages: This has to be created first, as the Users table has a foreign key that points to Lang Code.
Users: When importing the users table, we will need to assign a Lang ID to it that represents the Language Name in the same column.

Importing distinct values into a table

For the Languages table, we only need to import each Language Name once (for a total of 2 rows). However, because there are 4 rows in our data, importing the data directly will cause 4 rows to be imported into our Languages table.

To ensure that the Language Name column does not duplicate itself, we can make the Lang_Name column for our table UNIQUE.

-- If the table has not yet been created.
CREATE TABLE `Languages` (
	`Lang_ID` int AUTO_INCREMENT,
	`Lang_Name` varchar(16) NOT NULL UNIQUE,
	PRIMARY KEY(`lang_id`)
)

-- If the table has already been created, we can alter it.
ALTER TABLE `Languages`
MODIFY COLUMN `Lang_Name` varchar(16) NOT NULL UNIQUE

Hence, when we import the file (using the SQL below), duplicate columns will be discarded.

LOAD DATA INFILE '/path/to/your.csv'
IGNORE INTO TABLE `languages`
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
( @a, @b, @c, `Lang_Name`) -- @a, @b and @c are just throwaway variables.

Note that you will have to add an IGNORE (highlighted above) at the beginning of your INTO TABLE clause, to tell SQL to ignore duplicate entries.

If you find duplicate entries when you run your code, or if the entries do not get inserted into the database, try changing the LINES TERMINATED BY clause to use \r\n instead.

Looking up the foreign key of another table

For the Users table, we will need SQL to look up the Lang_ID field of the Languages table when importing the rows. This can be accomplished by using a subquery in our LOAD DATA INFILE command:

LOAD DATA INFILE '/path/to/your.csv'
INTO TABLE `users`
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
( `Username`, `User_Email`, `User_Phone`, @Lang_Name)
SET `Lang_ID` = (SELECT `Lang_ID` FROM `Languages` WHERE `Lang_Name` = @Lang_Name);

Conclusion

Did the code above work for you? If they don’t, let us know in the comments below, so we can add them as solutions to this guide and help future readers.