In a previous article, we explored how to import a CSV file into a table in MySQL. This is a continuation of that article, where we will explore how to split CSV files that contain data which is supposed to go into multiple tables.
Take the following table for example:
Username | Contact | Language Name | |
---|---|---|---|
johndoe | john@example.com | 91234567 | English |
janedoe | jane@website.com | 81234567 | Chinese |
marysmith | mary@smith.com | 97654321 | English |
bobsmith | bob@smith.com | 87654321 | Chinese |
If we are importing the data into a database, we may want to split the data across 2 tables to prevent data redundancy. The resulting table will look like this:
Lang ID | Language Name |
---|---|
1 | English |
2 | Chinese |
The tables can be represented with the following schemas:
Users | |
---|---|
PK | User_ID |
FK | Username User_Email User_Phone Lang_ID |
Languages | |
---|---|
PK | Lang_ID |
Lang_Name |
Article continues after the advertisement:
Importing the data into 2 tables
Before we begin, we will need to determine the order in which we will create the tables:
- Languages: This has to be created first, as the Users table has a foreign key that points to Lang Code.
- Users: When importing the users table, we will need to assign a Lang ID to it that represents the Language Name in the same column.
Importing distinct values into a table
For the Languages table, we only need to import each Language Name once (for a total of 2 rows). However, because there are 4 rows in our data, importing the data directly will cause 4 rows to be imported into our Languages table.
To ensure that the Language Name column does not duplicate itself, we can make the Lang_Name
column for our table UNIQUE
.
-- If the table has not yet been created.
CREATE TABLE `Languages` (
`Lang_ID` int AUTO_INCREMENT,
`Lang_Name` varchar(16) NOT NULL UNIQUE,
PRIMARY KEY(`lang_id`)
)
-- If the table has already been created, we can alter it.
ALTER TABLE `Languages`
MODIFY COLUMN `Lang_Name` varchar(16) NOT NULL UNIQUE
Hence, when we import the file (using the SQL below), duplicate columns will be discarded.
LOAD DATA INFILE '/path/to/your.csv'
IGNORE INTO TABLE `languages`
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
( @a, @b, @c, `Lang_Name`) -- @a, @b and @c are just throwaway variables.
Note that you will have to add an IGNORE
(highlighted above) at the beginning of your INTO TABLE
clause, to tell SQL to ignore duplicate entries.
If you find duplicate entries when you run your code, or if the entries do not get inserted into the database, try changing the LINES TERMINATED BY
clause to use \r\n
instead.
Looking up the foreign key of another table
For the Users table, we will need SQL to look up the Lang_ID
field of the Languages table when importing the rows. This can be accomplished by using a subquery in our LOAD DATA INFILE
command:
LOAD DATA INFILE '/path/to/your.csv' INTO TABLE `users` FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS ( `Username`, `User_Email`, `User_Phone`, @Lang_Name) SET `Lang_ID` = (SELECT `Lang_ID` FROM `Languages` WHERE `Lang_Name` = @Lang_Name);
Conclusion
Did the code above work for you? If they don’t, let us know in the comments below, so we can add them as solutions to this guide and help future readers.
Article continues after the advertisement: