Items to consider to ensure an efficient and successful ingestion process
Introduction
TaxBit’s Manage Data Module enables you to upload information regarding end customers and their activity into the TaxBit Dashboard. Exporting information from internal systems into a CSV file and reviewing it for accuracy before upload can be tricky. As your service provider and partner, TaxBit aims to make this process as simple as possible.
This guide provides a comprehensive list of key items to watch for as you prepare your data. It also offers various tips and strategies to ensure an efficient and successful data ingestion process.
General
The following tips apply across all data types (i.e., Accounts, Transactions, Form Data) and their available CSV templates.
Use the Latest Template
Before preparing any data, it’s recommended to download the data ingestion template from the TaxBit Dashboard (Manage Data -> Ingest). Otherwise, an error will be returned if a file is loaded with incorrect headers.
TaxBit will continue to add additional functionality to support the various tax and information reporting needs of our customers. New functionality may require updates and/or additions to our available ingestion templates. As a result, it’s always best to confirm you are leveraging the latest template before uploading a file.
It’s also good practice to confirm the relevant data points (e.g., name, TIN, address, etc.) are in the appropriate fields before uploading your file.
Remove Additional Columns or Rows
CSV files may sometimes contain blank columns or rows. When saving data as a CSV file, ensure you remove any superfluous columns or rows. Each row should have the same number of fields as the header row to maintain data consistency.
Additional rows in a CSV file might slow down the ingestion process, as the system needs to review each row before moving on. Extra columns can also lead to header validation errors or other issues that could prevent the file from being successfully ingested.
Additional columns will cause a header validation error
Additional rows will slow down the ingestion process
Confirm IDs for Accuracy
When managing data sorted by a unique ID, it's crucial to review the data thoroughly before ingestion. IDs, especially those that are numeric strings, can become corrupted if the data is opened in programs like Microsoft Excel or Apple Numbers. This corruption typically occurs because these applications might misinterpret numeric strings as numbers, potentially altering the data format.
Below are some common challenges we’ve seen customers face when preparing data.
Lost Characters
Microsoft Excel can accurately handle numbers up to 15 digits long. For numbers exceeding this length, the trailing digits may be replaced with zeros if the file is opened in Excel. Saving this file, even as a CSV, will result in the altered ID being recorded with zeros replacing the original characters at the end of the string.
TaxBit's Manage Data Module is designed to raise a warning for instances where a long string of numeric characters ends with a zero. This feature helps to maintain data integrity during the ingestion process. Since this situation is raised as a warning and not an error, this record can still be ingested if you choose to "Approve" it.
Scientific Notation
When numbers are opened in Microsoft Excel, they can sometimes default to scientific notation. If that file is then saved, even if saved as a CSV file, the newly saved CSV file may retain the string in scientific notation (e.g., “1.8+e10”), rather than the original string of numeric characters.
Unfortunately, even if the ID is re-converted to a number before saving, Microsoft Excel can often only reproduce a rounded version of the original string of numeric characters — again, typically ending with a “0”.
TaxBit’s Manage Data Module will raise a warning to highlight any IDs that appear to be in scientific notation format. Since this situation is raised as a warning and not an error, this record can still be ingested if you choose to "Approve" it.
Duplicate IDs
It is important to check your file for any cases of duplicate IDs. Since there is no way to determine which ID contains the correct information, any duplicate ID will be treated as an error. This is typically an indication that IDs have been malformed due to some manual manipulation. It also may just be an indication that the source data contains duplicate records that should be reviewed before being ingested.
Duplicate IDs are treated as errors within TaxBit's system, rendering a record "Invalid." As a result, TaxBit will not be able to ingest any records that contain duplicate IDs. This ensures the integrity and uniqueness of each record processed by the system.
Account Owner & Account Information
Below are some additional items to consider specifically when preparing Account Owner and/or Account information.
Check for Leading Zeros (e.g., TINs, Postal Codes, etc.)
When a CSV file is opened in Microsoft Excel or Apple Numbers, these applications may automatically remove any leading zeros from data originally exported from internal systems. If the file is then saved, even as a CSV, the leading zeros will not be present in the newly saved file. This alteration can lead to issues with data integrity and accuracy, particularly for data where leading zeros are significant, such as in certain ID numbers or codes.
This is of particular importance for the following two fields –
- TIN (tax_id)
- Address Postal Code (address_postal_code)
Removing a leading zero from these fields can result in TaxBit being unable to generate a tax deliverable (e.g., Form 1099) or kick-off our TIN matching process.
If you need to manipulate data within a CSV before uploading it through Manage Data, the information can be reviewed and edited within a text editing application, such as Notepad++ or Visual Studio Code. Doing this helps avoid some of the problems caused by working with Microsoft Excel and Apple Numbers. The information can be copied from a text editing application and copied into an excel file using the Microsoft Excel Text Import Wizard.
By importing the relevant information in “Text” format, the data can be pasted into the excel file exactly as it appears in the CSV file and will include any leading zeroes.
Special Characters
When dealing with source information that includes special characters (such as á, ã, ä, ǎ, etc.), it's important to review the CSV file to ensure these characters are retained as expected. Special characters can frequently become corrupted when the file is opened in Microsoft Excel or Apple Numbers, as these applications might not fully support all character types. This corruption can lead to data integrity issues, especially if the special characters are essential for the accuracy of the information.
If unaddressed, it’s possible that some unexpected characters will be seen through the TaxBit Dashboard, which may also make their way onto certain Tax Deliverables (e.g., Forms 1099). This is of particular importance for Name and certain Address fields.
For those experiencing this issue, one recommendation is to review the Encoding on the CSV file. This can be found by opening the CSV with a text editing application, such as Notepad++ or Visual Studio Code.
Review US Postal Code Format
In order for a US address to be considered complete, TaxBit expects the information to be in one of the following two formats:
- 5 digits (e.g., 98102)
- 5 digits, followed by 4 digits, separated by a dash (e.g., 98102-6065)
US postal codes provided outside of this format (e.g., “981026065”, “98102.6065”, “98102 6065”, etc.) will be ingested but will not be considered complete, which may result in TaxBit being unable to generate a tax deliverable (e.g., Form 1099).
US State & Country Fields
When providing US Address information for an Account Owner, TaxBit expects the State field to be in one of the following two formats:
- Full State name
- 2-Character State Code
For additional detail, please visit https://www.irs.gov/pub/irs-pdf/p5594.pdf
Similarly, when providing Country information for an Account Owner, TaxBit expects the relevant Country field (i.e., address_country and tax_country_code) to be in one of the following two formats:
- Full Country name
- 2-Character ISO Code
For information on ISO Codes, please visit https://www.iso.org/obp/ui/#search/code/.
US State or Country information being provided outside of this format will be ingested but will potentially be considered incomplete, which may result in TaxBit being unable to generate a tax deliverable (e.g., Form 1099).