Fixing Data Format with Python Generator
How can I fix the data format from a file using a Python generator?
How can I correctly convert JSON and Tab-separated values (TSV) lines into dicts in Python?
Why am I receiving an IndexError when trying to parse the data from the file?
Guidance on Fixing the Data Format Issue:
The generator function must be named gen_fix_data, not gen_fixed_data, so rename it to gen_fix_data.
The corrected version of the program is shown below:
Illustrating the Program:
import json
import pandas as pd

# Column names for the tab-separated records, in file order.
FIELDS = ["company", "catch_phrase", "phone", "timezone", "client_count"]

def gen_fix_data(data_iterator):
    for each_line in data_iterator:
        each_line = each_line.rstrip('\n')
        if not each_line.strip():
            continue  # skip blank lines
        if each_line.startswith('{'):
            # JSON record: parse it directly into a dict
            yield json.loads(each_line)
        else:
            # Assumes every non-JSON line is a tab-separated record
            values = each_line.split('\t')
            if len(values) < len(FIELDS):
                continue  # too few fields; indexing values[4] would raise IndexError
            yield dict(zip(FIELDS, values))

with open('assets/companies_small_set.data', 'r') as broken_data:
    df = pd.DataFrame(data=gen_fix_data(broken_data))
print(df)
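On the IndexError: indexing values[4] fails whenever a line splits into fewer than five values, which typically happens with a blank line or a truncated row. The sketch below shows one way to guard against that. It is a minimal illustration, not the required solution: the helper name parse_mixed_lines and the sample rows are made up for the example, and it assumes every non-JSON line is meant to be a tab-separated record with the five columns used in the program above.

```python
import json

# Column names taken from the program above.
FIELDS = ["company", "catch_phrase", "phone", "timezone", "client_count"]

def parse_mixed_lines(lines):
    """Yield one dict per JSON or tab-separated line (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line: nothing to parse
        if line.startswith("{"):
            yield json.loads(line)
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            # A short row is what makes values[4] raise IndexError; skip it.
            continue
        yield dict(zip(FIELDS, values))

# Made-up sample rows for illustration, not taken from the real file.
sample = [
    '{"company": "Acme", "catch_phrase": "We build", "phone": "555-0100", '
    '"timezone": "UTC", "client_count": 12}\n',
    "Globex\tWe ship\t555-0101\tUTC\t7\n",
    "\n",
    "Broken\trow\n",
]
records = list(parse_mixed_lines(sample))
print(records)
```

Note that values read from a tab-separated line stay strings, while JSON values keep their parsed types, so a column such as client_count may need converting before any numeric work.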