Resume #37
base: main
Code to parse pdf into categories.
anthonyfabius
left a comment
Reviewed each file. The main things I noticed that need to be fixed:
- Some stray artifacts are still present that we don't want
- Making the parsing output a JSON file and not a txt file
- Removing the settings page changes
- Naming conventions of various files and variables
- Sparse comments in hard-to-understand areas
PDFParse/Hongwei_resume.pdf
Outdated
You probably want to remove this
PDFParse/output.txt
Outdated
The output file should be JSON so that it is compatible with Chrome local storage. So instead of an output.txt we should have an output.json. Later on we won't even want an output file and will just write straight to local storage. One way to format this text as JSON would be:
{
  "name": "HONGWEI LI (DILLON)",
  "phone": "(917) 702-5588",
  "email": "[email protected]",
  "location": "New York, NY",
  "summary": "A highly motivated student seeking challenging summer software internship opportunities.",
  "education": [
    {
      "school": "Rensselaer Polytechnic Institute",
      "location": "Troy, NY",
      "degree": "B.S. in Computer Science",
      "gpa": "3.26/4.0",
      "graduation_date": "May 2026",
      "relevant_courses": [
        "Data Structures",
        "Computer Science I",
        "Business and Management"
      ]
    },
    {
      "school": "Stuyvesant High School",
      "location": "New York, NY",
      "degree": "High School Diploma",
      "gpa": "3.5/4.00",
      "graduation_date": "Jun 2022",
      "relevant_courses": [
        "Computer Science",
        "Technical Drawing",
        "3D Modeling",
        "Analog and Applied Electronics"
      ]
    }
  ],
  "professional_experience": [
    {
      "position": "CodePath Android Development",
      "location": "New York, NY",
      "dates": "Feb 2023 – Apr 2023",
      "description": [
        "Learn the basics of Android development including IDEs, Kotlin language, and debugging.",
        "Design and architect a functional multi-screen mobile app in a group setting."
      ]
    },
    {
      "position": "StuyPulse FIRST Robotics Team",
      "location": "New York, NY",
      "dates": "Sep 2019 – Jun 2022",
      "description": [
        "Designed and built prototypes of a feeding mechanism that allows the robot to collect and store balls for shooting.",
        "Gained hands-on experience in machines such as drill press, laser cutter, bandsaw, and belt sander."
      ]
    },
    {
      "position": "Technical Project",
      "location": "New York, NY",
      "dates": "Sep 2021 – Jun 2022",
      "description": [
        "Designed and built circuitry to adjust the LED and seven-segment display using IC chips, ceramic capacitors, resistors, potentiometers, and buttons.",
        "Utilized AutoCAD to design a 3D water bottle jug and created a multi-view drawing with dimensions.",
        "Laser cut a wooden, finger-jointed light box with engraved images and cut-outs."
      ]
    },
    {
      "position": "SAT Prep Center",
      "location": "New York, NY",
      "dates": "Jul 2020 – Aug 2021",
      "description": [
        "Tutored multiple groups of 15 students to enhance their mathematical and English comprehension skills."
      ]
    }
  ],
  "leadership_development": [
    {
      "organization": "Society of Asian Scientists and Engineers (SASE)",
      "location": "Troy, NY",
      "role": "Member, Mentee",
      "dates": "Sep 2022 – Present",
      "description": [
        "Developed professional experience and network through participating in various conferences, workshops, and social events.",
        "Aiming to join the Public Relations Committee to raise awareness of social and service events to the community."
      ]
    },
    {
      "organization": "Eighth Wonder Dance Club",
      "location": "Troy, NY",
      "role": "Treasurer",
      "dates": "Sep 2022 – Present",
      "description": [
        "Handled all income and expenditure of the club while tracking the club funds in a financial report to organize and plan future club events.",
        "Will be coordinating with school officials and monitoring events to request and raise funds for the club."
      ]
    },
    {
      "organization": "PSAL Stunt & Cheer",
      "location": "New York, NY",
      "role": "Cheerleader and Back spotter",
      "dates": "Sep 2018 – Jun 2022",
      "description": [
        "Participated in 5 cheerleading events to support the school’s sports games.",
        "Supervised and mentored new stunt members to facilitate their transition to the team.",
        "Successfully organized 3 in-school fundraisers for the team to participate in national competitions."
      ]
    }
  ],
  "skills": {
    "technical": ["Python", "HTML", "Kotlin", "Scratch", "AutoCAD"],
    "prototyping": ["3D printer", "laser cutter", "horizontal bandsaw", "drill press"],
    "languages": ["Proficient in English and Mandarin"]
  }
}
Essentially, we want to break it down into pieces so that when the AI model attempts to autofill something like work experience, it can take the JSON section called "professional_experience" and easily see what goes where, since each entry has fields for the position, location, dates, description, etc. Also, txt files aren't compatible with local storage, and we definitely don't want to parse a resume every time we want to fill in data.
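As a rough illustration of the point above, here is a minimal TypeScript sketch of how an autofill step might read the work-experience section from the stored JSON. The `ExperienceEntry` and `ParsedResume` types and both function names are hypothetical; only the field names come from the example JSON in this comment.

```typescript
// Hypothetical types mirroring part of the example JSON above.
interface ExperienceEntry {
  position: string;
  location: string;
  dates: string;
  description: string[];
}

interface ParsedResume {
  name: string;
  professional_experience: ExperienceEntry[];
}

// Read the work-experience entries from a stored JSON string.
// The resume is parsed once and stored; this only reads the stored result.
function experienceEntries(resumeJson: string): ExperienceEntry[] {
  const resume = JSON.parse(resumeJson) as Partial<ParsedResume>;
  return resume.professional_experience || [];
}

// Map one entry onto flat form-field values (the field names are assumptions).
function toFormFields(entry: ExperienceEntry): Record<string, string> {
  return {
    position: entry.position,
    location: entry.location,
    dates: entry.dates,
    description: entry.description.join("\n"),
  };
}
```

Because each entry already separates position, location, dates, and description, the autofill code does not need to guess where one field ends and the next begins.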
PDFParse/package.json
Outdated
Why is there another package.json? Shouldn't you add these dependencies to the package.json in the working directory?
Git ignore this
PDFParse/pdfsort.js
Outdated
Git ignore this
PDFParse/pdfparse.ts
Outdated
const textContent = data.text;

// Create output file path
const txtPath = path.join(__dirname, 'output.txt');
Output as json
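One possible shape for this, sketched in TypeScript: collect the parsed sections into a single object and serialize it, so the caller writes the returned string to output.json rather than writing raw text to output.txt. The `ParsedSection` type follows the `{ category, content }` entries that pdfsort.ts pushes in this PR; `toKey` and `sectionsToJson` are hypothetical helper names.

```typescript
// Entry shape assumed from the parser in this PR: { category, content }.
interface ParsedSection {
  category: string;
  content: string[];
}

// Turn a heading such as "Professional Experience" into "professional_experience".
function toKey(category: string): string {
  return category
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Merge sections that share a category and serialize the result as JSON.
// Writing the string to output.json (or to storage) is left to the caller.
function sectionsToJson(sections: ParsedSection[]): string {
  const result: Record<string, string[]> = {};
  for (const section of sections) {
    const key = toKey(section.category);
    result[key] = (result[key] || []).concat(section.content);
  }
  return JSON.stringify(result, null, 2);
}
```

This keeps each section as a flat list of lines. Splitting a section into structured entries (position, dates, and so on) would need extra parsing that this sketch does not attempt.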
This entire modification is in the wrong branch. You probably want to include this in the SettingsPage branch and not this one.
parsePdf.ts
Outdated
parsePdf and pdfparse... there are probably better ways to name these two files. As for content, I only loosely understand the workflow of this function and the ones it calls. Adding a single-line comment at the top of each PDF parsing file that explains what it does would be helpful.
PDFParse/pdfsort.ts
Outdated
function isName(line: string, isFirstNonEmptyLine: boolean): string | null {
  const trimmedLine = line.trim();
  return isFirstNonEmptyLine && trimmedLine.length > 0 ? trimmedLine : null;
}
What is the point of this function?
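For context on the question: isName does not check whether the line looks like a name. It returns the trimmed line whenever the caller says it is the first non-empty line. If the intent is "treat the first non-empty line as the applicant's name" (an assumption on my part), a helper along these lines might state that more directly. `firstNonEmptyLine` is a hypothetical name, not something in the PR.

```typescript
// Returns the first non-empty line, trimmed, or null if every line is blank.
// Assumption: the resume's first non-empty line is the applicant's name.
function firstNonEmptyLine(lines: string[]): string | null {
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.length > 0) {
      return trimmed;
    }
  }
  return null;
}
```

With this, the caller would no longer need to track an isFirstNonEmptyLine flag inside the loop over lines.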
PDFParse/pdfsort.ts
Outdated
// Check if the line is the name
const name = isName(line, isFirstNonEmptyLine);
if (name) {
  parsedInfo.push({ category: 'Name', content: [name] });
  isFirstNonEmptyLine = false; // Set to false after finding the first non-empty line
  return;
}
What is "name" supposed to be? At first glance I keep thinking this is the name of the person, but then I think it's a position name or a section heading.
Changed the output to be inside a json file instead of a txt file
added some test cases, and made the final output into a json file
Got rid of the npm dependencies nested in PDF Parse. Also uncached all pdf parse files and git ignored the ones we don't care about.
Removed duplicate/useless files that are in wrong folders.
my bad 💀
instructions for ease of use of the program(s)
changed how the pdf parser works, by changing the sorting through keywords.
added more keywords to help parse
cut down on the amount of files by combining pdf parse and pdf sort
The code currently parses the string of the cover letter into the name and body only. Co-Authored-By: mik0lam <[email protected]> Co-Authored-By: MBtheOtaku <[email protected]>
updated instructions to reflect new code
Changed up the cover letter parse, is now a stable prototype, has name body and address sections. Co-Authored-By: MBtheOtaku <[email protected]> Co-Authored-By: hli2238 <[email protected]>
got rid of unnecessary files and lines of code
Added a prototype to pull the resume from the database
Modified the code to also implement the db function in the cover letter parser so the code can take in a file from the database instead of a manual inputted string Co-Authored-By: mik0lam <[email protected]> Co-Authored-By: MBtheOtaku <[email protected]>
Added a function to store the parsed info into the database. Co-Authored-By: mik0lam <[email protected]> Co-Authored-By: MBtheOtaku <[email protected]>
This function saves the parsed data into the database for future use just like in the cover letter parser.
Pull Request
Description
Please include a summary of the changes and the related issue (if applicable). Please also include relevant motivation and context. List any dependencies that are required for this change (if applicable).
I added dropdown lists for gender, veteran status, ethnicity, and disability status, which companies usually ask about during the recruiting process.
Fixes # (issue)
Fixes issue 16 with the addition of more settings that could be inputted by the user.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
I ran the code in the extension, verifying the choices.
Checklist
Try to checkoff as much as possible if not everything!