Advanced Code Generation With LLMs — Building a Synthetic Data Generator

Applying the 6 steps of the INSPIRe framework to accelerate your code generation (ChatGPT-4 — Claude 3 — Gemini)

Nabil Alouani · Towards Data Science · April 2024

I've never written a Data Science project from start to finish. Yet anything you can do inside a Jupyter Notebook, I can do too. Yes, really, anything.

Bragging aside, this is one of the most significant transformations AI brought us in the last two years. LLMs turned code into a commodity; something you can use as a tool instead of a skill.

All you need are five ingredients:

1. Data literacy.
2. A pinch of logic.
3. A knack for trial and error.
4. Prompt Engineering.
5. The INSPIRe framework.

If you're wondering what the heck INSPIRe might be, you should read the first part of this series. You'll find an introduction to the framework and how to apply it. If you're feeling adventurous, however, the quick recap in the following section should suffice.

In this article, we'll dive into a concrete example of the INSPIRe framework. We'll build a synthetic data generator that produces fictional reviews.

Why synthetic data? Why fictional reviews?

Synthetic data generation plays a key role in training and fine-tuning LLMs — and LLMs are one of the most sought-after technologies in today's market. These models also excel at analyzing and interpreting product reviews and social media comments, capabilities highly valued by your clients. Touching on both topics is like lighting two candles with one match.

From *writing* code to *generating* code

I've argued that Prompt Engineering is the hottest programming language of 2024. That's mainly because LLMs allow you to write code in plain English. Coding in natural language is the cornerstone of the INSPIRe framework.

INSPIRe is designed to help you automate code generation through ideation and prompt engineering. It's not some theory forcing its way into practice. It's an iterative process born from trial and error.

Here's a reminder of the six steps:

+ (I) Identify: Determine your goal and the requirements needed to achieve it. Begin with an elaborate first prompt.
+ (N) Narrate: Convert your instructions into code through clear, short prompts.
+ (S) Screen: Review each code snippet and correct the errors. Test and adjust.
+ (P) Polish: Refine your code to improve it. Iterate as much as you need.
+ (I) Integrate: Assemble your code snippets into a cohesive and elegant program.
+ (Re) Restart: When you're done (or when you hit a wall), start a new loop.

When running INSPIRe, you want to break down your code into multiple snippets. For each snippet you generate, go through all of the steps. Rinse and repeat until a proud smile appears on your face.
Here's why INSPIRe works:

+ LLMs learned to generate code from training on millions of code snippets.
+ Prompt Engineering allows you to write better and better instructions until AI does exactly what you want.
+ The iterative improvement built into INSPIRe makes it reliable and grounded in real-world applications.

In short, you have an idea. You turn that idea into prompts, and then you turn your prompts into code. You test, adjust, and move forward, laying your code brick by brick.

Ready for the example?

Let's create a synthetic data generator

The goal is to produce a synthetic dataset made of product reviews. Picture a CSV file with user names, email addresses, ratings, and comments.

We'll use two LLMs to achieve our goal:

+ ChatGPT-4: we'll use it to generate the code.
+ Mixtral-8x7B: the code we generate will "call" Mixtral-8x7B through an API to create synthetic data.

For simplicity, I won't display all of the code snippets I generated. You'll find a Jupyter Notebook with all the iterations in my GitHub profile, linked at the end. The main goal is to show you what INSPIRe looks like in practice. You don't have to digest every single line of code. Focus on the logic of code generation using LLMs.

Let's get started.

1. Identify

Your first step is to understand what your code needs to accomplish. Once you grasp the problem, you're halfway to finding a solution.

In the case of synthetic data generation, your goal has two levels of abstraction:

+ On the macro level, you want to generate synthetic data.
+ On the micro level, you want to generate code that achieves your macro goal (generating synthetic data).

The idea is to express both of these goals inside a single prompt. Your first prompt is the most important one. It'll ground your model in a specific context that your LLM can refer to when generating code. Once you submit your first prompt, you'll start interacting with your model in a conversational manner. The quality of your first prompt influences how well your LLM handles follow-ups.

For the best start, your initial "Identify" prompt should explore the following key areas:

+ The objective of your code: what do you want to achieve?
+ The context: is it data generation, analysis, or processing? This helps the model pick the right packages and frameworks.
+ The requirements your model needs to learn and apply: such as a new syntax, a specific API, or rules to follow, like GDPR.

If you feel hesitant when first expressing your goal, remember the best way to express a goal is to get it out of your head. Start with a one-liner. "Write code that generates synthetic text data" is not so bad. Now you have a first draft you can improve with a bit of work. Ignore the urge to dive into code generation right away, because every minute you invest in "Identify" compounds into an exponential gain of time.

Let's illustrate. We said our objective is to "Write code that generates synthetic text data." As soon as you write the objective, questions will start popping into your head:
+ What type of data?
+ How many features?
+ What model do you want to use? Here we'll use the Mixtral-8x7B model through the OctoAI API.
+ Is your LLM local or API-hosted?
+ Is there a specific syntax you need to be aware of? Here we'll copy-paste a snippet from octoai.cloud to teach the model how to call Mixtral-8x7B.
+ What version of Python are you using?
+ Which dependencies do you need to take into account?

You also want to write flexible prompts for increased efficiency. That way, you don't have to start over whenever you mess up an instruction.

Flexibility in prompt engineering is all about using placeholders. Placeholders allow you to turn a given prompt into a template you can adapt. Too abstract? Here's an example:

[Example of a templated prompt during the IDENTIFY step]

+ Role:
Act like an expert software engineer who specializes in <field>. Your role is to help me achieve the following objective: <objective>.
Reason step by step to make sure you understand the user's intentions before you write any code.
Make sure the code you write and/or edit is clear and well-commented.
Write code that adheres to the best practices in <language>.
When you first respond, acknowledge the instructions you've been given, then ask for the user's input.

+ Specifics:
When the user indicates specific syntax and functions, make sure to remember them.
Assume the user provides the correct syntax, but always verify indentations, symbols, and brackets.

+ Format:
Give a clear title to each code snippet you generate. For example, you can title the first code snippet "Snippet #1 version 1.0." If you edit it, the output should be called "Snippet #1 version 2.0."
If you move to a new function or piece of code, you should name it "Snippet #2 version 1.0."
When you interact with the user in natural language, use line breaks, titles, and bullet points.

+ Inputs:
<objective> = Generate synthetic text data using the OctoAI API and the Mixtral-8x7B model.
<language> = Python 3.11.5
<core_specifics> = Here's the exact syntax you need to use to call the OctoAI API:

import json
import octoai
from octoai.client import Client

client = Client(Token)
completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You're a helpful assistant!"
        },
        {
            "role": "user",
            "content": "What's the best way to enjoy a cold coffee brew?"
        },
    ],
    model="mixtral-8x7b-instruct-fp16",
    max_tokens=2000,
    presence_penalty=0,
    temperature=0.1,
)

output = json.dumps(completion.dict(), indent=2)
print(output)

Notice placeholders like <objective> and <language>, and how the prompt contains values for each in a dedicated section called "Inputs." This means you can conserve the entire body of the prompt and tweak the "Inputs" section to switch the content of a variable. Now you've got yourself a dynamic prompt.

Quick note: if you don't like <placeholders>, you can opt for another delimiter such as:

+ {placeholder_1}
+ #placeholder_2#
+ [placeholder_3]
+ //placeholder_4//

Another quick note: Mixtral-8x7B is an open-source model that's available for free on HuggingFace. You can host it on your own computer. But if you're GPU-poor like me, you can use an API like OctoAI's to access the model for less than pennies.
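The "Inputs" section is also easy to manage from code. Here is a minimal sketch of how you could keep the prompt body as a fixed template and fill the placeholders programmatically; the fill_template helper, the shortened template, and the dictionary values are illustrative choices, not part of the original article:

# Minimal sketch: keep the prompt body fixed and swap only the "Inputs" values.
# The helper and placeholder names below are hypothetical; adapt them to your own template.

IDENTIFY_TEMPLATE = """
Act like an expert software engineer who specializes in <field>.
Your role is to help me achieve the following objective: <objective>.
Write code that adheres to the best practices in <language>.
"""

def fill_template(template: str, values: dict) -> str:
    # Replace each <placeholder> with its value from the dictionary.
    for key, value in values.items():
        template = template.replace(f"<{key}>", value)
    return template

identify_prompt = fill_template(
    IDENTIFY_TEMPLATE,
    {
        "field": "synthetic data generation",
        "objective": "Generate synthetic text data using the OctoAI API and Mixtral-8x7B",
        "language": "Python 3.11.5",
    },
)
print(identify_prompt)

Swapping the dictionary values gives you a new "Identify" prompt without touching the template itself.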
Let's continue to the second step.

2. Narrate

Your second task is to break down your main goal into instructions written in plain English. You want to write a chain of logical steps that take your code from point A to point B. The smaller the step, the better. Be clear, specific, and redundant with your instructions.

Why redundant? The longer you interact with your model, the longer the context window gets — and redundancy improves your model's ability to "remember" past instructions.

There are many techniques you can use to tell your LLM which steps it needs to follow to generate the desired code. Here are a few examples:

+ Think from least to most: first you write your final goal, then you break it down into a smaller version of itself. It's like building an MVP of your idea and then building on it.
+ Reverse engineer your idea: start from the end goal and build your reasoning backward.
+ Use the process of elimination: write every instruction that comes to mind, then remove the bad ones. Reorder what remains, and you're good to go.
+ Use a mind map: a visual tool that allows you to brainstorm and organize your thoughts. Use a sheet of paper, an app, or a chat tab with your favorite LLM.

Either way, build a set of step-by-step instructions to formulate your intentions. Here's an example using the least-to-most method (my favorite):

Initial goal: Generate 2,000 rows of synthetic data that contain detailed reviews.

"Least-to-most" goal #0: Call the Mixtral-8x7B model once to generate one row of data. The one row of data is a review about a specific product.
Comment: note that your initial goal is equivalent to the final goal, or at least very close to it. You have to scale your goal down to a simple, straightforward instruction and build from there.

"Least-to-most" goal #1: Call the Mixtral-8x7B model once to generate one row of data. The one row of data is a table that contains one row with a fictional username, a fictional email address, a fictional rating, and a fictional review about a product such as a "VR headset."

"Least-to-most" goal #2: Call the Mixtral-8x7B model once to generate 5 rows of data. The final output is a table that contains five rows, each with a fictional username, a fictional email address, a fictional rating, and a fictional review about a product such as a "VR headset."

"Least-to-most" goal #3: Call the Mixtral-8x7B model 2 times (create a simple loop). The final output is a table that contains 10 rows, each with a fictional username, a fictional email address, a fictional rating, and a fictional review about a product such as a "VR headset."

"Least-to-most" goal #4: Call the Mixtral-8x7B model X times (create a loop based on a variable; the idea is to turn the number of rows into a variable). The final output is a table that contains X*Y rows, each with a fictional username, a fictional email address, a fictional rating, and a fictional review about a product such as a "VR headset."

The "Narrate" step is like what your CS professor once told you: "First, you write your code in plain English, and then turn it into code." The only difference here is that you outsource the second step.

For the data generation example, we can start with two simple assumptions:

1. You're familiar with the syntax used to call the Mixtral-8x7B model through the OctoAI API. You probably copy-pasted it from the OctoAI website, but you also read it to understand which variables you can tweak.
2. You have Prompt Engineering knowledge (if not, subscribe to my newsletter). That's also how you know you can leverage the "system prompt" to give elaborate instructions to your model.
The better your prompts, the more control you have over the output. Start by writing a basic prompt. Test it. Not bad, you tell yourself. But since you're familiar with prompting, you know you can do better, so you decide to invest some time to improve your instructions.

Suppose your initial "system prompt" and "user prompt" are as follows:

# [Prompt example to generate synthetic text data]

system_prompt = """
Based on the instructions and specific syntax I gave you before, write code that generates synthetic text data.
The code must generate one row of synthetic data.
Here are the features/columns I want you to generate:
- a fictional user name
- a fictional email address
- a fictional rating
- a fictional review
"""

user_prompt = """
Generate a detailed review about a fictional VR headset.
"""

There are a dozen techniques you can apply to upgrade your prompts. One of them is meta-prompting: you basically work with your model to improve your instructions. I've shared a guide on meta-prompting in another article you'll find linked below. For now, let's fast forward and display the improved prompts:

# Example of a system prompt co-generated with an LLM

system_prompt = """
Act as an expert Data Scientist who specializes in creating synthetic data.

+ Objective: Create a table of fictional audience members reviewing various computer products. The number of audience members is specified by the user.

+ Table Columns:
1. Name: First and last name of the audience member. Ensure ethnic diversity in the names.
2. Email Address: Should follow the format "[Firstname].[Lastname]@fakemail.com".
3. Product Category: Choose from VR headsets, Smartphones, Headsets, Mice, Keyboards.
4. Product Name: Create a unique name for a product in the chosen category.
5. Comment: Provide a constructive comment about the product. Include aspects such as what the user liked and what could be improved.

+ Instructions:
- Audience Member Details:
  - Names should be diverse and creative, reflecting ethnic diversity to represent a realistic audience.
  - Email addresses must use the specified format and be unique for each audience member.
- Product and Comment:
  - Each comment should be relevant to the product category and offer insights into the user experience.
  - Fictional product names should be imaginative and reflect the product category.

+ Example Entry:
- Name: Julia Patel
- Email Address: julia.patel@fakemail.com
- Product Name: VisionQuest 360
- Product Category: VR headset
- Detailed Comment: "The VisionQuest 360 offers an immersive experience with its high-resolution display."
"""

user_prompt = "Create a table with 5 rows based on your instructions, please."

Now you inject the "system prompt" and "user prompt" into your model and ask it to use them as variables. Here's what the code looks like:

import json
from octoai.client import Client

client = Client(Token)
completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": system_prompt  # system_prompt is now a variable that contains the text above
        },
        {
            "role": "user",
            "content": user_prompt  # Same story with user_prompt
        },
    ],
    model="mixtral-8x7b-instruct-fp16",
    max_tokens=500,
    presence_penalty=0,
    temperature=0.1,
)

output = json.dumps(completion.dict(), indent=2)
print(output)

You've run the code a few times and the results seem stable. But making sense of the output? That's a different story. You're staring down a JSON object, with messy rows on the horizon.
It's time to roll up your sleeves and figure out how to parse your JSON output.

# The output is a JSON object that looks something like this:

{
  "id": "chatcmpl-...",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\"Name\" ; \"Email Address\" ; \"Product Category\" ; \"Fictional Product Name\" ; \"Comment\"\n\"Juan Kim\" ; \"Juan.kim@fakemail.com\" ; \"Smartphone\" ; \"GalaxyForce X10\" ; \"The GalaxyForce X10 has a sleek design and impressive camera quality. I would suggest adding a more customizable user interface for better personalization.\"\n\"Sofia Ahmed\" ; \"Sofia.ahmed@fakemail.com\" ; \"Headset\" ; \"SoundSerenity Pro\" ; \"The SoundSerenity Pro headset delivers clear audio and active noise cancellation. A longer wireless range would enhance its usability.\"\n\"Mateo Johnson\" ; \"Mateo.johnson@fakemail.com\" ; \"Mouse\" ; \"PrecisionPointer 5800\" ; \"The PrecisionPointer 5800 has an ergonomic design and smooth tracking. Adding customizable buttons for shortcuts would improve its functionality.\"\n\"Aisha Wilson\" ; \"Aisha.wilson@fakemail.com\" ; \"Loudspeaker\" ; \"ThunderBoom Max\" ; \"The ThunderBoom Max offers powerful sound and deep bass. Integrating voice assistant compatibility would make it a more versatile device.\"\n\"Liam Chen\" ; \"Liam.chen@fakemail.com\" ; \"Screen\" ; \"UltraVision Elite\" ; \"The UltraVision Elite has stunning visuals and vibrant colors. Reducing glare and reflection would further enhance the viewing experience.\"",
        "function_call": null
      },
      "delta": null,
      "finish_reason": "stop"
    }
  ],
  "created": 1710539587,
  "model": "mixtral-8x7b-instruct-fp16",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 262,
    "prompt_tokens": 451,
    "total_tokens": 824
  }
}

You want to parse your output into a table and then transform that table into a data frame. Notice how this is a new sub-objective that appeared during the "Narrate" phase. You can paste the JSON into a chat tab and ask your LLM to sprinkle some Python magic to parse it.

Remember, your model is already aware of the context thanks to your first prompt. Now you can afford to interact with it using short, crisp sentences. Here's an example of a short prompt you can use:

[Example of a short prompt during the "Narrate" step]
[Parsing the output JSON object]

I ran the code and obtained a JSON object.
Generate a code snippet that parses the JSON object to get the "content" of the message.
Include necessary packages and clear comments.

# Here's an example output from ChatGPT-4

import json  # package needed to parse JSON objects

# Parse the JSON
data = json.loads(output)

# Extract table content from the 'content' field
table_content = data['choices'][0]['message']['content']

print(table_content)

You realize the content could still use some formatting. After all, you want a CSV format. You ask the model to handle the conversion for you.

[Example of a short prompt during the "Narrate" step]
[Converting the "content" into a dataframe]

I ran the code and obtained a string output.
Generate a code snippet that converts the parsed data from "content" into a dataframe.
Include necessary packages and clear comments.
# Here's an example output from ChatGPT-4

import pandas as pd
from io import StringIO  # package needed to read your parsed string like a file

# Turn your table into a pandas DataFrame
# Use StringIO to simulate reading from a file
parsed_data = StringIO(table_content)

# Read the data into a pandas DataFrame using ";" as the column separator
df = pd.read_csv(parsed_data, sep=';')

# Display the DataFrame
print(df)

Better, right? Now that you have an elegant data frame, you can output it into an Excel/CSV table. Why output a file? It makes it easier to control the format of the output, especially when you start generating thousands of rows.

[Example of a short prompt during the "Narrate" step]
[Converting the dataframe into an Excel file]

I ran the code and obtained a dataframe.
Generate a code snippet that converts the dataframe into an Excel file stored in a local folder.
Include necessary packages and clear comments.

# Here's an example output from ChatGPT-4

file_path = "your_file_path_here"

# Save the DataFrame to an Excel file
df.to_excel(file_path, index=True)

print(f'DataFrame saved to {file_path}')

The "Narrate" step is a conversation with your model. You tell it to generate one tiny code snippet at a time. The idea is to work with short feedback loops and run your tests as you go. Keep a Jupyter Notebook ready in one tab and an LLM chat in another. Test each snippet your model throws at you. If it clicks, great — lock it in and advance. If not, push back, clarify your needs, and try again.

That's the dance of INSPIRe: you're not the coder, but the vigilant supervisor who verifies every step. It may appear like a slow process, but once you get used to INSPIRe, you'll start moving faster than your own fingertips.

Another point I'll repeat until I'm hoarse: always, always, always test your output. Which brings us to the next step.

3. Screen

Ernest Hemingway said it best: "The first draft of anything is shit," and your generated code is no exception. Think of "Screening" as less of a step and more of a habit.

Start by running your generated code in a Jupyter Notebook to see what holds up and what falls apart. Encounter an error? Relay it to your LLM and see what fixes it suggests, then test those fixes too.

This principle isn't just for code. Your synthetic data needs the same scrutiny. LLMs aren't deterministic, which means the same prompt can yield different results. That's your cue to test your prompts repeatedly, screening for inconsistencies in the output, the same way you do with your code. That's why I suggest you always save synthetic data samples in Excel files for quick visual checks on the fly.

In short, assume nothing; question everything.

With the data generation example, I noticed three issues when verifying the code and the generated data:

1. The output format was unstable. Sometimes the separator would be a comma and other times it would be a semicolon. I updated the prompt to add a clear format instruction.
2. I made a mistake in the system prompt. Instead of introducing a "Rating" feature, I introduced a "Product Name" feature. I replaced the latter with the former.
3. The maximum number of tokens was no longer enough to generate 5 rows of data because my system prompt kept getting bigger. I increased the max_tokens value to 1,000 to have a sizeable margin of safety.
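Screening the data itself can also be partially automated. Here is a minimal sketch of a sanity check you could run on each generated batch, assuming the semicolon-separated format requested in the prompt and that table_content holds the model's text output; the screen_batch helper and the EXPECTED_COLUMNS value are illustrative, not from the original notebook:

# Minimal sketch: flag rows that don't match the expected format.
EXPECTED_COLUMNS = 5  # e.g., Name; Email Address; Product Reviewed; Product Rating; Comment

def screen_batch(table_content: str, expected_columns: int = EXPECTED_COLUMNS) -> list:
    """Return a list of (line_number, line) tuples that fail the column-count check."""
    bad_rows = []
    for i, line in enumerate(table_content.strip().splitlines()):
        if line.count(";") != expected_columns - 1:
            bad_rows.append((i, line))
    return bad_rows

issues = screen_batch(table_content)
if issues:
    print(f"{len(issues)} malformed row(s) found:")
    for line_number, line in issues:
        print(line_number, line)
else:
    print("All rows match the expected format.")

A check like this catches the unstable-separator problem early, before malformed rows silently corrupt your DataFrame.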
Here's what fixing the previous issues may look like after the "Screen" step:

# Improvements made inside the SYSTEM PROMPT
# Notice the 4th item in the "Table Columns" and the "Example Entry" sections
# Notice the "Format" section

system_prompt = """
Act as an expert Data Scientist who specializes in creating synthetic data.

+ Objective: Create a table of fictional audience members reviewing various computer products. The number of audience members is specified by the user.

+ Table Columns:
1. Name: First and last name of the audience member. Ensure ethnic diversity in the names.
2. Email Address: Should follow the format "[Firstname].[Lastname]@fakemail.com".
3. Product Reviewed: Choose from VR headsets, Smartphones, Headsets, Mice, Keyboards.
4. Product Rating: Assign a rating from 1 to 5 stars.
5. Comment: Provide a constructive comment about the product. Include aspects such as what the user liked and what could be improved.

+ Instructions:
- Audience Member Details:
  - Names should be diverse and creative, reflecting ethnic diversity to represent a realistic audience.
  - Email addresses must use the specified format and be unique for each audience member.
- Product and Comment:
  - Each comment should be relevant to the product category and offer insights into the user experience.
  - Fictional product names should be imaginative and reflect the product category.

+ Format:
- The content should be structured as a table with semicolon-separated values (CSV style).
- Each entry must contain the following columns in order: Name; Email Address; Product Reviewed; Product Rating; Comment.
- Product ratings should be expressed as integers from 1 to 5, where 1 represents the worst experience and 5 the best.

+ Example Entry:
- Name: Julia Patel
- Email Address: julia.patel@fakemail.com
- Product Reviewed: VisionQuest 360
- Product Rating: 4
- Comment: "The VisionQuest 360 offers an immersive experience with its high-resolution display."
"""

user_prompt = "Create a table with 5 rows based on your instructions."

# Improvements made inside the code
# Notice the max_tokens value

import json
from octoai.client import Client

client = Client(Token)
completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": system_prompt
        },
        {
            "role": "user",
            "content": user_prompt
        },
    ],
    model="mixtral-8x7b-instruct-fp16",
    max_tokens=1000,  # changed from 500 to 1,000
    presence_penalty=0,
    temperature=0.1,
    top_p=0.9,
)

output = json.dumps(completion.dict(), indent=2)
print(output)

You just got yourself a cleaner piece of code. Now you can take it one step farther.

4. Polish

This is where the fun intensifies. Maybe your LLM skips a crucial comment. Perhaps the variables have boring names. What if the generated code doesn't handle errors? Now is your time to shine as a detective. Grab your magnifying glass and hunt down those imperfections, ready to correct and enhance.

For me, polishing often means turning freshly generated code into dynamic functions. Another fun quest is error handling. The "Polish" step is like editing one of your drafts. The content is there, but a few tweaks can make it shine. You can either:

+ Edit the code yourself and test it. Spot any errors? Consult your LLM to iron out syntax issues and misplaced indentations — those godforsaken indentations.
+ Make your LLM polish the code for you by giving it more natural language instructions.
Whichever route you choose, you'll emerge from "Polish" with a big smile. You're just one step closer to completing your code. If your code is achievable in one step, you're already there.

In the case of data generation, I spent 80% of my time in the polishing phase. All of my manual edits were on the prompts, leaving code generation to my AI assistant.

Important note: the "Polish" step is often significant enough to make you run a full INSPIRe loop inside it. Every time you want to upgrade your code, consider the upgrade as a new side quest. Write a specific prompt, narrate instructions, and iterate until you generate a better version of your code.

For the data generation example, I ran multiple INSPIRe loops when polishing to accomplish two objectives:

A. Functional improvements inside the code.
B. Handling errors.

Here's a breakdown of what this process involved:

A. Functional improvements

+ Instead of generating a new product for every review, I wanted to constrain my LLM to a fixed number of products. I created a list of fictional products across 10 different brands, all stored in a JSON object. My aim was to mimic the content of a realistic e-commerce site.
+ I used the JSON object as a source of examples I can feed into the prompt.
+ I turned my static prompts into functions that produce dynamic prompts. I introduced variables and flexible samples.
+ I threw in a few improvements inside the prompt itself, such as an example of the expected output.
+ I added a "batch_size" variable to control how many rows of data I generate each time I run my prompts. This helped me vary the content of my output even more.
+ Eventually, I noticed that my model predominantly generated positive reviews. To balance this, I added a "sentiment" variable, allowing me to specify whether I wanted positive or negative reviews.

If this sounds complicated, it's because it is complicated. These steps are the result of 15 iterations meant to improve the output. You cannot overinvest in the "Polish" step. Every time you examine your code and its output, you'll think of potential upgrades. Capture your ideas and try to bring them to life. On top of being useful, this is the most intellectually stimulating step.

Now let's go back to the improvements we made. For simplicity, remember this: we put together dynamic prompts to diversify the output. We also turned the initial prompt into a dynamic function. Take a look:
import json
import random

def generate_system_prompt_sentiment(json_file_path, batch_size, sentiment):
    """
    Generates a dynamic system prompt for creating synthetic product reviews with a given sentiment.

    This function loads product data from a JSON file, selects a random subset of products,
    and constructs a prompt directing the creation of synthetic reviews. Each review includes a
    brand, description, and a user comment reflecting the specified sentiment (positive, negative, or mixed).

    Args:
        json_file_path (str): Path to the JSON file containing product data.
        batch_size (int): Number of products to include in the prompt.
        sentiment (str): Desired sentiment for the reviews ('positive', 'negative', or anything else for mixed).

    Returns:
        str: A formatted string containing the system prompt with instructions for the model.
    """
    # Convert the sentiment input to lowercase for standardized comparison
    sentiment_lower = sentiment.lower()

    # Assign the rating instruction based on the sentiment
    if sentiment_lower == "positive":
        rating_instruction = "Ensure each review includes a positive sentiment, with a rating of 4 or 5 stars."
    elif sentiment_lower == "negative":
        rating_instruction = "Ensure each review includes a negative sentiment, with a rating of 1 or 2 stars."
    else:
        rating_instruction = "Alternate between positive reviews (4 or 5 stars) and negative reviews (1 or 2 stars)."

    # Load the product data from the specified JSON file
    # Note: the file wraps the product list under a "products" key
    with open(json_file_path, 'r') as file:
        products_data = json.load(file)["products"]

    # Select a random set of products from the JSON file, based on the specified batch size
    selected_products = random.sample(products_data, min(batch_size, len(products_data)))

    # Build the "Selected Products" block that will be injected into the prompt
    selected_products_text = "\n".join(
        [f"*Brand: {product['brand']}, *Product: {product['product']}, *Description: {product['description']}" for product in selected_products]
    )

    # Generate the dynamic system prompt
    # Notice the **Selected Products section and the {rating_instruction} variable
    # Also notice the Example of Final Output section
    system_prompt = f"""
Act as an expert Data Scientist who specializes in creating synthetic data.

**Objective: Create a table of fictional audience members reviewing the following products.

**Selected Products:
{selected_products_text}

**Table Columns:
1. *Name: First and last name of the audience member. Ensure ethnic diversity in the names.
2. *Email Address: Should follow the format "[Firstname].[Lastname]@fakemail.com".
3. *Product Reviewed: One of the selected products.
4. *Product Rating: Assign a rating from 1 to 5 stars.
5. *Comment: Provide a constructive comment about the product. Include aspects such as what the user liked and what could be improved.

**Specific Instructions:
- *Audience Member Details:
  - Names should be diverse and creative, reflecting ethnic diversity to represent a realistic audience.
  - Email addresses must use the specified format and be unique for each audience member.
- *Product Review:
  - Each comment should be relevant to the product reviewed and offer insights into the user experience.
  - {rating_instruction}

**Format:
- The content should be structured as a table with semicolon-separated values (CSV style).
- Each entry must contain the following columns in order: Name; Email Address; Product Reviewed; Product Rating; Comment.
- Product ratings should be expressed as integers from 1 to 5, where 1 represents the worst experience and 5 the best.

**Example Entry:
- Name: Hina Ahmed
- Email Address: hina.ahmed@fakemail.com
- Product Reviewed: SoundSerene Pro
- Product Rating: 5
- Comment: "The SoundSerene Pro headset delivers exceptional sound quality and comfort."

**Example of Final Output:
Name;Email Address;Product Reviewed;Product Rating;Comment
Hina Ahmed;hina.ahmed@fakemail.com;SoundSerene Pro;5;The SoundSerene Pro headset delivers exceptional sound quality and comfort.
Jessie Martin;jessie.martin@fakemail.com;Pixel XLT;1;The Pixel XLT struggles with ...
Min-Ji Kim;min-j.kim@fakemail.com;SonicBoom X1;3;The SonicBoom X1 offers powerful ...
Pedro Martinez;pedro.martinez@fakemail.com;GamingGear M1;2;The GamingGear M1 has ...
Radhika Gupta;radhika.gupta@fakemail.com;QuantumQuill R2;4;The QuantumQuill R2 ...
"""
    return system_prompt

# Usage example with a specified batch size
json_file_path = "C:/Users/PWV628/Desktop/09_NLP/octoai/inputs/products.json"
batch_size = 5  # Specify the number of products you want to include in the prompt
sentiment = "NEGATIVE"

system_prompt = generate_system_prompt_sentiment(json_file_path, batch_size, sentiment)
print(system_prompt)

Here's a small version of the JSON I used as a reference table.
In real life, this would mean I scraped all the products from the e-commerce website of my client. The real JSON has 50 products. The sample displayed below has only 5 fictional products:

{
  "products": [
    {
      "brand": "NebulaTech",
      "product": "Nebula VR Headset Pro",
      "description": "A high-end VR headset for immersive experiences."
    },
    {
      "brand": "AstroGear",
      "product": "Astra Phone 02",
      "description": "A durable and reliable smartphone for everyday use."
    },
    {
      "brand": "OrionWaves",
      "product": "Orion Surround Headset",
      "description": "A headset offering immersive audio with surround sound."
    },
    {
      "brand": "ZephyrTech",
      "product": "Zephyr Gamer Mouse",
      "description": "A mouse designed for hardcore gamers with customizable buttons."
    },
    {
      "brand": "VortexElectronics",
      "product": "Vortex Ergo Keyboard",
      "description": "An ergonomic keyboard designed for long gaming sessions."
    }
  ]
}

You can apply the same logic to the user_prompt. I kept it simple for practical reasons and added only the "batch_size" as a variable. The idea is to avoid dividing your LLM's "attention" between two sets of instructions. Here's what the user_prompt looks like:

def generate_user_prompt(batch_size):
    # Generate the dynamic user prompt
    user_prompt = f"Create a table with {batch_size} fictional user reviews based on your instructions."
    return user_prompt

# Usage example
batch_size = 5  # This can be any number depending on your requirements
user_prompt = generate_user_prompt(batch_size)
print(user_prompt)

B. Error handling

Each time you enter an input, there's room for errors. The key distinction between decent and quality code is its robustness to such issues. In the pre-LLM days, you'd have to meticulously examine each line of code to anticipate and address possible errors. Today, you can craft a flexible error-handling prompt and tailor it to your needs. Here's an example I designed for the synthetic data generator:

[Step #4: Polish]
[Sub-task: Error handling]
[Flexible prompt for error handling]

+ Role:
Act like a software engineer specializing in <language>.
Your task is to enhance <input_code> to handle errors effectively.

+ Guidelines:
Reason step-by-step to improve <input_code> and make it robust.
Ensure the code is well-commented, clear, and adheres to error handling best practices.
Start by identifying potential error sources in <input_code> and propose solutions.

+ Specifics:
Incorporate try-except blocks, error logging, and recovery mechanisms as needed.
Validate syntax and error handling logic meticulously.

+ Format:
Title modifications as "Error Handling Enhancement vX.Y" where X is the snippet number.

+ Inputs:
<language> = Python 3.11.5
<input_code> = [insert your code here]

With such a prompt, you're not just fixing problems; you're preemptively building resilience into your code, making it smarter and more reliable — one snippet at a time.
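For reference, here is a minimal sketch of what an error-handled version of the OctoAI call might look like, using the same client.chat.completions.create call shown earlier; the call_model_with_retries helper, the retry count, and the logging setup are illustrative choices, not code from the original article:

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def call_model_with_retries(client, system_prompt, user_prompt, max_retries=3):
    """Call the model, retry on failure, and return the 'content' string of the response."""
    for attempt in range(1, max_retries + 1):
        try:
            completion = client.chat.completions.create(
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                model="mixtral-8x7b-instruct-fp16",
                max_tokens=1000,
                temperature=0.1,
            )
            # Same parsing pattern as earlier in the article
            data = completion.dict()
            return data["choices"][0]["message"]["content"]
        except (KeyError, IndexError) as parse_error:
            # The response came back but didn't have the expected structure
            logger.warning("Attempt %d: unexpected response shape (%s)", attempt, parse_error)
        except Exception as api_error:
            # Network issues, rate limits, or other API-side failures
            logger.warning("Attempt %d: API call failed (%s)", attempt, api_error)
        time.sleep(2 * attempt)  # simple backoff before retrying
    raise RuntimeError(f"Model call failed after {max_retries} attempts.")

You can then feed a snippet like this into the error-handling prompt above and let the model refine it further.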
With this said, let's move to the next step, where everything comes together.

5. Integrate

Every code snippet you generate is a piece of a larger puzzle. When you start from scratch, you don't have anything to integrate. However, when you're enhancing existing code or constructing a multi-step program, the "Integrate" phase becomes essential.

Integration is more than just stitching code snippets together — it's about knowing where you are and where you want to go. Sometimes integration means writing a new function. Maybe you need to tweak your code to make it fit with its older siblings. Perhaps you want to store some values in a local JSON object rather than a list. Either way, you'll be assembling puzzle pieces.

In the example of data generation, I used INSPIRe 15+ times to write three different functions. In the "Integrate" phase, I combined these functions into a single program to forge a synthetic data generator:

+ Function #1 — generate_system_prompt_sentiment(json_file_path, current_batch_size, sentiment): generates dynamic system prompts based on a sample of products selected from a JSON file.
+ Function #2 — generate_user_prompt(current_batch_size): creates dynamic user prompts that use the same "batch_size" as the system prompt.
+ Function #3 — generate_synthetic_dataset(dataset_size, batch_size, sentiment, json_file_path, output_file_path, Token): calls the Mixtral-8x7B model through the OctoAI API. It uses a loop sized to my desired dataset. In every iteration, Function #3 calls the other two functions.

About 95% of the code you'll see below was generated by an LLM. The remaining 5% is me adding comments and changing variable names. Here's what the synthetic data generation function looks like:

import json
import math
import pandas as pd
from io import StringIO
from octoai.client import Client

def generate_synthetic_dataset(dataset_size, batch_size, sentiment, json_file_path, output_file_path, Token):
    """
    Generates a synthetic dataset using the OctoAI API.

    Args:
        dataset_size (int): The desired size of the dataset.
        batch_size (int): The number of entries to generate in each batch.
        sentiment (str): The sentiment of the dataset to be generated ('positive', 'negative', or mixed).
        json_file_path (str): The path to the JSON file required for generating system prompts.
        output_file_path (str): The path where the generated dataset will be saved as an Excel file.
        Token (str): The OctoAI API token.

    Returns:
        pandas.DataFrame: The final DataFrame containing the synthetic dataset.
    """
    # Placeholder for the aggregated results
    all_data = []

    # Calculate the total number of batches, rounding up to include partial batches
    total_batches = math.ceil(dataset_size / batch_size)

    client = Client(Token)  # Initialize the OctoAI client

    for batch_num in range(total_batches):
        # Adjust the batch size for the last batch, if necessary
        current_batch_size = batch_size if (batch_num + 1) * batch_size <= dataset_size else dataset_size - batch_num * batch_size

        # Call the OctoAI API to generate a batch of synthetic data
        completion = client.chat.completions.create(
            messages=[
                {"role": "system", "content": generate_system_prompt_sentiment(json_file_path, current_batch_size, sentiment)},
                {"role": "user", "content": generate_user_prompt(current_batch_size)},
            ],
            model="mixtral-8x7b-instruct-fp16",  # Pick the model you want from the OctoAI catalog
            max_tokens=1500,      # Maximum number of tokens to generate in the completion
            presence_penalty=0,   # No penalty for repeating information
            temperature=0.1,      # Low creativity, the model will produce more deterministic output
            top_p=0.9,            # Use the top 90% of the probability distribution, balancing creativity and coherence
        )

        # Parsing the JSON output from the API
        output = json.dumps(completion.dict(), indent=2)  # Convert the API response to a JSON string
        data = json.loads(output)                         # Parse the JSON string into a Python dictionary

        # Extract table content from the 'content' field
        table_content = data['choices'][0]['message']['content']

        # Convert the generated batch data to a DataFrame and append it to the list of all data
        parsed_data = StringIO(table_content)
        batch_df = pd.read_csv(parsed_data, sep=';')

        # Check if the batch_df has more rows than expected and adjust if necessary
        if len(batch_df) > current_batch_size:
            batch_df = batch_df.iloc[:current_batch_size]

        # Append the batch DataFrame to the list of all data
        all_data.append(batch_df)

    # Concatenate all batch DataFrames into a single DataFrame
    final_df = pd.concat(all_data, ignore_index=True)

    # Optional: verify the final row count matches the expected dataset size
    if len(final_df) != dataset_size:
        print(f"Warning: Expected {dataset_size} rows, but got {len(final_df)} rows.")

    # Save the final DataFrame to an Excel file
    final_df.to_excel(output_file_path, index=False)

    # Display an end-of-task message
    print(f"DataFrame with {dataset_size} rows saved to {output_file_path}")

    return final_df

Now, let's run the function. First, you fill in the inputs:

Token = "your_octoai_token_here"
client = Client(Token)

dataset_size = 42  # example for a small sample
batch_size = 5     # example for a small sample
json_file_path = "your_file_path_here.json"

# Preparing the input sentiments and paths for two separate datasets, one with positive reviews and one with negative reviews
sentiment_df_negative = "negative"
sentiment_df_positive = "positive"
output_file_path_negative = "your_file_path_here/NEGATIVE_DF.xlsx"
output_file_path_positive = "your_file_path_here/POSITIVE_DF.xlsx"

# Here we generate our two datasets
df_positive = generate_synthetic_dataset(dataset_size, batch_size, sentiment_df_positive, json_file_path, output_file_path_positive, Token)
df_negative = generate_synthetic_dataset(dataset_size, batch_size, sentiment_df_negative, json_file_path, output_file_path_negative, Token)

# Now we combine our datasets
combined_df_final = pd.concat([df_positive, df_negative], ignore_index=False)
combined_df_final.sample(10)

And voilà.

6. Restart

The INSPIRe framework is a cyclic process. Once you complete a cycle, you start the next, but it's crucial to take a moment to reflect before advancing. What's the next phase? Has your perspective shifted during the previous loop? Are there new ideas you want to introduce or old ones to discard?

Take this example: the initial 15 iterations focused on creating functions to generate synthetic data, starting with a modest sample of 42 rows. The 16th iteration aimed to scale the process up to 2,000 rows, during which I encountered a few scalability issues. The result? More iterations.

It took three evenings to finalize the synthetic data generator. From there, I moved on to creating various datasets, changing the parameters here and there. Later, I conducted several rounds of Exploratory Data Analysis (EDA) using the INSPIRe framework.

Here's an EDA example:

- Task: Conduct an analysis of product review frequency.
- Steps:
  * Count the number of reviews for each product: this will show us how frequently each product is reviewed, indicating its popularity or consumer interest.
  * Visualize the data: create a plot to display the frequency of reviews for each product, making it easier to compare them visually.

import matplotlib.pyplot as plt
import seaborn as sns

# 1. Count the number of reviews for each product
product_review_count = combined_df_final['Product Reviewed'].value_counts()

# 2. Visualize the data
plt.figure(figsize=(12, 25))
sns.barplot(x=product_review_count.values, y=product_review_count.index)
plt.title("Product Review Frequency")
plt.xlabel("Number of Reviews")
plt.ylabel("Product Reviewed")
plt.tight_layout()  # Adjust layout to make full use of space
plt.show()
[Figure: bar plot of product review frequency, taken from the author's Notebook.]

Here's another example:

- Task: Perform text analysis on the "Comment" column and extract common themes or features mentioned in the reviews. Use a word cloud, as it is an effective visual tool for identifying the most frequent words.
- Steps:
  * Combine all comments: concatenate all the text from the "Comment" column to get a single large text string.
  * Text preprocessing: remove common stopwords (like "the", "a", "in") which don't add significant meaning to the text analysis.
  * Generate the word cloud: create a visual representation of the most common words in the comments.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# 1. Combine all comments into a single text string
all_comments = ' '.join(combined_df_final['Comment'].dropna())

# 2. Text preprocessing - define stopwords to exclude from the analysis
stopwords = set(STOPWORDS)

# 3. Generate the word cloud
wordcloud = WordCloud(width=800, height=400, background_color='white', stopwords=stopwords).generate(all_comments)

# Plot the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')  # Hide the axes
plt.show()

[Figure: word cloud of the most frequent words in the comments, taken from the author's Notebook.]

Prompt Engineering = Programming

INSPIRe is an organized display of what's possible when you know how to talk to LLMs. If you stick with it long enough, you'll turn into a magician. Cast the right words, and elegant code will appear.

As with every magic trick, you want to steal it like an artist. Begin with the basic INSPIRe formula and practice until it becomes second nature. Over time, you'll introduce personal tweaks and build your own version of the framework. That's how you might find yourself spending less time fussing over indentations and more time on what truly matters.

The AI scene evolves every time you refresh your feed. Companies are rushing to build agents that can do what we do. Some cover their eyes in denial. Others are considering new career paths. The pragmatic approach? Embrace change and evolve with it. Use AI as a tool to augment your work and gain a competitive advantage. Not only will you save time and resources, but you'll also become familiar with the different use cases of AI models like GPT-4, Claude 3, Mistral, and Gemini. You'll get a glimpse into the future that's unfolding right now.

Speaking of tomorrow, one of the most worthwhile skills you can invest in is writing prompts. Contrary to popular belief, prompt engineering is not a straightforward task that comes naturally — it's quite the opposite. AI models have a different understanding of language and a different way of reasoning, which means you can't just hope for them to respond perfectly to half-baked instructions. You have to be intentional with your prompts.
+ Give your LLMs clean input tokens to limit the range of possible answers. This is why I use relatively long prompts.
+ Give your LLMs enough output tokens so they can "reason" and produce better results. This is why all my prompts include a "reason step-by-step" instruction.

Steal the recipe. Get the dosage right. That's how you make your AI model work for you.

If you haven't started prompting yet, pick up INSPIRe and get started today. Use it to generate your next code snippet:

+ (I) Identify what you want to do and how to do it — write a prompt.
+ (N) Narrate the instructions that need to be converted into code.
+ (S) Screen every code snippet and correct errors.
+ (P) Polish your code to make it more elegant and dynamic.
+ (I) Integrate your code snippets into an elegant program.
+ (Re) Restart the process until you achieve the desired results.

AI won't replace you, but someone who uses AI will. Happy prompting.

Want to get better at prompting? I'm launching The Bald Prompter newsletter, where you'll receive prompt engineering tips and one advanced prompt per week.

Subscribe: The Bald Prompter. Hype-free emails about AI and Prompt Engineering: zero nonsense, maximum utility.

The first 200 paid subscribers ($5 per month) will get a complete Prompt Engineering guide (priced at $60) as a welcome gift.

GitHub repository: INSPIRe_use_cases/synthetic_data_generation at main, in NabilAlouani/INSPIRe, a repository for synthetic data generation using LLMs.

More from The Bald Prompter:

+ "6 Steps to Make AI Write Your Python Code for You: Use the INSPIRe framework to save time and gain a competitive edge (ChatGPT-4 — Claude 3 — Gemini)" on towardsdatascience.com
+ "How to Improve Any Prompt in Less Than 5 Minutes (Chat UI and Code): Turn half-baked sentences into expert-level prompts"